Managing Uncertainty in Spatial and Spatio-temporal Data
Andreas Züfle¹, Goce Trajcevski², Tobias Emrich,
Matthias Renz¹, Hans-Peter Kriegel¹, Nikos Mamoulis³, Reynold Cheng³
¹ LMU Munich, ² NWU Evanston, ³ HKU Hong Kong, ⁴ USC Los Angeles
Aim of this tutorial…
› Understand basic concepts for scalable probabilistic query processing on uncertain spatial and spatio-temporal data
› A tutorial, not a survey
› Get the big picture…
  – NOT in terms of a long list of recent methods and algorithms
  – BUT in terms of general concepts commonly used in this field
Outline
› Tutorial decomposed into three parts
  – Uncertain Spatial Data (Andreas Züfle)
  – Uncertain Spatio-Temporal Data (Geometric Approach) (Goce Trajcevski)
  – Uncertain Spatio-Temporal Data (Probabilistic Approach) (Tobias Emrich)
› Please feel free to ask questions at any time during the presentation
› The latest version of these slides will be made available within the next week:
http://www.dbs.ifi.lmu.de/~zuefle
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
Geo-Spatial Data
• Huge flood of geo-spatial data
  • Modern technology
  • New user mentality
• Great research potential
  • New applications
  • Innovative research
  • Economic boost
• "$600 billion potential annual consumer surplus from using personal location data" [1]
[1] McKinsey Global Institute: Big data: The next frontier for innovation, competition, and productivity. June 2011
Geo-Spatial Data
Spatio-Temporal Data
• (object, location, time) triples
• Queries
  • "Find friends that attended the same concert last Saturday"
• Best case: continuous function
GPS log taken from a thirty-minute drive through Seattle. Dataset provided by P. Newson and J. Krumm: Hidden Markov Map Matching Through Noise and Sparseness. ACM GIS 2009
Sources of Uncertainty
• Missing observations
  • Missing GPS signal
  • RFID sensors available in discrete locations only
  • Wireless sensor nodes sending infrequently to preserve energy
  • Infrequent check-ins of users of geo-social networks
Dataset provided by E. Cho, S. A. Myers and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011
Sources of Uncertainty
• Uncertain observations
  • Imprecise sensor measurements (e.g., radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g., contradictory sensor data)
  • Human errors (e.g., in crowd-sourcing applications)
From a database perspective, the position of a mobile object is uncertain.
Research Challenge
Include the uncertainty that is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results, thereby enhancing the underlying decision-making process
Improve the quality of modern location-based applications and of research results in the field
Uncertain Spatial Data Models
• Discrete models
• Continuous models
Possible World Semantics
• A collection of uncertain spatial objects defines an uncertain spatial database
• Combinations of object instances define possible database instances, called possible worlds
• Assumption: the probability of a possible world can be computed efficiently
Answering Queries using PWS
Let
• $DB$ be an uncertain database having possible worlds
• $\mathcal{W}$ be the set of possible worlds of $DB$
• $\varphi$ be a query predicate
• $I_{\varphi}(w)$ be an indicator function returning one if predicate $\varphi$ holds in world $w$ and zero otherwise
The probability that a query predicate holds on an uncertain database is defined as
$$P(\varphi, DB) = \sum_{w \in \mathcal{W}} P(w) \cdot I_{\varphi}(w)$$
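To make the definition concrete, here is a minimal brute-force sketch in Python. It assumes a discrete model in which every object is a list of (position, probability) instances and objects are mutually independent, so that P(w) is the product of the chosen instances' probabilities. The names `query_probability` and `count_in_circle` and the two-object database are hypothetical. Enumerating `itertools.product` over all objects is exactly the exponential blow-up that makes naive query processing infeasible.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical discrete model: an uncertain object is a list of (position, probability) instances.
UncertainObject = List[Tuple[Point, float]]


def query_probability(db: Sequence[UncertainObject],
                      predicate: Callable[[Tuple[Point, ...]], bool]) -> float:
    """Brute-force P(phi, DB) = sum over all possible worlds w of P(w) * I_phi(w)."""
    total = 0.0
    for world in itertools.product(*db):            # pick one instance per object
        positions = tuple(pos for pos, _ in world)
        p_world = math.prod(p for _, p in world)    # assumes mutually independent objects
        if predicate(positions):                    # indicator I_phi(w)
            total += p_world
    return total


def count_in_circle(positions: Tuple[Point, ...], q: Point = (0.0, 0.0), r: float = 2.0) -> int:
    """Number of objects within distance r of the query point q."""
    return sum(1 for (x, y) in positions if math.hypot(x - q[0], y - q[1]) <= r)


db = [
    [((0.0, 0.0), 0.5), ((5.0, 5.0), 0.5)],   # object A
    [((1.0, 0.0), 0.4), ((9.0, 9.0), 0.6)],   # object B
]
p_exactly_one = query_probability(db, lambda w: count_in_circle(w) == 1)
```

Here A is inside the circle in half of its instances and B in 40% of them, so exactly one object is inside with probability 0.5 · 0.6 + 0.5 · 0.4 = 0.5.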
Possible Worlds Example II
[Figure: uncertain objects labelled A to Z]
Too many possible worlds
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects
Are there efficient solutions to query uncertain spatial data?
In general: No.
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
This can be reduced to uncertain spatial databases
But: specific queries may have polynomial-time solutions
[DalviSuciu04] N. N. Dalvi and D. Suciu: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004
Querying Uncertain Data: Running Example
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds
Data Cleaning: Aggregation
[Figure: query region around q with uncertain objects C, D, H, I]
Ignore uncertainty (data cleaning): replace uncertain objects by a deterministic "best guess"
• Expected positions
• Most-likely positions
• …
Query results are not reliable
Query results may be biased
Equivalent Worlds: An Intuition
Observation 1
For any possible world and any possible world derived from it by changing the position of an object, the following equivalence holds
Equivalent Worlds: An Intuition
Observation 1 allows us to discard objects outside of the query region
Remaining number of equivalence classes of possible worlds
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
[Figure: query region around q with objects C, D, H, I]
Querying Uncertain Spatial Data
Remaining number of equivalence classes of possible worlds
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region. Information about concrete result objects can be discarded.
[Figure: query region around q; objects A, B, C with probabilities 0.8, 0.2 and 0.4]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects, i.e., substitute A, B and C by x
Each monomial $c_k x^k$ implies that the probability of having exactly $k$ results equals $c_k$
Generating Functions: Formally
For each object $o_i$, let $p_i$ denote the probability that $o_i$ is located inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_{i=1}^{N} \big( (1 - p_i) + p_i \, x \big)$$
In the expanded polynomial, the coefficient of monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.
[2] Jian Li, Barna Saha and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
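The generating function can be expanded with plain coefficient-list multiplication. This is a sketch, not the implementation of [2]. The inside-probabilities 0.8, 0.2 and 0.4 are read off the running-example figure; which object carries which value is an assumption, and it does not affect the result because the product is commutative. The function names are hypothetical.

```python
from typing import List, Sequence


def multiply(p: List[float], q: List[float]) -> List[float]:
    """Multiply two polynomials given as coefficient lists (index = exponent of x)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def count_distribution(inside_probs: Sequence[float]) -> List[float]:
    """Coefficient k of prod_i ((1 - p_i) + p_i x) = P(exactly k objects inside)."""
    poly = [1.0]
    for p in inside_probs:
        poly = multiply(poly, [1.0 - p, p])   # one linear factor per (anonymized) object
    return poly


distribution = count_distribution([0.8, 0.2, 0.4])
```

With these values the coefficients are approximately [0.096, 0.472, 0.368, 0.064], so, for example, exactly one object is inside with probability about 0.472. Each multiplication by a linear factor costs O(N), so the whole expansion is O(N²) instead of exponential.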
Example
[Figure: query region around q; objects A, B, C with probabilities 0.8, 0.2 and 0.4]
Count Queries on Uncertain Data
The Paradigm of Equivalent Worlds
Given a query predicate $\varphi$ and an uncertain database DB, we can answer $\varphi$ on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and unbiased
Evaluate the query predicate on each world
The distribution of sampled results is an unbiased approximation of the true distribution of results
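A Monte-Carlo version of the running count query, as a sketch. Because the count only depends on whether each object is inside (Observations 2 and 3), each sampled world is reduced here to independent inside/outside draws per object. The function name, the seed parameter and the independence of objects are assumptions.

```python
import random
from typing import List, Sequence


def sample_count_distribution(inside_probs: Sequence[float], n_worlds: int,
                              seed: int = 0) -> List[float]:
    """Estimate P(count = k) from n_worlds independently drawn possible worlds."""
    rng = random.Random(seed)
    hist = [0] * (len(inside_probs) + 1)
    for _ in range(n_worlds):
        # In each sampled world every object is inside with its own probability.
        k = sum(1 for p in inside_probs if rng.random() < p)
        hist[k] += 1
    return [c / n_worlds for c in hist]
```

The relative frequencies converge to the exact generating-function coefficients as `n_worlds` grows; with only 100 worlds they fluctuate noticeably, which is what motivates the confidence intervals on the next slide.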
Sampling: Example
[Figure: query region around q; objects A, B, C with probabilities 0.8, 0.2 and 0.4]
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of reliability or confidence of estimations
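Sampling: Confidences
[Figure: query region around q; objects A, B, C with probabilities 0.8, 0.2 and 0.4]
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of estimators
E.g., Wald test: $\hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution
At a significance level of $\alpha = 0.05$, the true probability lies in the interval [0.442, 0.638]
True probability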
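The interval above is consistent with an estimate p̂ = 0.54 from n = 100 sampled worlds and z ≈ 1.96 (α = 0.05). Note that p̂ = 0.54 is inferred from the midpoint of [0.442, 0.638], not stated on the slide. A minimal sketch using the standard library's `statistics.NormalDist`; `wald_interval` is a hypothetical name.

```python
import math
from statistics import NormalDist


def wald_interval(p_hat: float, n: int, alpha: float = 0.05):
    """Wald confidence interval p_hat +/- z_{1-alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)      # (1 - alpha/2) percentile of N(0, 1)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lo, hi = wald_interval(0.54, 100)
```

`wald_interval(0.54, 100)` returns approximately (0.442, 0.638). The width shrinks with 1/sqrt(n), so quadrupling the number of sampled worlds halves it.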
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solutions for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty: the flip side
Model of a Trajectory
[Figure: 2D route in the (X, Y) plane and the corresponding 3D trajectory in (X, Y, Time), up to the present time]
Approximation: does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
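A sketch of the piecewise-linear trajectory model, assuming samples (x, y, t) with strictly increasing timestamps and constant speed between consecutive samples. That constant-speed assumption is exactly why the approximation cannot capture acceleration or deceleration. `position_at` is a hypothetical helper name.

```python
from bisect import bisect_right
from typing import List, Tuple

Sample = Tuple[float, float, float]   # (x, y, t) with strictly increasing t


def position_at(traj: List[Sample], t: float) -> Tuple[float, float]:
    """Location at time t under the piecewise-linear (constant speed per segment) model."""
    times = [s[2] for s in traj]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled time span")
    k = min(max(1, bisect_right(times, t)), len(traj) - 1)   # segment [k-1, k] containing t
    (x0, y0, t0), (x1, y1, t1) = traj[k - 1], traj[k]
    f = (t - t0) / (t1 - t0)                                  # fraction of the segment elapsed
    return x0 + f * (x1 - x0), y0 + f * (y1 - y0)
```

For samples at (0, 0, 0), (10, 0, 10) and (10, 10, 20), the object is assumed to be at (5, 0) at t = 5 and at (10, 5) at t = 15, regardless of how it actually sped up or slowed down in between.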
Mobility Models
• Sequence of (location, time) updates (e.g., GPS-based)
• Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
• (location, time, velocity vector) updates
Modeling Uncertainty: Location Uncertainty Models
Imprecision from various sources (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (a.k.a. space-time prisms)
  • Assumed exact location samples
  • Unknown (but bounded) maximum speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
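A bead (space-time prism) between two exact samples is the set of points that are reachable from the first sample and from which the second sample can still be reached at speed at most vmax. A minimal membership test, assuming Euclidean free-space motion; `in_bead` is a hypothetical name.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def in_bead(p: Point, t: float, p1: Point, t1: float, p2: Point, t2: float, vmax: float) -> bool:
    """True if (p, t) lies in the bead spanned by the samples (p1, t1) and (p2, t2)."""
    if not t1 <= t <= t2:
        return False
    reachable_from_start = math.dist(p1, p) <= vmax * (t - t1)   # forward cone from (p1, t1)
    can_reach_end = math.dist(p, p2) <= vmax * (t2 - t)          # backward cone from (p2, t2)
    return reachable_from_start and can_reach_end
```

Intersecting the forward and backward cones is what gives the bead its lens shape: at the midpoint in time the cross-section is widest, and it shrinks to the sample points at t1 and t2.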
Modeling Uncertain Trajectories: Model 2
Constraints
Motion along road network
Uncertainty restricted to road segments
[Figure: space-time prism on the road network between p1 at t1 and p2 at t2, with query q and time t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider
Syntax
– The impact of uncertainty on the (structure of the) answer
– The impact of constraints
Processing algorithms
– Impact of the motion + uncertainty coupling on filtering/pruning/refinement
e.g., possible routes to be taken
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
Continuous NN for Trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in the (X, Y) plane at T = tb, T = t1 and T = te]
Example: assuming that Trq is the querying trajectory during [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some location uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: uncertain trajectories Trq, Tr1, Tr2, Tr3 in the (X, Y) plane at T = tb, tb1, t11, t1 and te]
Observations: adding uncertainty to the previous example implies that
– Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN-probability
– Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree with root [TrQ, tb, te]; children (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); grandchildren such as (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …]
Root: parameters of the query
– Querying trajectory Trq
– Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and the uncertainty disks of Tr1 to Tr5, with radii Rmin, Rmax and Rd]
Trq = 2D point Q; the rest are possible locations bounded by circles
Observation:
– Any object whose minimum distance from Q is greater than Rmax has a "0" probability of being the NN of Q
– Example: Tr4 has zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \int_{\{x \,:\, \|x - Q\| \le R_d\}} \mathrm{pdf}_i(x) \, dx$$
The probability that a given object Trj is the NN of Q is given by
$$P_j^{NN} = \int_{R_{min}}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \neq j} \big(1 - P_i(R_d)\big) \, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the others are at distances > Rd, integrating over all possible Rd (NOTE: the upper bound is Rmax…)
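The integral can be cross-checked numerically. The sketch below does not evaluate the integral itself; it swaps in a Monte-Carlo estimate, assuming uniform pdfs over each uncertainty disk and independent objects, and applies the Rmax pruning rule first, with Rmax taken as the smallest maximum distance from Q to any disk. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Disk = Tuple[Point, float]          # (center, radius) with a uniform pdf over the disk


def sample_in_disk(disk: Disk, rng: random.Random) -> Point:
    (cx, cy), r = disk
    rho = r * math.sqrt(rng.random())            # sqrt makes the sample uniform over the area
    phi = 2.0 * math.pi * rng.random()
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)


def candidates(q: Point, disks: Sequence[Disk]) -> List[int]:
    """Drop objects whose minimum distance to q exceeds R_max (smallest maximum distance)."""
    r_max = min(math.dist(q, c) + r for c, r in disks)
    return [i for i, (c, r) in enumerate(disks) if max(0.0, math.dist(q, c) - r) <= r_max]


def nn_probabilities(q: Point, disks: Sequence[Disk], n_samples: int = 20000,
                     seed: int = 0) -> List[float]:
    """Monte-Carlo estimate of P(object i is the NN of q); pruned objects get 0."""
    rng = random.Random(seed)
    keep = candidates(q, disks)
    wins = [0] * len(disks)
    for _ in range(n_samples):
        # One sampled location per remaining object; the closest one wins this world.
        nearest = min(keep, key=lambda i: math.dist(q, sample_in_disk(disks[i], rng)))
        wins[nearest] += 1
    return [w / n_samples for w in wins]
```

For two unit disks centered at (3, 0) and (-3, 0) and a third at (10, 0), the third object is pruned (its minimum distance 9 exceeds Rmax = 4) and the first two each win about half of the sampled worlds.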
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query TrQ and objects Tr1 to Tr5 with radii Rmin and Rmax; two possible locations Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over disks of diameter 2r]
Example: uniform pdf of possible locations; the figure marks two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$P\big(\|V_i - V_q\| \le R_d\big)$$
Define Viq = Vi - Vq (cross-correlation) as another random variable, Viq = Vi + (-Vq); due to the independence of Vi and Vq, the pdf of Viq is the convolution of the pdfs of Vi and -Vq
[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq), each supported on a disk of diameter 4r]
Let Viq = Vi + (-Vq)
Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
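Property 1 and the support of pdf(Viq) can be checked by sampling. This sketch assumes that Vi and Vq are both uniform over disks of the same radius r, so that Viq lives in a disk of radius 2r (diameter 4r) around Ci - Cq. The helper names and parameters are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def sample_in_disk(center: Point, r: float, rng: random.Random) -> Point:
    rho, phi = r * math.sqrt(rng.random()), 2.0 * math.pi * rng.random()
    return center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi)


def diff_samples(ci: Point, cq: Point, r: float, n: int, seed: int = 0) -> List[Point]:
    """Samples of V_iq = V_i + (-V_q) for independent uniform disks around ci and cq."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        xi, yi = sample_in_disk(ci, r, rng)
        xq, yq = sample_in_disk(cq, r, rng)
        out.append((xi - xq, yi - yq))
    return out


def within_distance(samples: List[Point], rd: float) -> float:
    """Estimate of P(||V_i - V_q|| <= rd): the fraction of samples inside the disk of radius rd."""
    return sum(1 for x, y in samples if math.hypot(x, y) <= rd) / len(samples)
```

For Ci = (5, 0), Cq = (1, 1) and r = 1, the sample mean of Viq approaches Ci - Cq = (4, -1), as Property 1 states, and every sample lies within 2r of that centroid.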
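The continuous aspect of the probabilistic NN queries
Let $V^x_{iq} = v^x_i - v^x_q$ and $V^y_{iq} = v^y_i - v^y_q$ denote the corresponding components of the velocity of the object whose expected location is along $TR_{iq}$
Then the distance of the expected location along $TR_{iq}$ from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = \big(V^x_{iq}\big)^2 + \big(V^y_{iq}\big)^2$$
where B and C depend on the expected relative position at the beginning of the time interval; this is a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)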
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions $S_{df} = \{d_{1q}(t), d_{2q}(t), \ldots, d_{Nq}(t)\}$ (NOTE: excluding $d_{qq}(t) \equiv 0$)
Observation: throughout the time interval of interest for the query [tb, te], two distance functions $d_{iq}(t)$ and $d_{jq}(t)$ can intersect at most twice (NOTE: setting $d_{iq}(t) = d_{jq}(t)$ and squaring both sides yields a quadratic equation in t). Call such pairwise intersections critical time points
[Figure: example lower envelope LE12 of two distance trajectories TR1 and TR2 starting at tb, with critical time points t11 and t12; time on the horizontal axis, distance on the vertical axis]
The continuous aspect of the probabilistic NN queries
Q: How can we efficiently construct the lower envelope of the whole collection of distance functions?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: lower envelope LE12 of TR1 and TR2 (critical time points t11, t12) and lower envelope LE34 of TR3 and TR4 (critical time point t31) over [tb, te], merged into LE1234 with new critical time points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
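A sketch of the divide-and-conquer lower-envelope construction. It assumes each distance function is given through the coefficients of its square, d(t)² = At² + Bt + C. Comparing squared distances preserves the order because distances are non-negative. The relative position (x0, y0) at t = 0 is a hypothetical parametrization for B and C. Critical time points are the real roots of the quadratic difference of two squared distances. Only the first lower envelope is computed; the ranked envelopes needed for deeper IPAC-NN levels are not shown, and this merge step sorts breakpoints rather than being tuned to the O(N log N) bound. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Quad = Tuple[float, float, float]    # (A, B, C) with d(t)^2 = A t^2 + B t + C
Piece = Tuple[float, float, int]     # (t_start, t_end, index of the trajectory on the envelope)


def distance_coeffs(x0: float, y0: float, vx: float, vy: float) -> Quad:
    """Squared distance from the origin of the relative position (x0 + vx t, y0 + vy t)."""
    return vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0


def sq_dist(f: Quad, t: float) -> float:
    a, b, c = f
    return a * t * t + b * t + c


def crossings(f: Quad, g: Quad, lo: float, hi: float) -> List[float]:
    """Critical time points in (lo, hi): real roots of d_f(t)^2 - d_g(t)^2 = 0 (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if a == 0.0:
        roots = [] if b == 0.0 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        roots = [] if disc < 0.0 else [(-b - math.sqrt(disc)) / (2.0 * a),
                                       (-b + math.sqrt(disc)) / (2.0 * a)]
    return sorted({r for r in roots if lo < r < hi})


def active(env: List[Piece], t: float) -> int:
    """Index of the trajectory that is lowest on envelope `env` at time t."""
    for start, end, idx in env:
        if start <= t <= end:
            return idx
    raise ValueError("t outside the envelope")


def merge(e1: List[Piece], e2: List[Piece], funcs: Sequence[Quad]) -> List[Piece]:
    """Merge two lower envelopes over the same interval (the 'merge' step of merge sort)."""
    cuts = sorted({p[0] for p in e1 + e2} | {p[1] for p in e1 + e2})
    out: List[Piece] = []
    for lo, hi in zip(cuts, cuts[1:]):
        i, j = active(e1, (lo + hi) / 2.0), active(e2, (lo + hi) / 2.0)
        pts = [lo] + crossings(funcs[i], funcs[j], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            m = (a + b) / 2.0
            k = i if sq_dist(funcs[i], m) <= sq_dist(funcs[j], m) else j
            if out and out[-1][2] == k:
                out[-1] = (out[-1][0], b, k)      # extend the previous piece
            else:
                out.append((a, b, k))
    return out


def lower_envelope(ids: Sequence[int], funcs: Sequence[Quad], tb: float, te: float) -> List[Piece]:
    """Divide and conquer: split the trajectories, recurse on both halves, merge the envelopes."""
    if len(ids) == 1:
        return [(tb, te, ids[0])]
    half = len(ids) // 2
    return merge(lower_envelope(ids[:half], funcs, tb, te),
                 lower_envelope(ids[half:], funcs, tb, te), funcs)
```

For the distance functions |t|, |t - 2| and |t - 4| over [0, 5], `lower_envelope` returns [(0.0, 1.0, 0), (1.0, 3.0, 1), (3.0, 5.0, 2)], with critical time points at t = 1 and t = 3.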
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function lies more than 4r above the lower envelope (r = radius of the uncertainty disk) has zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions × time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te] with critical time points t1, t2, t3; the lower envelope, the 4r pruning band, and the 2nd lower envelope (nodes in level 2 of the IPAC-NN tree)]
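One way to see where the 4r margin comes from is the following derivation. It assumes, as in the earlier figures, that both the query and every object are uniform over disks of radius r. Then pdf(Viq) is supported on a disk of radius 2r around Ciq, and $d_{iq}(t)$ is the distance of Ciq(t) from the origin:

$$d_{iq}(t) - 2r \;\le\; \|V_{iq}(t)\| \;\le\; d_{iq}(t) + 2r$$

$$Tr_j \text{ cannot be the NN at } t \text{ if } \; d_{jq}(t) - 2r \;>\; \min_i \big(d_{iq}(t) + 2r\big) = LE(t) + 2r \;\iff\; d_{jq}(t) > LE(t) + 4r$$

where $LE(t)$ is the lower envelope of the distance functions at time t.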
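Given two trajectory samples $(t_i, p_i)$ and $(t_{i+1}, p_{i+1})$ of a moving object a on a road network, the set of possible paths $PP_i(a)$ between $t_i$ and $t_{i+1}$ consists of all the paths along the routes (sequences of edges) that connect $p_i$ and $p_{i+1}$ and whose minimum time costs are not greater than $t_{i+1} - t_i$, i.e.,
$$PP_i(a) = \{\, P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mincost}(P) \le t_{i+1} - t_i \,\}$$
Example: if the pdf of selecting a possible path is uniform, then $\Pr(P_j) = 1 / |PP_i(a)|$ for every $P_j \in PP_i(a)$
Given a path $P_j \in PP_i(a)$, the possible locations $PL_j(a, t)$ of the moving object a with respect to $P_j$ at $t \in [t_i, t_{i+1}]$ are all the positions $p \in P_j$ that a can reach from $p_i$ within time $t - t_i$ and from which it can still reach $p_{i+1}$ within time $t_{i+1} - t$, i.e.,
$$PL_j(a, t) = \{\, p \in P_j \mid \mathrm{mincost}(p_i, p) \le t - t_i \ \wedge\ \mathrm{mincost}(p, p_{i+1}) \le t_{i+1} - t \,\}$$
where mincost denotes the minimum travel time along $P_j$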
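Combine the two probabilities…
[Figure: possible paths of object a, and its possible locations when t = 4, including segment m p1]
Probability of falling inside segment m p1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{P_j \in PP_i(a)} \Pr(P_j) \cdot \Pr\big(p \in S \mid p \in PL_j(a, t)\big)$$
e.g., with a uniform pdf over $PL_j(a, t)$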
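A sketch of combining the two probabilities under stated assumptions. The minimum travel time along a path is taken to be its length divided by a hypothetical speed bound vmax. Paths are chosen uniformly among the feasible ones, and the location is uniform over each path's possible-location interval. Positions are measured as distance from pi along each path. The path names, lengths and segment are invented so as to reproduce the 0.5 · 0.5 = 0.25 value from the slide; the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]   # positions measured as distance from p_i along a path


def possible_locations(length: float, vmax: float, t_i: float, t_next: float,
                       t: float) -> Optional[Interval]:
    """PL_j(a, t) on a path of the given length, assuming min travel time = distance / vmax."""
    lo = max(0.0, length - vmax * (t_next - t))   # must still reach p_{i+1} by t_next
    hi = min(length, vmax * (t - t_i))            # must be reachable from p_i since t_i
    return (lo, hi) if lo <= hi else None


def prob_in_segment(path_lengths: Dict[str, float], segment: Dict[str, Interval],
                    vmax: float, t_i: float, t_next: float, t: float) -> float:
    """Pr(a(t) in S) = sum_j Pr(P_j) * Pr(a(t) in S | P_j), uniform over paths and over PL_j."""
    feasible = [p for p, length in path_lengths.items() if length / vmax <= t_next - t_i]
    total = 0.0
    for p in feasible:
        pl = possible_locations(path_lengths[p], vmax, t_i, t_next, t)
        if pl is None or p not in segment:
            continue                               # S does not lie on this path
        (lo, hi), (s_lo, s_hi) = pl, segment[p]
        overlap = max(0.0, min(hi, s_hi) - max(lo, s_lo))
        cond = overlap / (hi - lo) if hi > lo else float(s_lo <= lo <= s_hi)
        total += cond / len(feasible)              # uniform Pr(P_j) = 1 / |PP_i(a)|
    return total
```

With two feasible paths of lengths 10 and 12, vmax = 2, samples at t = 0 and t = 8, and S = [2, 5] on the first path, the possible locations on that path at t = 4 are [2, 8]. Half of them lie in S, so the result is 0.5 · 0.5 = 0.25. A third path of length 20 would need 10 time units and is therefore not a possible path.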
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and their probabilities)
  – A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4 the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
Processing
› Data structures
  – Edge hash table
    › Small, in-memory
  – Movement R-tree
    › 1D for each edge
  – Trajectories list
    › Store samples ordered by timestamp
    › Assign pointer to set of possible paths
    › Earliest-arrival and latest-departure times stored for vertices
    › On-disk, retrieved on demand
› Network expansion
  – Expansion tree: a tree rooted at q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time interval intersects tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" from q along road-network segments A to E, with earliest/latest times of arrival; one possible path is marked as definitely not part of the answer to the range query]
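The network-expansion step is essentially a Dijkstra search from q that stops once the next vertex is at network distance r or more. This sketch assumes an undirected road graph given as adjacency lists with travel distances. An edge is reported as soon as at least one of its endpoints is closer than r, so partially covered edges are included. The names and the example graph are hypothetical.

```python
import heapq
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]   # undirected: each edge listed in both directions


def expansion_edges(graph: Graph, q: str, r: float) -> Set[FrozenSet[str]]:
    """Edges with at least one endpoint at network distance < r from q (Dijkstra, cut off at r)."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    settled: Set[str] = set()
    edges: Set[FrozenSet[str]] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        if d >= r:
            break                                  # every remaining vertex is at least r away
        settled.add(u)
        for v, w in graph.get(u, []):
            edges.add(frozenset((u, v)))
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return edges
```

Only the movement R-trees of the returned edges need to be fetched in the filtering step, which is what keeps the candidate set small.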
Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
Uncertainty: the flip side
› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up the Douglas-Peucker Line-Simplification Algorithm". SSHD 1997
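The cited paper speeds up the Douglas-Peucker algorithm. The sketch below is the plain recursive Douglas-Peucker simplification, not the accelerated version. It works on 2D points only, so it ignores the role of time raised on the next slide; `eps` plays the role of the acceptable error, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points: Sequence[Point], eps: float) -> List[Point]:
    """Keep the endpoints; recurse on the farthest point while it is more than eps away."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [a, b]                              # every point is within the acceptable error
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right                       # the split point appears only once
```

Every dropped point lies within `eps` of the simplified polyline, which is the "acceptable error" traded for a smaller transmitted data size.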
Uncertainty: the other flip side
› What are the important properties that should not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just another "Z" axis… one cannot travel back in time
  – How long may the data at the sink remain "un-fresh"?
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
So far…
Stochastic methods for uncertain spatial data
• Probabilistic results: yes
• Time dimension: no
vs.
Geometric models for uncertain spatio-temporal data
• Probabilistic results: no
• Time dimension: yes
Merge the approaches
› Idea
  – Discretize time
  – For each time instant, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest-neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parameterized queries
R. Cheng, D. Kalashnikov and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443
A highway example…
[Figure: position (pos) of a car over time (t), with speed bounds vmin and vmax and a query window Q]
What is the probability that the car is in some area at least once during some time interval?
Assuming a uniform distribution…
[Figure: the car is inside Q with probability 50% at one time instant and with probability 30% at another]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Violation of the speed constraints
Consideration of dependency: 0.5
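The two numbers differ only in the joint term of inclusion-exclusion. Write A for "in Q at the first time instant" (probability 0.5) and B for "in Q at the second time instant" (probability 0.3). Assume, as one reading consistent with the 0.5 result, that the speed constraints make every world satisfying B also satisfy A:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

$$\text{independence: } P(A \cap B) = 0.5 \cdot 0.3 = 0.15 \;\Rightarrow\; P(A \cup B) = 0.65$$

$$B \subseteq A: \; P(A \cap B) = P(B) = 0.3 \;\Rightarrow\; P(A \cup B) = 0.5$$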
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953): Stochastic Processes. Wiley
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts
[Figure: transition probabilities 0.4 / 0.2 / 0.4 in the interior and 0.4 / 0.6 at the border]
A simple example
› Initial position: bucket 3
› After first hit: 0.4, 0.2, 0.4 (buckets 2 to 4)
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (buckets 1 to 5)
› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08
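How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket; columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$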
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
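The vector-matrix products above can be reproduced with a few lines of plain Python. NumPy is avoided to keep the block self-contained. `bucket_matrix` rebuilds M for the nine buckets, and all function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def bucket_matrix(n: int = 9) -> Matrix:
    """Transition matrix of the board example: row = from bucket, column = to bucket."""
    m = [[0.0] * n for _ in range(n)]
    m[0][0], m[0][1] = 0.4, 0.6                  # left border: stay 0.4, move 0.6
    m[n - 1][n - 1], m[n - 1][n - 2] = 0.4, 0.6  # right border
    for i in range(1, n - 1):                    # interior: left 0.4, stay 0.2, right 0.4
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(x: Vector, m: Matrix) -> Vector:
    """One hit of the board: row vector times transition matrix (x' = x . M)."""
    n = len(x)
    return [sum(x[i] * m[i][j] for i in range(n)) for j in range(n)]


def after_hits(x: Vector, m: Matrix, k: int) -> Vector:
    """Distribution after k hits, i.e., x . M^k."""
    for _ in range(k):
        x = step(x, m)
    return x
```

`after_hits(x0, M, 1)` and `after_hits(x0, M, 2)` give the first- and second-hit vectors. Iterating long enough converges to the stationary distribution (0.08, 0.12, …, 0.12, 0.08), which follows from the balance condition 0.6 · π1 = 0.4 · π2 at each border.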
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 s
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
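A minimal sketch of estimating transition probabilities from historical data. It counts observed transitions and normalizes each row, which is the maximum-likelihood estimate for a Markov chain. The sketch assumes trajectories have already been mapped to state sequences with one state per tick. The nested dictionaries keep the matrix sparse, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Sequence


def estimate_transitions(trajectories: Iterable[Sequence[Hashable]]) -> Dict[Hashable, Dict[Hashable, float]]:
    """Sparse transition matrix: P(next = b | current = a) = count(a -> b) / count(a -> *)."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for states in trajectories:
        for a, b in zip(states, states[1:]):      # one observed transition per tick
            counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()} for a, c in counts.items()}
```

States that were never left in the historical data get no row at all. A separate matrix per time period or object group can be obtained by calling the function on the corresponding subset of trajectories.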
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
ST Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M]
Note: we have an exponential number of possible paths the car might take
Starting in s1 at t = 0, the state distributions are (1, 0, 0) at t = 0, (0, 0, 1) at t = 1, (0, 0.8, 0.2) at t = 2 and (0.48, 0.16, 0.36) at t = 3
Worlds that are already in s1 or s2 at t = 2 are winners; propagating only the remaining mass (0, 0, 0.2) gives (0, 0.16, 0.04) at t = 3
Result = 0.8 + 0.16 = 0.96
› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices
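ST Window Queries
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
States s1, s2, s3 and the absorbing winner state s4; $M^-$ is applied outside T and $M^+$ inside T:
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$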
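A sketch of the window query with the two augmented matrices. M− (used outside T) leaves the extra absorbing state untouched, while M+ (used inside T) redirects every transition into a window state to the absorbing "winner" state. The function names and the handling of t = 0 are assumptions.

```python
from typing import List, Sequence, Set

Matrix = List[List[float]]


def augmented(m: Matrix, window: Set[int], inside: bool) -> Matrix:
    """Return M+ (inside=True) or M- (inside=False) with an extra absorbing 'winner' state."""
    n = len(m)
    out = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            target = n if (inside and j in window) else j
            out[i][target] += m[i][j]
    out[n][n] = 1.0          # winners stay winners
    return out


def window_probability(m: Matrix, start: Sequence[float], window: Set[int],
                       t_start: int, t_end: int) -> float:
    """P(object is in a window state at least once during ticks t_start..t_end)."""
    n = len(m)
    x = list(start) + [0.0]
    if t_start <= 0 <= t_end:                    # already inside the window at t = 0
        for s in window:
            x[n] += x[s]
            x[s] = 0.0
    m_minus, m_plus = augmented(m, window, False), augmented(m, window, True)
    for t in range(1, t_end + 1):
        a = m_plus if t_start <= t <= t_end else m_minus
        x = [sum(x[i] * a[i][j] for i in range(n + 1)) for j in range(n + 1)]
    return x[n]
```

With the three-state M above, starting in s1, window states {s1, s2} (indices 0 and 1) and T = [2, 3], `window_probability` returns 0.96 up to floating-point rounding. Only sparse vector-matrix products are needed, no enumeration of paths.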
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 (left) and with observations at t0 and t1 (right)]
Multiple Observations
› We need to track where true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
  – One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
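Multiple Observations
State order: $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$, where ∎ marks worlds that have intersected the window and ¬∎ those that have not
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
State distributions: t = 0: (1, 0, 0, 0, 0, 0); t = 1: (0, 0, 1, 0, 0, 0); t = 2: (0, 0, 0.2, 0, 0.8, 0); t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)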
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' theorem:
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare) \cdot P(\blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$$
With the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$ at t = 3:
$$P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
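The same idea extended to the 2|S| classes $S_i^-$ and $S_i^+$, followed by Bayes' theorem. This sketch assumes integer ticks, a single observation at t_obs ≥ t_end, and hypothetical function names.

```python
from typing import List, Sequence, Set, Tuple

Matrix = List[List[float]]


def split_matrices(m: Matrix, window: Set[int]) -> Tuple[Matrix, Matrix]:
    """Build the 2n x 2n matrices over (S_1^-, ..., S_n^-, S_1^+, ..., S_n^+)."""
    n = len(m)
    minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = m[i][j]
            minus[i][j] += p                           # outside T: hit status unchanged
            minus[n + i][n + j] += p
            plus[i][n + j if j in window else j] += p  # inside T: entering the window makes a hit
            plus[n + i][n + j] += p                    # hits stay hits
    return minus, plus


def posterior_hit(m: Matrix, start: Sequence[float], window: Set[int],
                  t_start: int, t_end: int, t_obs: int, s_obs: int) -> float:
    """P(window was hit during [t_start, t_end] | object observed in s_obs at t_obs >= t_end)."""
    n = len(m)
    x = list(start) + [0.0] * n
    if t_start <= 0 <= t_end:                          # worlds already inside the window at t = 0
        for s in window:
            x[n + s], x[s] = x[s], 0.0
    minus, plus = split_matrices(m, window)
    for t in range(1, t_obs + 1):
        a = plus if t_start <= t <= t_end else minus
        x = [sum(x[i] * a[i][j] for i in range(2 * n)) for j in range(2 * n)]
    hit, miss = x[n + s_obs], x[s_obs]                 # P(s_obs and hit), P(s_obs and no hit)
    return hit / (hit + miss)                          # Bayes' theorem
```

For the three-state example (start in s1, window {s1, s2} = indices {0, 1}, T = [2, 3], observation in s3 = index 2 at t = 3), the posterior is 0.32 / 0.36 = 8/9 ≈ 0.89.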
Summary
› Pros
  – Allows answering time-parameterized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Matching time to ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the spatio-temporal space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index over the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = number of samples that satisfy the query / total number of drawn samples
› But how can we draw samples efficiently such that they conform to the observations?
› Solution: adaptation of transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
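One standard way to draw samples that agree with a later observation is to reweight each forward transition by the probability of still reaching the observed state in the remaining ticks (a backward pass). This is a sketch under that assumption; whether it matches the exact construction of Niedermayer et al. is not confirmed here, and all names are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def backward_weights(m: Matrix, target: int, steps: int) -> List[List[float]]:
    """beta[k][i] = P(being in `target` after exactly k more ticks | currently in state i)."""
    n = len(m)
    beta = [[1.0 if i == target else 0.0 for i in range(n)]]
    for _ in range(steps):
        prev = beta[-1]
        beta.append([sum(m[i][j] * prev[j] for j in range(n)) for i in range(n)])
    return beta


def sample_conditioned(m: Matrix, start: int, target: int, horizon: int,
                       rng: random.Random) -> List[int]:
    """Draw one state path from `start` (t = 0) that is in `target` at t = horizon."""
    beta = backward_weights(m, target, horizon)
    if beta[horizon][start] == 0.0:
        raise ValueError("the observation cannot be reached from the start state")
    path = [start]
    for t in range(horizon):
        i, remaining = path[-1], horizon - t - 1
        # Adapted transition row: M[i][j] reweighted by the chance of still meeting the observation.
        weights = [m[i][j] * beta[remaining][j] for j in range(len(m))]
        path.append(rng.choices(range(len(m)), weights=weights)[0])
    return path
```

For the three-state window-query chain, starting in s1 and observed in s3 at t = 3, about 8/9 of the samples pass through s2 at t = 2. This agrees with the ≈ 0.89 obtained via Bayes' theorem for the same setting, and no sample violates the observation.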
112
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Event queries
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
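The link to regular expressions can be made concrete by running a pattern automaton in lockstep with a Markov chain over locations and summing the probability of the accepting automaton state. A minimal Python sketch follows; the three locations, their transition probabilities, and all function names are hypothetical, and the sketch shows only this general automaton-times-Markov-chain idea, not Lahar's query language or processor.

```python
OFFICE, HALL, COFFEE = 0, 1, 2

# Hypothetical per-tick transition probabilities between indoor locations
# (rows: current location, columns: next location).
M = [[0.7, 0.3, 0.0],
     [0.4, 0.2, 0.4],
     [0.0, 0.5, 0.5]]

ACCEPT = 3


def dfa_step(q, loc):
    """Automaton for 'office, later coffee room, later office again' (q == ACCEPT: Joe got coffee)."""
    if q == 0 and loc == OFFICE:
        return 1
    if q == 1 and loc == COFFEE:
        return 2
    if q == 2 and loc == OFFICE:
        return ACCEPT
    return q  # no progress; ACCEPT is absorbing


def event_probability(M, start_dist, ticks):
    """Forward pass over (location, automaton state) pairs.

    Returns the probability that the event has been matched by the last tick."""
    dist = {}
    for loc, p in enumerate(start_dist):
        if p > 0:
            key = (loc, dfa_step(0, loc))
            dist[key] = dist.get(key, 0.0) + p
    for _ in range(ticks):
        nxt = {}
        for (loc, q), p in dist.items():
            for loc2, p_move in enumerate(M[loc]):
                if p_move > 0:
                    key = (loc2, dfa_step(q, loc2))
                    nxt[key] = nxt.get(key, 0.0) + p * p_move
        dist = nxt
    return sum(p for (_, q), p in dist.items() if q == ACCEPT)


p_coffee = event_probability(M, start_dist=[1.0, 0.0, 0.0], ticks=4)
```

Starting in the office, the earliest match needs four ticks (office, hall, coffee room, hall, office), so the probability is 0 for three ticks and 0.3 · 0.4 · 0.5 · 0.4 = 0.024 for four. The state space is the product of locations and automaton states, so it stays small for short patterns.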
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  - Cleaning
  - Approximation
  - Paradigm of equivalent worlds
› Geometric models can be used
  - when no probabilistic model is present
  - to find possible answers (no confidence)
› Probabilistic management of UST data
  - based on stochastic processes
  - allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity search on uncertain spatio-temporal data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-nearest neighbor queries on uncertain moving object trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD 2008: 715-728.
Managing Uncertainty in Spatial and Spatio-temporal Data
Andreas Zuumlfle1 Goce Trajcevskisup2 Tobias Emrich
Matthias Renz1 Hans-Peter Kriegel1 Nikos Mamoulissup3 Reynold Chengsup3
1 LMU Munichsup2 NWU Evanstonsup3 HKU Hong Kong4 USC Los Angeles
3
4
Aim of this tutorial hellip
rsaquo Understand basic concepts for scalable probabilistic query processing on uncertain spatial and spatio-temporal query processing
rsaquo A tutorial not a survey
rsaquo Get the big picture hellipndash NOT in terms of a long list of recent methods and
algorithmsndash BUT in terms of general concepts commonly used in this field
5
Outline
rsaquo Tutorial decomposed into three partsndash Uncertain Spatial Data (Andreas Zuumlfle)ndash Uncertain Spatio-Temporal Data (Geometric Approach) (Goce
Trajcevski)ndash Uncertain Spatio-Temporal Data (Probabilistic Approach) (Tobias
Emrich)
rsaquo Please feel free to ask questions at any time during the presentation
rsaquo The latest version of these slides will be made available within the next week
httpwwwdbsifilmude~zuefle
6
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
7
Geo-Spatial Data
bull Huge flood of geo-spatial databull Modern technologybull New user mentality
bull Great research potentialbull New applicationsbull Innovative researchbull Economic Boost
bull ldquo$600 billion potential annual consumer surplus from using personal location datardquo [1]
[1] McKinsey Global Institute Big data The next frontier forinnovation competition and productivity June 2011
8
Geo-Spatial Data
9
10
Spatio-Temporal Data
bull (object location time) triples
bull Queries bull ldquoFind friends that
attended the same concert last saturdayrdquo
bull Best case Continuous function
GPS log taken from a thirty minute drive through SeattleDataset provided by P Newson and J Krumm Hidden Markov Map Matching Through Noise and Sparseness ACMGIS 2009
11
Sources of Uncertainty
bull Missing Observationsbull Missing GPS signalbull RFID sensors available in discrete locations onlybull Wireless sensor nodes sending infrequently to preserve energybull Infrequent check-ins of users of geo-social networks
Dataset provided by E Cho S A Myers and J Leskovek Friendship and Mobility User Movement in Location-Based Social Networks SIGKDD 2011
12
Sources of Uncertainty
bull Uncertain Observationsbull Imprecise sensor measurements (eg radio triangulation Wi-Fi
positioning)bull Inconsistent information (eg contradictive sensor data)bull Human errors (eg in crowd-sourcing applications)
From database perspective the position of a mobile object is uncertain
Dataset provided by E Cho S A Myers and J Leskovek Friendship and Mobility User Movement in Location-Based Social Networks SIGKDD 2011
13
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
14
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process
15
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process
Improve the quality of modern location based applications and of research results in the field
16
Uncertain Spatial Data Models
bull Discrete Models
bull Continuous Models
Possible World Semantics
04
b
17
Possible World Semantics
bull A collection of uncertain spatial objects defines an uncertain spatial database
bull Combinations of object instances define possible database instances calledPossible Worlds
bull Assumption The probability of a possible world can be computed efficiently
Possible World Semantics
18
Answering Queries using PWS
Let bull be an uncertain database having possible worlds bull be the set of possible worlds of bull be a query predicate bull be an indicator function returning one if predicate holds in
world and zero otherwise
The probability that a query predicate holds on an uncertain database is defined as
Possible World Semantics
Possible Worlds Example II
A B
G
D E FC
H I J K
L
M
N OP
Q
R
U
T
S
V W X Y Z
20
Possible Worlds Example II
A B
G
D E FC
H I J K
L
M
N OP
Q
R
U
T
S
V W X Y Z
21
22
23
24
25
26
27
Too many possible worlds
28
Querying Uncertain Data Complexity
Naive Query Processing is exponential in the number of objects
Are there efficient solutions to query uncertain spatial data
In general No
ldquoThe problem of answering queries on a probabilistic database D is -complete in the size of Dldquo[DalviSuciu04]
Can be reduced to uncertain spatial databases
But Specific queries may have polynomial time solutions[DalviSuciu04] Dalvi N N and Suciu D Efficient query evaluation on probabilistic databasesIn Proceedings of the 30th International Conference on Very Large Data Bases (VLDB)(2004)
29
q
Querying Uncertain Data Running Example
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds
30
Data Cleaning Aggregation
q
C D
H
I
Ignore Uncertainty (Data Cleaning)
Replace uncertain objects by a deterministic ldquobest guessrdquo
Expected Positions
Most-likely Positions
hellip
Query results are not reliable
Query results may be biased
31
Equivalent Worlds An intuition
q
Observation 1
For any possible world and any possible world derived from by changing the position of object the following equivalence holds
32
Equivalent Worlds An intuition
q
Observation 1 allows to discard objects outside ofacute the query region
Remaining number of equivalent classes of possible worlds
33
Equivalent Worlds An intuition
Observation 2
For each remaining object we only need to consider the predicate ldquoinsiderdquo
q
C D
H
I
Querying Uncertain Spatial Data
34
Equivalent Worlds An intuition
Observation 2
For each remaining object we only need to consider the predicate ldquoinsiderdquo
Remaining number of equivalent classes of possible worlds
q
C D
H
I
35
Equivalent Worlds An intuition
Observation 3
We only require the number of objects in the query regionInformation about concrete results objects can be discarded
q
C D
H
I
36
q
08 02
04
H
C A
B
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
37
q
08 02
04
H
x x
x
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
Observation 3 Anonymize Objects - Substitute ABC by x
38
q
08 02
04
H
x x
x
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
Observation 3 Anonymize Objects - Substitute ABC by x
39
q
08 02
04
H
x x
x
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
Observation 3 Anonymize Objects - Substitute ABC by x
Each monomial implies that the probability of having exactly results equals
40
For each object let Consider the following generating function [2]
in the expanded polynomial the coefficient of monomial equals the probability that exactly objects are inside the query region
Generating Functions Formally
[2] Jian Li Barna Saha and Amol Deshpande A Unified Approach to Ranking in Probabilistic Databases PVLDB 2(1) 502-513 (2009)
41
Example
q
08 02
04
H
C A
B
H3
Count Queries on Uncertain Data
42
q
08 02
04
H
C A
B
H3
Example =
Count Queries on Uncertain Data
43
q
08 02
04
H
C A
B
H3
Example = =
Count Queries on Uncertain Data
44
q
08 02
04
H
C A
B
H3
Example = =
Count Queries on Uncertain Data
45
The Paradigm of Equivalent Worlds
A query predicate and an uncertain database DB we can answer on DB in PTIME if the following three conditions are satisfied
I A traditional query on certain data can be answered in polynomial time
II We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results Sampling
Materialize a set S of possible worlds
Samples drawn independent and unbiased
Evaluated the query predicate on each world
Distribution of sampled results is an unbiased approximation of the true distribution of results
48
q
08 02
04
H
C A
B
H3
Sampling Example
Drawing 100 possible worlds may yield the followingestimators
Compare to the exact probabilities
No indication of reliability or confidence of estimations
49
q
08 02
04
H
C A
B
H3
Sampling Confidences
Drawing 100 possible worlds may yield the followingestimators
Use statistical methods to assess the quality of estimators
Eg Wald-Test
Where is the percentile of the standard normal distribution
At a significance level of the true probability is in the interval [0442 0638]
True probability
50
Uncertain Spatial Data Management Summary
Motivation Flood of geo-spatial data Enriched with additional contexts (text social multimedia) Inherent uncertainty
Data Cleaning ldquoBest guessrdquo answers Unreliable results Biased results
Paradigm of Equivalent Worlds Efficient solution for the most prominent types of spatial queries Example Generating Functions
Approximations Monte-Carlo sampling Probabilistic guarantees
51
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
52
bull Modelsbull Motionbull Location Uncertainty
bull Queriesbull Part 1 NN (free-space motion)
bull Semantics and Processing
bull Part 2 Range (road-networks constraints)bull Semantics and Processing
bull Uncertainty ndash the flip-side
53
Model of a trajectory
Y
X
Time
Present time
2d-ROUTE
3d-TRAJECTORY
Approximation does not capture accelerationdeceleration
Point Objects NOTE the model needs to be augmented for objects with extent eg hurricanes
Bart Kuijpers Harvey J Miller Walied Othman Kinetic space-time prisms GIS 2011
54
Mobility Models
sequence of (locationtime)updates (eg GPS-based)
Electronic maps + traffic distributionpatterns + set of (to be visited) point=gt full trajectory
(locationtime Velocity Vector )updates
now -gt
55
Modeling UncertaintyLocation Uncertainty Models
Various Sourcesrsquo Imprecision(GPS Sensors)
Quest for data reduction
Ralph Lange Harald Weinschrott Lars Geiger Andreacute Blessing Frank Duumlrr Kurt Rothermel Hinrich Schuumltze On a Generic Uncertainty Model for Position Information QuaCon 2009
56
Modeling Uncertain Trajectories
bull Model 1Cones Beads Necklaces (AKA space-time prisms)
bull Assumed exact location samplesbull Unknown (but bounded) maximum speed in-betweensamples
Reynold Cheng Sunil Prabhakar Dmitri V Kalashnikov Querying Imprecise Data in Moving Object Environments ICDE 2003G Trajcevski A Choudhary O Wolfson G Li Uncertan Range Queries for Necklases MDM 2010
57
Modeling Uncertain TrajectoriesModel 2
Constraints
Motion along road network
Uncertainty restricted to Road Segments
p1 p2
q
t2t1
t
Bart Kuijpers Walied Othman Modeling uncertainty of moving objects on road networks via space-time prisms International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain TrajectoriesModel 3Sheared Cylinders
Constant bound on location-error at anytime-instant
G Trajcevski O Wolfson K Hinrichs S Chamberlain Managing Moving Objects Databases with Ucertainty ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly considerSyntax- The impact of uncertainty on the (structure of the) answer- The impact of constraints
Processing Algorithms- Impact of the motion+uncertainty coupling on filteringprunningrefinement
eg possible routes to be taken
6060
Example inside range (disk) around ldquoQrdquo between t1 and t2
1 What is the probability of a path taken
2 What is the earliestlatest arrival at a given vertex (consequently along edge)
Kai Zheng Goce Trajcevski Xiaofang Zhou Peter Scheuermann Probabilistic range queries for uncertain trajectories on road networks EDBT 2011
61
Continuous NN for trajectories
Q_nn(q) Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tbte]
A-nn(q) [(Tri1 [tbt1]) (Tri2[t1t2]) hellip (Trim[tm-1te])]
T = tb
T = te
X
Y
T = t1
Trq Tr1 Tr2Tr3
Given a collection of moving objects trajectories Tr1 Tr2 hellip TrN
Example assuming that Trq is the querying trajectory between [tbte] then- Tr1 (black) is the NN throughout [tbt1]- Tr2 (red) is the NN throughout [t1te]
Observation The answer to the query is time-paremeterized
Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002
62
Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant
UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]
T = te
X
YT = tb
T = t1
Trq Tr1 Tr2Tr3
T = tb1
T = t11
Observations
adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability
= what should the structure ofthe answer to UQ_nn(q) look like
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-basedProbabilistic Answer to a Continuous NN query) tree
Tri11 tb t11 D11
Tri12l t12(l-1) t12 D12k12
[TrQ tb te]
Tri1 tb t1 D1 Tri2 t1 t2 D2 Trim t(m-1) te Dm
Tri12 t11 t12 D12 Tri1k1 t1(k-1) t11 D1k1 Trik1 t(k-1) t21 Dm1 Trim1 t(m1-1) te Dmkm
Tri121 t11 t121 D121
Root parameters of the query-Querying trajectory Trq - Time-interval of interest [tbte]
Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)
Rmin gt Rmax
Rmin
Rmax
Q
Tr1
Tr2
Tr3
Tr4
Tr5
Rd1
1
14
Trq = 2D point Q The rest are possible-locations bounded by circles
Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN
The probability that the location of Tri is within distance Rd from Q is given by
The probability that a given object Trj is the NN of Q is given by
ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)
65
When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example
Rmin
Rmax
TrQ
Tr1
Tr2
Tr3
Tr4
Tr5
Z1
Z2
dist(Z1Z2) lt Rmax
We can no longer ldquoprunerdquo Tr4 from consideration
The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq
NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip
66
1
x
y
0
2r
1r2
pdf(Tr2)
pdf(Trq) pdf(Tr1)
Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for
DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq
67
1
x
y
0-40
+40
4r
34r2
pdf(Tr2 - Trq)
pdf(Tr1 - Trq)
Let Viq = Vi + (-Vq)
Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin at a function of t is
hyperbola since A gt 0
Now we have a collection of such distance functions (for each i)
69
The continuous aspect of the probabilistic NN queries
Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)
Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points
dist
TR1
TR2
LE12
tb t11 t12
Example
Lower envelope of two distance-trajectories
critical time-point
NOTE time-dimension on horizontal axis
70
The continuous aspect of the probabilistic NN queries
Q how to efficiently construct the lower envelope of the whole collection of distance-functions
A divide-and-conquer approach in the spirit of Merge-Sortdist
TR1
TR2
LE12
tb t11 t12
dist
TR3
TR4
LE34
tb tet31dist
TR1
TR2
LE1234
tb tet11 t12
TR3
TR4
t31
t1new t2new
Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)
Now how about the IPAC-NN tree
71
The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)
Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree
dist dist
lower envelope
TR1TR2
TR3
TR4
TR5
TR6
tb te
2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )
4r
TR7
t1 t2 t3
72
Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie
Example if pdf of selecting a possible path is uniform
Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie
73
Combine the two probabilitieshellip
Possible pathsPossible locations when t=4
a
m
Probability of falling inside segment mp1
0505 = 025 (at t = 4)
Probability of an object a falling inside some segment S at given time instant t
)(
))(Pr()Pr())(Pr(tPPp
p
a
tPLSpSta eg uniform pdf
74
ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory
representation we needndash The collection of all the possible paths (and probabilities)
ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi
to position pi+1
At the time-instant t = 4 the object can
be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )
75
Processing
rsaquo Data Structuresndash Edge hash-table
rsaquo Small in-memoryndash Movement R-tree
rsaquo 1D for each edgendash Trajectories List
rsaquo Store samples ordered by time-stamp
rsaquo Assign pointer to set of possible paths
rsaquo Earliest-arriving and Latest-departure stored for vertices
rsaquo On-disk retrieve on-demand
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
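Bayes' Theorem
› Now, what is the probability that the trajectory passed through the query window, given that the object was observed in s3 at t = 3?
$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\,P(\blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$
With the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+) at t = 3: $P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$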
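The conditioning step reduces to reading two entries of the class distribution. Here is a small sketch, assuming the S1-..Sn-, S1+..Sn+ ordering from the previous slide. The function name is illustrative.

```python
def hit_given_observation(dist, observed_state):
    """P(window intersected | object observed in observed_state).

    dist is a distribution over the 2|S| classes ordered S1-..Sn-, S1+..Sn+.
    """
    n = len(dist) // 2
    p_hit = dist[n + observed_state]   # P(observed state and window hit)
    p_miss = dist[observed_state]      # P(observed state and no window hit)
    if p_hit + p_miss == 0:
        raise ValueError("the observation has zero probability under the model")
    return p_hit / (p_hit + p_miss)


t3 = [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]  # distribution at t = 3 from the slide
print(round(hit_given_observation(t3, 2), 2))
```

This prints 0.89, i.e. 0.32 / 0.36. The denominator is simply the total probability of the observation, split into its "hit" and "no hit" classes.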
108
Summary
› Pros
  - Answers time-parametrized queries according to possible worlds semantics
  - Considers location dependencies over time
  - Scales up very well, since it is purely based on sparse matrix multiplications
  - Natively extendable to uncertain observations
  - Seems to work adequately on real-world data (more validation needed)
› Cons
  - Discrete time and space
  - Mapping continuous time to discrete ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, every object in the database has to be processed
› Index structure based on an R-tree indexing the space-time domain
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g. at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index over the location-time space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  - Draw a sufficiently high number of samples
  - Approximate the result probability by the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they are consistent with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
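The basic Monte-Carlo estimator can be sketched on the three-state chain from the window-query slides. This hedged sketch uses plain forward sampling from a single known start state. It does not include the adaptation of transition matrices that the cited paper uses to make samples consistent with later observations, and the function names are hypothetical.

```python
import random


def sample_path(m, start_state, length, rng):
    """Draw one possible trajectory s_0..s_length from the Markov chain m."""
    path = [start_state]
    for _ in range(length):
        row = m[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def mc_window_probability(m, start_state, window_states, t_begin, t_end,
                          n_samples=10_000, seed=0):
    """Estimate P(window hit) as (#samples satisfying the query) / (#samples)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        path = sample_path(m, start_state, t_end, rng)
        if any(path[t] in window_states for t in range(t_begin, t_end + 1)):
            hits += 1
    return hits / n_samples


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
print(mc_window_probability(M, 0, {0, 1}, 2, 3))
```

The estimate should be close to the exact value 0.96 computed with M⁻ and M⁺. Unlike the matrix approach, it carries sampling error, and its standard error shrinks proportionally to 1/sqrt(n_samples).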
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
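The regular-expression flavour of such queries can be illustrated under strong simplifying assumptions. Assume, as for the window queries, that an individual's room-level location follows a Markov chain, and reduce each subevent to "observed in room r at some tick". The probability that office, coffee room, office occur in this order can then be computed by tracking how many subevents have been matched so far. This amounts to a product of the chain with a small automaton. It is a hypothetical illustration, not Lahar's actual semantics or evaluation algorithm, and the rooms and probabilities are made up.

```python
def sequence_query_probability(m, start, pattern, steps):
    """P(locations x_0..x_steps contain `pattern` as an ordered subsequence).

    Tracks a distribution over (current state, number of subevents matched);
    matching greedily at the earliest opportunity is sufficient for subsequences.
    """
    k = len(pattern)
    dist = {}
    for s, p in enumerate(start):
        if p > 0:
            matched = 1 if k > 0 and s == pattern[0] else 0
            dist[(s, matched)] = dist.get((s, matched), 0.0) + p
    for _ in range(steps):
        new_dist = {}
        for (s, matched), p in dist.items():
            for s2, q in enumerate(m[s]):
                if q == 0:
                    continue
                advance = matched < k and s2 == pattern[matched]
                key = (s2, matched + 1 if advance else matched)
                new_dist[key] = new_dist.get(key, 0.0) + p * q
        dist = new_dist
    return sum(p for (_, matched), p in dist.items() if matched == k)


# Hypothetical rooms: 0 = Joe's office, 1 = hallway, 2 = coffee room
M = [[0.5, 0.5, 0.0],
     [0.3, 0.3, 0.4],
     [0.0, 0.6, 0.4]]
print(sequence_query_probability(M, [1.0, 0.0, 0.0], [0, 2, 0], 6))
```

As with the window query, the state space only grows by a factor of the pattern length plus one, instead of enumerating all possible paths.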
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  - Cleaning
  - Approximation
  - Paradigm of equivalent worlds
› Geometric models can be used
  - when no probabilistic model is present
  - to find possible answers (no confidence)
› Probabilistic management of UST data
  - based on stochastic processes
  - allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
3
4
Aim of this tutorial hellip
rsaquo Understand basic concepts for scalable probabilistic query processing on uncertain spatial and spatio-temporal query processing
rsaquo A tutorial not a survey
rsaquo Get the big picture hellipndash NOT in terms of a long list of recent methods and
algorithmsndash BUT in terms of general concepts commonly used in this field
5
Outline
rsaquo Tutorial decomposed into three partsndash Uncertain Spatial Data (Andreas Zuumlfle)ndash Uncertain Spatio-Temporal Data (Geometric Approach) (Goce
Trajcevski)ndash Uncertain Spatio-Temporal Data (Probabilistic Approach) (Tobias
Emrich)
rsaquo Please feel free to ask questions at any time during the presentation
rsaquo The latest version of these slides will be made available within the next week
httpwwwdbsifilmude~zuefle
6
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
7
Geo-Spatial Data
bull Huge flood of geo-spatial databull Modern technologybull New user mentality
bull Great research potentialbull New applicationsbull Innovative researchbull Economic Boost
bull ldquo$600 billion potential annual consumer surplus from using personal location datardquo [1]
[1] McKinsey Global Institute Big data The next frontier forinnovation competition and productivity June 2011
8
Geo-Spatial Data
9
10
Spatio-Temporal Data
bull (object location time) triples
bull Queries bull ldquoFind friends that
attended the same concert last saturdayrdquo
bull Best case Continuous function
GPS log taken from a thirty minute drive through SeattleDataset provided by P Newson and J Krumm Hidden Markov Map Matching Through Noise and Sparseness ACMGIS 2009
11
Sources of Uncertainty
bull Missing Observationsbull Missing GPS signalbull RFID sensors available in discrete locations onlybull Wireless sensor nodes sending infrequently to preserve energybull Infrequent check-ins of users of geo-social networks
Dataset provided by E Cho S A Myers and J Leskovek Friendship and Mobility User Movement in Location-Based Social Networks SIGKDD 2011
12
Sources of Uncertainty
bull Uncertain Observationsbull Imprecise sensor measurements (eg radio triangulation Wi-Fi
positioning)bull Inconsistent information (eg contradictive sensor data)bull Human errors (eg in crowd-sourcing applications)
From database perspective the position of a mobile object is uncertain
Dataset provided by E Cho S A Myers and J Leskovek Friendship and Mobility User Movement in Location-Based Social Networks SIGKDD 2011
13
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
14
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process
15
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process
Improve the quality of modern location based applications and of research results in the field
16
Uncertain Spatial Data Models
bull Discrete Models
bull Continuous Models
Possible World Semantics
04
b
17
Possible World Semantics
bull A collection of uncertain spatial objects defines an uncertain spatial database
bull Combinations of object instances define possible database instances calledPossible Worlds
bull Assumption The probability of a possible world can be computed efficiently
Possible World Semantics
18
Answering Queries using PWS
Let bull be an uncertain database having possible worlds bull be the set of possible worlds of bull be a query predicate bull be an indicator function returning one if predicate holds in
world and zero otherwise
The probability that a query predicate holds on an uncertain database is defined as
Possible World Semantics
Possible Worlds Example II
A B
G
D E FC
H I J K
L
M
N OP
Q
R
U
T
S
V W X Y Z
20
Possible Worlds Example II
A B
G
D E FC
H I J K
L
M
N OP
Q
R
U
T
S
V W X Y Z
21
22
23
24
25
26
27
Too many possible worlds
28
Querying Uncertain Data Complexity
Naive Query Processing is exponential in the number of objects
Are there efficient solutions to query uncertain spatial data
In general No
ldquoThe problem of answering queries on a probabilistic database D is -complete in the size of Dldquo[DalviSuciu04]
Can be reduced to uncertain spatial databases
But Specific queries may have polynomial time solutions[DalviSuciu04] Dalvi N N and Suciu D Efficient query evaluation on probabilistic databasesIn Proceedings of the 30th International Conference on Very Large Data Bases (VLDB)(2004)
29
q
Querying Uncertain Data Running Example
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds
30
Data Cleaning Aggregation
q
C D
H
I
Ignore Uncertainty (Data Cleaning)
Replace uncertain objects by a deterministic ldquobest guessrdquo
Expected Positions
Most-likely Positions
hellip
Query results are not reliable
Query results may be biased
31
Equivalent Worlds An intuition
q
Observation 1
For any possible world and any possible world derived from by changing the position of object the following equivalence holds
32
Equivalent Worlds An intuition
q
Observation 1 allows to discard objects outside ofacute the query region
Remaining number of equivalent classes of possible worlds
33
Equivalent Worlds An intuition
Observation 2
For each remaining object we only need to consider the predicate ldquoinsiderdquo
q
C D
H
I
Querying Uncertain Spatial Data
34
Equivalent Worlds An intuition
Observation 2
For each remaining object we only need to consider the predicate ldquoinsiderdquo
Remaining number of equivalent classes of possible worlds
q
C D
H
I
35
Equivalent Worlds An intuition
Observation 3
We only require the number of objects in the query regionInformation about concrete results objects can be discarded
q
C D
H
I
36
q
08 02
04
H
C A
B
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
37
q
08 02
04
H
x x
x
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
Observation 3 Anonymize Objects - Substitute ABC by x
38
q
08 02
04
H
x x
x
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
Observation 3 Anonymize Objects - Substitute ABC by x
39
q
08 02
04
H
x x
x
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
Observation 3 Anonymize Objects - Substitute ABC by x
Each monomial implies that the probability of having exactly results equals
40
For each object let Consider the following generating function [2]
in the expanded polynomial the coefficient of monomial equals the probability that exactly objects are inside the query region
Generating Functions Formally
[2] Jian Li Barna Saha and Amol Deshpande A Unified Approach to Ranking in Probabilistic Databases PVLDB 2(1) 502-513 (2009)
41
Example
q
08 02
04
H
C A
B
H3
Count Queries on Uncertain Data
42
q
08 02
04
H
C A
B
H3
Example =
Count Queries on Uncertain Data
43
q
08 02
04
H
C A
B
H3
Example = =
Count Queries on Uncertain Data
44
q
08 02
04
H
C A
B
H3
Example = =
Count Queries on Uncertain Data
45
The Paradigm of Equivalent Worlds
A query predicate and an uncertain database DB we can answer on DB in PTIME if the following three conditions are satisfied
I A traditional query on certain data can be answered in polynomial time
II We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results Sampling
Materialize a set S of possible worlds
Samples drawn independent and unbiased
Evaluated the query predicate on each world
Distribution of sampled results is an unbiased approximation of the true distribution of results
48
q
08 02
04
H
C A
B
H3
Sampling Example
Drawing 100 possible worlds may yield the followingestimators
Compare to the exact probabilities
No indication of reliability or confidence of estimations
49
q
08 02
04
H
C A
B
H3
Sampling Confidences
Drawing 100 possible worlds may yield the followingestimators
Use statistical methods to assess the quality of estimators
Eg Wald-Test
Where is the percentile of the standard normal distribution
At a significance level of the true probability is in the interval [0442 0638]
True probability
50
Uncertain Spatial Data Management Summary
Motivation Flood of geo-spatial data Enriched with additional contexts (text social multimedia) Inherent uncertainty
Data Cleaning ldquoBest guessrdquo answers Unreliable results Biased results
Paradigm of Equivalent Worlds Efficient solution for the most prominent types of spatial queries Example Generating Functions
Approximations Monte-Carlo sampling Probabilistic guarantees
51
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
52
bull Modelsbull Motionbull Location Uncertainty
bull Queriesbull Part 1 NN (free-space motion)
bull Semantics and Processing
bull Part 2 Range (road-networks constraints)bull Semantics and Processing
bull Uncertainty ndash the flip-side
53
Model of a trajectory
Y
X
Time
Present time
2d-ROUTE
3d-TRAJECTORY
Approximation does not capture accelerationdeceleration
Point Objects NOTE the model needs to be augmented for objects with extent eg hurricanes
Bart Kuijpers Harvey J Miller Walied Othman Kinetic space-time prisms GIS 2011
54
Mobility Models
sequence of (locationtime)updates (eg GPS-based)
Electronic maps + traffic distributionpatterns + set of (to be visited) point=gt full trajectory
(locationtime Velocity Vector )updates
now -gt
55
Modeling UncertaintyLocation Uncertainty Models
Various Sourcesrsquo Imprecision(GPS Sensors)
Quest for data reduction
Ralph Lange Harald Weinschrott Lars Geiger Andreacute Blessing Frank Duumlrr Kurt Rothermel Hinrich Schuumltze On a Generic Uncertainty Model for Position Information QuaCon 2009
56
Modeling Uncertain Trajectories
bull Model 1Cones Beads Necklaces (AKA space-time prisms)
bull Assumed exact location samplesbull Unknown (but bounded) maximum speed in-betweensamples
Reynold Cheng Sunil Prabhakar Dmitri V Kalashnikov Querying Imprecise Data in Moving Object Environments ICDE 2003G Trajcevski A Choudhary O Wolfson G Li Uncertan Range Queries for Necklases MDM 2010
57
Modeling Uncertain TrajectoriesModel 2
Constraints
Motion along road network
Uncertainty restricted to Road Segments
p1 p2
q
t2t1
t
Bart Kuijpers Walied Othman Modeling uncertainty of moving objects on road networks via space-time prisms International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain TrajectoriesModel 3Sheared Cylinders
Constant bound on location-error at anytime-instant
G Trajcevski O Wolfson K Hinrichs S Chamberlain Managing Moving Objects Databases with Ucertainty ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly considerSyntax- The impact of uncertainty on the (structure of the) answer- The impact of constraints
Processing Algorithms- Impact of the motion+uncertainty coupling on filteringprunningrefinement
eg possible routes to be taken
6060
Example inside range (disk) around ldquoQrdquo between t1 and t2
1 What is the probability of a path taken
2 What is the earliestlatest arrival at a given vertex (consequently along edge)
Kai Zheng Goce Trajcevski Xiaofang Zhou Peter Scheuermann Probabilistic range queries for uncertain trajectories on road networks EDBT 2011
61
Continuous NN for trajectories
Q_nn(q) Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tbte]
A-nn(q) [(Tri1 [tbt1]) (Tri2[t1t2]) hellip (Trim[tm-1te])]
T = tb
T = te
X
Y
T = t1
Trq Tr1 Tr2Tr3
Given a collection of moving objects trajectories Tr1 Tr2 hellip TrN
Example assuming that Trq is the querying trajectory between [tbte] then- Tr1 (black) is the NN throughout [tbt1]- Tr2 (red) is the NN throughout [t1te]
Observation The answer to the query is time-paremeterized
Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002
62
Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant
UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]
T = te
X
YT = tb
T = t1
Trq Tr1 Tr2Tr3
T = tb1
T = t11
Observations
adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability
= what should the structure ofthe answer to UQ_nn(q) look like
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-basedProbabilistic Answer to a Continuous NN query) tree
Tri11 tb t11 D11
Tri12l t12(l-1) t12 D12k12
[TrQ tb te]
Tri1 tb t1 D1 Tri2 t1 t2 D2 Trim t(m-1) te Dm
Tri12 t11 t12 D12 Tri1k1 t1(k-1) t11 D1k1 Trik1 t(k-1) t21 Dm1 Trim1 t(m1-1) te Dmkm
Tri121 t11 t121 D121
Root parameters of the query-Querying trajectory Trq - Time-interval of interest [tbte]
Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)
Rmin gt Rmax
Rmin
Rmax
Q
Tr1
Tr2
Tr3
Tr4
Tr5
Rd1
1
14
Trq = 2D point Q The rest are possible-locations bounded by circles
Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN
The probability that the location of Tri is within distance Rd from Q is given by
The probability that a given object Trj is the NN of Q is given by
ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)
65
When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example
Rmin
Rmax
TrQ
Tr1
Tr2
Tr3
Tr4
Tr5
Z1
Z2
dist(Z1Z2) lt Rmax
We can no longer ldquoprunerdquo Tr4 from consideration
The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq
NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip
66
1
x
y
0
2r
1r2
pdf(Tr2)
pdf(Trq) pdf(Tr1)
Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for
DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq
67
1
x
y
0-40
+40
4r
34r2
pdf(Tr2 - Trq)
pdf(Tr1 - Trq)
Let Viq = Vi + (-Vq)
Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin at a function of t is
hyperbola since A gt 0
Now we have a collection of such distance functions (for each i)
69
The continuous aspect of the probabilistic NN queries
Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)
Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points
dist
TR1
TR2
LE12
tb t11 t12
Example
Lower envelope of two distance-trajectories
critical time-point
NOTE time-dimension on horizontal axis
70
The continuous aspect of the probabilistic NN queries
Q how to efficiently construct the lower envelope of the whole collection of distance-functions
A divide-and-conquer approach in the spirit of Merge-Sortdist
TR1
TR2
LE12
tb t11 t12
dist
TR3
TR4
LE34
tb tet31dist
TR1
TR2
LE1234
tb tet11 t12
TR3
TR4
t31
t1new t2new
Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)
Now how about the IPAC-NN tree
71
The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)
Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree
dist dist
lower envelope
TR1TR2
TR3
TR4
TR5
TR6
tb te
2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )
4r
TR7
t1 t2 t3
72
Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie
Example if pdf of selecting a possible path is uniform
Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie
73
Combine the two probabilitieshellip
Possible pathsPossible locations when t=4
a
m
Probability of falling inside segment mp1
0505 = 025 (at t = 4)
Probability of an object a falling inside some segment S at given time instant t
)(
))(Pr()Pr())(Pr(tPPp
p
a
tPLSpSta eg uniform pdf
74
ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory
representation we needndash The collection of all the possible paths (and probabilities)
ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi
to position pi+1
At the time-instant t = 4 the object can
be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )
75
Processing
rsaquo Data Structuresndash Edge hash-table
rsaquo Small in-memoryndash Movement R-tree
rsaquo 1D for each edgendash Trajectories List
rsaquo Store samples ordered by time-stamp
rsaquo Assign pointer to set of possible paths
rsaquo Earliest-arriving and Latest-departure stored for vertices
rsaquo On-disk retrieve on-demand
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
› Starting from the observation that the car is in s1 at t = 0, the state distribution is propagated one tick at a time by multiplying with M:
$$(1, 0, 0) \xrightarrow{M} (0, 0, 1) \xrightarrow{M} (0, 0.8, 0.2) \xrightarrow{M} (0.48, 0.16, 0.36)$$
[Figure: the chain over s1, s2, s3 unrolled over t = 0, 1, 2, 3]
› Summing the probabilities of s1 and s2 over t = 2 and t = 3 would count worlds that are in the window at both times twice. Instead, probability mass that enters s1 or s2 during T is recorded as a hit and removed from the vector: at t = 2 this is 0.8, leaving (0, 0, 0.2); one more step yields (0, 0.16, 0.04), which adds another 0.16
› Result = 0.8 + 0.16 = 0.96
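The computation above can be reproduced in a few lines of plain Python. This is a sketch that assumes s1, s2, s3 are indexed 0, 1, 2 and that T consists of the ticks 2 and 3. The helper names `step` and `window_probability` are illustrative.

```python
def step(dist, M):
    """One tick of the Markov chain: return the row vector dist * M."""
    return [sum(dist[i] * M[i][j] for i in range(len(M))) for j in range(len(M[0]))]


M = [
    [0.0, 0.0, 1.0],  # s1 -> s3
    [0.6, 0.0, 0.4],  # s2 -> s1 (0.6) or s3 (0.4)
    [0.0, 0.8, 0.2],  # s3 -> s2 (0.8) or s3 (0.2)
]

# Plain propagation from the observation "car is in s1 at t = 0".
dist = [1.0, 0.0, 0.0]
for _ in range(3):
    dist = step(dist, M)
print(dist)  # approximately [0.48, 0.16, 0.36]


def window_probability(start, M, window_states, window_times):
    """P(object is in one of window_states at some t in window_times).

    Mass that enters the window is added to `hit` and removed from the
    vector, so every possible world is counted at most once.
    """
    dist, hit = list(start), 0.0
    for t in range(max(window_times) + 1):
        if t > 0:
            dist = step(dist, M)
        if t in window_times:
            for s in window_states:
                hit += dist[s]
                dist[s] = 0.0
    return hit


p = window_probability([1.0, 0.0, 0.0], M, window_states={0, 1}, window_times={2, 3})
print(p)  # approximately 0.96
```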
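› The solution is based purely on matrix multiplications. It introduces a new absorbing state s4 for the "hit" trajectories (worlds that have already intersected the query window) and two matrices
ST - Window Queries
$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
› M⁻ is used for transitions to time steps outside T. M⁺ is used for transitions to time steps inside T, and it redirects every transition into s1 or s2 to s4:
$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$
› The query result is the probability of s4 at t = 3, i.e., 0.96
[Figure: states s1, s2, s3 and the additional hit state s4]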
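Both matrices can be derived mechanically from M. The sketch below assumes that states are indexed 0..n-1 and that the hit state is appended at index n. Following the vectors above, it applies M⁻ for the transition into t = 1 and M⁺ for the transitions into t = 2 and t = 3. The names `extend_matrices` and `vec_mat` are illustrative. The sketch does not handle a window that already contains t = 0.

```python
def extend_matrices(M, window_states):
    """Build (M_minus, M_plus) by adding an absorbing hit state at index n."""
    n = len(M)
    hit = n
    # M_minus: original transitions, plus a hit state that only maps to itself.
    M_minus = [row + [0.0] for row in M] + [[0.0] * n + [1.0]]
    M_plus = []
    for row in M:
        new_row = [0.0] * (n + 1)
        for j, p in enumerate(row):
            # Inside T, any transition into a window state becomes a hit.
            new_row[hit if j in window_states else j] += p
        M_plus.append(new_row)
    M_plus.append([0.0] * n + [1.0])  # the hit state is absorbing
    return M_minus, M_plus


def vec_mat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
M_minus, M_plus = extend_matrices(M, window_states={0, 1})

v = [1.0, 0.0, 0.0, 0.0]  # t = 0: observed in s1
v = vec_mat(v, M_minus)   # t = 1: outside T
v = vec_mat(v, M_plus)    # t = 2: inside T
v = vec_mat(v, M_plus)    # t = 3: inside T
print(v[-1])  # probability of the hit state, approximately 0.96
```

Because only vector-matrix products are involved, a sparse matrix representation would keep each step cheap for large state spaces.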
Multiple Observations
› So far we had only one observation, from which we could extrapolate
› This alone is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with a single observation at t0 (left) and two observations at t0 and t1 (right)]
Multiple Observations
› We need to track in which states the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si- for the worlds where o is located in state si and o has not intersected the window
  – One class Si+ for the worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: the chain over s1, s2, s3 unrolled over t = 0, 1, 2, 3]
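Multiple Observations
› The state vector now ranges over S1-, S2-, S3-, S1+, S2+, S3+. The "-" classes have not yet intersected the window; the "+" classes have:
$$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
$$(1,0,0,0,0,0) \xrightarrow{M^{-}} (0,0,1,0,0,0) \xrightarrow{M^{+}} (0,0,0.2,0,0.8,0) \xrightarrow{M^{+}} (0,0,0.04,0.48,0.16,0.32)$$
[Figure: classes S1- to S3- (window not intersected) and S1+ to S3+ (window intersected) unrolled over t = 0, 1, 2, 3]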
Bayes' Theorem
› Now, what is the probability that the trajectory passes through the query window, given the fact that the object was seen in s3 at t = 3?
› Let W denote the event that the trajectory intersects the query window:
$$P(W \mid s_3) = \frac{P(s_3 \mid W)\,P(W)}{P(s_3)} = \frac{P(s_3 \wedge W)}{P(s_3 \wedge W) + P(s_3 \wedge \neg W)} = \frac{P(S_3^{+})}{P(S_3^{+}) + P(S_3^{-})} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
[Figure: the t = 3 vector over S1-, S2-, S3-, S1+, S2+, S3+]
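The same construction extends to the 2|S| classes. Every state gets a "not yet hit" copy and a "hit" copy, and conditioning on the second observation is then a ratio of two entries of the final vector. This sketch assumes that indices 0..n-1 hold S1-..Sn- and indices n..2n-1 hold S1+..Sn+. The names `doubled_matrices` and `vec_mat` are hypothetical.

```python
def vec_mat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


def doubled_matrices(M, window_states):
    """Return (M_minus, M_plus) over 2n states: indices 0..n-1 are Si-,
    indices n..2n-1 are Si+ (worlds that already intersected the window)."""
    n = len(M)
    M_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    M_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            # Outside T: block-diagonal, the hit flag never changes.
            M_minus[i][j] = p
            M_minus[n + i][n + j] = p
            # Inside T: entering a window state sets the hit flag.
            M_plus[i][n + j if j in window_states else j] += p
            M_plus[n + i][n + j] = p
    return M_minus, M_plus


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
M_minus, M_plus = doubled_matrices(M, window_states={0, 1})

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # t = 0: observed in s1, no hit yet
v = vec_mat(v, M_minus)             # t = 1
v = vec_mat(v, M_plus)              # t = 2
v = vec_mat(v, M_plus)              # t = 3: approx. [0, 0, 0.04, 0.48, 0.16, 0.32]

# Second observation: the object is seen in s3 (index 2) at t = 3.
n, s3 = 3, 2
p_hit_given_s3 = v[n + s3] / (v[n + s3] + v[s3])
print(round(p_hit_given_s3, 2))  # 0.89
```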
Summary
› Pros
  – Allows answering time-parameterized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping continuous time to discrete ticks might not be the ideal modelling choice
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: R-tree entries for an object o over the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaption of the transition matrices
J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3): 205-216, 2013.
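One way to sample trajectories that agree with both observations is to reweight each transition by the probability of still reaching the next observation. This is the standard construction for conditioning a Markov chain on a future state. The sketch below is one possible reading of "adaption of transition matrices" and not necessarily the exact method of the cited paper. It reuses the toy chain and window query from the previous slides, with observations in s1 at t = 0 and in s3 at t = 3. All function names are hypothetical.

```python
import random


def backward_probs(M, target, horizon):
    """beta[t][i] = P(X_horizon = target | X_t = i) for t = 0..horizon."""
    n = len(M)
    beta = [[0.0] * n for _ in range(horizon + 1)]
    beta[horizon][target] = 1.0
    for t in range(horizon - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(M[i][j] * beta[t + 1][j] for j in range(n))
    return beta


def sample_conditioned(M, beta, start, rng):
    """Draw one path from `start` at t = 0 that is consistent with the
    observation encoded in `beta`.

    The adapted transition probability at tick t is
    M[i][j] * beta[t + 1][j] / beta[t][i]; it is zero for every move from
    which the second observation can no longer be reached.
    """
    n, horizon = len(M), len(beta) - 1
    path, state = [start], start
    for t in range(horizon):
        weights = [M[state][j] * beta[t + 1][j] / beta[t][state] for j in range(n)]
        state = rng.choices(range(n), weights=weights)[0]
        path.append(state)
    return path


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
window_states, window_times = {0, 1}, {2, 3}   # s1 or s2 during T = [2, 3]
beta = backward_probs(M, target=2, horizon=3)  # second observation: s3 at t = 3

rng = random.Random(42)
n_samples = 20000
paths = [sample_conditioned(M, beta, start=0, rng=rng) for _ in range(n_samples)]
hits = sum(any(p[t] in window_states for t in window_times) for p in paths)
estimate = hits / n_samples
print(estimate)  # should be close to the exact value 0.32 / 0.36 (about 0.89)
```

Any other query predicate, for example a nearest-neighbour test evaluated per sampled world, could replace the window check. The sample count of 20000 is an arbitrary choice for the example.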
Event Queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
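The following sketch illustrates why regular-expression-style event queries remain tractable on Markovian streams. It computes the probability of the "Joe got coffee" pattern (office, then coffee room, then office) by running a small pattern automaton in lockstep with a Markov chain, and checks the result against brute-force enumeration of all paths. This is a generic product construction, not Lahar's actual algorithm. The three locations and their transition probabilities are invented for the example.

```python
from itertools import product

OFFICE, HALLWAY, COFFEE = 0, 1, 2
PATTERN = [OFFICE, COFFEE, OFFICE]  # subevents that must occur in this order


def advance(q, loc):
    """Automaton for the subsequence PATTERN; q = number of matched subevents."""
    if q < len(PATTERN) and loc == PATTERN[q]:
        return q + 1
    return q


def event_probability(M, init, steps):
    """P(PATTERN is matched within readings 0..steps) via a product chain."""
    # joint[(loc, q)] = probability of being at loc with automaton state q
    joint = {}
    for loc, p in enumerate(init):
        if p > 0:
            key = (loc, advance(0, loc))
            joint[key] = joint.get(key, 0.0) + p
    for _ in range(steps):
        nxt = {}
        for (loc, q), p in joint.items():
            for loc2, m in enumerate(M[loc]):
                if m > 0:
                    key = (loc2, advance(q, loc2))
                    nxt[key] = nxt.get(key, 0.0) + p * m
        joint = nxt
    return sum(p for (loc, q), p in joint.items() if q == len(PATTERN))


def brute_force(M, init, steps):
    """Same probability by enumerating every possible path (exponential)."""
    n, total = len(M), 0.0
    for path in product(range(n), repeat=steps + 1):
        p = init[path[0]]
        for a, b in zip(path, path[1:]):
            p *= M[a][b]
        q = 0
        for loc in path:
            q = advance(q, loc)
        if q == len(PATTERN):
            total += p
    return total


# Hypothetical indoor model: office, hallway, coffee room.
M = [
    [0.7, 0.3, 0.0],
    [0.4, 0.3, 0.3],
    [0.0, 0.5, 0.5],
]
init = [1.0, 0.0, 0.0]  # Joe starts in his office
print(event_probability(M, init, steps=6))
```

The product chain has only |locations| × |automaton states| entries per tick, whereas the brute-force check grows exponentially with the number of readings.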
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (without confidence values)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity search on uncertain spatio-temporal data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-nearest neighbor queries on uncertain moving object trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2, s3 with the transition probabilities of M]
› Note: there is an exponential number of possible paths the car might take
98-102
ST - Window Queries
› Unrolling the chain over t = 0, 1, 2, 3, with the car in s1 at t = 0:
 – t = 0: (1, 0, 0)
 – t = 1: (1, 0, 0) · M = (0, 0, 1)
 – t = 2: (0, 0, 1) · M = (0, 0.8, 0.2)
 – t = 3: (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36)
› The probabilities at t = 2 and t = 3 cannot simply be added, since the same world may be in the window at both ticks. Instead, only the worlds that are still outside the window at t = 2, i.e., (0, 0, 0.2), are propagated: (0, 0, 0.2) · M = (0, 0.16, 0.04)
› Result = 1 − 0.04 = 0.96
› A solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
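ST - Window Queries
› States s1, s2, s3, plus an absorbing state s4 that collects the trajectories that have already hit the window
› M⁻ is applied for transitions into ticks outside T; M⁺ is applied for transitions into ticks inside T, where every move into s1 or s2 is redirected to s4
$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$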
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time; left: a single observation at t0; right: two observations at t0 and t1]
105
Multiple Observations
› We need to track where true hit worlds are located
 – 2|S| classes of equivalent worlds
 – One class Si⁻ corresponding to worlds where o is located in state si and o has not intersected the window
 – One class Si⁺ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: the chain over s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
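Multiple Observations
› Classes ordered as (S1⁻, S2⁻, S3⁻, S1⁺, S2⁺, S3⁺); ¬∎ = window not intersected, ∎ = window intersected
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$
$$M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
› t = 0: (1, 0, 0, 0, 0, 0)
› t = 1: (0, 0, 1, 0, 0, 0)
› t = 2: (0, 0, 0.2, 0, 0.8, 0)
› t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)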
107
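Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 at t = 3?
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\, P(\blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
› Both terms are read off the t = 3 distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1⁻, S2⁻, S3⁻, S1⁺, S2⁺, S3⁺): P(s3 ∧ ∎) = P(S3⁺) = 0.32 and P(s3 ∧ ¬∎) = P(S3⁻) = 0.04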
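The 2|S|-class bookkeeping and the final Bayes step can be sketched in the same style. This is a sketch under the assumptions of the example (start in s1 at t = 0, T = [2, 3], object observed in s3 at t = 3); `split_matrices` and `forward` are hypothetical names. Index i stands for Si⁻ and index n + i for Si⁺.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def split_matrices(m, window_states):
    """(M-, M+) over 2n classes: index i is S_i^- (window not yet hit), n + i is S_i^+."""
    n = len(m)
    minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = m[i][j]
            # Outside T both halves evolve independently (block-diagonal M-).
            minus[i][j] += p
            minus[n + i][n + j] += p
            # Inside T, entering a window state turns a '-' world into a '+' world.
            plus[i][n + j if j in window_states else j] += p
            plus[n + i][n + j] += p
    return minus, plus


def forward(m, start, window_states, t_start, t_end, t_obs):
    """Distribution over the 2n classes at tick t_obs, starting from S_start^-."""
    minus, plus = split_matrices(m, window_states)
    v = [0.0] * (2 * len(m))
    v[start] = 1.0
    for t in range(1, t_obs + 1):
        v = vec_mat(v, plus if t_start <= t <= t_end else minus)
    return v


n = len(M)
v3 = forward(M, start=0, window_states={0, 1}, t_start=2, t_end=3, t_obs=3)
observed = 2  # object seen in s3 at t = 3
posterior = v3[n + observed] / (v3[n + observed] + v3[observed])
print([round(x, 2) for x in v3], round(posterior, 2))
```

If the observation happens after the end of T, the same loop simply continues with M⁻ up to t_obs before the Bayes step.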
108
Summary
› Pros
 – Allows answering time-parametrized queries according to possible worlds semantics
 – Considers location dependencies over time
 – Scales up very well, since it is purely based on sparse matrix multiplications
 – Natively extendable to uncertain observations
 – Seems to work adequately on real-world data (more validation needed)
› Cons
 – Discrete time and space
 – Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: R-tree entries in the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
 – Draw a sufficiently high number of samples
 – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
112
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Event queries
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
 – Cleaning
 – Approximation
 – Paradigm of equivalent worlds
› Geometric models can be used
 – when no probabilistic model is present
 – to find possible answers (no confidence)
› Probabilistic management of UST data
 – based on stochastic processes
 – allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443.
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
7
Geo-Spatial Data
• Huge flood of geo-spatial data
  • Modern technology
  • New user mentality
• Great research potential
  • New applications
  • Innovative research
  • Economic boost
• "$600 billion potential annual consumer surplus from using personal location data" [1]
[1] McKinsey Global Institute. Big data: The next frontier for innovation, competition, and productivity. June 2011.
8
Geo-Spatial Data
9
10
Spatio-Temporal Data
• (object, location, time) triples
• Queries
  • "Find friends that attended the same concert last Saturday"
• Best case: continuous function
GPS log taken from a thirty-minute drive through Seattle. Dataset provided by P. Newson and J. Krumm, Hidden Markov Map Matching Through Noise and Sparseness, ACM GIS 2009
11
Sources of Uncertainty
• Missing observations
  • Missing GPS signal
  • RFID sensors available in discrete locations only
  • Wireless sensor nodes sending infrequently to preserve energy
  • Infrequent check-ins of users of geo-social networks
Dataset provided by E. Cho, S. A. Myers, and J. Leskovec, Friendship and Mobility: User Movement in Location-Based Social Networks, SIGKDD 2011
12
Sources of Uncertainty
• Uncertain observations
  • Imprecise sensor measurements (e.g., radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g., contradictory sensor data)
  • Human errors (e.g., in crowd-sourcing applications)
From a database perspective, the position of a mobile object is uncertain
Dataset provided by E. Cho, S. A. Myers, and J. Leskovec, Friendship and Mobility: User Movement in Location-Based Social Networks, SIGKDD 2011
13-15
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results, thereby enhancing the underlying decision-making process
Improve the quality of modern location-based applications and of research results in the field
16
Uncertain Spatial Data Models
• Discrete models
• Continuous models
Possible World Semantics
17
Possible World Semantics
• A collection of uncertain spatial objects defines an uncertain spatial database
• Combinations of object instances define possible database instances, called possible worlds
• Assumption: the probability of a possible world can be computed efficiently
Possible World Semantics
18
Answering Queries using PWS
Let
• DB be an uncertain database having possible worlds,
• W be the set of possible worlds of DB,
• φ be a query predicate,
• I_φ(w) be an indicator function returning one if predicate φ holds in world w ∈ W and zero otherwise.
The probability that a query predicate φ holds on an uncertain database DB is defined as
$$P(\varphi, DB) = \sum_{w \in W} P(w) \cdot I_\varphi(w)$$
Possible World Semantics
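For small inputs, the definition can be evaluated literally by enumerating all possible worlds. The sketch below assumes a hypothetical database of three independent discrete uncertain objects with two instances each, and a circular range-count query around q; it computes P(count = k) as the sum of P(w) over the worlds w whose count equals k. The enumeration is exponential in the number of objects, which is exactly the problem addressed on the following slides.

```python
from itertools import product
from math import dist

# Hypothetical uncertain database: each object is a list of (position, probability) instances
objects = {
    "A": [((1.0, 0.0), 0.9), ((5.0, 5.0), 0.1)],
    "B": [((0.0, 1.0), 0.5), ((4.0, 0.0), 0.5)],
    "C": [((0.5, 0.5), 0.2), ((6.0, 1.0), 0.8)],
}
q, radius = (0.0, 0.0), 2.0  # circular query region


def possible_worlds(objs):
    """Yield (world, P(world)); a world picks one instance per (independent) object."""
    names = list(objs)
    for choice in product(*(objs[name] for name in names)):
        prob = 1.0
        for _, p in choice:
            prob *= p
        yield {name: pos for name, (pos, _) in zip(names, choice)}, prob


def count_distribution(objs, q, radius):
    """P(count = k) = sum over worlds w of P(w) * I(count(w) = k)."""
    result = {}
    for world, prob in possible_worlds(objs):
        k = sum(1 for pos in world.values() if dist(pos, q) <= radius)
        result[k] = result.get(k, 0.0) + prob
    return result


print(count_distribution(objects, q, radius))
```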
Possible Worlds Example II
[Figure: uncertain database with objects A to Z]
20
21
22
23
24
25
26
27
Too many possible worlds
28
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects
Are there efficient solutions to query uncertain spatial data?
In general: No
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
Can be reduced to uncertain spatial databases
But: specific queries may have polynomial-time solutions
[DalviSuciu04] N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004.
29
Querying Uncertain Data: Running Example
[Figure: uncertain objects around a circular query region centered at query point q]
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds
30
Data Cleaning: Aggregation
[Figure: query region around q with uncertain objects C, D, H, I]
Ignore uncertainty (data cleaning)
Replace uncertain objects by a deterministic "best guess"
• Expected positions
• Most-likely positions
• ...
Query results are not reliable
Query results may be biased
31
Equivalent Worlds: An Intuition
Observation 1
For any possible world and any possible world derived from it by changing the position of an object, the following equivalence holds
32
Equivalent Worlds: An Intuition
Observation 1 allows us to discard objects outside of the query region
Remaining number of equivalence classes of possible worlds
33
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
[Figure: query region around q with uncertain objects C, D, H, I]
Querying Uncertain Spatial Data
34
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
Remaining number of equivalence classes of possible worlds
[Figure: query region around q with uncertain objects C, D, H, I]
35
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region. Information about concrete result objects can be discarded
[Figure: query region around q with uncertain objects C, D, H, I]
36-39
Generating Functions
[Figure: query region around q with uncertain objects A, B, C; probabilities 0.8, 0.2, 0.4]
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects - substitute A, B, C by x
Each monomial a_k·x^k implies that the probability of having exactly k results equals a_k
40
For each object o_i, let p_i denote the probability that o_i is located inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_{i} \bigl((1 - p_i) + p_i\, x\bigr)$$
In the expanded polynomial, the coefficient of the monomial x^k equals the probability that exactly k objects are inside the query region
Generating Functions: Formally
[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
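Expanding the generating function factor by factor yields the whole count distribution in O(n²) time instead of enumerating 2ⁿ worlds. A minimal sketch, assuming the per-object probabilities p_i of lying inside the query region are already known (the values below are hypothetical); `count_pmf` is an illustrative name, not taken from [2].

```python
def count_pmf(probs):
    """Coefficients of F(x) = prod_i ((1 - p_i) + p_i x).

    Entry k of the result is the probability that exactly k objects lie in the query region.
    """
    coeffs = [1.0]  # the empty product
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)   # object outside: degree unchanged
            nxt[k + 1] += c * p       # object inside: multiply by x
        coeffs = nxt
    return coeffs


# Hypothetical probabilities of three objects lying inside the query region
print(count_pmf([0.9, 0.5, 0.2]))
```

Each multiplication by one factor touches every existing coefficient once, which is where the quadratic bound comes from.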
41-44
Count Queries on Uncertain Data
Example
[Figure: query region around q with uncertain objects A, B, C; probabilities 0.8, 0.2, 0.4]
45
The Paradigm of Equivalent Worlds
Given a query predicate φ and an uncertain database DB, we can answer φ on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and without bias
Evaluate the query predicate on each world
The distribution of sampled results is an unbiased approximation of the true distribution of results
48
[Figure: query region around q with uncertain objects A, B, C; probabilities 0.8, 0.2, 0.4]
Sampling Example
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimates
49
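[Figure: query region around q with uncertain objects A, B, C; probabilities 0.8, 0.2, 0.4]
Sampling: Confidences
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of the estimators
E.g., Wald test: for a probability estimated as p̂ from |S| sampled worlds, the confidence interval is
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{|S|}}$$
where z_{1−α/2} is the (1 − α/2) percentile of the standard normal distribution
At significance level α, the resulting confidence interval for the true probability is [0.442, 0.638]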
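The Wald interval can be computed with the Python standard library. This sketch assumes a hypothetical sample in which 54 of 100 possible worlds satisfy the predicate, and α = 0.05; under these assumptions it yields an interval of [0.442, 0.638]. `wald_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, alpha=0.05):
    """Wald confidence interval p_hat +/- z_{1-alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2) percentile of N(0, 1)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lo, hi = wald_interval(successes=54, n=100)  # hypothetical: 54 of 100 sampled worlds qualify
print(round(lo, 3), round(hi, 3))
```

The Wald interval is known to behave poorly for estimates close to 0 or 1 and for small samples; alternatives such as the Wilson score interval are often preferred there.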
50
Uncertain Spatial Data Management: Summary
Motivation
• Flood of geo-spatial data
• Enriched with additional contexts (text, social, multimedia)
• Inherent uncertainty
Data Cleaning
• "Best guess" answers
• Unreliable results
• Biased results
Paradigm of Equivalent Worlds
• Efficient solution for the most prominent types of spatial queries
• Example: generating functions
Approximations
• Monte-Carlo sampling
• Probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty - the flip-side
53
Model of a trajectory
[Figure: 3D trajectory in (X, Y, Time) up to the present time, and its 2D route in the X-Y plane]
Approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
54
Mobility Models
• Sequence of (location, time) updates (e.g., GPS-based)
• Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory
• (location, time, velocity vector) updates
55
Modeling Uncertainty: Location Uncertainty Models
• Various sources' imprecision (GPS, sensors)
• Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (a.k.a. space-time prisms)
  • Exact location samples are assumed
  • Unknown (but bounded) maximum speed in-between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
57
Modeling Uncertain Trajectories: Model 2
• Constraints
  • Motion along a road network
  • Uncertainty restricted to road segments
[Figure: road network with sample positions p1 and p2 at times t1 and t2, a query location q, and a time t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3: Sheared Cylinders
• Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g., possible routes to be taken
Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, ..., TrN:
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq, between [tb, te]
A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), ..., (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space between T = tb and T = te, with the NN changing at T = t1]
Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, ..., TrN, where each Tri has some location uncertainty associated with its location at each time instant, consider the variant
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq, between [tb, te]
[Figure: the previous example with uncertainty added; time instants tb, tb1, t11, t1, te]
Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to a Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level 1: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), ..., (Trim, t(m-1), te, Dm). Deeper nodes, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121), refine the interval of their parent]
Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and uncertainty disks of Tr1-Tr5, with distances Rmin, Rmax, and Rd; annotation: Rmin > Rmax]
Trq = 2D point Q. The rest are possible locations bounded by circles
Observation:
- Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \int_{\lVert x - Q \rVert \le R_d} \mathrm{pdf}_i(x)\, dx$$
The probability that a given object Trj is the NN of Q is given by
$$P^{NN}_j = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{k \neq j} \bigl(1 - P_k(R_d)\bigr)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax...)
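The NN probability integral is often approximated by sampling instead of numerical integration. The sketch below assumes uniform pdfs over disks given as (center, radius) and a crisp query point Q, and estimates each object's NN probability as the fraction of sampled worlds in which it is closest; `sample_in_disk`, `r_max`, and `nn_probabilities` are hypothetical names. An object whose minimum distance exceeds Rmax never wins a sample, matching the pruning observation.

```python
import math
import random


def sample_in_disk(center, r, rng):
    """Uniform sample from a disk (the sqrt keeps the area density uniform)."""
    rho = r * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (center[0] + rho * math.cos(theta), center[1] + rho * math.sin(theta))


def r_max(q, disks):
    """Smallest maximum distance from q over all objects."""
    return min(math.dist(q, c) + r for c, r in disks)


def nn_probabilities(q, disks, n_samples=20000, seed=1):
    """Monte-Carlo estimate of P(object i is the NN of q) for each disk (center, r)."""
    rng = random.Random(seed)
    wins = [0] * len(disks)
    for _ in range(n_samples):
        d = [math.dist(q, sample_in_disk(c, r, rng)) for c, r in disks]
        wins[d.index(min(d))] += 1
    return [w / n_samples for w in wins]


Q = (0.0, 0.0)
disks = [((2.0, 0.0), 1.0), ((0.0, 2.5), 1.0), ((10.0, 0.0), 1.0)]
print(r_max(Q, disks), nn_probabilities(Q, disks))
```

For an uncertain querying object (as on the next slide), Q itself would also be sampled in every iteration, and the Rmax-based pruning would no longer apply directly.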
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable... Example:
[Figure: uncertainty disks of TrQ and Tr1-Tr5 with distances Rmin and Rmax, and two possible locations Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)...
66
[Figure: uniform pdfs of Trq, Tr1, and Tr2 over uncertainty disks of radius r in the x-y plane]
Example: assuming a uniform pdf of possible locations; the figure shows two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq
Main observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$P\bigl(\lVert V_i - V_q \rVert \le R_d\bigr)$$
Define Viq = Vi − Vq (cross-correlation) as another random variable: Viq = Vi + (−Vq). Due to independence, the pdf of Viq is the convolution of the pdfs of Vi and −Vq
67
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq) in the x-y plane; each is supported on a disk of radius 2r]
Let Viq = Vi + (−Vq)
Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
68
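The continuous aspect of the probabilistic NN queries
Let Vx_iq = vx_i − vx_q and Vy_iq = vy_i − vy_q denote the corresponding components of the velocity of the object whose expected location is along TR_iq
Then the distance of the expected location along TR_iq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = Vx_{iq}^2 + Vy_{iq}^2,\; B = 2\,(x_{iq} Vx_{iq} + y_{iq} Vy_{iq}),\; C = x_{iq}^2 + y_{iq}^2$$
where (x_iq, y_iq) is the expected relative location at t = 0; the graph of d_iq(t) is a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)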
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for the set of distance functions S_df = {d1q(t), d2q(t), ..., dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: throughout the time interval of interest for the query [tb, te], two distance functions diq(t) and djq(t) can intersect at most twice (NOTE: setting diq(t) = djq(t) and squaring both sides yields a quadratic equation in t). Call such pairwise intersections critical time-points
[Figure: example lower envelope LE12 of two distance trajectories TR1 and TR2 over time (horizontal axis) from tb, with critical time-points t11 and t12]
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time-point t31) on [tb, te] are merged into LE1234, which introduces the new critical time-points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
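Critical time-points and the lower envelope can be computed directly from the coefficients of d_iq(t)² = A t² + B t + C. The sketch below is deliberately simpler than the divide-and-conquer method above: it collects all pairwise critical time-points (O(N²) pairs) and picks the minimum between consecutive ones. It compares squared distances, which is valid because the square root is monotonic, and it also accepts A = 0 (identical velocities). The two example trajectories and the names `coeffs`, `critical_times`, and `lower_envelope` are hypothetical.

```python
import math


def coeffs(x0, y0, vx, vy):
    """(A, B, C) with d(t)^2 = A t^2 + B t + C for relative position (x0, y0) + t (vx, vy)."""
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def dist_sq(f, t):
    return f[0] * t * t + f[1] * t + f[2]


def critical_times(f, g, tb, te, eps=1e-12):
    """Times in (tb, te) where the two distance functions intersect (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
    return sorted(t for t in roots if tb < t < te)


def lower_envelope(funcs, tb, te):
    """List of (index, t_from, t_to) pieces of the lower envelope on [tb, te]."""
    cuts = {tb, te}
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            cuts.update(critical_times(funcs[i], funcs[j], tb, te))
    cuts = sorted(cuts)
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = 0.5 * (lo + hi)
        best = min(range(len(funcs)), key=lambda k: dist_sq(funcs[k], mid))
        if pieces and pieces[-1][0] == best:
            pieces[-1] = (best, pieces[-1][1], hi)  # extend the previous piece
        else:
            pieces.append((best, lo, hi))
    return pieces


funcs = [coeffs(0.0, 1.0, 0.0, 0.0),    # stays at distance 1 from the query
         coeffs(-2.0, 0.5, 1.0, 0.0)]   # passes by the query around t = 2
print(lower_envelope(funcs, 0.0, 4.0))
```

Removing the winning piece in each interval and repeating the search on the remaining functions would give the ranked (2nd, 3rd, ...) envelopes used for the deeper IPAC-NN levels.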
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially). Each relative location Viq lies within 2r of its expected location, so two expected distances that differ by more than 4r cannot swap their order
Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1-TR7 over [tb, te] with time-points t1, t2, t3; the lower envelope, a band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
72
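Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PP_i between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.,
$$PP_i(a) = \{\, P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mintime}(P) \le t_{i+1} - t_i \,\}$$
Example: if the pdf of selecting a possible path is uniform, then Pr(P) = 1 / |PP_i(a)| for every P ∈ PP_i(a)
Given a path P_j ∈ PP_i(a), the possible locations of a given moving object a with respect to P_j at t ∈ [ti, ti+1] are the set of all the positions p ∈ P_j from which a can reach p_i (respectively p_i+1) within the time period t − t_i (respectively t_i+1 − t), i.e.,
$$PL_j(a, t) = \{\, p \in P_j \mid \mathrm{mintime}(p_i, p) \le t - t_i \,\wedge\, \mathrm{mintime}(p, p_{i+1}) \le t_{i+1} - t \,\}$$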
73
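Combine the two probabilities...
[Figure: possible paths of object a and its possible locations at t = 4; segment m-p1 highlighted]
Probability of falling inside segment m-p1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\bigl(a(t) \in S \mid p\bigr)$$
where, given path p, a(t) is distributed over the possible locations PL_p(t), e.g., with a uniform pdf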
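Combining path probabilities with possible locations can be sketched for the simplest setting. This sketch rests on hypothetical simplifications: each possible path is reduced to its length, the paths only share their end points, a constant maximum speed v_max determines the minimum time costs, the path choice is uniform, and the location is uniform over the interval PL. `possible_locations` and `prob_in_segment` are illustrative names; the numbers are chosen to give a 0.5 · 0.5 = 0.25 situation analogous to the slide.

```python
def possible_locations(length, v_max, t_i, t_next, t):
    """PL at time t as an interval of offsets from p_i along a path of the given length."""
    lo = max(0.0, length - v_max * (t_next - t))  # must still be able to reach p_{i+1}
    hi = min(length, v_max * (t - t_i))           # must be reachable from p_i
    return lo, hi


def prob_in_segment(paths, v_max, t_i, t_next, t, segment):
    """Pr(a(t) in S) = sum over possible paths P of Pr(P) * Pr(a(t) in S | P).

    paths maps a path id to its length; segment is (path_id, start, end).
    """
    feasible = {pid: length for pid, length in paths.items()
                if length / v_max <= t_next - t_i}       # minimum time cost check
    total = 0.0
    for pid, length in feasible.items():
        if pid != segment[0]:
            continue                                     # S does not lie on this path
        lo, hi = possible_locations(length, v_max, t_i, t_next, t)
        if hi > lo:
            overlap = max(0.0, min(hi, segment[2]) - max(lo, segment[1]))
            p_seg = overlap / (hi - lo)                  # uniform pdf over PL
        else:
            p_seg = 1.0 if segment[1] <= lo <= segment[2] else 0.0
        total += p_seg / len(feasible)                   # uniform pdf over paths
    return total


paths = {"A": 10.0, "B": 12.0, "C": 30.0}  # path C is too long to be a possible path
print(prob_in_segment(paths, v_max=3.0, t_i=0.0, t_next=6.0, t=4.0, segment=("A", 7.0, 10.0)))
```

Here two of the three paths are possible (probability 0.5 each), and on path A the segment covers half of PL = [4, 10], giving 0.25.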
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
 – The collection of all the possible paths (and their probabilities)
 – A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
 – Edge hash-table
  › Small, in-memory
 – Movement R-tree
  › 1D for each edge
 – Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieved on demand
76
› Network expansion
 – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
 – Fetch the movement R-trees of the edges in the expansion tree
 – Search leaf entries whose maximum time interval intersects with tq
 – The objects pointed to by those leaf entries are candidates
› Refinement
 – The candidate set is considerably smaller than the original trajectory dataset
 – Calculate the qualification probability for each candidate
[Figure: road network with vertices A-E around q; the expansion tree "bubbling" along road-network segments, annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
77
Temporal-continuous query
 – Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
 – It is easy to prove that the actual QP is monotonic in-between consecutive critical time instants
 – The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty - the flip-side
› Data reduction
 – Why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches
 – Define an acceptable error
 – Save on data size
John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
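The classic recursive Douglas-Peucker simplification illustrates "define an acceptable error, save on data size". This is a sketch of the basic algorithm (quadratic in the worst case), not the speed-up by Hershberger and Snoeyink cited above; it treats samples as 2D points and ignores timestamps, so it does not address the role of time in reduction. `point_segment_distance` and `douglas_peucker` are illustrative names.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep the end points; recurse on the farthest point if it deviates more than epsilon."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


track = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 5.0), (4.0, 6.0), (5.0, 7.0)]
print(douglas_peucker(track, 0.5))
```

Every discarded sample lies within epsilon of the simplified polyline, which is the "acceptable error" traded for a smaller transmission.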
79
Uncertainty - the other flip-side
› What are the important properties that must not be distorted by the reduction?
 – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
 – It is not just another "Z" axis... one cannot travel back in time
 – How long may the sink's data be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far...
› Stochastic methods for uncertain spatial data
 – Probabilistic results: yes
 – Time dimension: no
vs.
› Geometric models for uncertain spatio-temporal data
 – Probabilistic results: no
 – Time dimension: yes
82
Merge the approaches
› Idea
 – Discretize the time
 – For each time, a pdf (pmf) over the possible positions is given
› Examples
 – Nearest neighbour queries on uncertain moving objects
 – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443
83-88
A highway example...
[Figure: position (pos) of a car on a highway over time (t), with a query region Q; the possible positions are bounded by the minimum and maximum speeds vmin and vmax]
› What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution, the car is inside Q with probability 50% at one time instant and with probability 30% at another
› Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
 – Violation of the speed constraints
› Consideration of dependency: 0.5
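The gap between 0.65 and 0.5 can be reproduced with a toy model in which both time instants depend on the same random speed. This sketch assumes (hypothetically) that the car drives at an unknown constant speed v, uniform on [0, 1] after normalising between vmin and vmax, and that being inside Q at the two time instants corresponds to v ∈ [0.5, 1.0] and v ∈ [0.6, 0.9]; `union_length` is an illustrative helper.

```python
def union_length(intervals):
    """Total length of the union of closed intervals [lo, hi]."""
    total, cur_lo, cur_hi = 0.0, None, None
    for lo, hi in sorted(intervals):
        if cur_hi is None or lo > cur_hi:
            if cur_hi is not None:
                total += cur_hi - cur_lo   # close the previous merged interval
            cur_lo, cur_hi = lo, hi
        else:
            cur_hi = max(cur_hi, hi)       # overlapping: extend the current interval
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total


# Speeds (normalised to [0, 1], uniform) for which the car is inside Q at each instant
hit_speeds = [(0.5, 1.0), (0.6, 0.9)]
p = [hi - lo for lo, hi in hit_speeds]                # marginals: 0.5 and 0.3
p_independent = 1 - (1 - p[0]) * (1 - p[1])           # treats the instants as independent
p_dependent = union_length(hit_speeds)                 # one speed decides both instants
print(round(p_independent, 2), round(p_dependent, 2))
```

Because the second speed interval lies inside the first, every world that reaches Q at the second instant already did so at the first, so the correct answer is 0.5 rather than 0.65.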
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
 – Discrete time + discrete space (e.g., Markov chain)
 – Discrete time + continuous space (e.g., Harris chain)
 – Continuous time + discrete space (e.g., Markov process)
 – Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
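How can we model this?
› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket; columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$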
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
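A minimal Python sketch of the ball example, assuming the 9-bucket matrix M above and a ball that starts in the third bucket (the initial vector used on the slide). Plain lists are used instead of a matrix library so the block runs without dependencies; `transition_matrix`, `step` and `after_hits` are illustrative names, not from the tutorial.

```python
def transition_matrix(n=9):
    """Transition matrix M of the ball example: interior buckets keep the ball
    with probability 0.2 and pass it to each neighbour with 0.4; the two
    border buckets keep it with 0.4 and pass it inward with 0.6."""
    m = [[0.0] * n for _ in range(n)]
    m[0][0], m[0][1] = 0.4, 0.6
    m[n - 1][n - 1], m[n - 1][n - 2] = 0.4, 0.6
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(x, m):
    """One hit: row vector x times the transition matrix m."""
    n = len(x)
    return [sum(x[i] * m[i][j] for i in range(n)) for j in range(n)]


def after_hits(x, m, hits):
    for _ in range(hits):
        x = step(x, m)
    return x


M = transition_matrix()
x0 = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # initial position: third bucket
print([round(p, 2) for p in after_hits(x0, M, 1)])
# [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in after_hits(x0, M, 2)])
# [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in after_hits(x0, M, 1000)])
# [0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08]
```

After many hits the distribution approaches the stationary distribution of the chain, (0.08, 0.12, …, 0.12, 0.08); 1000 hits is used here only to make the convergence obvious.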
95
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
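How transition probabilities might be learned from historical data can be sketched with simple maximum-likelihood counting: count how often state j follows state i in consecutive ticks and normalize each row. The estimator, the dictionary-of-dictionaries layout (which keeps the very sparse matrix sparse) and the toy trajectories are assumptions for illustration; the slide does not say which estimator is used.

```python
from collections import Counter, defaultdict


def estimate_transitions(trajectories):
    """Estimate P(next state | current state) by counting the transitions
    observed between consecutive ticks and normalizing each row.
    The result is sparse: {from_state: {to_state: probability}}."""
    counts = defaultdict(Counter)
    for states in trajectories:
        for cur, nxt in zip(states, states[1:]):
            counts[cur][nxt] += 1
    matrix = {}
    for cur, row in counts.items():
        total = sum(row.values())
        matrix[cur] = {nxt: c / total for nxt, c in row.items()}
    return matrix


# Hypothetical historical trajectories: one state (e.g. intersection) per tick.
history = [
    ["s1", "s3", "s2", "s1"],
    ["s1", "s3", "s3", "s2"],
    ["s2", "s3", "s2"],
]
M = estimate_transitions(history)
print(M["s3"])  # {'s2': 0.75, 's3': 0.25}
```

Separate matrices for different times of day or object groups could be obtained by running the same estimator on the corresponding subsets of the history.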
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2, s3 with the transition probabilities of M]
Note: there is an exponential number of possible paths the car might take.
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
Initial state vector at t = 0: (1, 0, 0); $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
98
99
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
Initial state vector at t = 0: (1, 0, 0); $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
State vector at t = 1: (0, 0, 1)
100
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
Initial state vector at t = 0: (1, 0, 0); $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
State vectors at t = 1, 2: (0, 0, 1), (0, 0.8, 0.2)
101
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
Initial state vector at t = 0: (1, 0, 0); $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
State vectors at t = 1, 2, 3: (0, 0, 1), (0, 0.8, 0.2), (0.48, 0.16, 0.36)
102
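ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
Initial state vector at t = 0: (1, 0, 0); $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
State vectors at t = 1, 2: (0, 0, 1), (0, 0.8, 0.2). The mass 0.8 in s2 at t = 2 has already hit the window and is not propagated further, so at t = 3 only the remaining 0.2 in s3 evolves: (0, 0.16, 0.04)
Result = 0.8 + 0.16 = 0.96
› The solution based on matrix multiplications introduces a new state for the “winner” trajectories (those that have already hit the window) and two matrices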
103
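ST-Window Queries
$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
State vectors for t = 0, 1, 2, 3: (1, 0, 0, 0) → (0, 0, 1, 0) → (0, 0, 0.2, 0.8) → (0, 0, 0.04, 0.96)
States: s1, s2, s3 and the new absorbing state s4 for the winner trajectories. M⁻ is used for transitions into ticks outside T; M⁺ is used for transitions into ticks inside T, where every transition into s1 or s2 is redirected to s4.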
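The window query with the extra “winner” state can be sketched in Python as below. It follows the construction on the slide (an absorbing fourth state, M⁻ outside T and M⁺ inside T); the function name `window_query` and the convention that M⁺ is applied to the transition arriving at each tick in T are our assumptions, chosen so that the state vectors match the ones shown above.

```python
def vec_mat(x, m):
    """Row vector x times matrix m (lists of lists)."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def window_query(M, x0, window, T):
    """State vector after the last tick of T = (t_start, t_end), with one extra
    absorbing 'winner' state (last entry) that collects every possible world
    that visited a state in `window` at some tick in T."""
    n = len(M)
    t_start, t_end = T
    # M_minus: original transitions; the winner state is absorbing.
    M_minus = [row + [0.0] for row in M] + [[0.0] * n + [1.0]]
    # M_plus: transitions into window states are redirected to the winner state.
    M_plus = [[0.0 if j in window else p for j, p in enumerate(row)]
              + [sum(row[j] for j in window)] for row in M]
    M_plus.append([0.0] * n + [1.0])
    x = list(x0) + [0.0]
    if t_start == 0:  # worlds already in the window at t = 0 are winners
        x[n] = sum(x[j] for j in window)
        for j in window:
            x[j] = 0.0
    for t in range(1, t_end + 1):
        # The transition arriving at tick t uses M_plus iff t lies in T.
        x = vec_mat(x, M_plus if t >= t_start else M_minus)
    return x


# States s1, s2, s3 -> indices 0, 1, 2; rows: from, columns: to.
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
x = window_query(M, [1.0, 0.0, 0.0], window={0, 1}, T=(2, 3))
print([round(p, 2) for p in x])  # [0.0, 0.0, 0.04, 0.96]
```

The last entry is the query result; only matrix-vector products are needed, so the cost grows linearly with the length of T instead of exponentially with the number of paths.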
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time; left: one observation at t0; right: two observations at t0 and t1]
105
Multiple Observations
› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has already intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
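Multiple Observations
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3; ∎ marks worlds that have intersected the window, ¬∎ worlds that have not]
State vectors over (S1−, S2−, S3−, S1+, S2+, S3+) for t = 0, 1, 2, 3:
(1, 0, 0, 0, 0, 0) → (0, 0, 1, 0, 0, 0) → (0, 0, 0.2, 0, 0.8, 0) → (0, 0, 0.04, 0.48, 0.16, 0.32)
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$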
107
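Bayes’ Theorem
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 at t = 3? (∎ = the trajectory has intersected the window)
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\, P(\blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$$
With the state vector (S1−, S2−, S3−, S1+, S2+, S3+) = (0, 0, 0.04, 0.48, 0.16, 0.32) at t = 3:
$$P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$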
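The 2|S|-state construction and the Bayes step can be sketched as follows, assuming the same three-state chain, the window {s1, s2}, T = [2, 3] and an observation in s3 at t = 3. The helper names are hypothetical; the S⁻/S⁺ layout mirrors the block matrices M⁻ and M⁺ described on the slides.

```python
def vec_mat(x, m):
    """Row vector x times matrix m (lists of lists)."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def split_matrices(M, window):
    """M_minus and M_plus over 2|S| states: index i is S_i^- (window not yet
    intersected), index n + i is S_i^+ (window already intersected)."""
    n = len(M)
    M_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    M_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            # Outside T both halves evolve with M: M_minus = [[M, 0], [0, M]].
            M_minus[i][j] = p
            M_minus[n + i][n + j] = p
            # Inside T, entering a window state turns a "-" world into a "+" world.
            M_plus[i][n + j if j in window else j] += p
            M_plus[n + i][n + j] = p
    return M_minus, M_plus


def hit_given_observation(M, x0, window, T, obs_state, obs_time):
    """P(window hit during T | object observed in obs_state at obs_time),
    assuming the object starts at t = 0 in the S^- half."""
    n = len(M)
    M_minus, M_plus = split_matrices(M, window)
    x = list(x0) + [0.0] * n
    for t in range(1, obs_time + 1):
        x = vec_mat(x, M_plus if T[0] <= t <= T[1] else M_minus)
    hit, miss = x[n + obs_state], x[obs_state]
    # Bayes: P(hit | obs) = P(obs and hit) / (P(obs and hit) + P(obs and not hit))
    return x, hit / (hit + miss)


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
x3, p = hit_given_observation(M, [1.0, 0.0, 0.0], {0, 1}, (2, 3), obs_state=2, obs_time=3)
print([round(v, 2) for v in x3])  # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]
print(round(p, 2))  # 0.89
```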
108
Summary
› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the “intelligence” and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: objects’ possible trajectories indexed in the time/location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
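One common way to draw samples that agree with a later observation is to condition the chain on it: weight each transition i → j by M[i][j] times the probability of still reaching the observed state from j in the remaining ticks (backward probabilities). This is our illustrative assumption of what “adaptation of transition matrices” can look like, not necessarily the exact procedure of the cited PVLDB paper. The chain and the observation (start in s1, seen in s3 at t = 3) are reused from the window-query example, and all names are hypothetical.

```python
import random


def backward(M, obs_state, steps):
    """beta[k][i] = P(in obs_state after k more ticks | currently in state i)."""
    n = len(M)
    beta = [[1.0 if i == obs_state else 0.0 for i in range(n)]]
    for _ in range(steps):
        prev = beta[-1]
        beta.append([sum(M[i][j] * prev[j] for j in range(n)) for i in range(n)])
    return beta


def sample_conditioned(M, start, obs_state, obs_time, rng):
    """Sample one path from `start` (t = 0) that is in obs_state at obs_time,
    using per-tick adapted transitions M'[i][j] ~ M[i][j] * beta[k][j]."""
    n = len(M)
    beta = backward(M, obs_state, obs_time)
    if beta[obs_time][start] == 0.0:
        raise ValueError("observation is unreachable from the start state")
    path = [start]
    for t in range(obs_time):
        i = path[-1]
        remaining = obs_time - t - 1  # ticks left after this transition
        weights = [M[i][j] * beta[remaining][j] for j in range(n)]
        path.append(rng.choices(range(n), weights=weights)[0])
    return path


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
rng = random.Random(42)
paths = [sample_conditioned(M, 0, 2, 3, rng) for _ in range(10000)]
via_s2 = sum(path[2] == 1 for path in paths) / len(paths)
print(round(via_s2, 2))  # close to 0.32 / (0.32 + 0.04), i.e. about 0.89
```

Every sampled path is consistent with the observation by construction, so no samples are wasted; for this chain about 89% of them pass through s2 at t = 2, in line with the Bayes computation for the same observation.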
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob (1953). Stochastic Processes. Wiley.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013).
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43–49.
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435–443.
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728.
6
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
7
Geo-Spatial Data
• Huge flood of geo-spatial data
  • Modern technology
  • New user mentality
• Great research potential
  • New applications
  • Innovative research
  • Economic boost
• “$600 billion potential annual consumer surplus from using personal location data” [1]
[1] McKinsey Global Institute. Big data: The next frontier for innovation, competition, and productivity. June 2011.
8
Geo-Spatial Data
9
10
Spatio-Temporal Data
• (object, location, time) triples
• Queries
  • “Find friends that attended the same concert last Saturday”
• Best case: continuous function
GPS log taken from a thirty-minute drive through Seattle. Dataset provided by P. Newson and J. Krumm, Hidden Markov Map Matching Through Noise and Sparseness, ACM GIS 2009.
11
Sources of Uncertainty
• Missing observations
  • Missing GPS signal
  • RFID sensors available at discrete locations only
  • Wireless sensor nodes sending infrequently to preserve energy
  • Infrequent check-ins of users of geo-social networks
Dataset provided by E. Cho, S. A. Myers, and J. Leskovec, Friendship and Mobility: User Movement in Location-Based Social Networks, SIGKDD 2011.
12
Sources of Uncertainty
• Uncertain observations
  • Imprecise sensor measurements (e.g., radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g., contradictory sensor data)
  • Human errors (e.g., in crowd-sourcing applications)
From a database perspective, the position of a mobile object is uncertain.
Dataset provided by E. Cho, S. A. Myers, and J. Leskovec, Friendship and Mobility: User Movement in Location-Based Social Networks, SIGKDD 2011.
13
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
14
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results to enhance the underlying decision-making process
15
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results to enhance the underlying decision-making process
Improve the quality of modern location based applications and of research results in the field
16
Uncertain Spatial Data Models
• Discrete models
• Continuous models
Possible World Semantics
17
Possible World Semantics
• A collection of uncertain spatial objects defines an uncertain spatial database
• Combinations of object instances define possible database instances, called possible worlds
• Assumption: the probability of a possible world can be computed efficiently
Possible World Semantics
18
Answering Queries using PWS
Let
• DB be an uncertain database having possible worlds,
• W be the set of possible worlds of DB,
• φ be a query predicate,
• I_φ(w) be an indicator function returning one if predicate φ holds in world w and zero otherwise.
The probability that a query predicate φ holds on an uncertain database DB is defined as
$$P(\varphi, DB) = \sum_{w \in W} P(w) \cdot I_\varphi(w)$$
Possible World Semantics
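Possible world semantics can be made concrete with a brute-force Python sketch for a count query: enumerate every combination of object instances, multiply the instance probabilities to get P(w), and add P(w) to the count observed in w. The objects, positions, query radius and function name are hypothetical. The loop visits all 2 × 2 × 3 = 12 worlds here, which illustrates why naive processing is exponential in the number of objects.

```python
from itertools import product
from math import dist


def count_distribution(objects, q, radius):
    """Exact answer of the count query under possible world semantics.

    objects: list of independent uncertain objects, each a list of
    (position, probability) instances whose probabilities sum to 1.
    Returns {k: P(exactly k objects lie within `radius` of q)}."""
    result = {}
    for world in product(*objects):  # one instance per object = one world
        p_world = 1.0
        for _, p in world:
            p_world *= p  # P(w): product of the chosen instance probabilities
        k = sum(1 for pos, _ in world if dist(pos, q) <= radius)
        result[k] = result.get(k, 0.0) + p_world  # adds P(w) * I(count == k)
    return result


# Hypothetical objects; the first instance of each lies inside the unit circle.
objects = [
    [((0.5, 0.0), 0.8), ((3.0, 0.0), 0.2)],
    [((0.0, 0.5), 0.2), ((0.0, 3.0), 0.8)],
    [((0.3, 0.3), 0.4), ((2.0, 2.0), 0.3), ((-3.0, 1.0), 0.3)],
]
print(count_distribution(objects, q=(0.0, 0.0), radius=1.0))
```

For these objects the three inside-probabilities are 0.8, 0.2 and 0.4, and the resulting distribution is approximately {0: 0.096, 1: 0.472, 2: 0.368, 3: 0.064}.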
Possible Worlds Example II
[Figure: uncertain objects labeled A to Z]
20
Possible Worlds Example II
[Figure: uncertain objects labeled A to Z]
21
22
23
24
25
26
27
Too many possible worlds
28
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects.
Are there efficient solutions to query uncertain spatial data?
In general: No.
“The problem of answering queries on a probabilistic database D is #P-complete in the size of D” [DalviSuciu04]
This can be reduced to uncertain spatial databases.
But: specific queries may have polynomial-time solutions.
[DalviSuciu04] Dalvi, N. N. and Suciu, D. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB) (2004).
29
Querying Uncertain Data: Running Example
[Figure: circular query region centered at query point q]
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds
30
Data Cleaning: Aggregation
[Figure: query region around q with objects C, D, H, I]
Ignore uncertainty (data cleaning)
Replace uncertain objects by a deterministic “best guess”
Expected positions
Most-likely positions
…
Query results are not reliable
Query results may be biased
31
Equivalent Worlds: An Intuition
[Figure: query region around q]
Observation 1
For any possible world w and any possible world w′ derived from w by changing the position of an object o, the following equivalence holds:
32
Equivalent Worlds: An Intuition
[Figure: query region around q]
Observation 1 allows us to discard objects outside of the query region.
Remaining number of classes of equivalent possible worlds:
33
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate “inside”.
[Figure: query region around q with objects C, D, H, I]
Querying Uncertain Spatial Data
34
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate “inside”.
Remaining number of classes of equivalent possible worlds:
[Figure: query region around q with objects C, D, H, I]
35
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region. Information about concrete result objects can be discarded.
[Figure: query region around q with objects C, D, H, I]
36
[Figure: query region around q; objects A, B, C, H; probabilities 0.8, 0.2, 0.4]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
37
[Figure: the same query region with objects A, B, C each replaced by x; probabilities 0.8, 0.2, 0.4]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects – substitute A, B, C by x
39
[Figure: the same query region with objects A, B, C each replaced by x; probabilities 0.8, 0.2, 0.4]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects – substitute A, B, C by x
Each monomial c · x^k implies that the probability of having exactly k results equals c
40
Generating Functions: Formally
For each object o_i, let p_i denote the probability that o_i is inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_{i} \bigl( (1 - p_i) + p_i \, x \bigr)$$
In the expanded polynomial, the coefficient of monomial x^k equals the probability that exactly k objects are inside the query region.
[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502–513 (2009).
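The generating-function method fits in a few lines of Python. The sketch assumes three independent objects whose inside-probabilities are the 0.8, 0.2 and 0.4 shown in the slide figure (which object carries which value does not matter, since the product is commutative); the function names are illustrative. It needs O(n²) coefficient updates instead of enumerating 2ⁿ possible worlds.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of x)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def count_distribution_gf(inside_probs):
    """Coefficient k of prod_i ((1 - p_i) + p_i x) = P(exactly k objects inside)."""
    poly = [1.0]
    for p in inside_probs:
        poly = poly_mul(poly, [1.0 - p, p])  # factor (1 - p_i) + p_i * x
    return poly


print([round(c, 3) for c in count_distribution_gf([0.8, 0.2, 0.4])])
# [0.096, 0.472, 0.368, 0.064]
```

Read the output as: with probability 0.472 exactly one of the three objects is in the query region, with probability 0.368 exactly two, and so on.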
41
Example
[Figure: query region around q; objects A, B, C, H; probabilities 0.8, 0.2, 0.4]
Count Queries on Uncertain Data
42
[Figure: query region around q; objects A, B, C, H; probabilities 0.8, 0.2, 0.4]
Example
Count Queries on Uncertain Data
43
[Figure: query region around q; objects A, B, C, H; probabilities 0.8, 0.2, 0.4]
Example
Count Queries on Uncertain Data
44
[Figure: query region around q; objects A, B, C, H; probabilities 0.8, 0.2, 0.4]
Example
Count Queries on Uncertain Data
45
The Paradigm of Equivalent Worlds
Given a query predicate φ and an uncertain database DB, we can answer φ on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time.
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.
III. The probability of a class can be computed in polynomial time.
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and without bias
Evaluate the query predicate on each world
The distribution of sampled results is an unbiased approximation of the true distribution of results
48
[Figure: query region around q; objects A, B, C, H; probabilities 0.8, 0.2, 0.4]
Sampling Example
Drawing 100 possible worlds may yield the following estimators:
Compare to the exact probabilities:
No indication of the reliability or confidence of the estimations
49
[Figure: query region around q; objects A, B, C, H; probabilities 0.8, 0.2, 0.4]
Sampling: Confidences
Drawing 100 possible worlds may yield the following estimators:
Use statistical methods to assess the quality of estimators
E.g., Wald test: $\hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$,
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution
At a significance level of α, the true probability is in the interval [0.442, 0.638]
True probability:
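Monte-Carlo sampling together with the Wald interval can be sketched as follows. The three inside-probabilities are again assumed to be 0.8, 0.2 and 0.4, the predicate “exactly one object inside” is our choice for illustration, and α = 0.05 is assumed. As a consistency check we computed, an estimate p̂ = 0.54 from n = 100 samples yields about [0.442, 0.638], the interval quoted on the slide; the function names are hypothetical.

```python
import random
from statistics import NormalDist


def sample_count_prob(inside_probs, k, n, rng):
    """Monte-Carlo estimate of P(exactly k objects inside): draw n independent
    possible worlds and return the fraction in which the predicate holds."""
    hits = 0
    for _ in range(n):
        count = sum(rng.random() < p for p in inside_probs)  # one sampled world
        hits += count == k
    return hits / n


def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval p_hat +/- z_{1-alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half


rng = random.Random(7)
p_hat = sample_count_prob([0.8, 0.2, 0.4], k=1, n=100, rng=rng)
print(p_hat, wald_interval(p_hat, 100))
```

The interval width shrinks with 1/√n, so quadrupling the number of sampled worlds roughly halves the uncertainty of the estimate.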
50
Uncertain Spatial Data Management Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: “best guess” answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip side
53
Model of a trajectory
[Figure: 3d-TRAJECTORY in (X, Y, Time) up to the present time and its 2d-ROUTE projection]
Approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
54
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory
(location, time, velocity vector) updates
55
Modeling Uncertainty: Location Uncertainty Models
Various sources’ imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (a.k.a. space-time prisms)
  • Assumed exact location samples
  • Unknown (but bounded) maximum speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
57
Modeling Uncertain Trajectories: Model 2
Constraints
  – Motion along the road network
  – Uncertainty restricted to road segments
[Figure: road segment between p1 and p2 over time t, with samples at t1 and t2 and query q]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3 – Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider
Syntax
  – The impact of uncertainty on the (structure of the) answer
  – The impact of constraints, e.g., possible routes to be taken
Processing algorithms
  – Impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside a range (disk) around “Q” between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tr_i1, [tb, t1]), (Tr_i2, [t1, t2]), …, (Tr_im, [tm−1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) over time T = tb, t1, te]
Given a collection of moving objects’ trajectories Tr1, Tr2, …, TrN.
Example: assuming that Trq is the querying trajectory during [tb, te], then
  – Tr1 (black) is the NN throughout [tb, t1]
  – Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) over time T = tb, tb1, t1, t11, te]
Observations: adding uncertainty to the previous example implies
  – Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
  – Starting at tb1, Tr3 (green) also has a non-zero NN-probability
  – Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree with root [TrQ, tb, te]; level 1: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m−1), te, Dm); each deeper node (e.g., Tri11, Tri12, Tri121) covers a sub-interval of its parent’s interval]
Root: parameters of the query
  – Querying trajectory Trq
  – Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and uncertainty disks of Tr1–Tr5 with distances Rmin, Rmax and Rd; Tr4 has minimum distance > Rmax]
Trq = 2D point Q. The other objects’ possible locations are bounded by circles.
Observation:
  – Any object whose minimum distance from Q is > Rmax has a “0” probability of being a NN to Q
  – Example: Tr4 cannot have a non-zero probability of being Q’s NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \iint_{\lVert (x, y) - Q \rVert \le R_d} pdf_i(x, y)\, dx\, dy$$
The probability that a given object Trj is the NN of Q is given by
$$P_{NN}(Tr_j) = \int_{0}^{R_{max}} \frac{d P_j(R_d)}{d R_d} \prod_{i \ne j} \bigl(1 - P_i(R_d)\bigr)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd (NOTE: the upper bound is Rmax…)
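The NN probability for a crisp query point can also be approximated by sampling instead of evaluating the integrals. The sketch below assumes each object is uniformly distributed in a disk and applies the Rmax pruning observation before sampling; the disks, the sample size and the function names are hypothetical.

```python
import math
import random


def sample_disk(center, r, rng):
    """Uniform sample from a disk; sqrt keeps the density uniform in area."""
    rho = r * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return (center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi))


def nn_probabilities(q, disks, n, rng):
    """Monte-Carlo estimate of P(Tr_j is the NN of the crisp point q), with
    Tr_j uniformly distributed in disk j = (center, radius)."""
    # R_max: the smallest maximum distance; any object whose minimum distance
    # exceeds R_max can never be the NN and is pruned (probability 0).
    r_max = min(math.dist(q, c) + r for c, r in disks)
    cand = [j for j, (c, r) in enumerate(disks) if math.dist(q, c) - r <= r_max]
    wins = [0] * len(disks)
    for _ in range(n):
        d = {j: math.dist(q, sample_disk(*disks[j], rng)) for j in cand}
        wins[min(d, key=d.get)] += 1
    return [w / n for w in wins]


disks = [((2.0, 0.0), 1.0), ((0.0, 2.5), 1.0), ((-6.0, 0.0), 1.0)]
probs = nn_probabilities((0.0, 0.0), disks, 20000, random.Random(3))
print(probs)  # the third object is pruned, so its probability is exactly 0.0
```

Sampling trades the closed-form integral for an estimate whose error shrinks with the number of samples; the pruning step is exact and removes objects before any sampling effort is spent on them.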
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query TrQ and Tr1–Tr5 with distances Rmin, Rmax; points Z1, Z2 with dist(Z1, Z2) < Rmax]
We can no longer “prune” Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq.
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs pdf(Trq), pdf(Tr1), pdf(Tr2) over disks of diameter 2r]
Example: assuming a uniform pdf of possible locations, with two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq
Main observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$.
Define Viq = Vi − Vq (cross-correlation) as another random variable: Viq = Vi + (−Vq), which, due to independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and −Vq.
67
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq), supported on disks of diameter 4r]
Let Viq = Vi + (−Vq)
Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
Let Vx_iq = vx_i − vx_q and Vy_iq = vy_i − vy_q denote the corresponding components of the velocity of the object whose expected location is along TR_iq.
Then the distance of the expected location along TR_iq from the origin as a function of t is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = Vx_{iq}^2 + Vy_{iq}^2$$
(B and C are determined by the expected relative location at t = 0): a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t); squaring both sides gives an equation of degree at most two in t). Call such pairwise intersections critical time points.
[Figure: example lower envelope LE12 of two distance trajectories TR1 and TR2, with critical time points t11 and t12 after tb; time on the horizontal axis, distance on the vertical axis]
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of merge sort
[Figure: lower envelopes LE12 (of TR1, TR2; critical time points t11, t12) and LE34 (of TR3, TR4; critical time point t31) over [tb, te] are merged into LE1234, producing new critical time points t1new, t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
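The critical time points and the merge-sort-style construction can be sketched in Python. Each distance function is stored via the coefficients of its square, d(t)² = a t² + b t + c, which matches the hyperbola form of the distance functions; two envelopes are merged by splitting at all their breakpoints and critical time points and keeping the nearer function on each piece. This is an illustrative sketch with made-up example functions: it uses linear scans when merging, so it does not claim the O(N log N) bound stated on the slide.

```python
import math

# A distance function d(t) is stored by the coefficients (a, b, c) of its
# square, d(t)^2 = a t^2 + b t + c, which is enough to compare distances.


def dist(f, t):
    a, b, c = f
    return math.sqrt(max(a * t * t + b * t + c, 0.0))


def critical_points(f, g, t0, t1):
    """Times strictly inside (t0, t1) where d_f = d_g: roots of the
    difference of the squares, a polynomial of degree <= 2 (so at most 2)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted(r for r in roots if t0 < r < t1)


def merge(fs, left, right):
    """Merge two envelopes, each a list of (start, end, index) pieces covering
    the same time interval, into their common lower envelope."""
    times = sorted({t for piece in left + right for t in piece[:2]})
    out = []
    for t0, t1 in zip(times, times[1:]):
        mid = (t0 + t1) / 2
        i = next(k for s, e, k in left if s <= mid <= e)
        j = next(k for s, e, k in right if s <= mid <= e)
        cuts = [t0] + critical_points(fs[i], fs[j], t0, t1) + [t1]
        for s0, s1 in zip(cuts, cuts[1:]):
            m = (s0 + s1) / 2
            k = i if dist(fs[i], m) <= dist(fs[j], m) else j
            if out and out[-1][2] == k:
                out[-1] = (out[-1][0], s1, k)  # extend the previous piece
            else:
                out.append((s0, s1, k))
    return out


def lower_envelope(fs, idx, tb, te):
    """Divide and conquer over the function indices, in the spirit of merge sort."""
    if len(idx) == 1:
        return [(tb, te, idx[0])]
    half = len(idx) // 2
    return merge(fs, lower_envelope(fs, idx[:half], tb, te),
                 lower_envelope(fs, idx[half:], tb, te))


# Hypothetical squared distances: (t-2)^2+1, (t-6)^2+1, (t-4)^2+100.
fs = [(1.0, -4.0, 5.0), (1.0, -12.0, 37.0), (1.0, -8.0, 116.0)]
print(lower_envelope(fs, [0, 1, 2], 0.0, 8.0))  # [(0.0, 4.0, 0), (4.0, 8.0, 1)]
```

Each piece (start, end, index) of the result corresponds to one child of the IPAC-NN root: the object with index 0 is nearest on [0, 4] and the object with index 1 on [4, 8].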
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function lies further than 4r (r = radius of the uncertainty disk) above the lower envelope cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree
[Figure: distance functions TR1–TR7 over [tb, te] with time points t1, t2, t3; the lower envelope; the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree); the 4r pruning band]
72
Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti.
Example: if the pdf of selecting a possible path is uniform, $\Pr(P) = 1 / |PP_i(a)|$ for every $P \in PP_i(a)$.
Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t).
73
Combine the two probabilities…
[Figure: possible paths and possible locations of object a when t = 4, with point m]
Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\bigl(a(t) \in S \cap PL_p(t)\bigr)$$
e.g., with a uniform pdf
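The combination of path and location probabilities can be sketched on a simplified model: each possible path is a line of given length measured from pi, the object moves at most at a hypothetical speed bound v_max (instead of per-edge minimum time costs), paths are chosen uniformly, and the location on a path is uniform over its possible locations. All names and numbers are assumptions; the example is set up so that, as on the slide, one of two possible paths contributes 0.5 · 0.5 = 0.25.

```python
def possible_locations(length, v_max, t_i, t_next, t):
    """Positions s (distance from p_i along a path of the given length) that
    are reachable from p_i by time t and still allow reaching p_{i+1} by
    t_next, under a single hypothetical speed bound v_max."""
    lo = max(0.0, length - v_max * (t_next - t))
    hi = min(length, v_max * (t - t_i))
    return (lo, hi) if lo <= hi else None


def prob_in_segment(paths, segment, v_max, t_i, t_next, t):
    """Pr(a(t) in S) = sum over possible paths p of Pr(p) * Pr(a(t) in S | p).

    paths:   {path_id: length} of candidate routes from p_i to p_{i+1}
    segment: {path_id: (s0, s1)}, the part of S lying on that path
    """
    # Possible paths: minimum travel time length / v_max fits in [t_i, t_next].
    feasible = {pid: L for pid, L in paths.items() if L <= v_max * (t_next - t_i)}
    total = 0.0
    for pid, length in feasible.items():
        pl = possible_locations(length, v_max, t_i, t_next, t)
        if pl is None or pid not in segment:
            continue
        lo, hi = pl
        s0, s1 = segment[pid]
        overlap = max(0.0, min(hi, s1) - max(lo, s0))
        width = hi - lo
        # Uniform pdf over the possible locations (point mass if width is 0).
        p_loc = overlap / width if width > 0 else float(s0 <= lo <= s1)
        total += p_loc / len(feasible)  # uniform Pr(p) = 1 / |PP|
    return total


# Hypothetical example: paths A and B are possible, C is too long.
paths = {"A": 6.0, "B": 7.0, "C": 9.0}
print(prob_in_segment(paths, {"A": (2.0, 3.0)}, v_max=1.0, t_i=0.0, t_next=8.0, t=4.0))  # 0.25
```

At t = 4 the object on path A can only be in [2, 4]; the segment [2, 3] covers half of that interval, and path A itself is chosen with probability 1/2.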
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and their probabilities)
  – A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
› Data structures
  – Edge hash table
    › Small, in memory
  – Movement R-tree
    › 1D for each edge
  – Trajectories list
    › Store samples ordered by time stamp
    › Assign a pointer to the set of possible paths
    › Earliest arrival and latest departure stored for vertices
    › On disk; retrieved on demand
76
› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time interval intersects with tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree “bubbling” along road-network segments from q over vertices A–E, annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
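The network-expansion step can be sketched as a Dijkstra search that stops at network distance r. The sketch assumes q is a vertex and the road network is an adjacency list; an edge is collected as soon as its start vertex is closer than r, so edges that are only partly inside the range are included. The graph and the function name are hypothetical, and the subsequent filtering on per-edge movement R-trees is not shown.

```python
import heapq


def expansion_tree(graph, q, r):
    """Bounded Dijkstra expansion from vertex q.

    graph: {vertex: [(neighbor, edge_length), ...]}
    Returns (dist, edges): network distances of the vertices closer than r,
    and every edge (u, v) whose start vertex u is closer than r to q, mapped
    to that distance (such edges lie at least partly inside the range)."""
    dist = {q: 0.0}
    edges = {}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            edges[(u, v)] = d
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist, edges


# Hypothetical undirected road network (both directions listed).
graph = {
    "A": [("B", 2.0), ("E", 5.0)],
    "B": [("A", 2.0), ("C", 2.0)],
    "C": [("B", 2.0), ("D", 3.0)],
    "D": [("C", 3.0)],
    "E": [("A", 5.0)],
}
dist, edges = expansion_tree(graph, "A", 3.5)
print(dist)           # {'A': 0.0, 'B': 2.0}
print(sorted(edges))  # [('A', 'B'), ('A', 'E'), ('B', 'A'), ('B', 'C')]
```

Edge (C, D) is never reached, so trajectories that only use it can be discarded before any probability is computed.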
77
Temporal-continuous query
  – Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
  – It is easy to prove that the actual QP is monotonic between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty – the flip side
› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: “Speeding up the Douglas-Peucker Line-Simplification Algorithm”, SSHD 1997
79
Uncertainty – the other flip side
› What are the important properties not to be distorted by reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just a “Z” axis… one cannot travel back in time
  – How long may the data at the sink be “un-fresh”?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far …
Stochastic methods for uncertain spatial data
  – Probabilistic results
  – Time dimension
vs.
Geometric models for uncertain spatio-temporal data
  – Probabilistic results
  – Time dimension
82
Merge the approaches
› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar, “Querying imprecise data in moving object environments,” IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, “Probabilistic Similarity Search for Uncertain Time Series,” SSDBM 2009, 435–443.
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
J. L. Doob. Stochastic Processes. Wiley, 1953.
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts
[Figure: inner hole: stay 0.2, move to each neighbour 0.4; border hole: stay 0.4, move inward 0.6]
92
A simple example
› Initial position: the ball is in the third hole
› After first hit: 0.4, 0.2, 0.4 (second to fourth hole)
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (first to fifth hole)
› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08
93
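How can we model this?
› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
\[
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
\]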
94
How can we model this?
› First hit: $(0,0,1,0,0,0,0,0,0)\,M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0,0,1,0,0,0,0,0,0)\,M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0,0,1,0,0,0,0,0,0)\,M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
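Each hit is a plain vector-matrix product. The dependency-free sketch below rebuilds M and reproduces the first two hits. The helper names `board_matrix`, `hit` and `after_hits` are illustrative only.

```python
from typing import List

Matrix = List[List[float]]

def board_matrix(n: int = 9) -> Matrix:
    """Transition matrix of the wooden-board example (rows: from bucket, columns: to bucket)."""
    m = [[0.0] * n for _ in range(n)]
    m[0][0], m[0][1] = 0.4, 0.6                  # left border
    m[n - 1][n - 2], m[n - 1][n - 1] = 0.6, 0.4  # right border
    for i in range(1, n - 1):                    # interior holes
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m

def hit(v: List[float], m: Matrix) -> List[float]:
    """One hit of the board: row vector times transition matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def after_hits(v: List[float], m: Matrix, hits: int) -> List[float]:
    for _ in range(hits):
        v = hit(v, m)
    return v

M = board_matrix()
start = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # ball in the third hole
print([round(p, 2) for p in after_hits(start, M, 1)])
# [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in after_hits(start, M, 2)])
# [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
```

The long-run (stationary) distribution of this M is (0.08, 0.12, …, 0.12, 0.08), which is the vector the slide shows for the 40th hit. Iterating a few hundred hits matches it to many decimals.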
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
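One simple way to learn the transition probabilities is to count observed state changes per tick and normalize each row. These are the maximum-likelihood frequency estimates. The slide does not say which estimator is used, so this is an assumption. The sketch also assumes that historical trajectories are already map-matched to states (e.g., intersection IDs) at tick granularity. Storing only the observed (from, to) pairs keeps the very sparse matrix small.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Sequence

SparseMatrix = Dict[Hashable, Dict[Hashable, float]]

def estimate_transitions(trajectories: Iterable[Sequence[Hashable]]) -> SparseMatrix:
    """Relative transition frequencies per tick; only observed (from, to) pairs are stored."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for traj in trajectories:
        for here, there in zip(traj, traj[1:]):   # consecutive ticks
            counts[here][there] += 1
    matrix: SparseMatrix = {}
    for here, row in counts.items():
        total = sum(row.values())
        matrix[here] = {there: n / total for there, n in row.items()}
    return matrix

history = [["A", "B", "C"], ["A", "B", "B"], ["A", "C"]]  # hypothetical map-matched states
print(estimate_transitions(history))
# {'A': {'B': 0.6666666666666666, 'C': 0.3333333333333333}, 'B': {'C': 0.5, 'B': 0.5}}
```

Fitting one such matrix per time-of-day slot or per object group would be one way to reflect the last bullet.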
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]
[Figure: transition graph over the states s1, s2, s3 with the probabilities of M]
Note: we have an exponential number of possible paths the car might take
ST-Window Queries
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with edges weighted by the transition probabilities of M]
› t = 0: $(1, 0, 0)$ (the car is in s1)
98
99
ST-Window Queries
› t = 0: $(1, 0, 0)$
› t = 1: $(1, 0, 0)\,M = (0, 0, 1)$
100
ST-Window Queries
› t = 0: $(1, 0, 0)$
› t = 1: $(1, 0, 0)\,M = (0, 0, 1)$
› t = 2: $(0, 0, 1)\,M = (0, 0.8, 0.2)$
101
ST-Window Queries
› t = 0: $(1, 0, 0)$
› t = 1: $(1, 0, 0)\,M = (0, 0, 1)$
› t = 2: $(0, 0, 1)\,M = (0, 0.8, 0.2)$
› t = 3: $(0, 0.8, 0.2)\,M = (0.48, 0.16, 0.36)$
102
ST-Window Queries
› t = 0: $(1, 0, 0)$
› t = 1: $(0, 0, 1)$
› t = 2: $(0, 0.8, 0.2)$; the mass 0.8 in s2 already lies inside the window
› t = 3: propagating only the worlds that have not hit the window yet gives $(0, 0, 0.2)\,M = (0, 0.16, 0.04)$, so Result $= 0.8 + 0.16 = 0.96$
› A solution based on matrix multiplications introduces a new state for the winning (qualifying) trajectories and two matrices
103
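ST-Window Queries
States s1, s2, s3, plus a new state s4 collecting the trajectories that have already hit the window:
\[
M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad
M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\]
\[
(1,0,0,0)\,M^- = (0,0,1,0), \quad (0,0,1,0)\,M^+ = (0,0,0.2,0.8), \quad (0,0,0.2,0.8)\,M^+ = (0,0,0.04,0.96)
\]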
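The two-matrix construction can be written generically. M− is M plus an absorbing state. M+ additionally redirects every transition into a window state to that absorbing state, and it is used for the steps that land inside T. The sketch assumes states are indexed 0..n−1 (s1 = 0) and the window is a set of state indices. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]

def step(v: List[float], m: Matrix) -> List[float]:
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def window_matrices(m: Matrix, window: Set[int]) -> Tuple[Matrix, Matrix]:
    """M- and M+ with one extra absorbing state (last index) for trajectories that hit the window."""
    n = len(m)
    absorbing = [0.0] * n + [1.0]
    minus = [row + [0.0] for row in m] + [list(absorbing)]
    plus = [[0.0 if j in window else p for j, p in enumerate(row)] + [sum(row[j] for j in window)]
            for row in m] + [list(absorbing)]
    return minus, plus

def window_query(m: Matrix, start: List[float], window: Set[int], t_begin: int, t_end: int) -> float:
    """P(the object is in a window state at least once during [t_begin, t_end])."""
    minus, plus = window_matrices(m, window)
    v = list(start) + [0.0]
    if t_begin == 0:  # worlds that start inside the window already qualify
        v[-1] = sum(v[j] for j in window)
        for j in window:
            v[j] = 0.0
    for t in range(1, t_end + 1):
        v = step(v, plus if t >= t_begin else minus)  # M+ for steps that land inside T
    return v[-1]

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
print(round(window_query(M, [1.0, 0.0, 0.0], window={0, 1}, t_begin=2, t_end=3), 2))  # 0.96
```

The cost is one sparse-friendly product per time step, instead of enumerating the exponentially many paths.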
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time; left: a single observation at t0; right: observations at t0 and t1]
105
Multiple Observations
› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window
\[
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\]
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
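Multiple Observations
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
State order $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$; the “−” classes hold the worlds without a window hit (¬∎), the “+” classes those with a hit (∎):
t = 0: $(1, 0, 0, 0, 0, 0)$
t = 1: $(0, 0, 1, 0, 0, 0)$
t = 2: $(0, 0, 0.2, 0, 0.8, 0)$
t = 3: $(0, 0, 0.04, 0.48, 0.16, 0.32)$
\[
M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad
M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
\]
where $M$ is the 3 × 3 transition matrix of the example.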
107
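Bayes’ theorem
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
With ∎ denoting “o has intersected the window” and $E$ denoting “o is observed in $s_3$”:
\[
P(\blacksquare \mid E) = \frac{P(E \mid \blacksquare)\,P(\blacksquare)}{P(E)} = \frac{P(\blacksquare \wedge E)}{P(E)} = \frac{P(E \wedge \blacksquare)}{P(E \wedge \blacksquare) + P(E \wedge \neg\blacksquare)}
\]
From the t = 3 vector $(0, 0, 0.04, 0.48, 0.16, 0.32)$ over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$:
\[
P(\blacksquare \mid E) = \frac{P(S_3^+)}{P(S_3^+) + P(S_3^-)} = \frac{0.32}{0.32 + 0.04} \approx 0.89
\]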
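The sketch below applies the same idea with 2|S| classes and then performs the Bayes step. After propagating to t = 3, conditioning on the second observation (o in s3) only requires renormalizing the $S_3^-$ and $S_3^+$ entries. It reuses the example's M. The index layout ($S_i^-$ at i, $S_i^+$ at n + i) and the function names are illustrative choices.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]

def two_class_matrices(m: Matrix, window: Set[int]) -> Tuple[Matrix, Matrix]:
    """M- and M+ over 2|S| states: index i is S_i^- (no hit yet), index n + i is S_i^+ (hit)."""
    n = len(m)
    minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = m[i][j]
            minus[i][j] = p                     # outside T: each block just follows M
            minus[n + i][n + j] = p
            target = n + j if j in window else j
            plus[i][target] += p                # inside T: entering the window moves a world to "+"
            plus[n + i][n + j] = p              # "+" worlds stay "+"
    return minus, plus

def step(v: List[float], m: Matrix) -> List[float]:
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
minus, plus = two_class_matrices(M, window={0, 1})   # window states s1, s2
v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]                   # t = 0: in s1, no hit yet
for t in range(1, 4):
    v = step(v, plus if 2 <= t <= 3 else minus)      # T = [2, 3]
print([round(p, 2) for p in v])  # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]

# Second observation: o is seen in s3 at t = 3, so condition on S_3^- or S_3^+ (Bayes).
s3, n = 2, len(M)
p_hit = v[n + s3] / (v[n + s3] + v[s3])
print(round(p_hit, 2))  # 0.89
```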
108
Summary
› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the “intelligence” and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: entry P_oid in the time–location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216, 2013.
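Below is a minimal Monte-Carlo sketch for the window query of the running example. It samples trajectories forward from the single observation at t = 0 and counts the fraction that visit s1 or s2 during T = [2, 3]. It does not implement the adapted transition matrices needed to condition on further observations. With enough samples, the estimate should approach the exact value 0.96 obtained with M− and M+. The function names are hypothetical.

```python
import random
from typing import List, Set

def sample_path(m: List[List[float]], start: int, steps: int, rng: random.Random) -> List[int]:
    """Draw one possible trajectory of the Markov chain, one state per tick."""
    path = [start]
    for _ in range(steps):
        row = m[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path

def mc_window_probability(m: List[List[float]], start: int, window: Set[int],
                          t_begin: int, t_end: int, n_samples: int = 10_000, seed: int = 0) -> float:
    """Fraction of sampled trajectories that visit a window state during [t_begin, t_end]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        path = sample_path(m, start, t_end, rng)
        if any(path[t] in window for t in range(t_begin, t_end + 1)):
            hits += 1
    return hits / n_samples

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
print(mc_window_probability(M, start=0, window={0, 1}, t_begin=2, t_end=3))
```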
112
Event Queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of sub-events. E.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728.
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43–49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112–1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435–443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728.
7
Geo-Spatial Data
• Huge flood of geo-spatial data
  • Modern technology
  • New user mentality
• Great research potential
  • New applications
  • Innovative research
  • Economic boost
• “$600 billion potential annual consumer surplus from using personal location data” [1]
[1] McKinsey Global Institute. Big data: The next frontier for innovation, competition, and productivity. June 2011.
8
Geo-Spatial Data
9
10
Spatio-Temporal Data
• (object, location, time) triples
• Queries
  • “Find friends that attended the same concert last Saturday”
• Best case: continuous function
GPS log taken from a thirty-minute drive through Seattle. Dataset provided by P. Newson and J. Krumm, Hidden Markov Map Matching Through Noise and Sparseness, ACM GIS 2009.
11
Sources of Uncertainty
• Missing observations
  • Missing GPS signal
  • RFID sensors available in discrete locations only
  • Wireless sensor nodes sending infrequently to preserve energy
  • Infrequent check-ins of users of geo-social networks
Dataset provided by E. Cho, S. A. Myers, and J. Leskovec, Friendship and Mobility: User Movement in Location-Based Social Networks, SIGKDD 2011.
12
Sources of Uncertainty
• Uncertain observations
  • Imprecise sensor measurements (e.g., radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g., contradictory sensor data)
  • Human errors (e.g., in crowd-sourcing applications)
From a database perspective, the position of a mobile object is uncertain.
Dataset provided by E. Cho, S. A. Myers, and J. Leskovec, Friendship and Mobility: User Movement in Location-Based Social Networks, SIGKDD 2011.
13
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
14
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results, enhancing the underlying decision-making process
15
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results, enhancing the underlying decision-making process
Improve the quality of modern location based applications and of research results in the field
16
Uncertain Spatial Data Models
• Discrete models
• Continuous models
Possible World Semantics
17
Possible World Semantics
• A collection of uncertain spatial objects defines an uncertain spatial database
• Combinations of object instances define possible database instances, called possible worlds
• Assumption: the probability of a possible world can be computed efficiently
Possible World Semantics
18
Answering Queries using PWS
Let
• $DB$ be an uncertain database
• $\mathcal{W}$ be the set of possible worlds of $DB$
• $q$ be a query predicate
• $I(q, w)$ be an indicator function returning one if predicate $q$ holds in world $w$ and zero otherwise
The probability that a query predicate holds on an uncertain database is defined as
\[
P(q, DB) = \sum_{w \in \mathcal{W}} P(w) \cdot I(q, w)
\]
Possible World Semantics
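For a handful of discrete objects, the definition can be evaluated literally by enumerating all possible worlds. This also makes the exponential blow-up visible, because the number of worlds is the product of the instance counts. The sketch assumes mutually independent objects and uses a hypothetical two-object database. The names `possible_worlds`, `query_probability` and `count_in_range` are illustrative.

```python
from itertools import product
from math import dist, prod
from typing import Callable, Dict, Iterator, List, Tuple

Point = Tuple[float, float]
UncertainDB = Dict[str, List[Tuple[Point, float]]]  # object -> [(position, probability)]
World = Dict[str, Point]

def possible_worlds(db: UncertainDB) -> Iterator[Tuple[World, float]]:
    """Enumerate every combination of object instances with its probability (independent objects)."""
    names = list(db)
    for choice in product(*(db[name] for name in names)):
        world = {name: pos for name, (pos, _) in zip(names, choice)}
        yield world, prod(p for _, p in choice)

def query_probability(db: UncertainDB, predicate: Callable[[World], bool]) -> float:
    """P(q, DB) = sum over worlds w of P(w) * I(q, w)."""
    return sum(p for world, p in possible_worlds(db) if predicate(world))

def count_in_range(world: World, q: Point = (0.0, 0.0), r: float = 2.0) -> int:
    return sum(1 for pos in world.values() if dist(pos, q) <= r)

DB = {  # hypothetical discrete uncertain objects
    "A": [((1.0, 0.0), 0.8), ((5.0, 0.0), 0.2)],
    "B": [((0.0, 1.0), 0.4), ((0.0, 6.0), 0.6)],
}
print(round(query_probability(DB, lambda w: count_in_range(w) == 2), 2))  # 0.32
```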
Possible Worlds Example II
[Figure: map of uncertain objects labeled A to Z]
20
21
22
23
24
25
26
27
Too many possible worlds
28
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects.
Are there efficient solutions to query uncertain spatial data?
In general: No.
“The problem of answering queries on a probabilistic database D is #P-complete in the size of D” [DalviSuciu04]
This can be reduced to uncertain spatial databases.
But: specific queries may have polynomial-time solutions.
[DalviSuciu04] N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004.
29
Querying Uncertain Data: Running Example
[Figure: circular query region centered at q]
Return the number of objects located in the depicted circular region centered at query point q.
This number is a random variable.
Total number of possible worlds
30
Data Cleaning/Aggregation
[Figure: query region around q with the objects C, D, H, I]
Ignore uncertainty (data cleaning)
• Replace uncertain objects by a deterministic “best guess”
  • Expected positions
  • Most-likely positions
  • …
• Query results are not reliable
• Query results may be biased
31
Equivalent Worlds: An Intuition
[Figure: circular query region around q]
Observation 1
For any possible world and any possible world derived from it by changing the position of an object, the following equivalence holds
32
Equivalent Worlds: An Intuition
[Figure: circular query region around q]
Observation 1 allows us to discard objects outside of the query region
Remaining number of equivalence classes of possible worlds
33
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate “inside”
[Figure: query region around q with the remaining objects C, D, H, I]
Querying Uncertain Spatial Data
34
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate “inside”
Remaining number of equivalence classes of possible worlds
[Figure: query region around q with the remaining objects C, D, H, I]
35
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region; information about the concrete result objects can be discarded
[Figure: query region around q with the remaining objects C, D, H, I]
36
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
[Figure: query region around q with the objects A, B, C and the probabilities 0.8, 0.2, 0.4]
37
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects – substitute A, B, C by x
[Figure: the same objects, each relabeled x; probabilities 0.8, 0.2, 0.4]
38
39
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects – substitute A, B, C by x
[Figure: the same objects, each relabeled x; probabilities 0.8, 0.2, 0.4]
Each monomial $c\,x^k$ implies that the probability of having exactly $k$ results equals $c$
40
Generating Functions: Formally
For each object $o_i$, let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:
\[
F(x) = \prod_{i} \bigl( (1 - p_i) + p_i\,x \bigr)
\]
In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.
[2] J. Li, B. Saha, and A. Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502–513, 2009.
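The generating function can be expanded by repeatedly multiplying with the linear factors $(1 - p_i) + p_i x$. This takes O(n²) time for n objects instead of enumerating 2^n worlds. The sketch uses hypothetical inclusion probabilities 0.8, 0.2 and 0.4. These are the figure's numbers, read here as per-object probabilities of lying inside q, which is an assumption.

```python
from typing import List

def count_distribution(probs: List[float]) -> List[float]:
    """Coefficients of prod_i ((1 - p_i) + p_i x); entry k = P(exactly k objects inside)."""
    coeffs = [1.0]  # the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1 - p)   # object outside: exponent unchanged
            nxt[k + 1] += c * p     # object inside: exponent + 1
        coeffs = nxt
    return coeffs

print([round(c, 3) for c in count_distribution([0.8, 0.2, 0.4])])
# [0.096, 0.472, 0.368, 0.064]
```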
41
Count Queries on Uncertain Data
Example
[Figure: query region around q with the objects A, B, C and the probabilities 0.8, 0.2, 0.4]
42
Count Queries on Uncertain Data
[Figure: query region around q with the objects A, B, C and the probabilities 0.8, 0.2, 0.4]
Example =
43
Count Queries on Uncertain Data
[Figure: query region around q with the objects A, B, C and the probabilities 0.8, 0.2, 0.4]
Example = =
44
45
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and unbiased
Evaluate the query predicate on each world
The distribution of sampled results is an unbiased approximation of the true distribution of results
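Below is a minimal sketch of the sampling approach for the count query. It again assumes independent objects with the hypothetical inclusion probabilities 0.8, 0.2 and 0.4. Each iteration materializes one possible world by drawing every object's in/out status. The function name is illustrative.

```python
import random
from collections import Counter
from typing import Dict, List

def sample_count_distribution(probs: List[float], n_worlds: int, seed: int = 0) -> Dict[int, float]:
    """Estimate P(count = k) from n_worlds independently sampled possible worlds."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_worlds):
        # one possible world: object i lies inside the query region with probability probs[i]
        k = sum(1 for p in probs if rng.random() < p)
        counts[k] += 1
    return {k: counts[k] / n_worlds for k in range(len(probs) + 1)}

print(sample_count_distribution([0.8, 0.2, 0.4], n_worlds=100))
```

With only 100 worlds the estimates scatter noticeably around the exact values, which is what the next two slides address.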
48
Sampling Example
[Figure: query region around q with the objects A, B, C and the probabilities 0.8, 0.2, 0.4]
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimations
49
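Sampling: Confidences
[Figure: query region around q with the objects A, B, C and the probabilities 0.8, 0.2, 0.4]
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of estimators, e.g., the Wald test:
\[
\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}
\]
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution, $\hat{p}$ the estimated probability and $n$ the number of sampled worlds.
At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638]
True probability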
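Once $z_{1-\alpha/2}$ is available, the Wald interval is a single line. The inputs below are assumptions, not values given on the slide: an estimate $\hat{p} = 0.54$ from $n = 100$ sampled worlds and $\alpha = 0.05$. These inputs happen to reproduce the interval [0.442, 0.638].

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple

def wald_interval(p_hat: float, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Normal-approximation (Wald) interval for a probability estimated from n samples."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1-alpha/2}, about 1.96 for alpha = 0.05
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = wald_interval(0.54, 100)
print(round(low, 3), round(high, 3))  # 0.442 0.638
```

The Wald interval is known to be unreliable for estimates close to 0 or 1 or for very small n. In such cases, a Wilson or exact binomial interval is the usual alternative.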
50
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: “best guess” answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip-side
53
Model of a trajectory
[Figure: 2D route in the (X, Y) plane and the corresponding 3D trajectory over time, up to the present time]
Approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
B. Kuijpers, H. J. Miller, and W. Othman. Kinetic space-time prisms. GIS 2011.
54
Mobility Models
• Sequence of (location, time) updates (e.g., GPS-based)
• Electronic maps + traffic distribution patterns + set of (to be visited) points → full trajectory
• (location, time, velocity vector) updates
55
Modeling Uncertainty: Location Uncertainty Models
Various sources’ imprecision (GPS, sensors)
Quest for data reduction
R. Lange, H. Weinschrott, L. Geiger, A. Blessing, F. Dürr, K. Rothermel, and H. Schütze. On a Generic Uncertainty Model for Position Information. QuaCon 2009.
56
Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (a.k.a. space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in between samples
R. Cheng, S. Prabhakar, and D. V. Kalashnikov. Querying Imprecise Data in Moving Object Environments. ICDE 2003.
G. Trajcevski, A. Choudhary, O. Wolfson, and G. Li. Uncertain Range Queries for Necklaces. MDM 2010.
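The bead (space-time prism) between two exact samples is the intersection of a forward cone and a backward cone. Below is a minimal membership test, assuming Euclidean free-space motion and a known speed bound. The function name `in_bead` is hypothetical.

```python
from math import dist
from typing import Tuple

Point = Tuple[float, float]

def in_bead(p1: Point, t1: float, p2: Point, t2: float, v_max: float, x: Point, t: float) -> bool:
    """Can the object be at x at time t, given exact samples (p1, t1), (p2, t2) and speed bound v_max?"""
    if not t1 <= t <= t2:
        return False
    reachable_from_p1 = dist(p1, x) <= v_max * (t - t1)   # forward cone from the first sample
    can_reach_p2 = dist(x, p2) <= v_max * (t2 - t)        # backward cone from the second sample
    return reachable_from_p1 and can_reach_p2

print(in_bead((0, 0), 0, (10, 0), 10, 2.0, (5, 8), 5))   # True
print(in_bead((0, 0), 0, (10, 0), 10, 2.0, (5, 11), 5))  # False
```

A necklace is then just a chain of such beads, one per pair of consecutive samples.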
57
Modeling Uncertain Trajectories: Model 2
Constraints
• Motion along road network
• Uncertainty restricted to road segments
[Figure: samples p1 at t1 and p2 at t2 on a road network, query location q, time axis t]
B. Kuijpers and W. Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9), 2009.
58
Modeling Uncertain Trajectories: Model 3: Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, and S. Chamberlain. Managing Moving Objects Databases with Uncertainty. ACM TODS 2004.
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider
• Syntax
  – The impact of uncertainty on the (structure of the) answer
  – The impact of constraints, e.g., possible routes to be taken
• Processing algorithms
  – Impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside range (disk) around “Q” between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?
K. Zheng, G. Trajcevski, X. Zhou, and P. Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011.
61
Continuous NN for trajectories
Given a collection of moving objects’ trajectories Tr1, Tr2, …, TrN:
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]
A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm−1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in the (X, Y) plane over T = tb, t1, te]
Example: assuming that Trq is the querying trajectory between [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Y. Tao, D. Papadias, and Q. Shen. Continuous Nearest Neighbor Search. VLDB 2002.
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]
[Figure: uncertain trajectories Trq, Tr1, Tr2, Tr3 over T = tb, tb1, t1, t11, te]
Observations: adding uncertainty to the previous example implies
– Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN-probability
– Specifically, at t11 all three trajectories have a non-zero NN-probability
⇒ What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree. Root [TrQ, tb, te]; first level (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m−1), te, Dm); deeper levels such as (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …, (Tri1k1, t1(k−1), t11, D1k1), (Tri121, t11, t121, D121)]
Root: parameters of the query
– Querying trajectory Trq
– Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: crisp query point Q and the uncertainty disks of Tr1 to Tr5, with the distances Rmin, Rmax and Rd]
Trq = 2D point Q; the possible locations of the other objects are bounded by circles
Observation
– Any object whose minimum distance from Q is > Rmax has a “0” probability of being a NN to Q
– Example: Tr4 cannot have a non-zero probability of being Q’s NN
The probability that the location of Tri is within distance Rd from Q is given by
\[
P_i(R_d) = \iint_{\lVert (x, y) - Q \rVert \le R_d} \mathrm{pdf}_i(x, y)\, dx\, dy
\]
The probability that a given object Trj is the NN of Q is given by
\[
P_j^{NN} = \int_{R_{min}}^{R_{max}} \frac{d P_j(R_d)}{d R_d} \prod_{i \ne j} \bigl(1 - P_i(R_d)\bigr)\, dR_d
\]
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible values of Rd (NOTE: the upper bound is Rmax…)
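The integrals above rarely have closed forms. A Monte-Carlo estimate is a simple way to approximate the NN probabilities. Note that this is a swapped-in technique, not the slide's analytic formula. The sketch assumes uniform pdfs over disks and independent objects, and the scene is hypothetical. As the observation predicts, an object whose minimum distance exceeds Rmax never wins.

```python
import random
from math import cos, dist, pi, sin, sqrt
from typing import List, Tuple

Point = Tuple[float, float]
Disk = Tuple[Point, float]  # (center, radius) of a uniform uncertainty region

def sample_in_disk(disk: Disk, rng: random.Random) -> Point:
    """Uniform sample in a disk (sqrt keeps the density uniform over the area)."""
    (cx, cy), radius = disk
    r, a = radius * sqrt(rng.random()), 2 * pi * rng.random()
    return cx + r * cos(a), cy + r * sin(a)

def nn_probabilities(q: Point, disks: List[Disk], n_samples: int = 20_000, seed: int = 0) -> List[float]:
    """Monte-Carlo estimate of P(object i is the nearest neighbor of the crisp point q)."""
    rng = random.Random(seed)
    wins = [0] * len(disks)
    for _ in range(n_samples):
        locations = [sample_in_disk(d, rng) for d in disks]
        nearest = min(range(len(disks)), key=lambda i: dist(q, locations[i]))
        wins[nearest] += 1
    return [w / n_samples for w in wins]

# Hypothetical scene: two symmetric candidates and one object beyond R_max = 4.
print(nn_probabilities((0.0, 0.0), [((3.0, 0.0), 1.0), ((-3.0, 0.0), 1.0), ((10.0, 0.0), 1.0)]))
```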
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query region TrQ and Tr1 to Tr5; points Z1 and Z2 with dist(Z1, Z2) < Rmax; radii Rmin and Rmax]
We can no longer “prune” Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within_distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs pdf(Trq), pdf(Tr1), pdf(Tr2) over disks of radius r in the (x, y) plane]
Example: assuming a uniform pdf of possible locations; two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
\[
P\bigl(\lVert V_i - V_q \rVert \le R_d\bigr)
\]
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq), which, due to independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and −Vq
67
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq) in the (x, y) plane]
Let Viq = Vi + (−Vq)
Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
Let $V_{x,iq} = v_{x,i} - v_{x,q}$ and $V_{y,iq} = v_{y,i} - v_{y,q}$ denote the corresponding components of the velocity of the object whose expected location is along TRiq.
Then the distance of the expected location along TRiq from the origin as a function of t is
\[
d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = V_{x,iq}^2 + V_{y,iq}^2
\]
a hyperbola, since A > 0 (B and C depend on the relative position at the reference time).
Now we have a collection of such distance functions (one for each i).
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions $S_{df} = \{d_{1q}(t), d_{2q}(t), \dots, d_{Nq}(t)\}$ (NOTE: excluding $d_{qq}(t) \equiv 0$).
Observation: two distance functions $d_{iq}(t)$ and $d_{jq}(t)$ can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set $d_{iq}(t) = d_{jq}(t)$). Call such pair-wise intersections critical time points.
Example: lower envelope of two distance trajectories
[Figure: distance (vertical axis) over time (horizontal axis) for TR1 and TR2, their lower envelope LE12 and the critical time points t11, t12 after tb]
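The critical time points can be made concrete with a brute-force sketch. It computes all pairwise crossings, which are roots of the quadratic $d_{iq}^2 - d_{jq}^2$, and picks the nearest function on each sub-interval. This takes O(N² log N) time. It is not the O(N log N) divide-and-conquer construction described on the following slide. The relative start positions and velocities used as inputs are hypothetical.

```python
from math import sqrt
from typing import List, Tuple

Coeffs = Tuple[float, float, float]  # squared distance d(t)^2 = A t^2 + B t + C

def dist_coeffs(x0: float, y0: float, vx: float, vy: float) -> Coeffs:
    """Squared distance to the query for the relative position (x0, y0) + t * (vx, vy)."""
    return vx * vx + vy * vy, 2 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0

def crossings(f: Coeffs, g: Coeffs, tb: float, te: float) -> List[float]:
    """Times in (tb, te) where d_f(t) = d_g(t); at most two, since d_f^2 - d_g^2 is quadratic."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if a == 0:
        roots = [] if b == 0 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        roots = [(-b - sqrt(disc)) / (2 * a), (-b + sqrt(disc)) / (2 * a)]
    return [t for t in roots if tb < t < te]

def lower_envelope(funcs: List[Coeffs], tb: float, te: float) -> List[Tuple[int, float, float]]:
    """(index, start, end) pieces of the nearest trajectory over [tb, te]."""
    cuts = {tb, te}
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            cuts.update(crossings(funcs[i], funcs[j], tb, te))
    times = sorted(cuts)
    pieces: List[Tuple[int, float, float]] = []
    for s, e in zip(times, times[1:]):
        mid = (s + e) / 2
        best = min(range(len(funcs)), key=lambda k: funcs[k][0] * mid * mid + funcs[k][1] * mid + funcs[k][2])
        if pieces and pieces[-1][0] == best:
            pieces[-1] = (best, pieces[-1][1], e)   # same NN continues: extend the piece
        else:
            pieces.append((best, s, e))
    return pieces

# Object 0 stays at distance 1; object 1 approaches from distance 3 at unit speed.
print(lower_envelope([dist_coeffs(1, 0, 0, 0), dist_coeffs(3, 0, -1, 0)], 0.0, 4.0))
# [(0, 0.0, 2.0), (1, 2.0, 4.0)]
```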
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: LE12 of TR1 and TR2 (critical points t11, t12) and LE34 of TR3 and TR4 (critical point t31) are merged into LE1234 over [tb, te], creating the new critical points t1new, t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te] with the lower envelope, the second lower envelope (nodes in Level 2 of the IPAC-NN tree), the 4r band, and the time points t1, t2, t3]
72
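Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.,
\[
PP_i(a) = \{\, P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mincost}(P) \le t_{i+1} - t_i \,\}
\]
Example: if the pdf of selecting a possible path is uniform, each path in PPi(a) is selected with probability $1/|PP_i(a)|$
Given a path $P_j \in PP_i(a)$, the possible locations of a given moving object a with respect to Pj at $t \in [t_i, t_{i+1}]$ are the set of all the positions $p \in P_j$ from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t), i.e.,
\[
PL_{P_j}(t) = \{\, p \in P_j \mid \mathrm{mincost}(p_i, p) \le t - t_i \ \wedge\ \mathrm{mincost}(p, p_{i+1}) \le t_{i+1} - t \,\}
\]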
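PP_i can be computed by a depth-first search that prunes any partial path already slower than the time budget. The sketch makes three assumptions: both samples lie on vertices, only simple (cycle-free) paths are considered, and each edge carries its minimum travel time. The graph is hypothetical and only loosely mirrors the v1, v3, v4, v5 example on the following slides.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # vertex -> [(neighbor, minimum travel time)]

def possible_paths(graph: Graph, src: str, dst: str, budget: float) -> List[List[str]]:
    """Simple paths from src to dst whose minimum travel time is at most budget (= t_{i+1} - t_i)."""
    found: List[List[str]] = []

    def extend(path: List[str], cost: float) -> None:
        v = path[-1]
        if v == dst:
            found.append(path)
            return
        for w, c in graph.get(v, []):
            if w not in path and cost + c <= budget:   # prune paths that are already too slow
                extend(path + [w], cost + c)

    extend([src], 0.0)
    return found

roads = {"v1": [("v5", 4.0), ("v3", 2.0)], "v3": [("v4", 2.0)], "v4": [("v5", 1.0)], "v5": []}
paths = possible_paths(roads, "v1", "v5", budget=5.0)
print(paths)                          # [['v1', 'v5'], ['v1', 'v3', 'v4', 'v5']]
print([1 / len(paths)] * len(paths))  # uniform path pdf: [0.5, 0.5]
```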
73
Combine the two probabilities…
[Figure: possible paths of object a and its possible locations when t = 4; point m and sample p1]
Probability of falling inside segment [m, p1] at t = 4: 0.5 · 0.5 = 0.25
Probability of an object a falling inside some segment S at a given time instant t:
\[
\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\bigl(PL_p(t) \in S\bigr)
\]
e.g., with a uniform pdf
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
– Edge hash table
  › Small, in-memory
– Movement R-tree
  › 1D for each edge
– Trajectories list
  › Store samples ordered by time stamp
  › Assign pointer to set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On-disk, retrieved on demand
76
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree “bubbling” from q along road-network segments through the vertices A to E, annotated with earliest/latest arrival times; one possible path is definitely not part of the answer to the range query]
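The network-expansion step is essentially a Dijkstra search bounded by r. The sketch assumes that q sits on a vertex, that edge lengths are network distances, and that an edge's distance to q is the distance to its nearer endpoint. The road network and the function name are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # undirected road network: vertex -> [(neighbor, length)]

def expansion_edges(graph: Graph, q: str, r: float) -> Set[Tuple[str, str]]:
    """Edges whose network distance from q (i.e., to their nearer endpoint) is smaller than r."""
    best = {q: 0.0}
    heap = [(0.0, q)]
    edges: Set[Tuple[str, str]] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u] or d >= r:        # stale heap entry, or u is too far to expand
            continue
        for v, length in graph.get(u, []):
            edges.add((min(u, v), max(u, v)))
            nd = d + length
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return edges

roads = {  # hypothetical network; q is assumed to sit on a vertex
    "q": [("A", 1.0), ("B", 3.0)],
    "A": [("q", 1.0), ("C", 2.0)],
    "B": [("q", 3.0), ("D", 1.0)],
    "C": [("A", 2.0)],
    "D": [("B", 1.0)],
}
print(sorted(expansion_edges(roads, "q", r=2.5)))  # [('A', 'C'), ('A', 'q'), ('B', 'q')]
```

The resulting edge set is what the filtering step would use to look up the per-edge movement R-trees.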
77
Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty – the flip-side
› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
J. Hershberger and J. Snoeyink. Speeding up Douglas-Peucker Line-simplification Algorithm. SSHD 1997.
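The classic Douglas-Peucker simplification illustrates the "acceptable error" idea: a point is dropped only if it lies within ε of the simplified polyline. The code is the textbook recursive version, not the faster variant of Hershberger and Snoeyink cited above. It works on purely spatial points without timestamps. The function names are illustrative.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]

def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points: List[Point], epsilon: float) -> List[Point]:
    """Keep the farthest point if it deviates more than epsilon; recurse on both halves."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right   # the split point appears only once

print(douglas_peucker([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)], 0.1))  # [(0, 0), (2, 0), (2, 2)]
```

For trajectories, the same scheme is often applied to (x, y, t) samples, with a synchronized-time distance replacing the purely spatial one.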
79
Uncertainty – the other flip-side
› What are the important properties that should not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– Is it not just the “Z” axis… (one cannot travel back in time)
– After how long should the data at the sink be considered “un-fresh”?
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway example…
[Figure: position (pos) over time (t) of a car on a highway, with speed bounds vmin and vmax and a query region Q; at the two query time instants the car is in Q with probability 50% and 30%, respectively]
What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution…
› Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
  – Violation of the speed constraints!
› Consideration of the dependency: = 0.5
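The gap between 0.65 and 0.5 can be checked on an explicit set of possible worlds. A minimal sketch, assuming (our reading of the figure) that every world in which the car is in Q at the second instant is also a world in which it was in Q at the first instant; given the two marginals 0.5 and 0.3, this nesting is the only way the union can equal 0.5.

```python
from fractions import Fraction

p_a = Fraction(1, 2)   # P(car in Q at the first time instant), from the figure
p_b = Fraction(3, 10)  # P(car in Q at the second time instant), from the figure

# Independence assumption: P(A or B) = 1 - (1 - P(A)) * (1 - P(B))
p_independent = 1 - (1 - p_a) * (1 - p_b)

# Dependent possible worlds (our assumption, consistent with the 0.5 result):
# every world that is in Q at the second instant was also in Q at the first.
worlds = [  # (probability, in Q at first instant, in Q at second instant)
    (Fraction(3, 10), True, True),
    (Fraction(2, 10), True, False),
    (Fraction(5, 10), False, False),
]
p_dependent = sum(p for p, in_a, in_b in worlds if in_a or in_b)
```

`p_independent` evaluates to 13/20 = 0.65 and `p_dependent` to 1/2 = 0.5, although both models have the same per-instant marginals.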
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
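A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the board, these probabilities are different
› This model is usually learned or given by experts
[Figure: a board with nine holes; from an inner hole the ball moves left or right with probability 0.4 each and stays with probability 0.2; from a border hole it stays with probability 0.4 and moves inward with probability 0.6]
› Initial position: hole 3
› After the first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After the second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After the 40th hit: ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)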
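How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$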
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
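The hit-by-hit propagation above can be reproduced with a few lines of plain Python. This is a sketch, assuming the row-vector convention used on the slides (new distribution = old distribution · M) and the nine-hole board encoded in M; the helper names are ours.

```python
def step(dist, matrix):
    """One hit: new_dist[j] = sum_i dist[i] * matrix[i][j] (row vector times M)."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def board_matrix(n=9):
    """Board example: inner holes stay with 0.2 and move left/right with 0.4 each;
    border holes stay with 0.4 and move inward with 0.6."""
    m = [[0.0] * n for _ in range(n)]
    m[0][0], m[0][1] = 0.4, 0.6
    m[n - 1][n - 1], m[n - 1][n - 2] = 0.4, 0.6
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m

def distribution_after(hits, start, matrix):
    """Distribution over holes after `hits` hits, starting in hole `start` (0-based)."""
    dist = [0.0] * len(matrix)
    dist[start] = 1.0
    for _ in range(hits):
        dist = step(dist, matrix)
    return dist

M = board_matrix()
after_1 = distribution_after(1, start=2, matrix=M)   # ball starts in hole 3 (index 2)
after_2 = distribution_after(2, start=2, matrix=M)
```

Rounded to two decimals, `after_1` and `after_2` match the first- and second-hit vectors. Iterating much longer converges to the stationary distribution (0.08, 0.12, …, 0.12, 0.08), which satisfies π · M = π.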
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
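Learning transition probabilities from historical data can be sketched by counting observed state-to-state transitions per tick and normalizing each row (a maximum-likelihood estimate) in a sparse dictionary. This is a minimal sketch: the state names and the toy history are made up for illustration, and smoothing, time-dependent matrices, and per-group matrices are not handled.

```python
from collections import defaultdict

def estimate_transitions(trajectories):
    """Maximum-likelihood estimate of a sparse transition matrix from
    historical state sequences (one state per tick)."""
    counts = defaultdict(lambda: defaultdict(int))
    for states in trajectories:
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1
    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        matrix[current] = {nxt: c / total for nxt, c in row.items()}
    return matrix  # only observed transitions are stored (sparse)

# Toy history (hypothetical): two objects observed over four ticks each.
history = [["s1", "s3", "s2", "s1"], ["s2", "s3", "s3", "s2"]]
M_hat = estimate_transitions(history)
```

Here `M_hat["s3"]` becomes `{"s2": 2/3, "s3": 1/3}`, because s3 was followed twice by s2 and once by s3.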
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 with the transitions given by M, unrolled over t = 0, 1, 2, 3]
› Note: we have an exponential number of possible paths the car might take
› The car starts in s1 at t = 0; propagating the distribution gives
  – t = 0: (1, 0, 0)
  – t = 1: (0, 0, 1)
  – t = 2: (0, 0.8, 0.2)
  – t = 3: (0.48, 0.16, 0.36)
› The mass in s1 or s2 at t = 2 (0.8) already belongs to worlds that hit the window; propagating only the remaining mass (0, 0, 0.2) yields (0, 0.16, 0.04) at t = 3, of which 0.16 hits the window at t = 3. Result = 1 - 0.04 = 0.96
› A solution based on matrix multiplications introduces a new state for the "winner" trajectories (those that have already hit the window) and two matrices
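ST - Window Queries
› M^- is applied for transitions into time instants outside T, M^+ for transitions into time instants inside T; M^+ redirects every transition into s1 or s2 to the new absorbing state s4
$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$
[Figure: states s1, s2, s3 and the absorbing state s4]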
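The M^- / M^+ construction can be sketched in plain Python. This is a sketch under our own conventions, not the authors' implementation: the absorbing "hit" state is appended as the last index, the object is assumed to be certainly in state `start` at tick 0, and the transition into tick t uses M^+ exactly when t lies in [t_begin, t_end].

```python
def mat_vec(dist, matrix):
    """Row vector times matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def window_matrices(M, window_states):
    """M_minus and M_plus over the states of M plus one absorbing 'hit'
    state, appended as the last index."""
    n = len(M)
    hit = n
    M_minus = [list(row) + [0.0] for row in M] + [[0.0] * n + [1.0]]
    M_plus = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            # transitions into a window state are redirected to the hit state
            target = hit if j in window_states else j
            M_plus[i][target] += M[i][j]
    M_plus[hit][hit] = 1.0
    return M_minus, M_plus

def window_probability(M, start, window_states, t_begin, t_end):
    """P(object is in a window state at some tick in [t_begin, t_end])."""
    M_minus, M_plus = window_matrices(M, window_states)
    dist = [0.0] * (len(M) + 1)
    if start in window_states and t_begin == 0:
        dist[len(M)] = 1.0
    else:
        dist[start] = 1.0
    for t in range(1, t_end + 1):
        # the transition *into* tick t uses M_plus iff t lies inside the window
        dist = mat_vec(dist, M_plus if t >= t_begin else M_minus)
    return dist[len(M)]
```

With the example's M, start state s1 (index 0), window states {s1, s2} (indices 0 and 1) and T = [2, 3], `window_probability` returns 0.96 up to floating-point rounding, and `window_matrices` reproduces the M^+ shown above.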
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 (left) and two observations at t0 and t1 (right)]
Multiple Observations
› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class S_i^- corresponding to worlds where o is located in state s_i and o has not intersected the window
  – One class S_i^+ corresponding to worlds where o is located in state s_i and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
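Multiple Observations
States in the order (S1^-, S2^-, S3^-, S1^+, S2^+, S3^+)
[Figure: the "not hit" classes S_i^- and the "hit" classes S_i^+ over t = 0, 1, 2, 3]
$$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$
$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0, 0, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^{+}} (0, 0, 0.04, 0.48, 0.16, 0.32)$$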
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' Theorem, with W = "the trajectory intersects the query window" and O = "the object is in s3 at t = 3":
$$P(W \mid O) = \frac{P(O \mid W)\,P(W)}{P(O)} = \frac{P(W \wedge O)}{P(W \wedge O) + P(\neg W \wedge O)} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
Here P(W ∧ O) = P(S3^+) = 0.32 and P(¬W ∧ O) = P(S3^-) = 0.04 are read off the state vector at t = 3.
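The 2|S| equivalence classes and the final Bayes step can be sketched as follows. This is a sketch under our own conventions: index i stands for S_i^- and index n + i for S_i^+, the second observation (object in s3 at t = 3) is treated as exact, and the helper names are hypothetical.

```python
def vec_mat(dist, matrix):
    """Row vector times matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def split_matrices(M, window_states):
    """Matrices over 2|S| classes: index i is S_i^- (window not yet hit),
    index n + i is S_i^+ (window already hit)."""
    n = len(M)
    M_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    M_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            # outside the query interval: the hit flag is kept as it is
            M_minus[i][j] += p
            M_minus[n + i][n + j] += p
            # inside the query interval: entering a window state sets the flag
            M_plus[i][n + j if j in window_states else j] += p
            M_plus[n + i][n + j] += p
    return M_minus, M_plus

M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
n = len(M)
M_minus, M_plus = split_matrices(M, window_states={0, 1})

dist = [1.0] + [0.0] * (2 * n - 1)     # first observation: object in s1 at t = 0
for t in range(1, 4):                  # query interval T = [2, 3]
    dist = vec_mat(dist, M_plus if t >= 2 else M_minus)

s3 = 2                                 # second observation: object in s3 at t = 3
p_hit_and_obs = dist[n + s3]           # class S3^+
p_nohit_and_obs = dist[s3]             # class S3^-
p_hit_given_obs = p_hit_and_obs / (p_hit_and_obs + p_nohit_and_obs)
```

Rounded, `dist` is (0, 0, 0.04, 0.48, 0.16, 0.32) and `p_hit_given_obs` is about 0.89.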
Summary
› Pros
  – Allows answering time-parametrized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index entries of an object o in (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
kNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaption of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
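The sampling ratio described above can be sketched by forward-sampling trajectories from the transition matrix and counting how many satisfy a window query. Assumptions: a single observation at tick 0 and no conditioning on later observations (exactly the difficulty that the adapted transition matrices address); the sample count and seed are arbitrary, and the function names are ours.

```python
import random

def sample_trajectory(M, start, ticks, rng):
    """Draw one possible trajectory (list of state indices) of length ticks + 1."""
    states = [start]
    for _ in range(ticks):
        row = M[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row, k=1)[0])
    return states

def estimate_window_probability(M, start, window_states, t_begin, t_end,
                                n_samples=100_000, seed=42):
    """Monte-Carlo estimate: fraction of sampled trajectories that are in a
    window state at some tick in [t_begin, t_end]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        traj = sample_trajectory(M, start, t_end, rng)
        if any(traj[t] in window_states for t in range(t_begin, t_end + 1)):
            hits += 1
    return hits / n_samples  # satisfying samples / drawn samples
```

For the ST window query example (start in s1, window {s1, s2}, T = [2, 3]), the estimate should come out close to the exact value of 0.96.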
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of sub-events, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu: Event queries on correlated probabilistic streams. SIGMOD 2008: 715-728.
Geo-Spatial Data
Spatio-Temporal Data
• (object, location, time) triples
• Queries
  – "Find friends that attended the same concert last Saturday"
• Best case: continuous function
GPS log taken from a thirty-minute drive through Seattle. Dataset provided by P. Newson and J. Krumm: Hidden Markov Map Matching Through Noise and Sparseness, ACM GIS 2009
Sources of Uncertainty
• Missing observations
  – Missing GPS signal
  – RFID sensors available in discrete locations only
  – Wireless sensor nodes sending infrequently to preserve energy
  – Infrequent check-ins of users of geo-social networks
Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks, SIGKDD 2011
Sources of Uncertainty
• Uncertain observations
  – Imprecise sensor measurements (e.g., radio triangulation, Wi-Fi positioning)
  – Inconsistent information (e.g., contradictory sensor data)
  – Human errors (e.g., in crowd-sourcing applications)
From a database perspective, the position of a mobile object is uncertain.
Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks, SIGKDD 2011
Research Challenge
Include the uncertainty that is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results, enhancing the underlying decision-making process
Improve the quality of modern location-based applications and of research results in the field
Uncertain Spatial Data Models
• Discrete models
• Continuous models
Possible World Semantics
• A collection of uncertain spatial objects defines an uncertain spatial database
• Combinations of object instances define possible database instances, called possible worlds
• Assumption: the probability of a possible world can be computed efficiently
Answering Queries using PWS
Let
• $DB$ be an uncertain database,
• $W$ be the set of possible worlds of $DB$,
• $\varphi$ be a query predicate,
• $I_{\varphi}(w)$ be an indicator function returning one if predicate $\varphi$ holds in world $w$ and zero otherwise.
The probability that a query predicate $\varphi$ holds on an uncertain database $DB$ is defined as
$$P(\varphi(DB)) = \sum_{w \in W} P(w) \cdot I_{\varphi}(w)$$
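The definition above can be evaluated literally by enumerating every possible world. A minimal sketch, assuming mutually independent objects with discrete instances (so P(w) is the product of the chosen instance probabilities) and, for brevity, 1D positions; the query region [0, 1] and the instance lists are toy values. The number of worlds grows exponentially with the number of objects.

```python
from itertools import product

def query_probability(objects, predicate):
    """P(predicate) = sum over possible worlds w of P(w) * I(w).

    objects: one list per uncertain object of (position, probability) instances;
             objects are assumed independent.
    predicate: function taking a world (tuple of positions) and returning bool."""
    total = 0.0
    for world in product(*objects):          # enumerate all possible worlds
        p_world = 1.0
        for _, p in world:
            p_world *= p
        positions = tuple(pos for pos, _ in world)
        total += p_world * (1.0 if predicate(positions) else 0.0)
    return total

# Toy data: two objects with two possible positions each.
objects = [[(0.5, 0.8), (3.0, 0.2)], [(0.9, 0.2), (5.0, 0.8)]]
inside = lambda x: 0.0 <= x <= 1.0                       # query region [0, 1]
p_any = query_probability(objects, lambda pos: any(inside(x) for x in pos))
p_two = query_probability(objects, lambda pos: sum(inside(x) for x in pos) == 2)
```

Here `p_any` is 0.84 (only the world with both objects outside, probability 0.2 · 0.8, fails) and `p_two` is 0.16.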
Possible Worlds Example II
[Figure: possible worlds formed by uncertain objects labelled A to Z]
Too many possible worlds
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects.
Are there efficient solutions to query uncertain spatial data?
In general: no.
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
This can be reduced to uncertain spatial databases.
But: specific queries may have polynomial-time solutions.
[DalviSuciu04] N. N. Dalvi and D. Suciu: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004
Querying Uncertain Data: Running Example
Return the number of objects located in the depicted circular region centered at query point q.
This number is a random variable.
Total number of possible worlds
Data Cleaning: Aggregation
[Figure: query point q and uncertain objects C, D, H, I]
Ignore uncertainty (data cleaning): replace uncertain objects by a deterministic "best guess"
• Expected positions
• Most-likely positions
• …
Query results are not reliable.
Query results may be biased.
Equivalent Worlds: An Intuition
[Figure: query point q and uncertain objects C, D, H, I]
Observation 1: For any possible world w and any possible world w' derived from w by changing the position of an object from one location outside the query region to another location outside the query region, the query result on w equals the query result on w'.
Observation 1 allows us to discard objects outside of the query region.
Remaining number of equivalence classes of possible worlds
Observation 2: For each remaining object, we only need to consider the predicate "inside".
Remaining number of equivalence classes of possible worlds
Observation 3: We only require the number of objects in the query region; information about the concrete result objects can be discarded.
Generating Functions
[Figure: query region around q; objects A, B, C with probability labels 0.8, 0.2, 0.4; object H]
Main idea: use polynomial multiplication to enumerate possible results.
Observation 3: anonymize objects, i.e., substitute A, B, C by x.
Each monomial $c_k x^k$ implies that the probability of having exactly $k$ results equals $c_k$.
Generating Functions: Formally
For each object $o_i$, let $p_i$ be the probability that $o_i$ is located inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_{i} \big((1 - p_i) + p_i\,x\big)$$
In the expanded polynomial, the coefficient of monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.
[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
Count Queries on Uncertain Data
Example: [Figure: query region around q with objects A, B, C (probability labels 0.8, 0.2, 0.4); the generating function of these objects is expanded step by step]
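Expanding the generating function amounts to repeated polynomial multiplication, which takes quadratic time in the number of objects instead of enumerating exponentially many worlds. A minimal sketch, assuming independent objects and assuming, as the probability labels in the figure suggest, that A, B, and C fall inside the query region with probabilities 0.8, 0.2, and 0.4; the function name is ours.

```python
def count_distribution(probabilities):
    """Coefficients of F(x) = prod_i ((1 - p_i) + p_i * x).

    Entry k of the result is the probability that exactly k objects lie
    inside the query region (objects assumed independent)."""
    coeffs = [1.0]                        # the constant polynomial 1
    for p in probabilities:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)       # object outside: degree unchanged
            nxt[k + 1] += c * p           # object inside: degree + 1
        coeffs = nxt
    return coeffs

dist = count_distribution([0.8, 0.2, 0.4])
```

Under these assumptions, F(x) = (0.2 + 0.8x)(0.8 + 0.2x)(0.6 + 0.4x) = 0.096 + 0.472x + 0.368x² + 0.064x³, so `dist` is approximately [0.096, 0.472, 0.368, 0.064].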
The Paradigm of Equivalent Worlds
Given a query predicate φ and an uncertain database DB, we can answer φ on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time.
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.
III. The probability of a class can be computed in polynomial time.
Approximated Results: Sampling
Materialize a set S of possible worlds.
Samples are drawn independently and unbiased.
Evaluate the query predicate on each world.
The distribution of sampled results is an unbiased approximation of the true distribution of results.
Sampling: Example
[Figure: query region around q with objects A, B, C (probability labels 0.8, 0.2, 0.4)]
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimations.
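Sampling: Confidences
[Figure: query region around q with objects A, B, C (probability labels 0.8, 0.2, 0.4)]
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of estimators, e.g., the Wald test:
$$\hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{|S|}}$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-percentile of the standard normal distribution.
At the chosen significance level α, the true probability lies in the interval [0.442, 0.638].
True probability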
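The Wald interval can be computed with the Python standard library alone. A minimal sketch, assuming 54 of the 100 sampled worlds satisfy the predicate and α = 0.05; both values are our assumptions (54/100 is the midpoint of the quoted interval), and with them the sketch reproduces [0.442, 0.638].

```python
import math
from statistics import NormalDist

def wald_interval(successes, n, alpha=0.05):
    """Wald confidence interval p_hat +/- z_{1-alpha/2} * sqrt(p_hat (1 - p_hat) / n)
    for a probability estimated from n independently sampled possible worlds."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)       # (1 - alpha/2)-percentile of N(0, 1)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = wald_interval(54, 100)
```

Here z ≈ 1.96 and the half-width is about 0.098, so `(round(low, 3), round(high, 3))` is `(0.442, 0.638)`. The Wald interval is known to be unreliable for p̂ close to 0 or 1 or for small |S|.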
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
• Models
  – Motion
  – Location uncertainty
• Queries
  – Part 1: NN (free-space motion): semantics and processing
  – Part 2: Range (road-network constraints): semantics and processing
• Uncertainty – the flip-side
Model of a Trajectory
[Figure: 3D trajectory in (X, Y, Time) space up to the present time, and the corresponding 2D route in the (X, Y) plane]
• The approximation does not capture acceleration/deceleration
• Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. ACM GIS 2011
Mobility Models
• Sequence of (location, time) updates (e.g., GPS-based)
• Electronic maps + traffic-distribution patterns + set of (to-be-visited) points => full trajectory
• (location, time, velocity vector) updates
Modeling Uncertainty: Location Uncertainty Models
• Various sources' imprecision (GPS, sensors)
• Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (a.k.a. space-time prisms)
  – Assumes exact location samples
  – Unknown (but bounded) maximum speed in-between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
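Membership in a bead can be sketched as the intersection of two cones: a location must be reachable from the earlier sample and must still allow reaching the later sample, both at the bounded maximum speed. A minimal sketch, assuming free-space motion with Euclidean distance and a known bound `v_max`; the function name is ours.

```python
import math

def in_bead(p, t, sample_a, sample_b, v_max):
    """True if location p = (x, y) at time t is possible under the bead model.

    sample_a = ((x, y), t_a) and sample_b = ((x, y), t_b) are exact location
    samples with t_a <= t_b; v_max is the assumed maximum speed."""
    (pa, ta), (pb, tb) = sample_a, sample_b
    if not ta <= t <= tb:
        return False
    reach_from_a = math.dist(pa, p) <= v_max * (t - ta)   # forward cone from a
    reach_to_b = math.dist(p, pb) <= v_max * (tb - t)     # backward cone into b
    return reach_from_a and reach_to_b
```

For samples ((0, 0), 0) and ((10, 0), 10) with v_max = 1.5, the midpoint (5, 0) at t = 5 is possible, while (5, 6) is not, since it lies about 7.8 units from both samples but only 7.5 units are reachable in 5 time units.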
Modeling Uncertain Trajectories: Model 2
• Constraints
  – Motion along a road network
  – Uncertainty restricted to road segments
[Figure: samples p1 at t1 and p2 at t2 on a road network, and a location q]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
Modeling Uncertain Trajectories: Model 3: Sheared Cylinders
• Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
• Syntax
  – The impact of uncertainty on the (structure of the) answer
  – The impact of constraints (e.g., possible routes to be taken)
• Processing algorithms
  – Impact of the motion + uncertainty coupling on filtering/pruning/refinement
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
Continuous NN for Trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Tr_q during [t_b, t_e]
A_nn(q) = [(Tr_i1, [t_b, t_1]), (Tr_i2, [t_1, t_2]), …, (Tr_im, [t_(m-1), t_e])]
[Figure: trajectories Tr_q, Tr1, Tr2, Tr3 in (X, Y) over time, at T = t_b, T = t_1, and T = t_e]
Example: assuming that Tr_q is the querying trajectory during [t_b, t_e], then
– Tr1 (black) is the NN throughout [t_b, t_1]
– Tr2 (red) is the NN throughout [t_1, t_e]
Observation: the answer to the query is time-parameterized.
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tr_i has some location uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Tr_q during [t_b, t_e]
[Figure: the previous example with uncertainty added, at T = t_b, t_b1, t_11, t_1, and t_e]
Observations: adding uncertainty to the previous example implies
– Tr1 (blue) is the only trajectory with a non-zero NN-probability up to t_b1
– Starting at t_b1, Tr3 (green) also has a non-zero NN-probability
– Specifically, at t_11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: IPAC-NN tree. The root is [Tr_Q, t_b, t_e]; its children (Tr_i1, t_b, t_1, D_1), (Tr_i2, t_1, t_2, D_2), …, (Tr_im, t_(m-1), t_e, D_m) cover consecutive sub-intervals, and each child has its own children over sub-intervals, e.g., (Tr_i11, t_b, t_11, D_11), (Tr_i12, t_11, t_12, D_12), …]
• Root: parameters of the query
  – Querying trajectory Tr_q
  – Time interval of interest [t_b, t_e]
• Children: excluding the parent, the trajectories that have the highest probability of being the NN to Tr_q within a sub-interval of the interval bounded by the parent
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: crisp query point Q and objects Tr1 to Tr5, whose possible locations are bounded by circles; radii Rmin, Rmax, and Rd around Q]
Tr_q = 2D point Q; the rest are possible locations bounded by circles.
Observation:
– Any object whose min-distance from Q is > Rmax has a 0 probability of being the NN of Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tr_i is within distance Rd from Q is given by
$$P^{WD}_i(Q, R_d) = \iint_{\lVert (x, y) - Q \rVert \le R_d} \mathrm{pdf}_i(x, y)\,dx\,dy$$
The probability that a given object Tr_j is the NN of Q is given by
$$P^{NN}_j(Q) = \int_{0}^{R_{max}} \frac{\partial P^{WD}_j(Q, R_d)}{\partial R_d} \prod_{k \neq j} \big(1 - P^{WD}_k(Q, R_d)\big)\,dR_d$$
i.e., Tr_j is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrated over all possible Rd's (NOTE: the upper bound is Rmax…)
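The two formulas above can be evaluated numerically when each uncertainty region is a disk with a uniform pdf. A minimal sketch, assuming uniform disks, a crisp Q, and a simple discretization of Rd over [0, Rmax] (Rmax taken as the smallest maximum distance from Q to any disk); the closed-form disk-intersection area gives P^WD, and the function names are ours.

```python
import math

def disk_intersection_area(d, r1, r2):
    """Area of the intersection of two disks (radii r1, r2, center distance d)."""
    if d >= r1 + r2:
        return 0.0
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2
    c1 = max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    c2 = max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    k = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    return r1 * r1 * math.acos(c1) + r2 * r2 * math.acos(c2) - 0.5 * math.sqrt(max(0.0, k))

def p_within(obj, q, rd):
    """P^WD(Q, Rd) for an object uniformly distributed in the disk obj = ((cx, cy), r)."""
    (cx, cy), r = obj
    d = math.hypot(cx - q[0], cy - q[1])
    return disk_intersection_area(d, rd, r) / (math.pi * r * r)

def nn_probabilities(objects, q, steps=4000):
    """Approximate P^NN_j(Q) = int_0^Rmax dP^WD_j * prod_{k != j} (1 - P^WD_k)
    by summing over `steps` equal slices of [0, Rmax]."""
    # Rmax: smallest max-distance from Q; beyond it some object is surely closer.
    r_max = min(math.hypot(c[0] - q[0], c[1] - q[1]) + r for c, r in objects)
    n = len(objects)
    probs = [0.0] * n
    prev = [0.0] * n                              # P^WD_k at the previous radius
    for s in range(1, steps + 1):
        cur = [p_within(o, q, r_max * s / steps) for o in objects]
        for j in range(n):
            others = 1.0
            for k in range(n):
                if k != j:                        # all others farther than Rd
                    others *= 1.0 - 0.5 * (prev[k] + cur[k])
            probs[j] += (cur[j] - prev[j]) * others   # Tr_j lands in this slice
        prev = cur
    return probs
```

For two unit disks centred at (3, 0) and (-3, 0) around Q = (0, 0), each object gets NN-probability 0.5; an object whose disk starts beyond Rmax gets probability 0, as in the observation above.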
When Tr_q (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query location Tr_Q and objects Tr1 to Tr5, with radii Rmin and Rmax; two possible locations Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Tr_j, being within distance Rd from Tr_q.
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
[Figure: uniform pdfs of the possible locations of Tr_q, Tr1, and Tr2 in the (x, y) plane]
Example: assuming a uniform pdf of possible locations; the figure marks two of the points used when evaluating the actual probability of Tr1 being within distance Rd from Tr_q.
Main observation: Let V_i and V_q denote the 2D random variables representing the possible locations of Tr_i and Tr_q. In effect, we are looking for
$$P\big(\lVert V_i - V_q \rVert \le R_d\big)$$
Define V_iq = V_i - V_q (cross-correlation) as another random variable, V_iq = V_i + (-V_q), which, due to independence, implies that the pdf of V_iq is a convolution of the pdfs of V_i and -V_q.
[Figure: pdf(Tr1 - Tr_q) and pdf(Tr2 - Tr_q) in the (x, y) plane]
Let V_iq = V_i + (-V_q).
Property 1: If pdf(V_i) has a centroid C_i coinciding with E(V_i) AND pdf(V_q) has a centroid C_q coinciding with E(V_q), THEN their convolution pdf(V_iq) has a centroid C_iq = C_i + (-C_q) coinciding with E(V_iq).
Property 2: Assume that pdf(V_i) and pdf(V_q) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(V_iq) is also rotationally symmetric around its centroid C_iq.
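The convolution argument can be checked empirically by sampling: the empirical distribution of V_i - V_q approximates pdf(V_i) convolved with pdf(-V_q), its mean should sit at C_i - C_q (Property 1), and its support is a disk of radius r_i + r_q. A minimal Monte-Carlo sketch, assuming uniform disks and independence; it approximates rather than computes the analytic convolution, and the function names are ours.

```python
import math
import random

def sample_uniform_disk(center, radius, rng):
    """Uniform point in a disk (sqrt of a uniform radius gives uniform area density)."""
    rho = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return center[0] + rho * math.cos(theta), center[1] + rho * math.sin(theta)

def sample_difference(c_i, r_i, c_q, r_q, n, seed=0):
    """Samples of V_iq = V_i + (-V_q) for independent uniform disks."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        xi, yi = sample_uniform_disk(c_i, r_i, rng)
        xq, yq = sample_uniform_disk(c_q, r_q, rng)
        out.append((xi - xq, yi - yq))
    return out

def prob_within(c_i, r_i, c_q, r_q, rd, n=100_000, seed=0):
    """Monte-Carlo estimate of P(||V_i - V_q|| <= rd)."""
    diffs = sample_difference(c_i, r_i, c_q, r_q, n, seed)
    return sum(1 for dx, dy in diffs if math.hypot(dx, dy) <= rd) / n
```

For example, with C_i = (5, 2), C_q = (1, 1), and unit radii, the sample mean of the differences is close to (4, 1), and no difference lies farther than 2 from it.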
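The Continuous Aspect of the Probabilistic NN Queries
Let $V^{x}_{iq} = v^{x}_i - v^{x}_q$ and $V^{y}_{iq} = v^{y}_i - v^{y}_q$ denote the corresponding components of the velocity of the object whose expected location is along TR_iq.
Then the distance of the expected location along TR_iq from the origin as a function of t is
$$d_{iq}(t) = \sqrt{(x_{iq} + V^{x}_{iq}\,t)^2 + (y_{iq} + V^{y}_{iq}\,t)^2} = \sqrt{A t^2 + B t + C}$$
with $A = (V^{x}_{iq})^2 + (V^{y}_{iq})^2$, $B = 2\,(x_{iq} V^{x}_{iq} + y_{iq} V^{y}_{iq})$, $C = x_{iq}^2 + y_{iq}^2$, where $(x_{iq}, y_{iq})$ denotes the expected relative location at t = 0: a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).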
The Continuous Aspect of the Probabilistic NN Queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d_1q(t), d_2q(t), …, d_Nq(t)} (NOTE: excluding d_qq(t) ≡ 0).
Observation: two distance functions d_iq(t) and d_jq(t) can intersect at most twice throughout the time interval of interest for the query, [t_b, t_e] (NOTE: set d_iq(t) = d_jq(t); squaring both sides gives a quadratic equation in t, hence at most two roots). Call such pair-wise intersections critical time points.
Example: [Figure: lower envelope LE12 of the two distance trajectories TR1 and TR2, starting at t_b, with critical time points t11 and t12; time on the horizontal axis, distance on the vertical axis]
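The "at most two intersections" observation translates directly into code: because both distances are non-negative, d_iq(t) = d_jq(t) holds exactly when the squared distances are equal, which is a quadratic in t. A minimal sketch, assuming the expected relative positions at t = 0 and constant relative velocities are given; the function names are ours.

```python
import math

def distance_coefficients(rel_pos, rel_vel):
    """(A, B, C) such that d(t)^2 = A t^2 + B t + C for relative position
    rel_pos = (x, y) at t = 0 and constant relative velocity rel_vel = (vx, vy)."""
    (x, y), (vx, vy) = rel_pos, rel_vel
    return vx * vx + vy * vy, 2.0 * (x * vx + y * vy), x * x + y * y

def critical_time_points(coef_i, coef_j, t_b, t_e, eps=1e-12):
    """Times in [t_b, t_e] where d_iq(t) = d_jq(t), i.e. where the squared
    distances are equal: a quadratic in t with at most two roots."""
    a = coef_i[0] - coef_j[0]
    b = coef_i[1] - coef_j[1]
    c = coef_i[2] - coef_j[2]
    if abs(a) < eps:                      # degenerate case: at most one root
        if abs(b) < eps:
            return []                     # identical or never-equal functions
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []                     # the functions never intersect
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if t_b <= t <= t_e]

coef_1 = distance_coefficients((2.0, 0.0), (0.0, 0.0))    # constant distance 2
coef_2 = distance_coefficients((-5.0, 0.0), (1.0, 0.0))   # distance |t - 5|
```

Here `critical_time_points(coef_1, coef_2, 0.0, 10.0)` returns `[3.0, 7.0]`: the moving object is closer than the static one exactly between these two critical time points.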
The Continuous Aspect of the Probabilistic NN Queries
Q: How can we efficiently construct the lower envelope of the whole collection of distance functions?
A: A divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: the lower envelopes LE12 (of TR1 and TR2, critical time points t11 and t12) and LE34 (of TR3 and TR4, critical time point t31) over [t_b, t_e] are merged into LE1234, introducing new critical time points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Tr_q (e.g., Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR1 to TR7 over [t_b, t_e] with time points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree), and the 4r pruning band]
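Given two trajectory samples (t_i, p_i) and (t_(i+1), p_(i+1)) of a moving object a on a road network, the set of possible paths PP_i between t_i and t_(i+1) consists of all the paths along the routes (sequences of edges) that connect p_i and p_(i+1) and whose minimum time costs are not greater than t_(i+1) - t_i, i.e.,
$$PP_i(a) = \{P_j : P_j \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{cost}_{\min}(P_j) \le t_{i+1} - t_i\}$$
Example: if the pdf of selecting a possible path is uniform, then $\Pr(P_j) = 1 / |PP_i(a)|$.
Given a path P_j ∈ PP_i(a), the possible locations of a given moving object a with respect to P_j at t ∈ [t_i, t_(i+1)] are the set of all the positions p ∈ P_j that a can reach from p_i within the time period t - t_i and from which a can reach p_(i+1) within the time period t_(i+1) - t, i.e.,
$$PL_{P_j}(a, t) = \{p \in P_j : \mathrm{cost}_{\min}(p_i, p) \le t - t_i \,\wedge\, \mathrm{cost}_{\min}(p, p_{i+1}) \le t_{i+1} - t\}$$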
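Combine the two probabilities…
[Figure: possible paths of object a and its possible locations at t = 4, with a point m on the network]
Probability of falling inside segment mp1 at t = 4: 0.5 · 0.5 = 0.25
Probability of an object a falling inside some segment S at a given time instant t (e.g., with a uniform pdf):
$$\Pr\big(a(t) \in S\big) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\big(PL_p(t) \cap S\big)$$
where $\Pr\big(PL_p(t) \cap S\big)$ is the probability that a's location on path p at time t lies in S.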
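The combination formula can be sketched as a weighted sum over possible paths. Assumptions (ours, for illustration): each path's possible locations PL_p(t) form one interval of offsets along that path, the location is uniform on that interval, and S is given as an offset interval on the same path (or is absent from it). The toy numbers mirror the slide's 0.5 · 0.5 product: two equally likely paths, one of which has half of its possible locations inside S.

```python
def prob_in_segment(paths):
    """Pr(a(t) in S) = sum over possible paths p of Pr(p) * Pr(location on p in S).

    paths: list of (path_probability, (pl_lo, pl_hi), segment) where (pl_lo, pl_hi)
           is the offset interval PL_p(t) along that path and segment is the
           offset interval (s_lo, s_hi) of S on that path, or None if S is not on it."""
    total = 0.0
    for p_path, (pl_lo, pl_hi), seg in paths:
        if seg is None:                  # S does not lie on this path
            continue
        s_lo, s_hi = seg
        if pl_hi == pl_lo:               # location exactly known on this path
            total += p_path if s_lo <= pl_lo <= s_hi else 0.0
            continue
        overlap = max(0.0, min(pl_hi, s_hi) - max(pl_lo, s_lo))
        total += p_path * overlap / (pl_hi - pl_lo)   # uniform location on PL
    return total

# Two possible paths, each chosen with probability 1/|PP| = 0.5 (uniform pdf).
example = [(0.5, (2.0, 6.0), (4.0, 9.0)),   # S covers half of PL on this path
           (0.5, (1.0, 5.0), None)]         # S is not on the second path
```

`prob_in_segment(example)` evaluates to 0.5 · 0.5 = 0.25.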
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and their probabilities)
  – A probabilistic location function
Example: observe the two possible sequences of going from position p_i to position p_(i+1). At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
Processing
› Data structures
  – Edge hash table
    › Small, in-memory
  – Movement R-tree
    › 1D, for each edge
  – Trajectories list
    › Store samples ordered by timestamp
    › Assign a pointer to the set of possible paths
    › Earliest-arrival and latest-departure times stored for vertices
    › On disk, retrieved on demand
› Network expansion
  – Expansion tree: a tree rooted at q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search for leaf entries whose maximum time interval intersects t_q
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: road network with vertices A to E around q; the expansion tree "bubbling" along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]
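The network-expansion step can be sketched as a Dijkstra search from q that stops at network distance r, recording the tree parent of every reached vertex. A minimal sketch with a hypothetical adjacency-list graph: it returns only vertices strictly within distance r together with their tree edges, and it ignores edges that are only partially inside the radius, which a full implementation would also have to report.

```python
import heapq

def expansion_tree(graph, q, r):
    """Vertices with network distance < r from q, mapped to (distance, parent).

    graph: dict vertex -> list of (neighbor, edge_length); undirected edges
    are listed in both directions."""
    dist = {q: 0.0}
    parent = {q: None}
    heap = [(0.0, q)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                      # stale heap entry
        done.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return {v: (dist[v], parent[v]) for v in done}

# Hypothetical road network around q (edge lengths in arbitrary units).
road = {
    "q": [("A", 2.0), ("B", 5.0)],
    "A": [("q", 2.0), ("C", 2.0)],
    "B": [("q", 5.0), ("D", 1.0)],
    "C": [("A", 2.0), ("D", 4.0)],
    "D": [("B", 1.0), ("C", 4.0)],
}
tree = expansion_tree(road, "q", 5.0)
```

With r = 5, `tree` contains q, A (via q) and C (via A); B is at distance exactly 5 and therefore excluded.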
Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway example…
[Figure: position (pos) of a car on a highway over time (t), with a query window Q]
What is the probability that the car is in some area at least once during some time interval?
84
A highway example…
[Figure: as before; the minimum and maximum speeds vmin and vmax bound the car's possible positions]
What is the probability that the car is in some area at least once during some time interval?
85
A highway example…
[Figure: possible positions bounded by vmin and vmax, with the query window Q]
Assuming a uniform distribution…
86
A highway example…
[Figure: at two time instants, the car is inside Q with probability 50% and 30%, respectively]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
87
A highway example…
[Figure: at two time instants, the car is inside Q with probability 50% and 30%, respectively]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
Violation of the speed constraints
88
A highway example…
[Figure: at two time instants, the car is inside Q with probability 50% and 30%, respectively]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
Considering the dependency: 0.5
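The gap between 0.65 and 0.5 can be reproduced by summing over possible worlds. In this sketch the independent case is built from the per-instant probabilities 0.5 and 0.3 on the slide; the dependent case is a hypothetical world set in which every world that is inside Q at the second instant was also inside Q at the first (one way in which speed constraints can couple the instants). The slides do not spell out the actual worlds, and the helper names are illustrative:

```python
from itertools import product

def prob_at_least_once(worlds):
    """worlds: list of (probability, hits); hits[k] says whether the car is in Q at instant k."""
    return sum(p for p, hits in worlds if any(hits))

def independent_worlds(marginals):
    """All 2^n worlds under the independence assumption, built from per-instant probabilities."""
    worlds = []
    for hits in product([True, False], repeat=len(marginals)):
        p = 1.0
        for hit, m in zip(hits, marginals):
            p *= m if hit else 1.0 - m
        worlds.append((p, hits))
    return worlds

print(round(prob_at_least_once(independent_worlds([0.5, 0.3])), 2))  # → 0.65

# Hypothetical dependent model with the same marginals (0.5 at the first, 0.3 at the second instant)
dependent = [(0.3, (True, True)), (0.2, (True, False)), (0.5, (False, False))]
print(round(prob_at_least_once(dependent), 2))  # → 0.5
```

Both models agree on the per-instant probabilities; only the correlation between instants differs, and that alone changes the answer.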
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
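A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: in the interior, the ball moves to either neighbouring hole with probability 0.4 and stays with probability 0.2; at the border, it moves inward with probability 0.6 and stays with probability 0.4]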
92
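A simple example
› Initial position: bucket 3
› After the first hit: 0.4 | 0.2 | 0.4 (buckets 2 to 4)
› After the second hit: 0.16 | 0.16 | 0.36 | 0.16 | 0.16 (buckets 1 to 5)
› After the 40th hit: ≈ 0.08 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.08 (buckets 1 to 9)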
93
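How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
$$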
94
How can we model this?
› First hit: $(0,0,1,0,0,0,0,0,0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0,0,1,0,0,0,0,0,0) \cdot M^{2} = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0,0,1,0,0,0,0,0,0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
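The repeated vector-matrix products can be sketched in plain Python. The matrix is the wooden-board example (rows: from bucket, columns: to bucket); the helper names `board_matrix` and `after_hits` are illustrative. With many more hits the distribution settles at (0.08, 0.12, …, 0.12, 0.08), which is the stationary distribution of M:

```python
def board_matrix(n=9):
    """Transition matrix of the wooden-board example (rows: from bucket, columns: to bucket)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[0][0], m[0][1] = 0.4, 0.6                        # left border
        elif i == n - 1:
            m[i][i], m[i][i - 1] = 0.4, 0.6                    # right border
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4  # interior
    return m

def after_hits(x0, m, hits):
    """Distribution after `hits` hits, i.e. x0 * M^hits, via repeated vector-matrix products."""
    x = list(x0)
    for _ in range(hits):
        x = [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m))]
    return x

M = board_matrix()
x0 = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # the ball starts in bucket 3
print([round(p, 2) for p in after_hits(x0, M, 1)])   # → [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in after_hits(x0, M, 2)])   # → [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
print([round(p, 3) for p in after_hits(x0, M, 40)])  # close to the stationary distribution
```

Multiplying the vector step by step avoids materializing M^40; with a sparse M each step is cheap.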
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} \quad \text{(rows: from state, columns: to state)}$$
[Figure: transition graph of M over the states s1, s2, s3]
Note: there is an exponential number of possible paths the car might take
101
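ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
› State distributions over (s1, s2, s3): t = 0: (1, 0, 0); t = 1: (0, 0, 1); t = 2: (0, 0.8, 0.2); t = 3: (0.48, 0.16, 0.36)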
102
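ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
› t = 0: (1, 0, 0); t = 1: (0, 0, 1); t = 2: (0, 0.8, 0.2), where the mass 0.8 in s2 already qualifies
› t = 3: only the non-qualifying worlds (0.2, all in s3) are propagated further: 0.16 reach s2, 0.04 stay in s3
› Result = 0.8 + 0.16 = 0.96
› A solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices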
103
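ST-Window Queries
States s1, s2, s3 plus an absorbing state s4 for the winner trajectories:
$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$
(M⁻ is used for the transition into t = 1 ∉ T, M⁺ for the transitions into t = 2, 3 ∈ T.)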
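The two-matrix construction can be sketched directly: an extra absorbing "hit" state is appended, and inside T every transition into a window state is redirected to it. This mirrors M⁻ and M⁺ on the slide; the helper names are illustrative, and the sketch assumes the window starts after t = 0:

```python
def vec_mat(x, m):
    """Row vector times matrix."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]

def augment(m, window, inside):
    """Add an absorbing 'hit' state to M.

    inside=False yields M-minus (ordinary transitions); inside=True yields M-plus,
    where every transition into a window state is redirected to the hit state.
    """
    n = len(m)
    aug = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            target = n if inside and j in window else j
            aug[i][target] += m[i][j]
    aug[n][n] = 1.0  # once a hit, always a hit
    return aug

def window_query(m, x0, window, t_start, t_end):
    """P(the object is in a window state at least once during [t_start, t_end])."""
    if t_start < 1:
        raise ValueError("this sketch assumes the window starts after t = 0")
    m_minus, m_plus = augment(m, window, False), augment(m, window, True)
    x = list(x0) + [0.0]
    for t in range(1, t_end + 1):
        x = vec_mat(x, m_plus if t >= t_start else m_minus)
    return x[-1]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
# Start in s1 at t = 0; window states s1, s2 (indices 0, 1); T = [2, 3]
print(round(window_query(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3), 2))  # → 0.96
```

The cost is one vector-matrix product per time step, independent of the exponential number of paths.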
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 (left) and with two observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class S_i^- corresponding to worlds where o is located in state s_i and o has not intersected the window
– One class S_i^+ corresponding to worlds where o is located in state s_i and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
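Multiple Observations
[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3; classes S1−, S2−, S3− (window not intersected) and S1+, S2+, S3+ (window intersected)]
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$
$$M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$
Distributions over (S1−, S2−, S3−, S1+, S2+, S3+):
t = 0: (1, 0, 0, 0, 0, 0); t = 1: (0, 0, 1, 0, 0, 0); t = 2: (0, 0, 0.2, 0, 0.8, 0); t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)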
107
Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
With W = "o has intersected the query window" and O = "o is observed in s3 at t = 3":
$$P(W \mid O) = \frac{P(W \wedge O)}{P(O)} = \frac{P(O \mid W) \cdot P(W)}{P(O)} = \frac{P(O \wedge W)}{P(O \wedge W) + P(O \wedge \neg W)}$$
At t = 3, P(O ∧ W) = P(S3+) = 0.32 and P(O ∧ ¬W) = P(S3−) = 0.04, so
$$P(W \mid O) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
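The 2|S|-class construction and the Bayes step can be combined in one short sketch. The block layout (S1−, …, Sn−, S1+, …, Sn+) follows the slide; the function names are illustrative, and the sketch assumes the window starts after t = 0 and ends no later than the second observation:

```python
def vec_mat(x, m):
    """Row vector times matrix."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]

def split_matrices(m, window):
    """2|S| x 2|S| matrices over the classes (S1-, ..., Sn-, S1+, ..., Sn+).

    '-' classes: window not intersected yet; '+' classes: window intersected.
    M-minus is block diagonal (both halves evolve with M). In M-plus, a
    transition from a '-' class into a window state moves to the '+' class.
    """
    n = len(m)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            m_minus[i][j] = m[i][j]
            m_minus[n + i][n + j] = m[i][j]
            m_plus[i][n + j if j in window else j] += m[i][j]
            m_plus[n + i][n + j] += m[i][j]
    return m_minus, m_plus

def hit_given_observation(m, x0, window, t_start, t_end, obs_state, obs_time):
    """P(window intersected during [t_start, t_end] | o observed in obs_state at obs_time)."""
    n = len(m)
    m_minus, m_plus = split_matrices(m, window)
    x = list(x0) + [0.0] * n  # all initial mass starts in the '-' classes
    for t in range(1, obs_time + 1):
        x = vec_mat(x, m_plus if t_start <= t <= t_end else m_minus)
    hit, miss = x[n + obs_state], x[obs_state]
    return hit / (hit + miss)  # Bayes: P(hit and obs) / P(obs)

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
# Seen in s1 at t = 0, window {s1, s2} during T = [2, 3], seen again in s3 at t = 3
posterior = hit_given_observation(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3, obs_state=2, obs_time=3)
print(round(posterior, 2))  # → 0.89
```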
108
Summary
› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index over the (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaption of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
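The Monte-Carlo idea itself is easy to sketch: sample trajectories from the Markov chain and count the fraction that satisfies the query. This is plain forward sampling from a single start observation; it does not implement the cited adaption of the transition matrices for conditioning on later observations. The 3-state chain and window query reuse the ST-window example, and the function names are illustrative:

```python
import random

def sample_trajectory(m, start, steps, rng):
    """Draw one state sequence of length steps + 1 from the Markov chain with matrix m."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choices(range(len(m)), weights=m[path[-1]])[0])
    return path

def mc_window_probability(m, start, window, t_start, t_end, n_samples, seed=0):
    """Fraction of sampled trajectories that visit a window state during [t_start, t_end]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        path = sample_trajectory(m, start, t_end, rng)
        if any(path[t] in window for t in range(t_start, t_end + 1)):
            hits += 1
    return hits / n_samples

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
estimate = mc_window_probability(M, 0, {0, 1}, 2, 3, n_samples=20000)
print(estimate)  # close to the exact value 0.96
```

Unlike the matrix approach, sampling works for any query predicate that can be checked on a single trajectory (e.g., kNN), at the price of an approximate answer.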
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of sub-events. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43-49.
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009, 435-443.
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728.
9
10
Spatio-Temporal Data
• (object, location, time) triples
• Queries
  • "Find friends that attended the same concert last Saturday"
• Best case: continuous function
[Figure: GPS log taken from a thirty-minute drive through Seattle]
Dataset provided by P. Newson and J. Krumm. Hidden Markov Map Matching Through Noise and Sparseness. ACM GIS 2009.
11
Sources of Uncertainty
• Missing observations
  • Missing GPS signal
  • RFID sensors available in discrete locations only
  • Wireless sensor nodes sending infrequently to preserve energy
  • Infrequent check-ins of users of geo-social networks
Dataset provided by E. Cho, S. A. Myers, and J. Leskovec. Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011.
12
Sources of Uncertainty
• Uncertain observations
  • Imprecise sensor measurements (e.g., radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g., contradictory sensor data)
  • Human errors (e.g., in crowd-sourcing applications)
From a database perspective, the position of a mobile object is uncertain
Dataset provided by E. Cho, S. A. Myers, and J. Leskovec. Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011.
15
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results, enhancing the underlying decision-making process
Improve the quality of modern location based applications and of research results in the field
16
Uncertain Spatial Data Models
• Discrete Models
• Continuous Models
Possible World Semantics
17
Possible World Semantics
• A collection of uncertain spatial objects defines an uncertain spatial database
• Combinations of object instances define possible database instances, called possible worlds
• Assumption: the probability of a possible world can be computed efficiently
Possible World Semantics
18
Answering Queries using PWS
Let
• DB be an uncertain database having N possible worlds,
• W = {w_1, …, w_N} be the set of possible worlds of DB,
• φ be a query predicate,
• I_φ(w) be an indicator function returning one if predicate φ holds in world w and zero otherwise.
The probability that a query predicate φ holds on an uncertain database DB is defined as
$$P(\varphi, DB) = \sum_{w \in W} P(w) \cdot I_\varphi(w)$$
Possible World Semantics
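The definition translates into a brute-force evaluator that enumerates every possible world, which also makes the exponential cost discussed next tangible. This sketch assumes mutually independent objects with discrete alternatives; the objects, query point and radius are hypothetical:

```python
import math
from itertools import product

def query_probability(objects, predicate):
    """P(predicate) = sum over possible worlds w of P(w) * I(predicate holds in w).

    objects: list of uncertain objects, each a list of (position, probability)
    alternatives; objects are assumed to be mutually independent.
    """
    total = 0.0
    for world in product(*objects):  # one alternative per object
        p_world = math.prod(prob for _, prob in world)
        positions = [pos for pos, _ in world]
        if predicate(positions):
            total += p_world
    return total

def count_in_circle(positions, q, radius):
    """Number of positions within distance radius of q."""
    return sum(math.dist(p, q) <= radius for p in positions)

objects = [
    [((0, 0), 0.8), ((10, 10), 0.2)],  # hypothetical object A
    [((1, 0), 0.4), ((10, 0), 0.6)],   # hypothetical object B
    [((0, 1), 0.5), ((9, 9), 0.5)],    # hypothetical object C
]
p = query_probability(objects, lambda pos: count_in_circle(pos, (0, 0), 2.0) >= 2)
print(round(p, 2))  # → 0.6
```

With k alternatives per object and n objects, the loop visits k^n worlds.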
Possible Worlds Example II
[Figure: uncertain spatial objects labeled A to Z]
21
22
23
24
25
26
27
Too many possible worlds
28
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects
Are there efficient solutions to query uncertain spatial data?
In general: No
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
Can be reduced to uncertain spatial databases
But: specific queries may have polynomial-time solutions
[DalviSuciu04] N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004.
29
Querying Uncertain Data: Running Example
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds
30
Data Cleaning: Aggregation
[Figure: query point q with uncertain objects C, D, H and I]
Ignore uncertainty (data cleaning)
Replace uncertain objects by a deterministic "best guess"
Expected positions
Most-likely positions
…
Query results are not reliable
Query results may be biased
31
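Equivalent Worlds: An Intuition
Observation 1
For any possible world w and any possible world w′ derived from w by changing the position of an object o that lies outside the query region in both worlds, the following equivalence holds: the query returns the same result on w and on w′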
32
Equivalent Worlds: An Intuition
Observation 1 allows us to discard objects outside of the query region
Remaining number of classes of equivalent possible worlds
33
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
[Figure: query point q with the remaining objects C, D, H and I]
Querying Uncertain Spatial Data
34
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
Remaining number of classes of equivalent possible worlds
[Figure: query point q with the remaining objects C, D, H and I]
35
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region. Information about the concrete result objects can be discarded
[Figure: query point q with the remaining objects C, D, H and I]
36
Generating Functions
[Figure: query point q and objects A, B, C, labeled with probabilities 0.8, 0.2 and 0.4]
Main idea: Use polynomial multiplication to enumerate possible results
39
Generating Functions
[Figure: as before, with the objects A, B, C replaced by x]
Main idea: Use polynomial multiplication to enumerate possible results
Observation 3: Anonymize objects: substitute A, B, C by x
Each monomial $c \cdot x^k$ implies that the probability of having exactly $k$ results equals $c$
40
Generating Functions: Formally
For each object $o_i$, let $p_i$ denote the probability that $o_i$ is inside the query region. Consider the following generating function [2]:
$$\mathcal{F}(x) = \prod_{i=1}^{n} \bigl( (1 - p_i) + p_i \cdot x \bigr)$$
In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region
[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
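Expanding the generating function one factor at a time takes O(n²) arithmetic operations instead of enumerating 2^n worlds. A minimal sketch; the probabilities 0.8, 0.2 and 0.4 are an assumption read off the figure labels of the running example, and the function name is illustrative:

```python
def count_distribution(probs):
    """Coefficients of prod_i ((1 - p_i) + p_i * x); coeffs[k] = P(exactly k objects inside)."""
    coeffs = [1.0]
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside: degree unchanged
            nxt[k + 1] += c * p      # object inside: multiply by x
        coeffs = nxt
    return coeffs

distribution = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in distribution])  # → [0.096, 0.472, 0.368, 0.064]
```

For these inputs, (0.2 + 0.8x)(0.8 + 0.2x)(0.6 + 0.4x) = 0.096 + 0.472x + 0.368x² + 0.064x³, so, e.g., exactly one object is inside with probability 0.472.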
41
Count Queries on Uncertain Data
Example
[Figure: query point q and objects A, B, C, labeled with probabilities 0.8, 0.2 and 0.4]
45
The Paradigm of Equivalent Worlds
Given a query predicate φ and an uncertain database DB, we can answer φ on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and unbiased
Evaluate the query predicate on each world
The distribution of sampled results is an unbiased approximation of the true distribution of results
48
Sampling Example
[Figure: query point q and objects A, B, C, labeled with probabilities 0.8, 0.2 and 0.4]
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimations
49
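Sampling: Confidences
[Figure: query point q and objects A, B, C, labeled with probabilities 0.8, 0.2 and 0.4; the true probability is marked]
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of the estimators
E.g., Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution
At a significance level of α = 0.05, the true probability is in the interval [0.442, 0.638]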
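The interval on this slide can be reproduced with the Wald formula. The count of 54 successes out of 100 sampled worlds is an assumption inferred from the interval's midpoint (0.54); the percentile comes from Python's standard `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

def wald_interval(successes, n, alpha=0.05):
    """Wald confidence interval: p_hat +/- z_{1-alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = wald_interval(54, 100)
print(round(low, 3), round(high, 3))  # → 0.442 0.638
```

The half-width shrinks with 1/√n, so quadrupling the number of sampled worlds roughly halves the interval.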
50
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip-side
53
Model of a trajectory
[Figure: a 3D trajectory in (X, Y, Time) up to the present time, and its projection onto the (X, Y) plane, the 2D route]
Approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman. Kinetic space-time prisms. GIS 2011
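The remark that the approximation does not capture acceleration suggests piecewise-linear interpolation between (location, time) samples, i.e., constant speed between consecutive updates. A minimal sketch under that assumption; `location_at` and the sample track are illustrative:

```python
from bisect import bisect_right

def location_at(samples, t):
    """Linearly interpolate (x, y) at time t from time-ordered (x, y, t) samples.

    Assumes constant speed between consecutive samples, so any acceleration
    or deceleration in between is not captured.
    """
    times = [s[2] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled time range")
    k = bisect_right(times, t) - 1
    if k == len(samples) - 1:  # t is exactly the last sample time
        return samples[-1][:2]
    (x1, y1, t1), (x2, y2, t2) = samples[k], samples[k + 1]
    f = (t - t1) / (t2 - t1)
    return (x1 + f * (x2 - x1), y1 + f * (y2 - y1))

track = [(0.0, 0.0, 0.0), (4.0, 0.0, 2.0), (4.0, 6.0, 5.0)]
print(location_at(track, 1.0))  # → (2.0, 0.0)
print(location_at(track, 3.5))  # → (4.0, 3.0)
```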
54
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory
(location, time, velocity vector) updates
55
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze. On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumed exact location samples
  • Unknown (but bounded) maximum speed in-between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov. Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li. Uncertain Range Queries for Necklaces. MDM 2010
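A bead (space-time prism) between two exact samples contains exactly the points the object could occupy without exceeding the maximum speed. A minimal membership test in free space; the samples, the speed bound and the function name are hypothetical:

```python
import math

def in_bead(p1, t1, p2, t2, vmax, point, t):
    """True if `point` is a possible location at time t between exact samples (p1, t1) and (p2, t2).

    The object must be able to get from p1 to `point` within t - t1, and from
    `point` to p2 within t2 - t, without exceeding the maximum speed vmax.
    """
    if not t1 <= t <= t2:
        return False
    return (math.dist(p1, point) <= vmax * (t - t1)
            and math.dist(point, p2) <= vmax * (t2 - t))

# Hypothetical samples 10 units apart in space and 10 units apart in time, vmax = 2
print(in_bead((0, 0), 0, (10, 0), 10, 2.0, (5, 4), 5))   # → True
print(in_bead((0, 0), 0, (10, 0), 10, 2.0, (5, 12), 5))  # → False
```

At a fixed time t the possible locations form the intersection of two disks, which is why the bead's cross-sections are lens-shaped.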
57
Modeling Uncertain Trajectories: Model 2
Constraints
Motion along a road network
Uncertainty restricted to road segments
[Figure: road-network space-time prism between the samples p1 at t1 and p2 at t2]
Bart Kuijpers, Walied Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3: Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain. Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g., possible routes to be taken
Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) over time, with the instants T = tb, T = t1 and T = te]
Example: assuming that Trq is the querying trajectory during [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen. Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: as before, with uncertainty around the trajectories and the additional instants T = tb1 and T = t11]
Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree. Root: [TrQ, tb, te]; first level: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); deeper levels subdivide a parent's interval, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …, (Tri1k1, t1(k-1), t1, D1k1), and below Tri12: (Tri121, t11, t121, D121), …, (Tri12l, t12(l-1), t12, D12k12)]
Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]
Children: with the parent excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., has no uncertainty)
[Figure: crisp query point Q, uncertainty disks of Tr1 to Tr5, and the radii Rmin, Rmax and Rd]
Trq = 2D point Q. The rest are possible locations bounded by circles
Observation:
- Any object whose min-distance from Q is greater than Rmax has a "0" probability of being an NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \iint_{\lVert (x,y) - Q \rVert \le R_d} \mathrm{pdf}_i(x, y)\, dx\, dy$$
The probability that a given object Trj is the NN of Q is given by
$$P_{NN}(Tr_j) = \int_{R_{min}}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \ne j} \bigl(1 - P_i(R_d)\bigr)\, dR_d$$
I.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
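Instead of evaluating the integral, the NN probabilities can be approximated by Monte-Carlo sampling, which also makes the Rmax pruning visible. This sketch is a swapped-in technique (sampling, not the integral above) and assumes uniform pdfs over disks; the disks and function names are hypothetical:

```python
import math
import random

def sample_in_disk(center, radius, rng):
    """Uniform sample inside a disk (the square root keeps the density uniform in area)."""
    r = radius * math.sqrt(rng.random())
    theta = 2 * math.pi * rng.random()
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))

def nn_probabilities(q, disks, n_samples=20000, seed=0):
    """Monte-Carlo estimate of P(object i is the NN of the crisp query point q)."""
    rng = random.Random(seed)
    wins = [0] * len(disks)
    for _ in range(n_samples):
        dists = [math.dist(q, sample_in_disk(c, r, rng)) for c, r in disks]
        wins[dists.index(min(dists))] += 1
    return [w / n_samples for w in wins]

# Hypothetical uncertainty disks (center, radius) around Q = (0, 0)
disks = [((3, 0), 1.0), ((-3, 0), 1.0), ((10, 0), 1.0)]
probs = nn_probabilities((0, 0), disks)
print(probs)  # first two close to 0.5; the third is exactly 0.0
```

Here Rmax = 4 (the smallest max-distance), while the third disk's min-distance is 9 > Rmax, so it never wins, matching the pruning observation.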
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain TrQ and Tr1 to Tr5 with Rmin and Rmax; two points Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within_distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over disks of diameter 2r in the (x, y) plane]
Example: assuming a uniform pdf of possible locations; shown are two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$P\bigl(\lVert V_i - V_q \rVert \le R_d\bigr)$$
Define Viq = Vi − Vq (cross-correlation) as another random variable: Viq = Vi + (−Vq), which, due to independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and −Vq
67
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq) in the (x, y) plane; each is supported on a disk of diameter 4r]
Let Viq = Vi + (−Vq)
Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
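P(‖Vi − Vq‖ ≤ Rd) can also be estimated by sampling the difference variable Viq = Vi + (−Vq) directly, a handy sanity check for the convolution-based reasoning. This sketch assumes uniform disks of equal radius r and uses Monte-Carlo sampling rather than computing the convolution; it also reports the sample mean of Viq, which should approach Ci − Cq (Property 1). Function names are illustrative:

```python
import math
import random

def sample_in_disk(center, radius, rng):
    """Uniform sample inside a disk."""
    r = radius * math.sqrt(rng.random())
    theta = 2 * math.pi * rng.random()
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))

def within_distance_prob(ci, cq, r, rd, n_samples=20000, seed=0):
    """Estimate P(||V_i - V_q|| <= rd) for uniform disks of radius r around ci and cq.

    Samples V_iq = V_i + (-V_q); its support is a disk of radius 2r around ci - cq.
    Also returns the sample mean of V_iq.
    """
    rng = random.Random(seed)
    hits = 0
    mean_x = mean_y = 0.0
    for _ in range(n_samples):
        (xi, yi), (xq, yq) = sample_in_disk(ci, r, rng), sample_in_disk(cq, r, rng)
        dx, dy = xi - xq, yi - yq
        mean_x += dx / n_samples
        mean_y += dy / n_samples
        hits += math.hypot(dx, dy) <= rd
    return hits / n_samples, (mean_x, mean_y)

prob, mean = within_distance_prob((5, 0), (0, 0), 1.0, 5.0)
print(prob, mean)  # mean close to (5, 0) = C_i - C_q
```

With centroids 5 apart and r = 1, any Rd below 3 gives probability 0 and any Rd of 7 or more gives probability 1, the extremes of the 4r-wide support.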
68
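The continuous aspect of the probabilistic NN queries
Let Vx_iq = vx_i − vx_q and Vy_iq = vy_i − vy_q denote the corresponding components of the velocity of the object whose expected location is along TR_iq
Then the distance of the expected location along TR_iq from the origin as a function of t is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = V_{x,iq}^2 + V_{y,iq}^2, \quad B = 2\,(x_{iq} V_{x,iq} + y_{iq} V_{y,iq}), \quad C = x_{iq}^2 + y_{iq}^2$$
where (x_iq, y_iq) is the expected relative location at t = 0; the graph of d_iq(t) is a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)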
69
The continuous aspect of the probabilistic NN queries
Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t); squaring both sides yields a quadratic equation in t). Call such pairwise intersections critical time-points
Example: [Figure: lower envelope LE12 of two distance trajectories TR1 and TR2, with critical time-points t11 and t12 after tb; time on the horizontal axis, distance on the vertical axis]
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: LE12 of TR1 and TR2 (critical time-points t11, t12) and LE34 of TR3 and TR4 (critical time-point t31) over [tb, te]; merging them yields LE1234 with the new critical time-points t1new and t2new]
Complexity of the lower-envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
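The two building blocks, the distance functions and their pairwise critical time-points, can be sketched as follows. Note that `lower_envelope` here is a simple brute-force alternative (all pairwise critical points, then the closest function per piece), not the O(N log N) divide-and-conquer construction; the class, function names and sample motions are illustrative:

```python
import math
from dataclasses import dataclass

@dataclass
class RelativeMotion:
    """Expected location of Tr_i relative to Tr_q: position (x, y) at t = 0 and velocity (vx, vy)."""
    name: str
    x: float
    y: float
    vx: float
    vy: float

    def coeffs(self):
        """(A, B, C) such that d_iq(t)^2 = A t^2 + B t + C."""
        return (self.vx ** 2 + self.vy ** 2,
                2 * (self.x * self.vx + self.y * self.vy),
                self.x ** 2 + self.y ** 2)

    def dist(self, t):
        a, b, c = self.coeffs()
        return math.sqrt(max(a * t * t + b * t + c, 0.0))

def critical_points(f, g, tb, te):
    """Times in (tb, te) where d_f(t) = d_g(t): roots of a quadratic, hence at most two."""
    a1, b1, c1 = f.coeffs()
    a2, b2, c2 = g.coeffs()
    a, b, c = a1 - a2, b1 - b2, c1 - c2
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        roots = [] if disc < 0 else [(-b - math.sqrt(disc)) / (2 * a), (-b + math.sqrt(disc)) / (2 * a)]
    return sorted({r for r in roots if tb < r < te})

def lower_envelope(funcs, tb, te):
    """Brute force: cut [tb, te] at all pairwise critical points, keep the closest function per piece."""
    cuts = {tb, te}
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            cuts.update(critical_points(funcs[i], funcs[j], tb, te))
    cuts = sorted(cuts)
    envelope = []
    for lo, hi in zip(cuts, cuts[1:]):
        best = min(funcs, key=lambda f: f.dist((lo + hi) / 2))
        if envelope and envelope[-1][0] == best.name:
            envelope[-1] = (best.name, envelope[-1][1], hi)  # extend the current piece
        else:
            envelope.append((best.name, lo, hi))
    return envelope

tr1 = RelativeMotion("Tr1", 1.0, 0.0, 0.0, 0.0)   # zero relative velocity: constant distance 1
tr2 = RelativeMotion("Tr2", 3.0, 0.0, -1.0, 0.0)  # passes the query object at t = 3
print(critical_points(tr1, tr2, 0.0, 6.0))  # → [2.0, 4.0]
print(lower_envelope([tr1, tr2], 0.0, 6.0))  # → [('Tr1', 0.0, 2.0), ('Tr2', 2.0, 4.0), ('Tr1', 4.0, 6.0)]
```

Removing the envelope's functions per piece and repeating the procedure would give the next ranked envelope, i.e., the next level of the IPAC-NN tree.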
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) above the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te] with the instants t1, t2, t3; the lower envelope, the 4r band above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
72
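Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.,
$$PP_i(a) = \{\, P_j : P_j \text{ connects } p_i \text{ and } p_{i+1} \,\wedge\, \mathrm{mintime}(P_j) \le t_{i+1} - t_i \,\}$$
Example: if the pdf of selecting a possible path is uniform, then $\Pr(P_j) = 1 / |PP_i(a)|$
Given a path Pj ∈ PPi(a), the possible locations of a with respect to Pj at t ∈ [ti, ti+1] are the set of all positions p ∈ Pj that a can reach from pi within the time period t − ti and from which a can reach pi+1 within the time period ti+1 − t, i.e.,
$$PL_j(a, t) = \{\, p \in P_j : \mathrm{mintime}(p_i, p) \le t - t_i \,\wedge\, \mathrm{mintime}(p, p_{i+1}) \le t_{i+1} - t \,\}$$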
73
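Combine the two probabilities…
[Figure: the possible paths of object a, and its possible locations when t = 4, with point m]
Probability of falling inside segment m p1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr\bigl(a(t) \in S \mid P\bigr)$$
e.g., with a uniform pdf over the possible locations, $\Pr(a(t) \in S \mid P) = |S \cap PL_P(a,t)| \,/\, |PL_P(a,t)|$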
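The combination rule (path probability times the probability of being inside S given that path) can be sketched with positions measured as 1D offsets along each path. Everything here is a hypothetical illustration: the path identifiers, intervals and helper names are made up, and locations are assumed uniform over the possible-location interval:

```python
def overlap(a, b):
    """Length of the overlap of two intervals (start, end)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def prob_in_segment(paths, segment_by_path):
    """Pr(a(t) in S) = sum over possible paths P of Pr(P) * Pr(a(t) in S | P).

    paths: {path_id: (Pr(P), possible-location interval on P at time t)}
    segment_by_path: {path_id: interval that S covers on P}; omit P if S is not on it.
    Locations are assumed uniformly distributed over the possible-location interval.
    """
    total = 0.0
    for pid, (p_path, (lo, hi)) in paths.items():
        seg = segment_by_path.get(pid)
        if seg is not None and hi > lo:
            total += p_path * overlap((lo, hi), seg) / (hi - lo)
    return total

# Two equally likely paths; on path "A" the segment covers half of the
# possible locations at t = 4, path "B" does not contain the segment.
paths = {"A": (0.5, (2.0, 6.0)), "B": (0.5, (1.0, 3.0))}
print(prob_in_segment(paths, {"A": (4.0, 6.0)}))  # → 0.25
```

This reproduces the 0.5 · 0.5 = 0.25 pattern of the example under the stated assumptions.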
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
– Edge hash table
   › Small, in-memory
– Movement R-tree
   › 1D for each edge
– Trajectories list
   › Store samples ordered by time stamp
   › Assign pointer to set of possible paths
   › Earliest-arriving and latest-departure times stored for vertices
   › On-disk, retrieved on demand
76
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" along road-network segments from q over vertices A to E, annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
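The network-expansion step can be sketched as a Dijkstra search bounded by r. This is a minimal sketch under assumptions: q sits on a vertex, the graph is an adjacency dict (a hypothetical format), and an edge counts as part of the expansion when one of its endpoints is closer than r. The R-tree filtering and the refinement steps are not shown:

```python
import heapq

def network_expansion(graph, q, r):
    """Dijkstra from q, bounded by the network distance r.

    graph: {vertex: [(neighbor, length), ...]} with undirected edges listed both ways.
    Returns (distances of vertices closer than r, set of edges touched by the expansion).
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    settled = set()
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        for v, length in graph.get(u, []):
            edges.add(frozenset((u, v)))  # the part of the edge near u is within r
            nd = d + length
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return {v: dist[v] for v in settled}, edges

graph = {
    "q": [("A", 2), ("C", 4)], "A": [("q", 2), ("B", 3)], "B": [("A", 3), ("E", 1)],
    "C": [("q", 4), ("D", 5)], "D": [("C", 5)], "E": [("B", 1)],
}
distances, edges = network_expansion(graph, "q", 5.0)
print(sorted(distances.items()))  # → [('A', 2.0), ('C', 4.0), ('q', 0.0)]
```

The returned edges are the ones whose movement R-trees the filtering step would then fetch.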
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
› Idea
– Discretize time
– For each time instant, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption (positions at different times are treated as independent) prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435–443
83
A highway example…
[Figure: position (pos) of a car on a highway over time (t), a query window Q, and the possible positions bounded by the minimum and maximum speed (vmin, vmax)]
› What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution, the car is inside Q with probability 50% at one time instant and with probability 30% at another
› Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65
– This counts combinations of positions that violate the speed constraints
› Consideration of the dependency between the positions: 0.5
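The gap between 0.65 and 0.5 can be reproduced with two one-line formulas. The slide's figure is not recoverable, so this sketch assumes one reading consistent with its 0.5: every world in which the car is inside Q at the second instant is also inside Q at the first, so the two events are nested rather than independent.

```python
def at_least_once_independent(probs):
    """P(inside Q at least once) if positions at different instants were independent."""
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p  # probability of missing Q at every instant
    return 1.0 - miss

def at_least_once_nested(probs):
    """P(inside Q at least once) if the 'inside' events are nested (each later event implies
    an earlier one), as assumed here for the speed-constrained car."""
    return max(probs)
```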
89
An adequate model
90
Stochastic Processes
› Stochastic processes represent the evolution of some random value or system over time
› They are a sound mathematical model that can describe the uncertain location of an object over time
› Different stochastic processes suit different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., continuous-time Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley
91
A simple example
› Whenever the wooden board is hit, the ball either stays in its hole or drops into one of the neighbouring holes with certain probabilities
› At the border of the board these probabilities are different
› This model is usually learned or given by experts
[Figure: inner hole: left 0.4, stay 0.2, right 0.4; border hole: inward 0.6, stay 0.4]
92
A simple example
› Initial position: (0, 0, 1, 0, 0, 0, 0, 0, 0)
› After first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After 40th hit: ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
93
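How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$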
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^2 = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
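These vector-matrix products are easy to check numerically. The sketch below rebuilds M from the slide (nine holes, starting in hole 3) and multiplies the distribution by M once per hit. Running many more hits approximates the long-run (stationary) distribution, which is (0.08, 0.12, …, 0.12, 0.08), the vector the slide shows for the 40th hit. Plain Python lists keep it dependency-free; with NumPy arrays the same step would be `dist @ M`.

```python
def board_matrix(n=9):
    """Transition matrix of the board: inner holes stay with 0.2 and move left/right with 0.4
    each; border holes stay with 0.4 and move inward with 0.6."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][i], M[i][i + 1] = 0.4, 0.6
        elif i == n - 1:
            M[i][i - 1], M[i][i] = 0.6, 0.4
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

def after_hits(start, M, hits):
    """Distribution over holes after `hits` hits: start * M^hits (row vector times matrix)."""
    dist = list(start)
    n = len(M)
    for _ in range(hits):
        dist = [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]
    return dist
```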
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and differ between object groups
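Learning the transition probabilities from historical data can be sketched as maximum-likelihood counting: count every observed move from state i to state j between consecutive ticks, then normalize each row. This illustration makes simplifying assumptions (fully observed, already discretized state sequences; no smoothing; one matrix for all times and object groups) and is not necessarily the authors' estimator. Storing only observed transitions in nested dicts keeps the matrix sparse.

```python
from collections import Counter, defaultdict

def estimate_transitions(sequences):
    """Maximum-likelihood sparse transition matrix {from_state: {to_state: prob}} from
    observed state sequences (one state per tick)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1  # one observed move a -> b
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: c / total for b, c in row.items()}  # each row sums to 1
    return matrix
```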
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: transition graph over s1, s2, s3, unrolled over t = 0, 1, 2, 3]
› Note: there is an exponential number of possible paths the car might take
› Starting in s1 at t = 0, the state distribution evolves as
– t = 0: (1, 0, 0)
– t = 1: (0, 0, 1)
– t = 2: (0, 0.8, 0.2)
– t = 3: (0.48, 0.16, 0.36)
› Summing these per-instant probabilities would count worlds that are inside the window at both t = 2 and t = 3 twice. Worlds in s2 at t = 2 (0.8) have already hit the window, so only the mass in s3 (0.2) is propagated further: at t = 3 this yields (0, 0.16, 0.04)
› Result = 0.8 + 0.16 = 0.96
› The solution based on matrix multiplications introduces a new state s4 for the "winner" trajectories (those that have already intersected the window) and two matrices: M⁻ is used outside the query interval and M⁺ inside it
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$
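The absorbing-state construction can be checked in a few lines. This sketch (function names are illustrative) builds M⁻ and M⁺ from M by appending a hit state, starts the object in s1 (index 0), applies M⁻ before the query interval and M⁺ inside it, and reads the answer from the last entry. It assumes the object is not in the window at t = 0 (t_from ≥ 1).

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def window_query(M, start, window, t_from, t_to):
    """Distribution after t_to ticks over the original states plus an absorbing 'hit' state
    (last index); the last entry is P(object is in a window state at some tick in
    [t_from, t_to])."""
    n = len(M)
    # M_minus: used outside the query interval; the hit state just keeps its mass.
    M_minus = [list(row) + [0.0] for row in M] + [[0.0] * n + [1.0]]
    # M_plus: used inside the interval; mass entering a window state moves to the hit state.
    M_plus = []
    for row in M:
        new_row = [0.0] * (n + 1)
        for j, p in enumerate(row):
            new_row[n if j in window else j] += p
        M_plus.append(new_row)
    M_plus.append([0.0] * n + [1.0])
    v = list(start) + [0.0]
    for t in range(1, t_to + 1):
        v = vec_mat(v, M_plus if t >= t_from else M_minus)
    return v
```

With the slide's M, `window_query(M, [1, 0, 0], {0, 1}, 2, 3)` ends in (0, 0, 0.04, 0.96), matching the result above.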
104
Multiple Observations
› So far we had only one observation, from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time with one observation at t0 (left) and with two observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class S_i⁻ corresponding to worlds where o is located in state s_i and o has not intersected the window
– One class S_i⁺ corresponding to worlds where o is located in state s_i and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
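Multiple Observations
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3; classes S1⁻, S2⁻, S3⁻ (window not yet intersected) and S1⁺, S2⁺, S3⁺ (window intersected)]
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$
State vectors over (S1⁻, S2⁻, S3⁻, S1⁺, S2⁺, S3⁺):
– t = 0: (1, 0, 0, 0, 0, 0)
– t = 1: (0, 0, 1, 0, 0, 0)
– t = 2: (0, 0, 0.2, 0, 0.8, 0)
– t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)
107
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' theorem, with "hit" = o intersects the window and "obs" = o is observed in s3:
$$P(\text{hit} \mid \text{obs}) = \frac{P(\text{obs} \mid \text{hit}) \cdot P(\text{hit})}{P(\text{obs})} = \frac{P(\text{hit} \wedge \text{obs})}{P(\text{obs} \wedge \text{hit}) + P(\text{obs} \wedge \neg\text{hit})} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$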
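With an observation, the single hit state is split per location. The sketch below builds the 2|S| × 2|S| matrices over the classes S_i⁻ (indices 0..n−1) and S_i⁺ (indices n..2n−1) and then conditions on the observation with Bayes' theorem. It assumes the observation time is at or after the end of the query interval and that the object is not in the window at t = 0; the function names are illustrative.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def split_matrices(M, window):
    """2n x 2n matrices over S_i^- (not yet hit, 0..n-1) and S_i^+ (hit, n..2n-1)."""
    n = len(M)
    M_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    M_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            # Outside the query interval both halves simply evolve with M.
            M_minus[i][j] = p
            M_minus[n + i][n + j] = p
            # Inside the interval, entering a window state moves a world to the + half.
            M_plus[i][n + j if j in window else j] += p
            M_plus[n + i][n + j] = p
    return M_minus, M_plus

def hit_prob_given_observation(M, start, window, t_from, t_to, obs_time, obs_state):
    """P(window [t_from, t_to] was intersected | object observed in obs_state at obs_time)."""
    n = len(M)
    M_minus, M_plus = split_matrices(M, window)
    v = list(start) + [0.0] * n
    for t in range(1, obs_time + 1):
        v = vec_mat(v, M_plus if t_from <= t <= t_to else M_minus)
    hit, miss = v[n + obs_state], v[obs_state]
    return hit / (hit + miss)  # Bayes: P(hit and obs) / P(obs)
```

For the slide's example, the vector at t = 3 is (0, 0, 0.04, 0.48, 0.16, 0.32), and the function returns 0.32 / 0.36 ≈ 0.89.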
108
Summary
› Pros
– Answers time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect model
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most x of the possible trajectories of o may intersect Q)
[Figure: R-tree over the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
kNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate the result probability as the ratio of the samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
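The slide names the remedy (adapting the transition matrices) without giving the construction. One standard way to do it, shown here as an assumption rather than the authors' exact method, is a backward pass. Let beta_t(i) be the probability of reaching the observed state from state i at tick t; each transition M(i, j) at tick t is then reweighted to M(i, j) · beta_{t+1}(j) / beta_t(i) (a Doob h-transform). Every sampled trajectory ends in the observed state, and the fraction of samples satisfying a query estimates its probability. For the window query of the previous slides, that fraction approaches 8/9 ≈ 0.89.

```python
import random

def conditioned_matrices(M, obs_time, obs_state):
    """Per-tick transition matrices of the chain conditioned on being in obs_state at
    obs_time. beta[t][i] = P(in obs_state at obs_time | in state i at tick t)."""
    n = len(M)
    beta = [[0.0] * n for _ in range(obs_time + 1)]
    beta[obs_time][obs_state] = 1.0
    for t in range(obs_time - 1, -1, -1):
        beta[t] = [sum(M[i][j] * beta[t + 1][j] for j in range(n)) for i in range(n)]
    adapted = []
    for t in range(obs_time):
        adapted.append([
            [M[i][j] * beta[t + 1][j] / beta[t][i] if beta[t][i] > 0 else 0.0
             for j in range(n)]
            for i in range(n)
        ])
    return adapted

def sample_path(start_state, adapted, rng):
    """Draw one trajectory (one state per tick) that is consistent with the observation."""
    path = [start_state]
    for P in adapted:
        path.append(rng.choices(range(len(P)), weights=P[path[-1]])[0])
    return path
```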
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728
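As a hedged illustration of why correlations matter for such sequence patterns (this is not Lahar's evaluation algorithm), the probability that a Markov-chain location stream contains office → coffee → office as a subsequence can be computed exactly. The sketch tracks, jointly with the current location, how many pattern elements have been matched so far. The chain, its states, and its probabilities are hypothetical.

```python
from collections import defaultdict

def pattern_probability(M, start, pattern, steps):
    """P(the location sequence over `steps` ticks contains `pattern` as a subsequence),
    for a Markov chain {state: {next_state: prob}} and a start distribution {state: prob}."""
    # Joint distribution over (current location, number of pattern elements matched).
    dist = defaultdict(float)
    for loc, p in start.items():
        dist[(loc, 1 if pattern[0] == loc else 0)] += p
    for _ in range(steps):
        nxt = defaultdict(float)
        for (loc, k), p in dist.items():
            for loc2, q in M[loc].items():
                # Advance the match only if the next location is the next pattern element.
                k2 = k + 1 if k < len(pattern) and pattern[k] == loc2 else k
                nxt[(loc2, k2)] += p * q
        dist = nxt
    return sum(p for (loc, k), p in dist.items() if k == len(pattern))
```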
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43–49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435–443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728
10
Spatio-Temporal Data
• (object, location, time) triples
• Queries, e.g., "Find friends that attended the same concert last Saturday"
• Best case: continuous function
[Figure: GPS log taken from a thirty-minute drive through Seattle]
Dataset provided by P. Newson and J. Krumm: Hidden Markov Map Matching Through Noise and Sparseness. ACM GIS 2009
11
Sources of Uncertainty
• Missing observations
  – Missing GPS signal
  – RFID sensors available in discrete locations only
  – Wireless sensor nodes sending infrequently to preserve energy
  – Infrequent check-ins of users of geo-social networks
Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011
12
Sources of Uncertainty
• Uncertain observations
  – Imprecise sensor measurements (e.g., radio triangulation, Wi-Fi positioning)
  – Inconsistent information (e.g., contradictory sensor data)
  – Human errors (e.g., in crowd-sourcing applications)
From the database perspective, the position of a mobile object is uncertain
Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011
15
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results, thereby enhancing the underlying decision-making process
Improve the quality of modern location based applications and of research results in the field
16
Uncertain Spatial Data Models
• Discrete models
• Continuous models
Possible World Semantics
17
Possible World Semantics
• A collection of uncertain spatial objects defines an uncertain spatial database
• Combinations of object instances define possible database instances, called possible worlds
• Assumption: the probability of a possible world can be computed efficiently
18
Answering Queries using PWS
Let
• $DB$ be an uncertain database
• $\mathcal{W}$ be the set of possible worlds of $DB$
• $q$ be a query predicate
• $I_q(w)$ be an indicator function returning one if predicate $q$ holds in world $w$ and zero otherwise
The probability that a query predicate $q$ holds on an uncertain database $DB$ is defined as
$$P(q, DB) = \sum_{w \in \mathcal{W}} P(w) \cdot I_q(w)$$
Possible World Semantics
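Possible world semantics can be implemented literally by enumerating every world, which makes the exponential cost visible. The sketch below assumes discrete uncertain objects that are mutually independent, each given as a list of (position, probability) alternatives, and answers a count query (how many objects lie inside a region) by summing P(w) over the worlds with each count. The three objects and their coordinates are hypothetical.

```python
from itertools import product

def count_distribution_bruteforce(objects, inside):
    """Exact distribution {k: P(exactly k objects satisfy `inside`)} by enumerating every
    possible world: one alternative per object, objects independent."""
    result = {}
    for world in product(*objects):  # |alternatives|^n worlds
        p = 1.0
        count = 0
        for pos, prob in world:
            p *= prob                # P(w) is the product of the chosen alternatives
            count += inside(pos)     # True counts as 1
        result[count] = result.get(count, 0.0) + p
    return result
```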
Possible Worlds Example II
[Figure: a set of uncertain spatial objects labeled A–Z]
27
Too many possible worlds
28
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects
Are there efficient solutions to query uncertain spatial data?
In general: no
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
This hardness carries over to uncertain spatial databases (by reduction)
But: specific queries may have polynomial-time solutions
[DalviSuciu04] N. N. Dalvi and D. Suciu: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004
29
Querying Uncertain Data: Running Example
[Figure: circular query region centered at query point q]
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds
30
Data Cleaning: Aggregation
[Figure: query region around q with uncertain objects C, D, H, I]
Ignore uncertainty (data cleaning):
replace uncertain objects by a deterministic "best guess"
– Expected positions
– Most-likely positions
– …
Query results are not reliable
Query results may be biased
31
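Equivalent Worlds: An Intuition
[Figure: query region around q]
Observation 1
For any possible world w and any possible world w′ derived from w by changing the position of an object o outside the query region, the query result is the same on w and w′, i.e., the two worlds are equivalent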
32
Equivalent Worlds: An Intuition
[Figure: query region around q]
Observation 1 allows us to discard objects outside of the query region
Remaining number of equivalence classes of possible worlds
33
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
[Figure: query region around q with the remaining objects C, D, H, I]
Querying Uncertain Spatial Data
34
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
Remaining number of equivalence classes of possible worlds
[Figure: query region around q with the remaining objects C, D, H, I]
35
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region; information about the concrete result objects can be discarded
[Figure: query region around q with the remaining objects C, D, H, I]
36
[Figure: query point q and uncertain objects A, B, C, H; probability labels 0.8, 0.2, 0.4]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
39
[Figure: the same objects with A, B, C replaced by x; probability labels 0.8, 0.2, 0.4]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects, i.e., substitute A, B, C by x
Each monomial c·x^k implies that the probability of having exactly k results equals c
40
Generating Functions: Formally
For each object $o_i$, let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_{i=1}^{n} \big( (1 - p_i) + p_i \, x \big)$$
In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region
[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502–513 (2009)
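The generating function can be expanded incrementally, one object at a time, in O(n²) time instead of enumerating 2^n worlds; in the sketch below, `coeffs[k]` is the coefficient of x^k. Assuming the running example's labels 0.8, 0.2 and 0.4 are the probabilities of the three objects lying inside the query region, the expansion is 0.096 + 0.472x + 0.368x² + 0.064x³.

```python
def count_distribution(probs):
    """Coefficients of prod_i ((1 - p_i) + p_i * x); coeffs[k] = P(exactly k objects inside)."""
    coeffs = [1.0]  # empty product
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside: degree unchanged
            nxt[k + 1] += c * p      # object inside: degree + 1
        coeffs = nxt
    return coeffs
```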
41
Example
[Figure: query point q and uncertain objects A, B, C, H; probability labels 0.8, 0.2, 0.4]
Count Queries on Uncertain Data
45
The Paradigm of Equivalent Worlds
Given a query predicate q and an uncertain database DB, we can answer q on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and without bias
Evaluate the query predicate on each world
The distribution of sampled results is an unbiased approximation of the true distribution of results
48
[Figure: query point q and uncertain objects A, B, C, H; probability labels 0.8, 0.2, 0.4]
Sampling Example
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimations
49
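[Figure: query point q and uncertain objects A, B, C, H; probability labels 0.8, 0.2, 0.4]
Sampling: Confidences
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of the estimators
E.g., the Wald test (Wald confidence interval for an estimate $\hat{p}$ from $n$ samples):
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-percentile of the standard normal distribution
At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638]
[Figure: the interval compared with the true probability]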
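A sketch of both steps: sample possible worlds to estimate a probability, then attach a Wald interval. The object probabilities are the same assumed values as in the count example (0.8, 0.2, 0.4), under which the exact probability of exactly one object inside is 0.472. The slide's interval [0.442, 0.638] is what the formula gives for an estimate of 0.54 from n = 100 samples at α = 0.05; that α and estimate are inferred from the numbers, not stated on the slide.

```python
import math
import random
from statistics import NormalDist

def sample_count_probability(probs, k, n, rng):
    """Fraction of n sampled possible worlds in which exactly k objects are inside."""
    hits = 0
    for _ in range(n):
        inside = sum(rng.random() < p for p in probs)  # one independent draw per object
        hits += inside == k
    return hits / n

def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval p_hat +- z_{1-alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half
```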
50
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solutions for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  – Motion
  – Location uncertainty
• Queries
  – Part 1: NN (free-space motion): semantics and processing
  – Part 2: range (road-network constraints): semantics and processing
• Uncertainty – the flip-side
53
Model of a trajectory
[Figure: X, Y, and time axes; the 3D trajectory up to the present time and the corresponding 2D route]
Approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
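The 3D trajectory is commonly approximated as a polyline through the (location, time) samples, traversed at constant speed between samples. This is why acceleration and deceleration are not captured. A minimal sketch of evaluating such a polyline, assuming at least two samples given as (t, x, y) tuples with strictly increasing t:

```python
from bisect import bisect_right

def position_at(samples, t):
    """Location at time t on a piecewise-linear trajectory through (t, x, y) samples."""
    times = [s[0] for s in samples]
    if len(samples) < 2 or not times[0] <= t <= times[-1]:
        raise ValueError("need two samples and t within the sampled time span")
    i = min(bisect_right(times, t), len(samples) - 1)  # index of the segment's end sample
    (t0, x0, y0), (t1, x1, y1) = samples[i - 1], samples[i]
    f = (t - t0) / (t1 - t0)  # constant speed: linear interpolation in time
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
```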
54
Mobility Models
• Sequence of (location, time) updates (e.g., GPS-based)
• Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
• (location, time, velocity vector) updates
[Figure: time axis, now →]
55
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (a.k.a. space-time prisms)
  – Exact location samples are assumed
  – Unknown (but bounded) speed in between samples, i.e., a known maximum speed
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
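Under Model 1, a point p is a possible location at time t between two exact samples (pa, ta) and (pb, tb) exactly when two conditions hold at speed at most vmax: p can be reached from pa by time t, and pb can still be reached from p by tb. The set of such (p, t) is the bead. A minimal membership test (names are illustrative; free-space Euclidean motion is assumed):

```python
import math

def in_bead(p, t, sample_a, sample_b, vmax):
    """True if location p at time t is possible between exact samples (pa, ta) and (pb, tb)
    when the object's speed never exceeds vmax (space-time prism / bead)."""
    (pa, ta), (pb, tb) = sample_a, sample_b
    if not ta <= t <= tb:
        return False
    reachable_from_a = math.dist(p, pa) <= vmax * (t - ta)  # could have got here from pa
    can_reach_b = math.dist(p, pb) <= vmax * (tb - t)       # can still make it to pb
    return reachable_from_a and can_reach_b
```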
57
Modeling Uncertain Trajectories: Model 2
Constraints
• Motion along the road network
• Uncertainty restricted to road segments
[Figure: space-time prism on the road network between p1 at t1 and p2 at t2, with a location q at time t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3 – Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
– The impact of uncertainty on the (structure of the) answer
– The impact of constraints, e.g., the possible routes to be taken
Processing algorithms
– Impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) space at times T = tb, T = t1, T = te]
Example: assuming that Trq is the querying trajectory during [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: Trq, Tr1, Tr2, Tr3 in (X, Y) space at times T = tb, tb1, t11, t1, te]
Observations: adding uncertainty to the previous example implies
– Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN-probability
– Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree with root [TrQ, tb, te]; level-1 nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); each node has children of the form (trajectory, sub-interval, D) refining its interval, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), and grandchildren such as (Tri121, t11, t121, D121)]
Root: parameters of the query
– Querying trajectory Trq
– Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q; uncertain objects Tr1–Tr5 with disk-shaped possible locations; radii Rmin, Rmax, Rd]
Trq = 2D point Q; the possible locations of the other objects are bounded by circles
Observation
– Any object whose minimum distance from Q is > Rmax (the smallest of the objects' maximum distances from Q) has a "0" probability of being the NN of Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i^{WD}(Q, R_d) = \iint_{\lVert (x, y) - Q \rVert \le R_d} pdf_i(x, y)\, dx\, dy$$
The probability that a given object Trj is the NN of Q is given by
$$P_j^{NN}(Q) = \int_{0}^{R_{max}} \frac{d P_j^{WD}(Q, R_d)}{d R_d} \prod_{i \ne j} \big(1 - P_i^{WD}(Q, R_d)\big)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the others are at distances > Rd, integrated over all possible Rd's (NOTE: the upper bound is Rmax…)
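Instead of evaluating these integrals numerically, the sketch below estimates the NN probabilities by Monte-Carlo sampling (a swapped-in technique), after applying the Rmax pruning from the observation. It assumes that Q is crisp and that each object's location is uniformly distributed over a disk given as (center, radius).

```python
import math
import random

def sample_in_disk(center, r, rng):
    """Uniform sample from a disk (sqrt on the radius keeps the density uniform)."""
    rho = r * math.sqrt(rng.random())
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi))

def nn_probabilities(q, disks, n, rng):
    """Monte-Carlo estimate of P(object i is the NN of crisp point q) for objects uniform in
    disks[i] = (center, radius); objects whose min distance exceeds Rmax are pruned."""
    max_d = [math.dist(q, c) + r for c, r in disks]
    min_d = [max(0.0, math.dist(q, c) - r) for c, r in disks]
    r_max = min(max_d)  # some object is certainly within r_max of q
    candidates = [i for i in range(len(disks)) if min_d[i] <= r_max]
    wins = dict.fromkeys(candidates, 0)
    for _ in range(n):
        # Draw one location per candidate and record which one is closest to q.
        best = min(candidates, key=lambda i: math.dist(q, sample_in_disk(*disks[i], rng)))
        wins[best] += 1
    return {i: wins[i] / n for i in candidates}
```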
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query location TrQ and objects Tr1–Tr5 with radii Rmin and Rmax; points Z1 and Z2]
dist(Z1, Z2) < Rmax
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs pdf(Trq), pdf(Tr1), pdf(Tr2), each with constant density over a disk of diameter 2r in the x–y plane]
Example: assuming a uniform pdf of possible locations, the figure marks two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$P\big(\lVert V_i - V_q \rVert \le R_d\big)$$
Define Viq = Vi – Vq (cross-correlation) as another random variable, Viq = Vi + (–Vq), which, due to independence, implies that the pdf of Viq is the convolution of the pdfs of Vi and –Vq
67
[Figure: pdf(Tr1 – Trq) and pdf(Tr2 – Trq) in the x–y plane, each supported on a disk of diameter 4r]
Let Viq = Vi + (–Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
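The within-distance probability P(‖Vi − Vq‖ ≤ Rd) can also be estimated by sampling Viq = Vi + (−Vq) directly. This makes Property 1 easy to observe: the empirical mean of the samples approaches Ci − Cq. A Monte-Carlo sketch, assuming both locations are independent and uniform over disks of the same radius r:

```python
import math
import random

def uniform_disk(center, r, rng):
    """Uniform sample from a disk of radius r around center."""
    rho = r * math.sqrt(rng.random())
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi))

def within_distance_prob(ci, cq, r, rd, n, rng):
    """Estimate P(|Vi - Vq| <= rd) and the empirical centroid of Viq = Vi + (-Vq)."""
    hits = 0
    sx = 0.0
    sy = 0.0
    for _ in range(n):
        (xi, yi), (xq, yq) = uniform_disk(ci, r, rng), uniform_disk(cq, r, rng)
        dx, dy = xi - xq, yi - yq  # one sample of Viq
        sx += dx
        sy += dy
        hits += math.hypot(dx, dy) <= rd
    return hits / n, (sx / n, sy / n)
```

The support of Viq is a disk of radius 2r around Ci − Cq (diameter 4r, as in the figure), so any rd at or beyond that bound yields probability 1.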
68
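The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = V_{xiq}^2 + V_{yiq}^2, \quad B = 2\,(X_{iq} V_{xiq} + Y_{iq} V_{yiq}), \quad C = X_{iq}^2 + Y_{iq}^2$$
where (Xiq, Yiq) denotes the expected relative location at t = 0; this is a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)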
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t); squaring both sides leaves a quadratic equation in t). Call such pairwise intersections critical time-points
Example: lower envelope LE12 of two distance trajectories TR1 and TR2
[Figure: distance (vertical axis) over time (horizontal axis) from tb; TR1, TR2, and their lower envelope LE12, with critical time-points t11 and t12]
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: LE12 (of TR1 and TR2, critical times t11, t12) and LE34 (of TR3 and TR4, critical time t31) over [tb, te] are merged into LE1234, which has new critical times t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
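A compact way to see critical time-points and the envelope together is a brute-force construction: collect all pairwise crossings, then pick the lowest function on each sub-interval. This is a deliberately simpler O(N²)-crossings alternative, not the merge-sort-style O(N log N) construction above. Each distance function is represented by the hypothetical coefficient triple (A, B, C) of its square, d(t)² = A t² + B t + C, which is enough because comparing squared distances preserves the order.

```python
import math

def critical_times(fi, fj, tb, te):
    """Times in (tb, te) where two distance functions d(t) = sqrt(A t^2 + B t + C) meet.
    Squaring d_i = d_j gives a quadratic in t, hence at most two crossings."""
    a, b, c = (fi[k] - fj[k] for k in range(3))
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]  # degenerate case: linear equation
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted(t for t in roots if tb < t < te)

def lower_envelope(funcs, tb, te):
    """Brute-force lower envelope as a list of (function index, start, end) pieces."""
    cuts = {tb, te}
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            cuts.update(critical_times(funcs[i], funcs[j], tb, te))
    cuts = sorted(cuts)
    pieces = []
    for t0, t1 in zip(cuts, cuts[1:]):
        mid = (t0 + t1) / 2  # no crossing inside (t0, t1), so the midpoint decides
        best = min(range(len(funcs)),
                   key=lambda k: funcs[k][0] * mid * mid + funcs[k][1] * mid + funcs[k][2])
        if pieces and pieces[-1][0] == best:
            pieces[-1] = (best, pieces[-1][1], t1)  # extend the previous piece
        else:
            pieces.append((best, t0, t1))
    return pieces
```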
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions × time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1–TR7 over [tb, te] with critical times t1, t2, t3; the lower envelope, the 4r band above it, and the 2nd lower envelope (nodes in level 2 of the IPAC-NN tree)]
72
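Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti, i.e.,
$$PP_i(a) = \{\, P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mincost}(P) \le t_{i+1} - t_i \,\}$$
Example: if the pdf of selecting a possible path is uniform, each path P ∈ PPi(a) is taken with probability 1 / |PPi(a)|
Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all positions p ∈ Pj that a can reach from pi within time period t – ti and from which a can reach pi+1 within time period ti+1 – t, i.e.,
$$PL_{P_j}^{a}(t) = \{\, p \in P_j \mid \mathrm{mincost}(p_i, p) \le t - t_i,\ \mathrm{mincost}(p, p_{i+1}) \le t_{i+1} - t \,\}$$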
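The set of possible paths PPi can be enumerated with a depth-first search that abandons any partial path whose minimum travel time already exceeds t_{i+1} − t_i. The sketch below assumes a road graph stored as a hypothetical dict `{vertex: [(neighbor, min_travel_time), ...]}` and enumerates simple paths only. The exhaustive search is exponential in the worst case and is meant for illustration. The test graph mimics the two-route situation of the construction example (v1→v5 directly, or via v3 and v4), with made-up travel times.

```python
def possible_paths(adj, src, dst, budget):
    """All simple paths src -> dst whose minimum travel time is <= budget,
    returned as (vertex list, minimum travel time) pairs."""
    result = []

    def dfs(u, path, cost):
        if u == dst:
            result.append((list(path), cost))
            return
        for v, w in adj.get(u, ()):
            if v not in path and cost + w <= budget:  # prune paths that are too slow
                path.append(v)
                dfs(v, path, cost + w)
                path.pop()

    dfs(src, [src], 0)
    return result

def uniform_path_probabilities(paths):
    """The slide's example pdf: every possible path is equally likely."""
    return [1.0 / len(paths)] * len(paths)
```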
73
Combine the two probabilities…
[Figure: possible paths of object a and its possible locations when t = 4, with the segment from m to p1]
Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t (e.g., with a uniform pdf):
$$\Pr(a \in S, t) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\big(S \cap PL_p^{a}(t)\big)$$
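The combination formula can be sketched for paths parameterized by distance from p_i. With speed at most vmax, the possible locations on a path of length L at time t form the interval [max(0, L − vmax·(t_{i+1} − t)), min(L, vmax·(t − t_i))]. With a uniform pdf, the conditional probability of being in S is the fraction of that interval covered by S. The numbers in the test are hypothetical, chosen to reproduce the slide's 0.5 · 0.5 = 0.25.

```python
def location_interval(length, vmax, t, t_start, t_end):
    """Possible positions (distance from p_i along the path) at time t, for an object that
    leaves p_i at t_start, reaches p_{i+1} at t_end, and never exceeds speed vmax."""
    lo = max(0.0, length - vmax * (t_end - t))  # must still be able to reach p_{i+1}
    hi = min(length, vmax * (t - t_start))      # cannot have travelled farther than this
    return (lo, hi) if lo <= hi else None

def prob_in_segment(paths, vmax, t, t_start, t_end):
    """Pr(a in S at t) = sum over paths of Pr(path) * Pr(a in S | possible locations), with a
    uniform pdf over each path's possible-location interval. `paths` holds tuples
    (path probability, path length, (s0, s1) part of S on the path, or None)."""
    total = 0.0
    for p_path, length, seg in paths:
        interval = location_interval(length, vmax, t, t_start, t_end)
        if interval is None or seg is None:
            continue
        lo, hi = interval
        if hi == lo:  # degenerate interval: a single possible position
            total += p_path * (seg[0] <= lo <= seg[1])
            continue
        overlap = max(0.0, min(hi, seg[1]) - max(lo, seg[0]))
        total += p_path * overlap / (hi - lo)
    return total
```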
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1
At the time instant t = 4, the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
– Edge hash-table: small, in-memory
– Movement R-tree: 1D, one for each edge
– Trajectories list
› Store samples ordered by time-stamp
› Assign a pointer to the set of possible paths
› Earliest-arrival and latest-departure times stored for vertices
› On-disk, retrieved on demand
76
› Network expansion
– Expansion tree: a tree rooted at q that contains the edges whose network distance from q is smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data: probabilistic results (yes), time dimension (no)
vs.
Geometric models for uncertain spatio-temporal data: probabilistic results (no), time dimension (yes)
82
Merge the approaches
› Idea
– Discretize time
– For each time instant, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009, 435-443
83
A highway example…
[Figure: position (pos) of the car over time (t), with query region Q]
What is the probability that the car is in some area at least once during some time interval?
84
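A highway example…
[Figure: position (pos) over time (t), with query region Q and the positions reachable with minimum and maximum speed vmin, vmax]
What is the probability that the car is in some area at least once during some time interval?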
85
A highway example…
[Figure: position (pos) over time (t), with speed bounds vmin, vmax and query region Q]
Assuming a uniform distribution…
86
A highway example…
[Figure: at two time instants, 50% and 30% of the possible positions lie inside Q]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
87
A highway example…
[Figure: at two time instants, 50% and 30% of the possible positions lie inside Q]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Violation of the speed constraints
88
A highway example…
[Figure: at two time instants, 50% and 30% of the possible positions lie inside Q]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Considering the dependency: 0.5
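The two numbers on these slides can be reproduced as follows. Write A and B (our notation) for the events that the car is inside Q at the two time instants, with P(A) = 0.5 and P(B) = 0.3 read off the figure. The independence assumption multiplies the complements. The dependency-aware value 0.5 follows if, as we assume here, every speed in [vmin, vmax] that puts the car into Q at one instant also puts it there at the other, so that B ⊆ A:

```latex
\begin{aligned}
P_{\text{indep}}(A \cup B) &= 1 - (1 - 0.5)(1 - 0.3) = 1 - 0.35 = 0.65\\
P(A \cup B) &= P(A) + P(B) - P(A \cap B) = 0.5 + 0.3 - 0.3 = 0.5 \qquad (B \subseteq A)
\end{aligned}
```

Under independence, combinations of positions that no single admissible speed can produce are counted as well. This is why 0.65 overestimates the answer.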
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes exist for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley
91
A simple example
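› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: in the interior, the ball moves to each neighbouring hole with probability 0.4 and stays with probability 0.2; at the border, it stays with probability 0.4 and moves inward with probability 0.6]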
92
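A simple example
› Initial position: third hole
› After the first hit: 0.4, 0.2, 0.4 (holes 2 to 4)
› After the second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (holes 1 to 5)
› After the 40th hit: approaching 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08 (all nine holes)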
93
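How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$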
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40}$, which approaches the stationary distribution $(0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
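The ball example can be checked numerically. This is a minimal NumPy sketch. It assumes row vectors multiplied from the left, as on the slides, and zero-based hole indices; the helper name `ball_transition_matrix` and its parameters are ours. The sketch reproduces the first two hits exactly. For the 40th hit, the vector shown is the chain's stationary distribution. After 40 hits, M^40 is still slightly skewed towards the starting side, so instead of comparing against it the sketch checks that the vector is left unchanged by M.

```python
import numpy as np

def ball_transition_matrix(n=9, stay=0.2, move=0.4, border_stay=0.4):
    """Row-stochastic matrix M[from, to] for a board with n holes."""
    M = np.zeros((n, n))
    for i in range(1, n - 1):
        M[i, i - 1] = M[i, i + 1] = move       # drop into a neighbouring hole
        M[i, i] = stay                         # stay in the current hole
    M[0, 0] = M[n - 1, n - 1] = border_stay
    M[0, 1] = M[n - 1, n - 2] = 1.0 - border_stay
    return M

M = ball_transition_matrix()
p0 = np.zeros(9)
p0[2] = 1.0                                    # the ball starts in the third hole
p1 = p0 @ M                                    # after the first hit
p2 = p1 @ M                                    # after the second hit
p40 = p0 @ np.linalg.matrix_power(M, 40)       # after the 40th hit
stationary = np.array([0.08] + [0.12] * 7 + [0.08])

print(np.round(p1, 2))
print(np.round(p2, 2))
print(np.round(p40, 3))
print(np.allclose(stationary @ M, stationary)) # the limit vector is a fixed point of M
```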
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
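The slide does not say how the transition probabilities are learned. A common baseline, assumed here, is the maximum-likelihood estimate obtained by counting transitions between discretized states in historical trajectories. The function name is ours, and a dense array is used for brevity where a sparse matrix type would suit the very sparse real matrices. Real data would typically also need smoothing for unseen transitions and, per the slide, separate matrices for different times or object groups.

```python
import numpy as np

def estimate_transition_matrix(sequences, n_states):
    """Maximum-likelihood estimate of M[from, to] by counting observed transitions."""
    counts = np.zeros((n_states, n_states))
    for seq in sequences:                  # each seq: state ids at consecutive ticks
        for a, b in zip(seq, seq[1:]):
            counts[a, b] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # rows of never-observed states stay all-zero instead of dividing by zero
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

history = [[0, 2, 1, 0, 2], [1, 2, 2, 1]]
print(estimate_transition_matrix(history, 3))
```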
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: transition graph over the states s1, s2, s3 with the transition probabilities of M]
Note: there is an exponential number of possible paths the car might take
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0)
98
99
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
100
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
101
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)
102
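ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3 (propagating only the mass 0.2 that has not yet entered the window): (0, 0.16, 0.04)
Result = 0.8 + 0.16 = 0.96
› The solution based on matrix multiplications introduces a new state for the "winner" trajectories (those that have already hit the window) and two matrices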
103
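ST - Window Queries
States s1, s2, s3 plus a new absorbing state s4 for the worlds that have already hit the window; $M^-$ is applied for transitions into ticks outside T, $M^+$ for transitions into ticks inside T:
$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$
$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$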
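A sketch of the two-matrix technique in NumPy. It assumes row vectors multiplied from the left as on the slides, zero-based state indices, and integer ticks; the function name and signature are ours, not from the cited paper. Moves into a window state during T are redirected to the extra absorbing "hit" state, whose final mass is the answer.

```python
import numpy as np

def window_query_probability(M, p0, window_states, T):
    """P(object is in one of window_states at some tick in T = (t_start, t_end)).

    An absorbing state n collects the "hit" worlds; M_minus is used for
    transitions into ticks outside T, M_plus for transitions into ticks inside T."""
    n = M.shape[0]
    M_minus = np.eye(n + 1)
    M_minus[:n, :n] = M
    M_plus = M_minus.copy()
    for s in window_states:                # redirect moves into the window to the hit state
        M_plus[:n, n] += M[:, s]
        M_plus[:n, s] = 0.0
    t_start, t_end = T
    p = np.append(p0, 0.0)
    if t_start == 0:                       # already inside the window at t = 0
        p[n] = p[list(window_states)].sum()
        p[list(window_states)] = 0.0
    for t in range(1, t_end + 1):
        p = p @ (M_plus if t >= t_start else M_minus)
    return p[n]

M = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.4],
              [0.0, 0.8, 0.2]])
print(window_query_probability(M, np.array([1.0, 0.0, 0.0]), window_states=[0, 1], T=(2, 3)))
```

For the slide's example this returns 0.96 up to floating-point rounding. Each step is a single vector-matrix product, so the cost grows linearly with the length of the query interval rather than with the exponential number of paths.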
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 (left) and two observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
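Multiple Observations
States in the order $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$; $\neg\blacksquare$ marks the worlds that have not hit the window ($S_i^-$), $\blacksquare$ the worlds that have ($S_i^+$)
t = 0: (1, 0, 0, 0, 0, 0); t = 1: (0, 0, 1, 0, 0, 0); t = 2: (0, 0, 0.2, 0, 0.8, 0); t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)
$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$
$M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$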
107
Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare) \cdot P(\blacksquare)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$
With the distribution at t = 3, (0, 0, 0.04, 0.48, 0.16, 0.32) over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$:
$P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$
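The six-state version and the Bayes step can be reproduced in a few lines of NumPy. This is a sketch under our assumptions: zero-based states, the window {s1, s2} during T = [2, 3], an observation in s1 at t = 0 and another in s3 at t = 3. It builds M^- and M^+ as on the slides rather than reusing any released code.

```python
import numpy as np

M = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.4],
              [0.0, 0.8, 0.2]])
n = M.shape[0]
window = [0, 1]                                   # s1, s2 form the query window
T = (2, 3)

# State order: S1-, S2-, S3-, S1+, S2+, S3+ ("-" = window not yet hit, "+" = hit)
M_minus = np.block([[M, np.zeros((n, n))], [np.zeros((n, n)), M]])
M_plus = M_minus.copy()
for s in window:                                  # a "-" world entering the window becomes "+"
    M_plus[:n, n + s] += M_plus[:n, s]
    M_plus[:n, s] = 0.0

p = np.zeros(2 * n)
p[0] = 1.0                                        # first observation: s1 at t = 0 (not in T)
for t in range(1, T[1] + 1):
    p = p @ (M_plus if T[0] <= t <= T[1] else M_minus)

observed = 2                                      # second observation: s3 at t = 3
posterior = p[n + observed] / (p[observed] + p[n + observed])
print(np.round(p, 2), round(posterior, 2))
```

Conditioning on the second observation only requires the two entries for s3, which is why the split into hit and not-hit classes is enough.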
108
Summary
› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index over the (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
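Here is a sketch of the basic sampling idea, without the paper's adaptation of transition matrices to later observations. Trajectories are simulated forward from a single observation, and the sketch counts how many satisfy the query. The window query from the earlier slides is reused as the predicate only because its exact answer (0.96) is known; a kNN predicate would replace the hit test. Function and parameter names are ours.

```python
import numpy as np

def sample_window_probability(M, start_state, window_states, T, n_samples=20000, seed=0):
    """Monte-Carlo estimate: fraction of sampled trajectories that visit a window
    state at some tick in T = (t_start, t_end)."""
    rng = np.random.default_rng(seed)
    window = set(window_states)
    t_start, t_end = T
    hits = 0
    for _ in range(n_samples):
        state = start_state
        hit = t_start == 0 and state in window
        for t in range(1, t_end + 1):
            if hit:
                break
            state = rng.choice(len(M), p=M[state])   # draw the next state from row `state`
            hit = t >= t_start and state in window
        hits += hit
    return hits / n_samples

M = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.4],
              [0.0, 0.8, 0.2]])
print(sample_window_probability(M, start_state=0, window_states=[0, 1], T=(2, 3)))
```

The estimate varies with the seed. With 20,000 samples its standard error here is roughly 0.0014.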
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
rsaquo Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, and Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, and Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, and Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, and Matthias Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, and Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
11
Sources of Uncertainty
• Missing observations
  • Missing GPS signal
  • RFID sensors available in discrete locations only
  • Wireless sensor nodes sending infrequently to preserve energy
  • Infrequent check-ins of users of geo-social networks
Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011
12
Sources of Uncertainty
• Uncertain observations
  • Imprecise sensor measurements (e.g., radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g., contradicting sensor data)
  • Human errors (e.g., in crowd-sourcing applications)
From a database perspective, the position of a mobile object is uncertain
Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011
13
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
14
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process
15
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process
Improve the quality of modern location based applications and of research results in the field
16
Uncertain Spatial Data Models
• Discrete models
• Continuous models
Possible World Semantics
17
Possible World Semantics
• A collection of uncertain spatial objects defines an uncertain spatial database
• Combinations of object instances define possible database instances, called possible worlds
• Assumption: the probability of a possible world can be computed efficiently
18
Answering Queries using PWS
Let
• $DB$ be an uncertain database,
• $\mathcal{W}$ be the set of possible worlds of $DB$,
• $\varphi$ be a query predicate,
• $I_\varphi(w)$ be an indicator function returning one if predicate $\varphi$ holds in world $w$ and zero otherwise.
The probability that a query predicate $\varphi$ holds on an uncertain database $DB$ is defined as
$P(\varphi, DB) = \sum_{w \in \mathcal{W}} P(w) \cdot I_\varphi(w)$
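For a count query, the definition can be applied literally by enumerating every possible world. The sketch makes two assumptions: each object is reduced to a single probability of lying inside the query region, and objects are independent. The probabilities 0.8, 0.2 and 0.4 are taken from the running example used later in the deck, and the function name is ours. Its cost is exponential in the number of objects, which is the problem the following slides address.

```python
from itertools import product

def count_distribution_pws(p_inside):
    """Exact P(count = k) by enumerating all 2^n possible worlds, where object i
    lies inside the query region with probability p_inside[i], independently."""
    n = len(p_inside)
    dist = [0.0] * (n + 1)
    for world in product([False, True], repeat=n):     # one possible world per combination
        prob = 1.0
        for inside, p in zip(world, p_inside):
            prob *= p if inside else 1.0 - p           # P(w) under independence
        dist[sum(world)] += prob                       # I(w) = 1 for the predicate "count = k"
    return dist

print(count_distribution_pws([0.8, 0.2, 0.4]))
```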
Possible Worlds Example II
[Figure: uncertain objects A to Z]
20
Possible Worlds Example II
[Figure: uncertain objects A to Z]
21
22
23
24
25
26
27
Too many possible worlds
28
Querying Uncertain Data Complexity
Naive query processing is exponential in the number of objects
Are there efficient solutions to query uncertain spatial data?
In general: No
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
This problem can be reduced to uncertain spatial databases
But: specific queries may have polynomial-time solutions
[DalviSuciu04] N. N. Dalvi and D. Suciu: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004
29
Querying Uncertain Data: Running Example
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds
30
Data Cleaning: Aggregation
[Figure: query region around q with objects C, D, H, I]
Ignore uncertainty (data cleaning)
Replace uncertain objects by a deterministic "best guess"
Expected positions
Most-likely positions
…
Query results are not reliable
Query results may be biased
31
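Equivalent Worlds: An Intuition
Observation 1
For any possible world $w$ and any possible world $w'$ derived from $w$ by changing the position of an object $o$ that lies outside of the query region in both worlds, the following equivalence holds: the query result on $w$ equals the query result on $w'$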
32
Equivalent Worlds: An Intuition
Observation 1 allows us to discard objects outside of the query region
Remaining number of equivalence classes of possible worlds
33
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
[Figure: query region around q with the remaining objects C, D, H, I]
Querying Uncertain Spatial Data
34
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
Remaining number of equivalence classes of possible worlds
[Figure: query region around q with the remaining objects C, D, H, I]
35
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region. Information about the concrete result objects can be discarded
[Figure: query region around q with the remaining objects C, D, H, I]
36
[Figure: query region around q; the objects A, B and C lie inside it with probabilities 0.8, 0.2 and 0.4]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
37
[Figure: the same objects, anonymized as x, with probabilities 0.8, 0.2 and 0.4]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects, i.e., substitute A, B, C by x
38
[Figure: the same objects, anonymized as x, with probabilities 0.8, 0.2 and 0.4]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects, i.e., substitute A, B, C by x
39
[Figure: the same objects, anonymized as x, with probabilities 0.8, 0.2 and 0.4]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects, i.e., substitute A, B, C by x
Each monomial $c_k x^k$ implies that the probability of having exactly $k$ results equals $c_k$
40
For each object $o_i$, let $p_i$ denote the probability that $o_i$ is inside the query region. Consider the following generating function [2]:
$\mathcal{F}(x) = \prod_i \left( (1 - p_i) + p_i \cdot x \right)$
In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region
Generating Functions: Formally
[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
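Expanding the product one factor at a time is plain polynomial multiplication. Its cost is quadratic in the number of objects instead of exponential. This is a sketch assuming independent objects; the function name is ours. For the figure's probabilities 0.8, 0.2 and 0.4 it yields P(0) = 0.096, P(1) = 0.472, P(2) = 0.368 and P(3) = 0.064. Which of A, B, C carries which probability does not affect the count distribution.

```python
def count_distribution_gf(p_inside):
    """Coefficients of prod_i ((1 - p_i) + p_i * x); coefficient k = P(count = k)."""
    coeffs = [1.0]                             # the constant polynomial 1
    for p in p_inside:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)            # object outside: degree unchanged
            nxt[k + 1] += c * p                # object inside: multiply by x
        coeffs = nxt
    return coeffs

print(count_distribution_gf([0.8, 0.2, 0.4]))
```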
41
Example
[Figure: query region around q; the objects A, B and C lie inside it with probabilities 0.8, 0.2 and 0.4]
Count Queries on Uncertain Data
42
[Figure: query region around q; the objects A, B and C lie inside it with probabilities 0.8, 0.2 and 0.4]
Example
Count Queries on Uncertain Data
43
[Figure: query region around q; the objects A, B and C lie inside it with probabilities 0.8, 0.2 and 0.4]
Example
Count Queries on Uncertain Data
44
[Figure: query region around q; the objects A, B and C lie inside it with probabilities 0.8, 0.2 and 0.4]
Example
Count Queries on Uncertain Data
45
The Paradigm of Equivalent Worlds
Given a query predicate φ and an uncertain database DB, we can answer φ on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and unbiased
Evaluate the query predicate on each world
The distribution of sampled results is an unbiased approximation of the true distribution of results
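The sampling approach for the count example can be sketched as follows. The sketch assumes independent objects, each reduced to a probability of lying inside the region, so every sampled world is materialized by one Bernoulli draw per object. The function name and seed are ours, and the estimates change with the seed and the number of worlds.

```python
import random

def sample_count_distribution(p_inside, n_worlds=100, seed=1):
    """Estimate P(count = k) from n_worlds independently sampled possible worlds."""
    rng = random.Random(seed)
    counts = [0] * (len(p_inside) + 1)
    for _ in range(n_worlds):
        k = sum(rng.random() < p for p in p_inside)   # materialize one world, evaluate the query
        counts[k] += 1
    return [c / n_worlds for c in counts]

print(sample_count_distribution([0.8, 0.2, 0.4]))
```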
48
[Figure: query region around q; the objects A, B and C lie inside it with probabilities 0.8, 0.2 and 0.4]
Sampling Example
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of reliability or confidence of estimations
49
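[Figure: query region around q; the objects A, B and C lie inside it with probabilities 0.8, 0.2 and 0.4]
Sampling Confidences
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of estimators
E.g., Wald test: $\hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution
At a significance level of α, the true probability is in the interval [0.442, 0.638]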
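The slide's interval can be reproduced with the standard Wald formula. The inputs are our reconstruction and are not stated on the slide: an estimate p̂ = 0.54 from n = 100 sampled worlds at α = 0.05 (z ≈ 1.96) gives [0.442, 0.638] after rounding. `statistics.NormalDist` is part of the Python standard library since 3.8.

```python
import math
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval p_hat +/- z_(1-alpha/2) * sqrt(p_hat (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half

low, high = wald_interval(0.54, 100)
print(round(low, 3), round(high, 3))
```

The Wald interval is known to behave poorly for estimates close to 0 or 1 or for very small n, where other intervals (e.g., Wilson) are usually preferred.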
50
Uncertain Spatial Data Management Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip side
53
Model of a trajectory
[Figure: a 3D trajectory in (X, Y, Time) up to the present time, and the corresponding 2D route]
Approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
54
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory
(location, time, velocity vector) updates
55
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (a.k.a. space-time prisms)
• Assumes exact location samples
• Unknown (but bounded) maximum speed between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
57
Modeling Uncertain Trajectories: Model 2
Constraints
Motion along road network
Uncertainty restricted to road segments
[Figure: possible locations between samples p1 (at t1) and p2 (at t2) along road segments, with location q and time axis t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3: Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints
Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement
e.g., possible routes to be taken
60
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A-nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with time instants T = tb, T = t1 and T = te]
Given a collection of moving object trajectories Tr1, Tr2, …, TrN
Example: assuming that Trq is the querying trajectory during [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some location uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: uncertain trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with time instants tb, tb1, t1, t11, te]
Observations
Adding uncertainty to the previous example implies:
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level 1: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper levels split each interval further, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …, (Tri121, t11, t121, D121)]
Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]
Children: with the parent excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., has no uncertainty)
[Figure: query point Q and the uncertainty disks of Tr1 to Tr5, with distances Rmin, Rmax and Rd]
Trq = 2D point Q; the rest are possible locations bounded by circles
Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being the NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN
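The probability that the location of Tri is within distance Rd from Q is given by
$P_i(R_d) = \iint_{\lVert (x, y) - Q \rVert \le R_d} pdf_i(x, y) \, dx \, dy$
The probability that a given object Trj is the NN of Q is given by
$P_j^{NN}(Q) = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \cdot \prod_{i \ne j} \left(1 - P_i(R_d)\right) dR_d$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)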
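The NN integral is rarely evaluated in closed form, and a Monte-Carlo estimate is an easy cross-check. This sketch swaps in sampling for the integration. It assumes independent objects with uniform pdfs over disks, and the names and parameters are ours. An object whose minimum distance to Q exceeds Rmax never wins a sample, which matches the pruning observation.

```python
import numpy as np

def nn_probabilities(q, centers, radii, n_samples=200000, seed=0):
    """Monte-Carlo estimate of P(object j is the NN of the crisp point q) when
    each object is uniformly distributed in its own disk (independently)."""
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float)
    radii = np.asarray(radii, dtype=float)
    m = len(centers)
    # uniform points in a disk: radius r * sqrt(u), angle uniform in [0, 2*pi)
    rho = radii * np.sqrt(rng.random((n_samples, m)))
    phi = 2.0 * np.pi * rng.random((n_samples, m))
    xs = centers[:, 0] + rho * np.cos(phi)
    ys = centers[:, 1] + rho * np.sin(phi)
    dist = np.hypot(xs - q[0], ys - q[1])         # shape (n_samples, m)
    winners = dist.argmin(axis=1)                 # nearest object in each sampled world
    return np.bincount(winners, minlength=m) / n_samples

print(nn_probabilities((0.0, 0.0), [(2.0, 0.0), (-2.0, 0.0), (10.0, 0.0)], [1.0, 1.0, 1.0]))
```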
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query location TrQ and objects Tr1 to Tr5, with radii Rmin, Rmax and points Z1, Z2]
dist(Z1, Z2) < Rmax
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within_distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs pdf(Trq), pdf(Tr1), pdf(Tr2) over disks of diameter 2r]
Example: assuming a uniform pdf of possible locations; the figure marks two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$
Define Viq = Vi - Vq (cross-correlation) as another random variable, Viq = Vi + (-Vq), which, due to independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and -Vq
67
[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq), each supported on a disk of diameter 4r]
Let Viq = Vi + (-Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2: assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
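Let $V_{x,iq} = v_{x,i} - v_{x,q}$ and $V_{y,iq} = v_{y,i} - v_{y,q}$ denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin, as a function of t, is
$d_{iq}(t) = \sqrt{A t^2 + B t + C}$, with $A = V_{x,iq}^2 + V_{y,iq}^2$, $B = 2 (x_{iq} V_{x,iq} + y_{iq} V_{y,iq})$, $C = x_{iq}^2 + y_{iq}^2$, where $(x_{iq}, y_{iq})$ is the relative expected position at t = 0
This is a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)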
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: throughout the time interval of interest for the query, [tb, te], two distance functions diq(t) and djq(t) can intersect at most twice (NOTE: setting diq(t) = djq(t) and squaring both sides yields a quadratic equation in t). Call such pairwise intersections critical time points
[Figure: example lower envelope LE12 of two distance trajectories TR1 and TR2 (time on the horizontal axis, distance on the vertical axis), with critical time points t11 and t12 after tb]
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: the lower envelopes LE12 (of TR1, TR2, with critical time points t11, t12) and LE34 (of TR3, TR4, with critical time point t31) over [tb, te] are merged into LE1234, which introduces new critical time points t1new, t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
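Here is a compact sketch of the divide-and-conquer idea. It assumes every distance function is given by its squared-distance coefficients (A, B, C), with d(t)^2 = A t^2 + B t + C. An envelope is represented as a list of (start, end, index) pieces. All names are hypothetical. Each merge step compares only the two functions active on each sub-interval, using the at most two critical time points between them. For readability, the linear scans here do not attain the O(N log N) bound stated above.

```python
import math

def sq(f, t):
    """Squared distance A t^2 + B t + C for coefficients f = (A, B, C)."""
    A, B, C = f
    return A * t * t + B * t + C

def crossings(f, g, lo, hi):
    """Critical time points in (lo, hi): roots of d_f(t)^2 = d_g(t)^2 (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [-c / b] if abs(b) > 1e-12 else []
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted(r for r in roots if lo < r < hi)

def merge(funcs, env1, env2):
    """Merge two lower envelopes, each a list of (start, end, index) pieces."""
    cuts = sorted({t for s, e, _ in env1 + env2 for t in (s, e)})
    def owner(env, t):
        return next(k for s, e, k in env if s <= t <= e)
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        i, j = owner(env1, (lo + hi) / 2), owner(env2, (lo + hi) / 2)
        pts = [lo] + crossings(funcs[i], funcs[j], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            m = (a + b) / 2
            k = i if sq(funcs[i], m) <= sq(funcs[j], m) else j
            if out and out[-1][2] == k and out[-1][1] == a:
                out[-1] = (out[-1][0], b, k)      # extend the previous piece
            else:
                out.append((a, b, k))
    return out

def lower_envelope(funcs, ids, tb, te):
    """Envelope of the left half merged with the envelope of the right half."""
    if len(ids) == 1:
        return [(tb, te, ids[0])]
    mid = len(ids) // 2
    return merge(funcs, lower_envelope(funcs, ids[:mid], tb, te),
                 lower_envelope(funcs, ids[mid:], tb, te))

funcs = [(1.0, 0.0, 1.0), (1.0, -8.0, 17.0), (0.0, 0.0, 10.0)]
print(lower_envelope(funcs, [0, 1, 2], -2.0, 6.0))
```

In the usage example, the third function (a constant distance) is below the second one early on but never below the first. It therefore disappears from the merged envelope, while the first two swap at t = 2.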
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te], with the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree), the 4r band, and time points t1, t2, t3]
72
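Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e.,
$PP_i(a) = \{ P \mid P \text{ connects } p_i \text{ and } p_{i+1} \wedge \mathit{cost}_{min}(P) \le t_{i+1} - t_i \}$
Example: if the pdf of selecting a possible path is uniform, then $\Pr(P) = 1 / |PP_i(a)|$ for each $P \in PP_i(a)$
Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all the positions p ∈ Pj that a can reach from pi within the time period t - ti and from which it can reach pi+1 within the time period ti+1 - t, i.e.,
$PL_{P_j}^{a}(t) = \{ p \in P_j \mid \mathit{cost}(p_i, p) \le t - t_i \wedge \mathit{cost}(p, p_{i+1}) \le t_{i+1} - t \}$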
73
Combine the two probabilities…
[Figure: possible paths of object a, and its possible locations at t = 4, with segment mp1]
Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$\Pr(a(t) \in S) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr\left(S \cap PL_P^a(t)\right)$, e.g., with a uniform pdf
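This sketch computes the combined probability for one query segment. It assumes the possible-location set on each path is a single interval of positions measured along that path. As in the example, it assumes a uniform pdf over paths and over locations within each interval. The function name, path identifiers and interval endpoints are hypothetical. In the usage example, the segment covers half of the possible locations on one of two equally likely paths, which reproduces 0.5 · 0.5 = 0.25.

```python
def prob_in_segment(paths, segment):
    """Pr(a(t) in S) = sum over possible paths P of Pr(P) * Pr(S within PL_P(t)).

    paths: list of (path_probability, path_id, (lo, hi)), where (lo, hi) is the
    possible-location interval on that path at time t (positions along the path).
    segment: (path_id, s_lo, s_hi). Locations are uniform within each interval."""
    seg_path, s_lo, s_hi = segment
    total = 0.0
    for p_path, path_id, (lo, hi) in paths:
        if path_id != seg_path:
            continue
        if hi == lo:                                  # a single possible position
            total += p_path if s_lo <= lo <= s_hi else 0.0
        else:
            overlap = max(0.0, min(hi, s_hi) - max(lo, s_lo))
            total += p_path * overlap / (hi - lo)
    return total

paths = [(0.5, "v1-v5", (2.0, 4.0)), (0.5, "v1-v3-v4", (1.0, 3.0))]
print(prob_in_segment(paths, ("v1-v5", 3.0, 4.0)))
```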
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– The probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
– Edge hash table
› Small, in memory
– Movement R-tree
› 1D, one for each edge
– Trajectories list
› Store samples ordered by time stamp
› Assign a pointer to the set of possible paths
› Earliest-arrival and latest-departure times stored for vertices
› On disk; retrieved on demand
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far…
› Stochastic methods for uncertain spatial data
  – Probabilistic results
  – Time dimension
vs.
› Geometric models for uncertain spatio-temporal data
  – Probabilistic results
  – Time dimension
Merge the approaches
› Idea
  – Discretize time
  – For each time instant, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest-neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435–443
A highway example…
[Figure: position (pos) of a car on a highway over time (t) with a query region Q; the reachable positions are bounded by the minimum and maximum speed (vmin, vmax)]
What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution over the reachable positions, the car is inside Q with probability 50% at one time instant and 30% at the next
› Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
  – This violates the speed constraints: consecutive positions of the car are not independent
› Considering the dependency: 0.5
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes exist for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
A simple example
› Whenever the wooden board is hit, the ball stays in its bucket or drops into one of the neighbouring buckets with certain probabilities (inner buckets: 0.4 to each neighbour, 0.2 to stay)
› At the border of the wooden board these probabilities are different (0.6 to the inner neighbour, 0.4 to stay)
› This model is usually learned or given by experts
› Initial position: bucket 3
› After the first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After the second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After the 40th hit: ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
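How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$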
How can we model this?
› First hit:
$$(0, 0, 1, 0, 0, 0, 0, 0, 0)\, M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$
› Second hit:
$$(0, 0, 1, 0, 0, 0, 0, 0, 0)\, M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$
› 40th hit:
$$(0, 0, 1, 0, 0, 0, 0, 0, 0)\, M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
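The repeated row-vector/matrix products above can be sketched in a few lines of plain Python. This is a minimal sketch; `board_matrix`, `step` and `distribution_after` are hypothetical helper names. It reproduces the first two hits; for many hits the distribution approaches the stationary distribution (0.08, 0.12, …, 0.12, 0.08), which satisfies π·M = π.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def board_matrix(n: int = 9) -> Matrix:
    """Transition matrix of the board example: inner buckets stay with 0.2 and move
    to each neighbour with 0.4; border buckets stay with 0.4 and move inwards with 0.6."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][n - 2], m[i][n - 1] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(v: Sequence[float], m: Matrix) -> List[float]:
    """One hit: row vector times matrix, v' = v . M."""
    n = len(v)
    return [sum(v[i] * m[i][j] for i in range(n)) for j in range(n)]


def distribution_after(v0: Sequence[float], m: Matrix, hits: int) -> List[float]:
    v = list(v0)
    for _ in range(hits):
        v = step(v, m)
    return v


M = board_matrix()
v0 = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # the ball starts in bucket 3
print([round(p, 2) for p in distribution_after(v0, M, 1)])  # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in distribution_after(v0, M, 2)])  # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
```

A dense list-of-lists matrix is used only for readability; the real models mentioned later are very sparse.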
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 seconds
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
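Learning the transition probabilities from historical data can be sketched with simple transition counting. This is a minimal sketch assuming the historical trajectories have already been mapped to sequences of discrete states at the tick rate. It uses the maximum-likelihood (relative-frequency) estimator, which is one possible choice and not necessarily the estimator used in the cited work. The sparse dict-of-dicts layout stores only observed transitions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def estimate_transitions(trajectories: List[Sequence[Hashable]]) -> Dict[Hashable, Dict[Hashable, float]]:
    """Relative-frequency estimate of P(next state | current state).

    Only transitions that occur in the data get an entry (sparse representation).
    """
    counts: Dict[Hashable, Dict[Hashable, int]] = defaultdict(lambda: defaultdict(int))
    for trajectory in trajectories:
        for current, nxt in zip(trajectory, trajectory[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


history = [["s1", "s3", "s2", "s1"], ["s2", "s3", "s3"]]
print(estimate_transitions(history))
```

Separate matrices per time of day or per object group can be obtained by partitioning `history` before calling the function.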
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2, s3 labelled with the transition probabilities of M, unrolled over t = 0, 1, 2, 3]
› Note: there is an exponential number of possible paths the car might take
› Starting in s1 at t = 0, the state distribution evolves as (1, 0, 0) → (0, 0, 1) → (0, 0.8, 0.2) → (0.48, 0.16, 0.36) for t = 0, 1, 2, 3
› At t = 2, a mass of 0.8 is already inside the window (in s2); only the remaining 0.2 in s3 has to be propagated: (0, 0, 0.2) M = (0, 0.16, 0.04). Result = 0.8 + 0.16 = 0.96
› Solution based on matrix multiplications: introduce a new state s4 for the "winner" trajectories (those that have already hit the window) and two matrices, M⁻ for transitions into time instants outside T and M⁺ for transitions into time instants inside T (where moves into s1 or s2 are redirected to s4):
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$
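The M⁻/M⁺ construction can be sketched generically. This is a minimal sketch; the function names are hypothetical, states are 0-based indices (s1 = 0), and the absorbing "winner" state is appended as the last index. It builds both matrices from M and a set of window states, then multiplies the initial vector with M⁻ before T and with M⁺ inside T.

```python
from typing import List, Sequence, Set, Tuple

Matrix = List[List[float]]


def vec_mat(v: Sequence[float], m: Matrix) -> List[float]:
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def window_matrices(m: Sequence[Sequence[float]], window: Set[int]) -> Tuple[Matrix, Matrix]:
    """M- (outside T) and M+ (inside T) over the original states plus one absorbing hit state."""
    n = len(m)
    absorbing = [0.0] * n + [1.0]
    minus = [list(row) + [0.0] for row in m] + [absorbing]
    plus = []
    for row in m:
        new_row = [0.0 if j in window else p for j, p in enumerate(row)]
        new_row.append(sum(p for j, p in enumerate(row) if j in window))  # redirected to the hit state
        plus.append(new_row)
    plus.append(absorbing)
    return minus, plus


def window_probability(m, start: int, window: Set[int], t_begin: int, t_end: int) -> float:
    """P(the object is in a window state at least once during [t_begin, t_end]), starting in `start` at t = 0."""
    if t_begin == 0 and start in window:
        return 1.0
    n = len(m)
    minus, plus = window_matrices(m, window)
    v = [0.0] * (n + 1)
    v[start] = 1.0
    for t in range(1, t_end + 1):
        v = vec_mat(v, plus if t >= t_begin else minus)
    return v[n]


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
print(round(window_probability(M, start=0, window={0, 1}, t_begin=2, t_end=3), 2))  # 0.96
```

Each time step costs one vector-matrix product, so the exponential number of paths is never enumerated.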
Multiple Observations
› So far we had only one observation, from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 (left) and two observations at t0 and t1 (right)]
Multiple Observations
› We need to track where the true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
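Multiple Observations
› States are ordered as (S1−, S2−, S3−, S1+, S2+, S3+); ■ marks worlds that have intersected the window, ¬■ worlds that have not
› Outside the window, both halves evolve independently with M:
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$
› Inside the window, moves from Si− into a window state sj lead to Sj+, and worlds in the "+" half stay there:
$$M^+ = \begin{pmatrix} M - M_W & M_W \\ 0 & M \end{pmatrix}$$
where M_W keeps only the columns of M that belong to window states (all other entries set to 0)
› Starting in s1: t = 0: (1, 0, 0, 0, 0, 0); t = 1: (0, 0, 1, 0, 0, 0); t = 2: (0, 0, 0.2, 0, 0.8, 0); at t = 3, the mass in s3 splits into P(S3+) = 0.32 and P(S3−) = 0.04
[Figure: classes S1−, S2−, S3−, S1+, S2+, S3+ unrolled over t = 0, 1, 2, 3]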
Bayes' Theorem
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\, P(\blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$$
› Now, what is the probability that the trajectory passes through the query window, given that the object was observed in s3?
$$P(\blacksquare \mid s_3) = \frac{P(S_3^+)}{P(S_3^+) + P(S_3^-)} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
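The 2|S| classes and the Bayes step can be sketched together. This is a minimal sketch under assumptions of ours: the window is taken to be W = {s1, s2} during T = [2, 3] (carried over from the single-observation example), the second observation is "in s3 at t = 3", and the helper names are hypothetical. Indices 0..n−1 hold the Si− classes and n..2n−1 the Si+ classes.

```python
from typing import List, Sequence, Set


def window_hit_classes(m: Sequence[Sequence[float]], start: int, window: Set[int],
                       t_begin: int, t_end: int) -> List[float]:
    """Propagate the classes S_i^- (not yet hit) and S_i^+ (already hit) up to t_end."""
    n = len(m)
    v = [0.0] * (2 * n)
    v[start + n if (t_begin == 0 and start in window) else start] = 1.0
    for t in range(1, t_end + 1):
        inside = t >= t_begin  # t never exceeds t_end in this loop
        nxt = [0.0] * (2 * n)
        for i in range(n):
            for j in range(n):
                p = m[i][j]
                if p == 0:
                    continue
                # not-yet-hit worlds become hit worlds when they enter a window state inside T
                nxt[n + j if inside and j in window else j] += v[i] * p
                # worlds that already hit stay in the "+" half
                nxt[n + j] += v[n + i] * p
        v = nxt
    return v


def posterior_hit(v: List[float], observed_state: int) -> float:
    """Bayes: P(hit | observed state) = P(S^+) / (P(S^+) + P(S^-))."""
    n = len(v) // 2
    hit, miss = v[n + observed_state], v[observed_state]
    return hit / (hit + miss)


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
v3 = window_hit_classes(M, start=0, window={0, 1}, t_begin=2, t_end=3)
print(round(posterior_hit(v3, observed_state=2), 2))  # 0.89
```

Conditioning only rescales the two entries of the observed state, which is why the observation can be incorporated after the forward pass.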
Summary
› Pros
  – Allows answering time-parametrized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect model
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the spatio-temporal space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index over the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently large number of samples
  – Approximate result probability = number of samples that satisfy the query / total number of drawn samples
› But how can samples be drawn efficiently such that they conform to the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
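The ratio estimator can be sketched with plain forward sampling on the three-state chain of the window-query example. This is a minimal sketch: it does not implement the adapted transition matrices that make samples conform to later observations. It only illustrates "satisfying samples / drawn samples". Seed and sample size are arbitrary choices, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Set


def sample_path(m: Sequence[Sequence[float]], start: int, steps: int, rng: random.Random) -> List[int]:
    """Draw one trajectory with `steps` transitions from the Markov chain m."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choices(range(len(m)), weights=m[path[-1]])[0])
    return path


def estimate_window_probability(m, start: int, window: Set[int], t_begin: int, t_end: int,
                                n_samples: int, seed: int = 0) -> float:
    """Fraction of sampled trajectories that visit a window state during [t_begin, t_end]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        path = sample_path(m, start, t_end, rng)
        if any(state in window for state in path[t_begin:t_end + 1]):
            hits += 1
    return hits / n_samples


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
print(estimate_window_probability(M, 0, {0, 1}, 2, 3, n_samples=20000))  # close to the exact 0.96
```

Unlike the matrix-based solution, the estimate carries sampling error, which can be quantified with the confidence techniques used for uncertain spatial data.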
Event Queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of sub-events, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715–728
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43–49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170–181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435–443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715–728
Sources of Uncertainty
• Uncertain observations
  • Imprecise sensor measurements (e.g., radio triangulation, Wi-Fi positioning)
  • Inconsistent information (e.g., contradictory sensor data)
  • Human errors (e.g., in crowd-sourcing applications)
From a database perspective, the position of a mobile object is uncertain
Dataset provided by E. Cho, S. A. Myers, and J. Leskovec: Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011
Research Challenge
• Include the uncertainty that is inherent in spatial and spatio-temporal data directly in the querying and mining process
• Assess the reliability of similarity search and data mining results, enhancing the underlying decision-making process
• Improve the quality of modern location-based applications and of research results in the field
Uncertain Spatial Data Models
• Discrete models
• Continuous models
Possible World Semantics
[Figure: example uncertain objects]
Possible World Semantics
• A collection of uncertain spatial objects defines an uncertain spatial database
• Combinations of object instances define possible database instances, called possible worlds
• Assumption: the probability of a possible world can be computed efficiently
Answering Queries using PWS
Let
• DB be an uncertain database
• W be the set of possible worlds of DB
• φ be a query predicate
• I_φ(w) be an indicator function returning one if predicate φ holds in world w and zero otherwise
The probability that a query predicate holds on an uncertain database is defined as
$$P(\varphi) = \sum_{w \in W} P(w) \cdot I_\varphi(w)$$
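The definition above can be evaluated literally by enumerating all possible worlds. This is a minimal sketch with assumptions of ours: objects are discrete and mutually independent, so P(w) is a product of instance probabilities. The three objects, their positions and their probabilities are hypothetical illustration values. The enumeration grows exponentially with the number of objects, which is exactly the problem discussed next.

```python
import itertools
import math
from typing import Callable, Iterator, Sequence, Tuple

Point = Tuple[float, float]
# A discrete uncertain object: alternative positions with their probabilities (summing to 1)
UncertainObject = Sequence[Tuple[Point, float]]


def possible_worlds(db: Sequence[UncertainObject]) -> Iterator[Tuple[Tuple[Point, ...], float]]:
    """Yield every possible world (one instance per object) together with its probability."""
    for combination in itertools.product(*db):
        world = tuple(position for position, _ in combination)
        yield world, math.prod(p for _, p in combination)  # independence assumption


def predicate_probability(db: Sequence[UncertainObject],
                          predicate: Callable[[Tuple[Point, ...]], bool]) -> float:
    """P(phi) = sum over worlds w of P(w) * I_phi(w)."""
    return sum(p for world, p in possible_worlds(db) if predicate(world))


def count_in_circle(world: Tuple[Point, ...], q: Point, radius: float) -> int:
    return sum(1 for x, y in world if math.hypot(x - q[0], y - q[1]) <= radius)


q = (0.0, 0.0)
db = [
    [((0.5, 0.0), 0.8), ((3.0, 0.0), 0.2)],    # object A: inside the circle with 0.8
    [((0.0, 0.5), 0.2), ((0.0, 3.0), 0.8)],    # object B: inside the circle with 0.2
    [((-0.5, 0.0), 0.4), ((-3.0, 0.0), 0.6)],  # object C: inside the circle with 0.4
]
print(predicate_probability(db, lambda w: count_in_circle(w, q, 1.0) >= 2))  # about 0.432
```

With k instances per object and n objects, the loop visits kⁿ worlds.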
Possible Worlds Example II
[Figure: uncertain objects A–Z and their possible instances]
Too many possible worlds
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects
Are there efficient solutions to query uncertain spatial data?
In general: No
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
This result can be reduced to uncertain spatial databases
But: specific queries may have polynomial-time solutions
[DalviSuciu04] Dalvi, N. N., and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB) (2004)
Querying Uncertain Data: Running Example
[Figure: uncertain objects around query point q]
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds
Data Cleaning: Aggregation
[Figure: query region around q with objects C, D, H, I]
Ignore uncertainty (data cleaning)
Replace uncertain objects by a deterministic "best guess"
• Expected positions
• Most-likely positions
• …
Query results are not reliable
Query results may be biased
Equivalent Worlds: An Intuition
[Figure: query region around q]
Observation 1
For any possible world w and any possible world w′ derived from w by changing the position of an object o, the following equivalence holds
Equivalent Worlds: An Intuition
Observation 1 allows us to discard objects outside of the query region
Remaining number of equivalence classes of possible worlds
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
[Figure: query region around q with the remaining objects C, D, H, I]
Remaining number of equivalence classes of possible worlds
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region; information about the concrete result objects can be discarded
[Figure: query region around q with objects C, D, H, I]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
[Figure: query region around q with uncertain objects A, B, C (probability labels 0.8, 0.2, 0.4)]
Observation 3: anonymize objects, i.e., substitute A, B, C by x
Each monomial c·x^k implies that the probability of having exactly k results equals c
Generating Functions: Formally
For each object o_i, let p_i denote the probability that o_i is located inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_{i} \big( (1 - p_i) + p_i\, x \big)$$
In the expanded polynomial, the coefficient of the monomial x^k equals the probability that exactly k objects are inside the query region.
[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502–513 (2009)
Count Queries on Uncertain Data
[Figure: query region around q with uncertain objects A, B, C (probability labels 0.8, 0.2, 0.4)]
Example
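Expanding the generating function amounts to repeated multiplication by (1 − p_i) + p_i·x, which takes O(n²) time instead of enumerating 2ⁿ worlds. This is a minimal sketch assuming 0.8, 0.2 and 0.4 are the inside-probabilities of three independent objects (read off the figure labels; which object carries which value does not matter for the count). `count_distribution` is a hypothetical name.

```python
from typing import List, Sequence


def count_distribution(probabilities: Sequence[float]) -> List[float]:
    """Coefficients of prod_i ((1 - p_i) + p_i x); entry k is P(exactly k objects inside)."""
    coeffs = [1.0]  # the empty product
    for p in probabilities:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1 - p)  # object outside: exponent of x unchanged
            nxt[k + 1] += c * p    # object inside: multiply by x
        coeffs = nxt
    return coeffs


dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])  # [0.096, 0.472, 0.368, 0.064]
```

Each intermediate list holds exactly one probability per equivalence class ("k objects inside so far"), which is the polynomial number of classes required by the paradigm of equivalent worlds.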
The Paradigm of Equivalent Worlds
Given a query predicate φ and an uncertain database DB, we can answer φ on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
Approximated Results: Sampling
• Materialize a set S of possible worlds
• Samples are drawn independently and unbiased
• Evaluate the query predicate on each world
• The distribution of sampled results is an unbiased approximation of the true distribution of results
Sampling Example
[Figure: query region around q with uncertain objects A, B, C (probability labels 0.8, 0.2, 0.4)]
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimations
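Sampling: Confidences
[Figure: query region around q with uncertain objects A, B, C (probability labels 0.8, 0.2, 0.4)]
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of estimators, e.g., the Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}$$
where z_{1−α/2} is the (1 − α/2) percentile of the standard normal distribution
At significance level α, the confidence interval for the true probability is [0.442, 0.638]
[Figure: estimated confidence interval compared to the true probability]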
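The Wald interval is easy to compute with the standard library. This is a minimal sketch; the inputs are our assumption: 54 of the 100 sampled worlds satisfy the predicate and α = 0.05. These values are not stated on the slide, but they reproduce the interval [0.442, 0.638] shown there.

```python
import math
from statistics import NormalDist
from typing import Tuple


def wald_interval(successes: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Wald interval p_hat +/- z_{1-alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # percentile of the standard normal distribution
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


low, high = wald_interval(54, 100)
print(round(low, 3), round(high, 3))  # 0.442 0.638
```

Quadrupling the number of sampled worlds roughly halves the interval width, since the half-width shrinks with 1/√n.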
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional context (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip side
Model of a Trajectory
[Figure: 3D trajectory in (X, Y, Time) up to the present time, and its projection onto the X-Y plane, the 2D route]
The (piecewise-linear) approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
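The trajectory model above can be sketched as linear interpolation between consecutive (time, location) samples, which is why acceleration and deceleration between samples are not captured. This is a minimal sketch; `location_at` is a hypothetical name, and the samples are assumed to be sorted by time with distinct time stamps.

```python
from bisect import bisect_right
from typing import Sequence, Tuple

Point = Tuple[float, float]


def location_at(samples: Sequence[Tuple[float, Point]], t: float) -> Point:
    """Position at time t under constant velocity between consecutive samples."""
    times = [s[0] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled time span")
    k = bisect_right(times, t) - 1  # last sample with time <= t
    if k == len(samples) - 1:
        return samples[-1][1]
    (t0, (x0, y0)), (t1, (x1, y1)) = samples[k], samples[k + 1]
    f = (t - t0) / (t1 - t0)  # fraction of the segment already travelled
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))


trajectory = [(0, (0, 0)), (10, (10, 0)), (20, (10, 10))]
print(location_at(trajectory, 15))  # (10.0, 5.0)
```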
Mobility Models
• Sequence of (location, time) updates (e.g., GPS-based)
• Electronic maps + traffic distribution patterns + set of (to-be-visited) points ⇒ full trajectory
• (location, time, velocity vector) updates (extrapolated from "now" on)
Modeling Uncertainty: Location Uncertainty Models
• Various sources' imprecision (GPS, sensors)
• Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (a.k.a. space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
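Membership in a bead (space-time prism) follows directly from the two assumptions above: exact samples and a bounded maximum speed. This is a minimal sketch under those assumptions, in free 2D space. `in_bead` and `v_max` are hypothetical names. A location is possible at time t only if it can be reached from the earlier sample and can still reach the later sample in time.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Sample = Tuple[float, Point]  # (time, exact location)


def in_bead(p: Point, t: float, sample_a: Sample, sample_b: Sample, v_max: float) -> bool:
    """True if location p at time t is consistent with both samples and speed <= v_max."""
    (t_a, p_a), (t_b, p_b) = sample_a, sample_b
    if not t_a <= t <= t_b:
        return False
    reachable_from_a = math.dist(p, p_a) <= v_max * (t - t_a)
    can_reach_b = math.dist(p, p_b) <= v_max * (t_b - t)
    return reachable_from_a and can_reach_b


a, b = (0.0, (0.0, 0.0)), (10.0, (10.0, 0.0))
print(in_bead((5.0, 8.0), 5.0, a, b, v_max=2.0))  # True
```

At a fixed t, the set of such locations is the intersection of two disks; stacking these over [t_a, t_b] yields the bead, and chaining beads yields the necklace.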
Modeling Uncertain Trajectories: Model 2
• Constraints
  • Motion along the road network
  • Uncertainty restricted to road segments
[Figure: locations p1 and p2 sampled at times t1 and t2 on a road network, query q, time axis t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
• Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
• Syntax
  – The impact of uncertainty on the (structure of the) answer
  – The impact of constraints, e.g., possible routes to be taken
• Processing algorithms
  – Impact of the motion + uncertainty coupling on filtering/pruning/refinement
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
Continuous NN for Trajectories
Given a collection of moving-object trajectories Tr1, Tr2, …, TrN:
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm−1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) over time, at T = tb, T = t1 and T = te]
Example: assuming that Trq is the querying trajectory between [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) over time, at T = tb, tb1, t1, t11 and te]
Observations: adding uncertainty to the previous example implies
– Tr1 (blue) is the only trajectory with a non-zero NN probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN probability
– Specifically, at t11 all three trajectories have a non-zero NN probability
⇒ What should the structure of the answer to UQ_nn(q) look like?
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree with root [TrQ, tb, te]; first-level nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m−1), te, Dm); deeper nodes such as (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121), …]
• Root: parameters of the query
  – Querying trajectory Trq
  – Time interval of interest [tb, te]
• Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: crisp query point Q; uncertain objects Tr1–Tr5 bounded by circles; radii Rmin, Rmax and Rd]
Trq = 2D point Q; the other objects' possible locations are bounded by circles
Observation:
– Any object whose minimum distance from Q is > Rmax has a "0" probability of being the NN to Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i^{WD}(Q, R_d) = \iint_{\lVert (x, y) - Q \rVert \le R_d} \mathrm{pdf}_i(x, y)\, dx\, dy$$
The probability that a given object Trj is the NN of Q is given by
$$P_j^{NN}(Q) = \int_{0}^{R_{max}} \frac{\partial P_j^{WD}(Q, R_d)}{\partial R_d} \prod_{i \ne j} \big(1 - P_i^{WD}(Q, R_d)\big)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the others are at distances > Rd, integrated over all possible Rd's (NOTE: the upper bound is Rmax…)
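The NN probability above can be approximated without evaluating the integral by sampling. This is a minimal sketch that swaps the integration for Monte-Carlo simulation. It assumes uniform pdfs over the uncertainty disks and a crisp Q, and all names are hypothetical. It also applies the Rmax pruning from the slide: objects whose minimum distance exceeds Rmax are never sampled and get probability 0.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Disk = Tuple[Point, float]  # (center, radius) of a uniform location pdf


def sample_in_disk(center: Point, radius: float, rng: random.Random) -> Point:
    """Uniform point in a disk (the square root keeps the area density uniform)."""
    r = radius * math.sqrt(rng.random())
    theta = 2 * math.pi * rng.random()
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))


def nn_probabilities(q: Point, disks: Sequence[Disk], n_samples: int, seed: int = 0) -> List[float]:
    """Monte-Carlo estimate of P(object i is the NN of the crisp query point q)."""
    rng = random.Random(seed)
    # R_max: smallest maximum distance; objects whose minimum distance exceeds it are pruned
    r_max = min(math.dist(q, c) + r for c, r in disks)
    candidates = [i for i, (c, r) in enumerate(disks) if max(0.0, math.dist(q, c) - r) <= r_max]
    wins = [0] * len(disks)
    for _ in range(n_samples):
        best, best_d = candidates[0], math.inf
        for i in candidates:
            d = math.dist(q, sample_in_disk(*disks[i], rng))
            if d < best_d:
                best, best_d = i, d
        wins[best] += 1
    return [w / n_samples for w in wins]


disks = [((2.0, 0.0), 1.0), ((-2.0, 0.0), 1.0), ((10.0, 0.0), 1.0)]
print(nn_probabilities((0.0, 0.0), disks, n_samples=20000))  # roughly [0.5, 0.5, 0.0]
```

The third object plays the role of Tr4: its minimum distance (9) exceeds Rmax (3), so it is pruned before sampling.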
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query TrQ and objects Tr1–Tr5 with radii Rmin and Rmax; a point Z1 in TrQ's region and a point Z2 in Tr4's region with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect, this is a quadruple integration instead of the double one (cf. previous slide)…
[Figure: uniform pdfs of the possible locations of Trq, Tr1 and Tr2 over disks of diameter 2r]
Example: assuming uniform pdfs of the possible locations, consider two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$P\big(\lVert V_i - V_q \rVert \le R_d\big)$$
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq). Due to independence, the pdf of Viq is the convolution of the pdfs of Vi and −Vq.
[Figure: pdfs of Tr1 − Trq and Tr2 − Trq, each supported on a disk of diameter 4r]
Let Viq = Vi + (−Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
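The Continuous Aspect of the Probabilistic NN Queries
Let Vx_iq = vx_i − vx_q and Vy_iq = vy_i − vy_q denote the corresponding components of the velocity of the object whose expected location moves along TR_iq, and let (x_iq, y_iq) be that expected location at t = 0
Then the distance of the expected location along TR_iq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = (V^x_{iq})^2 + (V^y_{iq})^2, \; B = 2\,(x_{iq} V^x_{iq} + y_{iq} V^y_{iq}), \; C = x_{iq}^2 + y_{iq}^2$$
which is a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)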
The Continuous Aspect of the Probabilistic NN Queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time points
[Figure: example lower envelope LE12 of two distance trajectories TR1 and TR2 with critical time points t11 and t12 after tb; time on the horizontal axis, distance on the vertical axis]
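The "at most twice" observation can be checked numerically. Because distances are non-negative, diq(t) = djq(t) holds exactly when their squares agree, and the difference of two quadratics is again (at most) quadratic. This is a minimal sketch using the A, B, C coefficients of the previous slide, with hypothetical function names. Identical distance functions are reported as having no isolated crossing.

```python
import math
from typing import List, Tuple

Coeffs = Tuple[float, float, float]  # (A, B, C) with d(t)^2 = A t^2 + B t + C


def distance_coefficients(p0: Tuple[float, float], v: Tuple[float, float]) -> Coeffs:
    """Coefficients for the relative expected location p0 + v * t."""
    (x0, y0), (vx, vy) = p0, v
    return vx * vx + vy * vy, 2 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0


def critical_time_points(ci: Coeffs, cj: Coeffs, tb: float, te: float, eps: float = 1e-12) -> List[float]:
    """Times in [tb, te] where d_i(t) = d_j(t); solving d_i^2 = d_j^2 gives at most two roots."""
    a, b, c = (x - y for x, y in zip(ci, cj))
    if abs(a) < eps:
        if abs(b) < eps:
            return []  # identical (or constant-offset) squared distances: no isolated crossing
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]


d1 = distance_coefficients((0.0, 1.0), (1.0, 0.0))  # moving object passing near q
d2 = distance_coefficients((0.0, 2.0), (0.0, 0.0))  # object at constant distance 2
print(critical_time_points(d1, d2, 0.0, 5.0))  # [1.7320508075688772]
```

The critical time points of all pairs are the only places where the order of the distance functions can change, which is what the lower-envelope construction exploits.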
The Continuous Aspect of the Probabilistic NN Queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: the lower envelope LE12 of TR1 and TR2 (critical time points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical time point t31) are merged into LE1234 over [tb, te], creating new critical time points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function lies more than 4r above the lower envelope (r = radius of the uncertainty disk) has zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions × time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1–TR7 over [tb, te] with the lower envelope, the 2nd lower envelope (nodes in level 2 of the IPAC-NN tree) and the 4r pruning band; time instants t1, t2, t3]
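Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.,
$$PP_i(a) = \{\, P \mid P \text{ connects } p_i \text{ and } p_{i+1} \,\wedge\, \mathrm{cost}_{min}(P) \le t_{i+1} - t_i \,\}$$
Example: if the pdf of selecting a possible path is uniform, each path in PPi(a) is taken with probability 1/|PPi(a)|
Given a path Pj ∈ PPi(a), the possible locations of a moving object a with respect to Pj at t ∈ [ti, ti+1] form the set of all positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t), i.e.,
$$PL_{P_j}(t) = \{\, p \in P_j \mid \mathrm{cost}_{min}(p_i, p) \le t - t_i \,\wedge\, \mathrm{cost}_{min}(p, p_{i+1}) \le t_{i+1} - t \,\}$$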
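Both definitions can be sketched in a few lines. This is a minimal sketch under assumptions of ours: the road network is a hypothetical directed adjacency dictionary whose edge weights are minimum travel times, only simple paths are enumerated, and positions along a path are measured in minimum travel time from pi, so PL becomes an interval on each path.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: vertex -> [(neighbour, minimum travel time)]
Graph = Dict[str, List[Tuple[str, float]]]


def possible_paths(graph: Graph, source: str, target: str, budget: float) -> List[Tuple[List[str], float]]:
    """All simple paths source -> target whose minimum travel time is <= budget (t_{i+1} - t_i)."""
    result: List[Tuple[List[str], float]] = []

    def dfs(vertex: str, path: List[str], cost: float) -> None:
        if vertex == target:
            result.append((list(path), cost))
            return
        for nxt, c in graph.get(vertex, []):
            if nxt not in path and cost + c <= budget:
                path.append(nxt)
                dfs(nxt, path, cost + c)
                path.pop()

    dfs(source, [source], 0.0)
    return result


def possible_location_interval(path_cost: float, t_i: float, t_next: float, t: float) -> Optional[Tuple[float, float]]:
    """PL on one path at time t, as an interval of travel-time offsets from p_i."""
    lo = max(0.0, path_cost - (t_next - t))  # must still be able to reach p_{i+1}
    hi = min(path_cost, t - t_i)             # must be reachable from p_i
    return (lo, hi) if lo <= hi else None


g = {"p1": [("v1", 2.0), ("v2", 3.0)], "v1": [("p2", 2.0), ("v2", 1.0)], "v2": [("p2", 2.0)], "p2": []}
paths = possible_paths(g, "p1", "p2", budget=5.0)
print([p for p, _ in paths], "each with probability", 1 / len(paths))  # uniform path pdf
```

With a uniform path pdf, each enumerated path simply receives probability 1/|PPi(a)|.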
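Combine the two probabilities…
[Figure: possible paths of object a and its possible locations when t = 4, with the segment m–p1 highlighted]
Probability of falling inside segment m–p1: 0.5 × 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\big(a(t) \in S \cap PL_p(t) \mid p\big)$$
e.g., with a uniform pdf over the possible locations, the conditional probability equals |S ∩ PL_p(t)| / |PL_p(t)|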
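The combination formula can be sketched with intervals. This is a minimal sketch, assuming a uniform pdf over the possible locations on each path and representing both PL_p(t) and the segment S as intervals in the same per-path coordinate. The example numbers are hypothetical and chosen to mirror the 0.5 × 0.5 = 0.25 case: two equally likely paths, and on one of them S covers half of the possible locations.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


def overlap(a: Interval, b: Interval) -> float:
    """Length of the intersection of two intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def prob_in_segment(paths: List[Tuple[float, Interval, Optional[Interval]]]) -> float:
    """Sum over paths of Pr(path) * |S intersect PL| / |PL| (uniform pdf over PL).

    Each entry is (Pr(path), possible-location interval PL on the path,
    interval of S on the path, or None if S does not lie on that path).
    """
    total = 0.0
    for p_path, pl, seg in paths:
        if seg is None or pl[1] <= pl[0]:
            continue  # S not on this path, or degenerate PL (skipped in this sketch)
        total += p_path * overlap(pl, seg) / (pl[1] - pl[0])
    return total


example = [(0.5, (0.0, 2.0), (1.0, 2.0)), (0.5, (0.0, 3.0), None)]
print(prob_in_segment(example))  # 0.25
```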
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and their probabilities)
  – A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4 the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
Processing
› Data structures
  – Edge hash table
    › Small, in memory
  – Movement R-tree
    › 1D for each edge
  – Trajectories list
    › Store samples ordered by time stamp
    › Assign a pointer to the set of possible paths
    › Earliest-arrival and latest-departure times stored for vertices
    › On disk, retrieved on demand
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
› Idea
– Discretize the time
– For each time instant, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar, "Querying imprecise data in moving object environments," IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443
83
A highway example…
[Figure: position (pos) of a car on a highway over time (t), with a query region Q and the speed bounds vmin and vmax]
› What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution of the possible positions, the car is inside Q with probability 50% at one time instant and 30% at another
› Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
– Violation of the speed constraints: treating the two instants independently admits position combinations that the speed bounds rule out
› Consideration of dependency: 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts
[Figure: interior hole: left 0.4, stay 0.2, right 0.4; border hole: stay 0.4, inward 0.6]
92
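A simple example
› Initial position: the ball is in hole 3
› After the first hit: 0.4 | 0.2 | 0.4 (holes 2 to 4)
› After the second hit: 0.16 | 0.16 | 0.36 | 0.16 | 0.16 (holes 1 to 5)
› After the 40th hit: 0.08 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.08 (holes 1 to 9)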
93
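How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$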
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
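The hit sequence above can be reproduced with plain vector-matrix products. A minimal pure-Python sketch, assuming 0-based bucket indices (hole 3 is index 2); `board_matrix`, `step` and `distribution_after` are hypothetical helper names. The last line uses many hits (400) to show the long-run (stationary) distribution, whose 0.08/0.12 pattern the 40th-hit values reflect.

```python
def board_matrix(n=9):
    """Transition matrix of the board example (row: from bucket, column: to bucket)."""
    M = [[0.0] * n for _ in range(n)]
    M[0][0], M[0][1] = 0.4, 0.6                   # left border hole
    M[n - 1][n - 1], M[n - 1][n - 2] = 0.4, 0.6   # right border hole
    for i in range(1, n - 1):                     # interior holes
        M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(v, M):
    """One hit: the row vector v multiplied by M."""
    n = len(v)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]


def distribution_after(k, start=2, n=9):
    """Distribution over buckets after k hits, starting in bucket index `start`."""
    M = board_matrix(n)
    v = [0.0] * n
    v[start] = 1.0
    for _ in range(k):
        v = step(v, M)
    return v


print([round(p, 2) for p in distribution_after(1)])
# → [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in distribution_after(2)])
# → [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in distribution_after(400)])
# → [0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08]
```

For realistic state spaces the same products would use a sparse matrix representation, since most transitions are zero.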
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
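The slides do not fix an estimator for the transition probabilities; one simple option, assumed here, is to count the observed transitions between consecutive ticks and normalize each row (relative frequencies). The sketch stores only observed transitions, in line with the sparsity remark; the input format (trajectories already mapped to state names) and all names are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(trajectories):
    """Sparse transition model {from_state: {to_state: probability}}.

    trajectories: list of state sequences, one state per tick. Only
    transitions that were actually observed are stored.
    """
    counts = defaultdict(Counter)
    for seq in trajectories:
        for a, b in zip(seq, seq[1:]):        # consecutive ticks
            counts[a][b] += 1
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: c / total for b, c in row.items()}
    return model


history = [["s1", "s3", "s2", "s1"], ["s3", "s3", "s2", "s3"]]
M = estimate_transitions(history)
print(M["s3"])   # → {'s2': 0.6666666666666666, 's3': 0.3333333333333333}
```

Separate models per time period or object group, as mentioned above, would simply be estimated from the corresponding subsets of the history.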
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: transition graph over the states s1, s2, s3, unrolled over t = 0, 1, 2, 3]
› Note: we have an exponential number of possible paths the car might take
› Starting in s1 at t = 0, the state distribution evolves as
– t = 0: (1, 0, 0)
– t = 1: (0, 0, 1)
– t = 2: (0, 0.8, 0.2), i.e., with probability 0.8 the car is already in s2
– t = 3: (0.48, 0.16, 0.36); however, simply adding up the probabilities of s1 and s2 at t = 2 and t = 3 would count trajectories that are inside the window at both instants more than once
› Keeping only the 0.2 of trajectories that have not yet hit the window (in s3 at t = 2), 0.16 move to s2 at t = 3 and 0.04 remain in s3: Result = 0.8 + 0.16 = 0.96
› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices
103
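ST - Window Queries
› States s1, s2, s3 plus an absorbing state s4 for the winner trajectories; M^- is applied for transitions to time instants outside T, M^+ for transitions to time instants inside T (moves into s1 or s2 are redirected to s4)
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$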
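The M^-/M^+ construction can be written generically: during T, every transition into a window state is redirected to the absorbing winner state. A minimal sketch under these assumptions: 0-based state indices (s1 is index 0), a row-stochastic matrix given as nested lists, and an interval T that starts at t ≥ 1; `window_query_probability` is a hypothetical name.

```python
def window_query_probability(M, start, window, T):
    """P(object is in a window state at least once during T = (t_first, t_last))."""
    n = len(M)
    absorb = n                                   # index of the extra winner state

    def augmented(inside):
        A = [[0.0] * (n + 1) for _ in range(n + 1)]
        for i in range(n):
            for j in range(n):
                if inside and j in window:
                    A[i][absorb] += M[i][j]      # entering the window: trajectory wins
                else:
                    A[i][j] = M[i][j]
        A[absorb][absorb] = 1.0                  # winners stay winners
        return A

    m_minus, m_plus = augmented(False), augmented(True)
    t_first, t_last = T
    v = list(start) + [0.0]
    for t in range(1, t_last + 1):
        A = m_plus if t >= t_first else m_minus  # M+ for steps into T, M- otherwise
        v = [sum(v[i] * A[i][j] for i in range(n + 1)) for j in range(n + 1)]
    return v[absorb]


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
print(window_query_probability(M, [1, 0, 0], {0, 1}, (2, 3)))   # ≈ 0.96
```

The cost is a fixed number of (sparse) matrix-vector products per time step, independent of the exponential number of possible paths.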
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques
[Figure: location space over time with one observation at t0 (left) and two observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
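Multiple Observations
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3; classes S1−, S2−, S3− (window not hit, ¬∎) and S1+, S2+, S3+ (window hit, ∎)]
› With the state order (S1−, S2−, S3−, S1+, S2+, S3+) and M as before:
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
$$(1,0,0,0,0,0) \xrightarrow{M^-} (0,0,1,0,0,0) \xrightarrow{M^+} (0,0,0.2,0,0.8,0) \xrightarrow{M^+} (0,0,0.04,0.48,0.16,0.32)$$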
107
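› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' Theorem:
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare) \cdot P(\blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$$
› From the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1−, S2−, S3−, S1+, S2+, S3+): P(s3 ∧ ∎) = 0.32 (class S3+) and P(s3 ∧ ¬∎) = 0.04 (class S3−), so
$$P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$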
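The 2|S| classes and the Bayes step fit in a few lines: the `miss` list plays the role of S1−…S3− and `hit` the role of S1+…S3+. A minimal sketch reproducing the example above, assuming 0-based indices and that the object is observed at t = 3; the function name is hypothetical.

```python
def hit_and_state_distribution(M, start, window, T, t_obs):
    """Probabilities of the 2|S| classes at time t_obs.

    miss[i] = P(in state i, window not yet hit)   (classes Si-)
    hit[i]  = P(in state i, window already hit)   (classes Si+)
    """
    n = len(M)
    miss, hit = list(start), [0.0] * n
    t_first, t_last = T
    for t in range(1, t_obs + 1):
        inside = t_first <= t <= t_last
        new_miss, new_hit = [0.0] * n, [0.0] * n
        for i in range(n):
            for j in range(n):
                new_hit[j] += hit[i] * M[i][j]          # Si+ -> Sj+
                if inside and j in window:
                    new_hit[j] += miss[i] * M[i][j]     # Si- -> Sj+ (enters window)
                else:
                    new_miss[j] += miss[i] * M[i][j]    # Si- -> Sj-
        miss, hit = new_miss, new_hit
    return miss, hit


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
miss, hit = hit_and_state_distribution(M, [1, 0, 0], {0, 1}, (2, 3), t_obs=3)
s3 = 2
posterior = hit[s3] / (hit[s3] + miss[s3])   # Bayes: P(window hit | seen in s3)
print(round(posterior, 2))   # → 0.89
```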
108
Summary
› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most x of the possible trajectories of o may intersect Q)
[Figure: index over the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
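A minimal Monte-Carlo sketch for the running window example: trajectories are sampled forward from a single known start state, and the fraction that satisfies the query is returned. It does not implement the adapted transition matrices of the cited work, so it does not condition on later observations; the helper names, sample count and seed are assumptions.

```python
import random


def sample_trajectory(M, start_state, steps, rng):
    """Draw one state sequence of length steps + 1 from the Markov chain M."""
    states = [start_state]
    for _ in range(steps):
        row = M[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states


def monte_carlo_probability(M, start_state, steps, predicate, n_samples=10_000, seed=0):
    """Fraction of sampled trajectories that satisfy the query predicate."""
    rng = random.Random(seed)
    hits = sum(bool(predicate(sample_trajectory(M, start_state, steps, rng)))
               for _ in range(n_samples))
    return hits / n_samples


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]


def in_window(traj):
    """Window query: in s1 or s2 (indices 0, 1) at some t in T = [2, 3]."""
    return any(traj[t] in (0, 1) for t in (2, 3))


print(monte_carlo_probability(M, 0, 3, in_window))   # close to the exact 0.96
```

Unlike the window query, a kNN predicate can be evaluated on each sampled world in the same way, which is what makes sampling attractive here.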
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
› Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443.
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
15
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results enhancing the underlying decision-making process
Improve the quality of modern location based applications and of research results in the field
16
Uncertain Spatial Data Models
• Discrete Models
• Continuous Models
Possible World Semantics
17
Possible World Semantics
• A collection of uncertain spatial objects defines an uncertain spatial database
• Combinations of object instances define possible database instances, called possible worlds
• Assumption: the probability of a possible world can be computed efficiently
18
Answering Queries using PWS
Let
• DB be an uncertain database having possible worlds,
• W be the set of possible worlds of DB,
• q be a query predicate,
• I_q(w) be an indicator function returning one if predicate q holds in world w and zero otherwise.
The probability that a query predicate q holds on an uncertain database DB is defined as
$$P(q, DB) = \sum_{w \in W} P(w) \cdot I_q(w)$$
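The definition translates directly into an enumeration over all possible worlds. A minimal sketch, assuming a discrete model in which each object has a few alternative instances whose probabilities sum to one, and objects are mutually independent (so a world's probability is the product of its instances' probabilities); the data and names are hypothetical. The loop over `itertools.product` is exactly the exponential blow-up that makes naive query processing infeasible.

```python
from itertools import product


def query_probability(db, predicate):
    """P(q, DB) = sum over possible worlds w of P(w) * I_q(w).

    db: list of uncertain objects; each object is a list of
        (instance, probability) alternatives.
    predicate: maps a possible world (tuple of instances) to True/False.
    """
    total = 0.0
    for world in product(*db):                # one alternative per object
        p_world = 1.0
        for _, p in world:
            p_world *= p                      # independence between objects
        instances = tuple(inst for inst, _ in world)
        if predicate(instances):              # indicator I_q(w)
            total += p_world
    return total


# Two objects with 1D positions; predicate: at least one object lies in [0, 2].
db = [[(1, 0.5), (5, 0.5)], [(0, 0.2), (3, 0.8)]]
print(query_probability(db, lambda w: any(0 <= x <= 2 for x in w)))   # ≈ 0.6
```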
Possible Worlds Example II
[Figure: uncertain spatial objects labeled A to Z]
Too many possible worlds
28
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects
Are there efficient solutions to query uncertain spatial data?
In general: No
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
This can be reduced to uncertain spatial databases
But: specific queries may have polynomial-time solutions
[DalviSuciu04] Dalvi, N. N. and Suciu, D. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB) (2004)
29
Querying Uncertain Data: Running Example
[Figure: circular query region centered at query point q]
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds
30
Data Cleaning / Aggregation
[Figure: query region around q with objects C, D, H, I]
Ignore Uncertainty (Data Cleaning)
Replace uncertain objects by a deterministic "best guess"
• Expected positions
• Most-likely positions
• …
Query results are not reliable
Query results may be biased
31
Equivalent Worlds: An intuition
[Figure: query region around q]
Observation 1
For any possible world and any possible world derived from it by changing the position of an object, the following equivalence holds
32
Equivalent Worlds: An intuition
[Figure: query region around q]
Observation 1 allows us to discard objects outside of the query region
Remaining number of equivalence classes of possible worlds
33
Equivalent Worlds: An intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
[Figure: query region around q with the remaining objects C, D, H, I]
Querying Uncertain Spatial Data
34
Equivalent Worlds: An intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
Remaining number of equivalence classes of possible worlds
[Figure: query region around q with the remaining objects C, D, H, I]
35
Equivalent Worlds: An intuition
Observation 3
We only require the number of objects in the query region; information about the concrete result objects can be discarded
[Figure: query region around q with the remaining objects C, D, H, I]
36
Generating Functions
[Figure: objects A, B, C overlapping the query region around q, with in-region probabilities 0.8, 0.2 and 0.4]
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects – substitute A, B, C by x
Each monomial c·x^k implies that the probability of having exactly k results equals c
40
Generating Functions: Formally
For each object o_i, let p_i be the probability that o_i is inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_{i} \big(1 - p_i + p_i \cdot x\big)$$
In the expanded polynomial, the coefficient of the monomial x^k equals the probability that exactly k objects are inside the query region
[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
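Expanding the product one factor at a time is a simple polynomial multiplication. A minimal sketch, assuming the three in-region probabilities 0.8, 0.2 and 0.4 from the running-example figure belong to the only relevant objects; since objects are anonymized, their assignment to A, B and C does not matter. The function name is hypothetical; the cost is quadratic in the number of objects instead of exponential.

```python
def count_distribution(probs):
    """Coefficients of prod_i (1 - p_i + p_i * x).

    probs: probability p_i that object i lies inside the query region
    (objects assumed independent). Entry k of the result is the
    probability that exactly k objects are inside.
    """
    poly = [1.0]                       # the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(poly) + 1)
        for k, c in enumerate(poly):
            nxt[k] += c * (1 - p)      # object outside: degree unchanged
            nxt[k + 1] += c * p        # object inside: multiply by x
        poly = nxt
    return poly


print([round(c, 3) for c in count_distribution([0.8, 0.2, 0.4])])
# → [0.096, 0.472, 0.368, 0.064]
```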
41
Count Queries on Uncertain Data
Example
[Figure: objects A, B, C overlapping the query region around q, with in-region probabilities 0.8, 0.2 and 0.4]
45
The Paradigm of Equivalent Worlds
Given a query predicate q and an uncertain database DB, we can answer q on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
• Materialize a set S of possible worlds
• Samples are drawn independently and unbiased
• Evaluate the query predicate on each world
• The distribution of sampled results is an unbiased approximation of the true distribution of results
48
Sampling Example
[Figure: objects A, B, C overlapping the query region around q, with in-region probabilities 0.8, 0.2 and 0.4]
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of reliability or confidence of the estimations
49
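Sampling Confidences
[Figure: objects A, B, C overlapping the query region around q, with in-region probabilities 0.8, 0.2 and 0.4]
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of the estimators, e.g., the Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{|S|}}$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-percentile of the standard normal distribution
At a significance level of α, the true probability is in the interval [0.442, 0.638]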
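A minimal sketch of sampling plus the Wald interval. The estimate 0.54 from |S| = 100 samples with z ≈ 1.96 (α = 0.05) is an assumption: it is one combination that reproduces the interval [0.442, 0.638] above, but the slide does not state these values. The sampler assumes independent objects with the running example's in-region probabilities; all names are hypothetical.

```python
import math
import random


def wald_interval(p_hat, n, z=1.96):
    """Wald confidence interval p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def sample_count_probability(probs, k, n, seed=0):
    """Estimate P(exactly k objects inside) from n sampled possible worlds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        count = sum(rng.random() < p for p in probs)   # one sampled world
        hits += (count == k)
    return hits / n


lo, hi = wald_interval(0.54, 100)
print(round(lo, 3), round(hi, 3))   # → 0.442 0.638

p_hat = sample_count_probability([0.8, 0.2, 0.4], k=1, n=100)
print(p_hat, wald_interval(p_hat, 100))
```

More samples shrink the interval at a rate of 1/sqrt(|S|).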
50
Uncertain Spatial Data Management: Summary
• Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
• Data cleaning: "best guess" answers; unreliable results; biased results
• Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
• Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip-side
53
Model of a trajectory
[Figure: 2D route in the (X, Y) plane and 3D trajectory in (X, Y, Time), up to the present time]
Approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
54
Mobility Models
• Sequence of (location, time) updates (e.g., GPS-based)
• Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory
• (location, time, velocity vector) updates
55
Modeling Uncertainty: Location Uncertainty Models
• Imprecision from various sources (GPS, sensors)
• Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumed exact location samples
  • Unknown (but bounded) maximum speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
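Under Model 1, a location is possible at time t if and only if it is reachable from the earlier sample and can still reach the later sample at the maximum speed; the intersection of these two cones is one bead. A minimal sketch, assuming planar coordinates and a known speed bound `v_max` (the slide only says the speed is bounded); `in_bead` is a hypothetical name.

```python
import math


def in_bead(p, t, sample1, sample2, v_max):
    """True if location p may be occupied at time t between two exact samples.

    sample1 = (p1, t1), sample2 = (p2, t2) with 2D points p1, p2.
    p must be reachable from p1 within t - t1, and p2 must be reachable
    from p within t2 - t, both at speed at most v_max.
    """
    (p1, t1), (p2, t2) = sample1, sample2
    if not t1 <= t <= t2:
        return False
    return (math.dist(p, p1) <= v_max * (t - t1)
            and math.dist(p, p2) <= v_max * (t2 - t))


s1, s2 = ((0, 0), 0), ((10, 0), 10)
print(in_bead((5, 0), 5, s1, s2, v_max=1.5))   # → True
print(in_bead((5, 6), 5, s1, s2, v_max=1.5))   # → False (too far from both samples)
```

A necklace is then the sequence of such beads between consecutive samples.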
57
Modeling Uncertain Trajectories: Model 2
Constraints
• Motion along road network
• Uncertainty restricted to road segments
[Figure: road-network space-time prism between p1 at t1 and p2 at t2, with labels q and t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
– The impact of uncertainty on the (structure of the) answer
– The impact of constraints, e.g., possible routes to be taken
Processing algorithms
– Impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Given a collection of moving object trajectories Tr1, Tr2, …, TrN:
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq, between [tb, te]
A-nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) at times T = tb, t1, te]
Example: assuming that Trq is the querying trajectory between [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]
[Figure: the trajectories of the previous example with uncertainty, at times T = tb, tb1, t11, t1, te]
Observations: adding uncertainty to the previous example implies that
– Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN-probability
– Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree with root [TrQ, tb, te]; first-level nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); deeper nodes such as (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121), …, (Tri1k1, t1(k-1), t11, D1k1)]
Root: parameters of the query
– Querying trajectory Trq
– Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: crisp query point Q and the uncertainty disks of Tr1 to Tr5, with radii Rmin, Rmax and Rd around Q]
Trq = 2D point Q; the rest are possible locations bounded by circles
Observation
– Any object whose min-distance from Q is > Rmax has a "0" probability of being the NN of Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \iint_{\lVert (x, y) - Q \rVert \le R_d} pdf_i(x, y)\, dx\, dy$$
The probability that a given object Trj is the NN of Q is given by
$$P^{NN}_j = \int_{0}^{R_{max}} \frac{d P_j(R_d)}{d R_d} \prod_{i \ne j} \big(1 - P_i(R_d)\big)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
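Instead of evaluating the integral numerically, a Monte-Carlo estimate is often easier to sketch. The code below assumes each object is uniformly distributed in its own disk, independently of the others, and applies the Rmax pruning first, taking Rmax as the smallest maximum distance of any object to Q (an assumption consistent with the observation above). Names, sample count and seed are hypothetical.

```python
import math
import random


def sample_uniform_disk(center, r, rng):
    """Uniform random point in a disk (the sqrt keeps the area density uniform)."""
    rho = r * math.sqrt(rng.random())
    phi = 2 * math.pi * rng.random()
    return (center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi))


def nn_probabilities(q, disks, n_samples=20_000, seed=0):
    """Monte-Carlo estimate of P(object j is the NN of the crisp point q).

    disks: list of (center, radius), one uncertainty disk per object.
    """
    # Rmax: in every possible world some object lies within this distance of q.
    r_max = min(math.dist(q, c) + r for c, r in disks)
    # Prune objects whose minimum distance to q exceeds Rmax (probability 0).
    candidates = [j for j, (c, r) in enumerate(disks) if math.dist(q, c) - r <= r_max]
    rng = random.Random(seed)
    wins = [0] * len(disks)
    for _ in range(n_samples):
        d = {j: math.dist(q, sample_uniform_disk(*disks[j], rng)) for j in candidates}
        wins[min(d, key=d.get)] += 1
    return [w / n_samples for w in wins]


probs = nn_probabilities((0, 0), [((2, 0), 1), ((-2, 0), 1), ((10, 0), 1)])
print([round(p, 2) for p in probs])   # the third object is pruned: probability 0.0
```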
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertainty disks of TrQ and Tr1 to Tr5 with radii Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within_distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs of the possible locations of Trq, Tr1 and Tr2, each over a disk of radius r]
Example: assuming a uniform pdf of possible locations, with two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$P\big(\lVert V_i - V_q \rVert \le R_d\big)$$
Define Viq = Vi - Vq (cross-correlation) as another random variable, Viq = Vi + (-Vq), which, due to the independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and -Vq
67
[Figure: the pdfs pdf(Tr1 - Trq) and pdf(Tr2 - Trq); support of width 4r]
Let Viq = Vi + (-Vq)
Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs); THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
68
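The continuous aspect of the probabilistic NN queries
Let Vx_iq = vx_i - vx_q and Vy_iq = vy_i - vy_q denote the corresponding components of the velocity of the object whose expected location is along TR_iq
Then the distance of the expected location along TR_iq from the origin as a function of t is
$$d_{iq}(t) = \sqrt{(X_{iq} + V^x_{iq}\, t)^2 + (Y_{iq} + V^y_{iq}\, t)^2} = \sqrt{A t^2 + B t + C}$$
$$A = (V^x_{iq})^2 + (V^y_{iq})^2, \quad B = 2\,(X_{iq} V^x_{iq} + Y_{iq} V^y_{iq}), \quad C = X_{iq}^2 + Y_{iq}^2$$
where (X_iq, Y_iq) denotes the expected relative location at t = 0; this is a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)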
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t); squaring both sides yields a quadratic equation in t). Call such pair-wise intersections critical time-points
[Figure (example): lower envelope LE12 of the two distance trajectories TR1 and TR2, starting at tb, with critical time-points t11 and t12; time on the horizontal axis, distance on the vertical axis]
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: LE12 of TR1 and TR2 (critical time-points t11, t12) and LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], introducing new critical time-points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
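A brute-force alternative to the divide-and-conquer construction helps make critical time-points concrete: squaring d_iq(t) = d_jq(t) gives a quadratic in t, so each pair contributes at most two critical times, and between consecutive critical times the ordering of the functions cannot change. This sketch is O(N²) rather than the O(N log N) stated above; it assumes constant relative velocities as in the distance-function model above, and all names are hypothetical.

```python
import math


def dist_coeffs(rel_pos, rel_vel):
    """(A, B, C) with d(t)^2 = A t^2 + B t + C for a relative position at t = 0
    and a constant relative velocity."""
    (x, y), (vx, vy) = rel_pos, rel_vel
    return vx * vx + vy * vy, 2 * (x * vx + y * vy), x * x + y * y


def distance(coeffs, t):
    A, B, C = coeffs
    return math.sqrt(max(A * t * t + B * t + C, 0.0))


def critical_times(c1, c2, tb, te):
    """Times in [tb, te] where two distance functions intersect (at most two)."""
    a, b, c = (u - v for u, v in zip(c1, c2))   # d1^2 - d2^2 = a t^2 + b t + c
    if a == 0:
        roots = [] if b == 0 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted({r for r in roots if tb <= r <= te})


def lower_envelope(funcs, tb, te):
    """Brute-force lower envelope as a list of (object index, start, end)."""
    cuts = {tb, te}
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            cuts.update(critical_times(funcs[i], funcs[j], tb, te))
    cuts = sorted(cuts)
    pieces = []
    for s, e in zip(cuts, cuts[1:]):
        mid = (s + e) / 2                        # order is fixed inside (s, e)
        owner = min(range(len(funcs)), key=lambda k: distance(funcs[k], mid))
        if pieces and pieces[-1][0] == owner:
            pieces[-1] = (owner, pieces[-1][1], e)   # extend the previous piece
        else:
            pieces.append((owner, s, e))
    return pieces


funcs = [dist_coeffs((0, 1), (1, 0)), dist_coeffs((2, 0), (0, 0))]
print(lower_envelope(funcs, 0, 5))   # object 0 until t = sqrt(3), then object 1
```

Removing the owner of each envelope piece and recomputing on the remaining functions gives the next (ranked) envelope used for the lower levels of the IPAC-NN tree.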
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te] with the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree), the 4r pruning band, and time points t1, t2, t3]
72
Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti
Example: if the pdf of selecting a possible path is uniform, each path in PPi is selected with probability 1/|PPi|
Given a path Pj ∈ PPi(a), the possible locations of the moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t - ti (respectively ti+1 - t)
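The possible-path set PPi can be enumerated with a depth-first search that stops as soon as the minimum travel time exceeds ti+1 - ti. A minimal sketch, assuming both samples lie on vertices, a directed adjacency map with the minimum travel time of each edge, and simple paths only; the small graph reuses vertex names from the example, but its topology and times are hypothetical.

```python
def possible_paths(graph, source, target, max_time):
    """All simple paths source -> target with minimum travel time <= max_time.

    graph: {vertex: {neighbor: min_travel_time}} (hypothetical format).
    """
    results = []

    def dfs(node, path, elapsed):
        if elapsed > max_time:          # prune: cannot arrive in time
            return
        if node == target:
            results.append(list(path))
            return
        for nxt, cost in graph.get(node, {}).items():
            if nxt not in path:         # simple paths only
                path.append(nxt)
                dfs(nxt, path, elapsed + cost)
                path.pop()

    dfs(source, [source], 0.0)
    return results


graph = {"v1": {"v3": 2, "v5": 5}, "v3": {"v4": 2}, "v5": {"v4": 2}, "v4": {}}
paths = possible_paths(graph, "v1", "v4", max_time=7)
prob = {tuple(p): 1 / len(paths) for p in paths}   # uniform pdf over possible paths
print(prob)   # two possible paths, each with probability 0.5
```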
73
Combine the two probabilities…
[Figure: possible paths of object a and its possible locations when t = 4, including the segment from m to p1]
Probability of falling inside the segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr\big(a(t) \in S\big) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\big(PL^a_p(t) \in S\big)$$
e.g., with a uniform pdf
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– The probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
– Edge hash-table: small, in-memory
– Movement R-tree: 1D for each edge
– Trajectories list: store samples ordered by time-stamp; assign a pointer to the set of possible paths; earliest-arriving and latest-departure times stored for vertices; on-disk, retrieved on demand
76
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" along road-network segments from q over vertices A–E, annotated with earliest/latest times of arrival and with a possible path that is definitely not part of the answer to the range query]
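The slides do not spell out how the expansion tree is built; a Dijkstra-style expansion bounded by r is one natural choice, assumed here. The sketch returns the set of edges leaving vertices whose network distance to q is below r (rather than an explicit tree); the graph format and all names are hypothetical.

```python
import heapq


def network_expansion(graph, q, r):
    """Edges reachable from vertex q within network distance r.

    graph: {vertex: {neighbor: edge_length}}; undirected roads list both
    directions. Returns the edges (u, v) whose start vertex u lies at
    network distance < r from q.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                     # stale heap entry
        if d >= r:
            break                        # all remaining vertices are too far
        for v, w in graph.get(u, {}).items():
            edges.add((u, v))
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return edges


graph = {"q": {"A": 1}, "A": {"q": 1, "B": 2}, "B": {"A": 2, "C": 5}, "C": {"B": 5}}
print(sorted(network_expansion(graph, "q", 2.5)))
# → [('A', 'B'), ('A', 'q'), ('q', 'A')]
```

The movement R-trees of exactly these edges are then fetched in the filtering step.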
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
A highway example…
[Figure: position (pos) of a car on a highway over time (t) with a query region Q; the possible positions are bounded by the minimum and maximum speeds vmin and vmax]
What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution…, the car is inside Q with probability 50% at one time instant and 30% at another
› Independence assumption: $1 - (1 - 0.5)(1 - 0.3) = 0.65$
– Violation of the speed constraints: combining the two time instants independently admits worlds that no car moving between vmin and vmax can realize
› Consideration of dependency: 0.5
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley
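A simple example
› Whenever the wooden board is hit, the ball stays in its hole or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts
[Figure: in the interior, the ball moves left or right with probability 0.4 each and stays with 0.2; at the border it stays with 0.4 and moves inward with 0.6]
A simple example (cont.)
› Initial position: the ball is in the third hole
› After first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After 40th hit: ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)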
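How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$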
How can we model this? (cont.)
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
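The propagation above is plain repeated vector-matrix multiplication. A minimal, dependency-free Python sketch (the helper names `step` and `board_matrix` are ours) reproduces the first two hits; the 40-step vector is printed without a claimed value, since the slide's figures are rounded.

```python
def step(dist, M):
    """One hit of the board: new distribution = dist (row vector) times M."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]

def board_matrix(n=9):
    """Transition matrix of the example: interior holes move left/right with 0.4 and stay
    with 0.2; the two border holes stay with 0.4 and move inward with 0.6."""
    M = [[0.0] * n for _ in range(n)]
    M[0][0], M[0][1] = 0.4, 0.6
    M[n - 1][n - 1], M[n - 1][n - 2] = 0.4, 0.6
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

M = board_matrix()
dist = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # initial position: third hole
history = [dist]
for _ in range(40):
    dist = step(dist, M)
    history.append(dist)

print([round(p, 2) for p in history[1]])  # -> [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in history[2]])  # -> [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
print([round(p, 3) for p in history[40]])
```

One can check that π = (0.08, 0.12, …, 0.12, 0.08) satisfies π·M = π, i.e., it is the stationary distribution toward which the repeated hits drive the ball's distribution.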
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to one tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
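How the transition probabilities are learned is not specified here. One simple option, assumed for this sketch, is the maximum-likelihood estimate: count the observed state-to-state transitions per tick and normalize each row. Rows are stored as dictionaries of non-zero entries, in the spirit of the sparse matrix mentioned above. The historical trajectories are hypothetical and were chosen so that the estimate matches the three-state matrix M used in the window-query example.

```python
from collections import defaultdict

def estimate_transitions(trajectories):
    """Maximum-likelihood transition probabilities from state sequences sampled once per tick.

    Returns a sparse matrix as {from_state: {to_state: probability}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: c / total for b, c in row.items()}
    return matrix

# Hypothetical historical trajectories over intersection states (one entry per tick, e.g., 20 s).
history = [
    ["s1", "s3", "s2", "s1", "s3"],
    ["s2", "s3", "s3", "s2", "s1"],
    ["s3", "s2", "s3", "s2", "s1"],
]
M_hat = estimate_transitions(history)
print(M_hat["s2"])  # -> {'s1': 0.6, 's3': 0.4}
```

With real data, sparse rows (few observed transitions) would typically call for smoothing or for separate matrices per time of day or object group, as the slide notes.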
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: transition graph over the states s1, s2, s3, unrolled over the ticks t = 0, 1, 2, 3]
Note: there is an exponential number of possible paths the car might take
› Starting in s1 at t = 0, i.e., with the distribution (1, 0, 0):
– t = 1: (0, 0, 1)
– t = 2: (0, 0.8, 0.2); the 0.8 in s2 already lies in the window
– t = 3, all worlds: (0.48, 0.16, 0.36)
– t = 3, propagating only the 0.2 of worlds that have not yet entered the window: (0, 0.16, 0.04)
– Result = 0.8 + 0.16 = 0.96
› The solution based on matrix multiplications introduces a new state for the "winner" trajectories (those that have hit the window) and two matrices
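ST - Window Queries (cont.)
States s1, s2, s3 plus the absorbing "winner" state s4:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0.6 & 0 & 0.4 & 0\\ 0 & 0.8 & 0.2 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0 & 0 & 0.4 & 0.6\\ 0 & 0 & 0.2 & 0.8\\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$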
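A minimal Python sketch of this scheme, hard-coding M− and M+ for the window states {s1, s2} (the helper names `vec_mat` and `propagate` are ours): transitions that end at a tick inside T use M+, all others use M−, and the mass collected in the absorbing state s4 is the query probability. The start tick is assumed to lie outside the window.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

# States s1, s2, s3 plus the absorbing "hit" state s4 (index 3).
M_minus = [[0, 0, 1, 0],
           [0.6, 0, 0.4, 0],
           [0, 0.8, 0.2, 0],
           [0, 0, 0, 1]]
# Inside the query window, every transition into s1 or s2 is redirected to s4.
M_plus = [[0, 0, 1, 0],
          [0, 0, 0.4, 0.6],
          [0, 0, 0.2, 0.8],
          [0, 0, 0, 1]]

def propagate(start, window_ticks, horizon):
    """Distribution at tick `horizon`; transitions ending at a tick in window_ticks use M_plus."""
    v = start
    for t in range(1, horizon + 1):
        v = vec_mat(v, M_plus if t in window_ticks else M_minus)
    return v

v3 = propagate([1, 0, 0, 0], window_ticks={2, 3}, horizon=3)
print(round(v3[3], 4))  # -> 0.96
```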
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 (left) and with two observations at t0 and t1 (right)]
Multiple Observations (cont.)
› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state si and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over the ticks t = 0, 1, 2, 3]
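Multiple Observations (cont.)
› States, in this order: $S_1^-, S_2^-, S_3^-$ (worlds that have not hit the window, ¬hit) and $S_1^+, S_2^+, S_3^+$ (worlds that have hit the window, hit)
$$M^- = \begin{pmatrix} M & 0\\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0.4 & 0.6 & 0 & 0\\
0 & 0 & 0.2 & 0 & 0.8 & 0\\
0 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0.6 & 0 & 0.4\\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$
$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0, 0, 0) \xrightarrow{M^+} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^+} (0, 0, 0.04, 0.48, 0.16, 0.32)$$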
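Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 at t = 3?
$$P(\text{hit} \mid s_3) = \frac{P(s_3 \mid \text{hit}) \cdot P(\text{hit})}{P(s_3)} = \frac{P(\text{hit} \wedge s_3)}{P(s_3 \wedge \text{hit}) + P(s_3 \wedge \neg\text{hit})} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
where 0.32 is the probability of $S_3^+$ and 0.04 that of $S_3^-$ in the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over $S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+$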
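The following Python sketch uses the same setting as the three-state example (window states s1 and s2, T = [2, 3], start in s1). It builds the 6-state matrices from M, propagates the distribution to t = 3 and applies Bayes' theorem in the form P(hit | s3) = P(hit ∧ s3) / (P(hit ∧ s3) + P(¬hit ∧ s3)). Variable names are our own.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
n = len(M)
window_states = {0, 1}  # s1 and s2 (0-based indices)

# Index i stands for S_i^- (window not yet intersected), index n + i for S_i^+ (intersected).
M_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
M_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
for i in range(n):
    for j in range(n):
        p = M[i][j]
        M_minus[i][j] = M_minus[n + i][n + j] = p  # block-diagonal (M 0; 0 M)
        M_plus[n + i][n + j] = p                   # hit worlds stay hit worlds
        if j in window_states:
            M_plus[i][n + j] = p                   # entering the window turns a world into a hit world
        else:
            M_plus[i][j] = p

v = [1.0, 0, 0, 0, 0, 0]              # t = 0: object observed in s1
for Mt in (M_minus, M_plus, M_plus):  # ticks 1, 2, 3; the window is T = [2, 3]
    v = vec_mat(v, Mt)

# Second observation: the object is in s3 at t = 3. Bayes' theorem:
p_hit_s3, p_nohit_s3 = v[n + 2], v[2]
posterior = p_hit_s3 / (p_hit_s3 + p_nohit_s3)
print([round(x, 2) for x in v], round(posterior, 2))  # -> [0.0, 0.0, 0.04, 0.48, 0.16, 0.32] 0.89
```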
Summary
› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping continuous time to discrete ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index entries in the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
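To make the sampling idea concrete, here is a minimal Python sketch on the three-state chain M from the window-query example. It samples possible trajectories starting in s1 and estimates the probability of being in s1 or s2 at t = 2 or 3 (exact value 0.96). Conditioning on a later observation (object in s3 at t = 3, exact value 0.32 / 0.36 ≈ 0.89) is done here by naive rejection sampling. That is our simplification, not the adapted-transition-matrix method of the cited paper, and it shows why such a method is needed: about 64% of the samples are wasted here, and the waste grows with every additional observation.

```python
import random

# Three-state chain from the window-query example, stored as sparse rows.
M = {"s1": {"s3": 1.0},
     "s2": {"s1": 0.6, "s3": 0.4},
     "s3": {"s2": 0.8, "s3": 0.2}}

def sample_path(start, ticks, rng):
    """Draw one possible trajectory (one possible world) by walking the Markov chain."""
    path = [start]
    for _ in range(ticks):
        row = M[path[-1]]
        path.append(rng.choices(list(row), weights=list(row.values()))[0])
    return path

def estimate(n, rng, observed_end=None):
    """Estimate P(object in s1 or s2 at t = 2 or t = 3) from n samples starting in s1.

    If observed_end is given, samples whose state at t = 3 differs from it are rejected.
    Returns (estimate, number of accepted samples).
    """
    hits = accepted = 0
    for _ in range(n):
        path = sample_path("s1", 3, rng)
        if observed_end is not None and path[-1] != observed_end:
            continue  # the sample contradicts the observation
        accepted += 1
        hits += any(s in ("s1", "s2") for s in path[2:4])
    return hits / accepted, accepted

print(estimate(20000, random.Random(42)))                    # true value 0.96; estimate varies with the seed
print(estimate(20000, random.Random(7), observed_end="s3"))  # true value 0.32 / 0.36, about 0.89
```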
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
Related Work (cont.)
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results, enhancing the underlying decision-making process
Improve the quality of modern location-based applications and of research results in the field
Uncertain Spatial Data Models
• Discrete models
• Continuous models
Possible World Semantics
• A collection of uncertain spatial objects defines an uncertain spatial database
• Combinations of object instances define possible database instances, called possible worlds
• Assumption: the probability of a possible world can be computed efficiently
Answering Queries using PWS
Let
• $DB$ be an uncertain database having possible worlds $w$
• $W$ be the set of possible worlds of $DB$
• $q$ be a query predicate
• $I(q, w)$ be an indicator function returning one if predicate $q$ holds in world $w$ and zero otherwise
The probability that a query predicate holds on an uncertain database is defined as
$$P(q, DB) = \sum_{w \in W} P(w) \cdot I(q, w)$$
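A brute-force Python sketch of the definition $P(q, DB) = \sum_{w \in W} P(w) \cdot I(q, w)$, assuming (our assumption) mutually independent objects with discrete alternatives, so that $P(w)$ is the product of the chosen alternatives' probabilities. The objects, positions and the predicate "at least two objects within distance 2 of q" are hypothetical. The enumeration visits all worlds and is therefore exponential in the number of objects.

```python
from itertools import product
import math

# Hypothetical discrete uncertain objects: each maps to a list of (position, probability) alternatives.
objects = {
    "A": [((1.0, 1.0), 0.8), ((5.0, 5.0), 0.2)],
    "B": [((0.5, 0.0), 0.2), ((4.0, 0.0), 0.8)],
    "C": [((0.0, 1.5), 0.4), ((0.0, 6.0), 0.6)],
}

def prob_query(objects, predicate):
    """P(predicate) = sum over possible worlds w of P(w) * I(predicate, w), assuming independent objects."""
    names = list(objects)
    total = 0.0
    for combo in product(*(objects[n] for n in names)):
        world = {n: pos for n, (pos, _) in zip(names, combo)}
        p_world = math.prod(p for _, p in combo)
        if predicate(world):  # indicator I(q, w)
            total += p_world
    return total

q, radius = (0.0, 0.0), 2.0

def at_least_two_inside(world):
    return sum(math.dist(pos, q) <= radius for pos in world.values()) >= 2

print(round(prob_query(objects, at_least_two_inside), 3))  # -> 0.432
```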
Possible Worlds Example II
[Figure: uncertain objects A to Z]
Too many possible worlds
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects
Are there efficient solutions to query uncertain spatial data?
In general: No
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
This hardness carries over, by reduction, to uncertain spatial databases
But: specific queries may have polynomial-time solutions
[DalviSuciu04] N. N. Dalvi and D. Suciu: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004
Querying Uncertain Data: Running Example
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds
Data Cleaning: Aggregation
[Figure: query point q with uncertain objects C, D, H, I]
Ignore uncertainty (data cleaning)
Replace uncertain objects by a deterministic "best guess"
• Expected positions
• Most-likely positions
• …
Query results are not reliable
Query results may be biased
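Equivalent Worlds: An Intuition
[Figure: query region around q]
Observation 1
For any possible world w and any possible world w' derived from w by changing the position of an object that lies outside the query region in both worlds, the query result on w equals the query result on w'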
Equivalent Worlds: An Intuition
Observation 1 allows us to discard objects outside of the query region
Remaining number of equivalence classes of possible worlds
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
[Figure: query point q with the remaining objects C, D, H, I]
Remaining number of equivalence classes of possible worlds
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region; information about the concrete result objects can be discarded
[Figure: query point q with objects C, D, H, I]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
[Figure: query region around q with objects A, B, C (probability labels 0.8, 0.2, 0.4) and object H]
Observation 3: anonymize objects, i.e., substitute A, B, C by x
Each monomial $c_k x^k$ implies that the probability of having exactly $k$ results equals $c_k$
Generating Functions: Formally
For each object $o_i$, let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:
$$\mathcal{F}(x) = \prod_i \big((1 - p_i) + p_i \cdot x\big)$$
In the expanded polynomial, the coefficient of monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region
[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
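Expanding the generating function is a sequence of polynomial multiplications. For n objects this takes O(n²) time instead of enumerating 2^n possible worlds. A minimal Python sketch with hypothetical inside-probabilities 0.8, 0.2 and 0.4, assumed independent; entry k of the result is the probability that exactly k objects are inside the query region.

```python
def count_distribution(probs):
    """Coefficients of prod_i ((1 - p_i) + p_i * x): entry k is P(exactly k objects inside)."""
    coeffs = [1.0]  # the polynomial "1"
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1 - p)  # object outside: degree unchanged
            nxt[k + 1] += c * p    # object inside: multiply by x
        coeffs = nxt
    return coeffs

# Hypothetical probabilities of three objects lying inside the query region.
print([round(c, 3) for c in count_distribution([0.8, 0.2, 0.4])])  # -> [0.096, 0.472, 0.368, 0.064]
```

For instance, the probability that at least two of these objects are inside is 0.368 + 0.064 = 0.432.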
Count Queries on Uncertain Data
[Figure: query region around q with objects A, B, C (probability labels 0.8, 0.2, 0.4) and object H]
Example: expanding the generating function for the objects A, B, C
The Paradigm of Equivalent Worlds
Given a query predicate q and an uncertain database DB, we can answer q on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and unbiased
Evaluate the query predicate on each world
The distribution of sampled results is an unbiased approximation of the true distribution of results
Sampling Example
[Figure: query region around q with objects A, B, C (probability labels 0.8, 0.2, 0.4) and object H]
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of reliability or confidence of the estimations
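Sampling Confidences
[Figure: query region around q with objects A, B, C (probability labels 0.8, 0.2, 0.4) and object H]
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of the estimators
E.g., Wald test: $\hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/n}$
where $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$ percentile of the standard normal distribution
At significance level $\alpha$, the resulting confidence interval for the true probability is [0.442, 0.638]
[Figure: the interval compared with the true probability]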
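As a worked check of the Wald interval, assume (our assumption, not stated above) that 54 of the 100 sampled worlds satisfied the predicate, i.e., $\hat{p} = 0.54$, and a significance level of α = 0.05. Python's standard `statistics.NormalDist` supplies the percentile $z_{0.975} \approx 1.96$, and the sketch reproduces the interval [0.442, 0.638].

```python
import math
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval p_hat +/- z_{1-alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

lo, hi = wald_interval(0.54, 100)
print(round(lo, 3), round(hi, 3))  # -> 0.442 0.638
```

A smaller α (higher confidence) widens the interval; the width shrinks only with the square root of the number of sampled worlds.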
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data, enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip-side
Model of a trajectory
[Figure: a 2D route in the X-Y plane and the corresponding 3D trajectory in (X, Y, Time), up to the present time]
The approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
(location, time, velocity vector) updates
[Figure: time axis up to "now"]
Modeling Uncertainty: Location Uncertainty Models
Various sources of imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (a.k.a. space-time prisms)
• Assumes exact location samples
• Unknown (but bounded) maximum speed in-between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
Modeling Uncertain Trajectories: Model 2
Constraints: motion along the road network
Uncertainty restricted to road segments
[Figure: positions p1 and p2 at times t1 and t2 and a query point q on a road network]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider
Syntax
– The impact of uncertainty on the (structure of the) answer
– The impact of constraints, e.g., possible routes to be taken
Processing algorithms
– Impact of the motion + uncertainty coupling on filtering/pruning/refinement
Example: inside the range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
Continuous NN for trajectories
Given a collection of moving-object trajectories Tr1, Tr2, …, TrN:
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) over time, with T = tb, T = t1 and T = te marked]
Example: assuming that Trq is the querying trajectory during [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: uncertain trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) over time, with T = tb, tb1, t11, t1, te marked]
Observations: adding uncertainty to the previous example implies that
– Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN-probability
– Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree with root [TrQ, tb, te]; first-level nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); each node's children cover sub-intervals of its time interval, e.g., (Tri11, tb, t11, D11) and (Tri12, t11, t12, D12) below Tri1]
Root: parameters of the query
– Querying trajectory Trq
– Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and the uncertainty disks of Tr1 to Tr5, with radii Rmin, Rmax and Rd around Q]
Trq = 2D point Q; the possible locations of the other objects are bounded by circles
Observation
– Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of $Tr_i$ is within distance $R_d$ from Q is given by
$$P_i(R_d) = \int_{\{x \,:\, \|x - Q\| \le R_d\}} pdf_i(x)\, dx$$
The probability that a given object $Tr_j$ is the NN of Q is given by
$$P_j^{NN} = \int_0^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \ne j} \big(1 - P_i(R_d)\big)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd (NOTE: the upper bound is Rmax…)
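A numerical Python sketch of these two formulas, assuming (our assumption) that every object is uniformly distributed on a disk. Then $P_i(R_d)$ is the intersection area of the disk of radius $R_d$ around Q with the object's disk, divided by that disk's area, and the outer integral is approximated by a simple Riemann sum over $[0, R_{max}]$. Object positions and radii are hypothetical; the third object's minimum distance from Q exceeds Rmax, so it gets probability 0, as in the observation above.

```python
import math

def lens_area(d, r1, r2):
    """Intersection area of two disks with radii r1, r2 whose centers are d apart."""
    if d >= r1 + r2:
        return 0.0
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2
    a1 = math.acos(max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1))))
    a2 = math.acos(max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2))))
    k = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    return r1 * r1 * a1 + r2 * r2 * a2 - 0.5 * math.sqrt(max(0.0, k))

def within(obj, q, rd):
    """P_i(Rd): probability that an object uniform on disk (center, r) lies within distance rd of q."""
    center, r = obj
    if rd <= 0:
        return 0.0
    return lens_area(math.dist(center, q), rd, r) / (math.pi * r * r)

def nn_probabilities(objs, q, steps=4000):
    """P(object j is NN of q) = integral of dP_j(Rd) * prod_{i != j} (1 - P_i(Rd)) over [0, Rmax]."""
    r_max = min(math.dist(c, q) + r for c, r in objs)  # beyond Rmax some object is surely closer
    h = r_max / steps
    probs = [0.0] * len(objs)
    for s in range(steps):
        lo, hi, mid = s * h, (s + 1) * h, (s + 0.5) * h
        for j, obj in enumerate(objs):
            dp = within(obj, q, hi) - within(obj, q, lo)  # mass of object j in the ring [lo, hi]
            if dp == 0.0:
                continue
            rest = math.prod(1 - within(o, q, mid) for i, o in enumerate(objs) if i != j)
            probs[j] += dp * rest
    return probs

# Hypothetical objects: (disk center, radius); the query point is the origin.
objs = [((2.0, 0.0), 1.0), ((0.0, 2.5), 1.0), ((10.0, 0.0), 1.0)]
print([round(p, 3) for p in nn_probabilities(objs, (0.0, 0.0))])
```

The probabilities sum to 1 up to discretization error, which shrinks as `steps` grows.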
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query location TrQ and objects Tr1 to Tr5, with radii Rmin and Rmax and two points Z1, Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
[Figure: uniform pdfs of Trq, Tr1 and Tr2 over disks of diameter 2r in the x-y plane]
Example: assuming a uniform pdf of possible locations; the figure marks two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq
Main Observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\|V_i - V_q\| \le R_d)$
Define $V_{iq} = V_i - V_q$ (cross-correlation) as another random variable, $V_{iq} = V_i + (-V_q)$; due to independence, the pdf of $V_{iq}$ is the convolution of the pdfs of $V_i$ and $-V_q$
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq), each supported on a disk of diameter 4r]
Let $V_{iq} = V_i + (-V_q)$
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)
Property 2: assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs); THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
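The continuous aspect of the probabilistic NN queries
Let $V^x_{iq} = v^x_i - v^x_q$ and $V^y_{iq} = v^y_i - v^y_q$ denote the components of the relative velocity of the object whose expected location moves along $TR_{iq}$
Then, with $(x_{iq}, y_{iq})$ the expected relative location at t = 0, the distance of the expected location along $TR_{iq}$ from the origin as a function of t is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = (V^x_{iq})^2 + (V^y_{iq})^2,\; B = 2\,(x_{iq} V^x_{iq} + y_{iq} V^y_{iq}),\; C = x_{iq}^2 + y_{iq}^2$$
which is a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)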
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for the set of distance functions $S_{df} = \{d_{1q}(t), d_{2q}(t), \ldots, d_{Nq}(t)\}$ (NOTE: excluding $d_{qq}(t) \equiv 0$)
Observation: two distance functions $d_{iq}(t)$ and $d_{jq}(t)$ can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: solve $d_{iq}(t) = d_{jq}(t)$). Call such pairwise intersections critical time points
[Figure: example lower envelope LE12 of the two distance trajectories TR1 and TR2 after tb, with critical time points t11 and t12; time on the horizontal axis, distance on the vertical axis]
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: the lower envelopes LE12 (of TR1 and TR2, critical time points t11, t12) and LE34 (of TR3 and TR4, critical time point t31) over [tb, te] are merged into LE1234, which introduces the new critical time points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
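For a relative start position (x, y) and relative velocity (vx, vy), the squared distance is $d(t)^2 = A t^2 + B t + C$ with $A = v_x^2 + v_y^2$, $B = 2(x v_x + y v_y)$ and $C = x^2 + y^2$. Two distance functions therefore meet where a quadratic vanishes, i.e., at most twice unless they coincide. The Python sketch below is a deliberately simple brute-force alternative to the divide-and-conquer construction: it collects all pairwise critical time points in [tb, te] and picks the closest trajectory on each sub-interval, which costs O(N²) time rather than O(N log N). The motions are hypothetical.

```python
import math

def coeffs(x, y, vx, vy):
    """(A, B, C) such that d(t)^2 = A t^2 + B t + C for relative position (x + vx t, y + vy t)."""
    return vx * vx + vy * vy, 2 * (x * vx + y * vy), x * x + y * y

def dist(c, t):
    A, B, C = c
    return math.sqrt(max(0.0, A * t * t + B * t + C))

def critical_times(c1, c2, tb, te):
    """Times in (tb, te) where d1(t) = d2(t); at most two, since d1^2 - d2^2 is a quadratic."""
    a, b, c = (u - v for u, v in zip(c1, c2))
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted(t for t in roots if tb < t < te)

def lower_envelope(funcs, tb, te):
    """List of (trajectory_id, t_start, t_end) pieces of the lower envelope on [tb, te]."""
    ids = list(funcs)
    cuts = {tb, te}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            cuts.update(critical_times(funcs[a], funcs[b], tb, te))
    cuts = sorted(cuts)
    pieces = []
    for t0, t1 in zip(cuts, cuts[1:]):
        best = min(ids, key=lambda k: dist(funcs[k], (t0 + t1) / 2))
        if pieces and pieces[-1][0] == best:
            pieces[-1] = (best, pieces[-1][1], t1)  # extend the current piece
        else:
            pieces.append((best, t0, t1))
    return pieces

# Hypothetical relative motions w.r.t. the query object: (x, y, vx, vy).
funcs = {
    "Tr1": coeffs(2, 0, 1, 0),    # d(t) = 2 + t
    "Tr2": coeffs(6, 0, -1, 0),   # d(t) = |6 - t|
    "Tr3": coeffs(14, 0, -1, 0),  # d(t) = |14 - t|
}
print(lower_envelope(funcs, 0, 12))  # -> [('Tr1', 0, 2.0), ('Tr2', 2.0, 10.0), ('Tr3', 10.0, 12)]
```

Following Observation 3 on the next slide, removing the envelope pieces and recomputing on the leftovers would give the second level of the IPAC-NN tree.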
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te] with the lower envelope, the 4r pruning band, the critical times t1, t2, t3, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
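Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths $PP_i$ between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.,
$$PP_i(a) = \{P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mincost}(P) \le t_{i+1} - t_i\}$$
Example: if the pdf of selecting a possible path is uniform, each $P \in PP_i(a)$ is selected with probability $1/|PP_i(a)|$
Given a path $P_j \in PP_i(a)$, the possible locations of a given moving object a with respect to $P_j$ at $t \in [t_i, t_{i+1}]$ are the set of all the positions $p \in P_j$ from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t), i.e.,
$$PL_{P_j}^a(t) = \{p \in P_j \mid \mathrm{mincost}(p, p_i) \le t - t_i,\ \mathrm{mincost}(p, p_{i+1}) \le t_{i+1} - t\}$$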
Combine the two probabilities…
[Figure: the possible paths, and the possible locations when t = 4, of object a; segment between m and p1]
Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t (e.g., with a uniform pdf):
$$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\big(a(t) \in S \mid PL_p^a(t)\big)$$
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
Processing
› Data structures
– Edge hash table: small, in memory
– Movement R-tree: 1D, one for each edge
– Trajectories list: stores samples ordered by time stamp, assigns a pointer to the set of possible paths, keeps earliest-arriving and latest-departure times for vertices; on disk, retrieved on demand
› Network expansion
– Expansion tree: a tree rooted at q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: road network with vertices A to E around q, annotated with earliest/latest times of arrival, a possible path that is definitely not part of the answer to the range query, and the expansion tree "bubbling" along road-network segments]
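A minimal Python sketch of the network-expansion step. It assumes that network distances are computed with Dijkstra's algorithm from q (modelled here as a vertex) and that an edge belongs to the expansion if its closer endpoint lies within network distance r; both are our simplifications, and the road network is hypothetical. The result is the set of expansion edges rather than an explicit tree.

```python
import heapq

def network_distances(graph, source):
    """Dijkstra: shortest network distance from source to every reachable vertex."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def expansion_edges(graph, q, r):
    """Edges (u, v) with at least one endpoint at network distance < r from q."""
    dist = network_distances(graph, q)
    inf = float("inf")
    edges = set()
    for u in graph:
        for v in graph[u]:
            if min(dist.get(u, inf), dist.get(v, inf)) < r:
                edges.add(tuple(sorted((u, v))))
    return sorted(edges)

# Hypothetical undirected road network with travel distances.
roads = {
    "q": {"A": 2, "B": 4},
    "A": {"q": 2, "C": 3, "D": 5},
    "B": {"q": 4, "D": 1},
    "C": {"A": 3, "E": 6},
    "D": {"A": 5, "B": 1, "E": 2},
    "E": {"C": 6, "D": 2},
}
print(expansion_edges(roads, "q", r=5))  # -> [('A', 'C'), ('A', 'D'), ('A', 'q'), ('B', 'D'), ('B', 'q')]
```

The movement R-trees of exactly these edges would then be probed in the filtering step.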
Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
Uncertainty – the flip-side
› Data reduction
– Why transmit every single location point?
– Bandwidth
– Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
› Whenever the wooden board is hit, the ball stays in its hole or drops into one of the neighbouring holes with certain probabilities (interior: stay 0.2, each neighbour 0.4)
› At the border of the wooden board these probabilities are different (stay 0.4, inner neighbour 0.6)
› This model is usually learned or given by experts
› Initial position: (0, 0, 1, 0, 0, 0, 0, 0, 0)
› After the first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After the second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After the 40th hit: ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
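How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$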
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
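The vector-matrix products above can be checked with a small pure-Python sketch. It builds the 9×9 board matrix from the slide's probabilities (interior: stay 0.2, move 0.4 each way; border: stay 0.4, move inward 0.6) and propagates a row vector one hit at a time. The function names are illustrative, not from the cited work.

```python
def build_board_matrix(n=9, stay=0.2, move=0.4, border_stay=0.4, border_move=0.6):
    """Row-stochastic matrix: m[i][j] = P(ball moves from bucket i to bucket j)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = border_stay, border_move
        elif i == n - 1:
            m[i][n - 1], m[i][n - 2] = border_stay, border_move
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = move, stay, move
    return m

def step(dist, m):
    """One hit: multiply the row vector dist by the transition matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]

def after_hits(dist, m, hits):
    for _ in range(hits):
        dist = step(dist, m)
    return dist

M = build_board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # the ball starts in bucket 3
print([round(p, 2) for p in after_hits(start, M, 1)])
# [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in after_hits(start, M, 2)])
# [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
print([round(p, 3) for p in after_hits(start, M, 40)])  # approaching the limit below
```

For long runs the distribution converges to the stationary distribution of this M, which is exactly (0.08, 0.12, …, 0.12, 0.08); the 40th-hit values on the slide are close to this limit.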
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
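One simple way to learn transition probabilities from historical data is a count-and-normalize (maximum-likelihood) estimate, stored sparsely as a dict of dicts so that unobserved transitions cost nothing. This is a sketch under assumptions: the trajectories are already mapped to state ids per tick, and the `history` data and function name are hypothetical. A time- or group-dependent matrix could be obtained by fitting separate estimates per time slot or object group.

```python
from collections import Counter, defaultdict

def estimate_transitions(trajectories):
    """Count observed state -> state moves between consecutive ticks and
    normalize each row; only observed transitions are stored (sparse)."""
    counts = defaultdict(Counter)
    for states in trajectories:
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: c / sum(row.values()) for dst, c in row.items()}
        for src, row in counts.items()
    }

# Hypothetical historical data: state ids observed at consecutive ticks.
history = [
    ["s1", "s3", "s2", "s1"],
    ["s2", "s3", "s3", "s2"],
    ["s3", "s2", "s3"],
]
M = estimate_transitions(history)
print(M["s3"])  # {'s2': 0.75, 's3': 0.25}
```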
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle, "Querying uncertain spatio-temporal data," in Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012.
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state-transition graph over s1, s2, s3, unrolled over t = 0, 1, 2, 3]
Note: we have an exponential number of possible paths the car might take.
› Starting in s1 at t = 0, the state distribution evolves as follows:
  – t = 0: (1, 0, 0)
  – t = 1: (0, 0, 1)
  – t = 2: (0, 0.8, 0.2); the mass 0.8 in s2 has hit the window
  – t = 3: plain propagation would give (0.48, 0.16, 0.36); propagating only the mass 0.2 that has not yet hit gives (0, 0.16, 0.04), so another 0.16 hits
› Result = 0.8 + 0.16 = 0.96
› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices
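ST - Window Queries
States s1, s2, s3 plus a new absorbing state s4 that collects the trajectories which have hit the window. $M^{-}$ is used for transitions outside T; $M^{+}$ is used for transitions into T, where every transition into s1 or s2 is redirected to s4:
$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$
The probability mass in s4 at t = 3, 0.96, is the result.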
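The absorbing-state construction can be sketched generically: derive $M^{-}$ and $M^{+}$ from M and the set of window states, then multiply. This is a minimal sketch, not the authors' implementation; the function names are hypothetical, states are 0-based indices (s1 = 0), and the extra last index plays the role of s4.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

# Transition matrix from the slide; rows/columns = s1, s2, s3.
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def window_matrices(m, window_states):
    """M- (outside T) and M+ (inside T) over the states plus one absorbing
    'hit' state; in M+ every transition into a window state goes to 'hit'."""
    n = len(m)
    minus = [row[:] + [0.0] for row in m] + [[0.0] * n + [1.0]]
    plus = []
    for row in m:
        new = [0.0 if j in window_states else p for j, p in enumerate(row)]
        new.append(sum(p for j, p in enumerate(row) if j in window_states))
        plus.append(new)
    plus.append([0.0] * n + [1.0])
    return minus, plus

def window_query(m, start, window_states, t_from, t_to):
    """P(object is in a window state at some tick t with t_from <= t <= t_to)."""
    minus, plus = window_matrices(m, window_states)
    v = list(start) + [0.0]
    if t_from <= 0:  # the initial position already counts
        for j in window_states:
            v[-1] += v[j]
            v[j] = 0.0
    for t in range(1, t_to + 1):
        v = vec_mat(v, plus if t >= t_from else minus)
    return v[-1]

print(round(window_query(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3), 2))  # 0.96
```

Each tick costs one (sparse, in practice) matrix multiplication, so the exponential number of paths is never enumerated.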
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 and with two observations at t0 and t1]
Multiple Observations
› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class $S_i^{-}$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
  – One class $S_i^{+}$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
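Multiple Observations
State order: $(S_1^{-}, S_2^{-}, S_3^{-}, S_1^{+}, S_2^{+}, S_3^{+})$, where ¬■ marks worlds that have not hit the window and ■ marks worlds that have.
$$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$
with M as above.
› t = 0: (1, 0, 0, 0, 0, 0)
› t = 1: (0, 0, 1, 0, 0, 0)
› t = 2: (0, 0, 0.2, 0, 0.8, 0)
› t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)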
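Bayes' Theorem
Let ■ denote "the trajectory intersects the query window" and o denote "the object is observed in s3 at t = 3":
$$P(\blacksquare \mid o) = \frac{P(o \mid \blacksquare)\,P(\blacksquare)}{P(o)} = \frac{P(\blacksquare \wedge o)}{P(o)} = \frac{P(\blacksquare \wedge o)}{P(o \wedge \blacksquare) + P(o \wedge \neg\blacksquare)}$$
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?
› With the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over $(S_1^{-}, S_2^{-}, S_3^{-}, S_1^{+}, S_2^{+}, S_3^{+})$: $P(\blacksquare \mid o) = \frac{0.32}{0.32 + 0.04} \approx 0.89$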
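The 2|S|-class construction and the Bayes step can be sketched together. This is a minimal sketch under the slides' example; index i stands for $S_i^{-}$ and index n+i for $S_i^{+}$, and all function names are illustrative.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

# Transition matrix from the slide; rows/columns = s1, s2, s3.
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
WINDOW = {0, 1}  # s1, s2

def split_matrices(m, window):
    """(M-, M+) over 2|S| classes: index i = S_i^- (not yet hit),
    index n + i = S_i^+ (already hit)."""
    n = len(m)
    minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = m[i][j]
            minus[i][j] = p          # outside T: "not hit" stays "not hit"
            minus[n + i][n + j] = p  # "hit" stays "hit"
            # inside T: entering a window state turns "not hit" into "hit"
            plus[i][n + j if j in window else j] = p
            plus[n + i][n + j] = p
    return minus, plus

minus, plus = split_matrices(M, WINDOW)
v = [1.0, 0, 0, 0, 0, 0]  # seen in s1 at t = 0, not yet hit
for t in range(1, 4):     # query window T = [2, 3]
    v = vec_mat(v, plus if t >= 2 else minus)
print([round(p, 2) for p in v])  # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]

# Second observation: the object is seen in s3 at t = 3 (Bayes' theorem).
s3 = 2
p_hit_given_obs = v[3 + s3] / (v[3 + s3] + v[s3])
print(round(p_hit_given_obs, 2))  # 0.89
```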
Summary
› Pros
  – Allows answering time-parametrized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the spatio-temporal space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index entries over time and location]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle, "Indexing uncertain spatio-temporal data," in Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel, "Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories," PVLDB 7(3), 205–216 (2013).
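A minimal Monte-Carlo sketch of the sampling idea, applied to the window query of the earlier slides so the estimate can be compared with the exact 0.96. Assumption: there is only one observation (s1 at t = 0), so plain forward sampling from M is enough. Conditioning samples on later observations requires the adapted transition matrices of the cited paper, which this sketch does not implement. Function names are hypothetical.

```python
import random

# Transition matrix from the window-query slides; rows/columns = s1, s2, s3.
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def sample_next(state, m, rng):
    """Draw the next state from row `state` of the transition matrix."""
    r, acc = rng.random(), 0.0
    for j, p in enumerate(m[state]):
        acc += p
        if r < acc:
            return j
    return max(j for j, p in enumerate(m[state]) if p > 0)  # rounding guard

def mc_window_query(m, start, window, t_from, t_to, n_samples, seed=0):
    """Estimate P(in window at some t in [t_from, t_to]) as the ratio of
    sampled trajectories that satisfy the query to all drawn samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        s = start
        hit = t_from <= 0 and s in window
        for t in range(1, t_to + 1):
            if hit:
                break
            s = sample_next(s, m, rng)
            hit = t >= t_from and s in window
        hits += hit
    return hits / n_samples

est = mc_window_query(M, 0, {0, 1}, 2, 3, 20000)
print(est)  # close to the exact value 0.96
```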
Event Queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu, "Event queries on correlated probabilistic streams," SIGMOD Conference 2008, 715–728.
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] J. L. Doob (1953). Stochastic Processes. Wiley.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3), 205–216 (2013).
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43–49.
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170–181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009, 435–443.
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715–728.
Research Challenge
Include the uncertainty which is inherent in spatial and spatio-temporal data directly in the querying and mining process
Assess the reliability of similarity search and data mining results, enhancing the underlying decision-making process
Improve the quality of modern location-based applications and of research results in the field
Uncertain Spatial Data Models
• Discrete models
• Continuous models
[Figure: examples of discrete and continuous uncertain objects]
Possible World Semantics
Possible World Semantics
• A collection of uncertain spatial objects defines an uncertain spatial database
• Combinations of object instances define possible database instances, called possible worlds
• Assumption: the probability of a possible world can be computed efficiently
Answering Queries using PWS
Let
• $DB$ be an uncertain database having possible worlds,
• $\mathcal{W}$ be the set of possible worlds of $DB$,
• $\varphi$ be a query predicate,
• $I_\varphi(w)$ be an indicator function returning one if predicate $\varphi$ holds in world $w$ and zero otherwise.
The probability that a query predicate $\varphi$ holds on an uncertain database $DB$ is defined as
$$P(\varphi, DB) = \sum_{w \in \mathcal{W}} P(w) \cdot I_\varphi(w)$$
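The definition can be evaluated by brute force for a count query. This sketch assumes three independent objects with hypothetical probabilities 0.8, 0.2 and 0.4 of lying inside the query region (our reading of the running example's figure, not stated in the text) and uses "count = k" as the predicate φ for each k. Enumerating all 2^n worlds is exactly the exponential blow-up that makes naive query processing infeasible.

```python
from itertools import product

# Hypothetical inside-probabilities of three independent uncertain objects.
p_inside = {"A": 0.8, "B": 0.2, "C": 0.4}

def count_distribution(p_inside):
    """P(count = k) by possible-world semantics: enumerate every world
    (each object inside or outside), compute P(w), and add it to the
    entry k for which the indicator I_phi(w) of 'count = k' is one."""
    names = list(p_inside)
    dist = [0.0] * (len(names) + 1)
    for world in product([False, True], repeat=len(names)):
        prob = 1.0
        for name, inside in zip(names, world):
            prob *= p_inside[name] if inside else 1.0 - p_inside[name]
        dist[sum(world)] += prob
    return dist

print([round(p, 3) for p in count_distribution(p_inside)])
# [0.096, 0.472, 0.368, 0.064]
```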
Possible Worlds Example II
[Figure: uncertain objects A–Z and their possible worlds]
Too many possible worlds
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects.
Are there efficient solutions to query uncertain spatial data?
In general: No.
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
This can be reduced to uncertain spatial databases.
But: specific queries may have polynomial-time solutions.
[DalviSuciu04] N. N. Dalvi and D. Suciu, "Efficient query evaluation on probabilistic databases," in Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004.
Querying Uncertain Data: Running Example
[Figure: circular query region centered at query point q]
Return the number of objects located in the depicted circular region centered at query point q.
This number is a random variable.
Total number of possible worlds:
Data Cleaning: Aggregation
[Figure: query region around q with objects C, D, H, I]
Ignore uncertainty (data cleaning):
Replace uncertain objects by a deterministic "best guess"
• Expected positions
• Most-likely positions
• …
Query results are not reliable.
Query results may be biased.
Equivalent Worlds: An Intuition
[Figure: query region around q]
Observation 1:
For any possible world w and any possible world w′ derived from w by changing the position of object o, the following equivalence holds:
Equivalent Worlds: An Intuition
[Figure: query region around q]
Observation 1 allows us to discard objects outside of the query region.
Remaining number of equivalence classes of possible worlds:
Equivalent Worlds: An Intuition
Observation 2:
For each remaining object we only need to consider the predicate "inside".
[Figure: query region around q with objects C, D, H, I]
Querying Uncertain Spatial Data
Equivalent Worlds: An Intuition
Observation 2:
For each remaining object we only need to consider the predicate "inside".
Remaining number of equivalence classes of possible worlds:
[Figure: query region around q with objects C, D, H, I]
Equivalent Worlds: An Intuition
Observation 3:
We only require the number of objects in the query region. Information about the concrete result objects can be discarded.
[Figure: query region around q with objects C, D, H, I]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
[Figure: query region around q; objects A, B, C with probabilities 0.8, 0.2 and 0.4]
Observation 3: anonymize objects, i.e., substitute A, B, C by x
Each monomial $c \cdot x^k$ implies that the probability of having exactly $k$ results equals $c$
Generating Functions: Formally
For each object $o_i$ let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:
$$\mathcal{F}(x) = \prod_{i} \big( (1 - p_i) + p_i \cdot x \big)$$
In the expanded polynomial, the coefficient of monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.
[2] Jian Li, Barna Saha, and Amol Deshpande, "A Unified Approach to Ranking in Probabilistic Databases," PVLDB 2(1), 502–513 (2009).
Count Queries on Uncertain Data
Example: [Figure: query region around q; objects A, B, C with probabilities 0.8, 0.2 and 0.4]
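A minimal sketch of the generating-function technique for the count query, assuming (as in the earlier enumeration) that 0.8, 0.2 and 0.4 are the probabilities of A, B and C lying inside the query region. Each object contributes the factor (1 − p_i) + p_i·x; multiplying the factors one at a time expands F(x) with O(n²) work instead of enumerating 2^n possible worlds.

```python
def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists (index = power of x)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def count_distribution_gf(probs):
    """Expand F(x) = prod_i ((1 - p_i) + p_i x); coefficient k = P(count = k)."""
    poly = [1.0]
    for p in probs:
        poly = poly_mul(poly, [1.0 - p, p])
    return poly

print([round(c, 3) for c in count_distribution_gf([0.8, 0.2, 0.4])])
# [0.096, 0.472, 0.368, 0.064]
```

The result matches brute-force enumeration over all possible worlds, which is the point of the paradigm of equivalent worlds: worlds with the same count are merged into one monomial.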
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and unbiased
Evaluate the query predicate on each world
The distribution of sampled results is an unbiased approximation of the true distribution of results
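The four sampling steps can be sketched for the count query of the running example. Assumptions: objects are independent, and the hypothetical inside-probabilities 0.8, 0.2 and 0.4 stand in for the figure's objects. Each iteration materializes one possible world, evaluates the count on it, and the relative frequencies form the estimate.

```python
import random
from collections import Counter

def sample_count_distribution(p_inside, n_worlds, seed=0):
    """Monte-Carlo: draw n_worlds independent possible worlds, evaluate the
    count query on each, and return the relative frequency of each count."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_worlds):
        k = sum(1 for p in p_inside if rng.random() < p)  # objects inside q's region
        counts[k] += 1
    return {k: counts[k] / n_worlds for k in range(len(p_inside) + 1)}

est = sample_count_distribution([0.8, 0.2, 0.4], 100)
print(est)  # an estimate of the exact distribution; it varies with the seed
```

With only 100 worlds the estimates can deviate noticeably from the exact values, which motivates attaching a confidence interval to them.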
Sampling Example
[Figure: query region around q; objects A, B, C with probabilities 0.8, 0.2 and 0.4]
Drawing 100 possible worlds may yield the following estimators:
Compare to the exact probabilities.
No indication of the reliability or confidence of the estimations.
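Sampling: Confidences
[Figure: query region around q; objects A, B, C with probabilities 0.8, 0.2 and 0.4; the true probability is shown for comparison]
Drawing 100 possible worlds may yield the following estimators:
Use statistical methods to assess the quality of the estimators, e.g., the Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{|S|}}$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution.
At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638].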
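The interval on the slide can be reproduced with the Wald formula. The estimate p̂ = 0.54 and the level α = 0.05 are assumptions inferred from the interval's midpoint and width; the text itself does not state them. `statistics.NormalDist` (Python 3.8+) supplies the normal percentile.

```python
import math
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Wald interval p_hat +/- z_{1-alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

lo, hi = wald_interval(0.54, 100)  # hypothetical estimate from 100 sampled worlds
print(round(lo, 3), round(hi, 3))  # 0.442 0.638
```

The Wald interval is the simplest choice; for probabilities near 0 or 1 or for small samples it is known to be unreliable, and alternatives such as the Wilson interval are often preferred.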
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip-side
Model of a Trajectory
[Figure: 3D trajectory in (X, Y, Time) up to the present time, and its projection, the 2D route]
Approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman, "Kinetic space-time prisms," GIS 2011.
Mobility Models
• Sequence of (location, time) updates (e.g., GPS-based)
• Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
• (location, time, velocity vector) updates
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze, "On a Generic Uncertainty Model for Position Information," QuaCon 2009.
Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (AKA space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov, "Querying Imprecise Data in Moving Object Environments," ICDE 2003.
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li, "Uncertain Range Queries for Necklaces," MDM 2010.
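Under Model 1, a bead between two exact samples is the set of space-time points that are reachable from the first sample and can still reach the second without exceeding the maximum speed; a necklace is the union of consecutive beads. This is a minimal membership test under that reading, assuming Euclidean free-space motion and a known bound `v_max`; the function names and sample data are hypothetical.

```python
import math

def in_bead(t, p, sample_a, sample_b, v_max):
    """True if position p at time t lies in the bead spanned by the exact
    samples (t_a, p_a) and (t_b, p_b) under speed bound v_max."""
    (ta, pa), (tb, pb) = sample_a, sample_b
    if not ta <= t <= tb:
        return False
    reachable_from_a = math.dist(pa, p) <= v_max * (t - ta)
    can_reach_b = math.dist(p, pb) <= v_max * (tb - t)
    return reachable_from_a and can_reach_b

def in_necklace(t, p, samples, v_max):
    """A necklace is the union of the beads between consecutive samples."""
    return any(in_bead(t, p, a, b, v_max) for a, b in zip(samples, samples[1:]))

samples = [(0, (0.0, 0.0)), (10, (10.0, 0.0)), (20, (10.0, 10.0))]
print(in_necklace(5, (5.0, 8.0), samples, v_max=2.0))  # True
print(in_necklace(5, (5.0, 9.0), samples, v_max=2.0))  # False
```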
Modeling Uncertain Trajectories: Model 2
Constraints:
• Motion along the road network
• Uncertainty restricted to road segments
[Figure: samples p1, p2 at times t1, t2 on the road network; query point q]
Bart Kuijpers, Walied Othman, "Modeling uncertainty of moving objects on road networks via space-time prisms," International Journal of Geographical Information Science 23(9) (2009).
Modeling Uncertain Trajectories: Model 3 (Sheared Cylinders)
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain, "Managing Moving Objects Databases with Uncertainty," ACM TODS 2004.
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
  – The impact of uncertainty on the (structure of the) answer
  – The impact of constraints, e.g., possible routes to be taken
Processing algorithms
  – Impact of the motion + uncertainty coupling on filtering/pruning/refinement
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann, "Probabilistic range queries for uncertain trajectories on road networks," EDBT 2011.
Continuous NN for Trajectories
Given a collection of moving-object trajectories Tr1, Tr2, …, TrN:
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tr_i1, [tb, t1]), (Tr_i2, [t1, t2]), …, (Tr_im, [t_(m−1), te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) between T = tb, T = t1 and T = te]
Example: assuming that Trq is the querying trajectory during [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen, "Continuous Nearest Neighbor Search," VLDB 2002.
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 with uncertainty, between T = tb, tb1, t11, t1 and te]
Observations: adding uncertainty to the previous example implies
– Tr1 (blue) is the only object with a non-zero NN-probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN-probability
– Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: IPAC-NN tree with root [TrQ, tb, te]; level-1 nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m−1), te, Dm); deeper nodes such as (Tri11, tb, t11, D11), (Tri12, t11, t12, D12) and (Tri121, t11, t121, D121) cover sub-intervals of their parent's interval]
Root: parameters of the query
– Querying trajectory Trq
– Time interval of interest [tb, te]
Children: with the parent excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q with distances Rmin, Rmax and Rd, and the circular uncertainty regions of Tr1–Tr5]
Trq = 2D point Q; the other objects' possible locations are bounded by circles.
Observation:
– Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \iint_{\|(x, y) - Q\| \le R_d} pdf_i(x, y)\,dx\,dy$$
The probability that a given object Trj is the NN of Q is given by
$$P^{NN}_j = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{k \ne j} \big(1 - P_k(R_d)\big)\,dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
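Instead of evaluating the integral, the NN probabilities can also be approximated by Monte-Carlo sampling; this is a swapped-in technique, shown only to make the semantics concrete. Assumptions: each object's possible locations are a disk with a uniform pdf, objects are independent, and the query point is crisp. The third object lies entirely beyond Rmax, so, as the observation above states, it is never the NN.

```python
import math
import random

def sample_in_disk(center, r, rng):
    """Uniform point in a disk (sqrt of the radius draw gives uniform area density)."""
    rho = r * math.sqrt(rng.random())
    phi = 2 * math.pi * rng.random()
    return center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi)

def nn_probabilities(q, objects, n_samples=20000, seed=0):
    """Estimate P(object j is the NN of crisp point q): draw one position per
    object in each round and count how often each object is the closest."""
    rng = random.Random(seed)
    wins = [0] * len(objects)
    for _ in range(n_samples):
        dists = [math.dist(q, sample_in_disk(c, r, rng)) for c, r in objects]
        wins[dists.index(min(dists))] += 1
    return [w / n_samples for w in wins]

# Hypothetical objects: (disk center, radius). Rmax = 4, the third object is at >= 9.
objs = [((3.0, 0.0), 1.0), ((-3.0, 0.0), 1.0), ((10.0, 0.0), 1.0)]
print(nn_probabilities((0.0, 0.0), objs))  # about [0.5, 0.5, 0.0]
```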
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain TrQ and Tr1–Tr5 with radii Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq.
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
[Figure: uniform pdfs of the possible locations of Trq, Tr1 and Tr2 over disks of radius r in the x-y plane]
Example: assuming a uniform pdf of possible locations; the figure marks two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\|V_i - V_q\| \le R_d)$.
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq), which, due to independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and −Vq.
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq) in the x-y plane]
Let Viq = Vi + (−Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
The Continuous Aspect of Probabilistic NN Queries
Let $Vx_{iq} = vx_i - vx_q$ and $Vy_{iq} = vy_i - vy_q$ denote the corresponding components of the velocity of the object whose expected location is along TR_iq.
Then the distance of the expected location along TR_iq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = Vx_{iq}^2 + Vy_{iq}^2$$
where B and C are determined by the expected relative location at t = 0; this is a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).
The Continuous Aspect of Probabilistic NN Queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions $S_{df} = \{d_{1q}(t), d_{2q}(t), \ldots, d_{Nq}(t)\}$ (NOTE: excluding $d_{qq}(t) \equiv 0$)
Observation: two distance functions $d_{iq}(t)$ and $d_{jq}(t)$ can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: setting $d_{iq}(t) = d_{jq}(t)$ and squaring both sides yields a quadratic equation in t). Call such pairwise intersections critical time points.
Example: [Figure: lower envelope LE12 of the two distance trajectories TR1 and TR2, with critical time points t11 and t12 after tb; time on the horizontal axis, distance on the vertical axis]
The Continuous Aspect of Probabilistic NN Queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: LE12 of TR1, TR2 (critical time points t11, t12) and LE34 of TR3, TR4 (critical time point t31) over [tb, te] are merged into LE1234 with new critical time points t1new, t2new]
Complexity of the lower-envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
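The divide-and-conquer construction can be sketched as follows. Assumptions: each expected relative trajectory moves linearly, so we compare squared distances A·t² + B·t + C (same order as the distances, and any two cross at most twice); envelope pieces are (start, end, index) triples. For brevity the merge looks up owners by linear scan, so this sketch does not attain the O(N log N) bound; it only illustrates the Merge-Sort structure. All names are hypothetical.

```python
import math

def sq_dist_coeffs(dx0, dy0, vx, vy):
    """Squared distance A t^2 + B t + C of the relative motion (dx0 + vx t, dy0 + vy t)."""
    return (vx * vx + vy * vy, 2 * (dx0 * vx + dy0 * vy), dx0 * dx0 + dy0 * dy0)

def value(f, t):
    return f[0] * t * t + f[1] * t + f[2]

def crossings(f, g, lo, hi, eps=1e-12):
    """Critical time points in (lo, hi): roots of f - g, a quadratic, so at most two."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        roots = [-c / b] if abs(b) > eps else []
    else:
        disc = b * b - 4 * a * c
        roots = [] if disc < 0 else [(-b - math.sqrt(disc)) / (2 * a),
                                     (-b + math.sqrt(disc)) / (2 * a)]
    return sorted(r for r in roots if lo < r < hi)

def owner(env, t):
    """Index of the function on envelope env at time t (linear scan for brevity)."""
    for start, end, idx in env:
        if start <= t <= end:
            return idx
    raise ValueError("t outside envelope")

def merge(env1, env2, funcs):
    """Merge two lower envelopes, like the merge step of Merge-Sort."""
    cuts = sorted({x for seg in env1 + env2 for x in seg[:2]})
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        i, j = owner(env1, mid), owner(env2, mid)
        pts = [lo] + crossings(funcs[i], funcs[j], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            m = (a + b) / 2
            k = i if value(funcs[i], m) <= value(funcs[j], m) else j
            if out and out[-1][2] == k:
                out[-1] = (out[-1][0], b, k)  # extend the previous piece
            else:
                out.append((a, b, k))
    return out

def lower_envelope(funcs, tb, te, idxs=None):
    """Envelope of each half, then merge; pieces are (start, end, nearest index)."""
    idxs = list(range(len(funcs))) if idxs is None else idxs
    if len(idxs) == 1:
        return [(tb, te, idxs[0])]
    mid = len(idxs) // 2
    return merge(lower_envelope(funcs, tb, te, idxs[:mid]),
                 lower_envelope(funcs, tb, te, idxs[mid:]), funcs)

funcs = [sq_dist_coeffs(1, 0, 0, 0), sq_dist_coeffs(3, 0, -1, 0)]
print(lower_envelope(funcs, 0, 6))  # [(0, 2.0, 0), (2.0, 4.0, 1), (4.0, 6, 0)]
```

Removing the returned pieces and recomputing the envelope of the remaining functions would give the next level of the tree, as described on the following slide.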
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance functions × time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: lower envelope and 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) of TR1–TR7 over [tb, te] with time points t1, t2, t3, and a band of width 4r]
Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths $PP_i$ between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.,
$$PP_i(a) = \{ P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mincost}(P) \le t_{i+1} - t_i \}$$
Example: if the pdf of selecting a possible path is uniform, $\Pr(P) = 1 / |PP_i(a)|$.
Given a path $P_j \in PP_i(a)$, the possible locations of a given moving object a with respect to $P_j$ at $t \in [t_i, t_{i+1}]$ are the set of all the positions $p \in P_j$ from which a can reach pi (respectively pi+1) within time period t − ti (respectively ti+1 − t).
Combine the two probabilities…
[Figure: possible paths and possible locations of object a when t = 4; point m and segment mp1]
Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t (e.g., uniform pdf):
$$\Pr(a \in S \text{ at } t) = \sum_{P \in PP(t)} \Pr(P) \cdot \Pr\big(a \in S \mid a \in PL_P(t)\big)$$
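The combination formula can be sketched as a short sum. Assumptions, all hypothetical: positions along each path are measured as 1D offsets, the location along a path is uniform over its possible-location interval PL_P(t) (which must have positive length), and S is given as an interval on the paths it lies on. The numbers reproduce the slide's 0.5 · 0.5 = 0.25.

```python
def prob_in_segment(paths):
    """Pr(a in S at t) = sum over possible paths P of Pr(P) * Pr(a in S | PL_P(t)),
    with a uniform location over each possible-location interval."""
    total = 0.0
    for path_prob, (pl_lo, pl_hi), segment in paths:
        if segment is None:
            continue  # S does not lie on this path
        s_lo, s_hi = segment
        overlap = max(0.0, min(pl_hi, s_hi) - max(pl_lo, s_lo))
        total += path_prob * overlap / (pl_hi - pl_lo)
    return total

# Hypothetical: two equally likely paths; at t = 4, S covers half of the
# possible-location interval on the first path and does not lie on the second.
paths = [
    (0.5, (2.0, 6.0), (4.0, 6.0)),
    (0.5, (1.0, 5.0), None),
]
print(prob_in_segment(paths))  # 0.25
```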
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and their probabilities)
  – A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4 the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
Processing
› Data structures
  – Edge hash table
    › Small, in-memory
  – Movement R-tree
    › 1D, one for each edge
  – Trajectories list
    › Stores samples ordered by time stamp
    › Assigns a pointer to the set of possible paths
    › Earliest-arrival and latest-departure times stored for vertices
    › On disk; retrieved on demand
› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search the leaf entries whose maximum time interval intersects with tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: road network with vertices A–E around q, showing the expansion tree "bubbling" along road-network segments, earliest/latest times of arrival, and a possible path that is definitely not part of the answer to the range query]
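The network-expansion step can be sketched as Dijkstra's algorithm from q that stops at network distance r; the parent pointers form the expansion tree. This is a simplification under assumptions: the graph is an adjacency dict of undirected edges listed in both directions, and only vertices strictly within r are kept (edges partially within r and the subsequent R-tree filtering are not modeled). The graph data is hypothetical.

```python
import heapq

def expansion_tree(graph, q, r):
    """Dijkstra from q bounded by network distance r.
    graph: {vertex: [(neighbor, edge_length), ...]}.
    Returns {vertex: (distance, parent)}; parents define the expansion tree."""
    dist, parent = {q: 0.0}, {q: None}
    heap, done = [(0.0, q)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return {v: (dist[v], parent[v]) for v in done}

graph = {
    "q": [("A", 2), ("C", 4)],
    "A": [("q", 2), ("B", 3)],
    "B": [("A", 3)],
    "C": [("q", 4), ("D", 5)],
    "D": [("C", 5)],
}
tree = expansion_tree(graph, "q", 6)
print(sorted(tree.items()))
# [('A', (2.0, 'q')), ('B', (5.0, 'A')), ('C', (4.0, 'q')), ('q', (0.0, None))]
```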
Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
Uncertainty – the flip-side
› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink, "Speeding up Douglas-Peucker Line-simplification Algorithm," SSHD 1997.
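As an illustration of "define an acceptable error, save on data size", here is the classic recursive Douglas-Peucker simplification of a 2D polyline. It is a plain sketch of the basic algorithm, not the Hershberger-Snoeyink speed-up cited above. It bounds only the spatial deviation (eps) and ignores time stamps, which is exactly the limitation raised on the next slide.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (ax + t * dx, ay + t * dy))

def douglas_peucker(points, eps):
    """Keep the endpoints; if the farthest interior point deviates by more
    than eps, keep it and recurse on both halves, else drop all interior points."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1]) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i - 1] <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: i + 1], eps)
    right = douglas_peucker(points[i:], eps)
    return left[:-1] + right  # points[i] appears in both halves; keep it once

route = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
print(douglas_peucker(route, 0.5))  # [(0, 0), (2, -0.1), (3, 5), (5, 7)]
```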
Uncertainty – the other flip-side
› What are the important properties not to be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just a "Z" axis… one cannot travel back in time
  – How old can the data at the sink become before it is "un-fresh"?
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
So far…
Stochastic methods for uncertain spatial data
› Probabilistic results
› Time dimension
vs.
Geometric models for uncertain spatio-temporal data
› Probabilistic results
› Time dimension
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 s
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
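The slide states only that transition probabilities are learned from historical data. A common, simple choice is the maximum-likelihood estimate: count the observed state-to-state transitions per tick and normalize each row. This estimator is an assumption here and is not taken from the tutorial. The sketch stores only the observed transitions in a dictionary of dictionaries, which mirrors the very sparse matrix. The function name and the toy trajectories are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List

State = Hashable


def estimate_transitions(trajectories: List[List[State]]) -> Dict[State, Dict[State, float]]:
    """Maximum-likelihood transition probabilities from discretized trajectories.

    Each trajectory is the sequence of states visited at consecutive ticks.
    Only observed transitions are stored, so the result is a sparse matrix.
    """
    counts: Dict[State, Dict[State, int]] = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            counts[src][dst] += 1

    probs: Dict[State, Dict[State, float]] = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: c / total for dst, c in row.items()}
    return probs


# Hypothetical historical data: intersections A, B, C sampled once per tick.
history = [
    ["A", "B", "B", "C"],
    ["A", "B", "C", "C"],
    ["B", "A", "B", "C"],
]
M_hat = estimate_transitions(history)
for src in sorted(M_hat):
    print(src, M_hat[src])
```

The last bullet (matrices that change over time or by object group) could be handled by running the same estimator on the subset of trajectories for each time slot or group.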
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph with transitions s1 → s3 (1.0), s2 → s1 (0.6), s2 → s3 (0.4), s3 → s2 (0.8), s3 → s3 (0.2)]
Note: we have an exponential number of possible paths the car might take
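To see why a direct approach does not scale, the following sketch answers the window query by enumerating every path (possible world) of the chain from t = 0 to t = 3. It sums the probabilities of the paths that visit s1 or s2 at t = 2 or t = 3. As in the step-by-step slides, it assumes that the car starts in s1 at t = 0. States are 0-indexed, so s1 is 0. The loop runs over n^horizon candidate paths, which is the exponential blow-up the matrix-based solution avoids.

```python
from itertools import product

# Transition matrix M from the slide (rows: from, columns: to); s1, s2, s3 -> 0, 1, 2.
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def window_probability_bruteforce(M, start, horizon, window_states, window_times):
    """Sum the probabilities of all paths that hit a window state at a window time."""
    n = len(M)
    total = 0.0
    paths = 0                                         # paths with non-zero probability
    for tail in product(range(n), repeat=horizon):   # n**horizon candidate paths
        path = (start,) + tail
        p = 1.0
        for a, b in zip(path, path[1:]):
            p *= M[a][b]
        if p == 0.0:
            continue
        paths += 1
        if any(path[t] in window_states for t in window_times):
            total += p
    return total, paths


prob, n_paths = window_probability_bruteforce(M, start=0, horizon=3,
                                              window_states={0, 1}, window_times=[2, 3])
print(round(prob, 4), n_paths)
```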
ST - Window Queries (step by step)
› The car is observed in s1 at t = 0: $(1, 0, 0)$
› t = 1: $(1, 0, 0) \cdot M = (0, 0, 1)$
› t = 2: $(0, 0, 1) \cdot M = (0, 0.8, 0.2)$
› t = 3: $(0, 0.8, 0.2) \cdot M = (0.48, 0.16, 0.36)$
› These marginals cannot simply be combined, because being in the window at t = 2 and being in it at t = 3 are dependent events. Instead, the worlds that are already in the window at t = 2 (mass 0.8 in s2) are counted as hits, and only the remaining mass 0.2 in s3 is propagated: $(0, 0, 0.2) \cdot M = (0, 0.16, 0.04)$
› Result $= 0.8 + 0.16 = 0.96$
› A solution based on matrix multiplications introduces a new state for the trajectories that have already hit the window, and two matrices
103
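ST - Window Queries
States s1, s2, s3, plus an absorbing state s4 that collects the trajectories that have hit the window. $M^-$ is used for transitions into time instants outside T, and $M^+$ for transitions into time instants inside T:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0.6 & 0 & 0.4 & 0\\ 0 & 0.8 & 0.2 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0 & 0 & 0.4 & 0.6\\ 0 & 0 & 0.2 & 0.8\\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$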
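The four-state construction can be reproduced with a few matrix-vector products. This minimal NumPy sketch assumes that the window covers t = 2 and t = 3. Under that assumption, M⁻ is used for the step into t = 1 and M⁺ for the steps into t = 2 and t = 3. The helpers `minus_matrix` and `hit_matrix` are illustrative. `hit_matrix` derives M⁺ from M by redirecting every transition that enters a window state to the absorbing state s4.

```python
import numpy as np

M = np.array([
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
])


def minus_matrix(m: np.ndarray) -> np.ndarray:
    """M^-: original transitions plus an absorbing 'hit' state appended at the end."""
    n = m.shape[0]
    out = np.zeros((n + 1, n + 1))
    out[:n, :n] = m
    out[n, n] = 1.0
    return out


def hit_matrix(m: np.ndarray, window_states) -> np.ndarray:
    """M^+: transitions into a window state are redirected to the absorbing hit state."""
    out = minus_matrix(m)
    n = m.shape[0]
    for j in window_states:
        out[:n, n] += out[:n, j]
        out[:n, j] = 0.0
    return out


M_minus = minus_matrix(M)
M_plus = hit_matrix(M, window_states=[0, 1])     # window states s1, s2

v = np.array([1.0, 0.0, 0.0, 0.0])              # in s1 at t = 0
steps = [M_minus, M_plus, M_plus]                # transitions into t = 1, 2, 3
for t, step in enumerate(steps, start=1):
    v = v @ step
    print(t, np.round(v, 2))
print("P(hit) =", round(v[-1], 2))
```

Each step costs one (sparse) vector-matrix product, so the work grows linearly with the number of time steps instead of exponentially with the number of paths.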
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 (left) and two observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 over t = 0, …, 3]
106
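Multiple Observations
State order $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$; the classes $S_i^+$ are the hit worlds (∎), the classes $S_i^-$ the non-hit worlds (¬∎):
$$(1,0,0,0,0,0) \xrightarrow{M^-} (0,0,1,0,0,0) \xrightarrow{M^+} (0,0,0.2,0,0.8,0) \xrightarrow{M^+} (0,0,0.04,0.48,0.16,0.32)$$
$$M^- = \begin{pmatrix} M & 0\\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0.4 & 0.6 & 0 & 0\\
0 & 0 & 0.2 & 0 & 0.8 & 0\\
0 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0.6 & 0 & 0.4\\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}, \qquad M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$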
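The construction with 2|S| states can be sketched in the same way. The sketch keeps the earlier assumptions: the car starts in s1, the window states are s1 and s2, and the window times are t = 2 and t = 3. The state vector is ordered (S1⁻, S2⁻, S3⁻, S1⁺, S2⁺, S3⁺). M⁻ = diag(M, M) is used outside the window. Inside the window, any transition of a ⁻ class into a window state moves its probability mass to the matching ⁺ class. Unlike the single absorbing state, this keeps track of where the hit worlds are, and the final line uses that to condition on an observation in s3 at t = 3. The function name is illustrative.

```python
import numpy as np

M = np.array([
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
])


def split_matrices(m: np.ndarray, window_states):
    """Return (M_minus, M_plus) over the 2|S| classes (S_1^-..S_n^-, S_1^+..S_n^+)."""
    n = m.shape[0]
    zero = np.zeros((n, n))
    m_minus = np.block([[m, zero], [zero, m]])   # outside the window: classes unchanged
    m_plus = m_minus.copy()
    for j in window_states:
        # Inside the window, a not-yet-hit world entering s_j becomes a hit world.
        m_plus[:n, n + j] += m_plus[:n, j]
        m_plus[:n, j] = 0.0
    return m_minus, m_plus


M_minus, M_plus = split_matrices(M, window_states=[0, 1])

v = np.zeros(6)
v[0] = 1.0                                        # S1^- at t = 0
for step in (M_minus, M_plus, M_plus):            # transitions into t = 1, 2, 3
    v = v @ step
print(np.round(v, 2))

# Conditioning on "observed in s3 at t = 3": P(S3^+) / (P(S3^+) + P(S3^-)).
posterior = v[5] / (v[2] + v[5])
print(round(posterior, 2))
```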
107
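Bayes' Theorem
› Now, what is the probability that the trajectory passed the query window, given that the object was observed in s3 at t = 3? (∎: the trajectory has intersected the window)
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare) \cdot P(\blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$$
› With the distribution $(0, 0, 0.04, 0.48, 0.16, 0.32)$ over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$ at t = 3:
$$P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$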
108
Summary
› Pros
– Allows answering time-parameterized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: R-tree over the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
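A minimal Monte-Carlo sketch for the window query from the previous slides: it samples trajectories forward from the chain and reports the fraction that hit the window. It uses plain forward sampling from s1 at t = 0, so it does not condition on later observations. The transition-matrix adaptation cited on this slide is what makes samples conform to such observations, and it is not reproduced here. The sample size and the seed are arbitrary choices.

```python
import random

M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def sample_path(M, start, horizon, rng):
    """Draw one trajectory (one state per tick) by forward sampling the Markov chain."""
    path = [start]
    for _ in range(horizon):
        path.append(rng.choices(range(len(M)), weights=M[path[-1]])[0])
    return path


def mc_window_probability(M, start, horizon, window_states, window_times, n, seed=0):
    """Fraction of sampled trajectories that visit a window state at a window time."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        path = sample_path(M, start, horizon, rng)
        if any(path[t] in window_states for t in window_times):
            hits += 1
    return hits / n


estimate = mc_window_probability(M, start=0, horizon=3,
                                 window_states={0, 1}, window_times=[2, 3], n=20000)
print(estimate)   # exact answer from the matrix method: 0.96
```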
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443.
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
16
Uncertain Spatial Data Models
• Discrete models
• Continuous models
Possible World Semantics
17
Possible World Semantics
• A collection of uncertain spatial objects defines an uncertain spatial database
• Combinations of object instances define possible database instances, called possible worlds
• Assumption: the probability of a possible world can be computed efficiently
18
Answering Queries using PWS
Let
• DB be an uncertain database,
• W be the set of possible worlds of DB,
• φ be a query predicate,
• $I_\varphi(w)$ be an indicator function returning one if predicate φ holds in world w, and zero otherwise.
The probability that a query predicate φ holds on an uncertain database DB is defined as
$$P(\varphi(DB)) = \sum_{w \in W} P(w) \cdot I_\varphi(w)$$
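The definition can be evaluated literally by enumerating all possible worlds. The sketch below assumes the simplest discrete model: each object independently lies either inside the query region (with probability p_i) or outside it. It then computes the distribution of a count query, so the indicator checks "count(w) = k". The probabilities 0.8, 0.2 and 0.4 are illustrative. The cost is 2^N worlds for N objects, which is what the following slides set out to avoid.

```python
from itertools import product
from typing import Dict, List


def count_distribution_pws(p_inside: List[float]) -> Dict[int, float]:
    """P(count = k) by summing P(w) * I(count(w) = k) over all 2^N possible worlds.

    Assumes independent objects; object i lies inside the query region with
    probability p_inside[i] and outside otherwise.
    """
    dist: Dict[int, float] = {}
    for world in product([False, True], repeat=len(p_inside)):
        p_world = 1.0
        for inside, p in zip(world, p_inside):
            p_world *= p if inside else 1.0 - p
        k = sum(world)                       # query result in this world
        dist[k] = dist.get(k, 0.0) + p_world
    return dist


dist = count_distribution_pws([0.8, 0.2, 0.4])
for k in sorted(dist):
    print(k, round(dist[k], 3))
```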
Possible Worlds Example II
[Figure: uncertain objects labelled A to Z]
Too many possible worlds
28
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects
Are there efficient solutions to query uncertain spatial data?
In general: No
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
The problem can be reduced to uncertain spatial databases
But: specific queries may have polynomial-time solutions
[DalviSuciu04] N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004.
29
Querying Uncertain Data: Running Example
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds
30
Data Cleaning Aggregation
[Figure: query region around q with uncertain objects C, D, H and I]
Ignore uncertainty (data cleaning)
Replace uncertain objects by a deterministic "best guess"
– Expected positions
– Most-likely positions
– …
Query results are not reliable
Query results may be biased
31
Equivalent Worlds: An intuition
Observation 1
For any possible world w and any possible world w′ derived from w by changing the position of object o, the following equivalence holds
32
Equivalent Worlds: An intuition
Observation 1 allows us to discard objects outside of the query region
Remaining number of equivalence classes of possible worlds
33
Equivalent Worlds: An intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
Querying Uncertain Spatial Data
34
Equivalent Worlds: An intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
Remaining number of equivalence classes of possible worlds
35
Equivalent Worlds: An intuition
Observation 3
We only require the number of objects in the query region. Information about concrete result objects can be discarded
36
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
[Figure: query region around q with uncertain objects A, B, C and H; probability labels 0.8, 0.2 and 0.4]
Observation 3: anonymize objects, i.e., substitute A, B, C by x
Each monomial $c \cdot x^k$ implies that the probability of having exactly $k$ results equals $c$
40
Generating Functions: Formally
For each object $o_i$, let $p_i$ denote the probability that $o_i$ is located inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_{i} \bigl((1 - p_i) + p_i \cdot x\bigr)$$
In the expanded polynomial, the coefficient of monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.
[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009).
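Expanding F(x) one linear factor at a time gives the full count distribution in O(N²) operations instead of 2^N worlds. This is a minimal sketch under the same independence assumption, with illustrative probabilities. Coefficient k of the product is P(count = k).

```python
from typing import List


def count_distribution_gf(p_inside: List[float]) -> List[float]:
    """Coefficients of F(x) = prod_i ((1 - p_i) + p_i x), lowest degree first.

    coeffs[k] is the probability that exactly k objects lie inside the region.
    Each multiplication by a linear factor costs O(k), so the total is O(N^2).
    """
    coeffs = [1.0]                                   # the constant polynomial 1
    for p in p_inside:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)                  # object outside: degree unchanged
            nxt[k + 1] += c * p                      # object inside: degree + 1
        coeffs = nxt
    return coeffs


print([round(c, 3) for c in count_distribution_gf([0.8, 0.2, 0.4])])
# [0.096, 0.472, 0.368, 0.064]
```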
41
Example
Count Queries on Uncertain Data
45
The Paradigm of Equivalent Worlds
Given a query predicate φ and an uncertain database DB, we can answer φ on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and unbiased
Evaluate the query predicate on each world
The distribution of sampled results is an unbiased approximation of the true distribution of results
48
Sampling Example
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimations
49
Sampling Confidences
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of estimators
E.g., Wald test: $\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{|S|}}$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution
At significance level α, the interval [0.442, 0.638] covers the true probability with confidence 1 − α
True probability
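The sampling approach and its Wald interval fit in a few lines. This sketch assumes independent objects with illustrative in-region probabilities. `sample_count_estimates` draws possible worlds and returns the relative frequency of each count, and `wald_interval` implements p̂ ± z·sqrt(p̂(1 − p̂)/|S|). The slide's interval [0.442, 0.638] is reproduced by 54 successes out of 100 samples at z = 1.96 (95%). Both of those values are inferred from the interval and are not stated on the slide.

```python
import math
import random
from typing import List, Tuple


def sample_count_estimates(p_inside: List[float], n_worlds: int, seed: int = 0) -> List[float]:
    """Monte-Carlo estimate of P(count = k) from n_worlds independently sampled worlds."""
    rng = random.Random(seed)
    freq = [0] * (len(p_inside) + 1)
    for _ in range(n_worlds):
        k = sum(rng.random() < p for p in p_inside)   # count in one sampled world
        freq[k] += 1
    return [f / n_worlds for f in freq]


def wald_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wald confidence interval p_hat +/- z * sqrt(p_hat (1 - p_hat) / n).

    z = 1.96 corresponds to a 95% two-sided interval (alpha = 0.05).
    """
    p_hat = successes / n
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half


estimates = sample_count_estimates([0.8, 0.2, 0.4], n_worlds=100)
print(estimates)
lo, hi = wald_interval(54, 100)
print(round(lo, 3), round(hi, 3))
```

The Wald interval is the simplest choice. For probabilities close to 0 or 1, or for small samples, its coverage degrades, and score-based intervals are often preferred.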
50
Uncertain Spatial Data Management Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip-side
53
Model of a trajectory
[Figure: 3d-TRAJECTORY in (X, Y, Time) space up to the present time, and its 2d-ROUTE projection onto the (X, Y) plane]
Approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman. Kinetic space-time prisms. GIS 2011
54
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory
(location, time, velocity vector) updates
55
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze. On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumed exact location samples
  • Unknown (but bounded) maximum speed in-between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov. Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li. Uncertain Range Queries for Necklaces. MDM 2010
57
Modeling Uncertain Trajectories: Model 2
Constraints: motion along a road network
Uncertainty restricted to road segments
[Figure: road-network prism between samples p1 (at t1) and p2 (at t2); labels q and t]
Bart Kuijpers, Walied Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3: Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain. Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
– The impact of uncertainty on the (structure of the) answer
– The impact of constraints
Processing algorithms
– Impact of the motion + uncertainty coupling on filtering/pruning/refinement
E.g., possible routes to be taken
60
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm−1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space between T = tb and T = te, with the change of NN at T = t1]
Example: assuming that Trq is the querying trajectory during [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen. Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some location uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: the previous example with uncertainty added; trajectories Trq, Tr1, Tr2, Tr3 between T = tb and T = te, with time instants tb1, t11 and t1]
Observations: adding uncertainty to the previous example implies
– Tr1 (blue) has the exclusive non-zero NN-probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN-probability
– Specifically, at t11 all three trajectories have a non-zero NN-probability
⇒ What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to a Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level 1: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m−1), te, Dm). Level 2 under Tri1: (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …; deeper levels are analogous, e.g., (Tri121, t11, t121, D121)]
Root: parameters of the query
– Querying trajectory Trq
– Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q, uncertainty disks of Tr1–Tr5, and circles of radius Rmin, Rmax and Rd around Q; annotation "Rmin > Rmax"]
Trq = 2D point Q; the rest are possible locations bounded by circles
Observation
– Any object whose min-distance from Q is > Rmax has a "0" probability of being a NN to Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \iint_{\lVert (x, y) - Q \rVert \le R_d} \mathrm{pdf}_i(x, y)\, dx\, dy$$
The probability that a given object Trj is the NN of Q is given by
$$P_{NN}(Tr_j) = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \ne j} \bigl(1 - P_i(R_d)\bigr)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
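A minimal sketch of the crisp-query case, assuming that each object's possible locations form a disk with a uniform pdf. Rmax is taken as the smallest maximum distance from Q over all objects. That is a common choice and an assumption here, because the slide defines Rmax only in the figure. Objects whose minimum distance exceeds Rmax are pruned. For the remaining objects, the NN probabilities are estimated by Monte-Carlo sampling of possible worlds instead of evaluating the integral analytically. All coordinates are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Disk = Tuple[float, float, float]          # (center x, center y, radius)


def min_max_dist(q: Point, d: Disk) -> Tuple[float, float]:
    """Minimum and maximum distance from point q to any point of disk d."""
    c = math.hypot(d[0] - q[0], d[1] - q[1])
    return max(0.0, c - d[2]), c + d[2]


def prune(q: Point, disks: List[Disk]) -> List[int]:
    """Indices of objects that can be the NN: their min-distance must not exceed Rmax."""
    r_max = min(min_max_dist(q, d)[1] for d in disks)   # assumed definition of Rmax
    return [i for i, d in enumerate(disks) if min_max_dist(q, d)[0] <= r_max]


def sample_in_disk(d: Disk, rng: random.Random) -> Point:
    """Uniform sample from a disk (sqrt keeps the density uniform over the area)."""
    r = d[2] * math.sqrt(rng.random())
    a = 2.0 * math.pi * rng.random()
    return d[0] + r * math.cos(a), d[1] + r * math.sin(a)


def nn_probabilities(q: Point, disks: List[Disk], n: int = 20000,
                     seed: int = 0) -> Dict[int, float]:
    """Monte-Carlo estimate of P(object i is the NN of q) for the non-pruned objects."""
    rng = random.Random(seed)
    cand = prune(q, disks)
    wins = {i: 0 for i in cand}
    for _ in range(n):
        # One sampled world: draw a location for every candidate, keep the closest.
        best = min(cand, key=lambda i: math.dist(q, sample_in_disk(disks[i], rng)))
        wins[best] += 1
    return {i: w / n for i, w in wins.items()}


q = (0.0, 0.0)
disks = [(2.0, 0.0, 1.0), (0.0, 2.5, 1.0), (-3.0, 0.0, 0.5), (8.0, 8.0, 1.0)]
print(prune(q, disks))              # [0, 1, 2]: object 3 is pruned
print(nn_probabilities(q, disks))
```

Pruned objects can be skipped safely. In every sampled world, the object that defines Rmax is at distance at most Rmax, so a pruned object can never be the closest one.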
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertainty disks of TrQ and Tr1–Tr5 with radii Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect, this is a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs of radius r for Trq, Tr1 and Tr2 in the (x, y) plane]
Example: assuming a uniform pdf of possible locations; the figure shows two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq). Due to independence, the pdf of Viq is the convolution of the pdfs of Vi and −Vq
67
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq) in the (x, y) plane, each with extent (diameter) 4r]
Let Viq = Vi + (−Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids (with respect to the vertical axis, i.e., the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
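Instead of evaluating the quadruple integral or the convolution analytically, P(‖Vi − Vq‖ ≤ Rd) can be estimated by sampling Viq = Vi + (−Vq) directly. This Monte-Carlo sketch assumes independent uniform disks for both locations, with hypothetical centers and radii. It also checks Property 1 empirically: the sample mean of Viq should be close to Ci − Cq. It is an illustration and not the evaluation method used in the tutorial.

```python
import math
import random
from typing import Tuple

Disk = Tuple[float, float, float]          # (center x, center y, radius)


def sample_in_disk(d: Disk, rng: random.Random) -> Tuple[float, float]:
    """Uniform sample from a disk (sqrt keeps the density uniform over the area)."""
    r = d[2] * math.sqrt(rng.random())
    a = 2.0 * math.pi * rng.random()
    return d[0] + r * math.cos(a), d[1] + r * math.sin(a)


def within_distance_prob(di: Disk, dq: Disk, rd: float, n: int = 50000,
                         seed: int = 0) -> Tuple[float, Tuple[float, float]]:
    """Estimate P(||V_i - V_q|| <= rd) and the mean of V_iq = V_i + (-V_q)."""
    rng = random.Random(seed)
    hits, sx, sy = 0, 0.0, 0.0
    for _ in range(n):
        xi, yi = sample_in_disk(di, rng)     # independent draws of V_i and V_q
        xq, yq = sample_in_disk(dq, rng)
        dx, dy = xi - xq, yi - yq            # one sample of V_iq
        sx += dx
        sy += dy
        if math.hypot(dx, dy) <= rd:
            hits += 1
    return hits / n, (sx / n, sy / n)


tr_i = (3.0, 0.0, 1.0)                       # hypothetical: C_i = (3, 0), r = 1
tr_q = (0.0, 0.0, 1.0)                       # hypothetical: C_q = (0, 0), r = 1
for rd in (0.5, 3.0, 5.5):
    p, mean = within_distance_prob(tr_i, tr_q, rd)
    print(rd, round(p, 3), (round(mean[0], 2), round(mean[1], 2)))
```

Viq is supported on a disk of radius 2r around Ci − Cq, so the estimate is exactly 0 for Rd below 3 − 2 = 1 and exactly 1 above 3 + 2 = 5 in this example.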
68
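The continuous aspect of the probabilistic NN queries
Let $V^x_{iq} = v^x_i - v^x_q$ and $V^y_{iq} = v^y_i - v^y_q$ denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin as a function of t is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = (V^x_{iq})^2 + (V^y_{iq})^2, \; B = 2\,(X_{iq} V^x_{iq} + Y_{iq} V^y_{iq}), \; C = X_{iq}^2 + Y_{iq}^2$$
where $(X_{iq}, Y_{iq})$ is the expected relative location at t = 0. This is a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)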
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions $S_{df} = \{d_{1q}(t), d_{2q}(t), \ldots, d_{Nq}(t)\}$ (NOTE: excluding $d_{qq}(t) \equiv 0$)
Observation: two distance functions $d_{iq}(t)$ and $d_{jq}(t)$ can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set $d_{iq}(t) = d_{jq}(t)$). Call such pairwise intersections critical time-points
Example: [Figure: lower envelope LE12 of the two distance trajectories TR1 and TR2, with time on the horizontal axis starting at tb and critical time-points t11 and t12]
70
The continuous aspect of the probabilistic NN queries
Q: How can we efficiently construct the lower envelope of the whole collection of distance functions?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: LE12 of TR1 and TR2 (critical points t11, t12) and LE34 of TR3 and TR4 (critical point t31) over [tb, te] are merged into LE1234, which has new critical points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
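The lower envelope can be illustrated with a simple sketch. For linear relative motion, each squared distance d_iq(t)² = A t² + B t + C is a quadratic, so two distance functions cross where the difference of their quadratics vanishes, which happens at most twice. The sketch collects all pairwise critical time-points, sorts them, and keeps the nearest object on each elementary interval. This is a plain O(N² log N) sweep, used here for brevity. It is not the O(N log N) divide-and-conquer merge described on the slide. The object motions are hypothetical.

```python
import math
from typing import List, Tuple

Motion = Tuple[float, float, float, float]   # relative (x, y) at t = 0, relative (vx, vy)


def quad(m: Motion) -> Tuple[float, float, float]:
    """Coefficients (A, B, C) of d(t)^2 = A t^2 + B t + C for linear relative motion."""
    x, y, vx, vy = m
    return vx * vx + vy * vy, 2.0 * (x * vx + y * vy), x * x + y * y


def crossings(m1: Motion, m2: Motion, tb: float, te: float) -> List[float]:
    """Critical time-points in (tb, te) where the two distance functions are equal."""
    a1, b1, c1 = quad(m1)
    a2, b2, c2 = quad(m2)
    a, b, c = a1 - a2, b1 - b2, c1 - c2
    if abs(a) < 1e-12:                          # difference is linear (or constant)
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return [t for t in roots if tb < t < te]


def lower_envelope(motions: List[Motion], tb: float, te: float) -> List[Tuple[int, float, float]]:
    """List of (object index, start, end): the nearest object on each sub-interval."""
    cuts = {tb, te}
    for i in range(len(motions)):
        for j in range(i + 1, len(motions)):
            cuts.update(crossings(motions[i], motions[j], tb, te))
    ts = sorted(cuts)
    pieces: List[Tuple[int, float, float]] = []
    for s, e in zip(ts, ts[1:]):
        mid = 0.5 * (s + e)                     # no crossing inside (s, e): test the midpoint
        best = min(range(len(motions)),
                   key=lambda i: sum(c * mid ** k for k, c in zip((2, 1, 0), quad(motions[i]))))
        if pieces and pieces[-1][0] == best:
            pieces[-1] = (best, pieces[-1][1], e)   # extend the previous piece
        else:
            pieces.append((best, s, e))
    return pieces


motions = [
    (1.0, 0.0, 1.0, 0.0),    # object 0: starts at distance 1, moves away from q
    (5.0, 0.0, -1.0, 0.0),   # object 1: approaches q and passes it at t = 5
    (0.0, 2.5, 0.0, 0.0),    # object 2: same velocity as q, constant distance 2.5
]
print(lower_envelope(motions, 0.0, 10.0))
# [(0, 0.0, 1.5), (2, 1.5, 2.5), (1, 2.5, 7.5), (2, 7.5, 10.0)]
```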
71
The lower envelope provides a continuous-pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) above the lower envelope cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1–TR7 over [tb, te] with critical times t1, t2, t3; the lower envelope, a band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
72
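Given two trajectory samples $(t_i, p_i)$ and $(t_{i+1}, p_{i+1})$ of a moving object a on a road network, the set of possible paths $PP_i(a)$ between $t_i$ and $t_{i+1}$ consists of all the paths along the routes (sequences of edges) that connect $p_i$ and $p_{i+1}$ and whose minimum time costs are not greater than $t_{i+1} - t_i$, i.e.,
$$PP_i(a) = \{P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mintime}(P) \le t_{i+1} - t_i\}$$
Example: if the pdf of selecting a possible path is uniform, $\Pr(P) = 1/|PP_i(a)|$ for every $P \in PP_i(a)$
Given a path $P_j \in PP_i(a)$, the possible locations of a given moving object a with respect to $P_j$ at $t \in [t_i, t_{i+1}]$ are the set of all the positions $p \in P_j$ from which a can reach $p_i$ (respectively $p_{i+1}$) within the time period $t - t_i$ (respectively $t_{i+1} - t$), i.e.,
$$PL^a_{P_j}(t) = \{p \in P_j \mid \mathrm{mintime}(p_i, p) \le t - t_i \ \wedge\ \mathrm{mintime}(p, p_{i+1}) \le t_{i+1} - t\}$$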
73
Combine the two probabilities…
[Figure: possible paths of object a, and its possible locations when t = 4; segment from m to p1 marked]
Probability of falling inside segment m p1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{P \in PP_i(a)} \Pr(P) \cdot \Pr\bigl(PL^a_P(t) \in S\bigr)$$
e.g., with a uniform pdf
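A minimal sketch of this combination. Its assumptions are ours, not the slide's: travel time along a path is length / v_max for a single maximum speed v_max, positions on a path are measured as the distance from p_i, possible paths are equally likely, and the pdf over the possible locations on a path is uniform. The path lengths, the speed and the times below are hypothetical. They are chosen so that there are two equally likely paths and the segment covers half of the possible-location interval on one of them, which reproduces the 0.5 · 0.5 = 0.25 of the example.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]


def possible_locations(length: float, v_max: float, t_i: float, t_next: float,
                       t: float) -> Optional[Interval]:
    """Positions x on a path (distance from p_i) reachable from p_i by time t and
    from which p_{i+1} is still reachable by t_next; None if the path is impossible."""
    if length / v_max > t_next - t_i:          # minimum time cost exceeds the gap
        return None
    lo = max(0.0, length - v_max * (t_next - t))
    hi = min(length, v_max * (t - t_i))
    return (lo, hi) if lo <= hi else None


def prob_in_segment(paths: List[Tuple[float, Optional[Interval]]], v_max: float,
                    t_i: float, t_next: float, t: float) -> float:
    """Sum over possible paths of Pr(path) * Pr(location on path lies in the segment).

    Each entry is (path length, segment on that path or None); possible paths are
    assumed equally likely (uniform pdf over paths and over locations on a path).
    """
    pls = [(possible_locations(length, v_max, t_i, t_next, t), seg) for length, seg in paths]
    pls = [(pl, seg) for pl, seg in pls if pl is not None]
    total = 0.0
    for (lo, hi), seg in pls:
        if seg is None or hi <= lo:
            continue
        overlap = max(0.0, min(hi, seg[1]) - max(lo, seg[0]))
        total += (1.0 / len(pls)) * overlap / (hi - lo)
    return total


paths = [(10.0, (2.0, 5.0)),   # path 1: the query segment covers [2, 5] on it
         (12.0, None)]         # path 2: the segment is not on this path
p = prob_in_segment(paths, v_max=2.0, t_i=0.0, t_next=8.0, t=4.0)
print(p)
```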
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function
Example: observe the two possible sequences of going from position p_i to position p_{i+1}. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk; retrieve on demand
76
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects t_q
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" from q along road-network segments (vertices A–E) with earliest/latest times of arrival; some possible paths are definitely not part of the answer to the range query]
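The network-expansion step can be sketched as a Dijkstra search that stops at network distance r. This sketch assumes an undirected road graph given as an adjacency dictionary with edge lengths, and it places q at a vertex. It keeps an edge when the network distance from q to the endpoint through which the edge is reached is smaller than r. The graph and the function name are hypothetical, and the filtering and refinement steps on the movement R-trees are not shown.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]     # vertex -> {neighbour: edge length}


def network_expansion(graph: Graph, q: str, r: float) -> Set[Tuple[str, str]]:
    """Edges whose network distance from q (via the nearer endpoint) is below r."""
    dist: Dict[str, float] = {q: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, q)]
    edges: Set[Tuple[str, str]] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")) or d >= r:
            continue                             # stale heap entry, or beyond the range
        for v, w in graph[u].items():
            edges.add(tuple(sorted((u, v))))     # edge reached from u within r
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return edges


road = {
    "q": {"A": 2.0, "B": 4.0},
    "A": {"q": 2.0, "C": 3.0},
    "B": {"q": 4.0, "D": 5.0},
    "C": {"A": 3.0, "E": 6.0},
    "D": {"B": 5.0},
    "E": {"C": 6.0},
}
print(sorted(network_expansion(road, "q", r=5.0)))
```

The edge C-E is excluded because C is reached at distance exactly 5, which is not smaller than r.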
77
Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty – the flip-side
› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink. "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
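For reference, here is a minimal sketch of the classic recursive Douglas-Peucker simplification. It is not the sped-up variant by Hershberger and Snoeyink cited above. A vertex is kept only if it lies farther than the acceptable error eps from the segment between the current end points, so every dropped vertex is within eps of the segment that replaces it. For trajectories, the same scheme can be applied to the (x, y) points, but this ignores the role of time raised on the next slide. The sample polyline is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def seg_dist(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points: List[Point], eps: float) -> List[Point]:
    """Classic recursive Douglas-Peucker line simplification with tolerance eps."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = seg_dist(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [a, b]                        # all interior points are within eps
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right                 # drop the duplicated split point


track = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 5.0), (4.0, 6.0), (5.0, 7.0)]
print(douglas_peucker(track, eps=0.5))
```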
79
Uncertainty – the other flip-side
› What are the important properties that should not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– Is it not just a "Z" axis? … one cannot travel back in time
– How long may the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension
vs.
Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension
82
Merge the approaches
› Idea
– Discretize time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parameterized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443
83
A highway example…
[Figure: position (pos) of a car on a highway over time (t) with a query region Q; speed bounds vmin and vmax; probabilities 50% and 30% of lying in Q at two time instants]
What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution…
› Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
› The independence assumption admits worlds that violate the speed constraints
› Consideration of dependency: 0.5
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
[Figure: state-transition graph over s1, s2, s3, unrolled over t = 0, 1, 2, 3]
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
Note: we have an exponential number of possible paths the car might take.
› Starting in s1 at t = 0, the state distribution evolves as
$$(1, 0, 0) \xrightarrow{M} (0, 0, 1) \xrightarrow{M} (0, 0.8, 0.2) \xrightarrow{M} (0.48, 0.16, 0.36)$$
› To avoid counting a trajectory twice, the mass that is in s1 or s2 at t = 2 (0.8) is recorded as a hit and removed; propagating only the remaining mass (0, 0, 0.2) gives (0, 0.16, 0.04) at t = 3, of which 0.16 are new hits.
Result = 0.8 + 0.16 = 0.96
› The solution, based on matrix multiplications, introduces a new state for the "winner" trajectories and two matrices
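To make the "exponential number of possible paths" concrete, the sketch below evaluates the query by brute force: it enumerates every path of the example chain from t = 1 to t = 3 (one path per possible world) and sums the probabilities of the paths that are in s1 or s2 at t = 2 or t = 3. Indices 0, 1, 2 stand for s1, s2, s3; the function name is hypothetical. The number of paths grows as |S|^t, so this only works for toy inputs.

```python
from itertools import product

# Transition matrix of the example: rows = from-state, columns = to-state.
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def window_prob_bruteforce(M, start, horizon, window_states, window_times):
    """Sum P(path) over all paths that visit window_states at some t in window_times."""
    total = 0.0
    for path in product(range(len(M)), repeat=horizon):   # states at t = 1..horizon
        p, prev, hit = 1.0, start, False
        for t, s in enumerate(path, start=1):
            p *= M[prev][s]
            if p == 0.0:                                   # impossible path, stop early
                break
            hit = hit or (t in window_times and s in window_states)
            prev = s
        if p > 0.0 and hit:
            total += p
    return total


print(window_prob_bruteforce(M, start=0, horizon=3, window_states={0, 1}, window_times={2, 3}))
```

The printed value is 0.96 up to floating-point rounding, matching the result above.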
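ST - Window Queries
› Add an absorbing state s4 for the "winner" trajectories that have already hit the window. M− (M extended by s4) is applied to steps that arrive outside T; M+ is applied to steps that arrive inside T and redirects every transition into s1 or s2 to s4:
$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$
[Figure: states s1, s2, s3 and the absorbing state s4]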
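The two-matrix construction can be sketched with plain Python lists. The helper `extend` (a hypothetical name) derives M− and M+ from M by appending an absorbing hit state as the last index; as an assumption about the step convention, the step from t − 1 to t uses M+ exactly when the arrival tick t lies in T = [2, 3].

```python
def vec_mat(v, A):
    """Multiply a row vector v by the matrix A."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


def extend(M, window_states, inside):
    """Append an absorbing hit state (last index) to M.

    inside=False yields M-minus (M plus an isolated absorbing state);
    inside=True yields M-plus, where every transition into a window state
    is redirected to the hit state.
    """
    n = len(M)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            if inside and j in window_states:
                A[i][n] += M[i][j]
            else:
                A[i][j] = M[i][j]
    A[n][n] = 1.0
    return A


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
M_minus = extend(M, {0, 1}, inside=False)
M_plus = extend(M, {0, 1}, inside=True)

v = [1.0, 0.0, 0.0, 0.0]                 # in s1 at t = 0
for t in (1, 2, 3):
    v = vec_mat(v, M_plus if t in (2, 3) else M_minus)
print(v[3])                              # P(hit), about 0.96
```

Each step is a single (sparse) vector-matrix product, so the cost grows linearly with the number of ticks instead of exponentially with the number of paths.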
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› Pure extrapolation from a single observation is of limited interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 (left) and two observations at t0 and t1 (right)]
Multiple Observations
› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class $S_i^-$ corresponding to worlds where o is located in state si and o has not intersected the window
  – One class $S_i^+$ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
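Multiple Observations
› States in the order $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$; the "−" half holds the worlds without a hit (¬■), the "+" half the worlds with a hit (■):
$$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$
$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0, 0, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^{+}} (0, 0, 0.04, 0.48, 0.16, 0.32)$$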
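Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window (■), given the fact that the object was seen in s3 at t = 3 (obs)?
$$P(\blacksquare \mid \text{obs}) = \frac{P(\text{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\text{obs})} = \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs} \wedge \blacksquare) + P(\text{obs} \wedge \neg\blacksquare)}$$
With the distribution at t = 3, (0, 0, 0.04, 0.48, 0.16, 0.32), the observation matches $S_3^+$ (0.32) and $S_3^-$ (0.04):
$$P(\blacksquare \mid \text{obs}) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$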
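A sketch of the multiple-observation case: it builds the 6×6 matrices over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$ from M, propagates the start vector to t = 3 and conditions on "seen in s3 at t = 3" with Bayes' theorem. The helper names are hypothetical, and the step convention (M+ for steps arriving inside T = [2, 3], M− otherwise) is an assumption consistent with the vectors above.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
WINDOW_STATES = {0, 1}                  # s1, s2
n = len(M)


def build(inside):
    """6x6 matrix; index i is Si- (no hit yet), index i + n is Si+ (hit)."""
    A = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            # From a "no hit" state: become "hit" when entering the window inside T.
            target = j + n if (inside and j in WINDOW_STATES) else j
            A[i][target] += M[i][j]
            # From a "hit" state: stay in the hit half.
            A[i + n][j + n] += M[i][j]
    return A


def vec_mat(v, A):
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]      # in s1, no hit yet, at t = 0
for t in (1, 2, 3):
    v = vec_mat(v, build(inside=t in (2, 3)))

obs = 2                                 # observed in s3 at t = 3
p_hit_and_obs = v[obs + n]              # P(hit AND obs)
p_obs = v[obs] + v[obs + n]             # P(obs AND not hit) + P(obs AND hit)
print(p_hit_and_obs / p_obs)            # about 0.889
```

With inside=False the helper reproduces the block-diagonal M−, so both matrices come from the single transition matrix M.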
Summary
› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index entries of object o in the (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = number of samples that satisfy the query / total number of drawn samples
› But how can samples be drawn efficiently such that they conform to the observations?
› Solution: adaptation of the transition matrices
J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3):205-216, 2013.
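One standard way to make samples conform to an observation is sketched below. A backward pass computes beta[t][s], the probability of reaching the observed state at the observation tick from state s at tick t, and each transition is reweighted as M[i][j] · beta[t+1][j] / beta[t][i] (a Doob h-transform). This is our illustration of "adapting the transition matrices"; the cited paper's exact procedure may differ. The running example is reused (start in s1 at t = 0, seen in s3 at t = 3, window s1/s2 during [2, 3]), so the estimate should land near the exact 8/9.

```python
import random

# Running example (index 0 = s1, 1 = s2, 2 = s3).
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
n = len(M)


def adapted_matrices(M, obs_state, obs_time):
    """One matrix per step t -> t+1, conditioned on being in obs_state at obs_time."""
    # Backward pass: beta[t][i] = P(in obs_state at obs_time | in state i at tick t).
    beta = [[0.0] * n for _ in range(obs_time + 1)]
    beta[obs_time][obs_state] = 1.0
    for t in range(obs_time - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(M[i][j] * beta[t + 1][j] for j in range(n))
    adapted = []
    for t in range(obs_time):
        A = [[0.0] * n for _ in range(n)]
        for i in range(n):
            if beta[t][i] > 0.0:   # states that cannot explain the observation keep empty rows
                for j in range(n):
                    A[i][j] = M[i][j] * beta[t + 1][j] / beta[t][i]
        adapted.append(A)
    return adapted


def sample_path(adapted, start, rng):
    """Draw one trajectory (states at t = 0..obs_time) from the adapted chain."""
    path = [start]
    for A in adapted:
        path.append(rng.choices(range(n), weights=A[path[-1]])[0])
    return path


rng = random.Random(42)
adapted = adapted_matrices(M, obs_state=2, obs_time=3)
samples = [sample_path(adapted, 0, rng) for _ in range(20000)]
hits = sum(1 for p in samples if p[2] in (0, 1) or p[3] in (0, 1))
print(hits / len(samples))   # Monte Carlo estimate of P(hit | obs), close to 8/9
```

Every sample ends in s3 by construction, so no sample has to be rejected.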
Event Queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008, pp. 715-728.
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3):205-216, 2013.
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity search on uncertain spatio-temporal data. SISAP 2013, pp. 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, pp. 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-nearest neighbor queries on uncertain moving object trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9):1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009, pp. 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008, pp. 715-728.
Possible World Semantics
• A collection of uncertain spatial objects defines an uncertain spatial database
• Combinations of object instances define possible database instances, called Possible Worlds
• Assumption: the probability of a possible world can be computed efficiently
Answering Queries using PWS
Let
• $\mathcal{DB}$ be an uncertain database having possible worlds,
• $\mathcal{W}$ be the set of possible worlds of $\mathcal{DB}$,
• $q$ be a query predicate,
• $I_q(w)$ be an indicator function returning one if predicate $q$ holds in world $w$ and zero otherwise.
The probability that a query predicate holds on an uncertain database is defined as
$$P(q, \mathcal{DB}) = \sum_{w \in \mathcal{W}} P(w) \cdot I_q(w)$$
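A small sketch of this definition. As assumptions, the uncertain database is modelled as a dictionary of objects, each with alternative positions and probabilities, and objects are mutually independent, so a world's probability is the product of the chosen instances' probabilities. The toy objects, the helper names and the predicate are hypothetical. The enumeration is exponential in the number of objects, so it only works for a handful of them.

```python
from itertools import product
from math import dist, prod

# Hypothetical uncertain database: object -> list of (position, probability).
db = {
    "A": [((0.0, 0.0), 0.5), ((3.0, 0.0), 0.5)],
    "B": [((1.0, 1.0), 0.2), ((5.0, 5.0), 0.8)],
    "C": [((0.5, 0.0), 1.0)],
}


def possible_worlds(db):
    """Yield (world, P(world)); a world maps every object to one instance position."""
    names = list(db)
    for choice in product(*(db[name] for name in names)):
        world = {name: pos for name, (pos, _) in zip(names, choice)}
        yield world, prod(p for _, p in choice)


def query_probability(db, predicate):
    """P(q, DB) = sum over worlds w of P(w) * I_q(w)."""
    return sum(p for world, p in possible_worlds(db) if predicate(world))


def at_least_two_near_q(world, q=(0.0, 0.0), radius=2.0):
    """Hypothetical predicate: at least two objects lie within radius of q."""
    return sum(dist(pos, q) <= radius for pos in world.values()) >= 2


print(query_probability(db, at_least_two_near_q))   # 1 - 0.5 * 0.8 = 0.6
```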
Possible Worlds Example II
[Figure: uncertain objects labeled A to Z, each with several alternative positions, and a sequence of possible worlds]
Too many possible worlds
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects.
Are there efficient solutions to query uncertain spatial data?
In general: no.
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
This hardness carries over to uncertain spatial databases (by reduction).
But: specific queries may have polynomial-time solutions.
[DalviSuciu04] N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004.
Querying Uncertain Data: Running Example
[Figure: circular query region centered at query point q]
Return the number of objects located in the depicted circular region centered at query point q.
This number is a random variable.
Total number of possible worlds
Data Cleaning: Aggregation
[Figure: query region around q with objects C, D, H, I]
Ignore uncertainty (data cleaning): replace uncertain objects by a deterministic "best guess"
• Expected positions
• Most-likely positions
• …
Query results are not reliable
Query results may be biased
Equivalent Worlds: An Intuition
[Figure: query region around q with objects C, D, H, I]
Observation 1
For any possible world w and any possible world w′ derived from w by changing the position of an object whose instances all lie outside the query region, the query result is the same in w and w′.
Observation 1 allows us to discard objects outside of the query region.
Remaining number of equivalence classes of possible worlds
Observation 2
For each remaining object, we only need to consider the predicate "inside".
Remaining number of equivalence classes of possible worlds
Observation 3
We only require the number of objects in the query region; information about the concrete result objects can be discarded.
Generating Functions
[Figure: query region around q; uncertain objects A, B, C and H, with probabilities 0.8, 0.2 and 0.4 of lying inside]
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects, i.e., substitute A, B, C by x
Each monomial $c \cdot x^k$ implies that the probability of having exactly $k$ results equals $c$
Generating Functions: Formally
For each object $o_i$, let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_{i} \big( (1 - p_i) + p_i \, x \big)$$
In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.
[2] J. Li, B. Saha, and A. Deshpande. A unified approach to ranking in probabilistic databases. PVLDB 2(1):502-513, 2009.
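The expansion of F(x) is a sequence of polynomial multiplications, one per object, which takes O(N²) time instead of enumerating 2^N worlds. The sketch assumes independent objects; the function name is hypothetical, and the probabilities 0.8, 0.2 and 0.4 are the values shown in the figure.

```python
def count_distribution(probs):
    """Coefficients of prod_i ((1 - p_i) + p_i * x), lowest degree first.

    coeffs[k] is the probability that exactly k objects are inside the
    query region, assuming the objects are mutually independent.
    """
    coeffs = [1.0]                         # the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)        # object outside: degree unchanged
            nxt[k + 1] += c * p            # object inside: multiply by x
        coeffs = nxt
    return coeffs


print(count_distribution([0.8, 0.2, 0.4]))
# approximately [0.096, 0.472, 0.368, 0.064]
```

The coefficients always sum to one, since F(1) = 1.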
Count Queries on Uncertain Data
Example
[Figure: query region around q; uncertain objects A, B, C and H, with probabilities 0.8, 0.2 and 0.4 of lying inside]
The Paradigm of Equivalent Worlds
Given a query predicate $q$ and an uncertain database DB, we can answer $q$ on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and without bias
Evaluate the query predicate on each world
The distribution of sampled results is an unbiased approximation of the true distribution of results
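Sampling: Example
[Figure: query region around q; uncertain objects A, B, C and H, with probabilities 0.8, 0.2 and 0.4 of lying inside]
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimations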
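Sampling: Confidences
[Figure: query region around q; uncertain objects A, B, C and H, with probabilities 0.8, 0.2 and 0.4 of lying inside]
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of estimators, e.g., the Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $\hat{p}$ is the estimated probability, $n$ is the number of sampled worlds, and $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution.
At significance level $\alpha$, the resulting confidence interval for the true probability is [0.442, 0.638].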
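A sketch of both steps: Monte Carlo sampling of possible worlds (independent objects with the figure's probabilities 0.8, 0.2, 0.4) and a Wald interval for each estimated probability, using `statistics.NormalDist` for the normal quantile. The helper names are hypothetical. As our own reconstruction, not stated in the source, an estimate of 0.54 from n = 100 samples with α = 0.05 reproduces the interval [0.442, 0.638].

```python
import random
from statistics import NormalDist


def sample_count_distribution(probs, n_worlds, rng):
    """Draw n_worlds possible worlds and return the relative frequency of each
    result count 0..len(probs) (objects assumed independent)."""
    hist = [0] * (len(probs) + 1)
    for _ in range(n_worlds):
        count = sum(rng.random() < p for p in probs)   # objects inside q in this world
        hist[count] += 1
    return [h / n_worlds for h in hist]


def wald_interval(p_hat, n, alpha=0.05):
    """Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)        # about 1.96 for alpha = 0.05
    half = z * (p_hat * (1.0 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half


rng = random.Random(1)
estimates = sample_count_distribution([0.8, 0.2, 0.4], n_worlds=100, rng=rng)
print(estimates)                                       # depends on the seed
print([wald_interval(p, 100) for p in estimates])
print(wald_interval(0.54, 100))                        # about (0.442, 0.638)
```

The Wald interval is the simplest choice; for estimates close to 0 or 1 it becomes unreliable, which is one reason other intervals (e.g., Wilson) are often preferred.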
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte Carlo sampling; probabilistic guarantees
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion): semantics and processing
  • Part 2: Range (road-network constraints): semantics and processing
• Uncertainty: the flip-side
Model of a Trajectory
[Figure: 2D route in the (X, Y) plane and the corresponding 3D trajectory in (X, Y, Time) up to the present time]
The approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
B. Kuijpers, H. J. Miller, and W. Othman. Kinetic space-time prisms. GIS 2011.
Mobility Models
› Sequence of (location, time) updates (e.g., GPS-based)
› Electronic maps + traffic distribution patterns + set of (to be visited) points ⇒ full trajectory
› (location, time, velocity vector) updates
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
R. Lange, H. Weinschrott, L. Geiger, A. Blessing, F. Dürr, K. Rothermel, and H. Schütze. On a generic uncertainty model for position information. QuaCon 2009.
Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (a.k.a. space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in between samples
R. Cheng, S. Prabhakar, and D. V. Kalashnikov. Querying imprecise data in moving object environments. ICDE 2003.
G. Trajcevski, A. Choudhary, O. Wolfson, and G. Li. Uncertain range queries for necklaces. MDM 2010.
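Under the bead model, a location x at time t between two exact samples (p_a, t_a) and (p_b, t_b) is possible exactly when x is reachable from p_a in t − t_a and p_b is reachable from x in t_b − t at speed at most v_max, i.e., the bead is the intersection of a forward and a backward cone. The sketch assumes free-space (Euclidean) motion and a known bound v_max; the function name and the sample values are hypothetical.

```python
from math import dist


def in_bead(point, t, sample_a, sample_b, v_max):
    """True if `point` at time t lies in the bead between two exact samples.

    sample_a = (p_a, t_a) and sample_b = (p_b, t_b); in between, the object
    moves with speed at most v_max (space-time prism assumption).
    """
    (p_a, t_a), (p_b, t_b) = sample_a, sample_b
    if not t_a <= t <= t_b:
        return False
    return (dist(p_a, point) <= v_max * (t - t_a)        # forward cone from sample a
            and dist(point, p_b) <= v_max * (t_b - t))   # backward cone from sample b


a, b = ((0.0, 0.0), 0.0), ((4.0, 0.0), 4.0)
print(in_bead((2.0, 1.0), 2.0, a, b, v_max=1.5))   # True
print(in_bead((2.0, 3.0), 2.0, a, b, v_max=1.5))   # False
```

A necklace is then simply the sequence of beads between consecutive samples.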
Modeling Uncertain Trajectories: Model 2
Constraints:
Motion along the road network
Uncertainty restricted to road segments
[Figure: road-network prism between samples p1 at t1 and p2 at t2, with a location q at time t]
B. Kuijpers and W. Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9), 2009.
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, and S. Chamberlain. Managing uncertainty in moving objects databases. ACM TODS, 2004.
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
– The impact of uncertainty on the (structure of the) answer
– The impact of constraints, e.g., possible routes to be taken
Processing algorithms
– Impact of the motion + uncertainty coupling on filtering/pruning/refinement
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
K. Zheng, G. Trajcevski, X. Zhou, and P. Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011.
Continuous NN for Trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) over time, with T = tb, T = t1 and T = te]
Example: assuming that Trq is the querying trajectory during [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized.
Y. Tao, D. Papadias, and Q. Shen. Continuous nearest neighbor search. VLDB 2002.
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some location uncertainty associated with its location at each time instant, consider the variant
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: the previous example with uncertainty added; time instants T = tb, tb1, t1, t11, te]
Observations: adding uncertainty to the previous example implies
– Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN-probability
– Specifically, at t11 all three trajectories have a non-zero NN-probability
⇒ What should the structure of the answer to UQ_nn(q) look like?
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: IPAC-NN tree. Root [TrQ, tb, te]; first level (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); deeper levels such as (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]
Root: parameters of the query, i.e., the querying trajectory Trq and the time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: crisp query point Q and the possible-location disks of Tr1 to Tr5, with radii Rmin, Rd and Rmax around Q]
Trq = 2D point Q; the rest are possible locations bounded by circles.
Observation: any object whose min-distance from Q is > Rmax (the smallest max-distance from Q over all objects) has a "0" probability of being a NN to Q. Example: Tr4 cannot have a non-zero probability of being Q's NN.
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \iint_{\|(x, y) - Q\| \le R_d} \mathrm{pdf}_i(x, y) \, dx \, dy$$
The probability that a given object Trj is the NN of Q is given by
$$P_{NN}(Tr_j) = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \ne j} \big(1 - P_i(R_d)\big) \, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…).
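The NN integral can be evaluated numerically once P_i(R_d) is known. The sketch assumes, as in the figures, that each possible-location region is a disk with a uniform pdf, so P_i(R_d) is the overlap area of the circle of radius R_d around Q with the object's disk, divided by the disk area. Objects are described only by (distance of the disk center from Q, disk radius); the function names are hypothetical, and the integral is approximated by a Riemann-Stieltjes sum on a uniform grid.

```python
from math import acos, pi, sqrt


def disk_overlap(R, d, r):
    """Area of the intersection of a circle of radius R around Q and a disk of
    radius r whose center lies at distance d from Q."""
    if d >= R + r:
        return 0.0
    if d <= abs(R - r):                       # one circle contains the other
        return pi * min(R, r) ** 2
    c1 = max(-1.0, min(1.0, (d * d + R * R - r * r) / (2 * d * R)))
    c2 = max(-1.0, min(1.0, (d * d + r * r - R * R) / (2 * d * r)))
    k = (-d + R + r) * (d + R - r) * (d - R + r) * (d + R + r)
    return R * R * acos(c1) + r * r * acos(c2) - 0.5 * sqrt(max(0.0, k))


def within_prob(R, d, r):
    """P_i(R): probability that an object uniform on its disk is within distance R of Q."""
    return disk_overlap(R, d, r) / (pi * r * r)


def nn_probabilities(objects, steps=4000):
    """objects: list of (d, r). Approximates the P_NN integral on [0, R_max]."""
    r_max = min(d + r for d, r in objects)    # smallest maximum distance from Q
    grid = [r_max * k / steps for k in range(steps + 1)]
    cdf = [[within_prob(R, d, r) for R in grid] for d, r in objects]
    result = [0.0] * len(objects)
    for k in range(steps):
        mid = 0.5 * (grid[k] + grid[k + 1])
        farther = [1.0 - within_prob(mid, d, r) for d, r in objects]
        for j in range(len(objects)):
            dP = cdf[j][k + 1] - cdf[j][k]    # P(object j's distance falls in this cell)
            if dP > 0.0:
                rest = 1.0
                for i, f in enumerate(farther):
                    if i != j:
                        rest *= f             # all other objects are farther away
                result[j] += dP * rest
    return result


# Two symmetric objects and one whose disk starts beyond R_max (so it is pruned).
print(nn_probabilities([(3.0, 1.0), (3.0, 1.0), (10.0, 1.0)]))  # about [0.5, 0.5, 0.0]
```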
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query location TrQ and objects Tr1 to Tr5; radii Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq.
NOTE: in effect, a quadruple integration instead of the double one (cf. the previous slide)…
[Figure: uniform pdfs of the possible locations of Trq, Tr1 and Tr2 over disks of diameter 2r]
Example (assuming a uniform pdf of possible locations): the figure marks two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\|V_i - V_q\| \le R_d)$.
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq), which, due to independence, implies that the pdf of Viq is the convolution of the pdfs of Vi and −Vq.
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq), each supported on a disk of diameter 4r]
Let Viq = Vi + (−Vq).
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq).
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
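A discrete stand-in for the continuous pdfs makes the convolution and Property 1 easy to check. In the sketch below, locations are hypothetical pmfs on a few grid points. The pmf of Viq = Vi − Vq is computed by combining every pair of independent outcomes, which is the convolution of pmf(Vi) with pmf(−Vq), and P(‖Viq‖ ≤ Rd) is read off directly.

```python
from collections import defaultdict
from math import hypot


def difference_pmf(pmf_i, pmf_q):
    """pmf of V_iq = V_i - V_q for independent 2D locations, i.e. the
    convolution of pmf(V_i) with pmf(-V_q)."""
    out = defaultdict(float)
    for (xi, yi), p_i in pmf_i.items():
        for (xq, yq), p_q in pmf_q.items():
            out[(xi - xq, yi - yq)] += p_i * p_q
    return dict(out)


def centroid(pmf):
    """Expected location (mean) of a 2D pmf."""
    return (sum(x * p for (x, _), p in pmf.items()),
            sum(y * p for (_, y), p in pmf.items()))


def prob_within(pmf, radius):
    """P(||V|| <= radius); for V = V_iq this is P(||V_i - V_q|| <= R_d)."""
    return sum(p for (x, y), p in pmf.items() if hypot(x, y) <= radius)


# Hypothetical discretized location pmfs.
V_i = {(4, 0): 0.25, (5, 0): 0.25, (4, 1): 0.25, (5, 1): 0.25}
V_q = {(0, 0): 0.5, (1, 1): 0.5}
V_iq = difference_pmf(V_i, V_q)
print(centroid(V_i), centroid(V_q), centroid(V_iq))  # (4.5, 0.5) (0.5, 0.5) (4.0, 0.0)
print(prob_within(V_iq, 4.0))                         # 0.5
```

As Property 1 states, the centroid of the difference equals the difference of the centroids.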
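The continuous aspect of the probabilistic NN queries
Let $V^x_{iq} = v^x_i - v^x_q$ and $V^y_{iq} = v^y_i - v^y_q$ denote the components of the relative velocity of the object whose expected location moves along TRiq.
Then the distance of the expected location along TRiq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = (V^x_{iq})^2 + (V^y_{iq})^2, \quad B = 2\,(x_{iq} V^x_{iq} + y_{iq} V^y_{iq}), \quad C = x_{iq}^2 + y_{iq}^2$$
where $(x_{iq}, y_{iq})$ is the relative expected location at t = 0; this is a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).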
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions $S_{df} = \{d_{1q}(t), d_{2q}(t), \ldots, d_{Nq}(t)\}$ (NOTE: excluding $d_{qq}(t) \equiv 0$).
Observation: two distance functions $d_{iq}(t)$ and $d_{jq}(t)$ can intersect at most twice throughout the time interval of interest [tb, te] (NOTE: setting $d_{iq}(t) = d_{jq}(t)$ and squaring both sides yields an equation of degree at most two in t). Call such pairwise intersections critical time-points.
Example: [Figure: lower envelope LE12 of the distance functions of TR1 and TR2 over [tb, …], with critical time-points t11 and t12; time on the horizontal axis]
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: LE12 (TR1, TR2) and LE34 (TR3, TR4) merged into LE1234, creating the new critical time-points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
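A sketch of the divide-and-conquer construction. It works on squared distances A t² + B t + C, because the square root is monotone and does not change which function is lowest. An envelope is a list of (start, end, index) pieces. Merging two envelopes splits time at all breakpoints, adds the at most two crossings of the two active functions per elementary interval, and keeps the lower one. The helper names are hypothetical, and the `next(...)` lookups make each merge quadratic in the envelope size; a two-pointer scan would be needed to reach the O(N log N) bound.

```python
from math import sqrt


def sq_dist_coeffs(p0, v):
    """(A, B, C) of |p0 + v t|^2 for relative start position p0 and velocity v."""
    return (v[0] ** 2 + v[1] ** 2,
            2 * (p0[0] * v[0] + p0[1] * v[1]),
            p0[0] ** 2 + p0[1] ** 2)


def value(f, t):
    A, B, C = f
    return A * t * t + B * t + C


def crossings(f, g, lo, hi, eps=1e-12):
    """Times strictly inside (lo, hi) where f(t) == g(t); at most two."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = sqrt(disc)
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted(t for t in roots if lo + eps < t < hi - eps)


def merge(e1, e2, funcs):
    """Merge two envelopes that cover the same time interval."""
    cuts = sorted({x for a, b, _ in e1 + e2 for x in (a, b)})
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = 0.5 * (lo + hi)
        i = next(k for a, b, k in e1 if a <= mid <= b)
        j = next(k for a, b, k in e2 if a <= mid <= b)
        pts = [lo] + crossings(funcs[i], funcs[j], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            m = 0.5 * (a + b)
            best = i if value(funcs[i], m) <= value(funcs[j], m) else j
            if out and out[-1][2] == best:
                out[-1] = (out[-1][0], b, best)      # extend the previous piece
            else:
                out.append((a, b, best))
    return out


def lower_envelope(funcs, idxs, tb, te):
    """Divide and conquer over the function indices, in the spirit of Merge-Sort."""
    if len(idxs) == 1:
        return [(tb, te, idxs[0])]
    half = len(idxs) // 2
    return merge(lower_envelope(funcs, idxs[:half], tb, te),
                 lower_envelope(funcs, idxs[half:], tb, te), funcs)


# Hypothetical relative motions: a static object, an approaching one, a static farther one.
funcs = [sq_dist_coeffs((1.0, 0.0), (0.0, 0.0)),
         sq_dist_coeffs((3.0, 0.0), (-1.0, 0.0)),
         sq_dist_coeffs((1.5, 0.0), (0.0, 0.0))]
env = lower_envelope(funcs, [0, 1, 2], 0.0, 4.0)
print(env)   # [(0.0, 2.0, 0), (2.0, 4.0, 1)]: a critical time-point at t = 2
```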
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially). The bound is 4r because the actual distance of each object from Trq can deviate from its expected distance by up to 2r (uncertainty radius r for the object plus r for the query).
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR1 to TR7 over [tb, te] with the lower envelope, the 4r band, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree), and time-points t1, t2, t3]
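Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.,
$$PP_i(a) = \{ P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mincost}(P) \le t_{i+1} - t_i \}$$
Example: if the pdf of selecting a possible path is uniform, each $P_j \in PP_i(a)$ is selected with probability $1 / |PP_i(a)|$.
Given a path $P_j \in PP_i(a)$, the possible locations of a given moving object a with respect to Pj at $t \in [t_i, t_{i+1}]$ is the set of all the positions $p \in P_j$ from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t), i.e.,
$$PL_{P_j}(t) = \{ p \in P_j \mid \mathrm{mincost}(p, p_i) \le t - t_i \ \wedge\ \mathrm{mincost}(p, p_{i+1}) \le t_{i+1} - t \}$$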
Combine the two probabilities…
[Figure: possible paths of object a and its possible locations at t = 4; point m]
Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t, e.g., with a uniform pdf:
$$\Pr(a \in S, t) = \sum_{P \in PP(t)} \Pr(P) \cdot \Pr\big(a \in PL_P(t) \cap S\big)$$
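A sketch of the combination formula along a single path. As a simplifying assumption, the minimum time cost is modelled with a constant maximum speed v_max, positions are offsets along the path (pi at 0, pi+1 at the path length), and the pdf is uniform over the possible-locations interval. The helper names and the two-path example are hypothetical; the example mirrors the 0.5 · 0.5 structure above.

```python
def possible_interval(length, t_i, t_next, t, v_max):
    """Possible locations [lo, hi] at time t along one path, or None if empty.

    A position x is reachable from p_i within t - t_i iff x <= v_max * (t - t_i),
    and p_next is reachable from x within t_next - t iff
    length - x <= v_max * (t_next - t).
    """
    lo = max(0.0, length - v_max * (t_next - t))
    hi = min(length, v_max * (t - t_i))
    return (lo, hi) if lo <= hi else None


def prob_in_segment(paths, t_i, t_next, t, v_max):
    """Pr(a in S at t) = sum over paths of Pr(path) * Pr(position in S | path).

    paths: list of (path_probability, path_length, segment), where segment is
    the (start, end) offset of S on that path, or None if S is not on it.
    """
    total = 0.0
    for p_path, length, seg in paths:
        interval = possible_interval(length, t_i, t_next, t, v_max)
        if seg is None or interval is None:
            continue
        lo, hi = interval
        overlap = max(0.0, min(hi, seg[1]) - max(lo, seg[0]))
        width = hi - lo
        frac = overlap / width if width > 0 else float(seg[0] <= lo <= seg[1])
        total += p_path * frac
    return total


# Two equally likely paths; the segment S lies on the first one only.
paths = [(0.5, 10.0, (0.0, 4.0)), (0.5, 12.0, None)]
print(prob_in_segment(paths, t_i=0.0, t_next=10.0, t=4.0, v_max=2.0))  # 0.25
```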
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – the collection of all the possible paths (and their probabilities)
  – a probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
Processing
› Data structures
  – Edge hash table: small, in memory
  – Movement R-tree: 1D, one for each edge
  – Trajectories list
    › Store samples ordered by time stamp
    › Assign a pointer to the set of possible paths
    › Earliest-arriving and latest-departure times stored for vertices
    › On disk, retrieved on demand
› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search for leaf entries whose maximum time intervals intersect with tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" from q along road-network segments (vertices A to E), annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
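The network-expansion step can be sketched as a Dijkstra search that stops at network distance r. The sketch assumes q is a vertex (a query point inside an edge would first need its two half-edges added) and that an edge belongs to the expansion tree when its near endpoint is closer than r, so edges that only partially lie within r are kept. The graph and the function name are hypothetical.

```python
import heapq


def expansion_tree(graph, q, r):
    """Bounded Dijkstra from q.

    graph: {vertex: [(neighbor, edge_length), ...]} with undirected edges listed
    in both directions. Returns the distances of the vertices closer than r and
    the set of edges (as sorted vertex pairs) leaving those vertices.
    """
    best = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, float("inf")):
            continue                                  # stale heap entry
        for v, w in graph.get(u, []):
            edges.add((min(u, v), max(u, v)))          # edge starts within distance r
            nd = d + w
            if nd < r and nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return best, edges


graph = {"A": [("B", 2.0), ("E", 10.0)], "B": [("A", 2.0), ("C", 2.0)],
         "C": [("B", 2.0), ("D", 5.0)], "D": [("C", 5.0)], "E": [("A", 10.0)]}
best, edges = expansion_tree(graph, "A", r=5.0)
print(best)            # {'A': 0.0, 'B': 2.0, 'C': 4.0}
print(sorted(edges))   # [('A', 'B'), ('A', 'E'), ('B', 'C'), ('C', 'D')]
```

The edges found this way are the ones whose movement R-trees would be fetched in the filtering step.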
› Temporal-continuous query
  – Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
  – It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs
Uncertainty: the flip-side
› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
J. Hershberger and J. Snoeyink. Speeding up the Douglas-Peucker line-simplification algorithm. SSHD 1997.
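A sketch of the classic recursive Douglas-Peucker simplification, which is the algorithm the cited paper speeds up (this plain version is not the sped-up one). It keeps the endpoints and recursively keeps the point farthest from the current chord whenever that point deviates by more than the acceptable error eps. It treats points purely spatially. Applying it to trajectories would also require accounting for time, which this sketch does not do.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, eps):
    """Simplify a polyline so that no dropped point deviates more than eps."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [a, b]
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right          # the split point appears only once


print(douglas_peucker([(0, 0), (1, 0.01), (2, 0), (3, 0)], 0.1))   # [(0, 0), (3, 0)]
```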
Uncertainty: the other flip-side
› What are the important properties that must not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just a "Z" axis… one cannot travel back in time
  – How long may the data at the sink remain "un-fresh"?
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
So far…
Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
Merge the approaches
› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9):1112-1127, 2004.
J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009, pp. 435-443.
A highway example…
[Figure: position (pos) of a car on a highway over time t, bounded by the speeds vmin and vmax, and a query region Q]
What is the probability that the car is in some area at least once during some time interval?
Assuming a uniform distribution…, the car is in Q at the two query time instants with probabilities 50% and 30%.
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
→ Violation of the speed constraints
Consideration of the dependency: = 0.5
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
J. L. Doob. Stochastic Processes. Wiley, 1953.
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities (0.4 to either side, 0.2 to stay)
› At the border of the wooden board, these probabilities are different (0.6 inwards, 0.4 to stay)
› This model is usually learned or given by experts
› Initial position: (0, 0, 1, 0, 0, 0, 0, 0, 0)
› After the first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After the second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After the 40th hit: ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
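How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$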
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{2} = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
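The board example can be reproduced with repeated vector-matrix products. `board_matrix` is a hypothetical helper that rebuilds M from the move probabilities (0.4 left, 0.2 stay, 0.4 right in the interior; 0.4 stay and 0.6 inwards at the borders). The vector (0.08, 0.12, …, 0.12, 0.08) is the stationary distribution π with π = πM; the 40-hit vector approaches it, and further hits bring it arbitrarily close.

```python
def vec_mat(v, A):
    """Multiply a row vector v by the matrix A."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


def board_matrix(n=9, left=0.4, stay=0.2, right=0.4, border_stay=0.4):
    """Transition matrix of the board: interior holes move left/stay/right,
    border holes stay with border_stay and otherwise move inwards."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = left, stay, right
    M[0][0], M[0][1] = border_stay, 1.0 - border_stay
    M[n - 1][n - 1], M[n - 1][n - 2] = border_stay, 1.0 - border_stay
    return M


M = board_matrix()
v = [0.0] * 9
v[2] = 1.0                                  # initial position (0, 0, 1, 0, 0, 0, 0, 0, 0)
history = [v]
for _ in range(40):
    history.append(vec_mat(history[-1], M))

print([round(x, 2) for x in history[1]])    # first hit: 0.4, 0.2, 0.4 around the start
print([round(x, 2) for x in history[2]])    # second hit: 0.16, 0.16, 0.36, 0.16, 0.16
print([round(x, 3) for x in history[40]])   # after 40 hits: close to the stationary vector
```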
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
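Multiple Observations
State vectors over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$; the "-" classes hold the worlds that have not intersected the window ($\neg\blacksquare$), the "+" classes those that have ($\blacksquare$):
t=0: (1, 0, 0, 0, 0, 0)
t=1: (0, 0, 1, 0, 0, 0)
t=2: (0, 0, 0.2, 0, 0.8, 0)
t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix} \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$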
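› Now, what is the probability that the trajectory passes the query window, given that the object was observed in s3 at t = 3? Let $\blacksquare$ denote the event that the trajectory intersects the window and $O$ the observation.
Bayes' Theorem
$$P(\blacksquare \mid O) = \frac{P(O \mid \blacksquare) \cdot P(\blacksquare)}{P(O)} = \frac{P(\blacksquare \wedge O)}{P(O)} = \frac{P(O \wedge \blacksquare)}{P(O \wedge \blacksquare) + P(O \wedge \neg\blacksquare)}$$
With the state vector at t = 3, (0, 0, 0.04, 0.48, 0.16, 0.32), $P(O \wedge \blacksquare) = 0.32$ (class $S_3^+$) and $P(O \wedge \neg\blacksquare) = 0.04$ (class $S_3^-$), so
$$P(\blacksquare \mid O) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$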
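The same computation can be sketched for the six classes $S_i^-$, $S_i^+$, followed by the Bayes step. The sketch derives $M^-$ and $M^+$ from M rather than typing them in; the window {s1, s2}, T = [2, 3], and the observation "o is in s3 at t = 3" are taken from the example, and all names are illustrative.

```python
def step(v, m):
    """Row vector v times matrix m."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
WINDOW = {0, 1}          # s1 and s2 are inside the query window
n = len(M)

def build(during_window):
    """6x6 matrix over (S1-, S2-, S3-, S1+, S2+, S3+)."""
    big = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            # A "-" world becomes "+" when it enters the window during T.
            hit = during_window and j in WINDOW
            big[i][j + n if hit else j] += p
            # A "+" world has already intersected the window and stays "+".
            big[i + n][j + n] += p
    return big

M_MINUS, M_PLUS = build(False), build(True)   # M_MINUS is block-diag(M, M)

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]            # o observed in s1 at t = 0
for t in range(1, 4):
    v = step(v, M_PLUS if t >= 2 else M_MINUS)  # T = [2, 3]

s3 = 2                                          # second observation: o in s3 at t = 3
p_hit_and_obs = v[s3 + n]
p_obs = v[s3] + v[s3 + n]
print([round(x, 2) for x in v], round(p_hit_and_obs / p_obs, 2))
```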
Summary
› Pros
- Allows answering time-parameterized queries according to possible worlds semantics
- Considers location dependencies over time
- Scales up very well, since it is based purely on sparse matrix multiplications
- Natively extendable to uncertain observations
- Seems to work adequately on real-world data (more validation needed)
› Cons
- Discrete time and space
- Mapping time to ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: R-tree over the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
- Draw a sufficiently large number of samples
- Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform to the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
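One way to make sampled trajectories conform to a later observation is to reweight M with backward probabilities β_t(i) = P(observation | state i at tick t), giving per-tick matrices M_t(i, j) = M(i, j) · β_{t+1}(j) / β_t(i). This is our assumption of what an "adaptation of transition matrices" can look like (forward sampling conditioned via backward weights), not necessarily the exact construction of the cited paper. The sketch conditions on o being observed in s3 at t = 3 and estimates the window probability of the running example.

```python
import random

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
n = len(M)

def backward_weights(m, target, horizon):
    """beta[t][i] = P(object is in `target` at `horizon` | in state i at tick t)."""
    beta = [[0.0] * n for _ in range(horizon + 1)]
    beta[horizon][target] = 1.0
    for t in range(horizon - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(m[i][j] * beta[t + 1][j] for j in range(n))
    return beta

def adapted_matrices(m, beta):
    """Per-tick transition matrices conditioned on the final observation."""
    mats = []
    for t in range(len(beta) - 1):
        mt = [[0.0] * n for _ in range(n)]
        for i in range(n):
            if beta[t][i] > 0:   # states with beta = 0 are never reached
                for j in range(n):
                    mt[i][j] = m[i][j] * beta[t + 1][j] / beta[t][i]
        mats.append(mt)
    return mats

def sample_path(start, mats, rng):
    """Draw one trajectory; each step uses that tick's adapted matrix."""
    path = [start]
    for mt in mats:
        path.append(rng.choices(range(n), weights=mt[path[-1]])[0])
    return path

beta = backward_weights(M, target=2, horizon=3)      # observed in s3 at t = 3
mats = adapted_matrices(M, beta)
rng = random.Random(42)
samples = [sample_path(0, mats, rng) for _ in range(100_000)]
hits = sum(any(p[t] in (0, 1) for t in (2, 3)) for p in samples)   # s1 or s2 in T = [2, 3]
print(hits / len(samples))
```

Every sampled path ends in s3, so no sample has to be rejected; the estimate should be close to the exact value 0.32 / 0.36 ≈ 0.89.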
Event Queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
Summary
› Models for spatial uncertainty
› Concepts of efficiently managing uncertain data
- Cleaning
- Approximation
- Paradigm of equivalent worlds
› Geometric models can be used
- when no probabilistic model is present
- to find possible answers (no confidence)
› Probabilistic management of UST data
- based on stochastic processes
- allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
› Project Page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443.
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
Answering Queries using PWS
Let
• $DB$ be an uncertain database having possible worlds,
• $\mathcal{W}$ be the set of possible worlds of $DB$,
• $q$ be a query predicate,
• $I_q(w)$ be an indicator function returning one if predicate $q$ holds in world $w \in \mathcal{W}$ and zero otherwise.
The probability that a query predicate holds on an uncertain database is defined as
$$P(q, DB) = \sum_{w \in \mathcal{W}} P(w) \cdot I_q(w)$$
Possible World Semantics
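Possible world semantics can be applied literally by enumerating all worlds. A sketch for the count query used in this section, assuming independent objects that are each inside the query region with a given probability; the values 0.8, 0.2 and 0.4 annotate the example objects in this section, and reading them as "inside" probabilities is an assumption.

```python
from itertools import product

def count_distribution_pws(p_inside):
    """Enumerate all 2^n possible worlds of n independent objects.

    p_inside[i] is the probability that object i lies in the query region.
    Returns a list whose k-th entry is P(exactly k objects are inside).
    """
    n = len(p_inside)
    dist = [0.0] * (n + 1)
    for world in product([False, True], repeat=n):   # one tuple per possible world
        prob = 1.0
        for inside, p in zip(world, p_inside):
            prob *= p if inside else 1.0 - p
        dist[sum(world)] += prob   # I_q(w) = 1 for the predicate "count = sum(world)"
    return dist

print([round(x, 3) for x in count_distribution_pws([0.8, 0.2, 0.4])])
```

The loop visits 2^n worlds, which is exactly the exponential cost of naive query processing.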
Possible Worlds Example II
[Figure: uncertain objects A to Z]
Too many possible worlds
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects
Are there efficient solutions to query uncertain spatial data?
In general: No
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
This result can be reduced to uncertain spatial databases
But: specific queries may have polynomial-time solutions
[DalviSuciu04] Dalvi, N. N. and Suciu, D. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB) (2004)
Querying Uncertain Data: Running Example
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds
Data Cleaning: Aggregation
[Figure: query region around q with objects C, D, H, I]
Ignore Uncertainty (Data Cleaning)
Replace uncertain objects by a deterministic "best guess"
- Expected positions
- Most-likely positions
- …
Query results are not reliable
Query results may be biased
Equivalent Worlds: An intuition
Observation 1
For any possible world and any possible world derived from it by changing the position of an object outside the query region to another position outside it, the following equivalence holds: both worlds yield the same query result
Observation 1 allows us to discard objects outside of the query region
Remaining number of equivalence classes of possible worlds
Observation 2
For each remaining object we only need to consider the predicate "inside"
[Figure: query region around q with the remaining objects C, D, H, I]
Remaining number of equivalence classes of possible worlds
Observation 3
We only require the number of objects in the query region. Information about the concrete result objects can be discarded
Generating Functions
[Figure: query point q and objects A, B, C, annotated with probabilities 0.8, 0.2, 0.4]
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects, i.e., substitute A, B, C by x
Each monomial $c_k x^k$ implies that the probability of having exactly $k$ results equals $c_k$
Generating Functions: Formally
For each object $o_i$, let $p_i$ denote the probability that $o_i$ is located inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_{i} \big((1 - p_i) + p_i \, x\big)$$
In the expanded polynomial, the coefficient of monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.
[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
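Expanding the generating function is a sequence of polynomial multiplications, one linear factor per object, so the whole count distribution costs O(n²) instead of O(2^n). A sketch with the hypothetical inside-probabilities 0.8, 0.2, 0.4 (objects assumed independent; the function name is illustrative):

```python
def count_distribution_gf(p_inside):
    """Expand F(x) = prod_i ((1 - p_i) + p_i * x); coefficient k = P(count = k)."""
    coeffs = [1.0]                       # the constant polynomial 1
    for p in p_inside:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)      # object outside: degree unchanged
            nxt[k + 1] += c * p          # object inside: multiply by x
        coeffs = nxt
    return coeffs

print([round(c, 3) for c in count_distribution_gf([0.8, 0.2, 0.4])])
```

The printed coefficients are those of 0.064x³ + 0.368x² + 0.472x + 0.096, listed from x⁰ upwards.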
Count Queries on Uncertain Data
Example
[Figure: query point q and objects A, B, C, annotated with probabilities 0.8, 0.2, 0.4]
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and unbiased
Evaluate the query predicate on each world
The distribution of sampled results is an unbiased approximation of the true distribution of results
Sampling Example
[Figure: query point q and objects A, B, C, annotated with probabilities 0.8, 0.2, 0.4]
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimations
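Sampling: Confidences
[Figure: query point q and objects A, B, C, annotated with probabilities 0.8, 0.2, 0.4]
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of the estimators, e.g., the Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}$$
where $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$ percentile of the standard normal distribution and n is the number of samples
At a significance level of $\alpha = 0.05$, the interval [0.442, 0.638] contains the true probability with 95% confidence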
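A sketch of the Wald interval together with Monte-Carlo sampling for the count query. With n = 100 samples, an estimate p̂ = 0.54 reproduces the interval [0.442, 0.638] quoted above; that value of p̂ is inferred from the interval and is an assumption. `statistics.NormalDist` requires Python 3.8+, and the function names are illustrative.

```python
import random
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Wald interval p_hat +/- z_{1-alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half

def sample_count_probability(p_inside, k, n_worlds, rng):
    """Estimate P(exactly k objects inside) from n_worlds sampled possible worlds."""
    hits = 0
    for _ in range(n_worlds):
        count = sum(rng.random() < p for p in p_inside)   # one sampled world
        hits += count == k
    return hits / n_worlds

lo, hi = wald_interval(0.54, 100)
print(round(lo, 3), round(hi, 3))          # 0.442 0.638

p_hat = sample_count_probability([0.8, 0.2, 0.4], k=1, n_worlds=100, rng=random.Random(7))
print(p_hat, wald_interval(p_hat, 100))
```

The interval only reflects sampling noise; it presumes independent, unbiased samples as required on the previous slide.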
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data Cleaning: "best guess" answers; unreliable results; biased results
Paradigm of Equivalent Worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
• Models
  • Motion
  • Location Uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing
• Uncertainty: the flip-side
Model of a trajectory
[Figure: 3d-TRAJECTORY in (X, Y, Time) up to the present time, and its 2d-ROUTE projection]
Approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman. Kinetic space-time prisms. GIS 2011
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
(location, time, velocity vector) updates
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze. On a Generic Uncertainty Model for Position Information. QuaCon 2009
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in-between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov. Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li. Uncertain Range Queries for Necklaces. MDM 2010
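Under the bead model, a location is possible at time t if it is reachable from the previous sample and the next sample is still reachable from it at the bounded maximum speed. A sketch of that membership test in 2D with Euclidean distances (function and parameter names are hypothetical; `math.dist` needs Python 3.8+):

```python
from math import dist

def in_bead(p, t, sample1, sample2, v_max):
    """True if location p at time t lies in the bead between two exact samples."""
    (p1, t1), (p2, t2) = sample1, sample2
    if not t1 <= t <= t2:
        return False
    # Reachable from the first sample, and the second sample still reachable.
    return dist(p, p1) <= v_max * (t - t1) and dist(p2, p) <= v_max * (t2 - t)

a = ((0.0, 0.0), 0.0)   # (location, time) of the first sample
b = ((4.0, 0.0), 2.0)   # (location, time) of the second sample
print(in_bead((2.0, 1.0), 1.0, a, b, v_max=3.0))   # True
print(in_bead((2.0, 3.0), 1.0, a, b, v_max=3.0))   # False
```

At every time instant the possible locations form the intersection of two disks, which is why the bead looks like two cones glued at their bases.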
Modeling Uncertain Trajectories: Model 2
Constraints: motion along the road network
Uncertainty restricted to road segments
[Figure: samples p1 and p2 at times t1 and t2 on a road network, and location q at time t]
Bart Kuijpers, Walied Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
Modeling Uncertain Trajectories: Model 3: Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain. Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g., possible routes to be taken
Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
Continuous NN for trajectories
Given a collection of moving objects' trajectories $Tr_1, Tr_2, \ldots, Tr_N$:
$Q_{nn}(q)$: Retrieve the nearest neighbor of the moving object whose trajectory is $Tr_q$ between $[t_b, t_e]$
$A_{nn}(q) = [(Tr_{i_1}, [t_b, t_1]), (Tr_{i_2}, [t_1, t_2]), \ldots, (Tr_{i_m}, [t_{m-1}, t_e])]$
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) over time, with T = tb, t1, te marked]
Example: assuming that $Tr_q$ is the querying trajectory between $[t_b, t_e]$, then
- $Tr_1$ (black) is the NN throughout $[t_b, t_1]$
- $Tr_2$ (red) is the NN throughout $[t_1, t_e]$
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen. Continuous Nearest Neighbor Search. VLDB 2002
Impact of Uncertainty
Given $Tr_1, Tr_2, \ldots, Tr_N$, where each $Tr_i$ has some location uncertainty associated with its location at each time instant, consider the variant
$UQ_{nn}(q)$: Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is $Tr_q$ between $[t_b, t_e]$
[Figure: uncertain trajectories Trq, Tr1, Tr2, Tr3 over time, with T = tb, tb1, t11, t1, te marked]
Observations: adding uncertainty to the previous example implies
- $Tr_1$ (blue) is the only trajectory with a non-zero NN-probability up to $t_{b1}$
- Starting at $t_{b1}$, $Tr_3$ (green) also has a non-zero NN-probability
- Specifically, at $t_{11}$ all three trajectories have a non-zero NN-probability
⇒ What should the structure of the answer to $UQ_{nn}(q)$ look like?
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level 1: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), ..., (Trim, t(m-1), te, Dm). Deeper levels refine each interval, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]
Root: parameters of the query
- Querying trajectory $Tr_q$
- Time interval of interest $[t_b, t_e]$
Children: if the parent is excluded, the trajectories that have the highest probability of being NN to $Tr_q$ within a sub-interval of the interval bounded by the parent
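Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q with the uncertainty disks of Tr1 to Tr5, and the radii Rmin, Rmax and Rd]
$Tr_q$ = 2D point Q. The rest are possible locations bounded by circles; $R_{max}$ is the smallest of the objects' maximum distances from Q
Observation:
- Any object whose minimum distance from Q is greater than $R_{max}$ has a "0" probability of being a NN to Q
- Example: $Tr_4$ cannot have a non-zero probability of being Q's NN
The probability that the location of $Tr_i$ is within distance $R_d$ from Q is given by
$$P_i^{WD}(Q, R_d) = \iint_{C(Q, R_d)} pdf_i(x, y)\,dx\,dy,$$
where $C(Q, R_d)$ is the disk of radius $R_d$ centered at Q.
The probability that a given object $Tr_j$ is the NN of Q is given by
$$P_j^{NN}(Q) = \int_0^{R_{max}} \frac{d P_j^{WD}(Q, R_d)}{d R_d} \prod_{k \neq j} \left(1 - P_k^{WD}(Q, R_d)\right) dR_d,$$
i.e., $Tr_j$ is the NN if it is at distance $R_d$ AND all the rest are at distances > $R_d$, integrating over all possible $R_d$ (NOTE: the upper bound is $R_{max}$)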
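Instead of evaluating the integral, the NN probabilities can also be approximated by Monte-Carlo sampling; that is a swapped-in technique, not the slide's method. The sketch assumes uniform pdfs over disks (hypothetical centers and radii) and applies the $R_{max}$ pruning rule before sampling; names are illustrative and `math.dist` needs Python 3.8+.

```python
import math
import random

def sample_in_disk(center, r, rng):
    """Uniform sample from a disk (the square root keeps the density uniform)."""
    rho = r * math.sqrt(rng.random())
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi)

def nn_probabilities(q, disks, n_samples=20_000, seed=0):
    """Monte-Carlo estimate of P(object i is the NN of the crisp query point q).

    disks: list of (center, radius); each object is uniform over its disk.
    """
    # R_max: the smallest maximum distance from q over all objects.
    r_max = min(math.dist(q, c) + r for c, r in disks)
    # Objects whose minimum distance exceeds R_max can never be the NN.
    candidates = [i for i, (c, r) in enumerate(disks)
                  if max(0.0, math.dist(q, c) - r) <= r_max]
    rng = random.Random(seed)
    wins = [0] * len(disks)
    for _ in range(n_samples):
        # One sampled possible world, restricted to the candidates.
        best = min(candidates,
                   key=lambda i: math.dist(q, sample_in_disk(*disks[i], rng)))
        wins[best] += 1
    return [w / n_samples for w in wins], candidates

DISKS = [((1.0, 0.0), 0.5), ((0.0, 2.0), 1.0), ((-2.5, 0.0), 0.5), ((5.0, 5.0), 1.0)]
probs, cands = nn_probabilities((0.0, 0.0), DISKS)
print(cands, [round(p, 3) for p in probs])
```

Here $R_{max}$ = 1.5 (object 0), so objects 2 and 3, whose minimum distances are 2.0 and about 6.07, are pruned and get probability 0.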
When $Tr_q$ (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable. Example:
[Figure: uncertain query location TrQ and uncertainty disks of Tr1 to Tr5, with radii Rmin and Rmax and two points Z1 and Z2]
dist(Z1, Z2) < Rmax
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say $Tr_j$, being within distance $R_d$ from $Tr_q$
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)
[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) in the (x, y) plane]
Example (assuming a uniform pdf of possible locations): the figure marks two of the points to be used when evaluating the actual probability of $Tr_1$ being within distance $R_d$ from $Tr_q$
Main Observation: let $V_i$ and $V_q$ denote the 2D random variables representing the possible locations of $Tr_i$ and $Tr_q$. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$
Define $V_{iq} = V_i - V_q$ (cross-correlation) as another random variable, $V_{iq} = V_i + (-V_q)$, which, due to independence, implies that the pdf of $V_{iq}$ is a convolution of the pdfs of $V_i$ and $-V_q$
[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) in the (x, y) plane]
Let $V_{iq} = V_i + (-V_q)$
Property 1: IF pdf($V_i$) has a centroid $C_i$ coinciding with E($V_i$) AND pdf($V_q$) has a centroid $C_q$ coinciding with E($V_q$), THEN their convolution pdf($V_{iq}$) has a centroid $C_{iq} = C_i + (-C_q)$ coinciding with E($V_{iq}$)
Property 2: Assume that pdf($V_i$) and pdf($V_q$) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf($V_{iq}$) is also rotationally symmetric around its centroid $C_{iq}$
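The continuous aspect of the probabilistic NN queries
Let $V^x_{iq} = v^x_i - v^x_q$ and $V^y_{iq} = v^y_i - v^y_q$ denote the corresponding components of the velocity of the object whose expected location is along $TR_{iq}$
Then the distance of the expected location along $TR_{iq}$ from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C},$$
with $A = (V^x_{iq})^2 + (V^y_{iq})^2$, $B = 2(x_{iq} V^x_{iq} + y_{iq} V^y_{iq})$ and $C = x_{iq}^2 + y_{iq}^2$, where $(x_{iq}, y_{iq})$ is the expected relative location at t = 0
(a hyperbola, since A > 0)
Now we have a collection of such distance functions (one for each i)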
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions $S_{df} = \{d_{1q}(t), d_{2q}(t), \ldots, d_{Nq}(t)\}$ (NOTE: excluding $d_{qq}(t) \equiv 0$)
Observation: two distance functions $d_{iq}(t)$ and $d_{jq}(t)$ can intersect at most twice throughout the time interval of interest for the query, $[t_b, t_e]$ (NOTE: set $d_{iq}(t) = d_{jq}(t)$; squaring both sides yields a quadratic equation in t). Call such pairwise intersections critical time-points
[Figure: example lower envelope LE12 of two distance trajectories TR1 and TR2, starting at tb, with critical time-points t11 and t12; time on the horizontal axis, dist on the vertical axis]
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: the lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and LE34 of TR3 and TR4 (critical time-point t31) over [tb, te] are merged into LE1234, which introduces the new critical time-points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
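A sketch of critical time-points and of a lower envelope for distance functions $d(t) = \sqrt{A t^2 + B t + C}$. For brevity it uses a brute-force construction (all pairwise crossings, then one minimum per elementary interval, roughly O(N² log N)) rather than the O(N log N) divide-and-conquer merge above; comparing $d^2$ suffices because distances are non-negative. The three objects and all names are hypothetical.

```python
import math
from itertools import combinations

def coeffs(x, y, vx, vy):
    """d(t)^2 = A t^2 + B t + C for relative position (x, y) and velocity (vx, vy)."""
    return vx * vx + vy * vy, 2.0 * (x * vx + y * vy), x * x + y * y

def critical_times(f, g, tb, te):
    """Times in (tb, te) where two distance functions cross (at most two)."""
    a, b, c = (fi - gi for fi, gi in zip(f, g))
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted(t for t in roots if tb < t < te)

def lower_envelope(funcs, tb, te):
    """Brute force: [(index, start, end)] of the closest object per sub-interval."""
    cuts = {tb, te}
    for f, g in combinations(funcs, 2):
        cuts.update(critical_times(f, g, tb, te))
    cuts = sorted(cuts)
    env = []
    for s, e in zip(cuts, cuts[1:]):
        m = (s + e) / 2   # no crossing inside (s, e), so the midpoint decides
        best = min(range(len(funcs)),
                   key=lambda i: funcs[i][0] * m * m + funcs[i][1] * m + funcs[i][2])
        if env and env[-1][0] == best:
            env[-1] = (best, env[-1][1], e)   # merge with the previous piece
        else:
            env.append((best, s, e))
    return env

funcs = [coeffs(1.0, 0.0, 1.0, 0.0),    # moving away from the query
         coeffs(3.0, 0.0, -1.0, 0.0),   # approaching the query
         coeffs(1.5, 0.0, 0.0, 0.0)]    # constant distance 1.5
env = lower_envelope(funcs, 0.0, 4.0)
print(env)   # [(0, 0.0, 0.5), (2, 0.5, 1.5), (1, 1.5, 4.0)]
```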
Now, how about the IPAC-NN tree?
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being NN to $Tr_q$ (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te] with time-points t1, t2, t3; the lower envelope, the 4r band above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
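Given two trajectory samples $(t_i, p_i)$ and $(t_{i+1}, p_{i+1})$ of a moving object a on a road network, the set of possible paths $PP_i(a)$ between $t_i$ and $t_{i+1}$ consists of all the paths along the routes (sequences of edges) that connect $p_i$ and $p_{i+1}$ and whose minimum time costs are not greater than $t_{i+1} - t_i$, i.e.,
$$PP_i(a) = \{P_j \mid P_j \text{ connects } p_i \text{ and } p_{i+1} \wedge \mathrm{mintime}(P_j) \le t_{i+1} - t_i\}$$
Example: if the pdf of selecting a possible path is uniform, $\Pr(P_j) = 1 / |PP_i(a)|$
Given a path $P_j \in PP_i(a)$, the possible locations of a given moving object a with respect to $P_j$ at $t \in [t_i, t_{i+1}]$ is the set of all the positions $p \in P_j$ from which a can reach $p_i$ (respectively $p_{i+1}$) within the time period $t - t_i$ (respectively $t_{i+1} - t$), i.e.,
$$PL_j(t) = \{p \in P_j \mid \mathrm{mintime}(p_i, p) \le t - t_i \wedge \mathrm{mintime}(p, p_{i+1}) \le t_{i+1} - t\}$$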
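Combine the two probabilities…
[Figure: possible paths of object a and its possible locations when t = 4, including the segment between m and p1]
Probability of falling inside segment m p1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\big(a(t) \in S \mid p\big),$$
where the location pdf along $PL_p(t)$ is, e.g., uniform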
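A sketch of the combination step, assuming a uniform location pdf along each path's possible locations $PL_p(t)$, so that $\Pr(a(t) \in S \mid p)$ is the fraction of $PL_p(t)$ covered by S. The path probabilities and lengths below are hypothetical and chosen to mirror the 0.5 · 0.5 example.

```python
def prob_in_segment(paths):
    """Pr(a(t) in S) = sum over possible paths of Pr(path) * Pr(a(t) in S | path).

    Each entry is (path_probability, length of PL_p(t), length of S inside PL_p(t));
    locations on PL_p(t) are assumed to be uniformly distributed.
    """
    total = 0.0
    for p_path, pl_len, overlap in paths:
        if pl_len > 0:
            total += p_path * overlap / pl_len
    return total

# Two equally likely paths; S covers half of the possible locations on the
# first path and none on the second (hypothetical lengths).
print(prob_in_segment([(0.5, 2.0, 1.0), (0.5, 3.0, 0.0)]))   # 0.25
```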
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
- The collection of all the possible paths (and their probabilities)
- The probabilistic location function
Example: observe the two possible sequences of going from position $p_i$ to position $p_{i+1}$. At the time instant t = 4 the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
Processing
› Data structures
- Edge hash-table
  › Small, in-memory
- Movement R-tree
  › 1D for each edge
- Trajectories list
  › Store samples ordered by time-stamp
  › Assign a pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On disk, retrieved on demand
› Network expansion
- Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r
› Filtering
- Fetch the movement R-trees of the edges in the expansion tree
- Search for leaf entries whose maximum time interval intersects $t_q$
- The objects pointed to by those leaf entries are candidates
› Refinement
- The candidate set is considerably smaller than the original trajectory dataset
- Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" along road-network segments A to E around q, annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
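The network-expansion step can be sketched as a Dijkstra search bounded by r. Here an edge is kept when its nearer endpoint is closer than r, which is our reading of "network distance of an edge"; the graph, the adjacency format and the function name are hypothetical. The filtering step would then query the movement R-trees of exactly these edges.

```python
import heapq

def expansion_tree(graph, q, r):
    """Bounded Dijkstra from q.

    graph maps a vertex to a list of (neighbor, edge_length) pairs.
    Returns (dist, edges): network distances of the vertices closer than r,
    and the directed edges leaving those vertices (possibly only partly within r).
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    edges = {(u, v) for u in dist for v, _ in graph.get(u, [])}
    return dist, edges

GRAPH = {"q": [("A", 2.0), ("B", 5.0)], "A": [("q", 2.0), ("C", 4.0)],
         "B": [("q", 5.0), ("D", 1.0)], "C": [("A", 4.0)], "D": [("B", 1.0)]}
dist, edges = expansion_tree(GRAPH, "q", 6.0)
print(dist)            # {'q': 0.0, 'A': 2.0, 'B': 5.0}
print(sorted(edges))
```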
Temporal-continuous query
- Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
- It is easy to prove that the actual QP is monotonic in-between consecutive critical time instants
- The QPs at the critical time instants define an envelope function for the actual QPs
Uncertainty: the flip-side
› Data reduction
- Why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink. "Speeding up the Douglas-Peucker Line-Simplification Algorithm". SSHD 1997
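For reference, a sketch of the classic recursive Douglas-Peucker simplification that the cited work speeds up (this is the plain version with O(n²) worst case, not Hershberger and Snoeyink's faster variant). It treats a trajectory as 2D points and ignores time, which is an assumption; the tolerance `eps` plays the role of the acceptable error, and the sample points are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, eps):
    """Keep the endpoints; recursively keep the farthest point if it exceeds eps."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right          # the split point appears only once

PTS = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
print(douglas_peucker(PTS, 0.5))      # [(0, 0), (2, -0.1), (3, 5), (5, 7)]
```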
Uncertainty: the other flip-side
› What are the important properties that should not be distorted by the reduction?
- Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
- It is not (just) a "Z" axis: one cannot travel back in time
- How long may the data at the sink remain "un-fresh"?
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
So far…
Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
Merge the approaches
› Idea
- Discretize time
- For each time, a pdf (pmf) over the possible positions is given
› Examples
- Nearest neighbour queries on uncertain moving objects
- Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parameterized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443
A highway example…
[Figure: position (pos) of a car over time (t), a query region Q, and the speed bounds vmin and vmax]
What is the probability that the car is in some area at least once during some time interval?
Assuming a uniform distribution… the car is in Q with probabilities of 50% and 30% at two different time points
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
This combines positions that violate the speed constraints
Consideration of the dependency: 0.5
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
- Discrete time + discrete space (e.g., Markov chain)
- Discrete time + continuous space (e.g., Harris chain)
- Continuous time + discrete space (e.g., Markov process)
- Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts
[Figure: inner holes: stay 0.2, move to either neighbour 0.4; border holes: stay 0.4, move inward 0.6]
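A simple example
› Initial position: the ball is in the third hole
› After the first hit: 0.4, 0.2, 0.4 in the second, third and fourth hole
› After the second hit: 0.16, 0.16, 0.36, 0.16, 0.16 in the first five holes
› After the 40th hit: about 0.08 in the two border holes and 0.12 in each of the seven inner holes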
How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
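A sketch that builds the 9×9 board matrix and propagates the ball's distribution hit by hit; the parameter names of `transition_matrix` are hypothetical. Over many hits the distribution approaches the stationary distribution (0.08 in the border holes, 0.12 in the inner holes), which satisfies πM = π.

```python
def transition_matrix(n=9, stay=0.2, move=0.4, border_stay=0.4, border_move=0.6):
    """Board with n holes: inner holes stay/move left/move right, borders move inward."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = border_stay, border_move
        elif i == n - 1:
            m[i][n - 1], m[i][n - 2] = border_stay, border_move
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = move, stay, move
    return m

def step(v, m):
    """Row vector v times matrix m (one hit)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def after_hits(v, m, hits):
    for _ in range(hits):
        v = step(v, m)
    return v

M = transition_matrix()
v0 = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # ball starts in the third hole
for k in (1, 2, 40):
    print(k, [round(p, 2) for p in after_hits(v0, M, k)])
```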
Fusion of Model and Reality
› Discretization of time and space
- E.g., treat intersections as states and add additional states on long streets
- The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
- Transition probabilities from one state to another are learned from historical data (very sparse matrix)
- The transition matrix can change over time and for different object groups
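The slides only say that transition probabilities are learned from historical data. One common way, assumed here, is the maximum-likelihood estimate obtained by counting observed transitions per state and normalizing each row. The sketch stores the matrix sparsely as nested dictionaries; the history and names are hypothetical, and a separate estimate per time period or object group could be fitted the same way.

```python
from collections import Counter, defaultdict

def estimate_transitions(sequences):
    """Maximum-likelihood estimate: P(i -> j) = count(i -> j) / count(i -> *).

    Each sequence lists the discretized state of one object at consecutive ticks.
    Returns a sparse matrix as {from_state: {to_state: probability}}.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}

history = [["s1", "s3", "s2", "s1"], ["s2", "s3", "s3", "s2"]]
est = estimate_transitions(history)
print(est)
```

States never seen as a source get no row at all, which keeps the matrix sparse but means unobserved transitions have probability zero.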
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: transition graph over the states s1, s2, s3]
Note: there is an exponential number of possible paths the car might take
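To see where the exponential number of paths comes from, a brute-force sketch that enumerates every path of the example chain and adds up the probabilities of those that visit s1 or s2 at t = 2 or t = 3 (the function name is hypothetical). It is only feasible for tiny horizons, since it visits |S|^t paths.

```python
from itertools import product

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]

def window_probability_bruteforce(start, m, window_states, t_start, t_end):
    """Sum the probabilities of all paths that visit a window state during [t_start, t_end].

    Enumerates |S|^t_end paths, i.e., exponentially many in the number of ticks.
    """
    n = len(m)
    total = 0.0
    for path in product(range(n), repeat=t_end):      # states at t = 1..t_end
        states = (start,) + path
        prob = 1.0
        for a, b in zip(states, states[1:]):
            prob *= m[a][b]
        if prob > 0 and any(states[t] in window_states for t in range(t_start, t_end + 1)):
            total += prob
    return total

print(round(window_probability_bruteforce(0, M, {0, 1}, 2, 3), 2))   # 0.96
```

Only four of the 27 enumerated paths have non-zero probability here, but the enumeration itself grows as 3^t.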
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
Possible Worlds Example II
[Figure: objects labelled A to Z]
20
Too many possible worlds
28
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects.
Are there efficient solutions to query uncertain spatial data?
In general: No.
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
This can be reduced to uncertain spatial databases.
But: specific queries may have polynomial-time solutions.
[DalviSuciu04] N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004.
29
Querying Uncertain Data: Running Example
Return the number of objects located in the depicted circular region centered at query point q.
This number is a random variable.
Total number of possible worlds
30
Data Cleaning / Aggregation
[Figure: query region around q with objects C, D, H and I]
Ignore uncertainty (data cleaning):
Replace uncertain objects by a deterministic "best guess", e.g.,
Expected positions
Most-likely positions
…
Query results are not reliable.
Query results may be biased.
31
Equivalent Worlds: An Intuition
Observation 1
For any possible world and any possible world derived from by changing the position of object the following equivalence holds
32
Observation 1 allows us to discard objects outside of the query region.
Remaining number of equivalence classes of possible worlds
33
Observation 2
For each remaining object, we only need to consider the predicate "inside".
[Figure: query region around q with the remaining objects C, D, H and I]
Querying Uncertain Spatial Data
34
Remaining number of equivalence classes of possible worlds
35
Observation 3
We only require the number of objects in the query region; information about the concrete result objects can be discarded.
36
Generating Functions
[Figure: query region around q with objects A, B and C; the probabilities shown are 0.8, 0.2 and 0.4]
Main idea: use polynomial multiplication to enumerate possible results.
37
Observation 3: anonymize objects, i.e., substitute A, B, C by x.
39
Each monomial $c_k x^k$ implies that the probability of having exactly $k$ results equals $c_k$.
40
Generating Functions: Formally
For each object $o_i$, let $p_i$ be the probability that $o_i$ is located inside the query region. Consider the following generating function [2]:
$F(x) = \prod_i \bigl(1 - p_i + p_i\,x\bigr)$
In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.
[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009).
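A minimal Python sketch of the generating-function expansion, assuming the objects are independent and each is summarized only by its probability p_i of lying inside the query region (the name `count_distribution` is hypothetical). It multiplies in one factor (1 − p_i + p_i·x) at a time, so the final coefficient list holds P(count = k); with the example probabilities 0.8, 0.2 and 0.4 it yields approximately [0.096, 0.472, 0.368, 0.064].

```python
def count_distribution(probs):
    """Expand prod_i (1 - p_i + p_i * x); coeffs[k] = P(exactly k objects inside)."""
    coeffs = [1.0]  # the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside: the count stays k
            nxt[k + 1] += c * p      # object inside: multiply by x, count becomes k + 1
        coeffs = nxt
    return coeffs


if __name__ == "__main__":
    print(count_distribution([0.8, 0.2, 0.4]))
```

Each multiplication costs time linear in the current degree, so n objects need O(n²) operations instead of enumerating 2^n possible worlds.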
41
Count Queries on Uncertain Data
Example
[Figure: query region around q with objects A, B and C; the probabilities shown are 0.8, 0.2 and 0.4]
42
45
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I A traditional query on certain data can be answered in polynomial time
II We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds.
Samples are drawn independently and unbiased.
Evaluate the query predicate on each world.
The distribution of sampled results is an unbiased approximation of the true distribution of results.
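A minimal Monte-Carlo sketch for the count query, assuming independent objects that are each reduced to an "inside" probability p_i (the function name and the seeded generator are illustrative choices, not taken from the slides). Every iteration materializes one possible world, evaluates the predicate, and the relative frequencies approximate P(count = k).

```python
import random


def sample_count_distribution(probs, n_worlds, seed=0):
    """Draw n_worlds independent possible worlds and return the relative
    frequency of each result count k = 0..len(probs)."""
    rng = random.Random(seed)
    hits = [0] * (len(probs) + 1)
    for _ in range(n_worlds):
        # one possible world: each object is inside independently with prob p
        k = sum(1 for p in probs if rng.random() < p)
        hits[k] += 1
    return [h / n_worlds for h in hits]


if __name__ == "__main__":
    print(sample_count_distribution([0.8, 0.2, 0.4], n_worlds=100))
```

With only 100 worlds the estimates can deviate noticeably from the exact distribution, which is why the confidence of such estimators has to be assessed separately.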
48
Sampling: Example
Drawing 100 possible worlds may yield the following estimators:
Compare to the exact probabilities:
No indication of the reliability or confidence of the estimations.
49
Sampling: Confidences
Drawing 100 possible worlds may yield the following estimators:
Use statistical methods to assess the quality of the estimators, e.g., the Wald test:
$\hat{p} \pm z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution.
At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638].
True probability
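A short sketch of the Wald interval p̂ ± z(1−α/2)·sqrt(p̂(1−p̂)/n) using only the standard library. The interval [0.442, 0.638] quoted above is consistent with an estimate p̂ = 0.54 from n = 100 sampled worlds at α = 0.05; that reading is our inference, since the estimator values themselves did not survive extraction.

```python
import math
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # (1 - alpha/2) percentile of N(0, 1)
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


if __name__ == "__main__":
    lo, hi = wald_interval(0.54, 100)
    print(f"[{lo:.3f}, {hi:.3f}]")  # prints [0.442, 0.638]
```

The Wald interval is a normal approximation; it is simple but can behave poorly for p̂ close to 0 or 1 or for very small n.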
50
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty: the flip-side
53
Model of a Trajectory
[Figure: 3d-TRAJECTORY in (X, Y, Time) up to the present time, and its projection, the 2d-ROUTE]
The approximation does not capture acceleration/deceleration.
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes.
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
54
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
(location, time, velocity vector) updates
55
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (a.k.a. space-time prisms)
• Assumes exact location samples
• Unknown (but bounded) maximum speed in-between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
57
Modeling Uncertain Trajectories: Model 2
Constraints:
Motion along road network
Uncertainty restricted to road segments
[Figure: samples p1 at t1 and p2 at t2 on a road network, query point q, time axis t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time-instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints
Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement
e.g., possible routes to be taken
60
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for Trajectories
Given a collection of moving-object trajectories Tr1, Tr2, …, TrN:
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq, between [tb, te].
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y) space at times T = tb, T = t1 and T = te]
Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized.
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time-instant, consider the variant:
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq, between [tb, te].
[Figure: uncertain trajectories Trq, Tr1, Tr2 and Tr3 at times T = tb, tb1, t11, t1 and te]
Observations: adding uncertainty to the previous example implies:
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to a Probabilistic NN-Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: IPAC-NN tree with root [TrQ, tb, te]; first-level nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); each node expands into children over sub-intervals, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]
Root: parameters of the query
- Querying trajectory Trq
- Time-interval of interest [tb, te]
Children: with the parent excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q, circles of radii Rmin, Rmax and Rd around Q, and the uncertainty regions of Tr1 to Tr5]
Trq = 2D point Q. The rest are possible-locations bounded by circles.
Observation: any object whose min-distance from Q is greater than Rmax has a "0" probability of being an NN to Q. Example: Tr4 cannot have a non-zero probability of being Q's NN.
The probability that the location of Tri is within distance Rd from Q is given by
$P_i(R_d) = \iint_{\|(x,y) - Q\| \le R_d} \mathrm{pdf}_i(x,y)\,dx\,dy$
The probability that a given object Trj is the NN of Q is given by
$P_{NN}(Tr_j) = \int_0^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \ne j} \bigl(1 - P_i(R_d)\bigr)\,dR_d$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…).
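A small sketch of the pruning step only (not the integral), assuming circular uncertainty regions given as (center x, center y, radius) and taking Rmax as the smallest maximum distance of any object from Q; both the representation and the function name are illustrative assumptions. An object whose minimum distance is not below Rmax can never be closer than the object realizing Rmax, so it is dropped.

```python
import math


def nn_candidates(q, disks):
    """Return (indices of objects with possibly non-zero NN-probability, Rmax).

    q: crisp query point (x, y); disks: list of (cx, cy, r) uncertainty regions.
    """
    centre_dist = [math.hypot(cx - q[0], cy - q[1]) for cx, cy, _ in disks]
    min_dist = [max(0.0, d - r) for d, (_, _, r) in zip(centre_dist, disks)]
    max_dist = [d + r for d, (_, _, r) in zip(centre_dist, disks)]
    # Rmax: some object is guaranteed to lie within this distance of Q
    r_max = min(max_dist)
    # objects whose min-distance reaches Rmax have zero NN-probability
    survivors = [i for i, m in enumerate(min_dist) if m < r_max]
    return survivors, r_max


if __name__ == "__main__":
    print(nn_candidates((0.0, 0.0), [(2.0, 0.0, 1.0), (0.0, 3.0, 1.0), (10.0, 0.0, 1.0)]))
    # prints ([0, 1], 3.0)
```

Only the surviving objects need to enter the product over i ≠ j, which keeps the numerical integration small.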
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query location TrQ, uncertainty regions of Tr1 to Tr5, radii Rmin and Rmax, points Z1 and Z2]
dist(Z1, Z2) < Rmax
We can no longer "prune" Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Trj, being within_distance Rd from Trq.
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: pdfs of the possible locations of Trq, Tr1 and Tr2]
Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\|V_i - V_q\| \le R_d)$.
Define Viq = Vi − Vq (cross-correlation) as another random variable: Viq = Vi + (−Vq), which, due to independence, implies that the pdf of Viq is the convolution of the pdfs of Vi and −Vq.
67
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq)]
Let Viq = Vi + (−Vq).
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq).
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi − vxq and Vyiq = vyi − vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.
Then the distance of the expected location along TRiq from the origin as a function of t is
$d_{iq}(t) = \sqrt{A t^2 + B t + C}$, with $A = V_{xiq}^2 + V_{yiq}^2$, where B and C depend on the relative expected location at t = 0;
a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance-functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0).
Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time-interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t); squaring both sides leaves an equation of degree at most two in t). Call such pairwise intersections critical time-points.
[Figure (example): lower envelope LE12 of two distance-trajectories TR1 and TR2, with critical time-points t11 and t12 after tb; time on the horizontal axis, dist on the vertical axis]
70
The continuous aspect of the probabilistic NN queries
Q: How to efficiently construct the lower envelope of the whole collection of distance-functions?
A: A divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: lower envelopes LE12 (of TR1, TR2) and LE34 (of TR3, TR4) merged into LE1234, creating new critical time-points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
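A brute-force sketch of critical time-points and the first lower envelope, assuming the expected locations move linearly relative to Trq so that each squared distance is A·t² + B·t + C. The helper names are hypothetical, and the envelope is built naively from all pairwise crossings (O(N²) pairs), not with the O(N log N) divide-and-conquer merge described above; it only yields the first level of the IPAC-NN tree.

```python
import math


def sq_dist_coeffs(p0, v):
    """(A, B, C) with d(t)^2 = A t^2 + B t + C for relative position p0 + v * t."""
    (x, y), (vx, vy) = p0, v
    return (vx * vx + vy * vy, 2.0 * (x * vx + y * vy), x * x + y * y)


def critical_time_points(ci, cj, tb, te, eps=1e-12):
    """Times in [tb, te] where d_iq(t) = d_jq(t); squaring gives a quadratic, so at most two."""
    a, b, c = (ci[k] - cj[k] for k in range(3))
    if abs(a) < eps:
        if abs(b) < eps:
            return []  # curves never cross, or coincide everywhere: no isolated crossing
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


def lower_envelope(coeffs, tb, te):
    """[(index, t_start, t_end), ...]: the lowest distance function on each piece."""
    cuts = {tb, te}
    for i in range(len(coeffs)):
        for j in range(i + 1, len(coeffs)):
            cuts.update(critical_time_points(coeffs[i], coeffs[j], tb, te))
    cuts = sorted(cuts)
    pieces = []
    for t0, t1 in zip(cuts, cuts[1:]):
        mid = 0.5 * (t0 + t1)  # the ranking cannot change strictly between cuts
        best = min(range(len(coeffs)),
                   key=lambda k: coeffs[k][0] * mid * mid + coeffs[k][1] * mid + coeffs[k][2])
        if pieces and pieces[-1][0] == best:
            pieces[-1] = (best, pieces[-1][1], t1)  # same winner: extend the previous piece
        else:
            pieces.append((best, t0, t1))
    return pieces


if __name__ == "__main__":
    moving = sq_dist_coeffs((0.0, 0.0), (1.0, 0.0))  # d(t) = t
    parked = sq_dist_coeffs((0.0, 2.0), (0.0, 0.0))  # d(t) = 2
    print(lower_envelope([moving, parked], 0.0, 5.0))
    # prints [(0, 0.0, 2.0), (1, 2.0, 5.0)]
```

Comparing squared distances is safe because distances are non-negative, so d_i = d_j exactly when d_i² = d_j².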
Now, how about the IPAC-NN tree?
71
The lower envelope provides a continuous-pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions of TR1 to TR7 over [tb, te] with critical times t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree), and the 4r pruning band]
72
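Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.,
$PP_i(a) = \{P \mid P \text{ connects } p_i \text{ and } p_{i+1} \wedge \mathrm{mincost}(P) \le t_{i+1} - t_i\}$
Example: if the pdf of selecting a possible path is uniform, each path in $PP_i(a)$ has probability $1/|PP_i(a)|$.
Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t), i.e.,
$PL_j(t) = \{p \in P_j \mid \mathrm{mincost}(p, p_i) \le t - t_i \wedge \mathrm{mincost}(p, p_{i+1}) \le t_{i+1} - t\}$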
73
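Combine the two probabilities…
[Figure: possible paths of object a, and its possible locations when t = 4; point m]
Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$\Pr(a(t) \in S) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr\bigl(a(t) \in S \mid a(t) \in PL_P(t)\bigr)$, e.g., with a uniform pdf over the possible locations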
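A sketch of the two ingredients on a toy road network, assuming directed edges annotated with minimum travel times, a uniform pdf over possible paths, and conditional in-segment probabilities supplied as input. The vertex names, travel times and function names are hypothetical; this is not the network of the figure. Depth-first enumeration of simple paths is exponential in the worst case, so it only illustrates the definition of PPi(a).

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths src -> dst whose total minimum travel time is <= budget.

    graph: {vertex: {neighbour: min_travel_time}} (a directed road network).
    """
    paths = []

    def dfs(v, path, cost):
        if v == dst:
            paths.append(tuple(path))
            return
        for w, dt in graph.get(v, {}).items():
            if w not in path and cost + dt <= budget:  # simple paths within the time budget
                path.append(w)
                dfs(w, path, cost + dt)
                path.pop()

    dfs(src, [src], 0.0)
    return paths


def prob_in_segment(path_prob, prob_in_s_given_path):
    """Pr(a(t) in S) = sum over paths P of Pr(P) * Pr(a(t) in S | P)."""
    return sum(pr * prob_in_s_given_path.get(p, 0.0) for p, pr in path_prob.items())


if __name__ == "__main__":
    roads = {"p_i": {"v3": 2.0, "v5": 1.0}, "v3": {"p_i+1": 2.0}, "v5": {"p_i+1": 4.0}}
    pp = possible_paths(roads, "p_i", "p_i+1", budget=5.0)
    uniform = {p: 1.0 / len(pp) for p in pp}  # uniform pdf over possible paths
    # assume S covers half of the possible locations on the first path only
    print(prob_in_segment(uniform, {pp[0]: 0.5}))  # prints 0.25
```

With two equally likely paths and S covering half of the possible locations on one of them, the result reproduces the 0.5 · 0.5 = 0.25 pattern of the slide.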
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and probabilities)
– A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time-instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
› Data structures
– Edge hash-table
› Small, in-memory
– Movement R-tree
› 1D for each edge
– Trajectories list
› Store samples ordered by time-stamp
› Assign pointer to set of possible paths
› Earliest-arrival and latest-departure times stored for vertices
› On-disk, retrieved on demand
76
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" from q along road-network segments over vertices A to E, annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
77
Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty: the flip-side
› Data reduction
– Why transmit every single location point?
› Bandwidth
› Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
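A sketch of the classic recursive Douglas-Peucker simplification on 2D points, to make "define an acceptable error, save on data size" concrete. This is the plain algorithm, not the faster variant of the cited paper, and applying it directly to (x, y) samples ignores time; treating timestamps properly would be an extension beyond this sketch.

```python
import math


def _dist_to_line(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:  # a == b: fall back to the point distance
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / length


def douglas_peucker(points, eps):
    """Keep the farthest point if it deviates by more than eps, then recurse on both halves."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _dist_to_line(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [a, b]  # every intermediate point is within the acceptable error
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right  # drop the duplicated split point


if __name__ == "__main__":
    track = [(0, 0), (1, 0), (2, 5), (3, 0), (4, 0)]
    print(douglas_peucker(track, eps=1.0))  # prints [(0, 0), (2, 5), (4, 0)]
```

The tolerance eps is exactly the "acceptable error": every dropped sample lies within eps of the simplified polyline, which turns data reduction into a bounded form of location uncertainty.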
79
Uncertainty: the other flip-side
› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long can the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data
Probabilistic results
Time dimension
vs.
Geometric models for uncertain spatio-temporal data
Probabilistic results
Time dimension
82
Merge the approaches
› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parameterized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443
83
A highway example…
[Figure: position (pos) of a car on a highway over time (t); query region Q]
What is the probability that the car is in some area at least once during some time interval?
84
[Figure: the car's possible positions over time, bounded by the speeds vmin and vmax]
85
Assuming a uniform distribution…
86
[Figure: the car is inside Q with probability 50% at one time instant and 30% at another]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
87
Violation of the speed constraints
88
Consideration of the dependency: 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
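A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: from an inner hole, the ball moves left or right with probability 0.4 each and stays with 0.2; at the border, it stays with 0.4 and moves inward with 0.6]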
92
A simple example
› Initial position
› After first hit: 0.4 | 0.2 | 0.4
› After second hit: 0.16 | 0.16 | 0.36 | 0.16 | 0.16
› After 40th hit: 0.08 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.08
93
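How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket; columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$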
94
How can we model this?
› First hit: $(0,0,1,0,0,0,0,0,0) \cdot M = (0,0.4,0.2,0.4,0,0,0,0,0)$
› Second hit: $(0,0.4,0.2,0.4,0,0,0,0,0) \cdot M = (0.16,0.16,0.36,0.16,0.16,0,0,0,0)$
› 40th hit: $(0,0,1,0,0,0,0,0,0) \cdot M^{40} \approx (0.08,0.12,0.12,0.12,0.12,0.12,0.12,0.12,0.08)$
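A plain-Python sketch of the board example as a Markov chain: `board_matrix` (a hypothetical helper) builds the 9-bucket transition matrix described above, and `step` multiplies a row vector by it, one hit at a time. The 40th-hit vector on the slide matches the chain's stationary distribution (0.08, 0.12, …, 0.12, 0.08), i.e., the vector π with π·M = π that repeated hits converge toward; the test checks this property rather than a specific 40-step value.

```python
def step(v, m):
    """One hit: row vector v times transition matrix m (rows: from, columns: to)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def board_matrix(n=9, stay=0.2, move=0.4, border_stay=0.4):
    """Inner holes: move left/right with `move`, stay with `stay`.
    Border holes: stay with `border_stay`, otherwise move inward."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = move, stay, move
    m[0][0], m[0][1] = border_stay, 1.0 - border_stay
    m[n - 1][n - 1], m[n - 1][n - 2] = border_stay, 1.0 - border_stay
    return m


if __name__ == "__main__":
    M = board_matrix()
    v = [0.0, 0.0, 1.0] + [0.0] * 6  # the ball starts in bucket 3
    for hit in range(1, 41):
        v = step(v, M)
        if hit in (1, 2, 40):
            print(hit, [round(x, 2) for x in v])
```

A dense list-of-lists is enough for nine buckets; for road networks with many states, the sparse-matrix representation mentioned later is the natural choice.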
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: state graph over s1, s2 and s3, labelled with the transition probabilities of M]
Note: we have an exponential number of possible paths the car might take.
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0)
98
99
t = 1: (0, 0, 1)
100
t = 2: (0, 0.8, 0.2)
101
t = 3: (0.48, 0.16, 0.36)
102
Result = 0.96 (the probability that the car is in s1 or s2 at t = 2 or t = 3)
› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices
103
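ST - Window Queries
States s1, s2, s3, plus an absorbing state s4 that collects the winner trajectories; M− is applied for ticks outside T, M+ for ticks inside T:
$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$
$(1,0,0,0) \xrightarrow{M^-} (0,0,1,0) \xrightarrow{M^+} (0,0,0.2,0.8) \xrightarrow{M^+} (0,0,0.04,0.96)$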
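A plain-Python sketch of the two-matrix technique, assuming states are indexed 0..n−1, the extra absorbing "hit" state gets index n, and ticks start at t = 0 (all function names are hypothetical). `extend_plus` redirects every transition into a window state to the hit state, which is what turns the exponential number of paths into a few matrix-vector products; for the example it returns approximately 0.96.

```python
def step(v, m):
    """Row vector v times matrix m (rows: from state, columns: to state)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def extend_minus(m):
    """M-: the original transitions plus an absorbing 'hit' state (last index)."""
    n = len(m)
    return [list(row) + [0.0] for row in m] + [[0.0] * n + [1.0]]


def extend_plus(m, window):
    """M+: like M-, but probability mass entering a window state moves to 'hit'."""
    n = len(m)
    out = []
    for row in m:
        new = [0.0] * (n + 1)
        for j, p in enumerate(row):
            new[n if j in window else j] += p
        out.append(new)
    out.append([0.0] * n + [1.0])
    return out


def window_probability(m, start, window, t_from, t_to):
    """P(object is in a window state at some tick t in [t_from, t_to]); start is the pmf at t = 0."""
    n = len(m)
    if t_from == 0:  # mass already inside the window at t = 0 counts as a hit
        v = [0.0 if j in window else p for j, p in enumerate(start)]
        v.append(sum(p for j, p in enumerate(start) if j in window))
    else:
        v = list(start) + [0.0]
    m_minus, m_plus = extend_minus(m), extend_plus(m, window)
    for t in range(1, t_to + 1):
        v = step(v, m_plus if t >= t_from else m_minus)  # M+ only for ticks inside T
    return v[n]


if __name__ == "__main__":
    M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
    print(window_probability(M, [1.0, 0.0, 0.0], window={0, 1}, t_from=2, t_to=3))
```

Because the hit state is absorbing, a trajectory that enters the window several times is counted only once, which is why the result is not simply the sum of the s1 and s2 entries per tick.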
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0, and with two observations at t0 and t1]
105
Multiple Observations
› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
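Multiple Observations
States $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$, where "−" means the window has not been intersected ($\neg\blacksquare$) and "+" means it has ($\blacksquare$):
$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \quad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}, \quad M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
$(1,0,0,0,0,0) \xrightarrow{M^-} (0,0,1,0,0,0) \xrightarrow{M^+} (0,0,0.2,0,0.8,0) \xrightarrow{M^+} (0,0,0.04,0.48,0.16,0.32)$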
107
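› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' theorem, with $o$ denoting the observation "object seen in s3":
$P(\blacksquare \mid o) = \frac{P(o \mid \blacksquare) \cdot P(\blacksquare)}{P(o)} = \frac{P(\blacksquare \wedge o)}{P(o)} = \frac{P(o \wedge \blacksquare)}{P(o \wedge \blacksquare) + P(o \wedge \neg\blacksquare)}$
Reading $P(o \wedge \blacksquare) = P(S_3^+) = 0.32$ and $P(o \wedge \neg\blacksquare) = P(S_3^-) = 0.04$ off the state vector at t = 3:
$P(\blacksquare \mid o) = \frac{0.32}{0.32 + 0.04} \approx 0.89$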
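A sketch of the 2|S|-class bookkeeping followed by the Bayes step, assuming the observation is made at the last tick of the computation. Instead of materializing the block matrices, it keeps a dictionary keyed by (state, hit flag), where the flag False corresponds to Si− and True to Si+; this is equivalent to multiplying by the six-state M− / M+ but is our own formulation, and the function names are hypothetical. For the example it reproduces 0.32 / (0.32 + 0.04) ≈ 0.89.

```python
def forward_with_flag(m, start, window, t_from, t_to):
    """Propagate probability mass over classes (state, hit) up to tick t_to."""
    v = {}
    for s, p in enumerate(start):
        if p > 0:
            key = (s, t_from == 0 and s in window)
            v[key] = v.get(key, 0.0) + p
    for t in range(1, t_to + 1):
        nxt = {}
        for (i, hit), p in v.items():
            for j, pij in enumerate(m[i]):
                if pij == 0.0:
                    continue
                h = hit or (t >= t_from and j in window)  # window intersected at this tick?
                nxt[(j, h)] = nxt.get((j, h), 0.0) + p * pij
        v = nxt
    return v


def hit_given_observation(v, observed_state):
    """Bayes: P(hit | o) = P(o and hit) / (P(o and hit) + P(o and not hit))."""
    hit = v.get((observed_state, True), 0.0)
    miss = v.get((observed_state, False), 0.0)
    return hit / (hit + miss)


if __name__ == "__main__":
    M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
    v3 = forward_with_flag(M, [1.0, 0.0, 0.0], window={0, 1}, t_from=2, t_to=3)
    print(round(hit_given_observation(v3, observed_state=2), 2))  # prints 0.89
```

The unconditioned entries (e.g., 0.48 for S1+) are also available in the result, so the same pass answers the query for any state the object might be observed in at t = 3.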
108
Summary
› Pros
– Allows answering time-parameterized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: R-tree entries over the (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform to the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
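A sketch of plain Monte-Carlo sampling on the Markov-chain model, shown on the window query so that the estimate can be compared with the exact 0.96. It samples forward from a single known start state and does not condition on later observations; making samples conform to observations requires the adapted transition matrices of the cited work, which this sketch does not implement. The function name and seed handling are illustrative assumptions.

```python
import random


def sample_window_probability(m, start_state, window, t_from, t_to, n_samples, seed=0):
    """Estimate P(a trajectory visits a window state at some tick in [t_from, t_to])."""
    rng = random.Random(seed)
    states = range(len(m))
    hits = 0
    for _ in range(n_samples):
        s = start_state
        hit = t_from == 0 and s in window
        for t in range(1, t_to + 1):
            s = rng.choices(states, weights=m[s])[0]  # next state drawn from row s of M
            if t >= t_from and s in window:
                hit = True
        hits += hit  # True counts as 1
    return hits / n_samples


if __name__ == "__main__":
    M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
    print(sample_window_probability(M, 0, {0, 1}, 2, 3, n_samples=10000))
```

The same sampled trajectories could be reused to evaluate predicates such as "is o the nearest neighbor at some tick", which have no closed-form matrix solution.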
112
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of sub-events. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Event queries
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity search on uncertain spatio-temporal data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-nearest neighbor queries on uncertain moving object trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD 2008: 715-728.
20
Possible Worlds Example II
[Figure: uncertain objects labeled A to Z]
27
Too many possible worlds
28
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects.
Are there efficient solutions for querying uncertain spatial data?
In general: no.
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
This can be reduced to uncertain spatial databases.
But: specific queries may have polynomial-time solutions.
[DalviSuciu04] N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004.
29
[Figure: circular query region centered at query point q]
Querying Uncertain Data: Running Example
Return the number of objects located in the depicted circular region centered at query point q.
This number is a random variable.
Total number of possible worlds
30
Data Cleaning / Aggregation
[Figure: query region centered at q; objects C, D, H, I]
Ignore Uncertainty (Data Cleaning)
Replace uncertain objects by a deterministic "best guess"
– Expected positions
– Most-likely positions
– …
Query results are not reliable
Query results may be biased
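A tiny hypothetical case shows why a "best guess" can be biased. Take one object with two equally likely positions on opposite sides of q. Its expected position falls inside the query region, although neither of its possible positions does. The coordinates, radius, and probabilities below are made up for illustration.

```python
import math

q, radius = (0.0, 0.0), 1.0                    # hypothetical circular query region
positions = [(-2.0, 0.0), (2.0, 0.0)]          # hypothetical alternative positions of one object
probs = [0.5, 0.5]


def inside(p):
    return math.dist(p, q) <= radius


# "Best guess" cleaning: replace the object by its expected position.
expected = tuple(sum(w * c for w, c in zip(probs, axis)) for axis in zip(*positions))
cleaned_answer = inside(expected)

# Possible-worlds semantics: total probability of the positions inside the region.
true_probability = sum((w for p, w in zip(positions, probs) if inside(p)), 0.0)

print(expected, cleaned_answer, true_probability)  # (0.0, 0.0) True 0.0
```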
31
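Equivalent Worlds: An Intuition
[Figure: query region centered at q]
Observation 1
For any possible world w and any possible world w′ derived from w by changing the position of an object o that lies completely outside the query region, the following equivalence holds: the query returns the same result on w and on w′.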
32
Equivalent Worlds: An Intuition
[Figure: query region centered at q]
Observation 1 allows us to discard objects outside of the query region.
Remaining number of equivalence classes of possible worlds
33
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside".
[Figure: query region centered at q; remaining objects C, D, H, I]
Querying Uncertain Spatial Data
34
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside".
Remaining number of equivalence classes of possible worlds
[Figure: query region centered at q; remaining objects C, D, H, I]
35
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region; information about the concrete result objects can be discarded.
[Figure: query region centered at q; remaining objects C, D, H, I]
36
[Figure: query point q and objects A, B, C, annotated with probabilities 0.8, 0.2 and 0.4]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results.
37
[Figure: as before, with objects A, B, C replaced by x]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results.
Observation 3: anonymize objects; substitute A, B, C by x.
38
[Figure: as before, with objects A, B, C replaced by x]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results.
Observation 3: anonymize objects; substitute A, B, C by x.
39
[Figure: as before, with objects A, B, C replaced by x]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results.
Observation 3: anonymize objects; substitute A, B, C by x.
Each monomial $c\,x^k$ implies that the probability of having exactly $k$ results equals $c$.
40
For each object $o_i$, let $p_i$ denote the probability that $o_i$ is located inside the query region. Consider the following generating function [2]:
$$\mathcal{F}(x) = \prod_{i} \bigl( (1 - p_i) + p_i\,x \bigr)$$
In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.
Generating Functions: Formally
[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009).
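The generating function can be expanded with one polynomial multiplication per object, which takes O(n²) time in total. Naive enumeration, by contrast, visits 2ⁿ worlds. The sketch assumes that objects are independent and that 0.8, 0.2 and 0.4 in the running example are the probabilities of A, B and C lying inside the query region. The helper names `count_distribution` and `brute_force` are hypothetical.

```python
from itertools import product


def count_distribution(probs):
    """Coefficients of prod_i ((1 - p_i) + p_i * x); entry k = P(count == k)."""
    coeffs = [1.0]                      # the constant polynomial 1
    for p in probs:
        # Multiply the current polynomial by ((1 - p) + p * x).
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)     # object outside: degree unchanged
            nxt[k + 1] += c * p         # object inside: degree + 1
        coeffs = nxt
    return coeffs


def brute_force(probs):
    """Reference: enumerate all 2^n inside/outside worlds (exponential)."""
    dist = [0.0] * (len(probs) + 1)
    for world in product([False, True], repeat=len(probs)):
        w = 1.0
        for p, is_inside in zip(probs, world):
            w *= p if is_inside else 1.0 - p
        dist[sum(world)] += w
    return dist


probs = [0.8, 0.2, 0.4]
print([round(c, 3) for c in count_distribution(probs)])  # [0.096, 0.472, 0.368, 0.064]
```

Both functions return the same distribution. Only the first one stays polynomial as the number of objects grows.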
41
Example
[Figure: query point q and objects A, B, C, annotated with probabilities 0.8, 0.2 and 0.4]
Count Queries on Uncertain Data
42
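[Figure: query point q and objects A, B, C, annotated with probabilities 0.8, 0.2 and 0.4]
Example: $\mathcal{F}(x) = (0.2 + 0.8x)(0.8 + 0.2x)(0.6 + 0.4x)$
Count Queries on Uncertain Data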
43
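[Figure: query point q and objects A, B, C, annotated with probabilities 0.8, 0.2 and 0.4]
Example: $\mathcal{F}(x) = (0.2 + 0.8x)(0.8 + 0.2x)(0.6 + 0.4x) = (0.16 + 0.68x + 0.16x^2)(0.6 + 0.4x)$
Count Queries on Uncertain Data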
44
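[Figure: query point q and objects A, B, C, annotated with probabilities 0.8, 0.2 and 0.4]
Example: $\mathcal{F}(x) = (0.2 + 0.8x)(0.8 + 0.2x)(0.6 + 0.4x) = 0.096 + 0.472x + 0.368x^2 + 0.064x^3$
Count Queries on Uncertain Data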
45
The Paradigm of Equivalent Worlds
Given a query predicate φ and an uncertain database DB, we can answer φ on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time.
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds, such that the number of classes is polynomial in |DB|.
III. The probability of a class can be computed in polynomial time.
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and unbiased
Evaluate the query predicate on each world
The distribution of the sampled results is an unbiased approximation of the true distribution of results
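The sampling steps above can be sketched for the count query as follows. The sketch assumes independent objects whose in-region probabilities are the hypothetical values 0.8, 0.2 and 0.4. Each iteration materializes one possible world by drawing every object's "inside" predicate independently, and the estimate is the relative frequency of each count.

```python
import random


def sample_count_distribution(probs, n_samples, seed=0):
    """Monte-Carlo estimate of P(count == k) from n_samples sampled possible worlds."""
    rng = random.Random(seed)                   # fixed seed for reproducibility
    counts = [0] * (len(probs) + 1)
    for _ in range(n_samples):
        # One possible world: object i is inside with probability p_i,
        # independently of all other objects.
        k = sum(1 for p in probs if rng.random() < p)
        counts[k] += 1
    return [c / n_samples for c in counts]


estimate = sample_count_distribution([0.8, 0.2, 0.4], n_samples=100_000)
print([round(e, 3) for e in estimate])
```

With 100,000 worlds the estimates typically land within about 0.005 of the exact values from the generating function. With only 100 worlds, as on the following slides, the spread is much larger.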
48
[Figure: query point q and objects A, B, C, annotated with probabilities 0.8, 0.2 and 0.4]
Sampling Example
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimations
49
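[Figure: query point q and objects A, B, C, annotated with probabilities 0.8, 0.2 and 0.4]
Sampling Confidences
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of the estimators
E.g., Wald test: $\hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{p}(1 - \hat{p}) / n}$
where $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$ percentile of the standard normal distribution
At a significance level of α, the true probability is in the interval [0.442, 0.638]
True probability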
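The Wald interval takes only a few lines to compute. The example values are assumptions, not taken from the slide. An estimate of p̂ = 0.54 (the midpoint of the reported interval) from n = 100 worlds at α = 0.05 (z ≈ 1.96) reproduces [0.442, 0.638]. The function name `wald_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def wald_interval(p_hat: float, n: int, alpha: float = 0.05) -> tuple:
    """Wald confidence interval for a probability estimated from n samples."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)    # (1 - alpha/2) percentile of N(0, 1)
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lo, hi = wald_interval(0.54, 100)
print(round(lo, 3), round(hi, 3))  # 0.442 0.638
```

The Wald interval is a simple normal approximation. It can be poor for estimates close to 0 or 1 or for small n.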
50
Uncertain Spatial Data Management Summary
› Motivation
  – Flood of geo-spatial data
  – Enriched with additional contexts (text, social, multimedia)
  – Inherent uncertainty
› Data cleaning
  – "Best guess" answers
  – Unreliable results
  – Biased results
› Paradigm of equivalent worlds
  – Efficient solutions for the most prominent types of spatial queries
  – Example: generating functions
› Approximations
  – Monte-Carlo sampling
  – Probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty: the flip-side
53
Model of a trajectory
[Figure: 3d-trajectory in (X, Y, Time) up to the present time, and the corresponding 2d-route]
The approximation does not capture acceleration/deceleration.
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes.
Bart Kuijpers, Harvey J. Miller, Walied Othman. Kinetic space-time prisms. GIS 2011.
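The trajectory model connects consecutive (location, time) samples by straight segments, which is why it cannot capture acceleration. Here is a minimal sketch of that piecewise-linear interpolation. It assumes samples are `(t, x, y)` tuples with strictly increasing t, and the function name `position_at` is hypothetical.

```python
from bisect import bisect_right


def position_at(samples, t):
    """Linearly interpolated (x, y) at time t; samples: [(t, x, y), ...] sorted by t."""
    times = [s[0] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled time interval")
    i = bisect_right(times, t) - 1          # last sample with time <= t
    if i == len(samples) - 1:               # t equals the last sample time
        return samples[-1][1], samples[-1][2]
    (t0, x0, y0), (t1, x1, y1) = samples[i], samples[i + 1]
    f = (t - t0) / (t1 - t0)                # constant speed on each segment
    return x0 + f * (x1 - x0), y0 + f * (y1 - y0)


samples = [(0.0, 0.0, 0.0), (10.0, 10.0, 0.0), (20.0, 10.0, 10.0)]
print(position_at(samples, 5.0))   # (5.0, 0.0)
print(position_at(samples, 15.0))  # (10.0, 5.0)
```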
54
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
(location, time, velocity vector) updates
55
Modeling Uncertainty: Location Uncertainty Models
Imprecision from various sources (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze. On a Generic Uncertainty Model for Position Information. QuaCon 2009.
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (a.k.a. space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded by a maximum) speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov. Querying Imprecise Data in Moving Object Environments. ICDE 2003.
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li. Uncertain Range Queries for Necklaces. MDM 2010.
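Under Model 1, a location p is possible at time t between two exact samples if two conditions hold. It must be reachable from the first sample at maximum speed, and the second sample must still be reachable from it. The sketch below implements that membership test for free-space (Euclidean) motion. The name `in_bead` and its parameters are hypothetical.

```python
import math


def in_bead(p1, t1, p2, t2, vmax, p, t):
    """True if location p at time t lies in the bead (space-time prism)
    spanned by exact samples (p1, t1) and (p2, t2) with maximum speed vmax."""
    if not t1 <= t <= t2:
        return False
    reachable_from_first = math.dist(p1, p) <= vmax * (t - t1)
    can_reach_second = math.dist(p, p2) <= vmax * (t2 - t)
    return reachable_from_first and can_reach_second


# Samples 10 units apart, 10 time units apart, maximum speed 2.
print(in_bead((0, 0), 0, (10, 0), 10, 2, (5, 0), 5))   # True
print(in_bead((0, 0), 0, (10, 0), 10, 2, (0, 9), 5))   # False: cannot reach (10, 0) in time
```

A necklace is then a chain of such beads, one per pair of consecutive samples.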
57
Modeling Uncertain Trajectories: Model 2
Constraints:
– Motion along the road network
– Uncertainty restricted to road segments
[Figure: samples p1 at t1 and p2 at t2 on a road network, query q, time t]
Bart Kuijpers, Walied Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9), 2009.
58
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain. Managing Uncertainty in Moving Objects Databases. ACM TODS 2004.
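A constant error bound r around the expected location (Model 3) lets a range query classify an object at one time instant with two distance comparisons. The query is a disk of radius R around q. The sketch assumes the possible locations form a closed disk of radius r. The function name and its three result labels are illustrative, not taken from the cited paper.

```python
import math


def classify(expected_pos, r, q, R):
    """Range query (disk of radius R around q) vs. an object whose possible
    locations at this instant are the disk of radius r around expected_pos."""
    d = math.dist(expected_pos, q)
    if d <= R - r:
        return "definitely"     # every possible location lies inside the range
    if d <= R + r:
        return "possibly"       # some, but maybe not all, possible locations lie inside
    return "not"                # no possible location lies inside


print(classify((0, 0), 1, (0, 0), 5))  # definitely
print(classify((5, 0), 1, (0, 0), 5))  # possibly
print(classify((7, 0), 1, (0, 0), 5))  # not
```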
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
– The impact of uncertainty on the (structure of the) answer
– The impact of constraints (e.g., possible routes to be taken)
Processing algorithms
– Impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011.
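Question 2 can be answered with two shortest-path computations. The earliest arrival at a vertex is t_i plus the minimal travel time from p_i. The latest departure is t_{i+1} minus the minimal travel time to p_{i+1}. The sketch assumes that samples lie on vertices and that minimal edge travel times are known, for example from speed limits. The graph and the function names are hypothetical.

```python
import heapq
import math


def shortest_times(graph, source):
    """Dijkstra: minimal travel time from source; graph = {u: {v: travel time}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                                    # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def arrival_departure(graph, p_i, t_i, p_next, t_next):
    """{vertex: (earliest arrival, latest departure)} for vertices on feasible paths."""
    forward = shortest_times(graph, p_i)
    reverse = {}
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            reverse.setdefault(v, {})[u] = w
    backward = shortest_times(reverse, p_next)         # minimal time from vertex to p_next
    result = {}
    for v in forward.keys() & backward.keys():
        earliest, latest = t_i + forward[v], t_next - backward[v]
        if earliest <= latest:                          # v fits into the time budget
            result[v] = (earliest, latest)
    return result


g = {"a": {"b": 2, "d": 3}, "b": {"a": 2, "c": 2},
     "c": {"b": 2, "d": 3}, "d": {"a": 3, "c": 3}}
print(arrival_departure(g, "a", 0.0, "c", 5.0))  # "d" is excluded: (3.0, 2.0) is infeasible
```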
61
Continuous NN for trajectories
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq, between [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) between T = tb and T = te, with T = t1 marked]
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN
Example: assuming that Trq is the querying trajectory between [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized.
Yufei Tao, Dimitris Papadias, Qiongmao Shen. Continuous Nearest Neighbor Search. VLDB 2002.
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq, between [tb, te]
[Figure: uncertain trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) between T = tb and T = te, with T = tb1, T = t1 and T = t11 marked]
Observations: adding uncertainty to the previous example implies that
– Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
– starting at tb1, Tr3 (green) also has a non-zero NN-probability
– specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: IPAC-NN tree with root [TrQ, tb, te]; level-1 nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); deeper nodes such as (Tri11, tb, t11, D11), (Tri12, t11, t12, D12) and (Tri121, t11, t121, D121) refine sub-intervals]
Root: parameters of the query
– Querying trajectory Trq
– Time interval of interest [tb, te]
Children: excluding the parent, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: crisp query point Q; possible-location disks of Tr1 to Tr5; distances Rmin, Rmax and Rd from Q; annotation "Rmin > Rmax"]
Trq = 2D point Q; the rest are possible locations bounded by circles.
Observation:
– Any object whose minimum distance from Q is > Rmax has a "0" probability of being the NN of Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \iint_{\|(x, y) - Q\| \le R_d} \mathrm{pdf}_i(x, y)\,dx\,dy$$
The probability that a given object Trj is the NN of Q is given by
$$P^{NN}_j = \int_0^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{k \ne j} \bigl(1 - P_k(R_d)\bigr)\,dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the others are at distances > Rd, integrating over all possible values of Rd (NOTE: the upper bound is Rmax…).
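The NN-probability integral can be approximated numerically. The sketch assumes each object's location is uniform on a disk (as in the example two slides below) and that objects are independent. P_i(Rd) is the lens area of the query disk intersected with the object's disk, divided by the object's disk area. The integral is evaluated as a Riemann-Stieltjes sum up to Rmax. The sketch only illustrates the formula and is not the authors' implementation. Objects are given as hypothetical `(distance of disk center from Q, disk radius)` pairs.

```python
import math


def lens_area(r1, r2, d):
    """Area of the intersection of two disks (radii r1, r2, center distance d)."""
    if d >= r1 + r2:
        return 0.0                                  # disjoint disks
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2           # one disk contains the other
    c1 = max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    c2 = max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    k = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    return r1 * r1 * math.acos(c1) + r2 * r2 * math.acos(c2) - 0.5 * math.sqrt(max(0.0, k))


def within_prob(r, d, rho):
    """P_i(r) for a location uniform on a disk (radius rho, center at distance d from Q)."""
    return lens_area(r, rho, d) / (math.pi * rho * rho)


def nn_probabilities(objects, steps=10_000):
    """objects: [(d_i, rho_i)]; sums dP_j(r) * prod_{k != j} (1 - P_k(r)) over [0, Rmax]."""
    r_max = min(d + rho for d, rho in objects)      # beyond Rmax some object is surely closer
    h = r_max / steps
    result = [0.0] * len(objects)
    for s in range(steps):
        r0, r1, mid = s * h, (s + 1) * h, (s + 0.5) * h
        for j, (dj, rhoj) in enumerate(objects):
            dP = within_prob(r1, dj, rhoj) - within_prob(r0, dj, rhoj)
            if dP == 0.0:
                continue                            # e.g., objects with min distance > Rmax
            others = 1.0
            for k, (dk, rhok) in enumerate(objects):
                if k != j:
                    others *= 1.0 - within_prob(mid, dk, rhok)
            result[j] += dP * others
    return result


print([round(p, 3) for p in nn_probabilities([(5.0, 1.0), (5.0, 1.0), (20.0, 1.0)])])
# approx. [0.5, 0.5, 0.0]: the third object is pruned by Rmax
```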
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query location TrQ; possible-location disks of Tr1 to Tr5; Rmin, Rmax; points Z1 and Z2]
dist(Z1, Z2) < Rmax
We can no longer "prune" Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Trj, being within_distance Rd from Trq.
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) of possible locations in the x-y plane (marked 2r)]
Example: a uniform pdf of possible locations, showing two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$P\bigl(\|V_i - V_q\| \le R_d\bigr)$$
Define Viq = Vi - Vq = Vi + (-Vq) (cross-correlation) as another random variable. Due to independence, the pdf of Viq is the convolution of the pdfs of Vi and -Vq.
67
[Figure: convolved pdfs pdf(Tr1 - Trq) and pdf(Tr2 - Trq) in the x-y plane (support marked 4r)]
Let Viq = Vi + (-Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq).
Property 2: assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
68
The continuous aspect of the probabilistic NN queries
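Let $V^x_{iq} = v^x_i - v^x_q$ and $V^y_{iq} = v^y_i - v^y_q$ denote the corresponding components of the velocity of the object whose expected location moves along TRiq.
Then the distance of the expected location along TRiq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = (V^x_{iq})^2 + (V^y_{iq})^2, \quad B = 2\bigl(x_{iq}(0) V^x_{iq} + y_{iq}(0) V^y_{iq}\bigr), \quad C = x_{iq}(0)^2 + y_{iq}(0)^2$$
where $(x_{iq}(0), y_{iq}(0))$ is the relative position at t = 0. This is a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).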
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for the set of distance functions S_df = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0).
Observation: throughout the time interval of interest for the query, [tb, te], two distance functions diq(t) and djq(t) can intersect at most twice (NOTE: set diq(t) = djq(t)). Such pairwise intersections are called critical time-points.
[Figure: example lower envelope LE12 of the distance trajectories TR1 and TR2, starting at tb, with critical time-points t11 and t12; time on the horizontal axis, dist on the vertical axis]
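Critical time-points are easiest to find with squared distances. Because distances are non-negative, diq(t) = djq(t) exactly when their squares agree, and the difference of two squared distances is at most quadratic in t. This sketch assumes linear motion of the expected locations, and the helper names are hypothetical.

```python
import math


def distance_coeffs(pos_i, vel_i, pos_q, vel_q):
    """(A, B, C) with d_iq(t)^2 = A t^2 + B t + C for linearly moving expected locations."""
    x0, y0 = pos_i[0] - pos_q[0], pos_i[1] - pos_q[1]   # relative position at t = 0
    vx, vy = vel_i[0] - vel_q[0], vel_i[1] - vel_q[1]   # relative velocity
    return vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0


def critical_times(ci, cj, tb, te, eps=1e-12):
    """Times in [tb, te] where d_iq(t) = d_jq(t) (identical functions yield none)."""
    a, b, c = ci[0] - cj[0], ci[1] - cj[1], ci[2] - cj[2]
    if abs(a) < eps:                                    # difference is linear or constant
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
    return sorted({t for t in roots if tb <= t <= te})


# Tr_i passes close to a static querying object; Tr_j stays still at distance 3.
ci = distance_coeffs((-5.0, 1.0), (1.0, 0.0), (0.0, 0.0), (0.0, 0.0))
cj = distance_coeffs((0.0, 3.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0))
print(critical_times(ci, cj, 0.0, 10.0))  # two crossings, at 5 - sqrt(8) and 5 + sqrt(8)
```

Between the two critical time-points Tr_i is the closer object, and outside them Tr_j is.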
70
The continuous aspect of the probabilistic NN queries
Q: How can we efficiently construct the lower envelope of the whole collection of distance functions?
A: A divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: LE12 (of TR1 and TR2, critical time-points t11, t12) and LE34 (of TR3 and TR4, critical time-point t31) over [tb, te] are merged into LE1234, introducing new critical time-points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
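The merge-sort-like construction can be sketched as follows. Each envelope is a list of `(start, end, index)` pieces over [tb, te]. Two envelopes are merged by overlaying their breakpoints and splitting each overlapped interval at the crossings of the two owning functions. The sketch works with squared distances `(A, B, C)` and keeps the divide-and-conquer structure. It makes no attempt to match the O(N log N) bound, and all names are hypothetical.

```python
import math


def value(f, t):
    """Squared distance A t^2 + B t + C; orders objects the same way as distance."""
    return f[0] * t * t + f[1] * t + f[2]


def crossings(f, g, lo, hi, eps=1e-12):
    """Times strictly inside (lo, hi) where f and g are equal."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        roots = [] if disc < 0 else [(-b - math.sqrt(disc)) / (2.0 * a),
                                     (-b + math.sqrt(disc)) / (2.0 * a)]
    return sorted({t for t in roots if lo < t < hi})


def owner(env, t):
    """Index of the function forming envelope env at time t."""
    return next(k for s, e, k in env if s <= t <= e)


def merge(env1, env2, funcs):
    """Merge two lower envelopes that cover the same time interval."""
    cuts = sorted({s for s, _, _ in env1 + env2} | {e for _, e, _ in env1 + env2})
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2.0
        i, j = owner(env1, mid), owner(env2, mid)
        pts = [lo] + crossings(funcs[i], funcs[j], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            m = (a + b) / 2.0
            k = i if value(funcs[i], m) <= value(funcs[j], m) else j
            if out and out[-1][2] == k:
                out[-1] = (out[-1][0], b, k)            # extend the previous piece
            else:
                out.append((a, b, k))
    return out


def lower_envelope(funcs, idxs, tb, te):
    """Divide and conquer: split the functions, recurse, merge the two envelopes."""
    if len(idxs) == 1:
        return [(tb, te, idxs[0])]
    half = len(idxs) // 2
    return merge(lower_envelope(funcs, idxs[:half], tb, te),
                 lower_envelope(funcs, idxs[half:], tb, te), funcs)


funcs = [(1.0, -10.0, 26.0), (0.0, 0.0, 9.0), (0.0, 0.0, 100.0)]
env = lower_envelope(funcs, [0, 1, 2], 0.0, 10.0)
print([k for _, _, k in env])  # [1, 0, 1]
```

Removing the functions on the envelope and repeating the construction yields the next-ranked envelope, in the spirit of Observation 3 on the next slide.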
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR1 to TR7 over [tb, te] with times t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the 4r band]
72
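Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e.,
$$PP_i(a) = \{ P \mid P \text{ connects } p_i \text{ and } p_{i+1} \wedge \mathrm{MinTime}(P) \le t_{i+1} - t_i \}$$
Example: if the pdf of selecting a possible path is uniform, each path in PPi(a) is selected with probability 1/|PPi(a)|.
Given a path Pj ∈ PPi(a), the possible locations of a moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all positions p ∈ Pj from which a can reach pi (respectively, pi+1) within time period t - ti (respectively, ti+1 - t), i.e.,
$$PL_{P_j}(a, t) = \{ p \in P_j \mid \mathrm{MinTime}(p, p_i) \le t - t_i \wedge \mathrm{MinTime}(p, p_{i+1}) \le t_{i+1} - t \}$$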
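The set of possible paths can be enumerated with a depth-first search. The search prunes any prefix whose minimal travel time already exceeds ti+1 - ti. Below is a sketch with a uniform path distribution. The graph, vertex names, and `possible_paths` helper are hypothetical. The enumeration is exhaustive and therefore exponential in the worst case.

```python
def possible_paths(graph, p_i, p_next, budget):
    """Simple paths p_i -> p_next with minimal travel time <= budget.
    graph: {u: {v: minimal travel time of edge (u, v)}}."""
    paths = []

    def dfs(u, path, cost):
        if u == p_next:
            paths.append((list(path), cost))
            return
        for v, w in graph.get(u, {}).items():
            if v not in path and cost + w <= budget:   # prune over-budget prefixes
                path.append(v)
                dfs(v, path, cost + w)
                path.pop()

    dfs(p_i, [p_i], 0.0)
    return paths


g = {"a": {"b": 2, "c": 3, "d": 8}, "b": {"d": 2}, "c": {"d": 3}}
pp = possible_paths(g, "a", "d", budget=6.0)          # t_{i+1} - t_i = 6
uniform = {tuple(p): 1.0 / len(pp) for p, _ in pp}    # uniform pdf over possible paths
print(uniform)  # {('a', 'b', 'd'): 0.5, ('a', 'c', 'd'): 0.5}
```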
73
Combine the two probabilities…
[Figure: possible paths of object a, and its possible locations when t = 4; segment from m to p1]
Probability of falling inside segment mp1: 0.5 × 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t (e.g., uniform pdf):
$$\Pr(a \in S, t) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\bigl(PL_p(a, t) \in S\bigr)$$
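The combination can be sketched under a few assumptions. Positions along each path are measured as offsets, the possible locations on a path are uniform on an interval of offsets, and S is either an interval of offsets on that path or absent from it. This representation and the name `prob_in_segment` are hypothetical. The example reproduces the 0.5 × 0.5 = 0.25 computation above.

```python
def prob_in_segment(paths):
    """paths: [(path probability, (loc_lo, loc_hi), (seg_lo, seg_hi) or None)].
    Possible locations are assumed uniform on [loc_lo, loc_hi] (loc_hi > loc_lo)."""
    total = 0.0
    for p_path, (lo, hi), seg in paths:
        if seg is None:                     # segment S does not lie on this path
            continue
        overlap = max(0.0, min(hi, seg[1]) - max(lo, seg[0]))
        total += p_path * overlap / (hi - lo)
    return total


# Two equally likely paths; on the first, S covers half of the possible locations.
print(prob_in_segment([(0.5, (0.0, 10.0), (0.0, 5.0)),
                       (0.5, (0.0, 8.0), None)]))  # 0.25
```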
74
Construction / Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and their probabilities)
  – The probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
› Data structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, for each edge
  – Trajectories list
    › Store samples ordered by timestamp
    › Assign pointer to set of possible paths
    › Earliest-arriving and latest-departure times stored for vertices
    › On-disk, retrieved on demand
76
› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time interval intersects with tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" along road-network segments from q through vertices A to E; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]
77
Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty: the flip-side
› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches
  – Define acceptable error
  – Save on data size
John Hershberger, Jack Snoeyink. "Speeding up the Douglas-Peucker Line-simplification Algorithm". SSHD 1997.
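For reference, here is the classic recursive Douglas-Peucker simplification on a 2D polyline with acceptable error `eps`. It is a plain textbook version, not the speed-up from the cited paper. It also ignores time and topology, which the next slide raises as open concerns. The function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def douglas_peucker(points, eps):
    """Keep the farthest point if it deviates more than eps; recurse on both halves."""
    if len(points) < 3:
        return list(points)
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= eps:
        return [points[0], points[-1]]          # all points within the acceptable error
    left = douglas_peucker(points[:idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right                    # do not duplicate the split point


print(douglas_peucker([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)], 0.1))
# [(0, 0), (2, 0), (2, 2)]
```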
79
Uncertainty: the other flip-side
› What are the important properties that must not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just a "Z" axis… objects cannot travel back in time
  – How long may the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
› Stochastic methods for uncertain spatial data
  – Probabilistic results
  – Time dimension
vs.
› Geometric models for uncertain spatio-temporal data
  – Probabilistic results
  – Time dimension
82
Merge the approaches
› Idea
  – Discretize time
  – For each time, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parameterized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE 16(9): 1112-1127, 2004.
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443.
83
A highway example…
[Figure: position (pos) of a car on a highway over time (t), with query region Q]
What is the probability that the car is in some area at least once during some time interval?
84
A highway example…
[Figure: as before, with speed bounds vmin and vmax]
What is the probability that the car is in some area at least once during some time interval?
85
A highway example…
[Figure: as before, with speed bounds vmin and vmax]
Assuming a uniform distribution…
86
A highway example…
[Figure: position over time with query region Q, annotated with 50% and 30%]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
87
A highway example…
[Figure: position over time with query region Q, annotated with 50% and 30%]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Violation of the speed constraints
88
A highway example…
[Figure: position over time with query region Q, annotated with 50% and 30%]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Consideration of dependency: 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: transition probabilities 0.4 (left), 0.2 (stay), 0.4 (right) for inner holes; 0.4 (stay) and 0.6 (move inwards) at the border]
92
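A simple example
› Initial position: the ball is in the third hole
› After first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After 40th hit: approximately 0.08 in each of the two border holes and 0.12 in each of the seven inner holes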
93
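How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$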
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
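The first two hits are plain vector-matrix products with the transition matrix above. The sketch builds M for 9 buckets and starts the ball in the third bucket. It then iterates much longer than 40 hits to show that the vector approaches the chain's stationary distribution (0.08, 0.12, …, 0.12, 0.08), which the 40-hit values on the slide are close to. Plain Python lists are used to keep the sketch dependency-free, and the helper names are hypothetical.

```python
def build_matrix(n=9):
    """Transition matrix of the board: interior 0.4 / 0.2 / 0.4, border 0.4 stay / 0.6 inwards."""
    M = [[0.0] * n for _ in range(n)]
    M[0][0], M[0][1] = 0.4, 0.6
    M[n - 1][n - 2], M[n - 1][n - 1] = 0.6, 0.4
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(v, M):
    """Row vector times matrix: v'[j] = sum_i v[i] * M[i][j]."""
    n = len(M)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]


M = build_matrix()
v0 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # ball starts in the third bucket
v1 = step(v0, M)
v2 = step(v1, M)
print([round(x, 2) for x in v1])  # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in v2])  # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]

v_long = v0
for _ in range(1000):                                 # converges to the stationary distribution
    v_long = step(v_long, M)
print([round(x, 2) for x in v_long])
```

Because M is sparse, real systems store it in a sparse format. The dense lists here just keep the sketch short.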
95
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
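One simple way to learn transition probabilities from historical data is maximum-likelihood counting: count the observed transitions between consecutive ticks and normalize each row. The slides do not specify the estimator, so this is only an assumption. State names and trajectories below are hypothetical. A dict-of-dicts keeps the matrix sparse.

```python
from collections import defaultdict


def estimate_transitions(sequences):
    """Count transitions between consecutive ticks and normalize each row.
    Returns a sparse matrix {from_state: {to_state: probability}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    M = {}
    for a, row in counts.items():
        total = sum(row.values())
        M[a] = {b: c / total for b, c in row.items()}
    return M


history = [["A", "B", "B", "C"], ["A", "B", "C", "C"]]   # hypothetical state sequences
M = estimate_transitions(history)
print(M)
```

States never observed as a source get no row. In practice some smoothing or a fallback is needed for them.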
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2, s3 with the transition probabilities of M]
Note: there is an exponential number of possible paths the car might take.
ST-Window Queries
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
[Figure: trellis of states s1, s2, s3 over t = 0, 1, 2, 3 with the transition probabilities of M]
t = 0: (1, 0, 0)
98
99
ST-Window Queries
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
[Figure: trellis of states s1, s2, s3 over t = 0, 1, 2, 3 with the transition probabilities of M]
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
100
ST-Window Queries
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
[Figure: trellis of states s1, s2, s3 over t = 0, 1, 2, 3 with the transition probabilities of M]
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
101
ST-Window Queries
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
[Figure: trellis of states s1, s2, s3 over t = 0, 1, 2, 3 with the transition probabilities of M]
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)
102
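ST-Window Queries
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
[Figure: trellis of states s1, s2, s3 over t = 0, 1, 2, 3 with the transition probabilities of M]
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2); the 0.8 in s2 has already hit the window
t = 3: the remaining 0.2 in s3 splits into 0.16 (moves to s2, hit) and 0.04 (stays in s3)
Result = 0.8 + 0.16 = 0.96
› A solution based on matrix multiplications introduces a new state for the trajectories that hit the window, and two matrices.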
103
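ST-Window Queries
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
States s1, s2, s3, and the new absorbing state s4 (window hit):
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$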
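The M⁻/M⁺ construction fits in a few lines. M⁻ is M extended by the absorbing state s4, and M⁺ additionally redirects every transition into a window state (s1 or s2) to s4. The sketch hard-codes the example's M, window states, and window times. It assumes the window starts after t = 0, so the initial observation itself is not checked. The helper names are hypothetical.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def extend(M, window_states, redirect):
    """Add absorbing state n ('window hit'); if redirect, entering a window state means hit."""
    n = len(M)
    E = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            target = n if (redirect and j in window_states) else j
            E[i][target] += M[i][j]
    E[n][n] = 1.0
    return E


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
window_states, window_times = {0, 1}, {2, 3}        # s1, s2 during T = [2, 3]
M_minus = extend(M, window_states, redirect=False)
M_plus = extend(M, window_states, redirect=True)

v = [1.0, 0.0, 0.0, 0.0]                             # observed in s1 at t = 0
for t in range(1, 4):
    v = vec_mat(v, M_plus if t in window_times else M_minus)
print(round(v[3], 4))  # 0.96
```

The cost is one sparse vector-matrix product per tick, instead of enumerating the exponentially many paths.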
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 (left) and observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where the "hit" worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: trellis of states s1, s2, s3 over t = 0, 1, 2, 3]
106
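Multiple Observations
[Figure: trellis over t = 0, 1, 2, 3 with the classes S1-, S2-, S3- (window not yet intersected) and S1+, S2+, S3+ (window intersected)]
State order (S1-, S2-, S3-, S1+, S2+, S3+):
t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$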
107
Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
$$P(\text{hit} \mid s_3) = \frac{P(s_3 \mid \text{hit})\,P(\text{hit})}{P(s_3)} = \frac{P(s_3 \wedge \text{hit})}{P(s_3 \wedge \text{hit}) + P(s_3 \wedge \neg\text{hit})}$$
With the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+):
$$P(\text{hit} \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
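The 2|S|-state construction and the Bayes step can be reproduced directly. Index j stands for Sj- and n + j for Sj+. From a "-" state, a transition into a window state during the window moves to the "+" copy, and "+" states stay "+". The sketch reuses the example's M, window {s1, s2}, and window times {2, 3}. The helper names are hypothetical.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def split_chain(M, window_states, in_window):
    """2|S| states: index j = S_j^- (window not yet hit), n + j = S_j^+ (already hit)."""
    n = len(M)
    E = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            hit = in_window and j in window_states
            E[i][n + j if hit else j] += M[i][j]      # from S_i^-
            E[n + i][n + j] += M[i][j]                # from S_i^+: stays hit
    return E


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
M_minus = split_chain(M, {0, 1}, in_window=False)     # block-diagonal (M, M)
M_plus = split_chain(M, {0, 1}, in_window=True)

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]                     # S1^- at t = 0
for t in range(1, 4):
    v = vec_mat(v, M_plus if t in (2, 3) else M_minus)

# Bayes: condition on the second observation "object in s3 at t = 3".
p_hit_and_s3, p_nohit_and_s3 = v[3 + 2], v[2]
posterior = p_hit_and_s3 / (p_hit_and_s3 + p_nohit_and_s3)
print([round(x, 2) for x in v], round(posterior, 2))
# [0.0, 0.0, 0.04, 0.48, 0.16, 0.32] 0.89
```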
108
Summary
› Pros
  – Allows answering time-parameterized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: R-tree leaf entry for object o in the (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = number of samples that satisfy the query divided by the total number of drawn samples
› But how can samples be drawn efficiently, such that they conform with the observations?
› Solution: adaptation of transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
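Samples can be made to conform with a later observation by reweighting each transition with the probability of still reaching that observation. These backward probabilities are computed once, backwards in time. This is one standard way to adapt transition matrices. It is not claimed to be the exact construction of the cited paper. The sketch reuses the window-query example (start in s1 at t = 0, observed in s3 at t = 3), so the estimate can be checked against 0.32 / 0.36. The helper names are hypothetical.

```python
import random


def backward_probs(M, target, steps):
    """beta[t][i] = P(in state `target` at time `steps` | in state i at time t)."""
    n = len(M)
    beta = [[0.0] * n for _ in range(steps + 1)]
    beta[steps][target] = 1.0
    for t in range(steps - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(M[i][j] * beta[t + 1][j] for j in range(n))
    return beta


def sample_conditioned(M, beta, start, rng):
    """Sample a state sequence from `start` at t = 0 that conforms with the observation.
    Adapted row at time t: M[i][j] * beta[t+1][j] / beta[t][i] (rng.choices normalizes)."""
    if beta[0][start] == 0.0:
        raise ValueError("observations are inconsistent with the model")
    path, state = [start], start
    for t in range(len(beta) - 1):
        weights = [M[state][j] * beta[t + 1][j] for j in range(len(M))]
        state = rng.choices(range(len(M)), weights=weights)[0]
        path.append(state)
    return path


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
beta = backward_probs(M, target=2, steps=3)          # object observed in s3 at t = 3
rng = random.Random(42)
samples = [sample_conditioned(M, beta, 0, rng) for _ in range(20_000)]
# Fraction of sampled trajectories that are in s1 or s2 at some t in [2, 3].
hits = sum(1 for p in samples if any(p[t] in (0, 1) for t in (2, 3)))
print(hits / len(samples))  # close to 0.32 / 0.36
```

Every sample ends in the observed state, so no sample is wasted on trajectories that contradict the observations. Rejection sampling from the unadapted chain would discard such samples.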
112
› Application: RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
21
22
23
24
25
26
27
Too many possible worlds
28
Querying Uncertain Data Complexity
Naive Query Processing is exponential in the number of objects
Are there efficient solutions to query uncertain spatial data
In general No
ldquoThe problem of answering queries on a probabilistic database D is -complete in the size of Dldquo[DalviSuciu04]
Can be reduced to uncertain spatial databases
But Specific queries may have polynomial time solutions[DalviSuciu04] Dalvi N N and Suciu D Efficient query evaluation on probabilistic databasesIn Proceedings of the 30th International Conference on Very Large Data Bases (VLDB)(2004)
29
q
Querying Uncertain Data Running Example
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds
30
Data Cleaning Aggregation
q
C D
H
I
Ignore Uncertainty (Data Cleaning)
Replace uncertain objects by a deterministic ldquobest guessrdquo
Expected Positions
Most-likely Positions
hellip
Query results are not reliable
Query results may be biased
31
Equivalent Worlds An intuition
q
Observation 1
For any possible world and any possible world derived from by changing the position of object the following equivalence holds
32
Equivalent Worlds An intuition
q
Observation 1 allows to discard objects outside ofacute the query region
Remaining number of equivalent classes of possible worlds
33
Equivalent Worlds An intuition
Observation 2
For each remaining object we only need to consider the predicate ldquoinsiderdquo
q
C D
H
I
Querying Uncertain Spatial Data
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside".
Remaining number of equivalence classes of possible worlds:
[Figure: remaining objects C, D, H, I and the query region centered at q]
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region; information about the concrete result objects can be discarded.
[Figure: remaining objects C, D, H, I and the query region centered at q]
Generating Functions
[Figure: objects A, B, C with inside-probabilities 0.8, 0.2 and 0.4; query region centered at q]
Main idea: use polynomial multiplication to enumerate possible results.
Generating Functions
[Figure: the same objects, each relabeled x; inside-probabilities 0.8, 0.2 and 0.4]
Main idea: use polynomial multiplication to enumerate possible results.
Observation 3: anonymize objects, i.e. substitute A, B, C by x.
Each monomial $c\,x^k$ implies that the probability of having exactly $k$ results equals $c$.
Generating Functions: Formally
For each object $o_i$, let $p_i$ denote the probability that $o_i$ is located inside the query region. Consider the following generating function [2]:
$$\mathcal{F}(x) = \prod_{i} \big( (1 - p_i) + p_i\,x \big)$$
In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.
[2] Jian Li, Barna Saha and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009).
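The expansion of the generating function can be sketched in Python by multiplying in one factor (1 - p_i) + p_i·x at a time and keeping only the coefficient list. This is a minimal illustration of the technique, not the implementation from [2]; the input list of inside-probabilities is assumed to be given already (i.e. after applying Observations 1 to 3). Each multiplication costs O(n), so the whole expansion is O(n²) instead of the 2^n worlds of naive enumeration.

```python
def count_distribution(probs):
    """Expand prod_i ((1 - p_i) + p_i * x); entry k is P(exactly k objects inside)."""
    coeffs = [1.0]  # the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside: exponent of x unchanged
            nxt[k + 1] += c * p      # object inside: exponent of x grows by one
        coeffs = nxt
    return coeffs


# Inside-probabilities of the running example.
print(count_distribution([0.8, 0.2, 0.4]))
```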
Count Queries on Uncertain Data
Example
[Figure: objects A, B, C with inside-probabilities 0.8, 0.2 and 0.4; query region centered at q]
Count Queries on Uncertain Data
Example
$$\mathcal{F}(x) = (0.2 + 0.8x)(0.8 + 0.2x)(0.6 + 0.4x) = (0.16 + 0.68x + 0.16x^2)(0.6 + 0.4x) = 0.096 + 0.472x + 0.368x^2 + 0.064x^3$$
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time.
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.
III. The probability of a class can be computed in polynomial time.
Approximated Results: Sampling
Materialize a set S of possible worlds.
The samples are drawn independently and without bias.
Evaluate the query predicate on each world.
The distribution of the sampled results is an unbiased estimate of the true distribution of results.
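A minimal Monte-Carlo sketch of this idea for the count query, assuming (as in the running example) that each object's probability of being inside the query region is known and that objects are independent. Each iteration materializes one possible world and evaluates the predicate on it; the relative frequencies estimate the count distribution.

```python
import random


def sample_count_distribution(probs, n_samples, seed=0):
    """Monte-Carlo estimate of P(exactly k objects inside), k = 0..n.

    probs[i] is the probability that object i is inside the query region;
    each sample draws one possible world with independent objects.
    """
    rng = random.Random(seed)
    counts = [0] * (len(probs) + 1)
    for _ in range(n_samples):
        k = sum(1 for p in probs if rng.random() < p)  # evaluate the predicate
        counts[k] += 1
    return [c / n_samples for c in counts]


print(sample_count_distribution([0.8, 0.2, 0.4], n_samples=100))
```

With only 100 samples the estimates can be noticeably off, which motivates attaching confidence intervals to them.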
Sampling Example
[Figure: objects A, B, C with inside-probabilities 0.8, 0.2 and 0.4; query region centered at q]
Drawing 100 possible worlds may yield the following estimators:
Compare to the exact probabilities:
No indication of the reliability or confidence of the estimates.
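Sampling: Confidences
[Figure: objects A, B, C with inside-probabilities 0.8, 0.2 and 0.4; query region centered at q]
Drawing 100 possible worlds may yield the following estimators:
Use statistical methods to assess the quality of the estimators, e.g. the Wald test:
$$\hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{|S|}}$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-percentile of the standard normal distribution.
At a significance level of α, the true probability lies in the interval [0.442, 0.638].
True probability: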
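The Wald interval is a one-line computation. The sketch below uses Python's standard `statistics.NormalDist` for the normal percentile. The inputs are assumptions chosen for illustration: a point estimate p̂ = 0.54 from |S| = 100 sampled worlds and α = 0.05; with these values the formula yields an interval that rounds to [0.442, 0.638].

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2)-percentile of N(0, 1)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lo, hi = wald_interval(0.54, 100)
print(round(lo, 3), round(hi, 3))
```

The half-width shrinks with 1/sqrt(|S|), so halving the interval requires four times as many sampled worlds.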
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solutions for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
• Models
  • Motion
  • Location Uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing
• Uncertainty: the flip-side
Model of a Trajectory
[Figure: a 3D trajectory in (X, Y, Time) space up to the present time, and its projection onto the X-Y plane as a 2D route]
The approximation does not capture acceleration/deceleration.
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g. hurricanes.
Bart Kuijpers, Harvey J. Miller, Walied Othman. Kinetic space-time prisms. GIS 2011.
Mobility Models
Sequence of (location, time) updates (e.g. GPS-based)
Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
(location, time, velocity vector) updates
Modeling Uncertainty: Location Uncertainty Models
Imprecision of various sources (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze. On a Generic Uncertainty Model for Position Information. QuaCon 2009.
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumed exact location samples
  • Unknown motion, but bounded maximum speed, in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov. Querying Imprecise Data in Moving Object Environments. ICDE 2003.
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li. Uncertain Range Queries for Necklaces. MDM 2010.
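Under Model 1, a location belongs to the bead between two exact samples if it is reachable from the first sample in the elapsed time and the second sample is still reachable from it in the remaining time. The sketch below encodes this test for free-space Euclidean motion. The function name and parameters are hypothetical, and road-network distances (Model 2) would replace the Euclidean distance.

```python
from math import hypot


def in_bead(x, t, p1, t1, p2, t2, v_max):
    """True if location x at time t lies in the bead (space-time prism).

    Assumes exact samples p1 at time t1 and p2 at time t2, and a maximum
    speed v_max in between: x must be reachable from p1 within t - t1,
    and p2 must be reachable from x within t2 - t.
    """
    if not (t1 <= t <= t2):
        return False
    d1 = hypot(x[0] - p1[0], x[1] - p1[1])
    d2 = hypot(p2[0] - x[0], p2[1] - x[1])
    return d1 <= v_max * (t - t1) and d2 <= v_max * (t2 - t)


# Hypothetical samples: (0, 0) at t = 0 and (10, 0) at t = 10, with v_max = 2.
print(in_bead((5.0, 8.0), 5.0, (0.0, 0.0), 0.0, (10.0, 0.0), 10.0, 2.0))
```

At each time instant the bead's cross-section is the intersection of two disks, which is why the projection of a bead onto the plane is an ellipse-like region.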
Modeling Uncertain Trajectories: Model 2
Constraints:
Motion along the road network
Uncertainty restricted to road segments
[Figure: samples p1 at t1 and p2 at t2 on a road network, with query location q and time t]
Bart Kuijpers, Walied Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009).
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain. Managing Uncertainty in Moving Objects Databases. ACM TODS 2004.
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g. possible routes to be taken
Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement
Example: inside the range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011.
Continuous NN for Trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq, between [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) space at T = tb, T = t1 and T = te]
Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized.
Yufei Tao, Dimitris Papadias, Qiongmao Shen. Continuous Nearest Neighbor Search. VLDB 2002.
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq, between [tb, te]
[Figure: uncertain trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) space at T = tb, tb1, t1, t11 and te]
Observations: adding uncertainty to the previous example implies that
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: IPAC-NN tree. Root [TrQ, tb, te]; first-level nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); deeper nodes such as (Tri11, tb, t11, D11), (Tri12, t11, t12, D12) and (Tri121, t11, t121, D121)]
Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]
Children: with the parent excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
Instantaneous (i.e. spatial) NN Query when the Querying Object is Crisp (i.e. no uncertainty)
[Figure: query point Q; uncertain objects Tr1 to Tr5 as disks; radii Rmin, Rmax and Rd around Q]
Trq = 2D point Q; the possible locations of the other objects are bounded by circles.
Observation: let Rmax be the smallest maximum distance from Q over all objects. Any object whose minimum distance from Q is greater than Rmax has a "0" probability of being the NN of Q.
Example: Tr4 cannot have a non-zero probability of being Q's NN.
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \iint_{\lVert (x,y) - Q \rVert \le R_d} \mathrm{pdf}_i(x,y)\,dx\,dy$$
The probability that a given object Trj is the NN of Q is given by
$$P^{NN}_j = \int_{0}^{R_{max}} \frac{dP_j(r)}{dr} \prod_{i \ne j} \big(1 - P_i(r)\big)\,dr$$
i.e. Trj is the NN if it is at distance Rd AND all the others are at distances > Rd, integrated over all possible Rd (NOTE: the upper bound is Rmax…)
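The sketch below combines the Rmax pruning rule with a Monte-Carlo estimate of each object's NN probability. It swaps in sampling for the integral above (it does not evaluate the integral), and it assumes hypothetical uniform disks of possible locations and a crisp query point Q. Pruned objects never win, because the object that realizes Rmax is always at least as close.

```python
import random
from math import cos, hypot, pi, sin, sqrt


def min_max_dist(q, center, radius):
    """Minimum and maximum distance from point q to a disk of possible locations."""
    d = hypot(center[0] - q[0], center[1] - q[1])
    return max(0.0, d - radius), d + radius


def prune(q, disks):
    """Keep objects whose minimum distance does not exceed Rmax (smallest maximum distance)."""
    r_max = min(min_max_dist(q, c, r)[1] for c, r in disks)
    return [i for i, (c, r) in enumerate(disks) if min_max_dist(q, c, r)[0] <= r_max]


def sample_in_disk(rng, center, radius):
    rho = radius * sqrt(rng.random())  # sqrt makes the density uniform over the area
    theta = 2 * pi * rng.random()
    return center[0] + rho * cos(theta), center[1] + rho * sin(theta)


def nn_probabilities(q, disks, n_samples=20000, seed=1):
    """Monte-Carlo estimate of each object's probability of being q's NN."""
    rng = random.Random(seed)
    candidates = prune(q, disks)
    wins = [0] * len(disks)
    for _ in range(n_samples):
        best, best_d = None, float("inf")
        for i in candidates:
            x, y = sample_in_disk(rng, *disks[i])
            d = hypot(x - q[0], y - q[1])
            if d < best_d:
                best, best_d = i, d
        wins[best] += 1
    return [w / n_samples for w in wins]


# Hypothetical disks ((center), radius); the third one is pruned.
disks = [((3.0, 0.0), 1.0), ((-3.0, 0.0), 1.0), ((20.0, 0.0), 1.0)]
print(prune((0.0, 0.0), disks))
print(nn_probabilities((0.0, 0.0), disks))
```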
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query TrQ and objects Tr1 to Tr5; radii Rmin and Rmax; points Z1 and Z2]
dist(Z1, Z2) < Rmax
We can no longer "prune" Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq.
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
[Figure: uniform pdfs of possible locations (disks of radius r) for Trq, Tr1 and Tr2 in the x-y plane]
Example: assuming a uniform pdf of possible locations; the figure marks two of the points used when evaluating the actual probability of Tr1 being within distance Rd from Trq.
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$\Pr\big(\lVert V_i - V_q \rVert \le R_d\big)$$
Define Viq = Vi - Vq (cross-correlation) as another random variable, Viq = Vi + (-Vq), which, due to independence, implies that the pdf of Viq is the convolution of the pdfs of Vi and -Vq.
[Figure: the resulting pdfs pdf(Tr1 - Trq) and pdf(Tr2 - Trq) in the x-y plane]
Let Viq = Vi + (-Vq).
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq).
Property 2: assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids (with respect to the vertical axis carrying the pdf values). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
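Working with the single random variable Viq = Vi + (-Vq) turns the quadruple integral into a question about one 2D distribution. The sketch below estimates Pr(||Vi - Vq|| <= Rd) by sampling Viq directly and also reports the sample mean of Viq, which Property 1 predicts to be close to Ci - Cq. Uniform disks are an assumption made for illustration; the estimate is Monte-Carlo, not the closed-form convolution.

```python
import random
from math import cos, hypot, pi, sin, sqrt


def sample_in_disk(rng, center, radius):
    rho = radius * sqrt(rng.random())  # uniform density over the disk
    theta = 2 * pi * rng.random()
    return center[0] + rho * cos(theta), center[1] + rho * sin(theta)


def within_distance_prob(disk_i, disk_q, r_d, n_samples=50000, seed=2):
    """Estimate Pr(||V_i - V_q|| <= r_d) by sampling V_iq = V_i + (-V_q).

    disk_i, disk_q: ((x, y), radius) of independent uniform location pdfs.
    Also returns the sample mean of V_iq (close to C_i - C_q by Property 1).
    """
    rng = random.Random(seed)
    hits, sx, sy = 0, 0.0, 0.0
    for _ in range(n_samples):
        xi, yi = sample_in_disk(rng, *disk_i)
        xq, yq = sample_in_disk(rng, *disk_q)
        dx, dy = xi - xq, yi - yq
        sx, sy = sx + dx, sy + dy
        if hypot(dx, dy) <= r_d:
            hits += 1
    return hits / n_samples, (sx / n_samples, sy / n_samples)


print(within_distance_prob(((3.0, 0.0), 1.0), ((0.0, 0.0), 1.0), 3.0))
```

For two disks of radius r, Viq is supported on a disk of radius 2r around Ci - Cq, which is where the 4r pruning distance used later comes from.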
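The Continuous Aspect of the Probabilistic NN Queries
Let Vxiq = vxi - vxq and Vyiq = vyi - vyq denote the corresponding components of the velocity of the object whose expected location moves along TRiq.
Then, with (Xiq, Yiq) denoting the relative expected location at t = 0, the distance of the expected location along TRiq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = V_{x,iq}^2 + V_{y,iq}^2,\; B = 2\,(X_{iq} V_{x,iq} + Y_{iq} V_{y,iq}),\; C = X_{iq}^2 + Y_{iq}^2$$
which is a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).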
The Continuous Aspect of the Probabilistic NN Queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0).
Observation: throughout the time interval of interest [tb, te] of the query, two distance functions diq(t) and djq(t) can intersect at most twice (NOTE: setting diq(t) = djq(t) and squaring both sides leaves a quadratic equation in t). Call such pairwise intersections critical time-points.
Example: [Figure: lower envelope LE12 of the two distance trajectories TR1 and TR2, with critical time-points t11 and t12 after tb; time on the horizontal axis, dist on the vertical axis]
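The critical time-points of two distance functions can be computed in closed form. The sketch assumes linear relative motion, so that each squared distance is a quadratic A·t² + B·t + C; equating two of them leaves a polynomial of degree at most two, hence at most two crossings. The helper names are hypothetical, and the sketch also accepts A = 0 (an object at constant distance from the query).

```python
from math import sqrt


def distance_coeffs(rel_pos, rel_vel):
    """(A, B, C) with d(t)^2 = A t^2 + B t + C for linear relative motion.

    rel_pos: relative expected position at t = 0; rel_vel: (Vx_iq, Vy_iq).
    """
    x, y = rel_pos
    vx, vy = rel_vel
    return vx * vx + vy * vy, 2 * (x * vx + y * vy), x * x + y * y


def distance(abc, t):
    a, b, c = abc
    return sqrt(max(0.0, a * t * t + b * t + c))  # clamp tiny negative rounding


def critical_times(abc_i, abc_j, tb, te, eps=1e-12):
    """Times in [tb, te] where d_iq(t) = d_jq(t); squaring gives a quadratic."""
    a, b, c = (u - v for u, v in zip(abc_i, abc_j))
    if abs(a) < eps:
        if abs(b) < eps:
            return []  # never equal, or identical everywhere: no isolated crossing
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = sqrt(disc)
        roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]


# Hypothetical example: Tr1 stays at distance 1; Tr2 starts at (-3, 0) with velocity (1, 0).
d1 = distance_coeffs((1.0, 0.0), (0.0, 0.0))
d2 = distance_coeffs((-3.0, 0.0), (1.0, 0.0))
print(critical_times(d1, d2, 0.0, 10.0))
```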
The Continuous Aspect of the Probabilistic NN Queries
Q: how can we efficiently construct the lower envelope of the whole collection of distance functions?
A: a divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: the envelopes LE12 (of TR1, TR2) and LE34 (of TR3, TR4) are merged into LE1234, introducing new critical time-points t1new and t2new]
Complexity of the lower-envelope construction: O(N log N) (N = number of trajectories).
Now, how about the IPAC-NN tree?
The lower envelope provides a continuous pruning criterion, i.e. any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g. Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions of TR1 to TR7 over [tb, te] with critical times t1, t2, t3; the lower envelope, the 4r pruning band, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
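Ranked lower envelopes can be illustrated with a deliberately simple brute-force sketch; it is not the O(N log N) divide-and-conquer construction. It collects all pairwise critical time-points (O(N²) pairs), and since the ranking of the distance functions cannot change between two consecutive crossings, it sorts the functions once per sub-interval. Level 0 is the lower envelope, and level 1 is the envelope of the leftovers, i.e. the second-closest object over time. Distance functions are given as (A, B, C) with d(t)² = A·t² + B·t + C, an assumption carried over from linear relative motion.

```python
from itertools import combinations
from math import sqrt


def distance(abc, t):
    a, b, c = abc
    return sqrt(max(0.0, a * t * t + b * t + c))


def crossings(f, g, tb, te, eps=1e-12):
    """Times strictly inside (tb, te) where two distance functions are equal."""
    a, b, c = (u - v for u, v in zip(f, g))
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = sqrt(disc)
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return [t for t in roots if tb < t < te]


def ranked_envelopes(funcs, tb, te, levels=2):
    """Level k lists (object index, t_start, t_end) pieces of the (k+1)-th lower envelope."""
    cuts = {tb, te}
    for f, g in combinations(funcs, 2):
        cuts.update(crossings(f, g, tb, te))
    cuts = sorted(cuts)
    envelopes = [[] for _ in range(min(levels, len(funcs)))]
    for t0, t1 in zip(cuts, cuts[1:]):
        mid = (t0 + t1) / 2  # the ranking is constant between consecutive crossings
        order = sorted(range(len(funcs)), key=lambda i: distance(funcs[i], mid))
        for k, pieces in enumerate(envelopes):
            if pieces and pieces[-1][0] == order[k]:
                pieces[-1] = (order[k], pieces[-1][1], t1)  # extend the previous piece
            else:
                pieces.append((order[k], t0, t1))
    return envelopes


# Hypothetical distance functions: constant distance 1, and |t - 3|.
funcs = [(0.0, 0.0, 1.0), (1.0, -6.0, 9.0)]
for level in ranked_envelopes(funcs, 0.0, 10.0):
    print(level)
```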
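Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e.
$$PP_i(a) = \{\, P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mintime}(P) \le t_{i+1} - t_i \,\}$$
Example: if the pdf of selecting a possible path is uniform, then $\Pr(P_j) = 1 / |PP_i(a)|$ for every $P_j \in PP_i(a)$.
Given a path Pj ∈ PPi(a), the possible locations of a moving object a with respect to Pj at t ∈ [ti, ti+1] are all the positions p ∈ Pj that a can reach from pi within time t - ti and from which it can reach pi+1 within time ti+1 - t, i.e.
$$PL_{P_j}(a, t) = \{\, p \in P_j \mid \mathrm{mintime}(p_i, p) \le t - t_i,\ \mathrm{mintime}(p, p_{i+1}) \le t_{i+1} - t \,\}$$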
Combine the two probabilities…
[Figure: possible paths of object a, and its possible locations when t = 4; point m]
Probability of falling inside segment m-p1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr\big(a(t) \in S\big) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\big(a(t) \in S \cap PL_p(a, t) \mid p\big)$$
e.g. with a uniform pdf.
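The sum above can be sketched for the uniform case by representing, for each possible path, the possible locations at time t as an offset interval along the path. The data layout is a hypothetical simplification: segment S is given as an offset interval on each path it lies on, and the location pdf on a path is uniform over its possible-location interval.

```python
def prob_in_segment(paths):
    """Pr(a(t) in S) = sum over possible paths of Pr(path) * Pr(a(t) in S | path).

    paths: list of (path_probability, (lo, hi), s) where (lo, hi) is the interval of
    possible locations along that path at time t (offsets from p_i) and s is the
    (start, end) offset interval of segment S on that path, or None if S is not on it.
    Locations are assumed to be uniformly distributed over (lo, hi).
    """
    total = 0.0
    for p_path, (lo, hi), s in paths:
        if s is None:
            continue
        if hi == lo:  # a single possible location
            total += p_path if s[0] <= lo <= s[1] else 0.0
            continue
        overlap = max(0.0, min(hi, s[1]) - max(lo, s[0]))
        total += p_path * overlap / (hi - lo)
    return total


# Hypothetical numbers: two equally likely paths; on the first, S covers half of the
# possible locations at time t, and S does not lie on the second path.
print(prob_in_segment([(0.5, (0.0, 10.0), (5.0, 10.0)), (0.5, (0.0, 8.0), None)]))
```

These hypothetical inputs reproduce the 0.5 · 0.5 = 0.25 combination of path probability and location probability.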
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need:
– The collection of all the possible paths (and their probabilities)
– The probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1.
At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
Processing
› Data Structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D for each edge
– Trajectories List
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieved on demand
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time intervals intersect with tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" along road-network segments around q (vertices A to E), with earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]
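The network-expansion step can be sketched as a Dijkstra search from q that stops at network distance r and records parent pointers, whose (parent, vertex) pairs form the expansion tree. This is a simplified, hypothetical version: the graph format is an assumption, and edges that are only partially within distance r are not cut at the boundary, as a full implementation would need to do.

```python
import heapq


def expansion_tree(graph, q, r):
    """Dijkstra-style expansion from q, bounded by network distance r.

    graph: {vertex: [(neighbor, edge_length), ...]}.
    Returns {vertex: (network distance, parent)}; the (parent, vertex) pairs
    are the edges of the expansion tree.
    """
    best = {q: (0.0, None)}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u][0]:
            continue  # outdated heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < r and (v not in best or nd < best[v][0]):
                best[v] = (nd, u)
                heapq.heappush(heap, (nd, v))
    return best


# Hypothetical road network around q (undirected edges listed in both directions).
graph = {
    "q": [("A", 2.0), ("B", 5.0)],
    "A": [("q", 2.0), ("C", 2.0)],
    "B": [("q", 5.0)],
    "C": [("A", 2.0), ("D", 3.0)],
    "D": [("C", 3.0)],
}
print(expansion_tree(graph, "q", 5.0))
```

The vertices reached this way determine which edges' movement R-trees need to be fetched in the filtering step.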
Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
Uncertainty: the flip-side
› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink. "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997.
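As an example of a spatial/map approach that trades a bounded error for fewer points, here is the classic recursive Douglas-Peucker line simplification. It is the basic (worst-case quadratic) algorithm, not the speed-up from the cited paper, and it simplifies 2D polylines only; treating time as an extra dimension for trajectories would be a further assumption.

```python
from math import hypot


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    if a == b:
        return hypot(p[0] - a[0], p[1] - a[1])
    # |cross(b - a, p - a)| / |b - a|
    num = abs((b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]))
    return num / hypot(b[0] - a[0], b[1] - a[1])


def douglas_peucker(points, epsilon):
    """Keep only the points needed to stay within distance epsilon of the original."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [a, b]  # every intermediate point is within the acceptable error
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right  # the split point appears only once


print(douglas_peucker([(0, 0), (1, 1.5), (2, 3), (4, 0)], 0.1))
```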
Uncertainty: the other flip-side
› What are the important properties that should not be distorted by reduction?
– Topological properties (e.g. two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long can the data at the sink be allowed to remain "un-fresh"?
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
So far…
Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
Merge the approaches
› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443.
A highway example…
[Figure: position (pos) of a car over time (t) on a highway, with minimum and maximum speeds vmin and vmax, and a query region Q]
What is the probability that the car is in some area at least once during some time interval?
Assuming a uniform distribution…
[Figure: within Q, the car is in the queried area with probability 50% at one time instant and 30% at another]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Violation of the speed constraints!
Considering the dependency: 0.5
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g. Markov chain)
– Discrete time + continuous space (e.g. Harris chain)
– Continuous time + discrete space (e.g. Markov process)
– Continuous time + continuous space (e.g. Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
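A simple example
› Whenever the wooden board is hit, the ball stays in its hole or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: in the interior, the ball moves to each neighbouring hole with probability 0.4 and stays with 0.2; at the border, it stays with 0.4 and moves inward with 0.6]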
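A simple example
› Initial position: the ball is in the third of nine holes
› After the first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After the second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After the 40th hit: ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)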
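How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket; columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$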
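Propagating the ball's distribution is repeated vector-matrix multiplication: the distribution after k hits is x0 · M^k. The plain-Python sketch below (no NumPy, to stay self-contained) rebuilds the board's transition matrix and reproduces the distributions after the first and second hit; iterating long enough approaches the stationary distribution (0.08, 0.12, …, 0.12, 0.08).

```python
def step(x, M):
    """One hit of the board: row vector x times transition matrix M."""
    n = len(M)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]


def board_matrix(n=9):
    """Transition matrix of the board: 0.4/0.2/0.4 inside, 0.4 stay / 0.6 inward at the border."""
    M = [[0.0] * n for _ in range(n)]
    M[0][0], M[0][1] = 0.4, 0.6
    M[n - 1][n - 2], M[n - 1][n - 1] = 0.6, 0.4
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def distribution_after(x0, M, hits):
    x = list(x0)
    for _ in range(hits):
        x = step(x, M)
    return x


M = board_matrix()
x0 = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # the ball starts in the third hole
print([round(p, 2) for p in distribution_after(x0, M, 2)])
```

Real transition matrices over road networks are very sparse, so a production version would store only the non-zero entries per row.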
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M^{2} = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
Fusion of Model and Reality
› Discretization of time and space
– E.g. treat intersections as states, and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
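Learning transition probabilities from historical data can be sketched with the common maximum-likelihood estimate: count observed transitions per tick and normalize each row. This is an assumed, generic estimator, not necessarily the one used in the cited work. Storing rows as dictionaries keeps the matrix sparse, and a separate matrix could be estimated per time period or object group.

```python
from collections import defaultdict


def estimate_transitions(sequences):
    """Maximum-likelihood transition probabilities from observed state sequences.

    sequences: lists of states, one state per tick. Returns a sparse matrix as
    {from_state: {to_state: probability}}; unseen transitions are simply absent.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: c / total for b, c in row.items()}
    return matrix


# Hypothetical historical tracks, already mapped to states at 20-second ticks.
history = [["s1", "s3", "s2"], ["s1", "s3", "s3"], ["s2", "s1", "s3"]]
print(estimate_transitions(history))
```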
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
ST Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph with transitions s1→s3 (1.0), s2→s1 (0.6), s2→s3 (0.4), s3→s2 (0.8), s3→s3 (0.2)]
Note: we have an exponential number of possible paths the car might take.
ST Window Queries
[Figure: the state graph unrolled over t = 0, 1, 2, 3]
Starting in s1 at t = 0, the distribution over (s1, s2, s3) is propagated with M:
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2); the mass 0.8 in s2 lies inside the window and is set aside
t = 3: (0, 0, 0.2) · M = (0, 0.16, 0.04); again the mass 0.16 in s2 is set aside, leaving (0, 0, 0.04)
Result = 0.8 + 0.16 = 0.96
(Propagating without setting the hit mass aside would give (0.48, 0.16, 0.36) at t = 3.)
› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices
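ST Window Queries
$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0.6 & 0 & 0.4 & 0\\ 0 & 0.8 & 0.2 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0 & 0 & 0.4 & 0.6\\ 0 & 0 & 0.2 & 0.8\\ 0 & 0 & 0 & 1 \end{pmatrix}$$
States s1, s2, s3 plus the new absorbing state s4 for the winner trajectories; M- is applied for ticks outside T and M+ for ticks inside T:
$$(1,0,0,0) \xrightarrow{M^{-}} (0,0,1,0) \xrightarrow{M^{+}} (0,0,0.2,0.8) \xrightarrow{M^{+}} (0,0,0.04,0.96)$$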
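The two-matrix solution can be sketched directly: M- is M extended by an absorbing hit state, and M+ redirects every transition into a window state to that hit state. The function names are hypothetical and states are 0-based (s1 = 0, s2 = 1, s3 = 2). With the example's matrix, start state and window, the sketch returns 0.96.

```python
def step(x, M):
    """One tick: row vector x times matrix M."""
    n = len(M)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]


def window_matrices(M, window_states):
    """M_minus and M_plus over the original states plus one absorbing hit state."""
    n = len(M)
    m_minus = [row[:] + [0.0] for row in M] + [[0.0] * n + [1.0]]
    m_plus = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            # entering a window state during T redirects the mass to the hit state
            m_plus[i][n if j in window_states else j] += M[i][j]
    m_plus[n][n] = 1.0
    return m_minus, m_plus


def window_query(M, x0, window_states, t_start, t_end):
    """Probability that the object is in a window state at some tick in [t_start, t_end]."""
    m_minus, m_plus = window_matrices(M, window_states)
    x = list(x0) + [0.0]
    if t_start <= 0 <= t_end:  # already inside the window at t = 0
        for s in window_states:
            x[-1] += x[s]
            x[s] = 0.0
    for t in range(1, t_end + 1):
        x = step(x, m_plus if t >= t_start else m_minus)
    return x[-1]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
print(window_query(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3))  # window: s1, s2 during T = [2, 3]
```

The cost is one sparse vector-matrix product per tick, instead of enumerating the exponentially many paths.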
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with a single observation at t0 (left) and with two observations at t0 and t1 (right)]
Multiple Observations
› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
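Multiple Observations
Class vectors over (S1-, S2-, S3-, S1+, S2+, S3+), where ¬■ marks the classes Si- (window not hit) and ■ marks the classes Si+ (window hit):
t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0.16, 0.04, 0.48, 0, 0.32)
$$M^{-} = \begin{pmatrix} M & 0\\ 0 & M \end{pmatrix} \qquad M^{+} = \begin{pmatrix} 0&0&1&0&0&0\\ 0&0&0.4&0.6&0&0\\ 0&0&0.2&0&0.8&0\\ 0&0&0&0&0&1\\ 0&0&0&0.6&0&0.4\\ 0&0&0&0&0.8&0.2 \end{pmatrix}$$
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$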
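› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?
Bayes' theorem, with ■ = "the trajectory has intersected the window" and ○ = "the object is observed in s3":
$$P(\blacksquare \mid \circ) = \frac{P(\circ \mid \blacksquare)\,P(\blacksquare)}{P(\circ)} = \frac{P(\blacksquare \wedge \circ)}{P(\circ)} = \frac{P(\circ \wedge \blacksquare)}{P(\circ \wedge \blacksquare) + P(\circ \wedge \neg\blacksquare)}$$
Reading P(○ ∧ ■) = 0.32 from S3+ and P(○ ∧ ¬■) = 0.04 from S3- in the class vector (0, 0.16, 0.04, 0.48, 0, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+):
$$P(\blacksquare \mid \circ) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$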
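The class-based technique can be sketched with 2|S| states: index i stands for Si- and index n + i for Si+. M- keeps both halves separate, and M+ moves worlds that enter a window state from the "-" half to the "+" half. Conditioning on the second observation is then a ratio of two entries, as in Bayes' theorem. The window used here (states s1 and s2 at tick 2 only) is our assumption; with it, the sketch reproduces the class vector (0, 0.16, 0.04, 0.48, 0, 0.32) at t = 3 and the posterior 0.32 / 0.36 ≈ 0.89.

```python
def step(x, M):
    n = len(M)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]


def class_matrices(M, window_states):
    """Transitions between the 2|S| classes: index i is S_i^-, index n + i is S_i^+."""
    n = len(M)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            m_minus[i][j] += p          # outside T: "not hit" stays "not hit"
            m_minus[n + i][n + j] += p  # "hit" worlds always stay "hit"
            m_plus[n + i][n + j] += p
            # inside T: entering a window state turns a "not hit" world into a "hit" world
            m_plus[i][n + j if j in window_states else j] += p
    return m_minus, m_plus


def hit_probability_given_observation(M, x0, window_states, t_start, t_end, t_obs, s_obs):
    """P(window hit | object observed in state s_obs at tick t_obs), via Bayes' theorem."""
    n = len(M)
    m_minus, m_plus = class_matrices(M, window_states)
    x = list(x0) + [0.0] * n
    if t_start <= 0 <= t_end:  # starting inside the window already counts as a hit
        for s in window_states:
            x[n + s] += x[s]
            x[s] = 0.0
    for t in range(1, t_obs + 1):
        x = step(x, m_plus if t_start <= t <= t_end else m_minus)
    hit, no_hit = x[n + s_obs], x[s_obs]  # P(obs and hit), P(obs and not hit)
    if hit + no_hit == 0:
        raise ValueError("observation is impossible under the model")
    return hit / (hit + no_hit), x


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
posterior, x3 = hit_probability_given_observation(M, [1.0, 0.0, 0.0], {0, 1}, 2, 2, 3, 2)
print(round(posterior, 2), [round(v, 2) for v in x3])
```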
Summary
› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: an object's possible trajectories in (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
kNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can we draw samples efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
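Why drawing observation-conforming samples is the hard part can be seen from the naive baseline sketched below: sample paths forward from the first observation and reject every path that misses the second one. This is plain rejection sampling, swapped in for illustration, and not the adapted-transition-matrix method of the cited work. In the example, only about 36% of the samples survive, and the acceptance rate drops quickly with more or less likely observations. The window (states s1, s2 at tick 2) is the same hypothetical choice as in a Bayes-style calculation for this chain.

```python
import random


def sample_path(M, start, steps, rng):
    """Sample one trajectory of the Markov chain: a list of states for ticks 0..steps."""
    path = [start]
    for _ in range(steps):
        row = M[path[-1]]
        r, acc = rng.random(), 0.0
        nxt = max(j for j, p in enumerate(row) if p > 0)  # fallback for rounding
        for j, p in enumerate(row):
            acc += p
            if r < acc:
                nxt = j
                break
        path.append(nxt)
    return path


def rejection_estimate(M, start, t_obs, s_obs, window_states, t_start, t_end, n, seed=0):
    """Estimate P(window hit | state s_obs at tick t_obs) by rejection sampling."""
    rng = random.Random(seed)
    accepted = hits = 0
    for _ in range(n):
        path = sample_path(M, start, t_obs, rng)
        if path[t_obs] != s_obs:
            continue  # contradicts the second observation: a wasted sample
        accepted += 1
        if any(path[t] in window_states for t in range(t_start, t_end + 1)):
            hits += 1
    return (hits / accepted if accepted else float("nan")), accepted


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
print(rejection_estimate(M, 0, 3, 2, {0, 1}, 2, 2, n=20000))
```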
Event Queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g. "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443.
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
22
23
24
25
26
27
Too many possible worlds
28
Querying Uncertain Data Complexity
Naive Query Processing is exponential in the number of objects
Are there efficient solutions to query uncertain spatial data
In general No
ldquoThe problem of answering queries on a probabilistic database D is -complete in the size of Dldquo[DalviSuciu04]
Can be reduced to uncertain spatial databases
But Specific queries may have polynomial time solutions[DalviSuciu04] Dalvi N N and Suciu D Efficient query evaluation on probabilistic databasesIn Proceedings of the 30th International Conference on Very Large Data Bases (VLDB)(2004)
29
q
Querying Uncertain Data Running Example
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds
30
Data Cleaning Aggregation
q
C D
H
I
Ignore Uncertainty (Data Cleaning)
Replace uncertain objects by a deterministic ldquobest guessrdquo
Expected Positions
Most-likely Positions
hellip
Query results are not reliable
Query results may be biased
31
Equivalent Worlds An intuition
q
Observation 1
For any possible world and any possible world derived from by changing the position of object the following equivalence holds
32
Equivalent Worlds An intuition
q
Observation 1 allows to discard objects outside ofacute the query region
Remaining number of equivalent classes of possible worlds
33
Equivalent Worlds An intuition
Observation 2
For each remaining object we only need to consider the predicate ldquoinsiderdquo
q
C D
H
I
Querying Uncertain Spatial Data
34
Equivalent Worlds An intuition
Observation 2
For each remaining object we only need to consider the predicate ldquoinsiderdquo
Remaining number of equivalent classes of possible worlds
q
C D
H
I
35
Equivalent Worlds An intuition
Observation 3
We only require the number of objects in the query regionInformation about concrete results objects can be discarded
q
C D
H
I
36
q
08 02
04
H
C A
B
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
37
q
08 02
04
H
x x
x
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
Observation 3 Anonymize Objects - Substitute ABC by x
38
q
08 02
04
H
x x
x
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
Observation 3 Anonymize Objects - Substitute ABC by x
39
q
08 02
04
H
x x
x
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
Observation 3 Anonymize Objects - Substitute ABC by x
Each monomial implies that the probability of having exactly results equals
40
For each object let Consider the following generating function [2]
in the expanded polynomial the coefficient of monomial equals the probability that exactly objects are inside the query region
Generating Functions Formally
[2] Jian Li Barna Saha and Amol Deshpande A Unified Approach to Ranking in Probabilistic Databases PVLDB 2(1) 502-513 (2009)
41
Example
q
08 02
04
H
C A
B
H3
Count Queries on Uncertain Data
42
q
08 02
04
H
C A
B
H3
Example =
Count Queries on Uncertain Data
43
q
08 02
04
H
C A
B
H3
Example = =
Count Queries on Uncertain Data
44
q
08 02
04
H
C A
B
H3
Example = =
Count Queries on Uncertain Data
45
The Paradigm of Equivalent Worlds
A query predicate and an uncertain database DB we can answer on DB in PTIME if the following three conditions are satisfied
I A traditional query on certain data can be answered in polynomial time
II We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results Sampling
Materialize a set S of possible worlds
Samples drawn independent and unbiased
Evaluated the query predicate on each world
Distribution of sampled results is an unbiased approximation of the true distribution of results
48
q
08 02
04
H
C A
B
H3
Sampling Example
Drawing 100 possible worlds may yield the followingestimators
Compare to the exact probabilities
No indication of reliability or confidence of estimations
49
Sampling: Confidences
[Figure: running example, query region q with uncertain objects A, B and C]
Drawing 100 possible worlds may yield the following estimators:
Use statistical methods to assess the quality of the estimators, e.g., the Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-percentile of the standard normal distribution
At a significance level of $\alpha$, the true probability lies in the interval [0.442, 0.638]
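A sketch that reproduces the quoted interval with the Wald formula. The inputs are assumptions: an estimate of 0.54 (the midpoint of [0.442, 0.638]), n = 100 sampled worlds, and a significance level of 0.05, which gives the interval's width; `wald_interval` is a hypothetical name.

```python
import math
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # (1 - alpha/2)-percentile of N(0, 1)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

lo, hi = wald_interval(0.54, 100)
print(round(lo, 3), round(hi, 3))  # 0.442 0.638
```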
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: “best guess” answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip side
Model of a trajectory
[Figure: a 3D trajectory in (X, Y, Time) space up to the present time, and the corresponding 2D route in the X-Y plane]
Approximation: does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
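A minimal sketch of a trajectory as time-stamped samples, assuming a piecewise-linear representation (constant velocity between consecutive (t, x, y) samples); the sample format and `location_at` are illustrative assumptions. The constant velocity between samples is exactly why such a model does not capture acceleration or deceleration.

```python
from bisect import bisect_right

def location_at(samples, t):
    """Interpolated (x, y) at time t for a time-sorted list of (t, x, y) samples,
    assuming constant velocity between consecutive samples."""
    times = [s[0] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t outside the trajectory's lifetime")
    i = bisect_right(times, t) - 1
    if i == len(samples) - 1:            # t equals the last sample time
        return samples[-1][1:]
    (t0, x0, y0), (t1, x1, y1) = samples[i], samples[i + 1]
    f = (t - t0) / (t1 - t0)             # fraction of the segment elapsed
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))

traj = [(0, 0.0, 0.0), (10, 10.0, 0.0), (20, 10.0, 5.0)]
print(location_at(traj, 15))  # (10.0, 2.5)
```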
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to-be-visited) points ⇒ full trajectory
(location, time, velocity vector) updates
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (a.k.a. space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
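A sketch of the membership test behind a bead (space-time prism), assuming 2D Euclidean free-space motion: a point p at time t is possible if it is reachable from the earlier sample within t − t1 and can still reach the later sample by t2, both at speed at most vmax. `in_bead` is a hypothetical helper name.

```python
import math

def in_bead(p1, t1, p2, t2, vmax, p, t):
    """True if location p at time t lies in the bead between two exact samples
    (p1, t1) and (p2, t2) of an object whose speed never exceeds vmax."""
    if not t1 <= t <= t2:
        return False
    return (math.dist(p1, p) <= vmax * (t - t1) and    # reachable from p1
            math.dist(p, p2) <= vmax * (t2 - t))       # can still reach p2

print(in_bead((0, 0), 0, (10, 0), 10, 1.5, (5, 3), 5))  # True
print(in_bead((0, 0), 0, (10, 0), 10, 1.5, (5, 8), 5))  # False
```

Geometrically, the two conditions are the two cones whose intersection forms the bead; a necklace is a chain of such beads, one per pair of consecutive samples.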
Modeling Uncertain Trajectories: Model 2
Constraints: motion along the road network
Uncertainty restricted to road segments
[Figure: road segments between the samples p1 (time t1) and p2 (time t2), with a location q at time t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
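A sketch of what a constant error bound buys at a single time instant, assuming the true location is within distance r of the expected location and the query is a circular range; the labels "definitely", "possibly" and "not" and the helper `range_status` are illustrative, not taken from the cited paper.

```python
import math

def range_status(expected_xy, r, center, R):
    """Classify an object (one time-slice of a sheared cylinder: a disk of
    radius r around its expected location) against a circular query range."""
    d = math.dist(expected_xy, center)
    if d + r <= R:
        return "definitely"   # the whole uncertainty disk is inside the range
    if d - r <= R:
        return "possibly"     # the uncertainty disk overlaps the range
    return "not"              # the two disks are disjoint

print(range_status((3, 0), 1.0, (0, 0), 5.0))    # definitely
print(range_status((5.5, 0), 1.0, (0, 0), 5.0))  # possibly
print(range_status((7, 0), 1.0, (0, 0), 5.0))    # not
```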
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g., possible routes to be taken
Processing Algorithms
- The impact of the motion + uncertainty coupling on filtering/pruning/refinement
Example: inside range (disk) around “Q” between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
Continuous NN for trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: querying trajectory Trq and trajectories Tr1, Tr2, Tr3 in (X, Y, T) space, with time instants T = tb, T = t1, T = te]
Example: assuming that Trq is the querying trajectory during [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: the previous example with uncertainty added; time instants T = tb, tb1, t11, t1, te]
Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability
⇒ What should the structure of the answer to UQ_nn(q) look like?
Structure of the Answer to a Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level 1: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper levels refine sub-intervals, e.g. (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …, (Tri121, t11, t121, D121)]
Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]
Children: the trajectories that, once the parent is excluded, have the highest probability of being the NN of Trq within a sub-interval of the parent's interval
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: crisp query point Q; uncertainty disks of Tr1–Tr5; radii Rmin, Rmax and a distance Rd]
Trq = 2D point Q; the possible locations of the other objects are bounded by circles
Observation: any object whose minimum distance from Q is greater than Rmax has a “0” probability of being the NN of Q. Example: Tr4 has zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \int_{\|u - Q\| \le R_d} pdf_i(u)\, du$$
The probability that a given object Trj is the NN of Q is given by
$$P_j^{NN} = \int_0^{R_{max}} p_j(R_d) \prod_{k \neq j} \big(1 - P_k(R_d)\big)\, dR_d, \qquad p_j(R_d) = \frac{dP_j(R_d)}{dR_d}$$
i.e., Trj is the NN if it is at distance Rd AND all the others are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
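The NN-probability integral can also be approximated by sampling. The sketch below assumes each object's location is uniform in a disk and that objects are independent; `nn_probabilities` and `sample_in_disk` are hypothetical names. It applies the Rmax pruning first and then counts, over sampled possible worlds, how often each remaining object is the closest one to Q.

```python
import math
import random

def sample_in_disk(rng, cx, cy, radius):
    """Uniform sample in a disk (sqrt of a uniform variate for the radius)."""
    rho = radius * math.sqrt(rng.random())
    phi = 2 * math.pi * rng.random()
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)

def nn_probabilities(q, objects, n_samples=20000, seed=1):
    """Monte-Carlo estimate of P(object j is the NN of the crisp point q).

    objects: list of (cx, cy, radius); each location is uniform in its disk and
    all objects are independent (assumptions of this sketch).
    """
    rng = random.Random(seed)
    # Rmax: the smallest maximum distance from q over all objects
    r_max = min(math.dist(q, (cx, cy)) + rad for cx, cy, rad in objects)
    # objects whose minimum distance exceeds Rmax can never be the NN: pruned
    candidates = [j for j, (cx, cy, rad) in enumerate(objects)
                  if math.dist(q, (cx, cy)) - rad <= r_max]
    wins = [0] * len(objects)
    for _ in range(n_samples):
        # one possible world: draw each candidate's location, keep the closest
        best = min(candidates,
                   key=lambda j: math.dist(q, sample_in_disk(rng, *objects[j])))
        wins[best] += 1
    return [w / n_samples for w in wins]

objs = [(3, 0, 1), (-3, 0, 1), (0, 10, 1)]
probs = nn_probabilities((0, 0), objs)
print(probs)  # the third object is pruned, so its entry is exactly 0.0
```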
When Trq (the instantaneous location) has uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query TrQ and objects Tr1–Tr5; points Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer “prune” Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
[Figure: pdf(Trq), pdf(Tr1) and pdf(Tr2) over the x-y plane]
Example: assuming a uniform pdf of possible locations, evaluating the actual probability of Tr1 being within distance Rd from Trq involves pairs of points, one from each uncertainty region
Main Observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\|V_i - V_q\| \le R_d)$
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq); due to independence, the pdf of Viq is the convolution of the pdfs of Vi and −Vq
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq) over the x-y plane]
Let Viq = Vi + (−Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi), AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)
Property 2: assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids, with respect to the vertical axis (the value of the pdf). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
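A sketch of the difference-variable idea by sampling rather than by computing the convolution in closed form. It assumes Vi and Vq are independent and uniform in disks of radius r; `within_distance_prob` is a hypothetical name. Sampling Viq = Vi − Vq directly draws from the convolved pdf, so the fraction of samples with ‖Viq‖ ≤ Rd estimates the target probability, and the sample mean illustrates Property 1 (it should be close to Ci − Cq).

```python
import math
import random

def sample_in_disk(rng, cx, cy, radius):
    rho = radius * math.sqrt(rng.random())
    phi = 2 * math.pi * rng.random()
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)

def within_distance_prob(ci, cq, r, rd, n_samples=50000, seed=2):
    """Estimate P(||Vi - Vq|| <= rd) for Vi, Vq uniform in disks of radius r
    around ci and cq; also return the sample mean of Viq = Vi + (-Vq)."""
    rng = random.Random(seed)
    hits, sx, sy = 0, 0.0, 0.0
    for _ in range(n_samples):
        xi, yi = sample_in_disk(rng, *ci, r)
        xq, yq = sample_in_disk(rng, *cq, r)
        dx, dy = xi - xq, yi - yq            # one sample of Viq
        sx, sy = sx + dx, sy + dy
        hits += math.hypot(dx, dy) <= rd
    mean = (sx / n_samples, sy / n_samples)  # close to ci - cq (Property 1)
    return hits / n_samples, mean

p, mean = within_distance_prob((4.0, 0.0), (0.0, 0.0), r=1.0, rd=4.0)
print(round(p, 2), [round(m, 2) for m in mean])
```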
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi − vxq and Vyiq = vyi − vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin as a function of t is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = V_{xiq}^2 + V_{yiq}^2$$
a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time-points
[Figure: example lower envelope LE12 of two distance trajectories TR1 and TR2, starting at tb, with critical time-points t11 and t12; time on the horizontal axis]
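A sketch of the distance functions and their critical time-points, assuming linear motion from a position at t = 0 with constant velocity; `dist_coeffs` and `critical_times` are hypothetical names. Since both distances are non-negative, diq(t) = djq(t) exactly when their squares are equal, and the difference of the squares is a polynomial of degree at most 2, hence at most two crossings.

```python
import math

def dist_coeffs(pi, vi, pq, vq):
    """(A, B, C) with d_iq(t)^2 = A t^2 + B t + C, for object i at position pi
    (t = 0) with velocity vi and the query at pq with velocity vq."""
    x0, y0 = pi[0] - pq[0], pi[1] - pq[1]      # relative position at t = 0
    vx, vy = vi[0] - vq[0], vi[1] - vq[1]      # Vxiq, Vyiq
    return vx * vx + vy * vy, 2 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0

def critical_times(fi, fj, tb, te):
    """Times in [tb, te] where d_iq(t) = d_jq(t) (at most two)."""
    a, b, c = (u - w for u, w in zip(fi, fj))
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]

f1 = dist_coeffs((-5, 1), (1, 0), (0, 0), (0, 0))   # passes by the query
f2 = dist_coeffs((0, 3), (0, 0), (0, 0), (0, 0))    # stationary: constant distance
ct = critical_times(f1, f2, 0, 10)
print([round(t, 3) for t in ct])  # [2.172, 7.828]
```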
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: the lower envelopes LE12 (of TR1, TR2) and LE34 (of TR3, TR4) over [tb, te] are merged into LE1234, which introduces the new critical time-points t1new and t2new]
Complexity of the lower-envelope construction: O(N log N) (N = number of trajectories)
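A sketch of the Merge-Sort-style construction, assuming each distance function is given by its squared-distance coefficients (A, B, C) (the minimum of the distances and of their squares is attained by the same function). An envelope is a list of (t_start, t_end, index) pieces; merging two envelopes splits at their breakpoints and at the pairwise critical time-points. The names are hypothetical, and the linear scans in `merge` keep the code short rather than matching the O(N log N) bound exactly.

```python
import math

def quad_roots(a, b, c):
    """Real roots of a*t^2 + b*t + c = 0, including the degenerate linear case."""
    if abs(a) < 1e-12:
        return [] if abs(b) < 1e-12 else [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    s = math.sqrt(disc)
    return [(-b - s) / (2 * a), (-b + s) / (2 * a)]

def sq(f, t):
    A, B, C = f
    return A * t * t + B * t + C

def merge(env1, env2, funcs):
    """Merge two lower envelopes, each a list of (t_start, t_end, index)."""
    cuts = sorted({t for seg in env1 + env2 for t in seg[:2]})
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        i = next(k for s, e, k in env1 if s <= mid <= e)
        j = next(k for s, e, k in env2 if s <= mid <= e)
        fi, fj = funcs[i], funcs[j]
        # critical time-points of this pair inside (lo, hi): at most two
        roots = quad_roots(fi[0] - fj[0], fi[1] - fj[1], fi[2] - fj[2])
        pts = [lo] + sorted(r for r in roots if lo < r < hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            m = (a + b) / 2
            k = i if sq(fi, m) <= sq(fj, m) else j
            if out and out[-1][2] == k:
                out[-1] = (out[-1][0], b, k)   # extend the current piece
            else:
                out.append((a, b, k))
    return out

def lower_envelope(funcs, idxs, tb, te):
    """Divide and conquer: split the functions, recurse, merge."""
    if len(idxs) == 1:
        return [(tb, te, idxs[0])]
    half = len(idxs) // 2
    return merge(lower_envelope(funcs, idxs[:half], tb, te),
                 lower_envelope(funcs, idxs[half:], tb, te), funcs)

funcs = [(1, 0, 1), (1, -8, 17), (1, -16, 65)]   # (A, B, C) per trajectory
env = lower_envelope(funcs, list(range(len(funcs))), 0, 8)
print(env)
```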
Now, how about the IPAC-NN tree?
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r (r = radius of the uncertainty disk) above the lower envelope cannot have a non-zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree
[Figure: distance functions TR1–TR7 over [tb, te] with the lower envelope, the 4r band, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree), and time instants t1, t2, t3]
Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti
Example: if the pdf of selecting a possible path is uniform, each path in PPi has probability 1/|PPi|
Given a path Pj ∈ PPi(a), the Possible Locations of moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t)
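A sketch of enumerating the possible paths, assuming a small directed graph whose edge weights are minimum travel times (e.g., edge length divided by the speed limit); the graph, vertex names and `possible_paths` are hypothetical. A depth-first search cuts a branch as soon as its accumulated minimum time exceeds the budget ti+1 − ti, and a uniform pdf then assigns 1/|PPi| to each surviving path.

```python
def possible_paths(graph, src, dst, budget):
    """Simple paths src -> dst whose minimum travel time is <= budget.

    graph: dict vertex -> list of (neighbor, min_travel_time).
    """
    paths = []

    def dfs(v, path, cost):
        if v == dst:
            paths.append((list(path), cost))
            return
        for w, c in graph.get(v, []):
            if w not in path and cost + c <= budget:   # prune slow branches
                path.append(w)
                dfs(w, path, cost + c)
                path.pop()

    dfs(src, [src], 0.0)
    return paths

g = {
    "pi": [("v1", 1.0)],
    "v1": [("v5", 4.0), ("v3", 2.0)],
    "v3": [("v4", 2.0)],
    "v4": [("pj", 1.0)],
    "v5": [("pj", 2.0)],
}
paths = possible_paths(g, "pi", "pj", budget=8.0)
for path, cost in paths:
    print(path, cost, 1 / len(paths))   # uniform pdf over the possible paths
```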
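Combine the two probabilities…
[Figure: possible paths of object a, and its possible locations when t = 4; vertex m]
Probability of falling inside segment m–p1: 0.5 × 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr\big(a(t) \in S\big) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\big(a(t) \in S \mid p\big)$$
where the second factor is computed from the possible locations $PL_p^a(t)$, e.g., with a uniform pdf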
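A sketch of combining the two probabilities, under simplifying assumptions that are not from the slides: a constant maximum speed vmax along every path, positions measured as offsets from pi, uniform pdfs both over paths and over each path's possible-location interval. All names and numbers are hypothetical; the example happens to mirror the 0.5 × 0.5 structure above.

```python
def possible_location_interval(length, vmax, ti, tj, t):
    """Offsets s on a path (0 = p_i, length = p_{i+1}) reachable at time t:
    s <= vmax * (t - ti) and length - s <= vmax * (tj - t)."""
    lo = max(0.0, length - vmax * (tj - t))
    hi = min(length, vmax * (t - ti))
    return (lo, hi) if lo <= hi else None

def prob_in_segment(paths, vmax, ti, tj, t, segment_on):
    """Pr(a(t) in S) = sum over paths of Pr(path) * Pr(a(t) in S | path).

    paths: list of (path_id, length), each equally likely;
    segment_on: dict path_id -> (s0, s1), the part of S lying on that path.
    """
    total = 0.0
    for pid, length in paths:
        pl = possible_location_interval(length, vmax, ti, tj, t)
        if pl is None or pid not in segment_on:
            continue
        lo, hi = pl
        s0, s1 = segment_on[pid]
        overlap = max(0.0, min(hi, s1) - max(lo, s0))
        # a degenerate (single-point) interval counts as inside iff it lies in S
        frac = overlap / (hi - lo) if hi > lo else float(s0 <= lo <= s1)
        total += frac / len(paths)
    return total

paths = [("A", 8.0), ("B", 10.0)]
p = prob_in_segment(paths, vmax=2.0, ti=0.0, tj=6.0, t=4.0,
                    segment_on={"A": (4.0, 6.0)})
print(p)  # 0.25
```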
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments, either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
Processing
› Data structures
– Edge hash table
  › Small, in memory
– Movement R-tree
  › 1D, for each edge
– Trajectories list
  › Store samples ordered by time stamp
  › Assign a pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On disk, retrieved on demand
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree “bubbling” along road-network segments from q through vertices A–E, annotated with earliest/latest arrival times; one possible path is definitely not part of the answer to the range query]
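A sketch of the network-expansion step, assuming a directed graph with edge lengths and reading "edges whose network distance to q is smaller than r" as edges whose start vertex is closer than r. It is a plain Dijkstra search bounded by r; `expansion_tree` and the example graph are hypothetical.

```python
import heapq

def expansion_tree(graph, q, r):
    """Bounded Dijkstra from q. graph: dict vertex -> list of (neighbor, length).

    Returns the distances of the vertices reached within r and the edges
    (u, v) leaving those vertices, i.e. the edges the filter step must inspect.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                          # stale heap entry
        for v, w in graph.get(u, []):
            edges.add((u, v))                 # edge starts inside the range
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist, edges

g = {"q": [("A", 2), ("B", 5)], "A": [("C", 2)], "B": [("D", 1)], "C": [("E", 4)]}
dist, edges = expansion_tree(g, "q", 5)
print(dist, sorted(edges))
```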
Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
Uncertainty – the flip side
› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: “Speeding up the Douglas-Peucker Line-Simplification Algorithm”. SSHD 1997
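A sketch of the basic recursive Douglas-Peucker simplification, shown here as the textbook algorithm, not the speed-up from the cited paper. eps plays the role of the acceptable error: a polyline is replaced by its end points if no intermediate point deviates by more than eps; otherwise it is split at the farthest point and both halves are simplified. The track is made-up data.

```python
import math

def point_segment_dist(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    u = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.dist(p, (ax + u * dx, ay + u * dy))

def douglas_peucker(points, eps):
    """Keep the farthest point if it deviates more than eps; recurse on both halves."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = max(((i, point_segment_dist(points[i], a, b))
                     for i in range(1, len(points) - 1)), key=lambda x: x[1])
    if dmax <= eps:
        return [a, b]                  # all intermediate points are within eps
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right           # do not duplicate the split point

track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(douglas_peucker(track, 0.5))  # [(0, 0), (2, -0.1), (3, 5), (7, 9)]
```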
Uncertainty – the other flip side
› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just another “Z” axis… objects cannot travel back in time
– For how long may the data at the sink be “un-fresh”?
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
So far…
Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
Merge the approaches
› Idea
– Discretize time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest-neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009: 435–443
A highway example…
[Figure: position (pos) of a car on a highway over time (t), with the query region Q and the velocity bounds vmin and vmax]
What is the probability that the car is in some area at least once during some time interval?
Assuming a uniform distribution… the car is inside Q with probability 50% at one time instant and 30% at another
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
Violation of the speed constraints
Consideration of dependency: = 0.5
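The independence-based value above is a one-line formula; the sketch shows it explicitly, with `prob_at_least_once_independent` as a hypothetical name. It multiplies the per-instant miss probabilities, which is exactly what ignores the speed constraints linking consecutive positions (the dependency-aware answer on the slide is 0.5, not 0.65).

```python
def prob_at_least_once_independent(probs):
    """1 - prod(1 - p_t): P(in the region at least once) IF the per-timestamp
    events were independent, which ignores the vmin/vmax speed constraints."""
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p   # probability of missing the region at this instant
    return 1.0 - miss

print(round(prob_at_least_once_independent([0.5, 0.3]), 2))  # 0.65
```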
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model, which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley
A simple example
› Whenever the wooden board is hit, the ball stays in its hole or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: an interior hole keeps the ball with probability 0.2 and passes it to each neighbour with 0.4; a border hole keeps it with 0.4 and passes it inwards with 0.6]
› Initial position: third hole
› After the first hit: 0.4, 0.2, 0.4 (second to fourth hole)
› After the second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (first to fifth hole)
› After the 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08
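How can we model this?
› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket; columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$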
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
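A NumPy sketch of the vector-matrix products above; `board_matrix` is a hypothetical helper that builds M from the stated rules rather than typing it in. As k grows, x0·M^k approaches the stationary distribution (0.08, 0.12, …, 0.12, 0.08). One can check that it satisfies detailed balance (0.08 · 0.6 = 0.12 · 0.4), while how close the 40th step already is depends on how fast the chain mixes.

```python
import numpy as np

def board_matrix(n=9):
    """Row-stochastic matrix of the wooden-board example (rows: from, columns: to)."""
    M = np.zeros((n, n))
    for i in range(1, n - 1):                     # interior holes
        M[i, i - 1], M[i, i], M[i, i + 1] = 0.4, 0.2, 0.4
    M[0, 0], M[0, 1] = 0.4, 0.6                   # left border
    M[n - 1, n - 1], M[n - 1, n - 2] = 0.4, 0.6   # right border
    return M

M = board_matrix()
x0 = np.zeros(9)
x0[2] = 1.0                                # the ball starts in the third hole
x1 = x0 @ M                                # after the first hit
x2 = x1 @ M                                # after the second hit
x40 = x0 @ np.linalg.matrix_power(M, 40)   # after the 40th hit
print(np.round(x1, 2))
print(np.round(x2, 2))
print(np.round(x40, 2))
```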
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2, s3 with the transition probabilities of M, unrolled over t = 0, 1, 2, 3]
Note: we have an exponential number of possible paths the car might take
Starting in s1, the state distributions are:
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)
Counting at t = 3 only the trajectories that were not already in the window at t = 2 gives (0, 0.16, 0.04). Result = 0.8 + 0.16 = 0.96
› Solution based on matrix multiplications: introduce a new state s4 for the winner trajectories, and two matrices
ST-Window Queries
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$(1, 0, 0, 0) \cdot M^- = (0, 0, 1, 0)$; $(0, 0, 1, 0) \cdot M^+ = (0, 0, 0.2, 0.8)$; $(0, 0, 0.2, 0.8) \cdot M^+ = (0, 0, 0.04, 0.96)$
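A NumPy sketch of the two-matrix technique, assuming the reading of the example above: the window is {s1, s2} during T = [2, 3], the object is in s1 at t = 0, M− is used for transitions into times outside T and M+ for transitions into times inside T. `extend` and `window_query` are hypothetical names; the code derives both matrices from M instead of typing them in, and it assumes the window starts at t ≥ 1.

```python
import numpy as np

M = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.4],
              [0.0, 0.8, 0.2]])
window = {0, 1}                       # query states s1, s2 (0-based indices)

def extend(M, window, active):
    """Add an absorbing 'winner' state s4. If active (M+), every transition
    into a window state is redirected to s4; otherwise (M-) M is kept."""
    n = len(M)
    E = np.zeros((n + 1, n + 1))
    E[n, n] = 1.0                     # s4 is absorbing
    for i in range(n):
        for j in range(n):
            target = n if (active and j in window) else j
            E[i, target] += M[i, j]
    return E

def window_query(M, window, x0, t_start, t_end):
    """P(object is in a window state at least once during [t_start, t_end])."""
    x = np.append(x0, 0.0)
    for t in range(1, t_end + 1):     # transition from t - 1 to t
        x = x @ extend(M, window, active=t >= t_start)
    return x[-1]

result = window_query(M, window, np.array([1.0, 0.0, 0.0]), 2, 3)
print(round(result, 2))  # 0.96
```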
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: possible locations in (time, location space), extrapolated from a single observation at t0, and constrained between two observations at t0 and t1]
Multiple Observations
› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
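Multiple Observations
[Figure: classes S1−, S2−, S3− (window not yet intersected, ¬∎) and S1+, S2+, S3+ (window intersected, ∎), unrolled over t = 0, 1, 2, 3]
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$
(state order S1−, S2−, S3−, S1+, S2+, S3+; in M+, every transition into a window state s1 or s2 leads to the corresponding “+” class)
State distributions:
t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)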
› Now, what is the probability that the trajectory passes the query window ($\blacksquare$), given the fact that the object was seen in s3 at t = 3?
Bayes' theorem:
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\, P(\blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
[Figure: distribution at t = 3 over S1−, S2−, S3−, S1+, S2+, S3+: (0, 0, 0.04, 0.48, 0.16, 0.32)]
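A NumPy sketch of the 2|S|-class construction and the Bayes step, assuming the same example (window {s1, s2} active at t = 2 and 3, object in s1 at t = 0, second observation s3 at t = 3). M+ is derived from the block matrix M− by moving the probability mass of transitions into window states from the "−" columns to the "+" columns; the variable names are illustrative.

```python
import numpy as np

M = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.4],
              [0.0, 0.8, 0.2]])
window = [0, 1]                        # s1, s2 form the query window
n = len(M)

# States 0..n-1 are S_i^- (window not yet intersected), n..2n-1 are S_i^+.
M_minus = np.block([[M, np.zeros((n, n))],
                    [np.zeros((n, n)), M]])
M_plus = M_minus.copy()
for j in window:                       # entering a window state turns - into +
    M_plus[:n, n + j] += M_plus[:n, j]
    M_plus[:n, j] = 0.0

x = np.zeros(2 * n)
x[0] = 1.0                             # observed in s1 at t = 0
for step_matrix in (M_minus, M_plus, M_plus):   # window active at t = 2, 3
    x = x @ step_matrix
print(np.round(x, 2))

# Second observation: the object is seen in s3 at t = 3 (Bayes' theorem).
hit, miss = x[n + 2], x[2]             # P(s3 and hit), P(s3 and no hit)
print(round(hit / (hit + miss), 2))    # 0.89
```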
Summary
› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the “intelligence” and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: R-tree over the (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
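A sketch of one standard way to adapt transition matrices so that sampled paths always conform with a later observation: backward probabilities β_t(j) = P(observed state at T | state j at t) reweight the matrix for each step (a Doob h-transform). This is a generic technique named plainly here; it is in the spirit of, but not necessarily identical to, the cited paper's method. The example reuses the three-state chain above and estimates, by sampling, the window probability given the observation s3 at t = 3 (the exact value is 8/9).

```python
import numpy as np

def adapted_matrices(M, obs_state, T):
    """Per-step matrices that condition the chain on being in obs_state at T:
    A_t[i, j] = M[i, j] * beta_{t+1}[j] / beta_t[i]."""
    n = len(M)
    beta = [None] * (T + 1)
    beta[T] = np.zeros(n)
    beta[T][obs_state] = 1.0
    for t in range(T - 1, -1, -1):
        beta[t] = M @ beta[t + 1]              # backward probabilities
    mats = []
    for t in range(T):
        A = M * beta[t + 1][None, :]
        row = A.sum(axis=1, keepdims=True)     # equals beta[t]
        mats.append(np.divide(A, row, out=np.zeros_like(A), where=row > 0))
    return mats

def sample_paths(mats, start, n_samples, seed=0):
    rng = np.random.default_rng(seed)
    paths = []
    for _ in range(n_samples):
        s, path = start, [start]
        for A in mats:
            s = rng.choice(len(A), p=A[s])
            path.append(int(s))
        paths.append(path)
    return paths

M = np.array([[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]])
mats = adapted_matrices(M, obs_state=2, T=3)
paths = sample_paths(mats, start=0, n_samples=5000)
frac = sum(1 for p in paths if p[2] in (0, 1) or p[3] in (0, 1)) / len(paths)
print(round(frac, 2))   # close to 8/9
```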
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43–49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”. SSDBM 2009: 435–443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728
Too many possible worlds
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects
Are there efficient solutions to query uncertain spatial data?
In general: No
“The problem of answering queries on a probabilistic database D is #P-complete in the size of D” [DalviSuciu04]
This can be reduced to uncertain spatial databases
But: specific queries may have polynomial-time solutions
[DalviSuciu04] Dalvi, N. N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB) (2004)
Querying Uncertain Data: Running Example
[Figure: circular query region centered at query point q]
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds:
Data Cleaning: Aggregation
[Figure: query region around q with uncertain objects, e.g. C, D, H, I]
Ignore uncertainty (data cleaning)
Replace uncertain objects by a deterministic “best guess”
– Expected positions
– Most-likely positions
– …
Query results are not reliable
Query results may be biased
Equivalent Worlds: An Intuition
[Figure: query region around q with uncertain objects, e.g. C, D, H, I]
Observation 1
For any possible world, and any possible world derived from it by changing the position of an object, the following equivalence holds:
Observation 1 allows us to discard objects outside of the query region
Remaining number of equivalence classes of possible worlds:
Observation 2
For each remaining object, we only need to consider the predicate “inside”
Remaining number of equivalence classes of possible worlds:
Observation 3
We only require the number of objects in the query region; information about the concrete result objects can be discarded
Generating Functions
[Figure: running example, query region q with uncertain objects A, B and C, labelled with the probabilities 0.8, 0.2 and 0.4]
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects, i.e., substitute A, B, C by x
Main idea Use polyomial multiplication to enumeratepossible results
Observation 3 Anonymize Objects - Substitute ABC by x
39
q
08 02
04
H
x x
x
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
Observation 3 Anonymize Objects - Substitute ABC by x
Each monomial implies that the probability of having exactly results equals
40
For each object let Consider the following generating function [2]
in the expanded polynomial the coefficient of monomial equals the probability that exactly objects are inside the query region
Generating Functions Formally
[2] Jian Li Barna Saha and Amol Deshpande A Unified Approach to Ranking in Probabilistic Databases PVLDB 2(1) 502-513 (2009)
41
Example
q
08 02
04
H
C A
B
H3
Count Queries on Uncertain Data
42
q
08 02
04
H
C A
B
H3
Example =
Count Queries on Uncertain Data
43
q
08 02
04
H
C A
B
H3
Example = =
Count Queries on Uncertain Data
44
q
08 02
04
H
C A
B
H3
Example = =
Count Queries on Uncertain Data
45
The Paradigm of Equivalent Worlds
A query predicate and an uncertain database DB we can answer on DB in PTIME if the following three conditions are satisfied
I A traditional query on certain data can be answered in polynomial time
II We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results Sampling
Materialize a set S of possible worlds
Samples drawn independent and unbiased
Evaluated the query predicate on each world
Distribution of sampled results is an unbiased approximation of the true distribution of results
48
q
08 02
04
H
C A
B
H3
Sampling Example
Drawing 100 possible worlds may yield the followingestimators
Compare to the exact probabilities
No indication of reliability or confidence of estimations
49
q
08 02
04
H
C A
B
H3
Sampling Confidences
Drawing 100 possible worlds may yield the followingestimators
Use statistical methods to assess the quality of estimators
Eg Wald-Test
Where is the percentile of the standard normal distribution
At a significance level of the true probability is in the interval [0442 0638]
True probability
50
Uncertain Spatial Data Management Summary
Motivation Flood of geo-spatial data Enriched with additional contexts (text social multimedia) Inherent uncertainty
Data Cleaning ldquoBest guessrdquo answers Unreliable results Biased results
Paradigm of Equivalent Worlds Efficient solution for the most prominent types of spatial queries Example Generating Functions
Approximations Monte-Carlo sampling Probabilistic guarantees
51
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
52
bull Modelsbull Motionbull Location Uncertainty
bull Queriesbull Part 1 NN (free-space motion)
bull Semantics and Processing
bull Part 2 Range (road-networks constraints)bull Semantics and Processing
bull Uncertainty ndash the flip-side
53
Model of a trajectory
Y
X
Time
Present time
2d-ROUTE
3d-TRAJECTORY
Approximation does not capture accelerationdeceleration
Point Objects NOTE the model needs to be augmented for objects with extent eg hurricanes
Bart Kuijpers Harvey J Miller Walied Othman Kinetic space-time prisms GIS 2011
54
Mobility Models
sequence of (locationtime)updates (eg GPS-based)
Electronic maps + traffic distributionpatterns + set of (to be visited) point=gt full trajectory
(locationtime Velocity Vector )updates
now -gt
55
Modeling UncertaintyLocation Uncertainty Models
Various Sourcesrsquo Imprecision(GPS Sensors)
Quest for data reduction
Ralph Lange Harald Weinschrott Lars Geiger Andreacute Blessing Frank Duumlrr Kurt Rothermel Hinrich Schuumltze On a Generic Uncertainty Model for Position Information QuaCon 2009
56
Modeling Uncertain Trajectories
bull Model 1Cones Beads Necklaces (AKA space-time prisms)
bull Assumed exact location samplesbull Unknown (but bounded) maximum speed in-betweensamples
Reynold Cheng Sunil Prabhakar Dmitri V Kalashnikov Querying Imprecise Data in Moving Object Environments ICDE 2003G Trajcevski A Choudhary O Wolfson G Li Uncertan Range Queries for Necklases MDM 2010
57
Modeling Uncertain TrajectoriesModel 2
Constraints
Motion along road network
Uncertainty restricted to Road Segments
p1 p2
q
t2t1
t
Bart Kuijpers Walied Othman Modeling uncertainty of moving objects on road networks via space-time prisms International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3 – Sheared Cylinders
› Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
› Syntax
  – the impact of uncertainty on the (structure of the) answer
  – the impact of constraints (e.g., possible routes to be taken)
› Processing algorithms
  – the impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Tr_q during [t_b, t_e]
A_nn(q) = [(Tr_i1, [t_b, t_1]), (Tr_i2, [t_1, t_2]), …, (Tr_im, [t_(m-1), t_e])]
[Figure: trajectories Tr_q, Tr1, Tr2, Tr3 in (X, Y) between T = t_b and T = t_e, with T = t_1 marked]
Example: assuming that Tr_q is the querying trajectory during [t_b, t_e], then
– Tr1 (black) is the NN throughout [t_b, t_1]
– Tr2 (red) is the NN throughout [t_1, t_e]
Observation: the answer to the query is time-parameterized.
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tr_i has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Tr_q during [t_b, t_e]
[Figure: Tr_q, Tr1, Tr2, Tr3 in (X, Y) between T = t_b and T = t_e, with T = t_b1, t_1 and t_11 marked]
Observations: adding uncertainty to the previous example implies that
– Tr1 (blue) is the only trajectory with a non-zero NN probability up to t_b1
– starting at t_b1, Tr3 (green) also has a non-zero NN probability
– specifically, at t_11 all three trajectories have a non-zero NN probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: IPAC-NN tree. Root [Tr_Q, t_b, t_e]; first-level nodes (Tr_i1, t_b, t_1, D_1), (Tr_i2, t_1, t_2, D_2), …, (Tr_im, t_(m-1), t_e, D_m); deeper nodes such as (Tr_i11, t_b, t_11, D_11), (Tr_i12, t_11, t_12, D_12) and (Tr_i121, t_11, t_121, D_121) refine their parent's time interval]
Root: parameters of the query
– querying trajectory Tr_q
– time interval of interest [t_b, t_e]
Children: the trajectories that, if the parent is excluded, have the highest probability of being NN to Tr_q within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and the uncertainty disks of Tr1–Tr5, with radii R_min, R_max and R_d around Q; for Tr4, R_min > R_max]
Tr_q = 2D point Q; the possible locations of the other objects are bounded by circles.
Observation: any object whose minimum distance from Q is greater than R_max (the smallest maximum distance from Q over all objects) has probability 0 of being the NN of Q. Example: Tr4 cannot have a non-zero probability of being Q's NN.
The probability that the location of Tr_i is within distance R_d from Q is given by
$$P_i(R_d) = \int_{\{x \,:\, \|x - Q\| \le R_d\}} \mathrm{pdf}_i(x)\, dx$$
The probability that a given object Tr_j is the NN of Q is given by
$$P_{NN}(Tr_j) = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \ne j} \bigl(1 - P_i(R_d)\bigr)\, dR_d$$
i.e., Tr_j is the NN if it is at distance R_d AND all the others are at distances > R_d, integrated over all possible R_d (NOTE: the upper bound is R_max…).
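The NN probability of a crisp query point can be evaluated numerically. It is P_NN(Tr_j) = ∫ over [0, R_max] of dP_j(r) · Π_{i≠j} (1 − P_i(r)). The sketch below is a minimal version under two assumptions. First, each object's possible locations are uniformly distributed on a disk, as in the figure, so P_i(r) is a circle-overlap area divided by the disk area. Second, the integral is approximated with a simple midpoint rule. The function names, the step count and the example coordinates are illustrative and not taken from the tutorial.

```python
import math


def circle_overlap_area(d, r1, r2):
    """Area shared by two circles of radii r1, r2 whose centres are d apart."""
    if d >= r1 + r2:
        return 0.0
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2

    def clamp(v):  # guard acos against rounding just outside [-1, 1]
        return max(-1.0, min(1.0, v))

    a1 = r1 * r1 * math.acos(clamp((d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    a2 = r2 * r2 * math.acos(clamp((d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    kite = 0.5 * math.sqrt(max(0.0, (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)))
    return a1 + a2 - kite


def p_within(q, obj, rd):
    """P_i(R_d) for an object uniformly distributed on the disk obj = ((cx, cy), radius)."""
    (cx, cy), r = obj
    d = math.hypot(cx - q[0], cy - q[1])
    return circle_overlap_area(d, rd, r) / (math.pi * r * r)


def nn_probabilities(q, objects, steps=4000):
    """Midpoint-rule approximation of P_NN(Tr_j) for every object."""
    dists = [math.hypot(c[0] - q[0], c[1] - q[1]) for c, _ in objects]
    r_max = min(d + r for d, (_, r) in zip(dists, objects))  # smallest maximum distance
    h = r_max / steps
    result = []
    for j, obj in enumerate(objects):
        if dists[j] - obj[1] > r_max:  # minimum distance beyond R_max: probability 0
            result.append(0.0)
            continue
        total = 0.0
        for k in range(steps):
            lo, hi = k * h, (k + 1) * h
            dp = p_within(q, obj, hi) - p_within(q, obj, lo)  # mass of dist(Q, Tr_j) in (lo, hi]
            mid = 0.5 * (lo + hi)
            rest = 1.0
            for i, other in enumerate(objects):
                if i != j:
                    rest *= 1.0 - p_within(q, other, mid)  # every other object is farther away
            total += dp * rest
        result.append(total)
    return result


objects = [((5.0, 0.0), 1.0), ((-5.0, 0.0), 1.0), ((20.0, 0.0), 1.0)]
probs = nn_probabilities((0.0, 0.0), objects)
print([round(p, 3) for p in probs])
```

In this example the two objects at distance 5 are symmetric, so each gets roughly half of the probability mass. The object at distance 20 is pruned by the R_max test before any integration takes place.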
65
When Tr_q (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query TrQ and objects Tr1–Tr5 with radii R_min, R_max; points Z1 and Z2 with dist(Z1, Z2) < R_max]
We can no longer "prune" Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Tr_j, being within distance R_d from Tr_q.
NOTE: in effect, this is a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs of possible locations pdf(Tr_q), pdf(Tr1), pdf(Tr2): disks of diameter 2r with density 1/(πr²)]
Example, assuming a uniform pdf of possible locations: the figure marks two of the points to be used when evaluating the actual probability of Tr1 being within distance R_d from Tr_q.
Main Observation: let V_i and V_q denote the 2D random variables representing the possible locations of Tr_i and Tr_q. In effect, we are looking for
$$P\bigl(\|V_i - V_q\| \le R_d\bigr)$$
Define V_iq = V_i − V_q = V_i + (−V_q) as another random variable (cross-correlation). Because V_i and V_q are independent, the pdf of V_iq is the convolution of the pdfs of V_i and −V_q.
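For a quick sanity check, the probability P(‖V_i − V_q‖ ≤ R_d) can also be estimated by sampling V_i and V_q independently and testing the distance between the two samples. Each pair of samples is one draw of the difference variable V_iq. The sketch below assumes uniform disk pdfs of radius r, as in the figure. It is a Monte-Carlo stand-in for the convolution-based computation, not the method of the slides. The centres, r and the sample count are hypothetical.

```python
import math
import random


def sample_uniform_disk(center, r, rng):
    """One point drawn uniformly from a disk (the sqrt keeps the area density uniform)."""
    rho = r * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi)


def prob_within_uncertain(c_i, c_q, r, rd, n=20_000, seed=42):
    """Monte-Carlo estimate of P(||V_i - V_q|| <= R_d) for independent uniform disks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        xi, yi = sample_uniform_disk(c_i, r, rng)
        xq, yq = sample_uniform_disk(c_q, r, rng)
        if math.hypot(xi - xq, yi - yq) <= rd:  # one sample of V_iq = V_i - V_q
            hits += 1
    return hits / n


print(prob_within_uncertain((3.0, 0.0), (0.0, 0.0), r=1.0, rd=3.0))
```

Because the support of V_iq is a disk of radius 2r around the difference of the centres, every sampled distance here lies strictly between 1 and 5.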
67
[Figure: pdf(Tr1 − Tr_q) and pdf(Tr2 − Tr_q): convolved uniform pdfs, each supported on a disk of diameter 4r]
Let V_iq = V_i + (−V_q).
Property 1: IF pdf(V_i) has a centroid C_i coinciding with E(V_i) AND pdf(V_q) has a centroid C_q coinciding with E(V_q), THEN their convolution pdf(V_iq) has a centroid C_iq = C_i + (−C_q) coinciding with E(V_iq).
Property 2: Assume that pdf(V_i) and pdf(V_q) are rotationally symmetric around their centroids, with respect to the vertical axis (the axis of the pdf values). THEN pdf(V_iq) is also rotationally symmetric around its centroid C_iq.
68
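The continuous aspect of the probabilistic NN queries
Let V^x_iq = v^x_i − v^x_q and V^y_iq = v^y_i − v^y_q denote the corresponding components of the velocity of the object whose expected location moves along TR_iq.
Then the distance of the expected location along TR_iq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = (V^x_{iq})^2 + (V^y_{iq})^2,\; B = 2\,(x_{iq} V^x_{iq} + y_{iq} V^y_{iq}),\; C = x_{iq}^2 + y_{iq}^2$$
where (x_iq, y_iq) is the expected location along TR_iq at t = 0. Its graph is a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).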
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d_1q(t), d_2q(t), …, d_Nq(t)} (NOTE: excluding d_qq(t) ≡ 0).
Observation: throughout the time interval of interest for the query [t_b, t_e], two distance functions d_iq(t) and d_jq(t) can intersect at most twice. (NOTE: set d_iq(t) = d_jq(t); squaring both sides yields a quadratic equation in t.) Call such pairwise intersections critical time points.
[Figure: example lower envelope LE12 of two distance trajectories TR1 and TR2, with critical time points t11 and t12 after t_b; time on the horizontal axis, dist on the vertical axis]
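The critical time points and the resulting lower envelope can be computed directly from the (A, B, C) coefficients of d_iq(t)² = At² + Bt + C. The sketch below is a deliberately simple brute-force version. It collects all pairwise crossings in O(N²) time and then picks the lowest function on each sub-interval. This is not the divide-and-conquer O(N log N) construction described in the tutorial. All function names are hypothetical. The first example object is static, a degenerate case with A = 0 that is included only to keep the numbers readable.

```python
import math
from itertools import combinations


def coeffs(x0, y0, vx, vy):
    """(A, B, C) of d(t)^2 = A t^2 + B t + C for relative position (x0, y0) at t = 0
    and relative velocity (vx, vy)."""
    return vx * vx + vy * vy, 2 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0


def dist(c, t):
    a, b, k = c
    return math.sqrt(max(a * t * t + b * t + k, 0.0))


def crossings(ci, cj, tb, te):
    """Critical time points in (tb, te): at most two, since d_i^2 - d_j^2 is quadratic."""
    a, b, c = ci[0] - cj[0], ci[1] - cj[1], ci[2] - cj[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return [t for t in roots if tb < t < te]


def lower_envelope(funcs, tb, te):
    """[(index, t_start, t_end), ...]: the lowest distance function on each sub-interval."""
    pts = {tb, te}
    for ci, cj in combinations(funcs, 2):
        pts.update(crossings(ci, cj, tb, te))
    pts = sorted(pts)
    env = []
    for lo, hi in zip(pts, pts[1:]):
        mid = 0.5 * (lo + hi)
        best = min(range(len(funcs)), key=lambda i: dist(funcs[i], mid))
        if env and env[-1][0] == best:
            env[-1] = (best, env[-1][1], hi)  # extend the previous piece
        else:
            env.append((best, lo, hi))
    return env


f0 = coeffs(1.0, 0.0, 0.0, 0.0)   # static object at distance 1
f1 = coeffs(5.0, 0.0, -1.0, 0.0)  # approaches the query, passes it, moves away
print(lower_envelope([f0, f1], 0.0, 10.0))
```

Here the two distance functions cross twice, at t = 4 and t = 6, which is the maximum the observation allows.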
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: With a divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: the lower envelopes LE12 (of TR1, TR2; critical points t11, t12) and LE34 (of TR3, TR4; critical point t31) over [t_b, t_e] are merged into LE1234, creating new critical points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
71
The lower envelope provides a continuous pruning criterion: any trajectory whose distance function is more than 4r (r = radius of the uncertainty disk) above the lower envelope cannot have a non-zero probability of being NN to Tr_q (e.g., Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR1–TR7 over [t_b, t_e] with time points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the 4r band above the lower envelope]
72
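Given two trajectory samples (t_i, p_i) and (t_{i+1}, p_{i+1}) of a moving object a on a road network, the set of possible paths PP_i(a) between t_i and t_{i+1} consists of all the paths along the routes (sequences of edges) that connect p_i and p_{i+1} and whose minimum time costs are not greater than t_{i+1} − t_i, i.e.,
$$PP_i(a) = \{\, P \;:\; P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{cost}_{min}(P) \le t_{i+1} - t_i \,\}$$
Example: if the pdf of selecting a possible path is uniform, then Pr(P) = 1/|PP_i(a)| for every P ∈ PP_i(a).
Given a path P_j ∈ PP_i(a), the possible locations of the moving object a with respect to P_j at t ∈ [t_i, t_{i+1}] are the set of all positions p ∈ P_j from which a can reach p_i (respectively p_{i+1}) within time period t − t_i (respectively t_{i+1} − t), i.e.,
$$PL_{P_j}(a, t) = \{\, p \in P_j \;:\; \mathrm{cost}_{min}(p, p_i) \le t - t_i \ \wedge\ \mathrm{cost}_{min}(p, p_{i+1}) \le t_{i+1} - t \,\}$$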
73
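Combine the two probabilities…
[Figure: possible paths and possible locations of object a when t = 4, with point m on a possible path]
Probability of falling inside segment m p1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{P \in PP_i(a)} \Pr(P) \cdot \Pr\bigl(a(t) \in S \mid P\bigr)$$
where, given P, the location a(t) is distributed over PL_P(a, t) (e.g., with a uniform pdf).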
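The sum over possible paths can be sketched for the case of a uniform pdf over each path's possible-location interval. The sketch represents every path by three values: its probability Pr(P), its possible-location interval PL_P(a, t) measured along the path, and the part of S lying on that path (or None). This representation and the numbers below are hypothetical. The numbers were chosen so that the result mirrors the 0.5 · 0.5 = 0.25 example.

```python
def prob_in_segment(paths):
    """Pr(a(t) in S) = sum over possible paths P of Pr(P) * Pr(a(t) in S | P).

    paths: list of (path_prob, (pl_lo, pl_hi), seg) where (pl_lo, pl_hi) is the
    possible-location interval PL_P(a, t) along P, and seg = (s_lo, s_hi) is the part
    of S on P in the same coordinates, or None if S does not lie on P.
    """
    total = 0.0
    for path_prob, (pl_lo, pl_hi), seg in paths:
        if seg is None:
            continue
        s_lo, s_hi = seg
        if pl_hi == pl_lo:  # exactly one possible location on this path
            inside = 1.0 if s_lo <= pl_lo <= s_hi else 0.0
        else:  # uniform pdf over the possible-location interval
            overlap = max(0.0, min(pl_hi, s_hi) - max(pl_lo, s_lo))
            inside = overlap / (pl_hi - pl_lo)
        total += path_prob * inside
    return total


# Hypothetical t = 4 snapshot: two equally likely possible paths; on the first one the
# possible locations span [2, 6] and S covers [4, 8]; S does not lie on the second path.
paths = [(0.5, (2.0, 6.0), (4.0, 8.0)), (0.5, (1.0, 3.0), None)]
print(prob_in_segment(paths))
```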
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – the collection of all the possible paths (and their probabilities)
  – the probabilistic location function
Example: observe the two possible sequences of going from position p_i to position p_{i+1}. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
› Data structures
  – Edge hash table
    › small, in memory
  – Movement R-tree
    › 1D, one for each edge
  – Trajectories list
    › stores samples ordered by time stamp
    › assigns a pointer to the set of possible paths
    › earliest-arrival and latest-departure times stored for vertices
    › on disk, retrieved on demand
76
› Network expansion
  – Expansion tree: a tree rooted at q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search for leaf entries whose maximum time intervals intersect t_q
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree rooted at q "bubbling" along road-network segments through vertices A–E, annotated with earliest/latest times of arrival; one possible path is marked as definitely not part of the answer to the range query]
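The expansion and filtering steps can be sketched with a bounded Dijkstra search followed by a time-interval overlap test. The sketch makes two simplifications, both hypothetical: the graph is a plain adjacency dict, and the per-edge movement R-trees are replaced by a dict from an edge to a list of (object, time interval) entries. Every edge incident to a vertex reached within distance r is kept, because at least part of that edge lies within r.

```python
import heapq


def expansion_edges(graph, q, r):
    """Bounded Dijkstra from q: network distances < r and the edges touching them.

    graph: {vertex: [(neighbour, length), ...]}, undirected edges listed in both directions.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    edges = {frozenset((u, v)) for u in settled for v, _ in graph.get(u, [])}
    return dist, edges


def filter_candidates(edges, movement, tq):
    """Objects with an entry on an expansion edge whose time interval intersects tq."""
    lo, hi = tq
    return {obj for e in edges for obj, (t0, t1) in movement.get(e, []) if t0 <= hi and lo <= t1}


graph = {"A": [("B", 1.0)], "B": [("A", 1.0), ("C", 2.0)],
         "C": [("B", 2.0), ("D", 5.0)], "D": [("C", 5.0)]}
movement = {frozenset("CD"): [("o1", (0.0, 5.0))], frozenset("AB"): [("o2", (10.0, 12.0))]}
dist, edges = expansion_edges(graph, "A", 4.0)
print(sorted(dist), filter_candidates(edges, movement, (4.0, 6.0)))
```

The candidates returned by the filter would then go through refinement, where their qualification probabilities are computed.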
77
Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty – the flip-side
› Data reduction – why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up the Douglas-Peucker Line-Simplification Algorithm". SSHD 1997
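The classic recursive Douglas-Peucker simplification shows how "define an acceptable error, save on data size" works in practice. The tolerance eps is the acceptable error. A vertex is kept only if dropping it would move the simplified polyline more than eps away from it. The sketch is the textbook version based on the perpendicular distance to the chord. It is not the Hershberger-Snoeyink speed-up. For timestamped trajectories, one might substitute a time-aware distance, which is an option not covered here.

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def douglas_peucker(points, eps):
    """Keep the endpoints; recurse on the farthest vertex if it deviates by more than eps."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [a, b]
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right  # the split vertex appears only once


track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
print(douglas_peucker(track, 0.5))
```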
79
Uncertainty – the other flip-side
› What are the important properties that must not be distorted by reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just another "Z" axis… one cannot travel back in time
  – How long can the data at the sink be allowed to become "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
82
Merge the approaches
› Idea
  – Discretize time
  – For each time, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009, pp. 435–443
83
A highway example…
[Figure: position (pos) of a car on a highway over time (t), with query region Q]
What is the probability that the car is in some area at least once during some time interval?
84
A highway example…
[Figure: position over time with query region Q; possible positions bounded by the minimum and maximum speeds v_min and v_max]
85
A highway example…
[Figure: position over time with speed bounds v_min, v_max and query region Q]
Assuming a uniform distribution…
86
A highway example…
[Figure: position over time; probability 50% of being inside Q at one time instant and 30% at another]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
87
A highway example…
[Figure: as before, 50% and 30% at the two time instants]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
Violation of the speed constraints
88
A highway example…
[Figure: as before, 50% and 30% at the two time instants]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
Consideration of dependency: = 0.5
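Both numbers follow from inclusion-exclusion. Let A and B be the events that the car is inside Q at the first and at the second of the two time instants, with P(A) = 0.5 and P(B) = 0.3. The dependent value 0.5 is reproduced under one assumption: every world that hits Q at the second instant also hits it at the first (B ⊆ A). That assumption is our reading of the speed constraint, not something the slide states.

```latex
\begin{aligned}
P(A \cup B) &= P(A) + P(B) - P(A \cap B) \\
\text{independent } A, B:\quad P(A \cup B) &= 0.5 + 0.3 - 0.5 \cdot 0.3 = 0.65 \\
B \subseteq A:\quad P(A \cup B) &= 0.5 + 0.3 - 0.3 = 0.5
\end{aligned}
```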
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts
[Figure: transition probabilities 0.4 to each neighbouring hole and 0.2 to stay; at the border 0.6 inward and 0.4 to stay]
92
A simple example
› Initial position: bucket 3
› After first hit: 0.4 / 0.2 / 0.4 in buckets 2 / 3 / 4
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16 in buckets 1–5
› After 40th hit: ≈ 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08 in buckets 1–9
93
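How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket; columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$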
94
How can we model this?
› First hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› Second hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) · M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› 40th hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M^40 ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
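The hits can be reproduced by repeated row-vector-times-matrix products. The sketch below builds the 9-bucket matrix M, applies it 40 times starting from bucket 3, and checks that the vector (0.08, 0.12, …, 0.12, 0.08) is stationary (π · M = π). The step-by-step distributions approach this vector. Plain Python lists are used to keep the sketch dependency-free, and the helper names are hypothetical.

```python
def step(v, M):
    """One hit of the board: row vector v times transition matrix M."""
    n = len(M)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]


def board_matrix(n=9):
    """0.4 left / 0.2 stay / 0.4 right; at the borders 0.6 inward / 0.4 stay."""
    M = [[0.0] * n for _ in range(n)]
    M[0][0], M[0][1] = 0.4, 0.6
    M[n - 1][n - 2], M[n - 1][n - 1] = 0.6, 0.4
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


M = board_matrix()
history = [[0.0, 0.0, 1.0] + [0.0] * 6]  # the ball starts in bucket 3
for _ in range(40):
    history.append(step(history[-1], M))
print([round(x, 2) for x in history[1]])
print([round(x, 2) for x in history[2]])
print([round(x, 3) for x in history[40]])

pi = [0.08] + [0.12] * 7 + [0.08]  # candidate stationary distribution
print(max(abs(a - b) for a, b in zip(step(pi, M), pi)))
```

Because M is sparse, a real deployment would use a sparse-matrix representation. The dense lists here only keep the example short.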
95
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
97
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph with transitions s1 → s3 (1.0), s2 → s1 (0.6), s2 → s3 (0.4), s3 → s2 (0.8), s3 → s3 (0.2)]
Note: there is an exponential number of possible paths the car might take.
ST-Window Queries
[Figure: the state graph unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0)
98
99
ST-Window Queries
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
100
ST-Window Queries
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
101
ST-Window Queries
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)
102
ST-Window Queries
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2); the mass 0.8 in s2 is already inside the window
t = 3, continuing only the remaining mass 0.2 in s3: (0, 0.16, 0.04)
Result = 0.8 + 0.16 = 0.96
› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices
103
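ST-Window Queries
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0.6 & 0 & 0.4 & 0\\ 0 & 0.8 & 0.2 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0 & 0 & 0.4 & 0.6\\ 0 & 0 & 0.2 & 0.8\\ 0 & 0 & 0 & 1 \end{pmatrix}$$
States s1, s2, s3 and the new absorbing winner state s4. M^- is used for transitions into time steps outside T. M^+ is used for transitions into time steps inside T and redirects every transition into s1 or s2 to s4.
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$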
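The two-matrix construction can be generated from M and the set of window states, instead of being written by hand. The sketch appends one absorbing winner state. For every transition into a time step inside T, it routes the probability of entering a window state into that winner state. Function names are hypothetical, and time steps are assumed to be integers starting at 0.

```python
def step(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def window_matrices(M, window):
    """M- and M+ over the states of M plus one absorbing winner state (appended last)."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    m_minus = [row + [0.0] for row in M] + [absorbing]
    m_plus = []
    for row in M:
        kept = [0.0 if j in window else p for j, p in enumerate(row)]
        m_plus.append(kept + [sum(p for j, p in enumerate(row) if j in window)])
    m_plus.append(absorbing)
    return m_minus, m_plus


def window_probability(M, start, window, T):
    """P(the object is in a window state at some time step t with t_start <= t <= t_end)."""
    t_start, t_end = T
    if t_start == 0 and start in window:
        return 1.0
    m_minus, m_plus = window_matrices(M, window)
    v = [1.0 if i == start else 0.0 for i in range(len(M))] + [0.0]
    for t in range(1, t_end + 1):
        v = step(v, m_plus if t_start <= t <= t_end else m_minus)  # transition into time t
    return v[-1]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
print(window_probability(M, start=0, window={0, 1}, T=(2, 3)))
```

The cost is one sparse vector-matrix product per time step, however many paths the car could take.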
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time; left: a single observation at t0; right: observations at t0 and t1]
105
Multiple Observations
› We need to track where the true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class S_i^- corresponding to worlds where o is located in state s_i and o has not intersected the window
  – One class S_i^+ corresponding to worlds where o is located in state s_i and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: the state graph unrolled over t = 0, 1, 2, 3]
106
Multiple Observations
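State vectors over (S1^-, S2^-, S3^-, S1^+, S2^+, S3^+); the S^- classes are the worlds that have not hit the window (¬■), the S^+ classes those that have (■):
t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^- = \begin{pmatrix} M & 0\\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0.4 & 0.6 & 0 & 0\\
0 & 0 & 0.2 & 0 & 0.8 & 0\\
0 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0.6 & 0 & 0.4\\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$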
107
Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)? Let ■ denote "o has intersected the window". The state vector at t = 3 over (S1^-, S2^-, S3^-, S1^+, S2^+, S3^+) is (0, 0, 0.04, 0.48, 0.16, 0.32), so
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\,P(\blacksquare)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
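The doubled state space and the Bayes step can be sketched in the same style. Each state s_i gets a "-" copy (window not yet hit) and a "+" copy (window hit). Inside the window, entering a window state moves a world from its "-" copy to the "+" copy, and worlds never leave the "+" copies. Conditioning on the observation "o is in s3 at t = 3" then only compares the two s3 entries. The helper names are hypothetical, and T = [2, 3] is hard-coded as in the example.

```python
def step(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def split_matrices(M, window):
    """M- and M+ over (S_1^-, ..., S_n^-, S_1^+, ..., S_n^+)."""
    n = len(M)
    zero = [0.0] * n
    m_minus = [row + zero for row in M] + [zero + row for row in M]  # diag(M, M)
    m_plus = []
    for row in M:  # from S_i^-: entering a window state switches to the "+" copy
        m_plus.append([0.0 if j in window else p for j, p in enumerate(row)]
                      + [p if j in window else 0.0 for j, p in enumerate(row)])
    m_plus += [zero + row for row in M]  # from S_i^+: worlds stay in the "+" copy
    return m_minus, m_plus


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
m_minus, m_plus = split_matrices(M, window={0, 1})
v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # t = 0: o in s1, window not yet hit
for t in (1, 2, 3):
    v = step(v, m_plus if 2 <= t <= 3 else m_minus)
print([round(x, 2) for x in v])

# Bayes: condition on the observation "o is in s3 at t = 3"
p_hit_and_s3, p_nohit_and_s3 = v[5], v[2]
print(p_hit_and_s3 / (p_hit_and_s3 + p_nohit_and_s3))
```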
108
Summary
› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)
[Figure: index entries P_oid in the (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = number of samples that satisfy the query / total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language used to formulate probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43–49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435–443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728
24
25
26
27
Too many possible worlds
28
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects.
Are there efficient solutions to query uncertain spatial data?
In general: No.
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
Can be reduced to uncertain spatial databases.
But: specific queries may have polynomial-time solutions.
[DalviSuciu04] Dalvi, N. N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB) (2004)
29
Querying Uncertain Data: Running Example
[Figure: circular query region centered at query point q]
Return the number of objects located in the depicted circular region centered at query point q.
This number is a random variable.
Total number of possible worlds
30
Data Cleaning: Aggregation
[Figure: query point q and uncertain objects C, D, H, I]
Ignore uncertainty (data cleaning):
› Replace uncertain objects by a deterministic "best guess"
  – Expected positions
  – Most-likely positions
  – …
› Query results are not reliable
› Query results may be biased
31
Equivalent Worlds: An Intuition
[Figure: query point q]
Observation 1
For any possible world and any possible world derived from it by changing the position of an object, the following equivalence holds
32
Equivalent Worlds: An Intuition
[Figure: query point q]
Observation 1 allows us to discard objects outside of the query region.
Remaining number of equivalence classes of possible worlds
33
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside".
[Figure: query point q and objects C, D, H, I]
Querying Uncertain Spatial Data
34
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside".
Remaining number of equivalence classes of possible worlds
[Figure: query point q and objects C, D, H, I]
35
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region. Information about concrete result objects can be discarded.
[Figure: query point q and objects C, D, H, I]
36
[Figure: query region around q; objects A, B and C with probabilities 0.8, 0.2 and 0.4 of lying inside it]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
37
[Figure: the same example with the objects A, B and C each replaced by x]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects – substitute A, B, C by x
39
Generating Functions
Each monomial c·x^k implies that the probability of having exactly k results equals c.
40
Generating Functions: Formally
For each object o_i, let p_i denote the probability that o_i is inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_{i} \bigl(1 - p_i + p_i\,x\bigr)$$
In the expanded polynomial, the coefficient of the monomial x^k equals the probability that exactly k objects are inside the query region.
[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502–513 (2009)
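Expanding the product one factor at a time is a simple O(N²) dynamic program. Each factor (1 − p_i + p_i·x) either leaves the exponent unchanged (object outside) or raises it by one (object inside). The sketch assumes independent objects. It uses the inside-probabilities 0.8, 0.2 and 0.4 from the running example, and their assignment to particular objects does not affect the result.

```python
def count_distribution(probs):
    """Coefficients of prod_i (1 - p_i + p_i x); entry k = P(exactly k objects inside)."""
    coeffs = [1.0]
    for p in probs:
        new = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += c * (1 - p)  # object outside: multiply by the x^0 term
            new[k + 1] += c * p    # object inside: multiply by the x^1 term
        coeffs = new
    return coeffs


dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])
```

The polynomial has only N + 1 coefficients, so the computation never touches the exponentially many possible worlds.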
41
Example
[Figure: query region around q; objects A, B and C with probabilities 0.8, 0.2 and 0.4 of lying inside it]
Count Queries on Uncertain Data
42
Count Queries on Uncertain Data
Example: $F(x) = (0.2 + 0.8x)(0.8 + 0.2x)(0.6 + 0.4x)$
43
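Count Queries on Uncertain Data
Example: $F(x) = (0.2 + 0.8x)(0.8 + 0.2x)(0.6 + 0.4x) = (0.16 + 0.68x + 0.16x^2)(0.6 + 0.4x) = 0.096 + 0.472x + 0.368x^2 + 0.064x^3$
E.g., the probability that exactly one object lies inside the query region is 0.472.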
44
45
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time.
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.
III. The probability of a class can be computed in polynomial time.
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
› Materialize a set S of possible worlds
› Samples are drawn independently and unbiased
› Evaluate the query predicate on each world
› The distribution of sampled results is an unbiased approximation of the true distribution of results
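For the count query of the running example, sampling means materializing random possible worlds and tallying how many objects fall inside the query region in each one. The sketch below assumes independent objects with inside-probabilities 0.8, 0.2 and 0.4. By the equivalent-worlds observations, a sampled world only needs the "inside" predicate per object, not an actual position. The function name and the seed are hypothetical.

```python
import random
from collections import Counter


def sample_count_distribution(probs, n_worlds, seed=0):
    """Estimate P(count = k) from n_worlds independently sampled possible worlds."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_worlds):
        # one possible world: each object is independently inside with probability p
        k = sum(1 for p in probs if rng.random() < p)
        counts[k] += 1
    return [counts[k] / n_worlds for k in range(len(probs) + 1)]


estimate = sample_count_distribution([0.8, 0.2, 0.4], n_worlds=200_000)
print([round(e, 3) for e in estimate])
```

With 200,000 worlds the estimates land close to the exact values 0.096, 0.472, 0.368 and 0.064. With only 100 worlds they can be far off, which is what motivates the confidence assessment.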
48
[Figure: query region around q; objects A, B and C with probabilities 0.8, 0.2 and 0.4 of lying inside it]
Sampling Example
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of reliability or confidence of the estimations
49
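[Figure: the running example, with the true probability marked for comparison]
Sampling: Confidences
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of the estimators, e.g., a Wald test:
$$\hat{p} \pm z_{1-\alpha/2}\,\sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}$$
where z_{1−α/2} is the (1 − α/2) percentile of the standard normal distribution.
At a significance level of α = 0.05, the resulting confidence interval for the true probability is [0.442, 0.638].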
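The Wald interval takes only a few lines with the standard library's normal distribution. The estimate p̂ = 0.54 used below is an assumption: it is the midpoint of the interval on the slide. Together with n = 100 sampled worlds and α = 0.05, it reproduces the interval [0.442, 0.638].

```python
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval p_hat +/- z_{1-alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half


lo, hi = wald_interval(0.54, 100)
print(round(lo, 3), round(hi, 3))
```

The Wald interval is the simplest choice. Near 0 or 1, or for very small samples, score-based intervals such as the Wilson interval usually behave better.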
50
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data Cleaning: "best guess" answers; unreliable results; biased results
Paradigm of Equivalent Worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
52
bull Modelsbull Motionbull Location Uncertainty
bull Queriesbull Part 1 NN (free-space motion)
bull Semantics and Processing
bull Part 2 Range (road-networks constraints)bull Semantics and Processing
bull Uncertainty ndash the flip-side
53
Model of a trajectory
Y
X
Time
Present time
2d-ROUTE
3d-TRAJECTORY
Approximation does not capture accelerationdeceleration
Point Objects NOTE the model needs to be augmented for objects with extent eg hurricanes
Bart Kuijpers Harvey J Miller Walied Othman Kinetic space-time prisms GIS 2011
54
Mobility Models
sequence of (locationtime)updates (eg GPS-based)
Electronic maps + traffic distributionpatterns + set of (to be visited) point=gt full trajectory
(locationtime Velocity Vector )updates
now -gt
55
Modeling UncertaintyLocation Uncertainty Models
Various Sourcesrsquo Imprecision(GPS Sensors)
Quest for data reduction
Ralph Lange Harald Weinschrott Lars Geiger Andreacute Blessing Frank Duumlrr Kurt Rothermel Hinrich Schuumltze On a Generic Uncertainty Model for Position Information QuaCon 2009
56
Modeling Uncertain Trajectories
bull Model 1Cones Beads Necklaces (AKA space-time prisms)
bull Assumed exact location samplesbull Unknown (but bounded) maximum speed in-betweensamples
Reynold Cheng Sunil Prabhakar Dmitri V Kalashnikov Querying Imprecise Data in Moving Object Environments ICDE 2003G Trajcevski A Choudhary O Wolfson G Li Uncertan Range Queries for Necklases MDM 2010
57
Modeling Uncertain TrajectoriesModel 2
Constraints
Motion along road network
Uncertainty restricted to Road Segments
p1 p2
q
t2t1
t
Bart Kuijpers Walied Othman Modeling uncertainty of moving objects on road networks via space-time prisms International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain TrajectoriesModel 3Sheared Cylinders
Constant bound on location-error at anytime-instant
G Trajcevski O Wolfson K Hinrichs S Chamberlain Managing Moving Objects Databases with Ucertainty ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly considerSyntax- The impact of uncertainty on the (structure of the) answer- The impact of constraints
Processing Algorithms- Impact of the motion+uncertainty coupling on filteringprunningrefinement
eg possible routes to be taken
6060
Example inside range (disk) around ldquoQrdquo between t1 and t2
1 What is the probability of a path taken
2 What is the earliestlatest arrival at a given vertex (consequently along edge)
Kai Zheng Goce Trajcevski Xiaofang Zhou Peter Scheuermann Probabilistic range queries for uncertain trajectories on road networks EDBT 2011
61
Continuous NN for trajectories
Q_nn(q) Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tbte]
A-nn(q) [(Tri1 [tbt1]) (Tri2[t1t2]) hellip (Trim[tm-1te])]
T = tb
T = te
X
Y
T = t1
Trq Tr1 Tr2Tr3
Given a collection of moving objects trajectories Tr1 Tr2 hellip TrN
Example assuming that Trq is the querying trajectory between [tbte] then- Tr1 (black) is the NN throughout [tbt1]- Tr2 (red) is the NN throughout [t1te]
Observation The answer to the query is time-paremeterized
Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002
62
Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant
UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]
T = te
X
YT = tb
T = t1
Trq Tr1 Tr2Tr3
T = tb1
T = t11
Observations
adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability
= what should the structure ofthe answer to UQ_nn(q) look like
63
Structure of the Answer to a Probabilistic NN-Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level 1: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper levels refine sub-intervals of their parent, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]
Root: parameters of the query
– Querying trajectory Trq
– Time interval of interest [tb, te]
Children: the trajectories that would have the highest probability of being the NN to Trq if the parent were excluded, each within a sub-interval of the interval bounded by the parent
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: crisp query point Q, radii Rmin, Rmax and Rd around Q, and uncertain objects Tr1–Tr5 whose possible locations are disks]
Trq = 2D point Q; the rest are possible locations bounded by circles.
Observation (Rmax = the smallest maximum distance of any object from Q):
– Any object whose minimum distance from Q is > Rmax has a “0” probability of being the NN to Q
– Example: Tr4 cannot have a non-zero probability of being Q’s NN
The probability that the location of Tri is within distance Rd from Q is given by
$P_i(R_d) = \iint_{\lVert (x,y) - Q \rVert \le R_d} pdf_i(x, y)\, dx\, dy$
The probability that a given object Trj is the NN of Q is given by
$P^{NN}_j = \int_0^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \ne j} \big(1 - P_i(R_d)\big)\, dR_d$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd’s (NOTE: the upper bound is Rmax…)
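A minimal numerical sketch of the two integrals above, assuming (as in the figure) that each object's possible locations are uniform over a disk; the names `lens_area`, `within_prob` and `nn_probabilities` are hypothetical. Under that assumption P_i(Rd) is the area of disk(Q, Rd) ∩ disk_i divided by the area of disk_i, and the outer integral is approximated by a midpoint sum over [Rmin, Rmax].

```python
import math


def lens_area(d, r1, r2):
    """Area of the intersection of two circles with radii r1, r2 whose centres are d apart."""
    if d >= r1 + r2:
        return 0.0
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2
    # Clamp the cosines to [-1, 1] to guard against floating-point drift.
    c1 = max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    c2 = max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    kite = 0.5 * math.sqrt(max(0.0, (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)))
    return r1 * r1 * math.acos(c1) + r2 * r2 * math.acos(c2) - kite


def within_prob(q, disk, rd):
    """P_i(Rd): probability that an object uniform in disk = (centre, radius) lies within rd of q."""
    centre, radius = disk
    return lens_area(math.dist(q, centre), rd, radius) / (math.pi * radius * radius)


def nn_probabilities(q, disks, steps=2000):
    """P^NN_j for every disk, via a midpoint-rule version of the integral over Rd."""
    r_min = min(max(0.0, math.dist(q, c) - r) for c, r in disks)
    r_max = min(math.dist(q, c) + r for c, r in disks)  # smallest maximum distance
    h = (r_max - r_min) / steps
    probs = [0.0] * len(disks)
    for k in range(steps):
        lo, hi, mid = r_min + k * h, r_min + (k + 1) * h, r_min + (k + 0.5) * h
        cdf_mid = [within_prob(q, dk, mid) for dk in disks]
        for j, dk in enumerate(disks):
            d_p = within_prob(q, dk, hi) - within_prob(q, dk, lo)  # dP_j on [lo, hi]
            if d_p == 0.0:
                continue  # no mass here, e.g. an object whose min-distance exceeds Rmax
            others = math.prod(1.0 - cdf_mid[i] for i in range(len(disks)) if i != j)
            probs[j] += d_p * others
    return probs
```

An object whose disk lies entirely beyond Rmax receives probability exactly 0, matching the pruning observation, and the probabilities of the remaining objects sum to approximately 1.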
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query TrQ and objects Tr1–Tr5 with radii Rmin and Rmax; two possible locations Z1 and Z2 satisfy dist(Z1, Z2) < Rmax]
We can no longer “prune” Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq.
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
[Figure: uniform pdfs of the possible locations of Trq, Tr1 and Tr2 over disks of radius r (width 2r)]
Example: assuming a uniform pdf of possible locations; the figure marks two of the points used when evaluating the actual probability of Tr1 being within distance Rd from Trq.
Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$.
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq), which, due to the independence of Vi and Vq, implies that the pdf of Viq is a convolution of the pdfs of Vi and −Vq.
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq); each difference pdf spans a width of 4r]
Let Viq = Vi + (−Vq).
Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq).
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids (with respect to the vertical axis, i.e., the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
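A quick Monte-Carlo check of these properties, assuming that Vi and Vq are both uniform over disks of radius r (the function names are hypothetical): the samples of Viq = Vi − Vq have mean Ci − Cq (Property 1), and they never leave the disk of radius 2r around that centroid, i.e., the convolved pdf spans a width of 4r.

```python
import math
import random


def sample_disk(rng, center, radius):
    """Draw one point uniformly from a disk (the square root keeps the density uniform in area)."""
    rho = radius * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return center[0] + rho * math.cos(theta), center[1] + rho * math.sin(theta)


def sample_difference(ci, cq, r, n=100_000, seed=7):
    """Samples of V_iq = V_i - V_q for independent uniform disks of radius r."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        xi, yi = sample_disk(rng, ci, r)
        xq, yq = sample_disk(rng, cq, r)
        out.append((xi - xq, yi - yq))
    return out
```

A histogram of these samples approximates the convolution of the pdfs of Vi and −Vq.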
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi − vxq and Vyiq = vyi − vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.
Then the distance of the expected location along TRiq from the origin, as a function of t, is
$d_{iq}(t) = \sqrt{A t^2 + B t + C}$ with $A = V_{xiq}^2 + V_{yiq}^2$ (B and C depend on the relative expected location at the start time),
a hyperbola since A > 0.
Now we have a collection of such distance functions (one for each i).
The continuous aspect of the probabilistic NN queries
Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0).
Observation: Throughout the time interval of interest for the query, [tb, te], two distance functions diq(t) and djq(t) can intersect at most twice (NOTE: set diq(t) = djq(t); squaring both sides leaves an equation of degree at most two in t). Call such pair-wise intersections critical time-points.
[Figure: example lower envelope LE12 of the distance functions of TR1 and TR2, with critical time-points t11 and t12 after tb; time on the horizontal axis, distance on the vertical axis]
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: the lower envelopes LE12 (of TR1 and TR2, critical time-points t11, t12) and LE34 (of TR3 and TR4, critical time-point t31) over [tb, te] are merged into LE1234, which introduces new critical time-points t1new and t2new]
Complexity of the lower-envelope construction: O(N log N) (N = number of trajectories).
Now, how about the IPAC-NN tree?
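A sketch of the critical time-points and of the merge-sort-style lower envelope, assuming linear relative motion so that diq(t)² = A t² + B t + C. The function names and the (t_start, t_end, index) representation of an envelope are illustrative choices, not the authors' implementation. Each merge sweeps the overlapping pieces of two envelopes, inserts the at most two critical time-points of every overlapping pair, and keeps the lower function on each elementary interval.

```python
import math


def coeffs(x0, y0, vx, vy):
    """(A, B, C) with d(t)^2 = A t^2 + B t + C for relative start (x0, y0) and velocity (vx, vy)."""
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def dist_at(f, t):
    a, b, c = f
    return math.sqrt(max(0.0, a * t * t + b * t + c))


def critical_times(f, g, lo, hi):
    """Critical time-points in (lo, hi) where d_f(t) == d_g(t): roots of a degree <= 2 polynomial."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
    return sorted(t for t in roots if lo < t < hi)


def merge(fns, e1, e2):
    """Merge two lower envelopes, each a time-ordered list of (t_start, t_end, index)."""
    cuts = {t for seg in e1 + e2 for t in seg[:2]}
    a = b = 0
    while a < len(e1) and b < len(e2):  # sweep the overlapping pieces once, as in merge sort
        s1, t1, i = e1[a]
        s2, t2, j = e2[b]
        lo, hi = max(s1, s2), min(t1, t2)
        if lo < hi:
            cuts.update(critical_times(fns[i], fns[j], lo, hi))
        if t1 <= t2:
            a += 1
        if t2 <= t1:
            b += 1
    cuts = sorted(cuts)
    out, pa, pb = [], 0, 0
    for lo, hi in zip(cuts, cuts[1:]):
        mid = 0.5 * (lo + hi)  # no crossing inside (lo, hi), so the midpoint decides
        while e1[pa][1] < mid:
            pa += 1
        while e2[pb][1] < mid:
            pb += 1
        i, j = e1[pa][2], e2[pb][2]
        best = i if dist_at(fns[i], mid) <= dist_at(fns[j], mid) else j
        if out and out[-1][2] == best:
            out[-1] = (out[-1][0], hi, best)  # extend the current piece
        else:
            out.append((lo, hi, best))
    return out


def lower_envelope(fns, idx, tb, te):
    """Divide and conquer over the trajectory indices idx."""
    if len(idx) == 1:
        return [(tb, te, idx[0])]
    half = len(idx) // 2
    return merge(fns, lower_envelope(fns, idx[:half], tb, te), lower_envelope(fns, idx[half:], tb, te))
```

Each merge sweeps both envelopes once (plus a sort of the cut points); how large an envelope can become depends on how often the hyperbolas cross.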
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance-function, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree.
[Figure: distance functions TR1–TR7 over [tb, te] with time points t1, t2, t3; the lower envelope, a band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti.
Example: if the pdf of selecting a possible path is uniform, each path in PPi(a) is taken with probability 1/|PPi(a)|.
Given a path Pj ∈ PPi(a), the Possible Locations of the moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t).
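A sketch of computing PPi(a) by depth-first search, pruning as soon as the accumulated minimum travel time exceeds ti+1 − ti, followed by the uniform path pdf. The graph, its travel times and the function names are hypothetical.

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths src -> dst whose minimum travel time is <= budget (= t_{i+1} - t_i).

    graph: dict mapping a vertex to a list of (neighbour, minimum travel time) pairs.
    """
    paths = []

    def dfs(v, path, cost):
        if v == dst:
            paths.append(list(path))
            return
        for w, c in graph.get(v, []):
            # Prune as soon as the minimum time cost exceeds the budget.
            if w not in path and cost + c <= budget:
                path.append(w)
                dfs(w, path, cost + c)
                path.pop()

    dfs(src, [src], 0.0)
    return paths


def uniform_path_pdf(paths):
    """Uniform pdf over the possible paths: each path gets 1 / |PP|."""
    return {tuple(p): 1.0 / len(paths) for p in paths}
```

For example, with a direct edge v1v5 (4 time units), a detour v1v3, v3v4, v4v5 (5 units in total) and a budget of 6, both routes are possible and each receives probability 0.5.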
Combine the two probabilities…
[Figure: possible paths of object a, and its possible locations when t = 4 (point m marked)]
Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr(PL_p(t) \in S)$, e.g., with a uniform pdf
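A minimal sketch of combining the two probabilities, assuming that the object moves along each possible path at most at a constant speed vmax and that its possible locations are uniform along the path. The names are hypothetical, and the numbers in the test are made up to reproduce the 0.5 · 0.5 = 0.25 example.

```python
def location_interval(length, vmax, t, ti, tj):
    """Possible locations (distance from p_i along the path) at time t, with maximum speed vmax."""
    lo = max(0.0, length - vmax * (tj - t))  # must still be able to reach p_{i+1} by tj
    hi = min(length, vmax * (t - ti))        # cannot be farther than vmax * (t - ti) from p_i
    return lo, hi


def prob_in_segment(paths, t, ti, tj):
    """Pr(a(t) in S) = sum over paths of Pr(path) * Pr(possible location on that path lies in S).

    paths: list of (path_probability, path_length, vmax, s_range), where s_range is the
    (start, end) of segment S measured along that path, or None if the path misses S.
    """
    total = 0.0
    for p_path, length, vmax, s_range in paths:
        if s_range is None:
            continue
        lo, hi = location_interval(length, vmax, t, ti, tj)
        if hi - lo <= 1e-12:  # location is (almost) pinned down
            total += p_path if s_range[0] <= lo <= s_range[1] else 0.0
            continue
        overlap = max(0.0, min(hi, s_range[1]) - max(lo, s_range[0]))
        total += p_path * overlap / (hi - lo)  # uniform pdf over the possible locations
    return total
```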
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need:
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1.
At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
Processing
› Data structures
– Edge hash-table
   › Small, in-memory
– Movement R-tree
   › 1D for each edge
– Trajectories list
   › Stores samples ordered by time-stamp
   › Assigns a pointer to the set of possible paths
   › Earliest-arrival and latest-departure times stored for vertices
   › On-disk, retrieved on demand
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time intervals intersect with tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree “bubbling” from q along road-network segments (vertices A–E), annotated with earliest/latest times of arrival; one possible path is marked as definitely not part of the answer to the range query]
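A sketch of the network-expansion and filtering steps, assuming an undirected road graph and using a plain dictionary `movement` (hypothetical) in place of the per-edge 1D movement R-trees. A Dijkstra search bounded by r builds the expansion; an object becomes a candidate if one of its entries on an expanded edge has a time interval intersecting tq. Refinement (computing the qualification probabilities) is not shown.

```python
import heapq
import math


def expansion_tree(graph, q, r):
    """Bounded Dijkstra from q: network distance of every vertex closer than r, plus the edges touched.

    graph: undirected adjacency dict vertex -> list of (neighbour, length).
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, math.inf):
            continue  # stale heap entry
        for w, length in graph.get(v, []):
            nd = d + length
            if nd < r and nd < dist.get(w, math.inf):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    # An edge is expanded if at least one endpoint is within r (part of it lies in range).
    edges = {frozenset((v, w)) for v in dist for w, _ in graph.get(v, [])}
    return dist, edges


def filter_candidates(edges, movement, tq):
    """Objects with a movement entry on an expanded edge whose time interval intersects tq."""
    ts, te = tq
    return {obj for e in edges for (s, t, obj) in movement.get(e, []) if s <= te and t >= ts}
```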
Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
Uncertainty – the flip-side
› Data reduction
– Why transmit every single location point?
   › Bandwidth
   › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: “Speeding up the Douglas-Peucker Line-Simplification Algorithm”. SSHD 1997
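For reference, the classic recursive Douglas-Peucker simplification on 2D points keeps a vertex only if it lies farther than the acceptable error eps from the current chord. This is the baseline algorithm, not the faster variant of Hershberger and Snoeyink, and it ignores the time dimension of a trajectory.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))  # clamp the projection onto the segment
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))


def douglas_peucker(points, eps):
    """Simplify a polyline so that every dropped point lies within eps of the result."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [points[0], points[-1]]
    # Split at the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right
```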
Uncertainty – the other flip-side
› What are the important properties not to be distorted by reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just a “Z” axis… one cannot travel back in time
– How long may the data at the sink remain “un-fresh”?
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
So far…
Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
Merge the approaches
› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009, 435–443
A highway example…
[Figure: position (pos) of a car on a highway over time (t), with query region Q]
What is the probability that the car is in some area at least once during some time interval?
A highway example…
[Figure: as before, with the minimum and maximum speeds vmin and vmax bounding the possible positions]
What is the probability that the car is in some area at least once during some time interval?
A highway example…
[Figure: position over time with speed bounds vmin, vmax and query region Q]
Assuming a uniform distribution…
A highway example…
[Figure: the car is in Q with probability 50% at one time instant and 30% at another]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
A highway example…
[Figure: as above, 50% and 30%]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
Violation of the speed constraints
A highway example…
[Figure: as above, 50% and 30%]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
Consideration of dependency: 0.5
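The independence figure is plain arithmetic, reproduced below. For contrast: with P(A) = 0.5, P(B) = 0.3 and P(A ∪ B) = 0.5, the 30% event must lie inside the 50% event, so the dependency-aware value is simply the larger probability. The nested helper covers only that special case and is not a general model of the speed constraints.

```python
import math


def at_least_once_independent(probs):
    """P(in Q at least once) = 1 - prod(1 - p_t); valid only if the per-time events are independent."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def at_least_once_nested(probs):
    """Special dependent case: if the events are nested (each contained in the largest one),
    the union has the probability of the largest event."""
    return max(probs)
```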
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: from an inner hole the ball moves left 0.4, stays 0.2, moves right 0.4; from a border hole it stays 0.4 and moves inward 0.6]
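A simple example
› Initial position
› After first hit
› After second hit
› After 40th hit
[Figure: ball-position probabilities per hole: after the first hit 0.4, 0.2, 0.4; after the second hit 0.16, 0.16, 0.36, 0.16, 0.16; after the 40th hit 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08]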
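How can we model this?
› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket; columns: to bucket):
$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$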
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
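The hits can be reproduced with a few lines of plain Python (no matrix library assumed; names are hypothetical). The values shown for the 40th hit are, up to rounding, the stationary distribution π = (0.08, 0.12, …, 0.12, 0.08) of M, i.e., π · M = π. The chain approaches π as the number of hits grows, so after 40 hits from bucket 3 individual entries can still differ slightly from π.

```python
def board_matrix(n=9):
    """Transition matrix M of the board: rows = from bucket, columns = to bucket."""
    m = [[0.0] * n for _ in range(n)]
    m[0][0], m[0][1] = 0.4, 0.6                  # left border: stay 0.4, move inward 0.6
    m[n - 1][n - 1], m[n - 1][n - 2] = 0.4, 0.6  # right border
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist, m):
    """One hit: multiply the row vector dist by M."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


def after_hits(start, m, hits):
    dist = list(start)
    for _ in range(hits):
        dist = step(dist, m)
    return dist
```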
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: state graph over s1, s2, s3 with the transition probabilities of M]
Note: We have an exponential number of possible paths the car might take.
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0)
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0); t = 1: (0, 0, 1)
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0); t = 1: (0, 0, 1); t = 2: (0, 0.8, 0.2)
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0); t = 1: (0, 0, 1); t = 2: (0, 0.8, 0.2); t = 3: (0.48, 0.16, 0.36)
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0); t = 1: (0, 0, 1); t = 2: (0, 0.8, 0.2); t = 3, continuing only the worlds that have not yet hit the window: (0, 0.16, 0.04) ⇒ Result = 1 − 0.04 = 0.96
› Solution based on matrix multiplications: introduces a new state for the “winner” trajectories and two matrices
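ST-Window Queries
States s1, s2, s3 plus a new absorbing state s4 (“window hit”):
$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$
$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$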
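A sketch of the two-matrix technique, assuming that the object starts at t = 0 with a known distribution (`window_probability` and its parameters are hypothetical names). M− is M plus an isolated absorbing state; M+ additionally redirects every transition into a window state to that absorbing state. Applying M− before the window and M+ inside it leaves the hit probability in the extra state.

```python
def step(dist, m):
    """Row vector times matrix."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


def window_probability(m, start, window, t_first, t_last):
    """P(the object is in a state of `window` at some tick t in [t_first, t_last])."""
    n = len(m)
    hit = n  # index of the extra absorbing state
    m_minus = [row + [0.0] for row in m] + [[0.0] * n + [1.0]]
    m_plus = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            m_plus[i][hit if j in window else j] += m[i][j]
    m_plus[hit][hit] = 1.0
    v = list(start) + [0.0]
    if t_first == 0:  # the starting state itself may already lie in the window
        for s in window:
            v[hit] += v[s]
            v[s] = 0.0
    for t in range(1, t_last + 1):
        v = step(v, m_plus if t >= t_first else m_minus)
    return v[hit]
```

With the matrix above, start (1, 0, 0), window {s1, s2} (indices 0 and 1) and T = [2, 3], the result is 0.96.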
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time with a single observation at t0 vs. observations at t0 and t1]
Multiple Observations
› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
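Multiple Observations
State order (S1−, S2−, S3−, S1+, S2+, S3+); ¬∎ marks the classes whose worlds have not intersected the window, ∎ those whose worlds have.
$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \quad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}, \quad M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
t = 0: (1, 0, 0, 0, 0, 0); t = 1: (0, 0, 1, 0, 0, 0); t = 2: (0, 0, 0.2, 0, 0.8, 0); t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)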
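Bayes’ Theorem
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?
With ∎ = “the window was intersected” and s3 = “the object is observed in s3”:
$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare) \cdot P(\blacksquare)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(\blacksquare \wedge s_3) + P(\neg\blacksquare \wedge s_3)}$
Reading S3+ and S3− off (0, 0, 0.04, 0.48, 0.16, 0.32): 0.32 / (0.32 + 0.04) ≈ 0.89
[Figure: classes S1−, S2−, S3−, S1+, S2+, S3+]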
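A sketch of the 2|S|-state construction and the Bayes step for the running example, assuming the window interval ends at the tick of the second observation and starts at t ≥ 1; the helper names are hypothetical. Index j stands for Sj− and n + j for Sj+.

```python
def step(dist, m):
    """Row vector times matrix."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


def split_matrix(m, window, inside):
    """2|S| states: index j is S_j- (window not hit yet), n + j is S_j+ (window hit)."""
    n = len(m)
    b = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            target = n + j if (inside and j in window) else j
            b[i][target] += m[i][j]      # not-yet-hit worlds may become hit worlds
            b[n + i][n + j] += m[i][j]   # hit worlds stay hit worlds
    return b


def posterior_hit(m, start, window, t_first, t_last, obs_state):
    """P(window hit during [t_first, t_last] | object observed in obs_state at tick t_last)."""
    n = len(m)
    v = list(start) + [0.0] * n
    for t in range(1, t_last + 1):
        v = step(v, split_matrix(m, window, t_first <= t <= t_last))
    hit, miss = v[n + obs_state], v[obs_state]
    # Bayes: P(hit and obs) / (P(hit and obs) + P(no hit and obs)).
    return hit / (hit + miss), v
```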
Summary
› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the “intelligence” and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: indexed object in time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
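One way to adapt a transition matrix so that sampled trajectories conform with a later observation is to weight each transition by the probability of still reaching the observed state (a backward pass over M). The sketch below assumes a single future observation and illustrates the idea only; it is not necessarily the exact procedure of the cited paper, and all names are hypothetical.

```python
import random


def backward_vectors(m, obs_state, steps):
    """beta[t][i] = P(object is in obs_state at tick `steps` | it is in state i at tick t)."""
    n = len(m)
    beta = [[0.0] * n for _ in range(steps + 1)]
    beta[steps][obs_state] = 1.0
    for t in range(steps - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(m[i][j] * beta[t + 1][j] for j in range(n))
    return beta


def sample_conditioned(m, start, obs_state, steps, rng):
    """Sample one path start -> ... -> obs_state using the adapted (conditioned) transitions."""
    beta = backward_vectors(m, obs_state, steps)
    if beta[0][start] == 0.0:
        raise ValueError("observation unreachable from start")
    path = [start]
    for t in range(steps):
        i = path[-1]
        # Adapted row: M[i][j] * beta[t+1][j] / beta[t][i]; it sums to 1.
        weights = [m[i][j] * beta[t + 1][j] / beta[t][i] for j in range(len(m))]
        path.append(rng.choices(range(len(m)), weights=weights)[0])
    return path
```

For the ST-window example (start in s1 at t = 0, observed in s3 at t = 3), every sampled path ends in s3, and the fraction of paths that visit s1 or s2 during [2, 3] approaches 0.32 / 0.36 ≈ 0.89.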
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of sub-events, e.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715–728
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] J. L. Doob (1953). Stochastic Processes. Wiley
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43–49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170–181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009, 435–443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715–728
Too many possible worlds
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects.
Are there efficient solutions to query uncertain spatial data?
In general: No.
“The problem of answering queries on a probabilistic database D is #P-complete in the size of D” [DalviSuciu04]
This can be reduced to uncertain spatial databases.
But: specific queries may have polynomial-time solutions.
[DalviSuciu04] Dalvi, N. N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB) (2004)
Querying Uncertain Data: Running Example
[Figure: uncertain objects around query point q]
Return the number of objects located in the depicted circular region centered at query point q.
This number is a random variable.
Total number of possible worlds
Data Cleaning: Aggregation
[Figure: query region around q with objects C, D, H, I]
Ignore uncertainty (data cleaning)
Replace uncertain objects by a deterministic “best guess”:
– Expected positions
– Most-likely positions
– …
Query results are not reliable
Query results may be biased
Equivalent Worlds: An intuition
[Figure: query point q]
Observation 1
For any possible world and any possible world derived from it by changing the position of object, the following equivalence holds
Equivalent Worlds: An intuition
[Figure: query point q]
Observation 1 allows us to discard objects outside of the query region.
Remaining number of equivalence classes of possible worlds
Equivalent Worlds: An intuition
Observation 2
For each remaining object, we only need to consider the predicate “inside”.
[Figure: query region around q with objects C, D, H, I]
Querying Uncertain Spatial Data
Equivalent Worlds: An intuition
Observation 2
For each remaining object, we only need to consider the predicate “inside”.
Remaining number of equivalence classes of possible worlds
[Figure: query region around q with objects C, D, H, I]
Equivalent Worlds: An intuition
Observation 3
We only require the number of objects in the query region. Information about concrete result objects can be discarded.
[Figure: query region around q with objects C, D, H, I]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
[Figure: query region around q with objects A, B, C (probabilities 0.8, 0.2, 0.4) and object H]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects – substitute A, B, C by x
[Figure: the same query region, with objects A, B, C replaced by x]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects – substitute A, B, C by x
[Figure: the same query region, with objects A, B, C replaced by x]
Each monomial c·x^k implies that the probability of having exactly k results equals c.
Generating Functions: Formally
For each object oi, let pi be the probability that oi is located inside the query region. Consider the following generating function [2]:
$F(x) = \prod_i \big((1 - p_i) + p_i \cdot x\big)$
In the expanded polynomial, the coefficient of the monomial x^k equals the probability that exactly k objects are inside the query region.
[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502–513 (2009)
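Expanding the generating function amounts to repeated polynomial multiplication by the linear factors (1 − pi) + pi·x, which takes time quadratic in the number of objects instead of enumerating exponentially many worlds. The probabilities 0.8, 0.2 and 0.4 are read off the running-example figure and should be treated as an assumption; the order of the factors does not matter.

```python
def count_distribution(probs):
    """Coefficients of prod_i ((1 - p_i) + p_i * x); entry k is P(exactly k objects inside)."""
    coeffs = [1.0]
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside: degree unchanged
            nxt[k + 1] += c * p      # object inside: multiply by x
        coeffs = nxt
    return coeffs
```

For these values the expansion is 0.096 + 0.472x + 0.368x² + 0.064x³.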
Count Queries on Uncertain Data
Example
[Figure: query region around q with objects A, B, C (probabilities 0.8, 0.2, 0.4) and object H]
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
The Paradigm of Equivalent Worlds
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and without bias
Evaluate the query predicate on each world
The distribution of sampled results is an unbiased approximation of the true distribution of results
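A minimal sketch of the sampling approach for the count query, assuming independent objects with known probabilities of lying in the query region. The values 0.8, 0.2 and 0.4 are read off the running-example figure and are an assumption; the function name is hypothetical.

```python
import random
from collections import Counter


def sample_count_distribution(probs, n_worlds, seed=0):
    """Estimate P(count = k) by materializing n_worlds possible worlds.

    probs[i] is the probability that object i lies inside the query region; objects are
    assumed independent, so each world draws one Bernoulli outcome per object.
    """
    rng = random.Random(seed)
    counts = Counter(sum(rng.random() < p for p in probs) for _ in range(n_worlds))
    return {k: counts[k] / n_worlds for k in range(len(probs) + 1)}
```

With many sampled worlds, the estimate for "exactly one object inside" approaches the exact value 0.472 obtained from the generating function.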
Sampling Example
[Figure: query region around q with objects A, B, C (probabilities 0.8, 0.2, 0.4) and object H]
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimates
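Sampling Confidences
[Figure: query region around q with objects A, B, C (probabilities 0.8, 0.2, 0.4) and object H]
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of estimators
E.g., Wald test: $\hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/n}$
where $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$ percentile of the standard normal distribution
At a significance level of α, the true probability is in the interval [0.442, 0.638]
[Figure label: true probability]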
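The Wald interval can be written as a small function. The counts are an assumption: if 54 of the 100 sampled worlds fall into the class of interest and α = 0.05 (z ≈ 1.96), the interval evaluates to [0.442, 0.638], matching the slide.

```python
import math
from statistics import NormalDist


def wald_interval(hits, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n independent samples."""
    p_hat = hits / n
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # (1 - alpha/2) quantile, about 1.96 for alpha = 0.05
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width
```

The Wald interval is known to be unreliable for small n or for estimates close to 0 or 1, where score-based intervals are often preferred.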
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: “best guess” answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip-side
Model of a trajectory
[Figure: 3D trajectory in (X, Y, Time) space up to the present time, and its 2D route projection]
Approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to-be-visited) points ⇒ full trajectory
(location, time, velocity vector) updates
[Figure label: now →]
Modeling Uncertainty: Location Uncertainty Models
Various sources’ imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
Modeling Uncertain Trajectories: Model 2
Constraints:
Motion along a road network
Uncertainty restricted to road segments
[Figure: uncertainty along road segments between locations p1 and p2 at times t1 and t2, with location q at time t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
Modeling Uncertain Trajectories: Model 3: Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
– The impact of uncertainty on the (structure of the) answer
– The impact of constraints
Processing algorithms
– Impact of the motion + uncertainty coupling on filtering/pruning/refinement
eg possible routes to be taken
6060
Example inside range (disk) around ldquoQrdquo between t1 and t2
1 What is the probability of a path taken
2 What is the earliestlatest arrival at a given vertex (consequently along edge)
Kai Zheng Goce Trajcevski Xiaofang Zhou Peter Scheuermann Probabilistic range queries for uncertain trajectories on road networks EDBT 2011
61
Continuous NN for trajectories
Q_nn(q) Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tbte]
A-nn(q) [(Tri1 [tbt1]) (Tri2[t1t2]) hellip (Trim[tm-1te])]
T = tb
T = te
X
Y
T = t1
Trq Tr1 Tr2Tr3
Given a collection of moving objects trajectories Tr1 Tr2 hellip TrN
Example assuming that Trq is the querying trajectory between [tbte] then- Tr1 (black) is the NN throughout [tbt1]- Tr2 (red) is the NN throughout [t1te]
Observation The answer to the query is time-paremeterized
Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002
62
Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant
UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]
T = te
X
YT = tb
T = t1
Trq Tr1 Tr2Tr3
T = tb1
T = t11
Observations
adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability
= what should the structure ofthe answer to UQ_nn(q) look like
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-basedProbabilistic Answer to a Continuous NN query) tree
Tri11 tb t11 D11
Tri12l t12(l-1) t12 D12k12
[TrQ tb te]
Tri1 tb t1 D1 Tri2 t1 t2 D2 Trim t(m-1) te Dm
Tri12 t11 t12 D12 Tri1k1 t1(k-1) t11 D1k1 Trik1 t(k-1) t21 Dm1 Trim1 t(m1-1) te Dmkm
Tri121 t11 t121 D121
Root parameters of the query-Querying trajectory Trq - Time-interval of interest [tbte]
Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)
Rmin gt Rmax
Rmin
Rmax
Q
Tr1
Tr2
Tr3
Tr4
Tr5
Rd1
1
14
Trq = 2D point Q The rest are possible-locations bounded by circles
Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN
The probability that the location of Tri is within distance Rd from Q is given by
The probability that a given object Trj is the NN of Q is given by
ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)
65
When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example
Rmin
Rmax
TrQ
Tr1
Tr2
Tr3
Tr4
Tr5
Z1
Z2
dist(Z1Z2) lt Rmax
We can no longer ldquoprunerdquo Tr4 from consideration
The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq
NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip
66
1
x
y
0
2r
1r2
pdf(Tr2)
pdf(Trq) pdf(Tr1)
Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for
DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq
67
1
x
y
0-40
+40
4r
34r2
pdf(Tr2 - Trq)
pdf(Tr1 - Trq)
Let Viq = Vi + (-Vq)
Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
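Let $V^x_{iq} = v^x_i - v^x_q$ and $V^y_{iq} = v^y_i - v^y_q$ denote the corresponding components of the velocity of the object whose expected location moves along TRiq.
Then the distance of the expected location along TRiq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = (V^x_{iq})^2 + (V^y_{iq})^2$$
where B and C depend on the relative position of the expected locations at the start time; this is a hyperbola, since A > 0.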
Now we have a collection of such distance functions (for each i)
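The sketch below computes these distance functions under an assumed linear-motion parametrization: each expected location is given by a hypothetical position at t = 0 and a constant velocity. It returns the coefficients of $d_{iq}(t)^2 = A t^2 + B t + C$. Setting $d_{iq}(t) = d_{jq}(t)$ and squaring leaves a quadratic equation in t, so `critical_times` finds at most two crossings inside [tb, te].

```python
import math


def distance_coeffs(p_i, v_i, p_q, v_q):
    """(A, B, C) with d_iq(t)^2 = A t^2 + B t + C.

    p_*: expected location at t = 0, v_*: velocity (2D tuples); this
    linear-motion input format is an assumption for illustration.
    """
    x, y = p_i[0] - p_q[0], p_i[1] - p_q[1]      # relative position at t = 0
    vx, vy = v_i[0] - v_q[0], v_i[1] - v_q[1]    # V^x_iq, V^y_iq
    return (vx * vx + vy * vy, 2.0 * (x * vx + y * vy), x * x + y * y)


def distance(coeffs, t):
    a, b, c = coeffs
    return math.sqrt(max(a * t * t + b * t + c, 0.0))


def critical_times(ci, cj, tb, te, eps=1e-12):
    """Times in [tb, te] with d_iq(t) == d_jq(t): roots of a quadratic, at most two."""
    a, b, c = (ci[k] - cj[k] for k in range(3))
    if abs(a) < eps:
        if abs(b) < eps:
            return []        # identical or never-crossing squared distances
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]
```

For a static object at distance 1 and an object starting at (−3, 0) and moving with speed 2 along the x-axis, the two distance functions cross at t = 1 and t = 2.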
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0).
Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the query's time interval of interest [tb, te] (NOTE: set diq(t) = djq(t); squaring both sides leaves a quadratic equation in t). Call such pairwise intersections critical time-points.
Example: [Figure: lower envelope LE12 of two distance trajectories TR1 and TR2, with critical time-points t11 and t12 after tb; time on the horizontal axis, distance on the vertical axis]
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: the lower envelopes LE12 (of TR1 and TR2, critical times t11, t12) and LE34 (of TR3 and TR4, critical time t31) over [tb, te] are merged into LE1234, which introduces the new critical time-points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories).
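A minimal sketch of the merge-sort-style construction, assuming each distance function is stored as the (A, B, C) coefficients of its square (a hypothetical representation). An envelope is a list of (t_start, t_end, index) pieces; merging two envelopes splits every elementary interval at the pairwise critical time-points and keeps the lower function on each piece. Comparing squared distances suffices because the square root is monotonic. The linear `owner` lookup keeps the code short, so this sketch does not attain the O(N log N) bound stated above.

```python
import math


def sq_dist(c, t):
    a, b, cc = c
    return a * t * t + b * t + cc


def crossings(ci, cj, lo, hi, eps=1e-12):
    """Roots of d_i^2 - d_j^2 inside the open interval (lo, hi); at most two."""
    a, b, c = ci[0] - cj[0], ci[1] - cj[1], ci[2] - cj[2]
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        roots = [] if disc < 0 else [(-b - math.sqrt(disc)) / (2.0 * a),
                                     (-b + math.sqrt(disc)) / (2.0 * a)]
    return sorted(t for t in set(roots) if lo < t < hi)


def merge(env1, env2, coeffs):
    """Merge two lower envelopes defined over the same [tb, te]."""
    cuts = sorted({t for seg in env1 + env2 for t in seg[:2]})

    def owner(env, t):
        for t0, t1, k in env:
            if t0 <= t <= t1:
                return k
        raise ValueError(t)

    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        i, j = owner(env1, 0.5 * (lo + hi)), owner(env2, 0.5 * (lo + hi))
        pts = [lo] + crossings(coeffs[i], coeffs[j], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            m = 0.5 * (a + b)
            k = i if sq_dist(coeffs[i], m) <= sq_dist(coeffs[j], m) else j
            if out and out[-1][2] == k and out[-1][1] == a:
                out[-1] = (out[-1][0], b, k)      # extend the previous piece
            else:
                out.append((a, b, k))
    return out


def lower_envelope(indices, coeffs, tb, te):
    """Divide and conquer: split the functions, recurse, merge the envelopes."""
    if len(indices) == 1:
        return [(tb, te, indices[0])]
    mid = len(indices) // 2
    return merge(lower_envelope(indices[:mid], coeffs, tb, te),
                 lower_envelope(indices[mid:], coeffs, tb, te), coeffs)
```

The boundaries between consecutive pieces of the returned envelope are the critical time-points at which the nearest expected location changes.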
Now, how about the IPAC-NN tree?
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR1–TR7 over [tb, te] with the lower envelope, the 4r pruning band, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree); critical times t1, t2, t3]
72
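Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.,
$$PP_i(a) = \{P \mid P \text{ connects } p_i \text{ and } p_{i+1} \wedge MinTime(P) \le t_{i+1} - t_i\}$$
where MinTime denotes the minimum time cost.
Example: if the pdf of selecting a possible path is uniform, then $\Pr(P_j) = 1/|PP_i(a)|$ for every $P_j \in PP_i(a)$.
Given a path Pj ∈ PPi(a), the Possible Locations of a moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t), i.e.,
$$PL_j(a, t) = \{p \in P_j \mid MinTime(p, p_i) \le t - t_i \wedge MinTime(p, p_{i+1}) \le t_{i+1} - t\}$$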
73
Combine the two probabilities…
[Figure: possible paths, and possible locations when t = 4 (points a and m)]
Probability of falling inside segment mp1: 0.5 × 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\big(a(t) \in S \cap PL_p(a,t) \mid p\big)$$
e.g., with a uniform pdf over PL_p(a, t).
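Here is a sketch of the combined probability for one sample interval [ti, ti+1]. It relies on simplifying assumptions that are not from the slides: a single maximum speed v_max on all edges, each path described only by its length, and positions measured as arc length from pi. Under these assumptions PL_p(a, t) is an interval, and a uniform pdf over it gives the second factor. All function names are hypothetical.

```python
def possible_paths(path_lengths, v_max, t_i, t_next):
    """Indices of paths whose minimum travel time fits into [t_i, t_next]."""
    return [k for k, length in enumerate(path_lengths) if length / v_max <= t_next - t_i]


def possible_locations(path_len, v_max, t_i, t_next, t):
    """Arc-length interval of PL_p(a, t) on one path, or None if it is empty."""
    lo = max(0.0, path_len - v_max * (t_next - t))   # must still reach p_{i+1} in time
    hi = min(path_len, v_max * (t - t_i))             # reachable from p_i so far
    return (lo, hi) if lo <= hi else None


def prob_in_segment(paths, t_i, t_next, t, v_max):
    """Pr(a(t) in S) = sum over paths of Pr(p) * Pr(a(t) in S within PL_p | p).

    paths: list of (Pr(p), path length, (s0, s1) = part of S on that path, or None).
    """
    total = 0.0
    for p_path, length, seg in paths:
        pl = possible_locations(length, v_max, t_i, t_next, t)
        if pl is None or seg is None:
            continue
        lo, hi = pl
        if hi == lo:                                  # a single possible position
            total += p_path * (1.0 if seg[0] <= lo <= seg[1] else 0.0)
        else:                                         # uniform pdf over PL_p(a, t)
            overlap = max(0.0, min(hi, seg[1]) - max(lo, seg[0]))
            total += p_path * overlap / (hi - lo)
    return total
```

With two equally likely paths, and a segment S that covers half of the possible locations on the first path only, this reproduces the 0.5 × 0.5 = 0.25 pattern of the example.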
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need:
– The collection of all the possible paths (and their probabilities)
– The probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
› Data Structures
– Edge hash-table
› Small, in-memory
– Movement R-tree
› 1D, one for each edge
– Trajectories List
› Store samples ordered by time-stamp
› Assign a pointer to the set of possible paths
› Earliest-arriving and latest-departure times stored for vertices
› On-disk; retrieved on demand
76
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" along road-network segments from q (vertices A–E), annotated with the earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
77
Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
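Because the QP is monotonic between consecutive critical time instants, the QP values computed at those instants bound QP(t) everywhere in between. The helper below is a hypothetical illustration of that envelope, assuming the critical instants are given as a sorted list with their QP values; it returns the lower and upper bound for a query time t.

```python
import bisect


def qp_bounds(critical_times, qps, t):
    """(lower, upper) bound of QP(t) from QP values at sorted critical instants.

    Relies on QP being monotonic between consecutive critical instants, so QP(t)
    lies between the two neighbouring values.
    """
    k = bisect.bisect_left(critical_times, t)
    if k < len(critical_times) and critical_times[k] == t:
        return qps[k], qps[k]                 # t is itself a critical instant
    if k == 0 or k == len(critical_times):
        raise ValueError("t lies outside the covered time interval")
    a, b = qps[k - 1], qps[k]
    return min(a, b), max(a, b)
```

Only the values at the critical instants are computed exactly; every other time instant is answered from the envelope.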
78
Uncertainty – the flip-side
› Data reduction
– Why transmit every single location point?
› Bandwidth
› Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up the Douglas-Peucker Line-simplification Algorithm", SSHD 1997
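As a concrete instance of such a data-reduction approach, here is the classic recursive Douglas-Peucker simplification. It is the baseline algorithm that the cited paper speeds up, not the accelerated variant. `eps` plays the role of the acceptable error: every dropped point lies within `eps` of the simplified segment that replaces it.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # clamp the projection of p onto the line to the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, eps):
    """Keep the endpoints; recursively keep the farthest point while it exceeds eps."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for k in range(1, len(points) - 1):
        d = point_segment_distance(points[k], a, b)
        if d > dmax:
            idx, dmax = k, d
    if dmax <= eps:
        return [a, b]
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right          # do not repeat the split point
```

For example, `douglas_peucker([(0, 0), (1, 0), (2, 2), (3, 0), (4, 0)], 1.0)` keeps only `[(0, 0), (2, 2), (4, 0)]`.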
79
Uncertainty – the other flip-side
› What are the important properties that should not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– Is it not just a "Z" axis?… one cannot travel back in time
– How long can the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data
› Probabilistic results
› Time dimension
vs.
Geometric models for uncertain spatio-temporal data
› Probabilistic results
› Time dimension
82
Merge the approaches
› Idea
– Discretize the time
– For each time instant, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435–443
83
A highway example…
[Figure: position (pos) of a car on a highway over time (t), with a query window Q]
What is the probability that the car is in some area at least once during some time interval?
84
A highway example…
[Figure: the same plot, with the speed bounds vmin and vmax]
85
A highway example…
[Figure: the same plot, with the speed bounds vmin and vmax and the query window Q]
Assuming a uniform distribution…
86
A highway example…
[Figure: the car is inside Q with probability 50% at one time instant and 30% at another]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
87
A highway example…
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
Violation of the speed constraints
88
A highway example…
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
Consideration of dependency: 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953): Stochastic Processes. Wiley
91
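A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: from an inner hole, the ball moves to each neighbour with probability 0.4 and stays with probability 0.2; at the border, it stays with probability 0.4 and moves inward with probability 0.6]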
92
A simple example
› Initial position: (0 0 1 0 0 0 0 0 0)
› After first hit: (0 0.4 0.2 0.4 0 0 0 0 0)
› After second hit: (0.16 0.16 0.36 0.16 0.16 0 0 0 0)
› After 40th hit: ≈ (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)
93
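How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$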
94
How can we model this?
› First hit: $(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M = (0\ 0.4\ 0.2\ 0.4\ 0\ 0\ 0\ 0\ 0)$
› Second hit: $(0\ 0.4\ 0.2\ 0.4\ 0\ 0\ 0\ 0\ 0) \cdot M = (0.16\ 0.16\ 0.36\ 0.16\ 0.16\ 0\ 0\ 0\ 0)$
› 40th hit: $(0\ 0\ 1\ 0\ 0\ 0\ 0\ 0\ 0) \cdot M^{40} \approx (0.08\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.12\ 0.08)$
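The vector-matrix products above can be reproduced with a few lines of plain Python. This sketch builds M from the board probabilities (the parameter names are hypothetical) and applies one row-vector multiplication per hit.

```python
def board_matrix(n=9, move=0.4, stay=0.2, border_stay=0.4, border_move=0.6):
    """Transition matrix of the board example: M[i][j] = Pr(bucket i -> bucket j)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][i], M[i][i + 1] = border_stay, border_move
        elif i == n - 1:
            M[i][i], M[i][i - 1] = border_stay, border_move
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = move, stay, move
    return M


def step(state, M):
    """One hit: row vector times matrix, new[j] = sum_i state[i] * M[i][j]."""
    n = len(M)
    return [sum(state[i] * M[i][j] for i in range(n)) for j in range(n)]


def after_hits(state, M, k):
    """Distribution over the buckets after k hits."""
    for _ in range(k):
        state = step(state, M)
    return state
```

`after_hits([0, 0, 1, 0, 0, 0, 0, 0, 0], board_matrix(), 2)` yields the second-hit vector above. The vector (0.08, 0.12, …, 0.12, 0.08) satisfies π · M = π exactly, so repeated hits move the distribution toward it.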
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state-transition graph over the states s1, s2, s3]
Note: we have an exponential number of possible paths the car might take.
ST - Window Queries
[Figure: the chain unrolled over t = 0, 1, 2, 3 for the states s1, s2, s3]
t = 0: (1, 0, 0)
98
99
ST - Window Queries
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
100
ST - Window Queries
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
101
ST - Window Queries
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)
102
ST - Window Queries
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2); the mass 0.8 in s2 has already hit the window
t = 3 (propagating only the worlds that have not hit the window yet): (0, 0.16, 0.04)
Result = 0.8 + 0.16 = 0.96
› The solution based on matrix multiplications introduces a new state for the "winner" trajectories and two matrices
103
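ST - Window Queries
States s1, s2, s3 and the new "winner" state s4:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$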
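The two-matrix scheme can be sketched generically. The helper below is a hypothetical illustration: it builds M− and M+ from any transition matrix M and a set of window states, applies M+ at ticks inside the query time window and M− elsewhere, and reads off the probability mass of the absorbing winner state.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def window_query(M, start, window_states, window_times):
    """Pr(the object is in one of window_states at some t in window_times).

    Adds an absorbing "winner" state (index n). M_minus is used at ticks
    outside the window and M_plus inside it; M_plus redirects every
    transition into a window state to the winner state.
    """
    n = len(M)
    M_minus = [list(row) + [0.0] for row in M] + [[0.0] * n + [1.0]]
    M_plus = []
    for row in M:
        new_row = [0.0 if j in window_states else row[j] for j in range(n)]
        new_row.append(sum(row[j] for j in window_states))
        M_plus.append(new_row)
    M_plus.append([0.0] * n + [1.0])

    v = list(start) + [0.0]
    if 0 in window_times:                 # worlds already inside at t = 0
        v[n] = sum(v[j] for j in window_states)
        for j in window_states:
            v[j] = 0.0
    for t in range(1, max(window_times) + 1):
        v = vec_mat(v, M_plus if t in window_times else M_minus)
    return v[n]
```

With states indexed from 0, `window_query([[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]], [1, 0, 0], {0, 1}, {2, 3})` builds exactly the M− and M+ shown above and returns 0.96 (up to floating-point rounding).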
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time with a single observation at t0, and with two observations at t0 and t1]
105
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: the chain unrolled over t = 0, 1, 2, 3 for the states s1, s2, s3]
106
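Multiple Observations
State order: S1−, S2−, S3−, S1+, S2+, S3+ (¬■: the window has not been intersected, ■: the window has been intersected)
t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
with $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$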
107
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' Theorem
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare) \cdot P(\blacksquare)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$$
Using the t = 3 vector (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1−, S2−, S3−, S1+, S2+, S3+):
$$P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
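The 2|S|-state construction and the Bayes step can be sketched as follows. The code is a hypothetical illustration: it builds M− as the block-diagonal matrix and M+ as above from any M, propagates the start vector up to the observation time, and conditions on the observed state.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def two_class_matrices(M, window_states):
    """M_minus and M_plus over the 2|S| states S1-..Sn-, S1+..Sn+."""
    n = len(M)
    M_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    M_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            # outside the window, both classes evolve with M (block diagonal)
            M_minus[i][j] += p
            M_minus[n + i][n + j] += p
            # inside the window, entering a window state turns "-" into "+"
            M_plus[i][n + j if j in window_states else j] += p
            M_plus[n + i][n + j] += p
    return M_minus, M_plus


def hit_prob_given_observation(M, start, window_states, window_times, t_obs, s_obs):
    """Pr(window intersected | object observed in s_obs at t_obs), via Bayes."""
    n = len(M)
    M_minus, M_plus = two_class_matrices(M, window_states)
    v = [0.0] * (2 * n)
    for i, p in enumerate(start):
        inside = 0 in window_times and i in window_states
        v[n + i if inside else i] += p
    for t in range(1, t_obs + 1):
        v = vec_mat(v, M_plus if t in window_times else M_minus)
    hit, miss = v[n + s_obs], v[s_obs]
    return hit / (hit + miss), v
```

For the example (states indexed from 0, window {s1, s2} during t ∈ {2, 3}, observation in s3 at t = 3), this reproduces the t = 3 vector above and 0.32 / 0.36 ≈ 0.89.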
108
Summary
› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Matching time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: R-tree over the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = (number of samples that satisfy the query) / (total number of drawn samples)
› But how can samples be drawn efficiently such that they conform to the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
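One way to adapt the transition matrices so that samples conform to a later observation is to weight each transition by the probability of still reaching the observed state (backward probabilities, also known as a Doob h-transform). The sketch below illustrates that general idea and is not necessarily the exact construction of the cited paper. The chain and the observation reuse the window-query example; the function names are hypothetical.

```python
import random


def backward_vectors(M, t_end, s_obs):
    """beta[t][i] = Pr(X_t_end = s_obs | X_t = i) for t = 0..t_end."""
    n = len(M)
    beta = [None] * (t_end + 1)
    beta[t_end] = [1.0 if i == s_obs else 0.0 for i in range(n)]
    for t in range(t_end - 1, -1, -1):
        beta[t] = [sum(M[i][j] * beta[t + 1][j] for j in range(n)) for i in range(n)]
    return beta


def sample_conditioned(M, s0, t_end, s_obs, n_samples, seed=0):
    """Sample trajectories X_0..X_t_end with X_0 = s0 that end in s_obs."""
    beta = backward_vectors(M, t_end, s_obs)
    if beta[0][s0] == 0.0:
        raise ValueError("observations are inconsistent with the model")
    rng = random.Random(seed)
    states = range(len(M))
    paths = []
    for _ in range(n_samples):
        path = [s0]
        for t in range(t_end):
            i = path[-1]
            # adapted row: M[i][j] * beta[t+1][j] / beta[t][i]
            # (random.choices normalises the weights itself)
            weights = [M[i][j] * beta[t + 1][j] for j in states]
            path.append(rng.choices(states, weights=weights)[0])
        paths.append(path)
    return paths
```

Every sampled trajectory starts in s1 and ends in the observed state s3. The fraction of samples that visit s1 or s2 during t ∈ {2, 3} approximates the exact conditional probability 0.32 / 0.36 ≈ 0.89 from the matrix-based approach.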
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
› Project Page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43–49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435–443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728
26
27
Too many possible worlds
28
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects.
Are there efficient solutions to query uncertain spatial data?
In general: No.
"The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
This result can be reduced to uncertain spatial databases.
But: specific queries may have polynomial-time solutions.
[DalviSuciu04] Dalvi, N. N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB) (2004)
29
Querying Uncertain Data: Running Example
Return the number of objects located in the depicted circular region centered at query point q.
This number is a random variable.
Total number of possible worlds
30
Data Cleaning: Aggregation
[Figure: uncertain objects C, D, H, I around the query point q]
Ignore uncertainty (data cleaning):
Replace uncertain objects by a deterministic "best guess"
– Expected positions
– Most-likely positions
– …
Query results are not reliable.
Query results may be biased.
31
Equivalent Worlds: An Intuition
Observation 1
For any possible world and any possible world derived from by changing the position of object the following equivalence holds
32
Equivalent Worlds: An Intuition
Observation 1 allows us to discard objects outside of the query region.
Remaining number of equivalence classes of possible worlds:
33
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside".
[Figure: the remaining objects C, D, H, I around the query point q]
Querying Uncertain Spatial Data
34
Equivalent Worlds: An Intuition
Remaining number of equivalence classes of possible worlds:
35
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region. Information about the concrete result objects can be discarded.
36
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results.
[Figure: objects A, B, C in the query region around q, with probabilities 0.8, 0.2 and 0.4]
37
Generating Functions
Observation 3: anonymize objects; substitute A, B, C by x.
[Figure: the same objects, each replaced by x]
38
39
Each monomial c·x^k implies that the probability of having exactly k results equals c.
40
Generating Functions: Formally
For each object o_i, let p_i denote the probability that o_i is inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_i \big((1 - p_i) + p_i \cdot x\big)$$
In the expanded polynomial, the coefficient of the monomial x^k equals the probability that exactly k objects are inside the query region.
[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502–513 (2009)
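Expanding the generating function is a sequence of polynomial multiplications, one factor per object, so it takes polynomial time. The sketch below assumes the figure's values 0.8, 0.2 and 0.4 are the objects' probabilities of being inside the query region. Their assignment to A, B and C does not matter, because the product is commutative. The function name is hypothetical.

```python
def count_distribution(probs):
    """Expand prod_i ((1 - p_i) + p_i * x); entry k = Pr(exactly k objects in q)."""
    coeffs = [1.0]                        # the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)       # object outside: degree unchanged
            nxt[k + 1] += c * p           # object inside: multiply by x
        coeffs = nxt
    return coeffs
```

`count_distribution([0.8, 0.2, 0.4])` yields approximately `[0.096, 0.472, 0.368, 0.064]`, i.e., the probabilities of exactly 0, 1, 2 and 3 objects in the query region.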
41
Count Queries on Uncertain Data
Example
[Figure: objects A, B, C in the query region around q, with probabilities 0.8, 0.2 and 0.4]
42
43
44
45
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time.
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.
III. The probability of a class can be computed in polynomial time.
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds.
Samples are drawn independently and without bias.
Evaluate the query predicate on each world.
The distribution of sampled results is an unbiased approximation of the true distribution of results.
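A minimal sketch of this sampling procedure for the running count query, assuming independent objects that are inside the query region with the given probabilities. The probabilities and the function name are illustrative. Each loop iteration materializes one possible world and evaluates the count on it.

```python
import random
from collections import Counter


def sample_count_distribution(probs, n_worlds, seed=0):
    """Estimate Pr(exactly k objects in q) from n_worlds sampled possible worlds."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_worlds):
        # materialize one world: object i is inside q with probability probs[i]
        k = sum(1 for p in probs if rng.random() < p)
        counts[k] += 1
    return [counts[k] / n_worlds for k in range(len(probs) + 1)]
```

With probabilities 0.8, 0.2 and 0.4 and enough worlds, the estimates approach the exact values from the generating-function expansion (0.096, 0.472, 0.368, 0.064). With only a few worlds they may deviate noticeably.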
48
Sampling Example
[Figure: objects A, B, C in the query region around q, with probabilities 0.8, 0.2 and 0.4]
Drawing 100 possible worlds may yield the following estimators.
Compare to the exact probabilities.
No indication of the reliability or confidence of the estimations.
49
Sampling: Confidences
Drawing 100 possible worlds may yield the following estimators.
Use statistical methods to assess the quality of the estimators, e.g., the Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-percentile of the standard normal distribution.
At a significance level of α, the true probability is in the interval [0.442, 0.638].
[Figure: estimated probabilities compared with the true probability]
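The Wald interval can be computed directly with the Python standard library. The sketch assumes n = 100 sampled worlds, 54 of which satisfy the predicate, and α = 0.05. These inputs are assumptions chosen because they are consistent with the interval shown; the slide does not state them.

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, alpha=0.05):
    """Wald interval p_hat +/- z_{1-alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)      # about 1.96 for alpha = 0.05
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width
```

`wald_interval(54, 100)` gives approximately (0.442, 0.638) after rounding to three decimals. Drawing more worlds shrinks the interval roughly in proportion to 1/√n.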
50
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data Cleaning: "best guess" answers; unreliable results; biased results
Paradigm of Equivalent Worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
• Motion
• Location Uncertainty
• Queries
• Part 1: NN (free-space motion)
• Semantics and Processing
• Part 2: Range (road-network constraints)
• Semantics and Processing
• Uncertainty – the flip-side
53
Model of a trajectory
[Figure: a 3D trajectory in (X, Y, Time) up to the present time, and its projection as a 2D route]
Approximation does not capture acceleration/deceleration.
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes.
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
54
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory
(location, time, velocity vector) updates
[Figure: time axis up to "now"]
55
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
• Assumed exact location samples
• Unknown (but bounded) maximum speed in-between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
57
Modeling Uncertain Trajectories: Model 2
Constraints:
Motion along road network
Uncertainty restricted to road segments
[Figure: possible locations between the samples p1 (at t1) and p2 (at t2) along road segments, with a query point q]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g., possible routes to be taken
Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tb, te]
A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm−1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) between T = tb and T = te, with T = t1 marked]
Example: assuming that Trq is the querying trajectory between [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized.
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some location uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tb, te]
[Figure: the previous example with uncertainty around the trajectories; time instants T = tb, tb1, t1, t11, te marked]
Observations: adding uncertainty to the previous example implies
– Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN-probability
– Specifically, at t11 all three trajectories have a non-zero NN-probability
⇒ What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: example IPAC-NN tree with root [TrQ, tb, te]; level-1 nodes such as (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m−1), te, Dm); deeper nodes such as (Tri11, tb, t11, D11) and (Tri12, t11, t12, D12) refine the sub-intervals]
Root: parameters of the query
– Querying trajectory Trq
– Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q, the uncertainty disks of Tr1–Tr5, and the distances Rmin, Rmax and Rd1 around Q]
Trq = 2D point Q; the rest are possible locations bounded by circles.
Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN
The probability that the location of Tri is within distance Rd from Q is given by
The probability that a given object Trj is the NN of Q is given by
ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)
65
When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example
Rmin
Rmax
TrQ
Tr1
Tr2
Tr3
Tr4
Tr5
Z1
Z2
dist(Z1Z2) lt Rmax
We can no longer ldquoprunerdquo Tr4 from consideration
The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq
NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip
66
1
x
y
0
2r
1r2
pdf(Tr2)
pdf(Trq) pdf(Tr1)
Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for
DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq
67
1
x
y
0-40
+40
4r
34r2
pdf(Tr2 - Trq)
pdf(Tr1 - Trq)
Let Viq = Vi + (-Vq)
Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin at a function of t is
hyperbola since A gt 0
Now we have a collection of such distance functions (for each i)
69
The continuous aspect of the probabilistic NN queries
Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)
Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points
dist
TR1
TR2
LE12
tb t11 t12
Example
Lower envelope of two distance-trajectories
critical time-point
NOTE time-dimension on horizontal axis
70
The continuous aspect of the probabilistic NN queries
Q how to efficiently construct the lower envelope of the whole collection of distance-functions
A divide-and-conquer approach in the spirit of Merge-Sortdist
TR1
TR2
LE12
tb t11 t12
dist
TR3
TR4
LE34
tb tet31dist
TR1
TR2
LE1234
tb tet11 t12
TR3
TR4
t31
t1new t2new
Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)
Now how about the IPAC-NN tree
71
The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)
Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree
dist dist
lower envelope
TR1TR2
TR3
TR4
TR5
TR6
tb te
2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )
4r
TR7
t1 t2 t3
72
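Given two trajectory samples $(t_i, p_i)$ and $(t_{i+1}, p_{i+1})$ of a moving object $a$ on a road network, the set of possible paths $PP_i(a)$ between $t_i$ and $t_{i+1}$ consists of all the paths along the routes (sequences of edges) that connect $p_i$ and $p_{i+1}$ and whose minimum time costs are not greater than $t_{i+1} - t_i$, i.e., $PP_i(a) = \{P \mid P \text{ connects } p_i \text{ and } p_{i+1} \wedge \mathit{cost}_{\min}(P) \le t_{i+1} - t_i\}$.
Example: if the pdf of selecting a possible path is uniform, each $P_j \in PP_i(a)$ is selected with probability $1/|PP_i(a)|$.
Given a path $P_j \in PP_i(a)$, the Possible Locations of a moving object $a$ with respect to $P_j$ at $t \in [t_i, t_{i+1}]$ are the set of all positions $p \in P_j$ from which $a$ can reach $p_i$ (respectively $p_{i+1}$) within the time period $t - t_i$ (respectively $t_{i+1} - t$), i.e., $PL_{P_j}(a, t) = \{p \in P_j \mid \mathit{cost}_{\min}(p, p_i) \le t - t_i \wedge \mathit{cost}_{\min}(p, p_{i+1}) \le t_{i+1} - t\}$.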
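A minimal sketch of both definitions for a single path, assuming (hypothetically) that the minimum time cost of traversing a distance is that distance divided by a known maximum speed `v_max`, and describing a position on a path by its offset `s` from $p_i$. The helper names are illustrative.

```python
def possible_paths(path_lengths, t_i, t_next, v_max):
    """Indices of candidate paths whose minimum time cost fits into
    [t_i, t_next]; assumption: minimum time cost = length / v_max."""
    return [k for k, length in enumerate(path_lengths)
            if length / v_max <= t_next - t_i]


def possible_locations(length, t, t_i, t_next, v_max):
    """Offsets s on one path (s = 0 at p_i, s = length at p_{i+1}) that are
    reachable from p_i within t - t_i and can still reach p_{i+1} within
    t_next - t. Returns the interval (lo, hi), or None if it is empty."""
    lo = max(0.0, length - v_max * (t_next - t))
    hi = min(float(length), v_max * (t - t_i))
    return (lo, hi) if lo <= hi else None
```

For example, with samples at t = 0 and t = 8 and `v_max = 1`, a path of length 6 yields the possible-location interval (2, 4) at t = 4, while a path of length 12 is not a possible path at all.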
Combine the two probabilities…
[Figure: possible paths and possible locations of object a when t = 4.]
Probability of falling inside segment $\overline{m p_1}$: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object $a$ falling inside some segment $S$ at a given time instant $t$:
$\Pr(a(t) \in S) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr\big(p \in S \mid p \in PL_P(a, t)\big)$, e.g., with a uniform pdf over $PL_P(a, t)$.
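The combination can be sketched as below, assuming a uniform pdf over each possible-location interval. The numbers in the test are hypothetical, chosen so that one of two equally likely paths has half of its possible locations inside S, which reproduces the 0.5 · 0.5 = 0.25 pattern of the example.

```python
def prob_in_segment(paths):
    """Pr(a(t) in S) = sum over paths P of Pr(P) * Pr(location in S | P).

    Each entry is (path_prob, (lo, hi), s_interval): (lo, hi) is the
    possible-location interval on that path at time t, and s_interval is the
    part of S on that path as (s_lo, s_hi), or None if S is not on the path.
    Locations are assumed uniformly distributed over (lo, hi)."""
    total = 0.0
    for p_path, (lo, hi), s in paths:
        if s is None:
            continue                      # S does not lie on this path
        if hi > lo:
            overlap = max(0.0, min(hi, s[1]) - max(lo, s[0]))
            total += p_path * overlap / (hi - lo)
        elif s[0] <= lo <= s[1]:
            total += p_path               # a single possible position, inside S
    return total
```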
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– the collection of all the possible paths (and their probabilities)
– the Probabilistic Location Function
Example: observe the two possible sequences of going from position $p_i$ to position $p_{i+1}$. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
Processing
› Data Structures
– Edge hash-table
› Small, in-memory
– Movement R-tree
› 1D, one for each edge
– Trajectories List
› Stores samples ordered by time-stamp
› Assigns a pointer to the set of possible paths
› Earliest-arrival and latest-departure times stored for vertices
› On-disk, retrieved on demand
› Network expansion
– Expansion tree: a tree rooted at q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree “bubbling” from q along road-network segments through vertices A to E, annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query.]
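The network expansion and filtering steps can be sketched as follows. Assumptions: q is a vertex, the network is an undirected adjacency dict, Dijkstra's algorithm computes network distances, and a plain per-edge list of (enter, leave, object) entries stands in for the per-edge 1D movement R-tree. The refinement step (computing qualification probabilities) is not shown, and all names are hypothetical.

```python
import heapq


def expansion_tree(graph, q, r):
    """Edges (as frozensets) with network distance to q smaller than r.
    graph: {vertex: [(neighbor, length), ...]}, undirected, q is a vertex."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in graph[u]:
            edges.add(frozenset((u, v)))  # u lies within r of q, so part of (u, v) does too
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return edges


def filter_candidates(edges, movement_index, tq):
    """Objects with an entry on an expansion-tree edge whose time interval
    intersects tq = (lo, hi); movement_index maps an edge to a list of
    (t_enter, t_leave, obj_id), standing in for the per-edge 1D R-tree."""
    lo, hi = tq
    return {oid for e in edges for (a, b, oid) in movement_index.get(e, [])
            if a <= hi and lo <= b}
```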
Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
Uncertainty – the flip side
› Data reduction
– Why transmit every single location point?
› Bandwidth
› Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: “Speeding up Douglas-Peucker Line-simplification Algorithm”, SSHD 1997
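For reference, here is a compact sketch of the classic Douglas-Peucker line simplification. This is the baseline that the cited paper speeds up, not the accelerated variant itself. It keeps the endpoints, finds the point farthest from the segment between them, and recurses only if that point deviates by more than the acceptable error `eps`. The sketch treats points as plain 2D coordinates and ignores the time dimension.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, eps):
    """Keep the endpoints; split at the farthest point if it deviates more than eps."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, -1.0
    for k in range(1, len(points) - 1):
        d = perpendicular_distance(points[k], points[0], points[-1])
        if d > dmax:
            idx, dmax = k, d
    if dmax <= eps:
        return [points[0], points[-1]]    # every interior point is within eps
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right              # the split point appears only once
```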
Uncertainty – the other flip side
› What are the important properties that should not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just a “Z” axis… one cannot travel back in time
– How long may the sink's data remain “un-fresh”?
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
So far…
› Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
› Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
Merge the approaches
› Idea
– Discretize time
– For each time instant, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009, 435-443
A highway example…
[Figure: position (pos) of a car over time (t) on a highway, with the speed bounds vmin and vmax and a query window Q.]
What is the probability that the car is in some area at least once during some time interval?
Assuming a uniform distribution…, the car is inside Q with probability 50% at one time instant and with probability 30% at another.
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
This violates the speed constraints.
Considering the dependency, the probability is 0.5.
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes exist for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
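A simple example
› Whenever the wooden board is hit, the ball either stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: in the interior, the ball moves to each neighbouring hole with probability 0.4 and stays with probability 0.2; at the border, it moves inward with probability 0.6 and stays with probability 0.4.]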
A simple example
› Initial position: the ball is in the third hole
› After the first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After the second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After the 40th hit: (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
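How can we model this?
› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$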
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
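The wooden-board example can be reproduced with a few lines of plain Python, without external libraries. The sketch builds M row by row and applies row-vector-times-matrix steps, which lets you check the first two hits. Iterating long enough approaches the stationary distribution (0.08, 0.12, …, 0.12, 0.08). That distribution follows from detailed balance: at each border, 0.6 · π_border = 0.4 · π_neighbour, and all interior holes carry equal mass. Function names are illustrative.

```python
def board_matrix(n=9):
    """Transition matrix of the wooden-board example (rows: from, cols: to)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    M[0][0], M[0][1] = 0.4, 0.6                  # left border
    M[n - 1][n - 1], M[n - 1][n - 2] = 0.4, 0.6  # right border
    return M


def after_hits(dist, M, k):
    """Apply k hits: each hit maps the row vector dist to dist * M."""
    n = len(M)
    for _ in range(k):
        dist = [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]
    return dist


M = board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]       # ball in the third hole
print([round(p, 2) for p in after_hits(start, M, 40)])
```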
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
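One simple way to learn the transition probabilities from historical data is the maximum-likelihood estimate: count the observed transitions between consecutive ticks and normalize each row. This estimator is an assumption chosen for illustration and is not necessarily the one used in the cited work. A dict-of-dicts keeps the (very sparse) matrix sparse; the function name is hypothetical.

```python
from collections import defaultdict


def estimate_transitions(sequences):
    """Maximum-likelihood transition estimate from state sequences
    (one state per tick): count s -> s' and normalize each row."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    M = {}
    for a, row in counts.items():
        total = sum(row.values())
        M[a] = {b: c / total for b, c in row.items()}  # only observed entries
    return M
```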
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: transition graph over the states s1, s2, s3, unrolled over t = 0, 1, 2, 3.]
Note: We have an exponential number of possible paths the car might take.
Starting in s1, the state distributions are (1, 0, 0) at t = 0, (0, 0, 1) at t = 1, (0, 0.8, 0.2) at t = 2, and (0.48, 0.16, 0.36) at t = 3. Summing the s1 and s2 entries over both time instants would count worlds that are in the window at both t = 2 and t = 3 twice. Instead, the 0.8 already in s2 at t = 2 counts as a hit, and only the 0.2 still in s3 is propagated: at t = 3 it reaches s2 with 0.16 and stays in s3 with 0.04. Result = 0.8 + 0.16 = 0.96.
› The solution based on matrix multiplications introduces a new state s4 for the winner trajectories (those that have already hit the window) and two matrices:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$(1, 0, 0, 0) \cdot M^- = (0, 0, 1, 0)$, $(0, 0, 1, 0) \cdot M^+ = (0, 0, 0.2, 0.8)$, $(0, 0, 0.2, 0.8) \cdot M^+ = (0, 0, 0.04, 0.96)$
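The M⁻/M⁺ construction generalizes to any window. A minimal sketch follows; the function name is hypothetical, and states are 0-based indices, so s1 and s2 are 0 and 1. Outside T, the chain moves with M⁻. Inside T, every transition into a window state is redirected to the extra absorbing winner state, whose final mass is the answer. The sketch assumes T starts at t ≥ 1.

```python
def window_probability(M, start, window, T):
    """Probability that the object is in a state of `window` at least once
    during the ticks T = (t_lo, t_hi), with 1 <= t_lo <= t_hi.
    Index n is the extra absorbing 'winner' state."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    # M_minus: original transitions; the winner state stays a winner
    M_minus = [list(row) + [0.0] for row in M] + [absorbing]
    # M_plus: mass entering a window state is redirected to the winner state
    M_plus = [[0.0 if j in window else p for j, p in enumerate(row)]
              + [sum(p for j, p in enumerate(row) if j in window)]
              for row in M] + [absorbing]
    dist = list(start) + [0.0]
    for t in range(1, T[1] + 1):
        step = M_plus if t >= T[0] else M_minus
        dist = [sum(dist[i] * step[i][j] for i in range(n + 1))
                for j in range(n + 1)]
    return dist[n]
```

Because only vector-matrix products are needed, the exponential number of possible paths never has to be enumerated.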
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time; left: one observation at t0; right: two observations at t0 and t1.]
Multiple Observations
› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3.]
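Multiple Observations
State distributions over the classes $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$ for the window {s1, s2} during T = [2, 3] (-: the window has not been hit yet, +: the window has been hit):
t = 0: (1, 0, 0, 0, 0, 0); t = 1: (0, 0, 1, 0, 0, 0); t = 2: (0, 0, 0.2, 0, 0.8, 0); t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}, \qquad M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$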
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' Theorem:
$$P(\text{hit} \mid s_3) = \frac{P(s_3 \mid \text{hit}) \cdot P(\text{hit})}{P(s_3)} = \frac{P(\text{hit} \wedge s_3)}{P(s_3 \wedge \text{hit}) + P(s_3 \wedge \neg\text{hit})}$$
With the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$ at t = 3:
$$P(\text{hit} \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
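A sketch of the 2|S|-class bookkeeping and the Bayes step follows. It assumes the same chain, the window {s1, s2} during T = [2, 3], and a window that does not start at t = 0. Index i holds $S_i^-$ and index n + i holds $S_i^+$; the function name is hypothetical.

```python
def hit_given_observation(M, start, window, T, obs_state, t_obs):
    """P(window hit during T | object observed in obs_state at t_obs).

    Index i holds class S_i^- (in s_i, window not hit yet) and index n + i
    holds S_i^+ (in s_i, window already hit). Assumes T starts at t >= 1."""
    n = len(M)
    dist = list(start) + [0.0] * n
    for t in range(1, t_obs + 1):
        in_T = T[0] <= t <= T[1]
        new = [0.0] * (2 * n)
        for i in range(n):
            for j in range(n):
                p = M[i][j]
                # '-' worlds turn into '+' worlds when they enter the window during T
                target = n + j if (in_T and j in window) else j
                new[target] += dist[i] * p
                new[n + j] += dist[n + i] * p   # '+' worlds stay '+'
        dist = new
    hit, no_hit = dist[n + obs_state], dist[obs_state]
    return hit / (hit + no_hit)          # Bayes' theorem
```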
Summary
› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the “intelligence” and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index over the time × location space.]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform to the observations?
› Solution: adaptation of transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
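Here is a minimal Monte-Carlo sketch for the window query discussed above (s1 or s2 during T = [2, 3], starting in s1). It assumes a single observation, the start state, so trajectories can be forward-sampled directly from M. It does not implement the adaptation of transition matrices that is needed to condition samples on several observations. Names and the sample size are illustrative.

```python
import random


def sample_path(M, start_state, steps, rng):
    """Forward-sample one possible trajectory of the Markov chain."""
    path = [start_state]
    for _ in range(steps):
        row = M[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def mc_window_probability(M, start_state, window, T, n_samples=20000, seed=1):
    """Ratio of sampled trajectories that visit `window` during T = (t_lo, t_hi)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        path = sample_path(M, start_state, T[1], rng)
        if any(path[t] in window for t in range(T[0], T[1] + 1)):
            hits += 1
    return hits / n_samples
```

The estimate should be close to the exact value of 0.96 obtained by matrix multiplication.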
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] J. L. Doob (1953). Stochastic Processes. Wiley.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013.
› Project Page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE, 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
Too many possible worlds
Querying Uncertain Data: Complexity
Naive query processing is exponential in the number of objects.
Are there efficient solutions to query uncertain spatial data?
In general: No.
“The problem of answering queries on a probabilistic database D is #P-complete in the size of D” [DalviSuciu04]
This can be reduced to uncertain spatial databases.
But: specific queries may have polynomial time solutions.
[DalviSuciu04] N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004.
Querying Uncertain Data: Running Example
[Figure: uncertain objects around the query point q.]
Return the number of objects located in the depicted circular region centered at query point q.
This number is a random variable.
Total number of possible worlds
Data Cleaning: Aggregation
[Figure: query region around q with uncertain objects C, D, H, I.]
Ignore Uncertainty (Data Cleaning)
Replace uncertain objects by a deterministic “best guess”:
Expected positions
Most-likely positions
…
Query results are not reliable.
Query results may be biased.
Equivalent Worlds: An intuition
Observation 1
For any possible world and any possible world derived from it by changing the position of an object, the following equivalence holds
Equivalent Worlds: An intuition
Observation 1 allows us to discard objects outside of the query region.
Remaining number of equivalence classes of possible worlds
Equivalent Worlds: An intuition
Observation 2
For each remaining object, we only need to consider the predicate “inside”.
[Figure: query region around q with uncertain objects C, D, H, I.]
Remaining number of equivalence classes of possible worlds
Equivalent Worlds: An intuition
Observation 3
We only require the number of objects in the query region. Information about concrete result objects can be discarded.
[Figure: query region around q with uncertain objects C, D, H, I.]
Generating Functions
Main idea: Use polynomial multiplication to enumerate possible results.
[Figure: query region around q with uncertain objects A, B, C, H; the probabilities 0.8, 0.2, and 0.4 are annotated.]
Observation 3: Anonymize objects, i.e., substitute A, B, C by x.
Each monomial $c_k x^k$ implies that the probability of having exactly $k$ results equals $c_k$.
Generating Functions: Formally
For each object $o_i$, let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_i \big(p_i \cdot x + (1 - p_i)\big)$$
In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.
[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
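Expanding the generating function is a sequence of polynomial multiplications, one per object, so it runs in polynomial time. The sketch below assumes that 0.8, 0.2, and 0.4 in the figure are the objects' probabilities of lying inside the query region. Because multiplication is commutative, the result does not depend on which object carries which value. The function name is hypothetical.

```python
def count_distribution(probs):
    """Expand prod_i (p_i * x + (1 - p_i)); coefficient k of the result is
    Pr(exactly k objects are inside the query region)."""
    coeffs = [1.0]                       # the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1 - p)        # object outside: degree unchanged
            nxt[k + 1] += c * p          # object inside: multiply by x
        coeffs = nxt
    return coeffs
```

Under this assumption, the count distribution is 0.096, 0.472, 0.368, and 0.064 for 0, 1, 2, and 3 objects.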
Count Queries on Uncertain Data
Example
[Figure: query region around q with uncertain objects A, B, C, H; the probabilities 0.8, 0.2, and 0.4 are annotated.]
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time.
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.
III. The probability of a class can be computed in polynomial time.
The Paradigm of Equivalent Worlds
Approximated Results: Sampling
Materialize a set S of possible worlds.
Samples are drawn independently and unbiased.
Evaluate the query predicate on each world.
The distribution of sampled results is an unbiased approximation of the true distribution of results.
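Sampling Example
[Figure: query region around q with uncertain objects A, B, C, H; the probabilities 0.8, 0.2, and 0.4 are annotated.]
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimations.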
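Sampling: Confidences
[Figure: query region around q with uncertain objects A, B, C, H; the true probability is marked.]
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of the estimators, e.g., the Wald test: $\hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/|S|}$,
where $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$ percentile of the standard normal distribution.
At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638].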
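Here is a sketch of the Wald interval that uses only the standard library; `statistics.NormalDist` provides the normal quantile. The inputs in the test are a hypothetical choice: 54 hits out of 100 sampled worlds at α = 0.05. After rounding, they give an interval matching [0.442, 0.638].

```python
from statistics import NormalDist


def wald_interval(successes, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)   # (1 - alpha/2) percentile
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half
```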
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data Cleaning: “best guess” answers; unreliable results; biased results
Paradigm of Equivalent Worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
• Models
  • Motion
  • Location Uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing
• Uncertainty – the flip side
Model of a trajectory
[Figure: 3D trajectory in (X, Y, Time) up to the present time, and its 2D route (projection onto the X-Y plane).]
Approximation does not capture acceleration/deceleration.
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes.
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory
(location, time, velocity vector) updates
[Figure: timeline with the current time marked as “now”.]
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in-between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
Modeling Uncertain Trajectories: Model 2
Constraints: motion along a road network
Uncertainty restricted to road segments
[Figure: samples p1 and p2 at times t1 and t2 on a road network, with a query location q and time t.]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
Modeling Uncertain Trajectories: Model 3: Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
– The impact of uncertainty on the (structure of the) answer
– The impact of constraints, e.g., possible routes to be taken
Processing Algorithms
– Impact of the motion + uncertainty coupling on filtering/pruning/refinement
Example: inside the range (disk) around “Q” between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
Continuous NN for trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in the (X, Y) plane with the time instants T = tb, T = t1, T = te.]
Example: assuming that Trq is the querying trajectory during [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: The answer to the query is time-parameterized.
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in the (X, Y) plane with the time instants T = tb, tb1, t11, t1, te.]
Observations: adding uncertainty to the previous example implies
– Tr1 (blue) is the only trajectory with a non-zero NN probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN probability
– Specifically, at t11 all three trajectories have a non-zero NN probability
=> What should the structure of the answer to UQ_nn(q) look like?
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. First level: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper levels split these intervals further, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121).]
Root: parameters of the query
– Querying trajectory Trq
– Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and uncertain objects Tr1 to Tr5, whose possible locations are bounded by circles; Rmin, Rmax, and a distance Rd are marked.]
Trq = 2D point Q; the rest are possible locations bounded by circles.
Observation:
– Any object whose min-distance from Q is > Rmax has a “0” probability of being an NN to Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \iint_{\|(x, y) - Q\| \le R_d} \mathit{pdf}_i(x, y)\, dx\, dy$$
The probability that a given object Trj is the NN of Q is given by
$$P^{NN}_j = \int_0^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \ne j} \big(1 - P_i(R_d)\big)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
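Instead of evaluating the integral, the NN probabilities can also be approximated by Monte-Carlo sampling. This swaps in a different technique than the formula above and is only a sketch. It assumes uniform pdfs over disks and a crisp query point. Objects whose minimum distance exceeds Rmax never win, in line with the pruning observation. Names are hypothetical.

```python
import math
import random


def sample_in_disk(center, radius, rng):
    """Uniform point in a disk (sqrt for a uniform area density)."""
    rho = radius * math.sqrt(rng.random())
    phi = 2 * math.pi * rng.random()
    return center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi)


def nn_probabilities(q, disks, n_samples=20000, seed=7):
    """Monte-Carlo estimate of P(Tr_j is the NN of the crisp query point q),
    for objects with uniform pdfs over disks [(center, radius), ...]."""
    rng = random.Random(seed)
    wins = [0] * len(disks)
    for _ in range(n_samples):
        dists = [math.dist(q, sample_in_disk(c, r, rng)) for c, r in disks]
        wins[dists.index(min(dists))] += 1
    return [w / n_samples for w in wins]
```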
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query TrQ and objects Tr1 to Tr5 with Rmin and Rmax; two possible locations Z1 and Z2 with dist(Z1, Z2) < Rmax.]
We can no longer “prune” Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq.
NOTE: in effect, this is a quadruple integration instead of the double one (cf. previous slide)…
[Figure: uniform pdfs of Trq, Tr1, and Tr2 over disks of radius r, each of height 1/(πr²).]
Example: assuming a uniform pdf of possible locations, two points (one for Tr1, one for Trq) have to be considered jointly when evaluating the actual probability of Tr1 being within distance Rd from Trq.
Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $\Pr(\|V_i - V_q\| \le R_d)$.
Define Viq = Vi - Vq (cross-correlation) as another random variable, Viq = Vi + (-Vq); due to independence, the pdf of Viq is the convolution of the pdfs of Vi and -Vq.
[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq), each spanning a disk of diameter 4r.]
Let Viq = Vi + (-Vq).
Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq).
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
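Property 1 and the support of the difference variable can be checked empirically with a short sampling sketch. It assumes uniform pdfs over disks of radius r, with hypothetical centroids in the test. The sample mean of Viq approaches Ci - Cq, and every sample lies within 2r of it (a disk of diameter 4r). The fraction of samples with norm at most Rd estimates the probability of Tri being within distance Rd from Trq. Names are hypothetical.

```python
import math
import random


def sample_difference(ci, cq, r, n, seed=3):
    """Samples of V_iq = V_i - V_q, where V_i and V_q are independent and
    uniform on disks of radius r around the centroids ci and cq."""
    rng = random.Random(seed)

    def disk(c):
        rho, phi = r * math.sqrt(rng.random()), 2 * math.pi * rng.random()
        return c[0] + rho * math.cos(phi), c[1] + rho * math.sin(phi)

    out = []
    for _ in range(n):
        (xi, yi), (xq, yq) = disk(ci), disk(cq)
        out.append((xi - xq, yi - yq))
    return out


def prob_within(samples, rd):
    """Fraction of samples with ||V_iq|| <= rd, i.e. distance at most Rd."""
    return sum(1 for x, y in samples if math.hypot(x, y) <= rd) / len(samples)
```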
The continuous aspect of the probabilistic NN queries
Let $V^x_{iq} = v^x_i - v^x_q$ and $V^y_{iq} = v^y_i - v^y_q$ denote the corresponding components of the velocity of the object whose expected location is along TRiq.
Then the distance of the expected location along TRiq from the origin, as a function of t, is a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).
Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)
Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points
dist
TR1
TR2
LE12
tb t11 t12
Example
Lower envelope of two distance-trajectories
critical time-point
NOTE time-dimension on horizontal axis
70
The continuous aspect of the probabilistic NN queries
Q how to efficiently construct the lower envelope of the whole collection of distance-functions
A divide-and-conquer approach in the spirit of Merge-Sortdist
TR1
TR2
LE12
tb t11 t12
dist
TR3
TR4
LE34
tb tet31dist
TR1
TR2
LE1234
tb tet11 t12
TR3
TR4
t31
t1new t2new
Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)
Now how about the IPAC-NN tree
71
The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)
Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree
dist dist
lower envelope
TR1TR2
TR3
TR4
TR5
TR6
tb te
2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )
4r
TR7
t1 t2 t3
72
Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie
Example if pdf of selecting a possible path is uniform
Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie
73
Combine the two probabilitieshellip
Possible pathsPossible locations when t=4
a
m
Probability of falling inside segment mp1
0505 = 025 (at t = 4)
Probability of an object a falling inside some segment S at given time instant t
)(
))(Pr()Pr())(Pr(tPPp
p
a
tPLSpSta eg uniform pdf
74
ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory
representation we needndash The collection of all the possible paths (and probabilities)
ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi
to position pi+1
At the time-instant t = 4 the object can
be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )
75
Processing
rsaquo Data Structuresndash Edge hash-table
rsaquo Small in-memoryndash Movement R-tree
rsaquo 1D for each edgendash Trajectories List
rsaquo Store samples ordered by time-stamp
rsaquo Assign pointer to set of possible paths
rsaquo Earliest-arriving and Latest-departure stored for vertices
rsaquo On-disk retrieve on-demand
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79

Uncertainty – the other flip side
› Which important properties must not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after the reduction)
› What is the impact/role of time in the picture?
  – It is not just another "Z" axis… objects cannot travel back in time
  – How long may the data at the sink be "un-fresh"?

Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)

So far…
› Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
› Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results

Merge the approaches
› Idea
  – Discretize time
  – For each time instant, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest-neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435–443

A highway example…
[Figure: position (pos) of a car on a highway over time (t), a query window Q, and the region reachable between the speeds v_min and v_max]
› What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution…, the car is in Q with probability 50% at one time instant and 30% at the other
› Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
  – This violates the speed constraints
› Consideration of dependency: 0.5, because every world in which the car is in Q at the 30% instant is also in Q at the 50% instant, so the union cannot exceed 0.5
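The arithmetic above can be checked with a few lines. The sketch contrasts the independence formula with the bounds that hold for any dependency between the per-instant events (the union is at least as likely as its most likely event and at most as likely as the sum of all events); the function names are hypothetical.

```python
import math


def at_least_once_independent(marginals):
    """P(in Q at least once) if the per-instant events were independent."""
    return 1.0 - math.prod(1.0 - p for p in marginals)


def at_least_once_bounds(marginals):
    """Bounds valid for any dependency: max(p_i) <= P(union) <= min(1, sum(p_i))."""
    return max(marginals), min(1.0, sum(marginals))


marginals = [0.5, 0.3]  # per-instant probabilities from the highway example
print(at_least_once_independent(marginals))  # about 0.65
print(at_least_once_bounds(marginals))       # (0.5, 0.8)
```

The dependency-aware answer 0.5 sits exactly at the lower bound, which is the case where one event is contained in the other.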

An adequate model

Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes exist for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953): Stochastic Processes. Wiley

A simple example
› Whenever the wooden board is hit, the ball either stays in its hole or drops into one of the neighbouring holes with certain probabilities (in the interior: 0.2 to stay, 0.4 to each neighbour)
› At the border of the wooden board these probabilities are different (0.4 to stay, 0.6 to the only neighbour)
› This model is usually learned or given by experts
Distribution of the ball over the nine holes:
› Initial position: (0, 0, 1, 0, 0, 0, 0, 0, 0)
› After the first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After the second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After the 40th hit: close to (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
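
How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket; columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$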

How can we model this?
$$x_0 = (0, 0, 1, 0, 0, 0, 0, 0, 0)$$
› First hit: $x_1 = x_0 M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $x_2 = x_1 M = x_0 M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $x_{40} = x_0 M^{40}$, which is close to the stationary distribution $(0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
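The board example can be reproduced with plain vector-matrix products. The sketch builds M from the interior and border probabilities above and propagates the start vector; `board_matrix` and `hit` are hypothetical names. It also checks that (0.08, 0.12, …, 0.12, 0.08) is stationary (πM = π), which is the distribution the chain approaches after many hits; after exactly 40 hits the values are only approximately equal to it.

```python
def board_matrix(n=9, stay=0.2, move=0.4, border_stay=0.4):
    """Transition matrix of the board: M[i][j] = P(next bucket j | bucket i)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = move, stay, move
    M[0][0], M[0][1] = border_stay, 1.0 - border_stay
    M[n - 1][n - 1], M[n - 1][n - 2] = border_stay, 1.0 - border_stay
    return M


def hit(x, M):
    """One hit of the board: row vector times matrix, x' = x M."""
    n = len(x)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]


M = board_matrix()
x = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # initial position: bucket 3
for _ in range(2):
    x = hit(x, M)
print([round(v, 2) for v in x])  # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]

pi = [0.08] + [0.12] * 7 + [0.08]
print(all(abs(a - b) < 1e-12 for a, b in zip(hit(pi, M), pi)))  # True: pi M = pi
```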

Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to one tick is, e.g., 20 s
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
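One plausible way to learn transition probabilities from historical data is maximum-likelihood counting of observed tick-to-tick transitions, stored sparsely. The slides do not specify the estimator, so treat this as an assumption; the function name and toy history are hypothetical.

```python
from collections import defaultdict


def estimate_transitions(sequences):
    """Maximum-likelihood transition probabilities from observed state sequences.

    sequences: iterable of state lists, one state per tick.
    Returns a sparse matrix {from_state: {to_state: probability}} that only
    stores transitions actually observed (most entries of M stay zero).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in counts.items()
    }


history = [["s1", "s3", "s2"], ["s3", "s3", "s2"], ["s2", "s1"]]
print(estimate_transitions(history))
# {'s1': {'s3': 1.0}, 's3': {'s2': 0.6666666666666666, 's3': 0.3333333333333333}, 's2': {'s1': 1.0}}
```

Estimating one such matrix per time of day or per object group would reflect that the transition matrix can change over time and across groups.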

Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012

ST Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph with transitions s1 → s3 (1.0), s2 → s1 (0.6), s2 → s3 (0.4), s3 → s2 (0.8), s3 → s3 (0.2)]
› Note: there is an exponential number of possible paths the car might take
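The naive possible-worlds answer enumerates every path and sums the probabilities of those that visit s1 or s2 at t = 2 or t = 3. The sketch below does exactly that (n^horizon paths, hence exponential), assuming the car is observed in s1 at t = 0 and mapping s1, s2, s3 to indices 0, 1, 2; the function name is hypothetical.

```python
from itertools import product


def window_query_bruteforce(M, start, window_states, window_times):
    """Sum the probabilities of all paths that are in a window state at some
    time in window_times. Enumerates n**horizon paths, i.e. exponential."""
    n, horizon = len(M), max(window_times)
    total = 0.0
    for tail in product(range(n), repeat=horizon):  # states at t = 1..horizon
        path = (start,) + tail
        p = 1.0
        for a, b in zip(path, path[1:]):
            p *= M[a][b]
        if p > 0.0 and any(path[t] in window_states for t in window_times):
            total += p
    return total


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
print(window_query_bruteforce(M, 0, {0, 1}, {2, 3}))  # about 0.96
```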

ST Window Queries
[Figure: the state graph unrolled over t = 0, 1, 2, 3]
› t = 0: the car is observed in s1: (1, 0, 0)
› t = 1: (1, 0, 0) · M = (0, 0, 1)
› t = 2: (0, 0, 1) · M = (0, 0.8, 0.2); with probability 0.8 the car is in s2, i.e., inside the window
› t = 3: propagating the full distribution gives (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36), but the 0.8 of worlds that already hit the window at t = 2 must not be counted again. Propagating only the remaining 0.2 in s3 gives (0, 0, 0.2) · M = (0, 0.16, 0.04), so another 0.16 enters the window at t = 3
› Result = 0.8 + 0.16 = 0.96
› A solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
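
ST Window Queries
With an additional absorbing state s4 that collects the winner trajectories:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$
M^- is used for the transition to t = 1, which lies outside T; M^+ is used for the transitions to t = 2 and t = 3 and redirects every transition into s1 or s2 to s4.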
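The M^-/M^+ construction generalizes directly: append an absorbing state, use M^- for ticks outside T and M^+ (window-state columns folded into the absorbing column) for ticks inside T. A minimal sketch with dense lists; `window_query` is a hypothetical name, and window ticks are assumed to start at t >= 1.

```python
def window_query(M, start, window_states, window_times):
    """P(object is in a window state at some time in window_times).

    M: row-stochastic list of lists; start: state observed at t = 0;
    window_times: non-empty set of ticks >= 1. An extra absorbing state
    (index n) collects the "winner" trajectories.
    """
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    # M_minus: original transitions, used for ticks outside the window.
    m_minus = [row + [0.0] for row in M] + [absorbing]
    # M_plus: transitions into a window state are redirected to the absorbing state.
    m_plus = [
        [0.0 if j in window_states else p for j, p in enumerate(row)]
        + [sum(p for j, p in enumerate(row) if j in window_states)]
        for row in M
    ] + [absorbing]
    x = [0.0] * (n + 1)
    x[start] = 1.0
    for t in range(1, max(window_times) + 1):
        A = m_plus if t in window_times else m_minus
        x = [sum(x[i] * A[i][j] for i in range(n + 1)) for j in range(n + 1)]
    return x[n]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
print(window_query(M, 0, {0, 1}, {2, 3}))  # about 0.96
```

The cost is linear in the number of ticks instead of exponential; with sparse matrices each step only touches non-zero entries.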

Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time with one observation at t0 (left) and two observations at t0 and t1 (right)]

Multiple Observations
› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class S_i^− corresponding to worlds where o is located in state s_i and o has not intersected the window
  – One class S_i^+ corresponding to worlds where o is located in state s_i and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph s1, s2, s3 unrolled over t = 0, 1, 2, 3]
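
Multiple Observations
[Figure: state graph s1, s2, s3 unrolled over t = 0, 1, 2, 3]
Classes ordered as (S1−, S2−, S3−, S1+, S2+, S3+), where − marks worlds that have not intersected the window (¬∎) and + marks worlds that have (∎):
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0, 0, 0) \xrightarrow{M^+} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^+} (0, 0, 0.04, 0.48, 0.16, 0.32)$$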
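
Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 at t = 3?
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\, P(\blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
Here ∎ denotes the event that the trajectory intersected the window; P(s3 ∧ ∎) = 0.32 is the S3+ entry and P(s3 ∧ ¬∎) = 0.04 is the S3− entry of the class vector at t = 3.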
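The 2|S| classes and the Bayes step can be sketched without building M^+ explicitly: keep one probability vector for S_i^− and one for S_i^+, and move mass from − to + whenever a transition at a window tick enters a window state. This is equivalent to multiplying by M^- or M^+; `hit_classes` is a hypothetical name and window ticks are assumed to be >= 1.

```python
def hit_classes(M, start, window_states, window_times, horizon):
    """Probabilities of the classes S_i^- (in s_i, window not hit yet) and
    S_i^+ (in s_i, window already hit) after `horizon` ticks."""
    n = len(M)
    minus, plus = [0.0] * n, [0.0] * n
    minus[start] = 1.0
    for t in range(1, horizon + 1):
        in_window = t in window_times
        new_minus, new_plus = [0.0] * n, [0.0] * n
        for i in range(n):
            for j in range(n):
                p = M[i][j]
                new_plus[j] += plus[i] * p  # hit worlds stay hit worlds
                if in_window and j in window_states:
                    new_plus[j] += minus[i] * p  # this transition hits the window
                else:
                    new_minus[j] += minus[i] * p
        minus, plus = new_minus, new_plus
    return minus, plus


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
minus, plus = hit_classes(M, 0, {0, 1}, {2, 3}, horizon=3)
# Bayes: P(hit | observed in s3 at t = 3) = P(S3+) / (P(S3+) + P(S3-))
print(plus[2] / (plus[2] + minus[2]))  # about 0.89 (= 0.32 / 0.36)
```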

Summary
› Pros
  – Allows answering time-parametrized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
Selected Works

Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: R-tree over time × location with leaf entries per object id]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012

KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = number of samples that satisfy the query divided by the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
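The Monte-Carlo idea can be sketched with plain forward sampling from the Markov chain. This only conditions on the start observation; drawing samples that also conform with later observations requires the adapted transition matrices from the cited paper, which are not shown here. Function names are hypothetical.

```python
import random


def sample_path(M, start, horizon, rng):
    """Draw one trajectory of `horizon` ticks from the Markov chain M."""
    path = [start]
    for _ in range(horizon):
        row = M[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def mc_window_probability(M, start, window_states, window_times, n_samples=20000, seed=0):
    """Fraction of sampled trajectories that satisfy the window query."""
    rng = random.Random(seed)
    horizon = max(window_times)
    hits = sum(
        any(path[t] in window_states for t in window_times)
        for path in (sample_path(M, start, horizon, rng) for _ in range(n_samples))
    )
    return hits / n_samples


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
print(mc_window_probability(M, 0, {0, 1}, {2, 3}))  # close to the exact 0.96
```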

Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents; e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728

Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers

Thanks for listening

Related Work
› [Doob53] Doob, J. L. (1953): Stochastic Processes. Wiley
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43–49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435–443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal

Querying Uncertain Data: Complexity
› Naive query processing is exponential in the number of objects
› Are there efficient solutions to query uncertain spatial data?
› In general: no
  "The problem of answering queries on a probabilistic database D is #P-complete in the size of D" [DalviSuciu04]
› This can be reduced to uncertain spatial databases
› But: specific queries may have polynomial-time solutions
[DalviSuciu04] Dalvi, N. N. and Suciu, D.: Efficient query evaluation on probabilistic databases. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), 2004

Querying Uncertain Data: Running Example
[Figure: uncertain objects around the query point q]
› Return the number of objects located in the depicted circular region centered at query point q
› This number is a random variable
› Total number of possible worlds

Data Cleaning: Aggregation
[Figure: objects C, D, H, I around the query point q]
› Ignore uncertainty (data cleaning): replace uncertain objects by a deterministic "best guess"
  – Expected positions
  – Most-likely positions
  – …
› Query results are not reliable
› Query results may be biased
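A two-position object is enough to show how a "best guess" biases the result. In the hypothetical example below, the object is equally likely to lie 2 units left or right of q, so it is never inside a radius-1 query region, yet its expected position is exactly q.

```python
import math


def expected_position(obj):
    """obj: list of (position, probability) pairs describing one uncertain object."""
    return tuple(sum(pos[k] * w for pos, w in obj) for k in range(2))


def prob_inside(obj, q, radius):
    """Probability mass of obj inside the circular query region around q."""
    return sum((w for pos, w in obj if math.dist(pos, q) <= radius), 0.0)


# Hypothetical object: equally likely to the left or to the right of q.
obj = [((-2.0, 0.0), 0.5), ((2.0, 0.0), 0.5)]
q, radius = (0.0, 0.0), 1.0
print(math.dist(expected_position(obj), q) <= radius)  # True: best guess is inside
print(prob_inside(obj, q, radius))                      # 0.0: it is never inside
```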

Equivalent Worlds: An Intuition
[Figure: uncertain objects around the query point q; objects C, D, H, I may lie inside the query region]
Observation 1
› For any possible world and any possible world derived from it by changing the position of an object that lies outside the query region in both worlds, the following equivalence holds: both worlds yield the same query result
› Observation 1 allows us to discard objects outside of the query region
› Remaining number of equivalence classes of possible worlds
Observation 2
› For each remaining object, we only need to consider the predicate "inside"
› Remaining number of equivalence classes of possible worlds
Observation 3
› We only require the number of objects in the query region; information about the concrete result objects can be discarded

Generating Functions
[Figure: query point q with uncertain objects A, B, C, annotated with the probabilities 0.8, 0.2 and 0.4]
› Main idea: use polynomial multiplication to enumerate possible results
› Observation 3: anonymize objects, i.e., substitute A, B, C by x
› Each monomial c·x^k implies that the probability of having exactly k results equals c

Generating Functions: Formally
For each object o_i, let p_i be the probability that o_i is located inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_{i} \bigl((1 - p_i) + p_i\, x\bigr)$$
In the expanded polynomial, the coefficient of the monomial x^k equals the probability that exactly k objects are inside the query region.
[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502–513 (2009)
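Expanding F(x) one factor at a time gives the whole count distribution in O(N^2) time. The example below assumes, for illustration only, that the probabilities 0.8, 0.2 and 0.4 shown in the running example's figure are the three objects' probabilities of lying inside the query region; `count_distribution` is a hypothetical name.

```python
def count_distribution(probs):
    """Expand F(x) = prod_i ((1 - p_i) + p_i * x).

    probs[i] is the probability that object i lies inside the query region.
    Returns the coefficients of F, lowest degree first: entry k is the
    probability that exactly k objects are inside.
    """
    coeffs = [1.0]
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside: degree unchanged
            nxt[k + 1] += c * p      # object inside: multiply by x
        coeffs = nxt
    return coeffs


print([round(c, 3) for c in count_distribution([0.8, 0.2, 0.4])])
# [0.096, 0.472, 0.368, 0.064]
```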

Count Queries on Uncertain Data
Example
[Figure: query point q with uncertain objects A, B, C, annotated with the probabilities 0.8, 0.2 and 0.4]

The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time

Approximated Results: Sampling
› Materialize a set S of possible worlds
› Samples are drawn independently and unbiased
› Evaluate the query predicate on each world
› The distribution of the sampled results is an unbiased approximation of the true distribution of results
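The sampling steps can be sketched for the count query: materialize each possible world by placing every object independently inside the region with its probability, evaluate the count, and tabulate. The inside-probabilities 0.8, 0.2, 0.4 are again an illustrative assumption; the function name is hypothetical.

```python
import random
from collections import Counter


def sample_count_distribution(probs, n_worlds, seed=0):
    """Monte-Carlo estimate of P(exactly k objects inside the query region).

    Each sampled world places object i inside the region with probability
    probs[i], independently of the others; the count is evaluated per world.
    """
    rng = random.Random(seed)
    counts = Counter(sum(rng.random() < p for p in probs) for _ in range(n_worlds))
    return {k: c / n_worlds for k, c in sorted(counts.items())}


print(sample_count_distribution([0.8, 0.2, 0.4], n_worlds=100))
```

With only 100 worlds the estimates scatter noticeably around the exact values, which is what the confidence discussion below addresses.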

Sampling Example
[Figure: query point q with uncertain objects A, B, C, annotated with the probabilities 0.8, 0.2 and 0.4]
› Drawing 100 possible worlds may yield the following estimators
› Compare to the exact probabilities
› No indication of the reliability or confidence of the estimates
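
Sampling Confidences
[Figure: query point q with uncertain objects A, B, C, annotated with the probabilities 0.8, 0.2 and 0.4]
› Drawing 100 possible worlds may yield the following estimators
› Use statistical methods to assess the quality of the estimators, e.g., the Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}$$
  where $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$ percentile of the standard normal distribution, $\hat{p}$ the estimated probability and n the number of sampled worlds
› At a significance level of α = 0.05, the true probability lies in the interval [0.442, 0.638]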
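The Wald interval is a one-liner once the normal percentile is available. The usage assumes a hypothetical 54 of 100 sampled worlds satisfying the predicate, a count chosen because it reproduces the interval [0.442, 0.638] at α = 0.05.

```python
from statistics import NormalDist


def wald_interval(successes, n, alpha=0.05):
    """Wald confidence interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half


lo, hi = wald_interval(54, 100)
print(round(lo, 3), round(hi, 3))  # 0.442 0.638
```

The Wald interval is known to be unreliable for very small n or estimates near 0 or 1; alternatives such as the Wilson interval behave better there.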

Uncertain Spatial Data Management: Summary
› Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
› Data cleaning: "best guess" answers; unreliable results; biased results
› Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
› Approximations: Monte-Carlo sampling; probabilistic guarantees

Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)

• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip side

Model of a trajectory
[Figure: a 2D route in the X–Y plane and the corresponding 3D trajectory over time, up to the present time]
› The approximation does not capture acceleration/deceleration
› Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
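Connecting consecutive samples by straight segments means assuming constant velocity between them, which is why acceleration is not captured. A minimal interpolation sketch, assuming samples are (t, x, y) triples with strictly increasing timestamps; `location_at` is a hypothetical name.

```python
from bisect import bisect_right


def location_at(samples, t):
    """Linearly interpolated (x, y) at time t.

    samples: list of (t_i, x_i, y_i) with strictly increasing t_i.
    """
    times = [s[0] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled time span")
    i = bisect_right(times, t)
    if i == len(times):  # t equals the last sample time
        return samples[-1][1], samples[-1][2]
    (t0, x0, y0), (t1, x1, y1) = samples[i - 1], samples[i]
    f = (t - t0) / (t1 - t0)
    return x0 + f * (x1 - x0), y0 + f * (y1 - y0)


samples = [(0, 0, 0), (10, 10, 0), (20, 10, 10)]
print(location_at(samples, 15))  # (10.0, 5.0)
```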

Mobility Models
› Sequence of (location, time) updates (e.g., GPS-based)
› Electronic maps + traffic distribution patterns + set of points (to be visited) => full trajectory
› (location, time, velocity vector) updates

Modeling Uncertainty: Location Uncertainty Models
› Various sources' imprecision (GPS, sensors)
› Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009

Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (a.k.a. space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
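Under Model 1, a point (p, t) belongs to the bead between two exact samples exactly when it is reachable from the first sample and can still reach the second one at maximum speed. A minimal membership test, assuming Euclidean free-space motion; `in_bead` is a hypothetical name.

```python
import math


def in_bead(p, t, p1, t1, p2, t2, vmax):
    """True if location p at time t lies in the bead (space-time prism)
    spanned by the exact samples (p1, t1) and (p2, t2) under speed bound vmax."""
    if not t1 <= t <= t2:
        return False
    reachable_from_p1 = math.dist(p, p1) <= vmax * (t - t1)
    can_reach_p2 = math.dist(p, p2) <= vmax * (t2 - t)
    return reachable_from_p1 and can_reach_p2


print(in_bead((5, 3), 5, (0, 0), 0, (10, 0), 10, vmax=2))  # True
print(in_bead((5, 9), 5, (0, 0), 0, (10, 0), 10, vmax=2))  # False: too far off the route
```

At a fixed t, the set of such points is the intersection of two disks, which is the "bead" cross-section.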

Modeling Uncertain Trajectories: Model 2
› Constraints: motion along the road network
› Uncertainty restricted to road segments
[Figure: road network with positions p1 (at t1) and p2 (at t2) and a location q, plotted over time t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)

Modeling Uncertain Trajectories: Model 3: Sheared Cylinders
› Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004

Processing Spatio-Temporal Queries for Uncertain Trajectories
› Need to jointly consider
  – Syntax: the impact of uncertainty on the (structure of the) answer, and the impact of constraints (e.g., possible routes to be taken)
  – Processing algorithms: the impact of the motion + uncertainty coupling on filtering/pruning/refinement

Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
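Question 2 can be answered per path once a speed bound is fixed. The sketch assumes a single maximum speed vmax on every edge (real road networks would use per-edge speed limits) and describes a path by the lengths of its consecutive edges from p_i to p_{i+1}; `arrival_windows` is a hypothetical name.

```python
from itertools import accumulate


def arrival_windows(edge_lengths, t_i, t_next, vmax):
    """Earliest-arrival / latest-departure times at the vertices of one path.

    Returns None if the path cannot be traversed within [t_i, t_next] at
    speed <= vmax (it is then not a possible path); otherwise one
    (earliest_arrival, latest_departure) pair per vertex, p_i first.
    """
    total = sum(edge_lengths)
    if t_i + total / vmax > t_next:
        return None
    offsets = [0.0, *accumulate(edge_lengths)]  # distance of each vertex from p_i
    return [(t_i + d / vmax, t_next - (total - d) / vmax) for d in offsets]


print(arrival_windows([10, 20], t_i=0, t_next=10, vmax=5))
# [(0.0, 4.0), (2.0, 6.0), (6.0, 10.0)]
```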

Continuous NN for trajectories
Given a collection of moving-object trajectories Tr1, Tr2, …, TrN:
› Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Tr_q during [t_b, t_e]
› A_nn(q) = [(Tr_i1, [t_b, t_1]), (Tr_i2, [t_1, t_2]), …, (Tr_im, [t_{m-1}, t_e])]
[Figure: trajectories Tr_q, Tr1, Tr2, Tr3 in the X–Y plane between T = t_b and T = t_e, with T = t_1 marked]
Example: assuming that Tr_q is the querying trajectory during [t_b, t_e], then
  – Tr1 (black) is the NN throughout [t_b, t_1]
  – Tr2 (red) is the NN throughout [t_1, t_e]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002

Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tr_i has some uncertainty associated with its location at each time instant, consider the variant:
› UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Tr_q during [t_b, t_e]
[Figure: the previous example with uncertainty added; additional time instants T = t_b1 and T = t_11 are marked]
Observations: adding uncertainty to the previous example implies
  – Tr1 (blue) is the only object with a non-zero NN probability up to t_b1
  – Starting at t_b1, Tr3 (green) also has a non-zero NN probability
  – Specifically, at t_11 all three trajectories have a non-zero NN probability
⇒ What should the structure of the answer to UQ_nn(q) look like?

Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: IPAC-NN tree. Root: [Tr_Q, t_b, t_e]. Level 1: (Tr_i1, t_b, t_1, D_1), (Tr_i2, t_1, t_2, D_2), …, (Tr_im, t_(m-1), t_e, D_m). Level 2, e.g., (Tr_i11, t_b, t_11, D_11), (Tr_i12, t_11, t_12, D_12), …; deeper levels analogous, e.g., (Tr_i121, t_11, t_121, D_121)]
› Root: the parameters of the query: querying trajectory Tr_q and time interval of interest [t_b, t_e]
› Children: if the parent is excluded, the trajectories that have the highest probability of being the NN of Tr_q within a sub-interval of the interval bounded by the parent

Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and the uncertainty disks of Tr1–Tr5, with the radii R_min, R_max and R_d]
Tr_q = 2D point Q; the rest are possible locations bounded by circles
Observation
  – Any object whose minimum distance from Q is > R_max has a "0" probability of being the NN of Q
  – Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tr_i is within distance R_d from Q is given by
$$P_i(R_d) = \iint_{\lVert (x, y) - Q \rVert \le R_d} f_i(x, y)\, dx\, dy$$
where f_i is the pdf of the possible locations of Tr_i. The probability that a given object Tr_j is the NN of Q is given by
$$P(\mathrm{Tr}_j \text{ is the NN of } Q) = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{k \ne j} \bigl(1 - P_k(R_d)\bigr)\, dR_d$$
i.e., Tr_j is the NN if it is at distance R_d AND all the others are at distances > R_d, integrating over all possible R_d (NOTE: the upper bound is R_max…)
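Instead of evaluating the integral in closed form, the NN probabilities can be approximated by sampling one location per object and counting which object is closest to Q. A Monte-Carlo sketch, assuming uniform pdfs over disks; the names are hypothetical. An object whose minimum distance exceeds R_max (the smallest maximum distance) never wins a single sample.

```python
import math
import random


def sample_in_disk(center, radius, rng):
    """Uniform random point in a disk (sqrt of a uniform variate for the radius)."""
    r = radius * math.sqrt(rng.random())
    a = rng.uniform(0.0, 2.0 * math.pi)
    return center[0] + r * math.cos(a), center[1] + r * math.sin(a)


def nn_probabilities(q, disks, n_samples=20000, seed=0):
    """Monte-Carlo estimate of P(object i is the NN of the crisp point q).

    disks: list of (center, radius); each object is uniform in its disk.
    """
    rng = random.Random(seed)
    wins = [0] * len(disks)
    for _ in range(n_samples):
        d = [math.dist(q, sample_in_disk(c, r, rng)) for c, r in disks]
        wins[d.index(min(d))] += 1
    return [w / n_samples for w in wins]


probs = nn_probabilities((0.0, 0.0), [((3.0, 0.0), 1.0), ((-3.0, 0.0), 1.0), ((10.0, 0.0), 1.0)])
print(probs)  # roughly [0.5, 0.5, 0.0]; the third object is pruned by R_max = 4
```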

When Tr_q (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query location TrQ, Tr1–Tr5, radii R_min and R_max, and two points Z1, Z2 with dist(Z1, Z2) < R_max]
› We can no longer "prune" Tr4 from consideration
› The main problem now becomes how to calculate the probability of an object, say Tr_j, being within distance R_d from Tr_q
› NOTE: in effect, this is a quadruple integration instead of the double one (cf. previous slide)…

[Figure: uniform pdfs pdf(Tr_q), pdf(Tr1) and pdf(Tr2) in the x–y plane, each a disk of diameter 2r with constant density]
Example: assuming a uniform pdf of possible locations, the figure shows two of the points to be used when evaluating the actual probability of Tr1 being within distance R_d from Tr_q.
Main observation: let V_i and V_q denote the 2D random variables representing the possible locations of Tr_i and Tr_q. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$.
Define V_iq = V_i − V_q (cross-correlation) as another random variable, V_iq = V_i + (−V_q), which, due to independence, implies that the pdf of V_iq is a convolution of the pdfs of V_i and −V_q.
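The convolution argument has a simple discrete analogue: for independent V_i and V_q, the pmf of V_i − V_q is obtained by summing P(V_i = a)·P(V_q = b) over all pairs with a − b = v. The sketch uses hypothetical two-point pmfs in place of the continuous disks; it also shows that the mean of the difference is the difference of the means, as Property 1 on the next slide states for centroids.

```python
from collections import defaultdict


def difference_pmf(pmf_i, pmf_q):
    """pmf of V_iq = V_i - V_q for independent discrete V_i, V_q (positions as tuples)."""
    out = defaultdict(float)
    for vi, pi in pmf_i.items():
        for vq, pq in pmf_q.items():
            out[tuple(a - b for a, b in zip(vi, vq))] += pi * pq
    return dict(out)


def prob_within(pmf_iq, rd):
    """P(||V_i - V_q|| <= rd)."""
    return sum(p for v, p in pmf_iq.items() if sum(c * c for c in v) <= rd * rd)


def mean(pmf):
    dim = len(next(iter(pmf)))
    return tuple(sum(v[k] * p for v, p in pmf.items()) for k in range(dim))


pmf_i = {(0, 0): 0.5, (2, 0): 0.5}
pmf_q = {(0, 0): 0.5, (0, 2): 0.5}
d = difference_pmf(pmf_i, pmf_q)
print(prob_within(d, 2.0), mean(d))  # 0.75 (1.0, -1.0)
```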

[Figure: pdf(Tr1 − Tr_q) and pdf(Tr2 − Tr_q), each supported on a disk of diameter 4r]
Let V_iq = V_i + (−V_q).
Property 1: If pdf(V_i) has a centroid C_i coinciding with E(V_i) AND pdf(V_q) has a centroid C_q coinciding with E(V_q), THEN their convolution pdf(V_iq) has a centroid C_iq = C_i + (−C_q) coinciding with E(V_iq).
Property 2: Assume that pdf(V_i) and pdf(V_q) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(V_iq) is also rotationally symmetric around its centroid C_iq.
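
The continuous aspect of the probabilistic NN queries
Let V_x^iq = v_x^i − v_x^q and V_y^iq = v_y^i − v_y^q denote the corresponding components of the velocity of the object whose expected location is along TR_iq.
Then the distance of the expected location along TR_iq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{\bigl(x_{iq} + V^{iq}_x t\bigr)^2 + \bigl(y_{iq} + V^{iq}_y t\bigr)^2} = \sqrt{A t^2 + B t + C}$$
with $A = (V^{iq}_x)^2 + (V^{iq}_y)^2$, $B = 2\,(x_{iq} V^{iq}_x + y_{iq} V^{iq}_y)$ and $C = x_{iq}^2 + y_{iq}^2$, where $(x_{iq}, y_{iq})$ is the expected relative location at t = 0. Its graph is a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).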

The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d_1q(t), d_2q(t), …, d_Nq(t)} (NOTE: excluding d_qq(t) ≡ 0).
Observation: throughout the time interval of interest for the query, [t_b, t_e], two distance functions d_iq(t) and d_jq(t) can intersect at most twice (NOTE: set d_iq(t) = d_jq(t); squaring both sides leaves a quadratic equation in t). Call such pair-wise intersections critical time points.
[Figure: example lower envelope LE12 of two distance trajectories TR1 and TR2, starting at t_b, with critical time points t_11 and t_12; time on the horizontal axis]

The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: With a divide-and-conquer approach in the spirit of Merge-Sort
[Figure: the lower envelopes LE12 (of TR1 and TR2, critical time points t_11, t_12) and LE34 (of TR3 and TR4, critical time point t_31) over [t_b, t_e] are merged into LE1234, which introduces the new critical time points t_1new and t_2new]
Complexity of the lower-envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?

The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) above the lower envelope cannot have a non-zero probability of being the NN of Tr_q (e.g., Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR1–TR7 over [t_b, t_e] with critical time points t1, t2, t3; the lower envelope, a band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
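
Given two trajectory samples (t_i, p_i) and (t_{i+1}, p_{i+1}) of a moving object a on a road network, the set of possible paths PP_i between t_i and t_{i+1} consists of all the paths along the routes (sequences of edges) that connect p_i and p_{i+1} and whose minimum time costs are not greater than t_{i+1} − t_i, i.e.,
$$PP_i(a) = \{\, P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ T_{min}(P) \le t_{i+1} - t_i \,\}$$
where T_min denotes the minimum time cost (of a path, or between two positions along it).
Example: if the pdf of selecting a possible path is uniform, then Pr(P) = 1/|PP_i(a)| for every P ∈ PP_i(a).
Given a path P_j ∈ PP_i(a), the possible locations of a given moving object a with respect to P_j at t ∈ [t_i, t_{i+1}] form the set of all positions p ∈ P_j from which a can reach p_i (respectively p_{i+1}) within the time period t − t_i (respectively t_{i+1} − t), i.e.,
$$PL_{P_j}(t) = \{\, p \in P_j \mid T_{min}(p, p_i) \le t - t_i \ \wedge\ T_{min}(p, p_{i+1}) \le t_{i+1} - t \,\}$$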
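
Combine the two probabilities…
[Figure: possible paths and possible locations of object a when t = 4, with a segment from m to p1]
› Probability of falling inside segment m–p1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{P \in PP_i(a)} \Pr(P) \cdot \Pr\bigl(a(t) \in S \mid P\bigr)$$
e.g., with a uniform pdf over PL_P(t), $\Pr(a(t) \in S \mid P) = |S \cap PL_P(t)| \,/\, |PL_P(t)|$.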
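The combination can be sketched by representing, for each possible path, the possible-location set PL_P(t) and the part of S on that path as intervals of offsets along the path. This representation, the uniform location pdf within PL_P(t), and the name `prob_in_segment` are assumptions for illustration. The usage reproduces the 0.5 · 0.5 = 0.25 example with two equally likely paths, where S covers half of the possible locations on the first path and does not lie on the second.

```python
def prob_in_segment(paths):
    """Pr(a(t) in S) = sum over paths P of Pr(P) * |S ∩ PL_P(t)| / |PL_P(t)|.

    paths: list of (pr_path, (pl_lo, pl_hi), (s_lo, s_hi)); both intervals are
    offsets along that path. Use an empty interval such as (0, 0) if S is not
    on the path. Assumes a uniform location pdf within each non-empty PL_P(t).
    """
    total = 0.0
    for pr_path, (pl_lo, pl_hi), (s_lo, s_hi) in paths:
        overlap = max(0.0, min(pl_hi, s_hi) - max(pl_lo, s_lo))
        total += pr_path * overlap / (pl_hi - pl_lo)
    return total


paths = [(0.5, (0.0, 10.0), (5.0, 10.0)),  # S covers half of PL on path 1
         (0.5, (0.0, 8.0), (0.0, 0.0))]    # S is not on path 2
print(prob_in_segment(paths))  # 0.25
```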

Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and their probabilities)
  – The probabilistic location function
Example: observe the two possible sequences of going from position p_i to position p_{i+1}. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
rsaquo Data Structuresndash Edge hash-table
rsaquo Small in-memoryndash Movement R-tree
rsaquo 1D for each edgendash Trajectories List
rsaquo Store samples ordered by time-stamp
rsaquo Assign pointer to set of possible paths
rsaquo Earliest-arriving and Latest-departure stored for vertices
rsaquo On-disk retrieve on-demand
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
› Whenever the wooden board is hit, the ball either stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: interior hole with probabilities 0.4 (left), 0.2 (stay), 0.4 (right); border hole with 0.4 (stay) and 0.6 (inward)]
92
A simple example
› Initial position: bucket 3
› After the first hit: 0.4, 0.2, 0.4 in buckets 2 to 4
› After the second hit: 0.16, 0.16, 0.36, 0.16, 0.16 in buckets 1 to 5
› After the 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08
93
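How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$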
94
How can we model this?
› First hit: $(0,0,1,0,0,0,0,0,0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0,0,1,0,0,0,0,0,0) \cdot M^{2} = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0,0,1,0,0,0,0,0,0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
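The propagation above is repeated row-vector-times-matrix multiplication. A minimal pure-Python sketch (the helper names are hypothetical) that rebuilds M for the 9-bucket board and propagates the initial distribution:

```python
def board_matrix(n=9, move=0.4, stay=0.2, border_stay=0.4, border_move=0.6):
    """Transition matrix of the wooden-board example (rows: from, columns: to)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                                   # left border bucket
            m[i][i], m[i][i + 1] = border_stay, border_move
        elif i == n - 1:                             # right border bucket
            m[i][i], m[i][i - 1] = border_stay, border_move
        else:                                        # interior bucket
            m[i][i - 1], m[i][i], m[i][i + 1] = move, stay, move
    return m


def step(dist, matrix):
    """One hit: row vector times transition matrix."""
    return [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
            for j in range(len(matrix[0]))]


def propagate(dist, matrix, hits):
    """Distribution after `hits` hits, i.e. dist * matrix^hits."""
    for _ in range(hits):
        dist = step(dist, matrix)
    return dist


M = board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]                   # ball starts in bucket 3
print([round(p, 2) for p in propagate(start, M, 1)])  # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in propagate(start, M, 2)])  # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in propagate(start, M, 2000)])  # approaches 0.08 at the borders, 0.12 inside
```

Iterating long enough converges to the stationary distribution (0.08, 0.12, …, 0.12, 0.08). Detailed balance gives the same result: π1 · 0.6 = π2 · 0.4, and neighbouring interior buckets exchange probability 0.4 in both directions.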
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012.
97
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph with transitions s1 → s3 (1.0), s2 → s1 (0.6), s2 → s3 (0.4), s3 → s2 (0.8), s3 → s3 (0.2)]
Note: there is an exponential number of possible paths the car might take
ST-Window Queries
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
t = 0: (1, 0, 0)
98
99
ST-Window Queries
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
100
ST-Window Queries
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
101
ST-Window Queries
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)
102
ST-Window Queries
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3 (continuing only the worlds that have not yet entered the window): (0, 0.16, 0.04)
Result = 0.8 + 0.16 = 0.96
› A solution based on matrix multiplications introduces a new state for the trajectories that hit the window, and two transition matrices
103
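ST-Window Queries
States s1, s2, s3 plus a new absorbing state s4 for the trajectories that have hit the window; M⁻ is used for ticks outside T and M⁺ for ticks inside T:
$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0.6 & 0 & 0.4 & 0\\ 0 & 0.8 & 0.2 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0 & 0 & 0.4 & 0.6\\ 0 & 0 & 0.2 & 0.8\\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$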
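The absorbing-state construction can be sketched as follows. This is an illustration under assumptions, not the implementation from [EKMRZ12a]: states are 0-based indices (s1 = 0), the window is given as a set of state indices, M⁻ is applied for ticks outside T and M⁺ for ticks inside T, and all function names are hypothetical.

```python
def step(dist, matrix):
    """Row vector times transition matrix (one Markov-chain tick)."""
    return [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
            for j in range(len(matrix[0]))]


def window_query(M, start, window, t_from, t_to):
    """P(object is in a state of `window` at some tick t in [t_from, t_to]).

    Adds one absorbing 'hit' state (index n) and uses two matrices:
    m_minus for ticks outside the window, m_plus for ticks inside it.
    """
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    m_minus = [list(row) + [0.0] for row in M] + [absorbing]
    # Inside the window, probability mass entering a window state moves to 'hit'.
    m_plus = [[0.0 if j in window else p for j, p in enumerate(row)]
              + [sum(p for j, p in enumerate(row) if j in window)]
              for row in M] + [absorbing]
    dist = [float(p) for p in start] + [0.0]
    if t_from <= 0:                      # worlds already inside the window at t = 0
        dist[n] += sum(dist[j] for j in window)
        for j in window:
            dist[j] = 0.0
    trace = [dist]
    for t in range(1, t_to + 1):
        dist = step(dist, m_plus if t >= t_from else m_minus)
        trace.append(dist)
    return dist[n], trace


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]            # s1, s2, s3
prob, trace = window_query(M, [1, 0, 0], {0, 1}, 2, 3)   # s1 or s2 during T = [2, 3]
print(round(prob, 2))  # 0.96
```

Only two sparse matrices are needed regardless of how many paths exist, which is what avoids the exponential enumeration.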
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 (left) and two observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class S_i⁻ corresponding to worlds where o is located in state s_i and o has not intersected the window
– One class S_i⁺ corresponding to worlds where o is located in state s_i and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
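Multiple Observations
[Figure: classes S1⁻, S2⁻, S3⁻ (window not yet intersected, ¬∎) and S1⁺, S2⁺, S3⁺ (window intersected, ∎) unrolled over t = 0, 1, 2, 3]
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^{-} = \begin{pmatrix} M & 0\\ 0 & M \end{pmatrix}$$
$$M^{+} = \begin{pmatrix} 0&0&1&0&0&0\\ 0&0&0.4&0.6&0&0\\ 0&0&0.2&0&0.8&0\\ 0&0&0&0&0&1\\ 0&0&0&0.6&0&0.4\\ 0&0&0&0&0.8&0.2 \end{pmatrix}$$
(rows and columns ordered S1⁻, S2⁻, S3⁻, S1⁺, S2⁺, S3⁺)
t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)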
107
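Bayes' Theorem
› Now, what is the probability that the trajectory passes through the query window, given the fact that the object was seen in s3 (at t = 3)?
Let ∎ denote "o intersects the window" and O denote "o is observed in s3 at t = 3":
$$P(\blacksquare \mid O) = \frac{P(O \mid \blacksquare) \cdot P(\blacksquare)}{P(O)} = \frac{P(\blacksquare \wedge O)}{P(O)} = \frac{P(O \wedge \blacksquare)}{P(O \wedge \blacksquare) + P(O \wedge \neg\blacksquare)}$$
With (S1⁻, S2⁻, S3⁻, S1⁺, S2⁺, S3⁺) = (0, 0, 0.04, 0.48, 0.16, 0.32) at t = 3, P(O ∧ ∎) = P(S3⁺) = 0.32 and P(O ∧ ¬∎) = P(S3⁻) = 0.04:
$$P(\blacksquare \mid O) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$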
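The 2|S|-class construction and the final Bayes step can be sketched as follows. This is pure Python; the names are hypothetical, states are 0-based, and the window is assumed to start after t = 0. The first n entries of the vector are the classes S_i⁻ and the last n are the classes S_i⁺.

```python
def step(dist, matrix):
    """Row vector times transition matrix."""
    return [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
            for j in range(len(matrix[0]))]


def split_chain(M, window):
    """M_minus and M_plus over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(M)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            # Outside the window, the hit flag is carried along unchanged.
            m_minus[i][j] += p
            m_minus[n + i][n + j] += p
            # Inside the window, entering a window state sets the hit flag.
            m_plus[i][n + j if j in window else j] += p
            m_plus[n + i][n + j] += p
    return m_minus, m_plus


def hit_given_observation(M, start, window, t_from, t_to, t_obs, obs_state):
    """P(window intersected | object observed in obs_state at t_obs), via Bayes.

    Assumes the window starts after t = 0 (t_from >= 1).
    """
    n = len(M)
    m_minus, m_plus = split_chain(M, window)
    dist = [float(p) for p in start] + [0.0] * n
    for t in range(1, t_obs + 1):
        dist = step(dist, m_plus if t_from <= t <= t_to else m_minus)
    hit, miss = dist[n + obs_state], dist[obs_state]   # P(O and hit), P(O and no hit)
    return dist, hit / (hit + miss)


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
dist, p = hit_given_observation(M, [1, 0, 0], {0, 1}, 2, 3, 3, 2)
print([round(x, 2) for x in dist], round(p, 2))
# [0.0, 0.0, 0.04, 0.48, 0.16, 0.32] 0.89
```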
108
Summary
› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› An index structure based on the R-tree indexes the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index entries P_oid over the (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013).
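Rejection sampling is the simplest way to make sampled worlds conform with an observation: draw paths forward and discard those that contradict it. This is a naive baseline, not the transition-matrix adaptation of [NZERMCK13a]. It is sketched here on the three-state example from the window-query slides (states 0-based, names hypothetical). Its estimate should land near the exact 0.32 / (0.32 + 0.04) ≈ 0.89, but its cost grows as the observation becomes less likely.

```python
import random


def sample_path(M, start_state, steps, rng):
    """Draw one possible world (a state sequence) forward from start_state."""
    path = [start_state]
    for _ in range(steps):
        row = M[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def rejection_estimate(M, start_state, window, t_from, t_to, t_obs, obs_state,
                       n_samples, seed=0):
    """Monte-Carlo estimate of P(window hit | observed in obs_state at t_obs).

    Paths that contradict the observation are discarded; this loops forever
    if the observation is impossible under M.
    """
    rng = random.Random(seed)
    accepted = hits = 0
    while accepted < n_samples:
        path = sample_path(M, start_state, max(t_obs, t_to), rng)
        if path[t_obs] != obs_state:
            continue                      # reject: world contradicts the observation
        accepted += 1
        if any(path[t] in window for t in range(t_from, t_to + 1)):
            hits += 1
    return hits / accepted


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
print(rejection_estimate(M, 0, {0, 1}, 2, 3, 3, 2, n_samples=20000))
```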
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents; e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728.
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43–49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112–1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435–443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728.
29
Querying Uncertain Data: Running Example
[Figure: uncertain objects around a circular query region centered at query point q]
Return the number of objects located in the depicted circular region centered at query point q
This number is a random variable
Total number of possible worlds
30
Data Cleaning: Aggregation
[Figure: query point q with uncertain objects C, D, H, I]
Ignore Uncertainty (Data Cleaning)
Replace uncertain objects by a deterministic "best guess"
Expected positions
Most-likely positions
…
Query results are not reliable
Query results may be biased
31
Equivalent Worlds: An Intuition
[Figure: query point q]
Observation 1
For any possible world and any possible world derived from it by changing the position of an object, the following equivalence holds
32
Equivalent Worlds: An Intuition
[Figure: query point q]
Observation 1 allows us to discard objects outside of the query region
Remaining number of equivalence classes of possible worlds
33
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
[Figure: query point q with objects C, D, H, I]
Querying Uncertain Spatial Data
34
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
Remaining number of equivalence classes of possible worlds
[Figure: query point q with objects C, D, H, I]
35
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region; information about concrete result objects can be discarded
[Figure: query point q with objects C, D, H, I]
36
Generating Functions
[Figure: query point q with objects A, B, C and probabilities 0.8, 0.2, 0.4]
Main idea: use polynomial multiplication to enumerate possible results
37
Generating Functions
[Figure: query point q with the objects replaced by x; probabilities 0.8, 0.2, 0.4]
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects, i.e., substitute A, B, C by x
38
Generating Functions
[Figure: query point q with the objects replaced by x; probabilities 0.8, 0.2, 0.4]
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects, i.e., substitute A, B, C by x
39
Generating Functions
[Figure: query point q with the objects replaced by x; probabilities 0.8, 0.2, 0.4]
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects, i.e., substitute A, B, C by x
Each monomial c·x^k implies that the probability of having exactly k results equals c
40
Generating Functions: Formally
For each object o_i, let p_i be the probability that o_i is inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_{i} \big( (1 - p_i) + p_i \, x \big)$$
In the expanded polynomial, the coefficient of the monomial x^k equals the probability that exactly k objects are inside the query region.
[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502–513 (2009).
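Expanding the generating function only needs repeated multiplication by a linear factor, i.e., O(n²) work instead of enumerating 2^n possible worlds. The sketch below assumes that the 0.8, 0.2 and 0.4 in the running-example figure are the probabilities that the three objects lie inside the query region; the brute-force function is included only as a cross-check, and both function names are hypothetical.

```python
from itertools import product


def count_distribution(probs):
    """Coefficients of prod_i ((1 - p_i) + p_i * x); entry k = P(exactly k inside)."""
    coeffs = [1.0]
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)      # object outside: power of x unchanged
            nxt[k + 1] += c * p          # object inside: multiply by x
        coeffs = nxt
    return coeffs


def count_distribution_bruteforce(probs):
    """Enumerate all 2^n possible worlds (exponential; for checking only)."""
    dist = [0.0] * (len(probs) + 1)
    for world in product([False, True], repeat=len(probs)):
        pr = 1.0
        for inside, p in zip(world, probs):
            pr *= p if inside else 1.0 - p
        dist[sum(world)] += pr
    return dist


print([round(c, 3) for c in count_distribution([0.8, 0.2, 0.4])])
# [0.096, 0.472, 0.368, 0.064]
```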
41
Example
[Figure: query point q with objects A, B, C and probabilities 0.8, 0.2, 0.4]
Count Queries on Uncertain Data
42
Example
[Figure: query point q with objects A, B, C and probabilities 0.8, 0.2, 0.4]
Count Queries on Uncertain Data
43
Example
[Figure: query point q with objects A, B, C and probabilities 0.8, 0.2, 0.4]
Count Queries on Uncertain Data
44
Example
[Figure: query point q with objects A, B, C and probabilities 0.8, 0.2, 0.4]
Count Queries on Uncertain Data
45
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and unbiased
Evaluate the query predicate on each world
The distribution of sampled results is an unbiased approximation of the true distribution of results
48
Sampling Example
[Figure: query point q with objects A, B, C and probabilities 0.8, 0.2, 0.4]
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimations
49
Sampling Confidences
[Figure: query point q with objects A, B, C and probabilities 0.8, 0.2, 0.4]
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of estimators
E.g., the Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}$$
where z_{1-α/2} is the (1 - α/2) percentile of the standard normal distribution
At a significance level of α, the true probability is in the interval [0.442, 0.638]
True probability
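Sampling and the Wald interval can be sketched as below. Assumptions: the 0.8, 0.2 and 0.4 from the figure are the objects' inside-probabilities, z = 1.96 (a 95% level), and a hypothetical estimate of 0.54, the midpoint of [0.442, 0.638], which reproduces that interval with n = 100 under the Wald formula. Function names are hypothetical.

```python
import math
import random


def sample_counts(probs, n_worlds, seed=0):
    """Draw n_worlds independent possible worlds; histogram of the result count."""
    rng = random.Random(seed)
    hist = [0] * (len(probs) + 1)
    for _ in range(n_worlds):
        count = sum(rng.random() < p for p in probs)   # each object inside w.p. p
        hist[count] += 1
    return hist


def wald_interval(p_hat, n, z=1.96):
    """Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n), clipped to [0, 1]."""
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)


hist = sample_counts([0.8, 0.2, 0.4], 100)
p_hat = hist[1] / 100                       # estimated P(exactly one result)
print(p_hat, wald_interval(p_hat, 100))
print([round(b, 3) for b in wald_interval(0.54, 100)])  # [0.442, 0.638]
```

The Wald interval is the simplest choice; near 0 or 1 it becomes unreliable, which is why the sketch clips it to [0, 1].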
50
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip side
53
Model of a Trajectory
[Figure: a 3D trajectory in (X, Y, Time) up to the present time, and its projection, the 2D route]
Approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
54
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
(location, time, velocity vector) updates
55
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (a.k.a. space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
57
Modeling Uncertain Trajectories: Model 2
Constraints:
Motion along the road network
Uncertainty restricted to road segments
[Figure: samples p1 at t1 and p2 at t2 on a road network, query point q, time axis t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3: Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints (e.g., possible routes to be taken)
Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for Trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) space at T = tb, T = t1 and T = te]
Example: assuming that Trq is the querying trajectory during [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some location uncertainty associated with its location at each time instant, consider the variant
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: uncertain trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) space, with time instants T = tb, tb1, t1, t11, te]
Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree with root [TrQ, tb, te]; first-level nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm), each refined by child nodes over sub-intervals, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]
Root: parameters of the query, i.e., the querying trajectory Trq and the time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q, possible-location disks of Tr1 to Tr5, radii Rmin, Rmax and Rd; annotation Rmin > Rmax]
Trq = 2D point Q; the rest are possible locations bounded by circles
Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being the NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \int_{\lVert x - Q \rVert \le R_d} \mathrm{pdf}_i(x)\, dx$$
The probability that a given object Trj is the NN of Q is given by
$$P^{NN}_j = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \ne j} \big(1 - P_i(R_d)\big)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the others are at distances > Rd, integrating over all possible values of Rd (NOTE: the upper bound is Rmax…)
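Assuming uniform pdfs over disks (an assumption made for this sketch), the probability that object i lies within distance Rd of Q is the overlap area of the disk of radius Rd around Q with the object's disk, divided by the disk area. The NN integral can then be approximated on a grid over [0, Rmax]. The coordinates below are hypothetical, and so are the function names; the third object lies entirely beyond Rmax and therefore gets probability 0, as in the Tr4 observation.

```python
import math


def circle_overlap(r1, r2, d):
    """Area of the intersection of two circles (radii r1, r2, centre distance d)."""
    if d >= r1 + r2:
        return 0.0
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2
    a1 = math.acos(max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1))))
    a2 = math.acos(max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2))))
    k = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    return r1 * r1 * a1 + r2 * r2 * a2 - 0.5 * math.sqrt(max(0.0, k))


def within_prob(q, center, r, rd):
    """P(dist(q, X) <= rd) for X uniform in the disk (center, r)."""
    if rd <= 0.0:
        return 0.0
    return circle_overlap(rd, r, math.dist(q, center)) / (math.pi * r * r)


def nn_probabilities(q, disks, steps=4000):
    """NN probability of each uncertain disk object for a crisp query point q."""
    r_max = min(math.dist(q, c) + r for c, r in disks)   # smallest maximum distance
    grid = [r_max * k / steps for k in range(steps + 1)]
    cdf = [[within_prob(q, c, r, rd) for rd in grid] for c, r in disks]
    result = []
    for j in range(len(disks)):
        total = 0.0
        for k in range(steps):
            dp = cdf[j][k + 1] - cdf[j][k]   # P(Trj's distance falls in this slice)
            if dp == 0.0:
                continue
            others = 1.0
            for i in range(len(disks)):
                if i != j:                   # every other object is farther away
                    others *= 1.0 - 0.5 * (cdf[i][k] + cdf[i][k + 1])
            total += dp * others
        result.append(total)
    return result


disks = [((1.0, 0.0), 0.5), ((0.0, 1.2), 0.5), ((3.0, 0.0), 0.5)]
print([round(p, 3) for p in nn_probabilities((0.0, 0.0), disks)])
```

The product over the other objects is evaluated at the midpoint of each slice; a finer grid trades time for accuracy.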
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query location TrQ and disks Tr1 to Tr5 with radii Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs of Tr1, Tr2 and Trq over disks of radius r (x, y axes)]
Example: assuming a uniform pdf of possible locations; two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq are shown
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$P\big(\lVert V_i - V_q \rVert \le R_d\big)$$
Define Viq = Vi - Vq (cross-correlation) as another random variable, Viq = Vi + (-Vq), which, due to independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and -Vq
67
[Figure: pdfs of Tr1 - Trq and Tr2 - Trq (x, y axes)]
Let Viq = Vi + (-Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
68
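The Continuous Aspect of Probabilistic NN Queries
Let Vx_iq = vx_i - vx_q and Vy_iq = vy_i - vy_q denote the corresponding components of the velocity of the object whose expected location is along TR_iq
Then the distance of the expected location along TR_iq from the origin as a function of t is
$$d_{iq}(t) = \sqrt{(x_{iq} + V^{x}_{iq}\, t)^2 + (y_{iq} + V^{y}_{iq}\, t)^2} = \sqrt{A t^2 + B t + C}, \qquad A = (V^{x}_{iq})^2 + (V^{y}_{iq})^2$$
where (x_iq, y_iq) is the expected relative location at t = 0; this is a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)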
69
The Continuous Aspect of Probabilistic NN Queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: throughout the time interval of interest for the query [tb, te], two distance functions diq(t) and djq(t) can intersect at most twice (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time points
[Figure: example lower envelope LE12 of the two distance trajectories TR1 and TR2, with critical time points t11 and t12 after tb; time on the horizontal axis]
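Because each squared distance function has the form At² + Bt + C, setting two of them equal gives a quadratic in t, which is why at most two critical time points exist. The sketch below (hypothetical names) computes them and builds the lower envelope by brute force over all O(N²) pairs, rather than with the O(N log N) divide-and-conquer merge. It compares squared distances, which yields the same minimiser.

```python
import math


def coefficients(x0, y0, vx, vy):
    """d(t)^2 = A t^2 + B t + C for the relative position (x0 + vx t, y0 + vy t)."""
    return vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0


def critical_times(f, g, tb, te, eps=1e-12):
    """Times in [tb, te] where two distance functions are equal (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:                       # degenerate: linear equation, or none
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted(t for t in roots if tb <= t <= te)


def lower_envelope(funcs, tb, te):
    """Brute-force lower envelope as a list of (index, t_start, t_end)."""
    cuts = {tb, te}
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            cuts.update(critical_times(funcs[i], funcs[j], tb, te))
    cuts = sorted(cuts)
    envelope = []
    for t0, t1 in zip(cuts, cuts[1:]):
        mid = 0.5 * (t0 + t1)              # the minimiser is constant between cuts
        best = min(range(len(funcs)),
                   key=lambda i: funcs[i][0] * mid * mid + funcs[i][1] * mid + funcs[i][2])
        if envelope and envelope[-1][0] == best:
            envelope[-1] = (best, envelope[-1][1], t1)
        else:
            envelope.append((best, t0, t1))
    return envelope


tr1 = coefficients(1.0, 0.0, 0.0, 0.0)    # stays at distance 1
tr2 = coefficients(-2.0, 0.0, 1.0, 0.0)   # passes by the query at t = 2
print(lower_envelope([tr1, tr2], 0.0, 4.0))
# [(0, 0.0, 1.0), (1, 1.0, 3.0), (0, 3.0, 4.0)]
```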
70
The Continuous Aspect of Probabilistic NN Queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: lower envelopes LE12 (of TR1, TR2) and LE34 (of TR3, TR4) over [tb, te] are merged into LE1234, producing new critical time points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from it has a zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te] with the lower envelope, the 2nd lower envelope (nodes at level 2 of the IPAC-NN tree), the 4r threshold, and time points t1, t2, t3]
72
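Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e., with mincost denoting the minimum travel time,
$$PP_i(a) = \{\, P : P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mincost}(P) \le t_{i+1} - t_i \,\}$$
Example: if the pdf of selecting a possible path is uniform, Pr(P) = 1 / |PPi(a)| for each P ∈ PPi(a)
Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t - ti (respectively ti+1 - t), i.e.,
$$PL(a, P_j, t) = \{\, p \in P_j : \mathrm{mincost}(p_i, p) \le t - t_i \ \wedge\ \mathrm{mincost}(p, p_{i+1}) \le t_{i+1} - t \,\}$$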
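Enumerating the possible paths is a bounded depth-first search over the road network. In the sketch below, the graph, its travel times, the budget ti+1 - ti = 6 and the function name are all hypothetical. Edges are treated as directed for brevity, and the path pdf is taken to be uniform, so each of the two feasible paths gets probability 0.5, the value used in the t = 4 example.

```python
def possible_paths(graph, src, dst, budget):
    """Simple paths src -> dst whose minimum travel time is <= budget.

    graph: dict mapping a vertex to a list of (neighbour, min_travel_time).
    Returns a list of (path, min_travel_time) pairs.
    """
    found = []

    def dfs(v, path, cost):
        if v == dst:
            found.append((list(path), cost))
            return
        for w, c in graph.get(v, []):
            # Skip cycles and prune as soon as the time budget is exceeded.
            if w not in path and cost + c <= budget:
                path.append(w)
                dfs(w, path, cost + c)
                path.pop()

    dfs(src, [src], 0.0)
    return found


graph = {
    "v1": [("v5", 3), ("v3", 2), ("v2", 4)],
    "v5": [("v4", 3)],
    "v3": [("v4", 4)],
    "v2": [("v4", 5)],
}
paths = possible_paths(graph, "v1", "v4", budget=6)
uniform = {tuple(p): 1 / len(paths) for p, _ in paths}   # uniform path pdf
print(uniform)
# {('v1', 'v5', 'v4'): 0.5, ('v1', 'v3', 'v4'): 0.5}
```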
73
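Combine the two probabilities…
[Figure: possible paths of object a, and its possible locations when t = 4; segment from m to p1]
Probability of falling inside segment m–p1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t (e.g., with a uniform pdf):
$$\Pr(a \in S \text{ at } t) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr\big(p \in S \mid p \in PL(a, P, t)\big)$$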
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
– Edge hash table
  › Small, in-memory
– Movement R-tree
  › 1D for each edge
– Trajectories list
  › Stores samples ordered by time stamp
  › Assigns a pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On-disk; retrieved on demand
76
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" along road-network segments from q over vertices A to E, with earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]
77
Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty – the flip side
› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error; save on data size
John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
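A basic recursive Douglas-Peucker simplification is sketched below as an example of "define an acceptable error, save on data size". It is the textbook algorithm, not the speed-up from the cited paper. It also treats a trajectory as a purely spatial polyline and ignores the time-related concerns raised on the next slide. The function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep the endpoints; recursively keep the farthest point while it exceeds epsilon."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [a, b]                      # every point is within the error bound
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right               # do not duplicate the split point


print(douglas_peucker([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)], 0.1))
# [(0, 0), (2, 0), (2, 2)]
```

Every dropped point lies within epsilon of the kept polyline, which is the "acceptable error" traded for fewer transmitted points.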
79
Uncertainty – the other flip side
› What are the important properties that should not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– Is it not just a "Z" axis? … one cannot travel back in time
– How long may the sink's data be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
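102
ST - Window Queries
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
[Figure: the transition graph over s1, s2, s3 unrolled for t = 0, 1, 2, 3]
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2); the mass 0.8 in s1 or s2 has hit the window and is not propagated further
t = 3: (0, 0.16, 0.04); another 0.16 hits the window (plain propagation without removing hits would give (0.48, 0.16, 0.36))
Result = 0.8 + 0.16 = 0.96
› The solution based on matrix multiplications introduces a new state for the "winner" trajectories (those that have already hit the window) and uses two matrices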
103
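ST - Window Queries
$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$
$(1, 0, 0, 0)\,M^- = (0, 0, 1, 0)$
$(0, 0, 1, 0)\,M^+ = (0, 0, 0.2, 0.8)$
$(0, 0, 0.2, 0.8)\,M^+ = (0, 0, 0.04, 0.96)$
States: s1, s2, s3, and the absorbing state s4 for trajectories that have hit the window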
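The M−/M+ construction above can be reproduced with a few lines of NumPy. In this sketch, M+ redirects any probability mass that would enter a window state during T into the absorbing hit state. M− is used for ticks outside T. The observation is assumed to be at t = 0, and the helper's name and signature are illustrative, not taken from the paper.

```python
import numpy as np

def window_query(M, start, window, t_begin, t_end):
    """P(object is in a state of `window` at some tick in [t_begin, t_end]),
    given that it is in state `start` at t = 0."""
    n = M.shape[0]
    hit = n                                  # index of the extra absorbing state
    M_minus = np.zeros((n + 1, n + 1))
    M_minus[:n, :n] = M
    M_minus[hit, hit] = 1.0
    M_plus = M_minus.copy()
    for s in window:
        M_plus[:n, hit] += M_plus[:n, s]     # entering s during T counts as a hit
        M_plus[:n, s] = 0.0
    if t_begin == 0 and start in window:
        return 1.0                           # the observation itself lies in the window
    v = np.zeros(n + 1)
    v[start] = 1.0
    for t in range(1, t_end + 1):
        v = v @ (M_plus if t >= t_begin else M_minus)
    return v[hit]

M = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.4],
              [0.0, 0.8, 0.2]])
p = window_query(M, start=0, window={0, 1}, t_begin=2, t_end=3)
```

For the running example this returns 0.96, up to floating-point noise, which matches the slide. With T = [3, 3] it returns 0.64, i.e. 0.48 + 0.16.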
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 (left) and two observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
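Multiple Observations
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3; classes S1−, S2−, S3− (window not hit, ¬∎) and S1+, S2+, S3+ (window hit, ∎)]
Distributions over (S1−, S2−, S3−, S1+, S2+, S3+):
t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)
$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$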
107
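› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' Theorem (∎: the trajectory has hit the window):
$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\,P(\blacksquare)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(\blacksquare \wedge s_3) + P(\neg\blacksquare \wedge s_3)}$
With the distribution at t = 3 over (S1−, S2−, S3−, S1+, S2+, S3+) = (0, 0, 0.04, 0.48, 0.16, 0.32):
$\frac{0.32}{0.32 + 0.04} \approx 0.89$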
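The six-class construction and the Bayes step can be sketched the same way. Here M− = diag(M, M). M+ moves mass from an S− class into the corresponding S+ class whenever that mass enters a window state during T. The function name is hypothetical, and the sketch assumes the second observation comes at or after the end of the window.

```python
import numpy as np

def window_prob_given_observation(M, start, window, t_begin, t_end, obs_state, t_obs):
    """P(object hits `window` during [t_begin, t_end] | object in obs_state at t_obs).

    Indices 0..n-1 stand for the classes S_i^- (window not hit yet),
    n..2n-1 for S_i^+ (window already hit). Requires t_obs >= t_end.
    """
    assert t_obs >= t_end
    n = M.shape[0]
    zero = np.zeros((n, n))
    M_minus = np.block([[M, zero], [zero, M]])        # M^- = diag(M, M)
    M_plus = M_minus.copy()
    for s in window:
        # entering window state s from an S^- class moves the world to S_s^+
        M_plus[:n, n + s] += M_plus[:n, s]
        M_plus[:n, s] = 0.0
    v = np.zeros(2 * n)
    v[n + start if (t_begin == 0 and start in window) else start] = 1.0
    for t in range(1, t_obs + 1):
        v = v @ (M_plus if t_begin <= t <= t_end else M_minus)
    hit, miss = v[n + obs_state], v[obs_state]
    return hit / (hit + miss)   # Bayes: P(hit and obs) / P(obs); undefined if P(obs) = 0

M = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.4],
              [0.0, 0.8, 0.2]])
p = window_prob_given_observation(M, start=0, window={0, 1}, t_begin=2, t_end=3,
                                  obs_state=2, t_obs=3)
```

The result is 0.32 / 0.36 = 8/9 ≈ 0.89, which is the value on the slide.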
108
Summary
› Pros
– Allows answering time-parameterized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index entries P_oid in the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
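One way to make samples conform with a later observation is to reweight each transition by the probability of still reaching that observation. These probabilities are computed backwards from the observation. This is a standard conditioning trick for Markov chains, sometimes described as an h-transform. The sketch illustrates that idea; it is not necessarily the exact adaptation used in the cited paper, and all names are hypothetical. On the running example, the estimate should land close to the exact conditional probability 8/9 ≈ 0.89.

```python
import numpy as np

def adapted_matrices(M, obs_state, t_obs):
    """One matrix per tick 0..t_obs-1, conditioned on ending in obs_state at t_obs.

    beta[t][i] = P(in obs_state at t_obs | in state i at t), computed backwards;
    the adapted entry is M[i, j] * beta[t+1][j] / beta[t][i].
    """
    n = M.shape[0]
    beta = [None] * (t_obs + 1)
    beta[t_obs] = np.eye(n)[obs_state]
    for t in range(t_obs - 1, -1, -1):
        beta[t] = M @ beta[t + 1]
    mats = []
    for t in range(t_obs):
        with np.errstate(divide="ignore", invalid="ignore"):
            Mt = M * beta[t + 1][None, :] / beta[t][:, None]
        mats.append(np.nan_to_num(Mt))   # states that cannot reach the observation get zero rows
    return mats

def sample_path(mats, start, rng):
    """Draw one trajectory that is consistent with the observation."""
    path = [start]
    for Mt in mats:
        path.append(int(rng.choice(Mt.shape[0], p=Mt[path[-1]])))
    return path

def mc_window_prob(M, start, window, t_begin, t_end, obs_state, t_obs,
                   n_samples=5000, seed=0):
    """Fraction of sampled, observation-consistent paths that visit the window."""
    rng = np.random.default_rng(seed)
    mats = adapted_matrices(M, obs_state, t_obs)
    hits = 0
    for _ in range(n_samples):
        path = sample_path(mats, start, rng)
        hits += any(path[t] in window for t in range(t_begin, t_end + 1))
    return hits / n_samples

M = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.4],
              [0.0, 0.8, 0.2]])
estimate = mc_window_prob(M, start=0, window={0, 1}, t_begin=2, t_end=3,
                          obs_state=2, t_obs=3)
```

Because every sampled path already ends in the observed state, no sample is wasted on trajectories that contradict the observation.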
112
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Event queries
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity search on uncertain spatio-temporal data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-nearest neighbor queries on uncertain moving object trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
30
Data Cleaning / Aggregation
[Figure: query region q with uncertain objects C, D, H, I]
Ignore Uncertainty (Data Cleaning)
Replace uncertain objects by a deterministic "best guess"
– Expected positions
– Most-likely positions
– …
Query results are not reliable
Query results may be biased
31
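Equivalent Worlds: An Intuition
[Figure: query region q]
Observation 1
For any possible world w and any possible world w′ derived from w by changing the position of an object whose possible positions all lie outside q, w and w′ are equivalent with respect to the query result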
32
Equivalent Worlds: An Intuition
[Figure: query region q]
Observation 1 allows discarding objects outside of the query region
Remaining number of equivalence classes of possible worlds
33
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
[Figure: query region q with uncertain objects C, D, H, I]
Querying Uncertain Spatial Data
Querying Uncertain Spatial Data
34
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
Remaining number of equivalence classes of possible worlds
[Figure: query region q with uncertain objects C, D, H, I]
35
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region; information about the concrete result objects can be discarded
[Figure: query region q with uncertain objects C, D, H, I]
36
[Figure: query region q; uncertain objects A, B, C lie inside q with probabilities 0.8, 0.2 and 0.4]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
39
[Figure: the same query, with each anonymized object labeled x]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects; substitute A, B, C by x
Each monomial $c_k x^k$ implies that the probability of having exactly k results equals $c_k$
40
For each object $o_i$, let $p_i = P(o_i \in q)$. Consider the following generating function [2]:
$F(x) = \prod_{i} \bigl((1 - p_i) + p_i\,x\bigr)$
In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly k objects are inside the query region.
Generating Functions: Formally
[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
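Expanding the generating function amounts to repeated multiplication by the linear factors (1 − p_i) + p_i·x. The following is a minimal sketch. It assumes mutually independent objects and a coefficient list indexed by the power of x, and the function name is illustrative.

```python
def count_distribution(probs):
    """Coefficients of prod_i ((1 - p_i) + p_i * x).

    coeffs[k] is the probability that exactly k objects lie inside the
    query region, assuming the objects are mutually independent.
    """
    coeffs = [1.0]                       # the empty product, i.e. the polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)      # object outside q: power of x unchanged
            nxt[k + 1] += c * p          # object inside q: multiply by x
        coeffs = nxt
    return coeffs

dist = count_distribution([0.8, 0.2, 0.4])
```

For the three objects of the example, this yields approximately [0.096, 0.472, 0.368, 0.064], the probabilities of 0, 1, 2 and 3 objects inside q. Each factor costs time linear in the current degree, so N objects need O(N²) time. That is polynomial in |DB|.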
44
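[Figure: query region q; uncertain objects A, B, C lie inside q with probabilities 0.8, 0.2 and 0.4]
Example: $F(x) = (0.2 + 0.8x)(0.8 + 0.2x)(0.6 + 0.4x) = 0.096 + 0.472x + 0.368x^2 + 0.064x^3$
Count Queries on Uncertain Data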
45
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and without bias
Evaluate the query predicate on each world
The distribution of the sampled results is an unbiased approximation of the true distribution of results
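Here is a sketch of the sampling approach for the same count query. By Observation 2, a world only needs to fix, for each object, whether that object lies inside q. The sketch therefore samples this outcome directly with probability p_i instead of sampling full positions. The helper name and the fixed seed, used for reproducibility, are assumptions.

```python
import random
from collections import Counter

def sample_count_distribution(probs, n_worlds, seed=0):
    """Monte-Carlo estimate of P(exactly k objects inside q), k = 0..len(probs).

    Each sampled world decides independently, per object, whether the
    object lies inside q; the count query is then evaluated on that world.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_worlds):
        k = sum(rng.random() < p for p in probs)   # count query on one world
        counts[k] += 1
    return {k: counts[k] / n_worlds for k in range(len(probs) + 1)}

estimate = sample_count_distribution([0.8, 0.2, 0.4], n_worlds=20000)
```

With 20,000 worlds, the estimates should be close to the exact values 0.096, 0.472, 0.368 and 0.064. On its own, however, the sampler says nothing about how close they are.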
48
[Figure: query region q; uncertain objects A, B, C lie inside q with probabilities 0.8, 0.2 and 0.4]
Sampling Example
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimations
49
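[Figure: query region q; uncertain objects A, B, C lie inside q with probabilities 0.8, 0.2 and 0.4]
Sampling: Confidences
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of the estimators
E.g., the Wald test: $\hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/n}$
where $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$ percentile of the standard normal distribution
At a significance level of $\alpha = 0.05$, the true probability lies in the interval [0.442, 0.638]
True probability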
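The Wald interval can be computed with Python's standard library. The slide does not show the estimate itself. This sketch assumes that 54 of the 100 sampled worlds satisfied the predicate, so p̂ = 0.54, the midpoint of the stated interval. With α = 0.05, this reproduces [0.442, 0.638] after rounding.

```python
from statistics import NormalDist

def wald_interval(successes, n, alpha=0.05):
    """Wald interval p_hat +/- z_{1-alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)       # about 1.96 for alpha = 0.05
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width

lo, hi = wald_interval(54, 100)
```

The Wald interval is the simplest option. For probabilities close to 0 or 1, or for small n, intervals such as Wilson's are usually better behaved.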
50
Uncertain Spatial Data Management: Summary
Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty
Data Cleaning
– "Best guess" answers
– Unreliable results
– Biased results
Paradigm of Equivalent Worlds
– Efficient solution for the most prominent types of spatial queries
– Example: generating functions
Approximations
– Monte-Carlo sampling
– Probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion); semantics and processing
  • Part 2: Range (road-network constraints); semantics and processing
• Uncertainty: the flip side
53
Model of a Trajectory
[Figure: 3D trajectory in (X, Y, Time) space up to the present time, and its 2D route in the X-Y plane]
The approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman. Kinetic space-time prisms. GIS 2011
54
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory
(location, time, velocity vector) updates
now →
55
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze. On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (a.k.a. space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in-between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov. Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li. Uncertain Range Queries for Necklaces. MDM 2010
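Under Model 1, a point (x, t) between two consecutive exact samples is a possible location exactly when two conditions hold: it is reachable from the earlier sample, and the later sample is still reachable from it, both at maximum speed. The sketch below is a membership test for a single bead. It assumes Euclidean free-space motion and a known maximum speed v_max, and the function name is hypothetical. A necklace is then the union of the beads of consecutive sample pairs.

```python
from math import dist

def in_bead(p_i, t_i, p_j, t_j, v_max, x, t):
    """Is location x a possible location at time t between the samples
    (p_i, t_i) and (p_j, t_j), given a maximum speed v_max?"""
    if not t_i <= t <= t_j:
        return False
    reachable_from_start = dist(p_i, x) <= v_max * (t - t_i)
    can_reach_end = dist(x, p_j) <= v_max * (t_j - t)
    return reachable_from_start and can_reach_end

# Samples 10 time units apart, 10 distance units apart, v_max = 1.5.
inside = in_bead((0, 0), 0, (10, 0), 10, 1.5, (5, 3), 5)
outside = in_bead((0, 0), 0, (10, 0), 10, 1.5, (5, 6), 5)
```

In the example, (5, 3) at t = 5 lies about 5.83 units from both samples, within the reachable 7.5, so it is inside the bead. (5, 6) is about 7.81 units away, so it is outside.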
57
Modeling Uncertain Trajectories: Model 2
Constraints
– Motion along a road network
– Uncertainty restricted to road segments
[Figure: positions p1 and p2 at times t1 and t2 on a road network, and a query q at time t]
Bart Kuijpers, Walied Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain. Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider
Syntax
– The impact of uncertainty on the (structure of the) answer
– The impact of constraints, e.g., possible routes to be taken
Processing algorithms
– Impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for Trajectories
Given a collection of moving-object trajectories Tr1, Tr2, …, TrN:
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm−1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) space from T = tb to T = te, with T = t1 marked]
Example: assuming that Trq is the querying trajectory during [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen. Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: the previous example with uncertainty added; times T = tb, tb1, t11, t1 and te marked]
Observations: adding uncertainty to the previous example implies
– Tr1 (blue) is the only trajectory with a non-zero NN probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN probability
– Specifically, at t11 all three trajectories have a non-zero NN probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree; root [TrQ, tb, te]; first-level nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m−1), te, Dm); deeper nodes refine sub-intervals, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]
Root: parameters of the query
– Querying trajectory Trq
– Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q with radii Rmin, Rmax and Rd; possible-location disks of Tr1 to Tr5; for Tr4, Rmin > Rmax]
Trq = 2D point Q; the other trajectories are represented by possible locations bounded by circles
Observation:
– Any object whose minimum distance from Q is > Rmax has a "0" probability of being the NN of Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$P_i(R_d) = \int_{\lVert x - Q \rVert \le R_d} \mathrm{pdf}_i(x)\,dx$
The probability that a given object Trj is the NN of Q is given by
$P_j^{NN} = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{k \ne j} \bigl(1 - P_k(R_d)\bigr)\,dR_d$
i.e., Trj is the NN if it is at distance Rd AND all the others are at distances > Rd, integrated over all possible Rd (NOTE: the upper bound is Rmax…)
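The NN integral can be evaluated numerically once each object's distance distribution is known. To keep the sketch short, it assumes each object's distance from Q is uniformly distributed on an interval [a_i, b_i]. In the 2D setting above, that distance distribution would instead be derived from the possible-location disk. The function name and the midpoint-rule step count are illustrative. Objects whose minimum distance exceeds R_max receive exactly zero, which mirrors the pruning rule.

```python
def nn_probabilities(intervals, steps=20_000):
    """Approximate P(object j is the NN of Q) for every j.

    Illustrative assumption: the distance of object i from Q is uniform on
    intervals[i] = (a_i, b_i) with a_i < b_i. Evaluates
        P_j = integral_0^Rmax p_j(r) * prod_{k != j} (1 - P_k(r)) dr
    with the midpoint rule, where P_k is the distance CDF and p_j its density.
    """
    r_max = min(b for _, b in intervals)     # some object is certainly within r_max

    def cdf(r, a, b):
        return min(max((r - a) / (b - a), 0.0), 1.0)

    h = r_max / steps
    probs = [0.0] * len(intervals)
    for s in range(steps):
        r = (s + 0.5) * h
        for j, (a_j, b_j) in enumerate(intervals):
            if not a_j <= r <= b_j:
                continue                     # density of object j is zero at r
            others = 1.0
            for k, (a_k, b_k) in enumerate(intervals):
                if k != j:
                    others *= 1.0 - cdf(r, a_k, b_k)
            probs[j] += others * h / (b_j - a_j)
    return probs

p = nn_probabilities([(1.0, 2.0), (3.0, 4.0)])   # second object can be pruned
```

Two objects with identical distance distributions each get about 0.5. In the example above, the second object's minimum distance 3 exceeds R_max = 2, so its probability is exactly 0.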
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query location TrQ and possible-location disks of Tr1 to Tr5, with Rmin, Rmax and two points Z1, Z2]
dist(Z1, Z2) < Rmax
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs of the possible locations of Trq, Tr1 and Tr2 over disks of radius r (width 2r)]
Example: assuming a uniform pdf of possible locations; the figure marks two of the points used when evaluating the actual probability of Tr1 being within distance Rd from Trq
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq). Due to the independence of Vi and Vq, the pdf of Viq is the convolution of the pdfs of Vi and −Vq
67
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq), each supported on a disk of width 4r]
Let Viq = Vi + (−Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
68
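The Continuous Aspect of the Probabilistic NN Queries
Let $V^x_{iq} = v^x_i - v^x_q$ and $V^y_{iq} = v^y_i - v^y_q$ denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin as a function of t is
$d_{iq}(t) = \sqrt{(x_{iq} + V^x_{iq}\,t)^2 + (y_{iq} + V^y_{iq}\,t)^2} = \sqrt{A t^2 + B t + C}$, where $(x_{iq}, y_{iq})$ is the expected relative position at t = 0 and $A = (V^x_{iq})^2 + (V^y_{iq})^2$
This is a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)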
69
The Continuous Aspect of the Probabilistic NN Queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: throughout the time interval of interest for the query [tb, te], two distance functions diq(t) and djq(t) can intersect at most twice (NOTE: set diq(t) = djq(t); squaring both sides gives an equation of degree at most 2 in t). Call such pairwise intersections critical time points
[Figure: distance functions of TR1 and TR2 over time (time on the horizontal axis) and their lower envelope LE12, with critical time points t11 and t12 after tb]
Example: lower envelope of two distance trajectories
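The "at most twice" observation follows from squaring: d_i(t)² − d_j(t)² is a polynomial of degree at most 2 in t. The sketch below computes the critical time points inside [tb, te] from the coefficients (A, B, C) of d(t)² = At² + Bt + C. The helper names are hypothetical, and the inputs are the relative start positions and velocities with respect to Trq.

```python
import math

def sq_dist_coeffs(x0, y0, vx, vy):
    """(A, B, C) with d(t)^2 = A t^2 + B t + C for the relative motion
    (x0 + vx t, y0 + vy t) of an object with respect to Trq."""
    return vx * vx + vy * vy, 2 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0

def critical_times(ci, cj, tb, te, eps=1e-12):
    """Times in [tb, te] where d_i(t) = d_j(t). Since both distances are
    non-negative, this is d_i^2 = d_j^2, a polynomial of degree <= 2 in t."""
    a, b, c = (p - q for p, q in zip(ci, cj))
    if abs(a) < eps:
        if abs(b) < eps:
            return []                    # identical or never-meeting functions
        roots = [-c / b]                 # equal relative speeds: linear equation
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted(t for t in set(roots) if tb <= t <= te)

# Object 1 rests at distance 5; object 2 passes through Trq's position at t = 5.
c1 = sq_dist_coeffs(5, 0, 0, 0)
c2 = sq_dist_coeffs(-10, 0, 2, 0)
ts = critical_times(c1, c2, 0, 10)
```

Object 2 is at distance 5 at t = 2.5 and again at t = 7.5, so these are the two critical time points.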
70
The Continuous Aspect of the Probabilistic NN Queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: With a divide-and-conquer approach in the spirit of Merge-Sort
[Figure: LE12 (lower envelope of TR1 and TR2, critical points t11, t12) and LE34 (of TR3 and TR4, critical point t31) over [tb, te] are merged into LE1234, yielding new critical points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
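Here is a compact sketch of the Merge-Sort-style construction. Each function on its own is its own envelope. Two envelopes are merged by walking their common breakpoints and splitting each interval at the pairwise crossings of the two functions that own it. All names are illustrative. The owner lookup is a linear scan for brevity, so a two-pointer walk would be needed to actually reach the stated O(N log N) bound.

```python
import math

def crossings(ci, cj, s, e):
    """Times strictly inside (s, e) where d_i = d_j, with d^2 = A t^2 + B t + C."""
    a, b, c = (x - y for x, y in zip(ci, cj))
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        r = math.sqrt(disc)
        roots = [(-b - r) / (2 * a), (-b + r) / (2 * a)]
    return sorted(t for t in set(roots) if s < t < e)

def lower_envelope(coeffs, tb, te):
    """Lower envelope of d_i(t) = sqrt(A_i t^2 + B_i t + C_i) over [tb, te]
    as a list of (start, end, i) pieces, built divide-and-conquer style."""
    def d(i, t):
        A, B, C = coeffs[i]
        return math.sqrt(max(A * t * t + B * t + C, 0.0))

    def owner(env, t):
        # index of the function forming `env` at time t (linear scan)
        return next(i for a, b, i in env if a <= t < b)

    def merge(e1, e2):
        cuts = sorted({p[0] for p in e1} | {p[0] for p in e2} | {te})
        pieces = []
        for s, e in zip(cuts, cuts[1:]):
            f, g = owner(e1, s), owner(e2, s)
            pts = [s, *crossings(coeffs[f], coeffs[g], s, e), e]
            for a, b in zip(pts, pts[1:]):
                m = (a + b) / 2
                best = f if d(f, m) <= d(g, m) else g
                if pieces and pieces[-1][2] == best:
                    pieces[-1] = (pieces[-1][0], b, best)   # extend previous piece
                else:
                    pieces.append((a, b, best))
        return pieces

    def build(lo, hi):
        if hi - lo == 1:
            return [(tb, te, lo)]
        mid = (lo + hi) // 2
        return merge(build(lo, mid), build(mid, hi))

    return build(0, len(coeffs))

# Static objects at distance 5 and 4, and an object passing by at t = 5.
env = lower_envelope([(0, 0, 25), (4, -40, 100), (0, 0, 16)], 0, 10)
```

The result is [(0, 3.0, 2), (3.0, 7.0, 1), (7.0, 10, 2)]: the static object at distance 4 is closest except while the passing object is within distance 4, between t = 3 and t = 7.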
71
The lower envelope provides a continuous pruning criterion: any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) above the lower envelope has a zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions × time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te] with the lower envelope, the 4r band above it, and the 2nd lower envelope (nodes at Level 2 of the IPAC-NN tree), with times t1, t2, t3]
72
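Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.,
$PP_i(a) = \{P \mid P \text{ connects } p_i \text{ and } p_{i+1} \wedge \mathrm{mintime}(P) \le t_{i+1} - t_i\}$
Example: if the pdf of selecting a possible path is uniform, then $\Pr(P_j) = 1 / |PP_i(a)|$
Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all positions p ∈ Pj that a can reach from pi within the time period t − ti and from which a can still reach pi+1 within ti+1 − t, i.e.,
$PL_j(t) = \{p \in P_j \mid \mathrm{mintime}(p_i, p) \le t - t_i \wedge \mathrm{mintime}(p, p_{i+1}) \le t_{i+1} - t\}$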
73
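Combine the two probabilities…
[Figure: possible paths of object a, and its possible locations when t = 4; segment m-p1 highlighted]
Probability of falling inside segment m-p1 at t = 4: 0.5 · 0.5 = 0.25
Probability of an object a falling inside some segment S at a given time instant t:
$\Pr(a(t) \in S) = \sum_{P \in PP(t)} \Pr(P) \cdot \Pr\bigl(a(t) \in S \mid PL_P(t)\bigr)$, e.g., with a uniform pdf over the possible locations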
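The combination of path probability and possible-location probability can be sketched for the simplest case. The sketch assumes a constant maximum speed v_max along each path, in place of general minimum time costs, and a uniform pdf over the possible-location interval. Each path is given by its probability, its length, and the offsets of segment S along it (None if S is not on the path). All names, and the numbers in the example, are hypothetical; they were chosen only to reproduce a 0.5 · 0.5 = 0.25 case like the one on the slide.

```python
def possible_interval(length, v_max, t_i, t_next, t):
    """Offsets s along a path (0 = p_i, length = p_{i+1}) that the object
    can occupy at time t: reachable from p_i by t, and p_{i+1} still
    reachable by t_next. Returns None if the interval is empty."""
    lo = max(0.0, length - v_max * (t_next - t))
    hi = min(length, v_max * (t - t_i))
    return (lo, hi) if lo <= hi else None

def prob_in_segment(paths, v_max, t_i, t_next, t):
    """Pr(a(t) in S) = sum over paths P of Pr(P) * Pr(a(t) in S | P),
    using a uniform pdf over the possible-location interval on each path."""
    total = 0.0
    for prob_path, length, segment in paths:
        interval = possible_interval(length, v_max, t_i, t_next, t)
        if interval is None or segment is None:
            continue                     # path infeasible at t, or S not on it
        lo, hi = interval
        s0, s1 = segment
        if hi == lo:                     # degenerate interval: a single position
            total += prob_path * (s0 <= lo <= s1)
            continue
        overlap = max(0.0, min(hi, s1) - max(lo, s0))
        total += prob_path * overlap / (hi - lo)
    return total

# Two equally likely paths; S lies on the first one only.
paths = [(0.5, 10.0, (0.0, 4.0)), (0.5, 12.0, None)]
p = prob_in_segment(paths, v_max=2.0, t_i=0.0, t_next=10.0, t=4.0)
```

On the first path, the possible locations at t = 4 span offsets [0, 8], and S covers half of them. The total is therefore 0.5 · 0.5 = 0.25.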
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– The probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1
At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
– Edge hash table: small, in memory
– Movement R-tree: 1D, one for each edge
– Trajectories list
  • Stores samples ordered by time stamp
  • Assigns a pointer to the set of possible paths
  • Earliest-arrival and latest-departure times stored for vertices
  • On disk, retrieved on demand
76
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: road network with vertices A to E around q; the expansion tree "bubbling" along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]
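The network-expansion step behaves like a distance-bounded Dijkstra search from q. The sketch below assumes a plain adjacency-list graph with edge lengths, and its names are hypothetical. It returns the vertices within network distance r together with their parents, i.e. the tree edges. It leaves out edges that are only partially within r, which a real range query would also have to fetch.

```python
import heapq

def expansion_tree(adj, q, r):
    """Bounded Dijkstra from q: {vertex: (network distance, parent)} for all
    vertices closer than r. Tree edges are (parent, vertex).

    adj maps a vertex to a list of (neighbour, edge_length) pairs.
    """
    best = {q: (0.0, None)}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u][0]:
            continue                              # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < r and (v not in best or nd < best[v][0]):
                best[v] = (nd, u)
                heapq.heappush(heap, (nd, v))
    return best

adj = {"q": [("A", 2), ("B", 5)], "A": [("q", 2), ("C", 2)],
       "B": [("q", 5)], "C": [("A", 2), ("D", 3)], "D": [("C", 3)]}
tree = expansion_tree(adj, "q", 6)
```

With r = 6, the tree reaches A, B and C (C via A at distance 4). D, at distance 7, is excluded.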
77
Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty: the Flip Side
› Data reduction
– Why transmit every single location point?
› Bandwidth
› Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink. "Speeding up the Douglas-Peucker Line-simplification Algorithm". SSHD 1997
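The classic Douglas-Peucker simplification referenced here keeps a point only if dropping it would exceed the acceptable error. The following is a minimal recursive sketch for 2D points. It is the basic algorithm, with O(n²) worst-case time, not the cited speed-up, and the function names are illustrative.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    if a == b:
        return math.dist(p, a)
    (x, y), (x1, y1), (x2, y2) = p, a, b
    return abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / math.dist(a, b)

def douglas_peucker(points, eps):
    """Keep both endpoints; if the farthest intermediate point deviates from
    the chord by more than eps, keep it and simplify both halves recursively."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right                 # do not duplicate the split point

track = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 6)]
simplified = douglas_peucker(track, 0.5)
```

With an acceptable error of 0.5, the example keeps (0, 0), (2, 0) and (4, 6). The dropped points deviate by at most 0.1.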
79
Uncertainty: the Other Flip Side
› What are the important properties that should not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long can the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data
– Probabilistic results: yes
– Time dimension: no
vs.
Geometric models for uncertain spatio-temporal data
– Probabilistic results: no
– Time dimension: yes
82
Merge the Approaches
› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parameterized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443
83
A highway example…
[Figure: position (pos) over time (t) with a query window Q]
What is the probability that the car is in some area at least once during some time interval?
84
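A highway example…
[Figure: position (pos) over time (t) with a query window Q; the possible positions are bounded by the minimum and maximum speeds vmin and vmax]
What is the probability that the car is in some area at least once during some time interval?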
85
A highway example…
[Figure: position (pos) over time (t), bounded by vmin and vmax, with a query window Q]
Assuming a uniform distribution…
86
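A highway example…
[Figure: position (pos) over time (t) with query window Q; the car is inside Q with probability 50% at one time instant and 30% at another]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65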
87
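A highway example…
[Figure: position (pos) over time (t) with query window Q; the car is inside Q with probability 50% at one time instant and 30% at another]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
Violation of the speed constraints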
88
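A highway example…
[Figure: position (pos) over time (t) with query window Q; the car is inside Q with probability 50% at one time instant and 30% at another]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
Consideration of dependency: = 0.5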
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
A Simple Example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: interior hole: stay 0.2, move to each neighbour 0.4; border hole: stay 0.4, move inward 0.6]
92
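[Figure: probability of the ball being in each hole after each hit]
A Simple Example
› Initial position: the third hole
› After first hit: 0.4, 0.2, 0.4 (holes 2 to 4)
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (holes 1 to 5)
› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08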
93
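How Can We Model This?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$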
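The board example can be checked numerically. This minimal NumPy sketch assumes the row-vector convention used on the slides, where a distribution times M gives the next distribution. The builder function is hypothetical. Repeated multiplication converges to the stationary distribution (0.08, 0.12, …, 0.12, 0.08), which can be confirmed by detailed balance: 0.08 · 0.6 = 0.12 · 0.4.

```python
import numpy as np

def board_matrix(n=9, stay=0.2, move=0.4, border_stay=0.4, border_move=0.6):
    """Transition matrix of the board example: interior holes keep the ball
    with 0.2 and pass it to each neighbour with 0.4; the two border holes
    keep it with 0.4 and pass it inward with 0.6."""
    M = np.zeros((n, n))
    for i in range(1, n - 1):
        M[i, i - 1], M[i, i], M[i, i + 1] = move, stay, move
    M[0, 0], M[0, 1] = border_stay, border_move
    M[n - 1, n - 1], M[n - 1, n - 2] = border_stay, border_move
    return M

M = board_matrix()
start = np.zeros(9)
start[2] = 1.0                                       # the ball starts in the third hole
after_1 = start @ M                                  # first hit
after_2 = after_1 @ M                                # second hit
after_40 = start @ np.linalg.matrix_power(M, 40)     # 40th hit
limit = start @ np.linalg.matrix_power(M, 1000)      # effectively the stationary distribution
```

Printing after_40 shows how close the 40th hit already is to the limit. The slowest-decaying component still shrinks only by a factor of about 0.94 per hit.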
94
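How Can We Model This?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)\,M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$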
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
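A minimal Monte-Carlo sketch of the sampling idea, applied to the single-observation window query from the earlier slides (start in s1, window {s1, s2} during ticks 2 and 3). It uses plain forward sampling from M, which is only valid when there are no later observations; it does not implement the adapted transition matrices of the cited paper. All names are hypothetical.

```python
import random

# Assumed: M from the window-query slides; states s1, s2, s3 -> indices 0, 1, 2.
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def sample_path(start, steps, rng):
    """Draw one possible trajectory (one state per tick) from the Markov chain."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choices(range(len(M)), weights=M[path[-1]])[0])
    return path

def mc_window_probability(start, window_states, window_times, n, seed=0):
    """Fraction of n sampled trajectories that visit window_states at some window tick."""
    rng = random.Random(seed)
    horizon = max(window_times)
    hits = 0
    for _ in range(n):
        path = sample_path(start, horizon, rng)
        if any(path[t] in window_states for t in window_times):
            hits += 1
    return hits / n

estimate = mc_window_probability(0, {0, 1}, [2, 3], n=20000)
print(estimate)  # close to the exact 0.96, up to sampling error
```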
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443.
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
31
Equivalent Worlds An intuition
q
Observation 1
For any possible world and any possible world derived from it by changing the position of an object, the following equivalence holds:
32
Equivalent Worlds An intuition
q
Observation 1 allows us to discard objects outside of the query region
Remaining number of equivalence classes of possible worlds:
33
Equivalent Worlds An intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
[Figure: query region q with remaining objects C, D, H, I]
Querying Uncertain Spatial Data
34
Equivalent Worlds An intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
Remaining number of equivalence classes of possible worlds:
[Figure: query region q with remaining objects C, D, H, I]
35
Equivalent Worlds An intuition
Observation 3
We only require the number of objects in the query region. Information about the concrete result objects can be discarded
[Figure: query region q with remaining objects C, D, H, I]
36
[Figure: query region q and uncertain objects A, B, C, annotated with probabilities 0.8, 0.2 and 0.4]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
37
[Figure: the same scene, with objects A, B, C each replaced by x]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects, i.e., substitute A, B, C by x
Each monomial $c_k x^k$ implies that the probability of having exactly $k$ results equals $c_k$
40
For each object $o_i$, let $p_i = P(o_i \in q)$. Consider the following generating function [2]:
$$F(x) = \prod_{i} \big( (1 - p_i) + p_i x \big)$$
In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region
Generating Functions Formally
[2] Jian Li Barna Saha and Amol Deshpande A Unified Approach to Ranking in Probabilistic Databases PVLDB 2(1) 502-513 (2009)
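The expansion of F(x) can be computed by multiplying in one linear factor per object. This Python sketch assumes, for illustration only, that the three objects of the running example lie inside q independently with probabilities 0.8, 0.2 and 0.4; `count_distribution` is a hypothetical name.

```python
def count_distribution(probs):
    """Coefficients of prod_i ((1 - p_i) + p_i x); entry k = P(exactly k objects in q)."""
    coeffs = [1.0]                       # the empty product is the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1 - p)        # this object is outside q
            nxt[k + 1] += c * p          # this object is inside q
        coeffs = nxt
    return coeffs

dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])
```

For these assumed values the coefficients are 0.096, 0.472, 0.368 and 0.064 for 0 to 3 objects; the cost is quadratic in the number of objects, i.e., polynomial in |DB|.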
41
Example
[Figure: query region q and uncertain objects A, B, C, annotated with probabilities 0.8, 0.2 and 0.4]
Count Queries on Uncertain Data
45
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I A traditional query on certain data can be answered in polynomial time
II We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results Sampling
Materialize a set S of possible worlds
Samples are drawn independently and without bias
Evaluate the query predicate on each world
The distribution of sampled results is an unbiased approximation of the true distribution of results
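Sampling possible worlds for the count query can be sketched as follows. It assumes the same independent per-object probabilities (0.8, 0.2, 0.4) as in the running example; `sample_count_distribution` and the seeds are illustrative.

```python
import random
from collections import Counter

def sample_count_distribution(probs, n_worlds, seed=0):
    """Estimate P(exactly k objects in q) from n_worlds independently sampled worlds."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_worlds):
        # materialize one possible world: each object is inside q with its own probability
        k = sum(1 for p in probs if rng.random() < p)
        counts[k] += 1
    return {k: counts[k] / n_worlds for k in range(len(probs) + 1)}

estimate = sample_count_distribution([0.8, 0.2, 0.4], n_worlds=100)
print(estimate)
```

With only 100 worlds the estimates can deviate noticeably from the exact values; the next slides address how to quantify that.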
48
[Figure: query region q and uncertain objects A, B, C, annotated with probabilities 0.8, 0.2 and 0.4]
Sampling Example
Drawing 100 possible worlds may yield the following estimators:
Compare to the exact probabilities:
No indication of the reliability or confidence of the estimations
49
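[Figure: query region q and uncertain objects A, B, C, annotated with probabilities 0.8, 0.2 and 0.4]
Sampling Confidences
Drawing 100 possible worlds may yield the following estimators:
Use statistical methods to assess the quality of the estimators
E.g., Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution
At a significance level of $\alpha$, the confidence interval for the true probability is [0.442, 0.638]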
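The Wald interval can be computed with the Python standard library. The slide does not state the raw numbers; 54 successes out of n = 100 sampled worlds and α = 0.05 are assumed here only because they reproduce the interval [0.442, 0.638].

```python
import math
from statistics import NormalDist

def wald_interval(successes, n, alpha=0.05):
    """Wald confidence interval p_hat +/- z_{1-alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)   # (1 - alpha/2) percentile of N(0, 1)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lo, hi = wald_interval(54, 100)
print(round(lo, 3), round(hi, 3))
```

The Wald interval is known to be unreliable for very small or very large p̂ or small n; it is used here only because the slide names it.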
50
Uncertain Spatial Data Management Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data Cleaning: "best guess" answers; unreliable results; biased results
Paradigm of Equivalent Worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location Uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing
• Uncertainty – the flip-side
53
Model of a trajectory
[Figure: a 2D route in the (X, Y) plane and the corresponding 3D trajectory in (X, Y, Time), up to the present time]
Approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
54
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
(location, time, velocity vector) updates
now →
55
Modeling Uncertainty: Location Uncertainty Models
Imprecision from various sources (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in-between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
57
Modeling Uncertain Trajectories: Model 2
Constraints
– Motion along the road network
– Uncertainty restricted to road segments
[Figure: road network with samples p1, p2, a location q, and time instants t1, t2, t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3: Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
– The impact of uncertainty on the (structure of the) answer
– The impact of constraints, e.g., possible routes to be taken
Processing algorithms
– Impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq, between [tb, te]
A-nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), ..., (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) between T = tb and T = te, with T = t1 marked]
Given a collection of moving objects' trajectories Tr1, Tr2, ..., TrN
Example: assuming that Trq is the querying trajectory between [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, ..., TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq, between [tb, te]
[Figure: uncertain trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T), with time instants tb, tb1, t11, t1 and te marked]
Observations: adding uncertainty to the previous example implies
– Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN-probability
– Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to a Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree. Root [TrQ, tb, te]; level-1 nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), ..., (Trim, t(m-1), te, Dm); children refine their parent's interval, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), ..., and (Tri121, t11, t121, D121) below (Tri12, t11, t12, D12)]
Root: parameters of the query
– Querying trajectory Trq
– Time-interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and the uncertainty disks of Tr1 to Tr5, with distances Rmin, Rmax and Rd]
Trq = 2D point Q. The rest are possible locations bounded by circles
Observation
– Any object whose minimum distance from Q is greater than Rmax has a "0" probability of being a NN to Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \int_{\|x - Q\| \le R_d} pdf_i(x)\, dx$$
The probability that a given object Trj is the NN of Q is given by
$$P_j^{NN} = \int_{R_{min}}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \ne j} \big(1 - P_i(R_d)\big)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances greater than Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax...)
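The NN integral for a crisp Q can be evaluated numerically. This sketch assumes each object's location is uniform on a disk (center at distance d from Q, radius r), computes P_i(Rd) from the circle-circle intersection area, and integrates with the midpoint rule from 0 to Rmax (the integrand vanishes below an object's minimum distance, so starting at 0 instead of Rmin changes nothing). Function names and the step count are illustrative.

```python
import math

def _clamp(x):
    return max(-1.0, min(1.0, x))

def within_prob(d, r, R):
    """P(a point uniform on a disk of radius r, centered at distance d from Q,
    lies within distance R of Q) = intersection area / disk area."""
    if R <= 0.0:
        return 0.0
    if R >= d + r:                  # object disk lies inside the query circle
        return 1.0
    if R <= d - r:                  # disjoint
        return 0.0
    if R + d <= r:                  # query circle lies inside the object disk
        return (R * R) / (r * r)
    a1 = R * R * math.acos(_clamp((d * d + R * R - r * r) / (2 * d * R)))
    a2 = r * r * math.acos(_clamp((d * d + r * r - R * R) / (2 * d * r)))
    a3 = 0.5 * math.sqrt(max(0.0, (-d + R + r) * (d + R - r) * (d - R + r) * (d + R + r)))
    return (a1 + a2 - a3) / (math.pi * r * r)

def nn_probabilities(objects, steps=4000):
    """objects: list of (d, r). Midpoint rule for
    P_j = integral of dP_j(Rd) * prod_{i != j} (1 - P_i(Rd)) over [0, Rmax]."""
    r_max = min(d + r for d, r in objects)   # by Rmax some object is surely within reach
    h = r_max / steps
    result = [0.0] * len(objects)
    for k in range(steps):
        lo, hi, mid = k * h, (k + 1) * h, (k + 0.5) * h
        cdf_mid = [within_prob(d, r, mid) for d, r in objects]
        for j, (dj, rj) in enumerate(objects):
            dp = within_prob(dj, rj, hi) - within_prob(dj, rj, lo)
            if dp == 0.0:
                continue
            others = 1.0
            for i, c in enumerate(cdf_mid):
                if i != j:
                    others *= 1.0 - c
            result[j] += dp * others
    return result

probs = nn_probabilities([(5.0, 1.0), (5.0, 1.0), (20.0, 1.0)])
print([round(p, 3) for p in probs])
```

In this made-up configuration the third object starts beyond Rmax and gets probability exactly 0, matching the pruning observation, while the two symmetric objects share the remaining mass.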
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable... Example:
[Figure: uncertain TrQ and the uncertainty regions of Tr1 to Tr5 with Rmin and Rmax; two points Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within_distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)...
66
[Figure: uniform pdfs of the possible locations of Trq, Tr1 and Tr2 in the (x, y) plane, each over a disk of radius r]
Example: assuming a uniform pdf of possible locations; the figure shows two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$P(\|V_i - V_q\| \le R_d)$$
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq), which, due to independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and −Vq
67
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq) in the (x, y) plane]
Let Viq = Vi + (−Vq)
Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi), AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
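Let Vxiq = vxi − vxq and Vyiq = vyi − vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = V_{xiq}^2 + V_{yiq}^2, \quad B = 2\,(x_{iq} V_{xiq} + y_{iq} V_{yiq}), \quad C = x_{iq}^2 + y_{iq}^2$$
where (x_iq, y_iq) is the expected relative location at t = 0; this is a hyperbola since A > 0
Now we have a collection of such distance functions (one for each i)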
69
The continuous aspect of the probabilistic NN queries
Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), ..., dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pair-wise intersections critical time-points
[Figure: Example. Lower envelope LE12 of two distance-trajectories TR1 and TR2 (dist vs. time, time on the horizontal axis), with critical time-points t11 and t12 after tb]
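Why at most two intersections: since distances are non-negative, d_iq(t) = d_jq(t) is equivalent to equal squared distances, which is a quadratic equation in t. A minimal sketch, assuming linear motion with constant relative velocity; the symbols (x0, y0) and the functions `sq_dist_coeffs` and `critical_time_points` are hypothetical.

```python
import math

def sq_dist_coeffs(x0, y0, vx, vy):
    """(A, B, C) with d(t)^2 = A t^2 + B t + C for relative position (x0 + vx t, y0 + vy t)."""
    return vx * vx + vy * vy, 2 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0

def dist_at(coeffs, t):
    a, b, c = coeffs
    return math.sqrt(max(0.0, a * t * t + b * t + c))

def critical_time_points(ci, cj, tb, te, eps=1e-12):
    """Times in [tb, te] where d_i(t) = d_j(t): roots of a quadratic, so at most two."""
    a, b, c = ci[0] - cj[0], ci[1] - cj[1], ci[2] - cj[2]
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]   # identical/parallel: no isolated crossing
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted({t for t in roots if tb <= t <= te})

static = sq_dist_coeffs(3.0, 0.0, 0.0, 0.0)    # constant distance 3
moving = sq_dist_coeffs(-5.0, 0.0, 1.0, 0.0)   # distance |t - 5|
print(critical_time_points(static, moving, 0.0, 10.0))
```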
70
The continuous aspect of the probabilistic NN queries
Q: How can we efficiently construct the lower envelope of the whole collection of distance functions?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: lower envelope LE12 (of TR1, TR2, with critical time-points t11, t12) and LE34 (of TR3, TR4, with critical time-point t31) over [tb, te] are merged into LE1234, creating new critical time-points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
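A divide-and-conquer lower envelope can be sketched in Python as below. Each envelope is a list of (start, end, index) pieces; a merge cuts the time axis at both envelopes' breakpoints and at the crossings of the two active functions, then keeps the lower one on each piece. This illustrative version uses a linear scan in `owner`, so it does not attain the O(N log N) bound; all names are hypothetical, and functions are given as squared-distance coefficients (A, B, C).

```python
import math

def dist_at(coeffs, t):
    a, b, c = coeffs
    return math.sqrt(max(0.0, a * t * t + b * t + c))

def crossings(ci, cj, lo, hi, eps=1e-12):
    """Times in [lo, hi] where two distance functions meet (at most two)."""
    a, b, c = ci[0] - cj[0], ci[1] - cj[1], ci[2] - cj[2]
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        roots = [] if disc < 0 else [(-b - math.sqrt(disc)) / (2 * a),
                                     (-b + math.sqrt(disc)) / (2 * a)]
    return sorted(t for t in roots if lo <= t <= hi)

def owner(env, t):
    """Index of the function forming envelope `env` at time t."""
    for s, e, idx in env:
        if s <= t <= e:
            return idx
    raise ValueError("t outside the envelope's time range")

def merge(funcs, e1, e2):
    cuts = sorted({t for s, e, _ in e1 + e2 for t in (s, e)})
    pieces = []
    for s, e in zip(cuts, cuts[1:]):
        f, g = owner(e1, (s + e) / 2), owner(e2, (s + e) / 2)
        pts = [s] + crossings(funcs[f], funcs[g], s, e) + [e]
        for a, b in zip(pts, pts[1:]):
            if b - a <= 1e-12:
                continue                          # skip zero-length pieces
            m = (a + b) / 2                       # no crossing inside (a, b)
            best = f if dist_at(funcs[f], m) <= dist_at(funcs[g], m) else g
            if pieces and pieces[-1][2] == best:
                pieces[-1] = (pieces[-1][0], b, best)   # extend the current piece
            else:
                pieces.append((a, b, best))
    return pieces

def lower_envelope(funcs, tb, te):
    def build(lo, hi):
        if hi - lo == 1:
            return [(tb, te, lo)]
        mid = (lo + hi) // 2
        return merge(funcs, build(lo, mid), build(mid, hi))
    return build(0, len(funcs))

funcs = [(0.0, 0.0, 9.0), (1.0, -10.0, 25.0), (0.0, 0.0, 16.0)]   # 3, |t - 5|, 4
print(lower_envelope(funcs, 0.0, 10.0))
```

Removing the envelope's functions and running it again on the leftovers would give the next-ranked envelope, in the spirit of the IPAC-NN construction discussed next.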
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te] with time-points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree), and the 4r pruning band]
72
Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti
Example: if the pdf of selecting a possible path is uniform, each path in PPi(a) is selected with probability 1/|PPi(a)|
Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t − ti (respectively ti+1 − t)
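The set of possible paths can be enumerated by a depth-first search that abandons a branch as soon as its minimum travel time exceeds t_{i+1} − t_i. The graph, vertex names and travel times below are made up for illustration; with a uniform pdf, each surviving path gets probability 1/|PPi(a)|.

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths src -> dst whose minimum travel time is <= budget.
    graph: {vertex: {neighbor: minimum travel time of that edge}}."""
    paths = []

    def dfs(v, path, cost):
        if v == dst:
            paths.append((list(path), cost))
            return
        for w, c in graph.get(v, {}).items():
            # prune as soon as the minimum time already exceeds the budget
            if w not in path and cost + c <= budget:
                path.append(w)
                dfs(w, path, cost + c)
                path.pop()

    dfs(src, [src], 0.0)
    return paths

# Hypothetical road network: samples p1 (at t_i) and p2 (at t_i+1), with t_i+1 - t_i = 6
G = {"p1": {"v1": 1.0}, "v1": {"v5": 3.0, "v3": 2.0}, "v3": {"v4": 2.0},
     "v4": {"p2": 1.0}, "v5": {"p2": 2.0, "v6": 1.0}, "v6": {"p2": 4.0}}
paths = possible_paths(G, "p1", "p2", budget=6.0)
path_prob = {tuple(p): 1.0 / len(paths) for p, _ in paths}   # uniform pdf over PPi(a)
print(path_prob)
```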
73
Combine the two probabilities...
[Figure: possible paths and possible locations of object a when t = 4, with a point m on the network]
Probability of falling inside segment mp1: 0.5 × 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\big(S \cap PL_p^a(t)\big)$$
e.g., with a uniform pdf
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D for each edge
– Trajectories list
  › Stores samples ordered by time-stamp
  › Assigns a pointer to the set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieved on demand
76
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree rooted at q "bubbling" along road-network segments A, B, C, D, E, annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
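The network-expansion step can be sketched as a bounded Dijkstra search from q. This is a simplification under stated assumptions: it keeps the vertices (and their tree edges) whose network distance is strictly below r and does not clip partially covered edges at the boundary; the graph format and names are hypothetical.

```python
import heapq

def expansion_tree(graph, q, r):
    """Bounded Dijkstra from q over {u: {v: length}}.
    Returns (dist, parent) for vertices with network distance < r; the tree
    edges are (parent[v], v)."""
    dist = {q: 0.0}
    parent = {q: None}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, parent

# Hypothetical undirected road graph (each edge listed in both directions)
G = {"q": {"A": 2.0, "C": 4.0}, "A": {"q": 2.0, "B": 3.0}, "B": {"A": 3.0},
     "C": {"q": 4.0, "D": 5.0}, "D": {"C": 5.0}}
dist, parent = expansion_tree(G, "q", r=6.0)
print(dist)
```

The filtering step would then only consult the movement R-trees of the edges (parent[v], v) in this tree.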
77
Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty – the flip-side
› Data reduction
– Why transmit every single location point?
› Bandwidth
› Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
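As an illustration of the data-reduction idea, here is the classic recursive Douglas-Peucker simplification in Python. It is not the accelerated variant of the cited paper, and it is purely spatial: it ignores timestamps, which a trajectory-aware reduction would have to respect. Names are illustrative.

```python
import math

def _point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))               # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, epsilon):
    """Keep the endpoints; recurse on the farthest point if it deviates more than epsilon."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right                # avoid duplicating the split point

print(douglas_peucker([(0, 0), (1, 0), (2, 2), (3, 0), (4, 0)], epsilon=1.0))
```

The epsilon parameter plays the role of the "acceptable error" on the slide.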
79
Uncertainty – the other flip-side
› What are the important properties that should not be distorted by reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just a "Z" axis... one cannot travel back in time
– How long may the data at the sink be allowed to be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far...
Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension
vs.
Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension
82
Merge the approaches
› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parameterized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar, "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443
83
A highway example...
[Figure: position (pos) of a car on a highway over time (t), bounded by speeds vmin and vmax, with a query region Q]
What is the probability that the car is in some area at least once during some time?
Assuming a uniform distribution...: probabilities 50% and 30% of being in Q at the two considered time instants
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
– Violation of the speed constraints
Consideration of dependency: = 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: interior hole: move left 0.4, stay 0.2, move right 0.4; border hole: stay 0.4, move inward 0.6]
92
A simple example
› Initial position: (0, 0, 1, 0, 0, 0, 0, 0, 0)
› After first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After 40th hit: ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
93
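How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket; columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$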
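The ball example can be reproduced in a few lines of Python. `build_matrix` is a hypothetical helper that fills the tridiagonal matrix M shown above; the sketch also checks that (0.08, 0.12, ..., 0.12, 0.08) is a stationary distribution of M (πM = π), which is what the distribution after many hits approaches.

```python
def build_matrix(n=9, move=0.4, stay=0.2, border_stay=0.4, border_move=0.6):
    """Tridiagonal transition matrix: rows = from bucket, columns = to bucket."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][i], M[i][i + 1] = border_stay, border_move
        elif i == n - 1:
            M[i][i], M[i][i - 1] = border_stay, border_move
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = move, stay, move
    return M

def step(v, M):
    """One hit: row vector times matrix, v' = v M."""
    n = len(v)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]

M = build_matrix()
v0 = [0, 0, 1, 0, 0, 0, 0, 0, 0]          # ball starts in bucket 3
v1 = step(v0, M)                          # after the first hit
v2 = step(v1, M)                          # after the second hit
v40 = v0
for _ in range(40):
    v40 = step(v40, M)

pi = [0.08] + [0.12] * 7 + [0.08]          # candidate stationary distribution
print([round(x, 2) for x in v2])
```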
94
How can we model this?
› First hit:
$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$
› Second hit:
$$(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$
› 40th hit:
$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
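Learning the transition probabilities from historical data can be sketched as counting observed state-to-state moves per tick and normalizing each row, which is the maximum-likelihood estimate for a Markov chain. The nested-dict "sparse matrix" and the sample trajectories are illustrative; there is no smoothing and no time- or group-dependent model.

```python
from collections import defaultdict

def estimate_transitions(trajectories):
    """MLE from observed state sequences (one state per tick).
    Returns a sparse matrix {from_state: {to_state: probability}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in trajectories:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: c / total for b, c in row.items()}   # normalize each row
    return matrix

history = [["s1", "s3", "s2", "s1"], ["s2", "s3", "s3"]]      # made-up observations
print(estimate_transitions(history))
```

Only transitions that actually occur are stored, which keeps the matrix sparse for road networks.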
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: transition graph over the states s1, s2, s3, labeled with the transition probabilities of M]
Note: We have an exponential number of possible paths the car might take
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
Starting in s1 at t = 0, the state distribution evolves as:
t = 0: (1, 0, 0)
t = 1: (1, 0, 0) · M = (0, 0, 1)
t = 2: (0, 0, 1) · M = (0, 0.8, 0.2)
t = 3: (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36)
Worlds in s1 or s2 at t = 2 (probability 0.8) hit the window. The remaining 0.2 in s3 moves to (0, 0.16, 0.04) at t = 3, so another 0.16 hits.
Result = 0.8 + 0.16 = 0.96
› Solution based on matrix multiplications: introduce a new state for the winner trajectories, and two matrices
103
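ST - Window Queries
States s1, s2, s3, plus a new absorbing state s4 for the trajectories that have already hit the window; M− is used for ticks outside T and M+ for ticks inside T:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$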
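The two-matrix scheme can be checked numerically: M− is applied for transitions into ticks outside T and M+ for transitions into ticks inside T, with s4 as the absorbing hit state. A minimal pure-Python sketch with hypothetical names:

```python
# Assumed: states s1, s2, s3 plus absorbing s4 ("has hit the window"), matrices from the slide.
M_MINUS = [[0.0, 0.0, 1.0, 0.0],
           [0.6, 0.0, 0.4, 0.0],
           [0.0, 0.8, 0.2, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
M_PLUS = [[0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.4, 0.6],   # s2 -> s1 redirected to s4
          [0.0, 0.0, 0.2, 0.8],   # s3 -> s2 redirected to s4
          [0.0, 0.0, 0.0, 1.0]]

def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def window_probability(start, window_times, horizon):
    """Return P(hit) = mass in s4 after `horizon` ticks, and all intermediate vectors."""
    v, history = start, [start]
    for t in range(1, horizon + 1):
        v = vec_mat(v, M_PLUS if t in window_times else M_MINUS)
        history.append(v)
    return v[3], history

p_hit, history = window_probability([1.0, 0.0, 0.0, 0.0], {2, 3}, horizon=3)
print(round(p_hit, 2))
```

Each step is a single (sparse) vector-matrix product, so the cost grows linearly with the number of ticks instead of with the exponential number of paths.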
104
Multiple Observations
› So far we had only one observation, from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time; left: one observation at t0; right: two observations at t0 and t1]
105
Multiple Observations
› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has already intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
32
Equivalent Worlds An intuition
q
Observation 1 allows to discard objects outside ofacute the query region
Remaining number of equivalent classes of possible worlds
Equivalent Worlds: An Intuition
Observation 2: For each remaining object, we only need to consider the predicate "inside".
[Figure: query region q and the remaining objects C, D, H, I]
Querying Uncertain Spatial Data
Remaining number of equivalence classes of possible worlds
Equivalent Worlds: An Intuition
Observation 3: We only require the number of objects in the query region; information about the concrete result objects can be discarded.
Generating Functions
Main idea: Use polynomial multiplication to enumerate possible results.
[Figure: query region q and objects A, B, C annotated with probabilities 0.8, 0.2 and 0.4]
Observation 3: Anonymize objects, i.e., substitute A, B and C by x.
Each monomial $c_k x^k$ implies that the probability of having exactly $k$ results equals $c_k$.
Generating Functions: Formally
For each object $o_i$, let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:
$$\mathcal{F}(x) = \prod_{i} \left( (1 - p_i) + p_i \, x \right)$$
In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.
[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009).
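A minimal Python sketch of the generating-function approach, assuming independent objects and treating the figure's values 0.8, 0.2 and 0.4 as the probabilities that A, B and C lie inside q (an assumption; the slide's own expansion is not legible). The function name `count_distribution` is hypothetical. Multiplying in one linear factor per object keeps the work polynomial (O(n²) for n objects) instead of enumerating all 2^n possible worlds.

```python
def count_distribution(probs):
    """Expand F(x) = prod_i ((1 - p_i) + p_i * x) for independent objects.

    coeffs[k] is the coefficient of x^k, i.e. the probability that exactly
    k objects lie inside the query region.
    """
    coeffs = [1.0]  # the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object outside q: degree unchanged
            nxt[k + 1] += c * p      # object inside q: multiply by x
        coeffs = nxt
    return coeffs


dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])  # [0.096, 0.472, 0.368, 0.064]
```

Because the objects are anonymized to x, the result does not depend on which object carries which probability.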
Count Queries on Uncertain Data
Example
[Figure: query region q and objects A, B, C annotated with probabilities 0.8, 0.2 and 0.4]
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time.
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.
III. The probability of a class can be computed in polynomial time.
Approximated Results: Sampling
Materialize a set S of possible worlds.
Samples are drawn independently and without bias.
Evaluate the query predicate on each world.
The distribution of sampled results is an unbiased approximation of the true distribution of results.
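A Monte-Carlo sketch of this procedure for the count query, again assuming independent objects with the hypothetical inside-probabilities 0.8, 0.2 and 0.4. Each loop iteration materializes one possible world; the relative frequency of each count is the estimator. `sample_count_distribution` is a hypothetical name.

```python
import random
from collections import Counter


def sample_count_distribution(probs, n_worlds, seed=0):
    """Estimate P(count = k) from n_worlds independently sampled possible worlds."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_worlds):
        # Materialize one world: object i is inside q with probability probs[i].
        k = sum(1 for p in probs if rng.random() < p)
        counts[k] += 1  # evaluate the count predicate on this world
    return {k: counts[k] / n_worlds for k in range(len(probs) + 1)}


print(sample_count_distribution([0.8, 0.2, 0.4], n_worlds=100))
```

With only 100 worlds the estimates vary noticeably from seed to seed, which is why the estimators need some measure of confidence.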
Sampling Example
[Figure: query region q and objects A, B, C annotated with probabilities 0.8, 0.2 and 0.4]
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability of, or confidence in, the estimates.
Sampling Confidences
Use statistical methods to assess the quality of the estimators, e.g., the Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1-\hat{p})}{|S|}}$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution and $|S|$ is the number of sampled worlds.
At a significance level of $\alpha = 0.05$, the true probability lies in the interval [0.442, 0.638].
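The interval can be checked with Python's standard-library `statistics.NormalDist`. The slide's point estimate is not legible; assuming an estimate of 0.54 from |S| = 100 worlds reproduces the stated interval [0.442, 0.638] at α = 0.05, so the example uses those values. `wald_interval` is a hypothetical helper.

```python
from statistics import NormalDist


def wald_interval(successes, n, alpha=0.05):
    """Wald interval p_hat +/- z_{1-alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2) percentile of N(0, 1)
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width


lo, hi = wald_interval(54, 100)
print(round(lo, 3), round(hi, 3))  # 0.442 0.638
```

The Wald interval collapses to a single point when the estimate is 0 or 1 (for example, when no sampled world satisfies the predicate), so for rare events and few samples a Wilson or Clopper-Pearson interval is often preferred.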
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
• Models
  • Motion
  • Location Uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing
• Uncertainty: the flip-side
Model of a Trajectory
[Figure: 3d-trajectory in (X, Y, Time) space up to the present time, and its projection onto the (X, Y) plane as a 2d-route]
The approximation does not capture acceleration/deceleration.
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes.
Bart Kuijpers, Harvey J. Miller, Walied Othman. Kinetic space-time prisms. GIS 2011.
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
(location, time, velocity vector) updates
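A sketch of how the first and third mobility models yield a location for any time instant: between (location, time) updates the position is linearly interpolated (which is why acceleration/deceleration is not captured), and after the last update a reported velocity vector is extrapolated. The piecewise-linear assumption, the strictly increasing time stamps, and the name `location_at` are illustrative, not taken from the tutorial.

```python
from bisect import bisect_right


def location_at(samples, t, velocity=None):
    """Expected (x, y) at time t from time-ordered (x, y, t) samples.

    Between samples: linear interpolation (constant speed per segment).
    After the last sample: linear extrapolation with velocity (vx, vy).
    Assumes strictly increasing time stamps.
    """
    times = [s[2] for s in samples]
    if t < times[0]:
        raise ValueError("t precedes the first sample")
    i = bisect_right(times, t) - 1
    if i == len(samples) - 1:
        x, y, t_last = samples[-1]
        if t == t_last:
            return (x, y)
        if velocity is None:
            raise ValueError("no velocity vector to extrapolate beyond the last update")
        vx, vy = velocity
        return (x + vx * (t - t_last), y + vy * (t - t_last))
    (x0, y0, t0), (x1, y1, t1) = samples[i], samples[i + 1]
    f = (t - t0) / (t1 - t0)
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))


route = [(0, 0, 0), (10, 0, 10), (10, 10, 20)]
print(location_at(route, 15), location_at(route, 25, velocity=(1, 0)))
```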
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze. On a Generic Uncertainty Model for Position Information. QuaCon 2009.
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumed exact location samples
  • Unknown (but bounded) maximum speed in-between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov. Querying Imprecise Data in Moving Object Environments. ICDE 2003.
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li. Uncertain Range Queries for Necklaces. MDM 2010.
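Under Model 1, a point is a possible location at time t iff it can be reached from the previous sample and can still reach the next one at the maximum speed; the set of such points is one bead, and the chain of consecutive beads is the necklace. A minimal membership test, assuming Euclidean (free-space) motion and a known `vmax`; the helper names are hypothetical.

```python
import math


def in_bead(p1, t1, p2, t2, vmax, x, t):
    """True if location x at time t lies in the bead between samples (p1, t1) and (p2, t2)."""
    if not t1 <= t <= t2:
        return False
    reachable_from_p1 = math.dist(p1, x) <= vmax * (t - t1)
    can_reach_p2 = math.dist(x, p2) <= vmax * (t2 - t)
    return reachable_from_p1 and can_reach_p2


def in_necklace(samples, vmax, x, t):
    """samples: time-ordered list of ((x, y), t) exact location samples."""
    return any(in_bead(p1, t1, p2, t2, vmax, x, t)
               for (p1, t1), (p2, t2) in zip(samples, samples[1:]))


print(in_bead((0, 0), 0, (10, 0), 10, 1, (5, 3), 5))  # False: too far at speed 1
```

Projected onto the plane, one bead is the ellipse with foci p1 and p2 whose points satisfy dist(p1, x) + dist(x, p2) <= vmax · (t2 - t1).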
Modeling Uncertain Trajectories: Model 2
Constraints: motion along the road network; uncertainty restricted to road segments
[Figure: positions p1 and p2 at times t1 and t2 on the road network, and a query q at time t]
Bart Kuijpers, Walied Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009).
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain. Managing Uncertainty in Moving Objects Databases. ACM TODS 2004.
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g., possible routes to be taken
Processing Algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011.
Continuous NN for Trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq, between [tb, te].
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) space at times T = tb, T = t1 and T = te]
Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]
Observation: The answer to the query is time-parameterized.
Yufei Tao, Dimitris Papadias, Qiongmao Shen. Continuous Nearest Neighbor Search. VLDB 2002.
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq, between [tb, te].
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) space at times T = tb, T = tb1, T = t1, T = t11 and T = te]
Observations: adding uncertainty to the previous example implies that
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
Structure of the Answer to a Probabilistic NN-Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: example IPAC-NN tree with root [TrQ, tb, te]; children (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); deeper nodes such as (Tri11, tb, t11, D11) and (Tri12, t11, t12, D12) over sub-intervals]
Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q, uncertainty disks of Tr1 to Tr5, radii Rmin and Rmax, and a distance Rd]
Trq = 2D point Q; the rest are possible locations bounded by circles.
Observation: Any object whose minimum distance from Q is greater than Rmax has a "0" probability of being a NN to Q. Example: Tr4 cannot have a non-zero probability of being Q's NN.
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \iint_{\{(x,y) \,:\, \lVert (x,y) - Q \rVert \le R_d\}} pdf_i(x, y)\, dx\, dy$$
The probability that a given object Trj is the NN of Q is given by
$$P^{NN}_j = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{k \ne j} \big(1 - P_k(R_d)\big)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…).
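The two integrals can be evaluated numerically. The sketch below assumes, as in the figure, a crisp query point Q and objects whose possible locations are uniformly distributed in disks; it takes Rmax to be the smallest maximum distance from Q over all objects (beyond which some object is certainly closer), discretizes Rd, and uses the disk-disk intersection area for the inner double integral. Names such as `nn_probabilities` are hypothetical and the step count is arbitrary.

```python
import math


def lens_area(d, r1, r2):
    """Area of the intersection of two disks with radii r1, r2 and centre distance d."""
    if d >= r1 + r2:
        return 0.0
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2
    c1 = max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    c2 = max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    k = math.sqrt(max(0.0, (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)))
    return r1 * r1 * math.acos(c1) + r2 * r2 * math.acos(c2) - 0.5 * k


def within(q, obj, rd):
    """P_i(Rd) for an object uniformly distributed in the disk obj = (cx, cy, rho)."""
    cx, cy, rho = obj
    d = math.hypot(cx - q[0], cy - q[1])
    return lens_area(d, rd, rho) / (math.pi * rho * rho)


def nn_probabilities(q, objects, steps=4000):
    """Approximate P_j^NN = int_0^Rmax dP_j(Rd) * prod_{k != j} (1 - P_k(Rd))."""
    r_max = min(math.hypot(cx - q[0], cy - q[1]) + rho for cx, cy, rho in objects)
    h = r_max / steps
    probs = [0.0] * len(objects)
    for s in range(steps):
        r0, r1 = s * h, (s + 1) * h
        rm = 0.5 * (r0 + r1)
        for j, oj in enumerate(objects):
            dp = within(q, oj, r1) - within(q, oj, r0)  # Trj at a distance in (r0, r1]
            if dp == 0.0:
                continue
            others = 1.0
            for k, ok in enumerate(objects):
                if k != j:
                    others *= 1.0 - within(q, ok, rm)  # all others further away
            probs[j] += dp * others
    return probs


print(nn_probabilities((0.0, 0.0), [(3.0, 0.0, 1.0), (-3.0, 0.0, 1.0), (10.0, 0.0, 1.0)]))
```

Here Rmax = 4 and the third disk starts at distance 9, so, in line with the pruning observation, its NN-probability is exactly 0, while the two symmetric objects each get about 0.5.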
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertainty disks of TrQ and of Tr1 to Tr5 with radii Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq.
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
[Figure: uniform pdfs of possible locations, pdf(Trq), pdf(Tr1) and pdf(Tr2), over the (x, y) plane]
Example: assuming a uniform pdf of possible locations; the figure marks two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.
Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$P\big(\lVert V_i - V_q \rVert \le R_d\big)$$
Define Viq = Vi - Vq (cross-correlation) as another random variable, Viq = Vi + (-Vq), which, due to independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and -Vq.
[Figure: the convolved pdfs pdf(Tr1 - Trq) and pdf(Tr2 - Trq)]
Let Viq = Vi + (-Vq).
Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq).
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
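A discretized sketch of the main observation: if the possible locations are given as pmfs over an integer grid (an approximation of the continuous pdfs on the slides), the pmf of Viq = Vi + (-Vq) is the convolution of pmf(Vi) with pmf(-Vq), assuming independence. The example checks Property 1 (the centroid of the result is Ci - Cq) and the point symmetry around Ciq that Property 2 implies; full rotational symmetry cannot be represented exactly on a grid. All names are hypothetical.

```python
import math
from collections import defaultdict


def disk_pmf(cx, cy, r):
    """Uniform pmf over the integer grid points inside a disk (stand-in for a uniform pdf)."""
    points = [(cx + dx, cy + dy)
              for dx in range(-r, r + 1)
              for dy in range(-r, r + 1)
              if dx * dx + dy * dy <= r * r]
    return {p: 1.0 / len(points) for p in points}


def difference_pmf(pmf_i, pmf_q):
    """pmf of V_iq = V_i + (-V_q): convolution of pmf(V_i) and pmf(-V_q)."""
    out = defaultdict(float)
    for (xi, yi), pi in pmf_i.items():
        for (xq, yq), pq in pmf_q.items():
            out[(xi - xq, yi - yq)] += pi * pq  # independence: joint = product
    return dict(out)


def prob_within(pmf_iq, rd):
    """P(||V_i - V_q|| <= rd), read off the convolved pmf."""
    return sum(p for (x, y), p in pmf_iq.items() if math.hypot(x, y) <= rd)


v_iq = difference_pmf(disk_pmf(10, 0, 3), disk_pmf(0, 0, 3))
centroid = (sum(x * p for (x, y), p in v_iq.items()),
            sum(y * p for (x, y), p in v_iq.items()))
print(centroid)  # close to (10, 0) = C_i - C_q (Property 1)
print(prob_within(v_iq, 8.0))
```

Once the single pdf of Viq is available, the quadruple integral reduces to integrating one 2D pdf over the disk of radius Rd around the origin.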
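The Continuous Aspect of the Probabilistic NN Queries
Let $V^x_{iq} = v^x_i - v^x_q$ and $V^y_{iq} = v^y_i - v^y_q$ denote the corresponding components of the velocity of the object whose expected location is along TRiq.
Then, with $(x^0_{iq}, y^0_{iq})$ the expected location along TRiq at t = 0, the distance of the expected location along TRiq from the origin as a function of t is
$$d_{iq}(t) = \sqrt{\big(x^0_{iq} + V^x_{iq}\, t\big)^2 + \big(y^0_{iq} + V^y_{iq}\, t\big)^2} = \sqrt{A t^2 + B t + C}$$
with $A = (V^x_{iq})^2 + (V^y_{iq})^2$, $B = 2\,(x^0_{iq} V^x_{iq} + y^0_{iq} V^y_{iq})$ and $C = (x^0_{iq})^2 + (y^0_{iq})^2$: a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).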
Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions $S_{df} = \{d_{1q}(t), d_{2q}(t), \ldots, d_{Nq}(t)\}$ (NOTE: excluding $d_{qq}(t) \equiv 0$).
Observation: Two distance functions $d_{iq}(t)$ and $d_{jq}(t)$ can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set $d_{iq}(t) = d_{jq}(t)$). Call such pairwise intersections critical time points.
[Figure: example lower envelope LE12 of the distance functions of TR1 and TR2, with critical time points t11 and t12 after tb; time on the horizontal axis, distance on the vertical axis]
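Because both distances are non-negative, d_iq(t) = d_jq(t) exactly when the squared distances, two quadratics in t, coincide; their difference has degree at most two, hence at most two critical time points. A sketch assuming each expected relative position moves linearly, so that d_iq(t)² = A t² + B t + C; the helper names are hypothetical.

```python
import math


def distance_coeffs(rel_pos, rel_vel):
    """(A, B, C) with d_iq(t)^2 = A t^2 + B t + C for linear relative motion.

    rel_pos: expected location of Tr_i relative to Tr_q at t = 0.
    rel_vel: relative velocity (V^x_iq, V^y_iq).
    """
    (x0, y0), (vx, vy) = rel_pos, rel_vel
    return (vx * vx + vy * vy, 2 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def distance(coeffs, t):
    a, b, c = coeffs
    return math.sqrt(max(a * t * t + b * t + c, 0.0))


def critical_time_points(ci, cj, tb, te):
    """Times in [tb, te] at which d_iq(t) = d_jq(t) (at most two)."""
    a, b, c = (ci[0] - cj[0], ci[1] - cj[1], ci[2] - cj[2])
    if a == 0:
        # Linear (or constant) difference; identical functions yield no isolated crossing.
        roots = [] if b == 0 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]


c1 = distance_coeffs((-5.0, 1.0), (1.0, 0.0))  # passes closest to Tr_q at t = 5
c2 = distance_coeffs((0.0, 3.0), (0.0, 0.0))   # keeps a constant distance of 3
print(critical_time_points(c1, c2, 0.0, 10.0))  # roughly [2.17, 7.83]
```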
Q: How to efficiently construct the lower envelope of the whole collection of distance functions?
A: A divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: the lower envelopes LE12 (of TR1 and TR2, critical times t11, t12) and LE34 (of TR3 and TR4, critical time t31) over [tb, te] are merged into LE1234, introducing new critical times t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories).
Now, how about the IPAC-NN tree?
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) above the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR1 to TR7 over [tb, te] with times t1, t2, t3; the lower envelope, a band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
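The ranked lower envelopes (level 1 = the lower envelope, level 2 = the envelope of the leftovers, and so on) can be illustrated with a deliberately simple brute-force sketch: collect all pairwise critical time points, then rank the distance functions at the midpoint of each elementary interval, where the ranking cannot change. This costs far more than the O(N log N) divide-and-conquer construction and only makes the structure of the IPAC-NN levels concrete; distance functions are given as hypothetical (A, B, C) triples with d(t)² = A t² + B t + C.

```python
import math
from itertools import combinations


def _dist(c, t):
    a, b, k = c
    return math.sqrt(max(a * t * t + b * t + k, 0.0))


def _crossings(ci, cj):
    """Real roots of d_i(t)^2 - d_j(t)^2 = 0 (a quadratic, so at most two)."""
    a, b, c = (ci[0] - cj[0], ci[1] - cj[1], ci[2] - cj[2])
    if a == 0:
        return [] if b == 0 else [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    s = math.sqrt(disc)
    return [(-b - s) / (2 * a), (-b + s) / (2 * a)]


def ranked_envelopes(funcs, tb, te, levels=2):
    """For each level, a list of (function index, t_start, t_end) pieces over [tb, te]."""
    cuts = {tb, te}
    for ci, cj in combinations(funcs, 2):
        cuts.update(t for t in _crossings(ci, cj) if tb < t < te)
    cuts = sorted(cuts)
    result = [[] for _ in range(levels)]
    for t0, t1 in zip(cuts, cuts[1:]):
        mid = 0.5 * (t0 + t1)  # no crossing inside (t0, t1): the ranking is constant
        order = sorted(range(len(funcs)), key=lambda i: _dist(funcs[i], mid))
        for r in range(min(levels, len(funcs))):
            pieces = result[r]
            if pieces and pieces[-1][0] == order[r]:
                pieces[-1] = (order[r], pieces[-1][1], t1)  # extend the current piece
            else:
                pieces.append((order[r], t0, t1))
    return result


funcs = [(1.0, -10.0, 26.0), (0.0, 0.0, 9.0), (0.0, 0.0, 100.0)]
for level, pieces in enumerate(ranked_envelopes(funcs, 0.0, 10.0), start=1):
    print(level, [(i, round(a, 2), round(b, 2)) for i, a, b in pieces])
```

In this example, function 2 (constant distance 10) never reaches the first two levels, mirroring how far-away trajectories drop out of the IPAC-NN tree.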
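Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e.,
$$PP_i(a) = \{ P \mid P \text{ connects } p_i \text{ and } p_{i+1} \ \wedge\ \mathrm{cost}_{\min}(P) \le t_{i+1} - t_i \}$$
Example: if the pdf of selecting a possible path is uniform, then $\Pr(P_j) = 1 / |PP_i(a)|$ for each $P_j \in PP_i(a)$.
Given a path $P_j \in PP_i(a)$, the Possible Locations of a moving object a with respect to Pj at $t \in [t_i, t_{i+1}]$ is the set of all the positions $p \in P_j$ from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t), i.e.,
$$PL_{P_j}(a, t) = \{ p \in P_j \mid \mathrm{cost}_{\min}(p_i, p) \le t - t_i \ \wedge\ \mathrm{cost}_{\min}(p, p_{i+1}) \le t_{i+1} - t \}$$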
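Combine the two probabilities…
[Figure: possible paths of object a, and its possible locations when t = 4]
Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr\big(a(t) \in S\big) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\big(a(t) \in S \mid p\big)$$
where, given path p, a(t) is distributed over the possible locations $PL_p(a, t)$, e.g., with a uniform pdf.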
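A sketch of the combination formula for one hypothetical configuration: two possible paths, each chosen with probability 0.5; on the first, the possible locations at t = 4 form an interval of which half overlaps segment S, and on the second, S does not lie on the path at all. Positions along a path are measured as distances from its start and are uniform over PL (both assumptions), which reproduces the 0.5 · 0.5 = 0.25 of the example. Names are hypothetical.

```python
def overlap(a, b):
    """Length of the overlap of intervals a = (lo, hi) and b = (lo, hi)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def prob_in_segment(paths):
    """Pr(a(t) in S) = sum_p Pr(p) * Pr(a(t) in S | p), uniform pdf over PL_p(a, t).

    paths: list of (Pr(p), PL interval on p, interval of S on p or None).
    """
    total = 0.0
    for p_path, pl, seg in paths:
        if seg is None:
            continue  # S does not lie on this path
        length = pl[1] - pl[0]
        if length == 0:
            # The location is exact: it is either inside S or not.
            total += p_path if seg[0] <= pl[0] <= seg[1] else 0.0
        else:
            total += p_path * overlap(pl, seg) / length
    return total


print(prob_in_segment([(0.5, (2.0, 4.0), (3.0, 5.0)), (0.5, (1.0, 3.0), None)]))  # 0.25
```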
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– The probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
Processing
› Data structures
– Edge hash-table
  › Small, in-memory
– Movement R-tree
  › 1D for each edge
– Trajectories list
  › Store samples ordered by time-stamp
  › Assign pointer to set of possible paths
  › Earliest-arriving and latest-departure times stored for vertices
  › On-disk, retrieved on demand
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree rooted at q "bubbling" along road-network segments; objects A to E with earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]
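The network-expansion step can be sketched as Dijkstra's algorithm started at q and stopped at radius r. The sketch assumes q is a network vertex and uses an adjacency-list graph with edge lengths as costs; it returns every edge that has at least one point closer than r to q, together with the network distance of its nearer endpoint. The filtering step would then probe the movement R-trees of exactly these edges. The graph layout and names are hypothetical.

```python
import heapq


def expansion_edges(graph, q, r):
    """Edges within network distance < r of vertex q.

    graph: dict vertex -> list of (neighbour, length); undirected edges are
    listed in both directions. Returns {edge: distance of its nearer endpoint},
    with each edge keyed as a sorted vertex pair.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = {}
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u] or d >= r:
            continue  # stale heap entry, or vertex outside the range
        for v, length in graph.get(u, []):
            key = tuple(sorted((u, v)))
            edges[key] = min(edges.get(key, d), d)
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return edges


road = {"q": [("A", 2.0), ("C", 6.0)], "A": [("q", 2.0), ("B", 3.0)],
        "B": [("A", 3.0), ("D", 1.0)], "C": [("q", 6.0)], "D": [("B", 1.0)]}
print(expansion_edges(road, "q", 4.0))
```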
Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arriving time and latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
Uncertainty: the flip-side
› Data reduction
– Why transmit every single location point?
› Bandwidth
› Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink. "Speeding up Douglas-Peucker Line-simplification Algorithm." SSHD 1997.
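A sketch of the classic recursive Douglas-Peucker simplification that the cited reference speeds up: keep the point farthest from the chord between the endpoints if it deviates more than the acceptable error epsilon, and recurse on both halves. This plain version is O(n²) in the worst case, not the faster variant of the reference, and it measures purely spatial distance; for trajectories one might instead measure the deviation at matching time stamps, which ties in with the question of the role of time in the reduction. Names are hypothetical.

```python
import math


def _point_line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def douglas_peucker(points, epsilon):
    """Simplify a polyline with the classic Douglas-Peucker tolerance epsilon."""
    if len(points) < 3:
        return list(points)
    d_max, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], points[0], points[-1])
        if d > d_max:
            d_max, index = d, i
    if d_max <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


print(douglas_peucker([(0, 0), (1, 0.05), (2, 5), (3, 0.05), (4, 0)], 1.0))
```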
Uncertainty: the other flip-side
› What are the important properties that should not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink be "un-fresh"?
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
So far…
Stochastic methods for uncertain spatial data: probabilistic results (yes), time dimension (no)
vs.
Geometric models for uncertain spatio-temporal data: probabilistic results (no), time dimension (yes)
Merge the approaches
› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments." IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series." SSDBM 2009, 435-443.
A highway example…
[Figure: position (pos) of a car over time (t) on a highway, with query window Q and speed bounds vmin, vmax]
What is the probability that the car is in some area at least once during some time interval?
Assuming a uniform distribution… [Figure: the car is inside Q with probability 50% at one time instant and 30% at another]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
The independence assumption violates the speed constraints.
Consideration of dependency: 0.5
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: inner hole: stay with 0.2, move to either neighbour with 0.4; border hole: stay with 0.4, move inward with 0.6]
› Initial position: (0, 0, 1, 0, 0, 0, 0, 0, 0)
› After the first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After the second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After the 40th hit: (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
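How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$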
› First hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› Second hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› 40th hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M^40 ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
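The propagation is repeated vector-matrix multiplication. A plain-Python sketch (no matrix library assumed) that builds the 9-bucket matrix M and reproduces the first two hits. Rather than asserting the exact 40th-hit vector, it checks that (0.08, 0.12, ..., 0.12, 0.08) is the stationary distribution of M, i.e. π · M = π, which is what the distribution approaches after many hits. Names are hypothetical.

```python
def step(dist, M):
    """One hit: the row vector dist multiplied by the transition matrix M."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


def board_matrix(n=9, stay=0.2, move=0.4, border_stay=0.4, border_move=0.6):
    """Transition matrix of the wooden-board example (rows: from, columns: to)."""
    M = [[0.0] * n for _ in range(n)]
    M[0][0], M[0][1] = border_stay, border_move
    M[n - 1][n - 1], M[n - 1][n - 2] = border_stay, border_move
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = move, stay, move
    return M


M = board_matrix()
x = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # initial position: third bucket
for hit in range(1, 41):
    x = step(x, M)
    if hit in (1, 2, 40):
        print(hit, [round(v, 2) for v in x])
```

Because M is very sparse, real deployments would store it in a sparse format; the dense lists here just keep the sketch dependency-free.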
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012.
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2, s3 with the transition probabilities of M]
Note: We have an exponential number of possible paths the car might take.
Starting from (1, 0, 0) at t = 0, the state distribution is propagated by multiplications with M:
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)
Worlds that are in s1 or s2 at t = 2 already qualify (0.8); propagating only the remaining mass (0, 0, 0.2) yields (0, 0.16, 0.04) at t = 3, so Result = 0.8 + 0.16 = 0.96.
› Solution based on matrix multiplications: introduce a new state for the "winner" trajectories and two matrices
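$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$
States s1, s2, s3, plus the absorbing state s4 for the winner trajectories.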
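A plain-Python sketch of the window query with the additional absorbing state: M_minus is M extended by the absorbing state, and M_plus additionally redirects every transition that enters a window state to the absorbing state; steps arriving at a tick inside T use M_plus. It assumes the object starts in a known state at tick 0 and that the window consists of whole states; the function names are hypothetical.

```python
def vec_mat(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def window_query(M, start, window_states, t_from, t_to):
    """P(the object is in one of window_states at some tick t in [t_from, t_to])."""
    if t_from <= 0 <= t_to and start in window_states:
        return 1.0
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    # M_minus: original transitions plus an absorbing "winner" state.
    m_minus = [list(row) + [0.0] for row in M] + [absorbing]
    # M_plus: transitions into window states are redirected to the winner state.
    m_plus = [
        [0.0 if j in window_states else p for j, p in enumerate(row)]
        + [sum(p for j, p in enumerate(row) if j in window_states)]
        for row in M
    ] + [absorbing]
    v = [0.0] * (n + 1)
    v[start] = 1.0
    for t in range(1, t_to + 1):
        v = vec_mat(v, m_plus if t >= t_from else m_minus)
    return v[n]  # probability mass of the winner state


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
print(window_query(M, start=0, window_states={0, 1}, t_from=2, t_to=3))  # about 0.96
```

The cost is a fixed number of (sparse) vector-matrix products, independent of the exponential number of possible paths.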
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time with one observation at t0, and with two observations at t0 and t1]
› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
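States in the order S1-, S2-, S3-, S1+, S2+, S3+ (not hit / hit):
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0, 0, 0) \xrightarrow{M^+} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^+} (0, 0, 0.04, 0.48, 0.16, 0.32)$$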
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 at t = 3?
Bayes' Theorem: let H denote the event that the trajectory intersects the query window and O the event that the object is observed in s3 at t = 3. Then
$$P(H \mid O) = \frac{P(O \mid H)\, P(H)}{P(O)} = \frac{P(H \wedge O)}{P(O \wedge H) + P(O \wedge \neg H)} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
with $P(O \wedge H) = 0.32$ (class S3+) and $P(O \wedge \neg H) = 0.04$ (class S3-) at t = 3.
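The same bookkeeping, sketched with the 2|S| classes made explicit: index i holds Si- (not yet hit) and index |S| + i holds Si+ (hit), which is what the block matrices M^- and M^+ encode. Conditioning on the second observation then reduces to Bayes' theorem on the two classes of the observed state. The function names are hypothetical, and an observation with zero probability would raise a division error.

```python
def forward_hit_classes(M, start, window_states, t_from, t_to, t_obs):
    """Probabilities of classes S_i^- (index i) and S_i^+ (index n + i) at tick t_obs."""
    n = len(M)
    v = [0.0] * (2 * n)
    start_hit = t_from <= 0 <= t_to and start in window_states
    v[start + (n if start_hit else 0)] = 1.0
    for t in range(1, t_obs + 1):
        in_window = t_from <= t <= t_to
        nxt = [0.0] * (2 * n)
        for i in range(n):
            for j in range(n):
                p = M[i][j]
                # S_i^-: becomes a hit world if it enters a window state during T.
                enters = in_window and j in window_states
                nxt[j + (n if enters else 0)] += v[i] * p
                # S_i^+: once a hit world, always a hit world.
                nxt[n + j] += v[n + i] * p
        v = nxt
    return v


def hit_given_observation(M, start, window_states, t_from, t_to, t_obs, obs_state):
    """Bayes: P(hit | observed in obs_state at t_obs) = P(hit and obs) / P(obs)."""
    n = len(M)
    v = forward_hit_classes(M, start, window_states, t_from, t_to, t_obs)
    p_hit_and_obs = v[n + obs_state]
    p_obs = v[obs_state] + v[n + obs_state]
    return p_hit_and_obs / p_obs


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
print(forward_hit_classes(M, 0, {0, 1}, 2, 3, 3))
print(hit_given_observation(M, 0, {0, 1}, 2, 3, 3, obs_state=2))  # about 0.89
```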
Summary
› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Matching time to ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: R-tree over the (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform to the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
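One standard way to draw samples that conform to two observations is to reweight each transition by the probability of still reaching the next observation (the Doob h-transform of the chain). The sketch below does this with backward probabilities and `random.Random.choices`; whether it matches the paper's exact adaptation scheme is an assumption. The example reuses the three-state chain, observed in s1 at tick 0 and in s3 at tick 3, and estimates the window probability as the ratio of qualifying samples. Names are hypothetical.

```python
import random


def backward_probs(M, t_span, target):
    """beta[k][i] = P(reach `target` at tick t_span | in state i at tick k)."""
    n = len(M)
    beta = [[0.0] * n for _ in range(t_span + 1)]
    beta[t_span][target] = 1.0
    for k in range(t_span - 1, -1, -1):
        for i in range(n):
            beta[k][i] = sum(M[i][j] * beta[k + 1][j] for j in range(n))
    return beta


def sample_bridges(M, start, target, t_span, n_samples, seed=0):
    """Sample paths from `start` (tick 0) to `target` (tick t_span) under M."""
    beta = backward_probs(M, t_span, target)
    if beta[0][start] == 0.0:
        raise ValueError("the observations are inconsistent with M")
    rng = random.Random(seed)
    states = range(len(M))
    paths = []
    for _ in range(n_samples):
        path = [start]
        for k in range(t_span):
            i = path[-1]
            # Adapted transition probabilities, proportional to M[i][j] * beta[k+1][j].
            weights = [M[i][j] * beta[k + 1][j] for j in states]
            path.append(rng.choices(states, weights=weights)[0])
        paths.append(path)
    return paths


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
paths = sample_bridges(M, start=0, target=2, t_span=3, n_samples=20000, seed=7)
hits = sum(any(p[t] in (0, 1) for t in (2, 3)) for p in paths)
print(hits / len(paths))  # close to 0.32 / 0.36
```

Every sampled path is consistent with both observations by construction, so no samples are wasted on rejection.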
Event Queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language to formulate probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3): 205-216, 2013.
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity search on uncertain spatio-temporal data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-nearest neighbor queries on uncertain moving object trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
33
Equivalent Worlds An intuition
Observation 2
For each remaining object we only need to consider the predicate ldquoinsiderdquo
q
C D
H
I
Querying Uncertain Spatial Data
34
Equivalent Worlds An intuition
Observation 2
For each remaining object we only need to consider the predicate ldquoinsiderdquo
Remaining number of equivalent classes of possible worlds
q
C D
H
I
35
Equivalent Worlds An intuition
Observation 3
We only require the number of objects in the query regionInformation about concrete results objects can be discarded
q
C D
H
I
36
q
08 02
04
H
C A
B
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
37
q
08 02
04
H
x x
x
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
Observation 3 Anonymize Objects - Substitute ABC by x
38
q
08 02
04
H
x x
x
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
Observation 3 Anonymize Objects - Substitute ABC by x
39
q
08 02
04
H
x x
x
H3
Generating Functions
Main idea Use polyomial multiplication to enumeratepossible results
Observation 3 Anonymize Objects - Substitute ABC by x
Each monomial implies that the probability of having exactly results equals
40
For each object let Consider the following generating function [2]
in the expanded polynomial the coefficient of monomial equals the probability that exactly objects are inside the query region
Generating Functions Formally
[2] Jian Li Barna Saha and Amol Deshpande A Unified Approach to Ranking in Probabilistic Databases PVLDB 2(1) 502-513 (2009)
41
Example
q
08 02
04
H
C A
B
H3
Count Queries on Uncertain Data
42
q
08 02
04
H
C A
B
H3
Example =
Count Queries on Uncertain Data
43
q
08 02
04
H
C A
B
H3
Example = =
Count Queries on Uncertain Data
44
q
08 02
04
H
C A
B
H3
Example = =
Count Queries on Uncertain Data
45
The Paradigm of Equivalent Worlds
A query predicate and an uncertain database DB we can answer on DB in PTIME if the following three conditions are satisfied
I A traditional query on certain data can be answered in polynomial time
II We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results Sampling
Materialize a set S of possible worlds
Samples drawn independent and unbiased
Evaluated the query predicate on each world
Distribution of sampled results is an unbiased approximation of the true distribution of results
48
q
08 02
04
H
C A
B
H3
Sampling Example
Drawing 100 possible worlds may yield the followingestimators
Compare to the exact probabilities
No indication of reliability or confidence of estimations
49
q
08 02
04
H
C A
B
H3
Sampling Confidences
Drawing 100 possible worlds may yield the followingestimators
Use statistical methods to assess the quality of estimators
Eg Wald-Test
Where is the percentile of the standard normal distribution
At a significance level of the true probability is in the interval [0442 0638]
True probability
50
Uncertain Spatial Data Management Summary
Motivation Flood of geo-spatial data Enriched with additional contexts (text social multimedia) Inherent uncertainty
Data Cleaning ldquoBest guessrdquo answers Unreliable results Biased results
Paradigm of Equivalent Worlds Efficient solution for the most prominent types of spatial queries Example Generating Functions
Approximations Monte-Carlo sampling Probabilistic guarantees
51
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
52
bull Modelsbull Motionbull Location Uncertainty
bull Queriesbull Part 1 NN (free-space motion)
bull Semantics and Processing
bull Part 2 Range (road-networks constraints)bull Semantics and Processing
bull Uncertainty ndash the flip-side
53
Model of a trajectory
Y
X
Time
Present time
2d-ROUTE
3d-TRAJECTORY
The approximation does not capture acceleration/deceleration
Point objects (NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes)
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
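One common reading of this trajectory model is a sequence of time-ordered (x, y, t) samples joined by straight segments, which is exactly why acceleration and deceleration between samples are lost. The sketch below assumes that piecewise-linear interpretation; `position_at` is a hypothetical helper. Dropping the t coordinate of every sample gives the 2D route.

```python
from bisect import bisect_right


def position_at(samples, t):
    """Linearly interpolate (x, y) at time t from time-ordered (x, y, t) samples."""
    times = [s[2] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t outside the trajectory's time span")
    i = bisect_right(times, t) - 1
    if i == len(samples) - 1:  # t equals the last timestamp
        return samples[-1][:2]
    (x0, y0, t0), (x1, y1, t1) = samples[i], samples[i + 1]
    f = (t - t0) / (t1 - t0)  # constant speed within each segment
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))


traj = [(0.0, 0.0, 0.0), (4.0, 0.0, 2.0), (4.0, 3.0, 5.0)]
print(position_at(traj, 1.0))  # (2.0, 0.0)
print(position_at(traj, 3.5))  # (4.0, 1.5)
```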
54
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of points to be visited => full trajectory
(location, time, velocity vector) updates
now ->
55
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (AKA space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
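Under Model 1 (exact samples, bounded maximum speed), a point belongs to the bead between two consecutive samples iff it can be reached from the first sample and can still reach the second one in time. A minimal membership test, assuming Euclidean free-space motion; `in_bead` and its argument layout are illustrative only.

```python
import math


def in_bead(p, t, s1, s2, v_max):
    """True if (p, t) lies in the space-time prism (bead) between two exact samples.

    s1 = (x1, y1, t1) and s2 = (x2, y2, t2) are location samples; v_max bounds the speed.
    """
    (x1, y1, t1), (x2, y2, t2) = s1, s2
    if not t1 <= t <= t2:
        return False
    reachable_from_s1 = math.dist(p, (x1, y1)) <= v_max * (t - t1)  # forward cone
    can_reach_s2 = math.dist(p, (x2, y2)) <= v_max * (t2 - t)       # backward cone
    return reachable_from_s1 and can_reach_s2


s1, s2 = (0.0, 0.0, 0.0), (10.0, 0.0, 10.0)
print(in_bead((5.0, 0.0), 5.0, s1, s2, 1.5))  # True
print(in_bead((5.0, 6.0), 5.0, s1, s2, 1.5))  # False
```

At a fixed t, the possible locations are the intersection of two disks, which is the cross-section of the bead.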
57
Modeling Uncertain Trajectories: Model 2
Constraints:
Motion along the road network
Uncertainty restricted to road segments
[Figure: uncertain motion between samples p1 (at t1) and p2 (at t2) along road segments, with a point q and a time instant t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
– the impact of uncertainty on the (structure of the) answer
– the impact of constraints
Processing algorithms
– the impact of the motion + uncertainty coupling on filtering/pruning/refinement
e.g., possible routes to be taken
60
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space between T = tb and T = te, with the change of NN at T = t1]
Given a collection of moving-object trajectories Tr1, Tr2, …, TrN
Example: assuming that Trq is the querying trajectory during [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space with time instants T = tb, tb1, t11, t1, te]
Observations:
adding uncertainty to the previous example implies that
– Tr1 (blue) is the only object with a non-zero NN-probability up to tb1
– starting at tb1, Tr3 (green) also has a non-zero NN-probability
– specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree with root [TrQ, tb, te]; each node has the form (trajectory, start time, end time, D), e.g., the first level (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm), refined further down, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]
Root: parameters of the query
– querying trajectory Trq
– time interval of interest [tb, te]
Children: (if the parent is excluded) the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and the uncertainty disks of Tr1 to Tr5, with the distances Rmin, Rmax and Rd; an object whose Rmin exceeds Rmax]
Trq = 2D point Q; the rest are possible locations bounded by circles
Observation:
– Any object whose min-distance from Q is > Rmax (the smallest max-distance from Q over all objects) has a "0" probability of being the NN to Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$P_i(R_d) = \iint_{\lVert (x,y) - Q \rVert \le R_d} \mathrm{pdf}_i(x,y)\,dx\,dy$
The probability that a given object Trj is the NN of Q is given by
$P_{NN}(Tr_j) = \int_0^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \ne j} \bigl(1 - P_i(R_d)\bigr)\,dR_d$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
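The two formulas can be evaluated numerically. The sketch assumes uniform pdfs over disks (as in the figures) and a crisp query point Q, so P_i(R_d) is the area of a circle-disk intersection (lens) divided by the disk area; the NN integral is approximated by a Riemann sum up to Rmax. Function names, the (distance, radius) input format and the step count are illustrative assumptions.

```python
import math


def _clamp(v):
    return max(-1.0, min(1.0, v))


def disk_within(R, d, r):
    """P(dist(Q, X) <= R) for X uniform on a disk of radius r centred at distance d from Q."""
    if R <= 0 or d >= R + r:
        return 0.0
    if d <= abs(R - r):  # one circle contains the other
        return min(R, r) ** 2 / r ** 2
    a1 = R * R * math.acos(_clamp((d * d + R * R - r * r) / (2 * d * R)))
    a2 = r * r * math.acos(_clamp((d * d + r * r - R * R) / (2 * d * r)))
    a3 = 0.5 * math.sqrt(max(0.0, (-d + R + r) * (d + R - r) * (d - R + r) * (d + R + r)))
    return (a1 + a2 - a3) / (math.pi * r * r)  # lens area / disk area


def nn_probabilities(objects, steps=4000):
    """objects: list of (d_i, r_i). Returns P(object j is the NN of Q) for each j."""
    r_max = min(d + r for d, r in objects)  # Rmax: smallest max-distance
    h = r_max / steps
    probs = []
    for j, (dj, rj) in enumerate(objects):
        if dj - rj > r_max:  # min-distance > Rmax: pruned
            probs.append(0.0)
            continue
        total = 0.0
        for k in range(steps):
            lo, hi = k * h, (k + 1) * h
            mass = disk_within(hi, dj, rj) - disk_within(lo, dj, rj)  # ~ pdf_j(Rd) dRd
            if mass == 0.0:
                continue
            mid = 0.5 * (lo + hi)
            others = 1.0
            for i, (di, ri) in enumerate(objects):
                if i != j:
                    others *= 1.0 - disk_within(mid, di, ri)  # all others farther than Rd
            total += mass * others
        probs.append(total)
    return probs


print(nn_probabilities([(5.0, 1.0), (5.0, 1.0), (20.0, 1.0)]))  # third entry is 0.0 (pruned)
```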
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query location TrQ and objects Tr1 to Tr5 with Rmin and Rmax; two points Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over uncertainty disks of radius r in the x-y plane]
Example: assuming a uniform pdf of possible locations, consider two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$P(\lVert V_i - V_q \rVert \le R_d)$
Define Viq = Vi – Vq (cross-correlation) as another random variable, Viq = Vi + (–Vq); due to independence, the pdf of Viq is the convolution of the pdfs of Vi and –Vq
67
[Figure: the pdfs of Tr1 – Trq and Tr2 – Trq, each supported on a disk of diameter 4r]
Let Viq = Vi + (-Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq),
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (–Cq) coinciding with E(Viq)
Property 2: assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
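A Monte-Carlo sketch of the main observation: instead of evaluating a quadruple integral, sample V_i and V_q independently and look directly at V_iq = V_i − V_q. It assumes uniform disks of equal radius r and illustrates Property 1 empirically, since the mean of V_iq approaches C_i − C_q. Names are hypothetical, and the slides work with the convolved pdf analytically rather than by sampling.

```python
import math
import random


def sample_uniform_disk(rng, cx, cy, r):
    """Uniform sample from a disk (the sqrt keeps the density uniform in area)."""
    rho = r * math.sqrt(rng.random())
    phi = 2 * math.pi * rng.random()
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)


def prob_within_mc(ci, cq, r, r_d, n=100_000, seed=0):
    """Estimate P(||V_i - V_q|| <= r_d) and the centroid of V_iq for independent uniform disks."""
    rng = random.Random(seed)
    hits, sum_x, sum_y = 0, 0.0, 0.0
    for _ in range(n):
        xi, yi = sample_uniform_disk(rng, *ci, r)
        xq, yq = sample_uniform_disk(rng, *cq, r)
        dx, dy = xi - xq, yi - yq  # one sample of V_iq = V_i + (-V_q)
        sum_x += dx
        sum_y += dy
        hits += math.hypot(dx, dy) <= r_d
    return hits / n, (sum_x / n, sum_y / n)


p, centroid = prob_within_mc((3.0, 0.0), (0.0, 0.0), 1.0, 10.0)
print(p, centroid)  # p is 1.0 (V_iq never leaves a disk of radius 2r around Ci - Cq)
```

Because V_iq is confined to a disk of radius 2r around C_i − C_q, any threshold above |C_i − C_q| + 2r gives probability 1 and any threshold below |C_i − C_q| − 2r gives 0.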
68
The continuous aspect of the probabilistic NN queries
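Let $V^x_{iq} = v^x_i - v^x_q$ and $V^y_{iq} = v^y_i - v^y_q$ denote the corresponding components of the (relative) velocity of the object whose expected location moves along TR_iq
Then the distance of the expected location along TR_iq from the origin, as a function of t, is
$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = (V^x_{iq})^2 + (V^y_{iq})^2, \quad B = 2\,(x_{iq} V^x_{iq} + y_{iq} V^y_{iq}), \quad C = x_{iq}^2 + y_{iq}^2$
where $(x_{iq}, y_{iq})$ is the relative expected location at t = 0; this is a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)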
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d_1q(t), d_2q(t), …, d_Nq(t)} (NOTE: excluding d_qq(t) ≡ 0)
Observation: two distance functions d_iq(t) and d_jq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: setting d_iq(t) = d_jq(t) and squaring yields a quadratic equation in t). Call such pairwise intersections critical time points
[Figure (example): lower envelope LE12 of the distance functions of TR1 and TR2 over [tb, te], with critical time points t11 and t12; time on the horizontal axis, distance on the vertical axis]
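For linear expected motion, the squared distance is a quadratic in t, which is what makes the "at most twice" observation work. A sketch assuming the expected locations move linearly, so that d_iq(t)² = A t² + B t + C; `distance_coeffs` and `critical_times` are hypothetical helpers.

```python
import math


def distance_coeffs(pos_i, vel_i, pos_q, vel_q):
    """Coefficients (A, B, C) with d_iq(t)^2 = A t^2 + B t + C for linear expected motion."""
    x, y = pos_i[0] - pos_q[0], pos_i[1] - pos_q[1]    # relative position at t = 0
    vx, vy = vel_i[0] - vel_q[0], vel_i[1] - vel_q[1]  # relative velocity components
    return vx * vx + vy * vy, 2 * (x * vx + y * vy), x * x + y * y


def critical_times(f, g, tb, te):
    """Times in (tb, te) where d_iq(t) = d_jq(t); the squared equation is quadratic."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted(t for t in roots if tb < t < te)


fi = distance_coeffs((-5, 3), (1, 0), (0, 0), (0, 0))  # moving object
fj = distance_coeffs((0, 5), (0, 0), (0, 0), (0, 0))   # static object
print(critical_times(fi, fj, 0, 10))  # [1.0, 9.0]
```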
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: With a divide-and-conquer approach in the spirit of merge sort
[Figure: the lower envelopes LE12 (of TR1 and TR2, critical time points t11 and t12) and LE34 (of TR3 and TR4, critical time point t31) over [tb, te] are merged into LE1234, which introduces the new critical time points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, what about the IPAC-NN tree?
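A sketch of the merge-sort-style construction: envelopes of the two halves are computed recursively and merged by comparing the two winning functions between consecutive breakpoints and their critical time points. It assumes every distance function is given by its squared-distance coefficients (A, B, C). For brevity the segment lookup in `merge` is a linear scan, so this sketch does not attain the stated O(N log N) bound; it only illustrates the structure. All names are hypothetical.

```python
import math


def d2(f, t):
    """Squared distance A t^2 + B t + C (comparing squares = comparing distances)."""
    return f[0] * t * t + f[1] * t + f[2]


def crossings(f, g, lo, hi):
    """Critical time points of f and g strictly inside (lo, hi); at most two."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        roots = [] if disc < 0 else [(-b - math.sqrt(disc)) / (2 * a),
                                     (-b + math.sqrt(disc)) / (2 * a)]
    return sorted(t for t in roots if lo < t < hi)


def merge(env1, env2, funcs):
    """Merge two envelopes (lists of (start, end, id) over the same interval)."""
    cuts = sorted({t for seg in env1 + env2 for t in seg[:2]})
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        i = next(s[2] for s in env1 if s[0] <= lo < s[1])  # linear scan, kept simple
        j = next(s[2] for s in env2 if s[0] <= lo < s[1])
        pts = [lo] + crossings(funcs[i], funcs[j], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            mid = 0.5 * (a + b)
            k = i if d2(funcs[i], mid) <= d2(funcs[j], mid) else j
            if out and out[-1][2] == k:
                out[-1] = (out[-1][0], b, k)  # extend the current piece
            else:
                out.append((a, b, k))
    return out


def lower_envelope(ids, funcs, tb, te):
    """Divide and conquer: envelope of each half, then merge."""
    if len(ids) == 1:
        return [(tb, te, ids[0])]
    mid = len(ids) // 2
    return merge(lower_envelope(ids[:mid], funcs, tb, te),
                 lower_envelope(ids[mid:], funcs, tb, te), funcs)


funcs = [(0, 0, 9), (1, -10, 25), (0, 0, 16)]  # static at 3, passing object, static at 4
print(lower_envelope([0, 1, 2], funcs, 0.0, 10.0))
# [(0.0, 2.0, 0), (2.0, 8.0, 1), (8.0, 10.0, 0)]
```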
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially); each relative location lies within 2r of its expected location, so the two objects' possible distances can differ by at most 4r from their expected distances combined
Observation 3: IF the lower envelope is removed from the (distance functions × time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te] with critical time points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the 4r pruning band]
72
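Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 – ti, i.e.,
$PP_i(a) = \{ P \mid P \text{ connects } p_i \text{ and } p_{i+1} \wedge \mathrm{mintime}(P) \le t_{i+1} - t_i \}$
Example: if the pdf of selecting a possible path is uniform, then Pr(P) = 1/|PPi(a)| for each P ∈ PPi(a)
Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t – ti (respectively ti+1 – t), i.e.,
$PL_{P_j}(t) = \{ p \in P_j \mid \mathrm{mintime}(p_i, p) \le t - t_i \wedge \mathrm{mintime}(p, p_{i+1}) \le t_{i+1} - t \}$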
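The possible-path definition can be illustrated with a depth-first search that prunes any partial path whose minimum travel time already exceeds t_{i+1} − t_i. Assumptions: both samples sit on vertices (the slides allow arbitrary positions on edges), the graph is a plain adjacency dict of minimum travel times, the vertex names are made up, and paths are taken with uniform probability.

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths src -> dst whose minimum travel time is <= budget.

    graph: dict vertex -> list of (neighbour, min_travel_time) (hypothetical format).
    """
    paths = []

    def dfs(v, path, cost):
        if v == dst:
            paths.append((list(path), cost))
            return
        for w, c in graph.get(v, []):
            if w not in path and cost + c <= budget:  # prune: cannot arrive in time
                path.append(w)
                dfs(w, path, cost + c)
                path.pop()

    dfs(src, [src], 0)
    return paths


graph = {"v1": [("v5", 4), ("v3", 2), ("v7", 5)], "v3": [("v4", 2)],
         "v4": [("v6", 3)], "v5": [("v6", 3)], "v7": [("v6", 5)]}
paths = possible_paths(graph, "v1", "v6", budget=8)  # t_{i+1} - t_i = 8
path_prob = {tuple(p): 1 / len(paths) for p, _ in paths}  # uniform pdf over PP_i(a)
print(path_prob)
```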
73
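Combine the two probabilities…
[Figure: possible paths of object a and its possible locations when t = 4, including the segment from m to p1]
Probability of falling inside segment m p1: 0.5 × 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$\Pr(a(t) \in S) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr\bigl(a(t) \in S \cap PL_P(t) \mid P\bigr)$
e.g., with a uniform pdf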
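Combining the two probabilities is then a weighted sum over possible paths. The sketch assumes each PL_P(t) is an interval of arc length along P with a uniform location on it; the input layout and the numbers in the example (chosen to reproduce 0.5 × 0.5 = 0.25) are hypothetical.

```python
def prob_in_segment(candidates):
    """Pr(a(t) in S) = sum over possible paths P of Pr(P) * Pr(a(t) in S | P).

    candidates: list of (path_prob, (pl_lo, pl_hi), seg), where (pl_lo, pl_hi) is PL_P(t)
    along P (arc length) and seg = (s_lo, s_hi) is the part of S on P, or None.
    """
    total = 0.0
    for path_prob, (lo, hi), seg in candidates:
        if seg is None:
            continue  # S does not lie on this path
        if hi == lo:  # location known exactly on this path
            total += path_prob * (seg[0] <= lo <= seg[1])
            continue
        overlap = max(0.0, min(hi, seg[1]) - max(lo, seg[0]))
        total += path_prob * overlap / (hi - lo)  # uniform location on PL_P(t)
    return total


# Two equally likely paths; S covers half of PL_P(t) on the first path only.
print(prob_in_segment([(0.5, (2.0, 6.0), (2.0, 4.0)), (0.5, (1.0, 5.0), None)]))  # 0.25
```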
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– the collection of all the possible paths (and their probabilities)
– a probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
– Edge hash table
  › small, in-memory
– Movement R-tree
  › 1D, one for each edge
– Trajectories list
  › stores samples ordered by timestamp
  › assigns a pointer to the set of possible paths
  › earliest arrival and latest departure stored for vertices
  › on disk, retrieved on demand
76
› Network expansion
– Expansion tree: a tree rooted at q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" from q along road-network segments through vertices A to E, annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
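The network-expansion step is essentially Dijkstra's algorithm cut off at network distance r. A minimal sketch using Python's `heapq`; it assumes a directed adjacency dict and, as a simplification, reports every edge that starts at a vertex within distance r (so at least part of the edge is in range). The filtering and refinement steps with the movement R-trees are not shown, and the vertex names are made up.

```python
import heapq


def network_expansion(graph, q, r):
    """Bounded Dijkstra from q.

    Returns ({vertex: network distance < r}, sorted edges (v, w) starting at such a vertex).
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for w, length in graph.get(v, []):
            nd = d + length
            if nd < r and nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    edges = sorted((v, w) for v in dist for w, _ in graph.get(v, []))
    return dist, edges


graph = {"q": [("A", 2), ("B", 5)], "A": [("C", 2)], "B": [("D", 1)], "C": [("E", 3)]}
dist, edges = network_expansion(graph, "q", r=5)
print(dist, edges)
```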
77
Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty: the flip side
› Data reduction
– Why transmit every single location point?
› Bandwidth
› Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up the Douglas-Peucker Line-Simplification Algorithm". SSHD 1997
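For reference, a plain recursive Douglas-Peucker simplification with error tolerance `eps` (the "acceptable error"). This is the textbook version, with worst-case O(n²) time, not the faster variant from the cited Hershberger-Snoeyink paper; the helper names are illustrative.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, eps):
    """Keep the endpoints; recursively keep the farthest point if it deviates by more than eps."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [points[0], points[-1]]  # every point is within the acceptable error
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right  # the split point appears only once


pts = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
print(douglas_peucker(pts, 0.5))  # [(0, 0), (2, -0.1), (3, 5), (5, 7)]
```

Note that this simplification bounds only the spatial error of each dropped point; it does not by itself preserve the topological properties raised on the next slide.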
79
Uncertainty: the other flip side
› What are the important properties that should not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after the reduction)
› What is the impact/role of time in the picture?
– It is not just a "Z" axis… objects cannot travel back in time
– How long can the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data
Probabilistic results
Time dimension
vs.
Geometric models for uncertain spatio-temporal data
Probabilistic results
Time dimension
82
Merge the approaches
› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443
83
A highway example…
[Figure: position (pos) of a car on a highway over time (t), with a query window Q]
What is the probability that the car is in some area at least once during some time interval?
84
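A highway example…
[Figure: position (pos) over time (t) with the minimum and maximum speeds vmin and vmax, and the query window Q]
What is the probability that the car is in some area at least once during some time interval?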
85
A highway example…
[Figure: possible positions over time bounded by vmin and vmax, with the query window Q]
Assuming a uniform distribution…
86
A highway example…
[Figure: position over time with the query window Q; probabilities of 50% and 30% of the car being inside Q]
Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65
87
A highway example…
[Figure: position over time with the query window Q; probabilities of 50% and 30% of the car being inside Q]
Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65
Violation of the speed constraints
88
A highway example…
[Figure: position over time with the query window Q; probabilities of 50% and 30% of the car being inside Q]
Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65
Consideration of dependency: 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings:
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
J. L. Doob (1953). Stochastic Processes. Wiley
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: transition probabilities 0.4 (left), 0.2 (stay), 0.4 (right); at the border, 0.4 (stay) and 0.6 (move inward)]
92
A simple example
› Initial position: bucket 3
› After first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After 40th hit: (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
93
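How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$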
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} = (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
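The board example can be checked with a few lines of pure Python (no matrix library assumed). The sketch builds the 9 × 9 transition matrix of the example, multiplies the start vector (0, 0, 1, 0, …, 0) from the left, and also iterates long enough to reach the stationary distribution (0.08, 0.12, …, 0.12, 0.08); that limit follows from detailed balance (0.6 π₁ = 0.4 π₂ at the border, equal mass on the interior buckets). Function names are illustrative.

```python
def board_matrix(n=9):
    """Interior: left 0.4, stay 0.2, right 0.4. Border: stay 0.4, move inward 0.6."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[0][0], m[0][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][i], m[i][i - 1] = 0.4, 0.6
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(x, m):
    """Row vector times matrix: x' = x M (rows of M are 'from' buckets)."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def after_hits(x, m, hits):
    for _ in range(hits):
        x = step(x, m)
    return x


M = board_matrix()
x0 = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # ball starts in bucket 3
print([round(p, 2) for p in after_hits(x0, M, 1)])  # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in after_hits(x0, M, 2)])  # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
```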
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
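Learning the transition probabilities from historical data can be as simple as counting the observed transitions per state (a maximum-likelihood estimate); storing only the non-zero entries keeps the matrix sparse. The sketch assumes the trajectories have already been mapped to one state per tick; the data are invented, and real systems may add smoothing or use time-dependent matrices, as noted above.

```python
from collections import defaultdict


def estimate_transitions(trajectories):
    """Maximum-likelihood transition probabilities from state sequences (one state per tick).

    Returns a sparse dict: from_state -> {to_state: probability}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):  # consecutive ticks
            counts[a][b] += 1
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}


history = [["s1", "s3", "s2", "s1"], ["s2", "s3", "s3", "s2"]]  # invented data
print(estimate_transitions(history))
```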
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: transition graph over the states s1, s2, s3]
Note: we have an exponential number of possible paths the car might take
ST - Window Queries
[Figure: the transition graph unrolled over t = 0, 1, 2, 3 for the states s1, s2, s3]
Starting in s1 at t = 0, the state distribution is propagated by multiplying with M:
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)
The mass in s1 or s2 at t = 2 (0.8) has already hit the window; propagating only the remaining mass (0, 0, 0.2) gives (0, 0.16, 0.04) at t = 3
Result = 1 – 0.04 = 0.96
› The solution based on matrix multiplications introduces a new (absorbing) state s4 for the "winner" trajectories, i.e., those that have already hit the window, and two matrices
103
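ST - Window Queries
$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$
$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$
(states s1, s2, s3 and the absorbing state s4; M⁻ is used for the step into t = 1, which lies outside T, and M⁺ for the steps into t = 2 and t = 3)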
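The M⁻/M⁺ construction generalizes directly: M⁺ redirects every transition into a window state to an extra absorbing "hit" state, M⁻ keeps M unchanged, and the query answer is the mass in the hit state after the last time step. A minimal sketch; the function name and argument layout are assumptions, and state indices are 0-based (s1 = 0).

```python
def window_query(M, window, times, x0, t_end):
    """P(object is in a window state at some time in `times`), via an absorbing 'hit' state."""
    n = len(M)
    absorb = [0.0] * n + [1.0]  # row of the 'hit' state
    m_minus = [list(row) + [0.0] for row in M] + [absorb]
    m_plus = [[0.0 if j in window else p for j, p in enumerate(row)]
              + [sum(p for j, p in enumerate(row) if j in window)]  # redirect into 'hit'
              for row in M] + [absorb]
    x = list(x0) + [0.0]
    if 0 in times:  # already inside the window at t = 0
        x = [0.0 if j in window else p for j, p in enumerate(x0)] + [sum(x0[j] for j in window)]
    for t in range(1, t_end + 1):
        m = m_plus if t in times else m_minus
        x = [sum(x[i] * m[i][j] for i in range(n + 1)) for j in range(n + 1)]
    return x[n]


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
print(round(window_query(M, {0, 1}, {2, 3}, [1, 0, 0], 3), 2))  # 0.96
```

The cost is a fixed number of (sparse) matrix-vector products, instead of an enumeration of the exponentially many paths.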
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 and with two observations at t0 and t1]
105
Multiple Observations
› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class S_i^- corresponding to worlds where o is located in state s_i and o has not intersected the window
– One class S_i^+ corresponding to worlds where o is located in state s_i and o has intersected the window
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
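Multiple Observations
$(1, 0, 0, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0, 0, 0) \xrightarrow{M^+} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^+} (0, 0, 0.04, 0.48, 0.16, 0.32)$
(vector entries: S1-, S2-, S3- [window not intersected], S1+, S2+, S3+ [window intersected])
$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \quad M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$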
107
Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
$P(\mathit{hit} \mid \mathit{obs}) = \frac{P(\mathit{obs} \mid \mathit{hit}) \cdot P(\mathit{hit})}{P(\mathit{obs})} = \frac{P(\mathit{hit} \wedge \mathit{obs})}{P(\mathit{obs} \wedge \mathit{hit}) + P(\mathit{obs} \wedge \neg \mathit{hit})}$
$(0, 0, 0.04, 0.48, 0.16, 0.32) \Rightarrow \frac{0.32}{0.32 + 0.04} \approx 0.89$
(vector entries: S1-, S2-, S3-, S1+, S2+, S3+)
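The 2|S| equivalence classes and the final Bayes step can be sketched as one propagation over the classes S_i^- and S_i^+, followed by conditioning on the observed state. Assumptions: a single later observation at `t_obs`, 0-based state indices, and a hypothetical function name.

```python
def hit_given_observation(M, window, times, x0, t_obs, obs_state):
    """P(window hit | object observed in obs_state at t_obs), over 2|S| equivalence classes.

    Class k < n is S_k^- (window not hit yet); class n + k is S_k^+ (window already hit).
    """
    n = len(M)

    def weight(i, j, in_window_time):
        src, dst = i % n, j % n
        hit = i >= n or (in_window_time and dst in window)  # hit status after the move
        return M[src][dst] if hit == (j >= n) else 0.0

    x = list(x0) + [0.0] * n  # all mass starts in the "not hit" classes
    for t in range(1, t_obs + 1):
        w = t in times
        x = [sum(x[i] * weight(i, j, w) for i in range(2 * n)) for j in range(2 * n)]
    hit_and_obs = x[n + obs_state]  # P(hit AND obs)
    miss_and_obs = x[obs_state]     # P(not hit AND obs)
    return hit_and_obs / (hit_and_obs + miss_and_obs), x  # Bayes' theorem


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
posterior, x3 = hit_given_observation(M, {0, 1}, {2, 3}, [1, 0, 0], t_obs=3, obs_state=2)
print(round(posterior, 3))  # 0.889
```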
108
Summary
› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable for uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: index entries P_oid in time/location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
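One standard way to make samples agree with a later observation is to reweight each transition by backward probabilities β_t(i) = P(observation | state i at time t), giving per-step matrices M_t(i, j) = M(i, j) β_{t+1}(j) / β_t(i). This is offered as an illustration of "adapting transition matrices"; the cited paper's exact construction may differ, and all names here are hypothetical.

```python
import random


def adapted_matrices(M, t_obs, obs_state):
    """Per-step matrices M_t[i][j] = M[i][j] * beta_{t+1}(j) / beta_t(i).

    beta_t(i) = P(observed in obs_state at t_obs | state i at time t). Each row of M_t sums
    to 1, and sampling with M_t only yields trajectories consistent with the observation.
    """
    n = len(M)
    beta = [1.0 if i == obs_state else 0.0 for i in range(n)]  # beta_{t_obs}
    mats = []
    for _ in range(t_obs):
        prev = [sum(M[i][j] * beta[j] for j in range(n)) for i in range(n)]
        mats.append([[M[i][j] * beta[j] / prev[i] if prev[i] > 0 else 0.0 for j in range(n)]
                     for i in range(n)])
        beta = prev
    mats.reverse()  # mats[t] is used for the step t -> t + 1
    return mats


def sample_trajectory(mats, start, rng):
    path = [start]
    for m in mats:
        path.append(rng.choices(range(len(m)), weights=m[path[-1]])[0])
    return path


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
mats = adapted_matrices(M, t_obs=3, obs_state=2)  # object seen in s3 at t = 3
rng = random.Random(42)
samples = [sample_trajectory(mats, 0, rng) for _ in range(20000)]
frac = sum(any(p[t] in (0, 1) for t in (2, 3)) for p in samples) / len(samples)
print(frac)  # close to 0.32 / 0.36, i.e., about 0.89
```

Every sample ends in the observed state, so no sample has to be rejected.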
112
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Event queries
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
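A simple way to evaluate a sequence pattern such as "office, coffee room, office" against a Markov chain is to track, jointly with the chain state, how many pattern symbols have been matched so far. This sketch is not Lahar (which has its own query language and evaluation); it assumes a memoryless chain, a subsequence semantics for the pattern (repeated readings in the same room are allowed), and invented state labels and probabilities.

```python
def pattern_probability(M, x0, steps, labels, pattern):
    """P(the label sequence over `steps` transitions contains `pattern` as a subsequence)."""
    n, k = len(M), len(pattern)

    def advance(matched, state):
        # Greedy matching is sufficient for subsequence containment.
        return matched + 1 if matched < k and labels[state] == pattern[matched] else matched

    dist = {}
    for s, p in enumerate(x0):
        if p > 0:
            key = (s, advance(0, s))
            dist[key] = dist.get(key, 0.0) + p
    for _ in range(steps):
        nxt = {}
        for (s, matched), p in dist.items():
            for s2 in range(n):
                if M[s][s2] > 0:
                    key = (s2, advance(matched, s2))
                    nxt[key] = nxt.get(key, 0.0) + p * M[s][s2]
        dist = nxt
    return sum(p for (s, matched), p in dist.items() if matched == k)


labels = ["office", "coffee"]
M = [[0.5, 0.5], [1.0, 0.0]]  # invented: from the office, go for coffee with prob. 0.5
print(pattern_probability(M, [1.0, 0.0], 2, labels, ["office", "coffee", "office"]))  # 0.5
```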
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob (1953). Stochastic Processes. Wiley
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
34
Equivalent Worlds: An Intuition
Observation 2
For each remaining object, we only need to consider the predicate "inside"
Remaining number of equivalence classes of possible worlds
[Figure: query region q and the objects C, D, H, I]
35
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region; information about the concrete result objects can be discarded
[Figure: query region q and the objects C, D, H, I]
36
[Figure: query region q and uncertain objects A, B, C, annotated with the probabilities 0.8, 0.2 and 0.4]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
37
[Figure: the same query region q, with the objects A, B, C replaced by x]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects, i.e., substitute A, B, C by x
Each monomial $c_k x^k$ implies that the probability of having exactly k results equals $c_k$
40
For each object $o_i$, let $p_i = P(o_i \text{ inside } q)$. Consider the following generating function [2]:
$F(x) = \prod_{i=1}^{N} \bigl((1 - p_i) + p_i x\bigr)$
In the expanded polynomial, the coefficient $c_k$ of the monomial $x^k$ equals the probability that exactly k objects are inside the query region
Generating Functions: Formally
[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
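Expanding the generating function one factor at a time takes O(N²) arithmetic operations, which is what keeps the count query polynomial. A sketch assuming (as a reading of the running figure) that 0.8, 0.2 and 0.4 are the three objects' independent probabilities of lying inside q; `count_distribution` is a hypothetical name.

```python
def count_distribution(probs):
    """Coefficients of prod_i ((1 - p_i) + p_i x); coefficient k = P(exactly k objects inside q)."""
    coeffs = [1.0]  # the constant polynomial 1
    for p in probs:
        new = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += c * (1 - p)  # object outside q: degree unchanged
            new[k + 1] += c * p    # object inside q: degree + 1
        coeffs = new
    return coeffs


print([round(c, 3) for c in count_distribution([0.8, 0.2, 0.4])])
# [0.096, 0.472, 0.368, 0.064]
```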
41
Count Queries on Uncertain Data
[Figure: query region q and uncertain objects A, B, C, annotated with the probabilities 0.8, 0.2 and 0.4]
Example: the generating function of the three objects is expanded step by step, one multiplication per object; the coefficients of the expanded polynomial give the distribution of the number of objects inside q
45
The Paradigm of Equivalent Worlds
A query predicate and an uncertain database DB we can answer on DB in PTIME if the following three conditions are satisfied
I A traditional query on certain data can be answered in polynomial time
II We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results Sampling
Materialize a set S of possible worlds
Samples drawn independent and unbiased
Evaluated the query predicate on each world
Distribution of sampled results is an unbiased approximation of the true distribution of results
48
q
08 02
04
H
C A
B
H3
Sampling Example
Drawing 100 possible worlds may yield the followingestimators
Compare to the exact probabilities
No indication of reliability or confidence of estimations
49
q
08 02
04
H
C A
B
H3
Sampling Confidences
Drawing 100 possible worlds may yield the followingestimators
Use statistical methods to assess the quality of estimators
Eg Wald-Test
Where is the percentile of the standard normal distribution
At a significance level of the true probability is in the interval [0442 0638]
True probability
50
Uncertain Spatial Data Management Summary
Motivation Flood of geo-spatial data Enriched with additional contexts (text social multimedia) Inherent uncertainty
Data Cleaning ldquoBest guessrdquo answers Unreliable results Biased results
Paradigm of Equivalent Worlds Efficient solution for the most prominent types of spatial queries Example Generating Functions
Approximations Monte-Carlo sampling Probabilistic guarantees
51
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
52
bull Modelsbull Motionbull Location Uncertainty
bull Queriesbull Part 1 NN (free-space motion)
bull Semantics and Processing
bull Part 2 Range (road-networks constraints)bull Semantics and Processing
bull Uncertainty ndash the flip-side
53
Model of a trajectory
Y
X
Time
Present time
2d-ROUTE
3d-TRAJECTORY
Approximation does not capture accelerationdeceleration
Point Objects NOTE the model needs to be augmented for objects with extent eg hurricanes
Bart Kuijpers Harvey J Miller Walied Othman Kinetic space-time prisms GIS 2011
54
Mobility Models
sequence of (locationtime)updates (eg GPS-based)
Electronic maps + traffic distributionpatterns + set of (to be visited) point=gt full trajectory
(locationtime Velocity Vector )updates
now -gt
55
Modeling UncertaintyLocation Uncertainty Models
Various Sourcesrsquo Imprecision(GPS Sensors)
Quest for data reduction
Ralph Lange Harald Weinschrott Lars Geiger Andreacute Blessing Frank Duumlrr Kurt Rothermel Hinrich Schuumltze On a Generic Uncertainty Model for Position Information QuaCon 2009
56
Modeling Uncertain Trajectories
bull Model 1Cones Beads Necklaces (AKA space-time prisms)
bull Assumed exact location samplesbull Unknown (but bounded) maximum speed in-betweensamples
Reynold Cheng Sunil Prabhakar Dmitri V Kalashnikov Querying Imprecise Data in Moving Object Environments ICDE 2003G Trajcevski A Choudhary O Wolfson G Li Uncertan Range Queries for Necklases MDM 2010
57
Modeling Uncertain TrajectoriesModel 2
Constraints
Motion along road network
Uncertainty restricted to Road Segments
p1 p2
q
t2t1
t
Bart Kuijpers Walied Othman Modeling uncertainty of moving objects on road networks via space-time prisms International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain TrajectoriesModel 3Sheared Cylinders
Constant bound on location-error at anytime-instant
G Trajcevski O Wolfson K Hinrichs S Chamberlain Managing Moving Objects Databases with Ucertainty ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly considerSyntax- The impact of uncertainty on the (structure of the) answer- The impact of constraints
Processing Algorithms- Impact of the motion+uncertainty coupling on filteringprunningrefinement
eg possible routes to be taken
6060
Example inside range (disk) around ldquoQrdquo between t1 and t2
1 What is the probability of a path taken
2 What is the earliestlatest arrival at a given vertex (consequently along edge)
Kai Zheng Goce Trajcevski Xiaofang Zhou Peter Scheuermann Probabilistic range queries for uncertain trajectories on road networks EDBT 2011
61
Continuous NN for trajectories
Q_nn(q) Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tbte]
A-nn(q) [(Tri1 [tbt1]) (Tri2[t1t2]) hellip (Trim[tm-1te])]
T = tb
T = te
X
Y
T = t1
Trq Tr1 Tr2Tr3
Given a collection of moving objects trajectories Tr1 Tr2 hellip TrN
Example assuming that Trq is the querying trajectory between [tbte] then- Tr1 (black) is the NN throughout [tbt1]- Tr2 (red) is the NN throughout [t1te]
Observation The answer to the query is time-paremeterized
Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002
62
Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant
UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]
T = te
X
YT = tb
T = t1
Trq Tr1 Tr2Tr3
T = tb1
T = t11
Observations
adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability
= what should the structure ofthe answer to UQ_nn(q) look like
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-basedProbabilistic Answer to a Continuous NN query) tree
Tri11 tb t11 D11
Tri12l t12(l-1) t12 D12k12
[TrQ tb te]
Tri1 tb t1 D1 Tri2 t1 t2 D2 Trim t(m-1) te Dm
Tri12 t11 t12 D12 Tri1k1 t1(k-1) t11 D1k1 Trik1 t(k-1) t21 Dm1 Trim1 t(m1-1) te Dmkm
Tri121 t11 t121 D121
Root parameters of the query-Querying trajectory Trq - Time-interval of interest [tbte]
Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., has no uncertainty)
[Figure: crisp query point Q and the uncertainty disks of Tr1 to Tr5, with the radii Rmin, Rmax and Rd around Q; annotation: Rmin > Rmax]
Trq = 2D point Q; the possible locations of the other objects are bounded by circles.
Observation:
- Any object whose minimum distance from Q is greater than Rmax (the smallest maximum distance from Q over all objects) has a "0" probability of being the NN of Q.
- Example: Tr4 cannot have a non-zero probability of being Q's NN.
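The probability that the location of Tri is within distance Rd of Q is given by
$$ P_i(R_d) = \int_{\{x \,:\, \|x - Q\| \le R_d\}} pdf_i(x)\, dx $$
The probability that a given object Trj is the NN of Q is given by
$$ P_j^{NN} = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{k \neq j} \bigl(1 - P_k(R_d)\bigr)\, dR_d $$
i.e., Trj is the NN if it is at distance Rd AND all the others are at distances greater than Rd, integrated over all possible Rd (NOTE: the upper bound is Rmax).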
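The two integrals above can be evaluated numerically. The sketch below assumes that every object is uniformly distributed on a disk (cx, cy, r), that the objects are independent, and it replaces dP_j/dRd by finite differences on a grid of radii in [0, Rmax]; the function names are illustrative, not taken from the original work.

```python
import math

def _clamp(x, lo, hi):
    return max(lo, min(hi, x))

def disk_overlap_area(d, r1, r2):
    """Area of the intersection of two disks with radii r1, r2 whose centres are d apart."""
    if d >= r1 + r2:
        return 0.0
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2
    a1 = r1 * r1 * math.acos(_clamp((d * d + r1 * r1 - r2 * r2) / (2 * d * r1), -1.0, 1.0))
    a2 = r2 * r2 * math.acos(_clamp((d * d + r2 * r2 - r1 * r1) / (2 * d * r2), -1.0, 1.0))
    k = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    return a1 + a2 - 0.5 * math.sqrt(max(k, 0.0))

def prob_within(q, obj, rd):
    """P_i(Rd): an object uniformly distributed on the disk obj = (cx, cy, r) lies within rd of q."""
    cx, cy, r = obj
    return disk_overlap_area(math.hypot(cx - q[0], cy - q[1]), rd, r) / (math.pi * r * r)

def nn_probabilities(q, objects, steps=4000):
    """Approximate P_j^NN = integral over [0, Rmax] of dP_j(Rd) * prod_{k != j} (1 - P_k(Rd))."""
    # Rmax: the smallest maximum distance from q over all objects
    r_max = min(math.hypot(cx - q[0], cy - q[1]) + r for cx, cy, r in objects)
    h = r_max / steps
    result = [0.0] * len(objects)
    prev = [0.0] * len(objects)                 # P_k at the previous grid radius
    for s in range(1, steps + 1):
        cur = [prob_within(q, o, s * h) for o in objects]
        mid = [(a + b) / 2 for a, b in zip(prev, cur)]
        for j in range(len(objects)):
            others = 1.0                        # all other objects farther than Rd
            for k in range(len(objects)):
                if k != j:
                    others *= 1.0 - mid[k]
            result[j] += (cur[j] - prev[j]) * others
        prev = cur
    return result
```

Objects whose minimum distance from Q exceeds Rmax get exactly 0, matching the pruning observation, because their P_i(Rd) stays 0 on the whole integration range.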
65
When Trq (the instantaneous location) has uncertainty associated with it, the previous observations are not directly applicable. Example:
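[Figure: uncertain query location TrQ and the uncertainty disks of Tr1 to Tr5, with radii Rmin and Rmax; Z1 and Z2 are possible locations with dist(Z1, Z2) < Rmax]
Since dist(Z1, Z2) < Rmax, we can no longer "prune" Tr4 from consideration.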
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd of Trq.
NOTE: in effect, this is a quadruple integration instead of the double one (cf. previous slide).
66
[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over disks of diameter 2r in the (x, y) plane]
Example: assuming a uniform pdf of possible locations; the figure marks two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd of Trq.
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$ \Pr\bigl(\|V_i - V_q\| \le R_d\bigr) $$
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq). Due to the independence of Vi and Vq, the pdf of Viq is the convolution of the pdfs of Vi and −Vq:
$$ pdf_{V_{iq}}(z) = \int pdf_{V_i}(x)\, pdf_{V_q}(x - z)\, dx $$
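A quick way to see Property 1 and the support of pdf(Viq) is to sample it. The Monte-Carlo sketch below is a swapped-in numerical technique, not the analytic convolution on the slides; it assumes independent uniform disks of the same radius r for Tri and Trq, and `sample_viq` and `prob_within` are hypothetical helper names.

```python
import math
import random

def sample_disk(cx, cy, r, rng):
    """Uniform sample from the disk of radius r centred at (cx, cy)."""
    rho = r * math.sqrt(rng.random())       # sqrt makes the density uniform in area
    phi = 2.0 * math.pi * rng.random()
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)

def sample_viq(ci, cq, r, n, seed=0):
    """n samples of V_iq = V_i - V_q for independent uniform disks of radius r."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        xi, yi = sample_disk(ci[0], ci[1], r, rng)
        xq, yq = sample_disk(cq[0], cq[1], r, rng)
        out.append((xi - xq, yi - yq))
    return out

def prob_within(ci, cq, r, rd, n=50_000, seed=0):
    """Monte-Carlo estimate of Pr(||V_i - V_q|| <= rd)."""
    diffs = sample_viq(ci, cq, r, n, seed)
    return sum(1 for dx, dy in diffs if math.hypot(dx, dy) <= rd) / n
```

The samples of Viq are centred at Ci − Cq and never lie farther than 2r from that centre, which is consistent with convolved pdfs supported on disks of diameter 4r.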
67
[Figure: the resulting pdfs pdf(Tr1 − Trq) and pdf(Tr2 − Trq) in the (x, y) plane, each supported on a disk of diameter 4r]
Let Viq = Vi + (-Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq),
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq).
Property 2: assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
68
The continuous aspect of the probabilistic NN queries
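Let Vxiq = vxi − vxq and Vyiq = vyi − vyq denote the components of the relative velocity of the object whose expected location moves along TRiq, and let (xiq, yiq) be that expected location at t = 0.
Then the distance of the expected location along TRiq from the origin, as a function of t, is
$$ d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = V_{xiq}^2 + V_{yiq}^2, \quad B = 2\,(x_{iq} V_{xiq} + y_{iq} V_{yiq}), \quad C = x_{iq}^2 + y_{iq}^2 $$
which is a hyperbola since A > 0.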
Now we have a collection of such distance functions (for each i)
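Under the linear-motion assumption, the coefficients of d_iq(t)^2 = A t^2 + B t + C and the pairwise critical time-points follow directly from a quadratic. This sketch assumes positions given at t = 0 and constant velocities; all names are illustrative.

```python
import math

def distance_coeffs(p_i, v_i, p_q, v_q):
    """(A, B, C) such that d_iq(t)^2 = A t^2 + B t + C for linear motions p + v*t."""
    x0, y0 = p_i[0] - p_q[0], p_i[1] - p_q[1]     # relative location at t = 0
    vx, vy = v_i[0] - v_q[0], v_i[1] - v_q[1]     # relative velocity (V_xiq, V_yiq)
    return vx * vx + vy * vy, 2 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0

def distance(coeffs, t):
    a, b, c = coeffs
    return math.sqrt(max(a * t * t + b * t + c, 0.0))

def critical_times(c_i, c_j, tb, te, eps=1e-12):
    """Times in [tb, te] where d_iq(t) = d_jq(t). Equating the squares leaves a polynomial
    of degree <= 2, hence at most two critical time-points (unless the functions coincide)."""
    a, b, c = (c_i[0] - c_j[0], c_i[1] - c_j[1], c_i[2] - c_j[2])
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]
```

For example, an object starting at (10, 0) with velocity (−1, 0) and a static object at (0, 5), both relative to a static query at the origin, have critical time-points 5 and 15.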
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for the set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0).
Observation: two distance functions diq(t) and djq(t) can intersect at most twice within the query's time interval of interest [tb, te] (NOTE: setting diq(t) = djq(t) and squaring both sides yields an equation of degree at most 2 in t). Call such pairwise intersections critical time-points.
[Figure (example): the lower envelope LE12 of the two distance functions TR1 and TR2, with time on the horizontal axis and distance on the vertical axis; the critical time-points t11 and t12 follow tb]
70
The continuous aspect of the probabilistic NN queries
Q: how can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: with a divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: LE12 (of TR1 and TR2, critical points t11 and t12) and LE34 (of TR3 and TR4, critical point t31) over [tb, te] are merged into LE1234, which introduces the new critical points t1new and t2new]
Complexity of the lower-envelope construction: O(N log N) (N = number of trajectories)
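A minimal divide-and-conquer sketch of the lower-envelope construction. Each distance function is represented by coefficients (A, B, C) with d(t)^2 = A t^2 + B t + C (the linear-motion form behind the hyperbola above, an assumption of this sketch). An envelope is a list of (t_start, t_end, index) pieces; merging two envelopes splits every elementary interval at the critical time-points of the two owning functions. The linear scan in `owner_at` keeps the code short, so this sketch does not reach the O(N log N) bound.

```python
import math

def dist(c, t):
    """d(t) = sqrt(A t^2 + B t + C) for c = (A, B, C)."""
    a, b, cc = c
    return math.sqrt(max(a * t * t + b * t + cc, 0.0))

def crossings(ci, cj, lo, hi, eps=1e-12):
    """Critical time-points of d_i and d_j strictly inside (lo, hi)."""
    a, b, c = ci[0] - cj[0], ci[1] - cj[1], ci[2] - cj[2]
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        roots = [] if disc < 0 else [(-b - math.sqrt(disc)) / (2 * a),
                                     (-b + math.sqrt(disc)) / (2 * a)]
    return sorted(t for t in roots if lo < t < hi)

def owner_at(env, t):
    """Index of the function forming the envelope `env` at time t (linear scan)."""
    for s, e, idx in env:
        if s <= t <= e:
            return idx
    raise ValueError("t outside the envelope")

def merge(env1, env2, coeffs):
    """Lower envelope of two lower envelopes defined over the same interval."""
    cuts = sorted({x for s, e, _ in env1 + env2 for x in (s, e)})
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        i, j = owner_at(env1, mid), owner_at(env2, mid)
        pts = [lo] + crossings(coeffs[i], coeffs[j], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            if b <= a:
                continue                                  # skip zero-length pieces
            m = (a + b) / 2
            best = i if dist(coeffs[i], m) <= dist(coeffs[j], m) else j
            if pieces and pieces[-1][2] == best:
                pieces[-1] = (pieces[-1][0], b, best)     # extend the previous piece
            else:
                pieces.append((a, b, best))
    return pieces

def lower_envelope(coeffs, tb, te, ids=None):
    """Divide and conquer: split the functions, recurse on both halves, merge (as in Merge-Sort)."""
    ids = list(range(len(coeffs))) if ids is None else ids
    if len(ids) == 1:
        return [(tb, te, ids[0])]
    half = len(ids) // 2
    return merge(lower_envelope(coeffs, tb, te, ids[:half]),
                 lower_envelope(coeffs, tb, te, ids[half:]), coeffs)
```

Each output piece is a (time interval, nearest trajectory) pair, i.e., the crisp time-parameterized NN answer over [tb, te].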
Now, how about the IPAC-NN tree?
71
The lower envelope provides a continuous pruning criterion: any trajectory whose distance function lies more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially). The reason is that each actual distance to Trq can deviate from the corresponding expected distance by at most 2r.
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR1 to TR7 over [tb, te] with time points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and a band of width 4r above the lower envelope]
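The 4r rule and the ranked envelopes can be checked pointwise at any time instant t. This hypothetical helper uses the same (A, B, C) representation of d(t)^2 (an assumption of the sketch) and evaluates the criteria at a single t rather than maintaining the envelopes over the whole interval.

```python
import math

def dist(c, t):
    a, b, cc = c
    return math.sqrt(max(a * t * t + b * t + cc, 0.0))

def nn_candidates_at(coeffs, t, r):
    """Objects that may have a non-zero NN probability at time t: expected distance at most
    4r above the lower envelope (each actual distance is within 2r of its expected one)."""
    d = [dist(c, t) for c in coeffs]
    envelope = min(d)
    return [i for i, di in enumerate(d) if di <= envelope + 4 * r]

def ranked_envelopes_at(coeffs, t, k):
    """Owners of the 1st, 2nd, ..., k-th lower envelope at time t (the k smallest distances):
    removing the first envelope and taking the minimum again yields the second, and so on."""
    return sorted(range(len(coeffs)), key=lambda i: dist(coeffs[i], t))[:k]
```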
72
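Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.,
$$ PP_i(a) = \{ P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ mincost(P) \le t_{i+1} - t_i \} $$
Example: if the pdf of selecting a possible path is uniform, each path Pj ∈ PPi(a) is taken with probability 1/|PPi(a)|.
Given a path Pj ∈ PPi(a), the Possible Locations of the moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t), i.e.,
$$ PL_j(a, t) = \{ p \in P_j \mid mincost(p, p_i) \le t - t_i \ \wedge\ mincost(p, p_{i+1}) \le t_{i+1} - t \} $$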
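For a single candidate path, the possible locations at time t form an interval of offsets along the path. The sketch assumes one maximum speed v_max on the whole path (so that the minimum time cost is length / v_max), a simplification of the edge-dependent time costs on the slide; `possible_locations` is a hypothetical name.

```python
def possible_locations(path_len, t_i, t_next, t, v_max):
    """Interval of offsets along a path (0 = p_i, path_len = p_{i+1}) that the object can
    occupy at time t, assuming a single maximum speed v_max on the whole path."""
    if not t_i <= t <= t_next:
        raise ValueError("t must lie in [t_i, t_next]")
    if path_len > v_max * (t_next - t_i):
        return None                                   # not a possible path: too long to traverse in time
    lo = max(0.0, path_len - v_max * (t_next - t))    # p_{i+1} must still be reachable
    hi = min(path_len, v_max * (t - t_i))             # must be reachable from p_i
    return lo, hi
```

For a path of length 10 between samples at t = 0 and t = 4 with v_max = 4, the possible locations at t = 2 are the offsets [2, 8].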
73
Combine the two probabilities:
[Figure: possible paths of object a and its possible locations when t = 4; point m on a path]
Probability of falling inside segment [m, p1]: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$ \Pr(a(t) \in S) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr\bigl(a(t) \in S \cap PL_P(a, t) \mid P\bigr) $$
e.g., with a uniform pdf
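With uniform pdfs for both the path choice and the position within the possible locations, the formula becomes a weighted sum of interval overlaps. In this sketch each path is described by its possible-location interval and by the part of S lying on it (both as offsets along that path); this input format is an assumption for illustration.

```python
def overlap(a, b):
    """Length of the intersection of the intervals a = (lo, hi) and b = (lo, hi)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def prob_in_segment(paths):
    """Pr(a(t) in S) = sum over possible paths P of Pr(P) * Pr(a(t) in S | P).

    paths: list of (pl, seg) per possible path, where pl is the possible-location interval
    on that path and seg is the part of S on that path (None if S does not touch it).
    Assumes a uniform choice of path and a uniform position within pl."""
    p_path = 1.0 / len(paths)
    total = 0.0
    for pl, seg in paths:
        if seg is None:
            continue
        length = pl[1] - pl[0]
        if length > 0:
            total += p_path * overlap(pl, seg) / length
        elif seg[0] <= pl[0] <= seg[1]:     # degenerate interval: a single position
            total += p_path
    return total
```

With two equally likely paths, where S covers half of the possible locations on the first path and does not touch the second, this gives 0.5 · 0.5 = 0.25, as in the example.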
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– the collection of all the possible paths (and their probabilities)
– the probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
› Data structures
  – Edge hash-table
    › small, in memory
  – Movement R-tree
    › 1D, one for each edge
  – Trajectories list
    › stores samples ordered by time-stamp
    › assigns a pointer to the set of possible paths
    › earliest-arrival and latest-departure times stored for vertices
    › on disk, retrieved on demand
76
› Network expansion
  – Expansion tree: a tree rooted at q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search for leaf entries whose maximum time interval intersects tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" from q along road-network segments through vertices A to E, annotated with the earliest/latest arrival times; one possible path is marked as definitely not part of the answer to the range query]
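The network-expansion and filtering steps can be sketched with a bounded Dijkstra search and a per-edge list of time intervals. For brevity the sketch assumes that q sits on a vertex, that the graph is undirected, and that a plain list stands in for each edge's 1D movement R-tree; the refinement step (computing qualification probabilities) is not included.

```python
import heapq

def network_expansion(graph, q, r):
    """Dijkstra from vertex q, bounded by network distance r.
    graph: dict vertex -> list of (neighbour, edge_length), both directions listed.
    Returns the edges (sorted vertex pairs) with at least one endpoint closer than r."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return {tuple(sorted((u, v))) for u in dist for v, _ in graph.get(u, [])}

def filter_candidates(edges, movement_index, tq):
    """movement_index: dict edge -> list of (t_start, t_end, object_id), a stand-in for the
    per-edge movement R-tree. Returns the objects whose time interval intersects tq."""
    lo, hi = tq
    return {oid for e in edges for ts, te, oid in movement_index.get(e, [])
            if ts <= hi and lo <= te}
```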
77
Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
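Because the QP is monotonic between consecutive critical time instants, the values computed at those instants bound the QP at any intermediate time. A small sketch with hypothetical names, assuming the critical instants are sorted and their QPs are already known.

```python
import bisect

def qp_bounds(critical, qps, t):
    """Lower/upper bound on the QP at time t from the QPs at sorted critical instants.
    Between consecutive critical instants the QP is monotonic, so it lies between
    the two endpoint values."""
    if not critical[0] <= t <= critical[-1]:
        raise ValueError("t lies outside the critical instants")
    k = bisect.bisect_right(critical, t)   # index of the first critical instant > t
    if critical[k - 1] == t:               # the QP is known exactly here
        return qps[k - 1], qps[k - 1]
    a, b = qps[k - 1], qps[k]
    return min(a, b), max(a, b)
```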
78
Uncertainty – the flip side
› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up the Douglas-Peucker Line-Simplification Algorithm". SSHD 1997
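The line-simplification idea cited above can be illustrated with the basic recursive Douglas-Peucker algorithm (not the faster variant of Hershberger and Snoeyink). This sketch treats a trajectory as a purely spatial 2D polyline and measures the distance to the segment between the current endpoints, so it ignores the role of time raised on the next slide.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment [a, b]."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))

def douglas_peucker(points, eps):
    """Keep the endpoints; recursively keep the farthest point while its distance exceeds eps."""
    if len(points) < 3:
        return list(points)
    dmax, idx = -1.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= eps:
        return [points[0], points[-1]]      # every dropped point is within the acceptable error
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right                # avoid duplicating the split point
```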
79
Uncertainty – the other flip side
› What are the important properties that must not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just a "Z" axis: one cannot travel back in time
  – How long may the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far …
› Stochastic methods for uncertain spatial data
› vs. geometric models for uncertain spatio-temporal data
(compared with respect to: probabilistic results, time dimension)
82
Merge the approaches
› Idea
  – Discretize time
  – For each time instant, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest-neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443
83
A highway example …
[Figure: position (pos) of a car over time (t), with a query window Q]
What is the probability that the car is in some area at least once during some time interval?
84
A highway example …
[Figure: as before, with the possible positions bounded by the minimum and maximum speeds vmin and vmax]
85
A highway example …
[Figure: possible positions between vmin and vmax and the query window Q]
Assuming a uniform distribution …
86
A highway example …
[Figure: the car is inside Q with probability 50% at one time instant and 30% at another]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
87
A highway example …
The independence assumption (result 0.65) leads to a violation of the speed constraints.
88
A highway example …
Independence assumption: 0.65; taking the dependency into account: 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts
[Figure: in the interior the ball moves left/stays/moves right with probabilities 0.4/0.2/0.4; at the border it stays with 0.4 and moves inward with 0.6]
92
A simple example
› Initial position: the ball is in the third hole
› After the first hit: 0.4, 0.2, 0.4 (holes 2 to 4)
› After the second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (holes 1 to 5)
› After the 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08
93
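How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$ M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix} $$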
94
How can we model this?
› First hit:
$$ (0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) $$
› Second hit:
$$ (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0) $$
› 40th hit:
$$ (0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08) $$
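The board example can be reproduced with repeated vector-matrix products. The sketch below builds M as on the slide (rows = from bucket) and propagates the initial distribution; `board_matrix` and `after_hits` are illustrative names. Detailed balance (0.6·π1 = 0.4·π2 at the border, equal flows between interior holes) gives the stationary distribution (0.08, 0.12, …, 0.12, 0.08), which the distribution approaches as the number of hits grows.

```python
def board_matrix(n=9):
    """Transition matrix of the board example (rows: from bucket, columns: to bucket)."""
    M = [[0.0] * n for _ in range(n)]
    M[0][0], M[0][1] = 0.4, 0.6                     # left border
    M[n - 1][n - 1], M[n - 1][n - 2] = 0.4, 0.6     # right border
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

def hit(dist, M):
    """One hit: row vector times transition matrix."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]

def after_hits(start, M, hits):
    d = list(start)
    for _ in range(hits):
        d = hit(d, M)
    return d

M = board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # ball in the third hole
first = after_hits(start, M, 1)
second = after_hits(start, M, 2)
```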
95
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
97
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} $$
[Figure: transition graph over the states s1, s2, s3 corresponding to M]
Note: there is an exponential number of possible paths the car might take.
ST-Window Queries
Initial distribution at t = 0: (1, 0, 0); transition matrix M as above
[Figure: the chain unrolled over t = 0, 1, 2, 3, with the states s1, s2, s3 at each time step]
98
99
ST-Window Queries
Distribution at t = 1: (1, 0, 0) · M = (0, 0, 1)
100
ST-Window Queries
Distribution at t = 2: (0, 0.8, 0.2)
101
ST-Window Queries
Distribution at t = 3: (0.48, 0.16, 0.36)
102
ST-Window Queries
Propagating only the worlds that have not yet entered s1 or s2 during T: at t = 2 the mass 0.8 in s2 is a hit, and the remaining 0.2 in s3 yields (0, 0.16, 0.04) at t = 3. Result = 1 − 0.04 = 0.96
› A solution based on matrix multiplications introduces a new state for the "winner" trajectories and two matrices
103
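ST-Window Queries
$$ M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix} $$
$$ (1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96) $$
States: s1, s2, s3 and the new absorbing state s4 (trajectories that have hit the window)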
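The M−/M+ construction generalizes to any window. The sketch assumes integer time steps starting at t = 0, treats the window as a set of state indices, and appends the absorbing state as the last index; it applies M+ to every transition whose target time lies in T and M− otherwise. `window_probability` is an illustrative name.

```python
def window_probability(M, start, window, ts, te):
    """Pr(the object is in a state of `window` at some time t in [ts, te]).

    M: n x n transition matrix (rows = from); start: distribution at t = 0.
    An absorbing state (index n) collects the worlds that have hit the window."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    # M-: original transitions plus the absorbing state
    m_minus = [list(row) + [0.0] for row in M] + [absorbing]
    # M+: transitions into window states are redirected to the absorbing state
    m_plus = [[0.0 if j in window else p for j, p in enumerate(row)]
              + [sum(p for j, p in enumerate(row) if j in window)] for row in M] + [absorbing]
    v = list(start) + [0.0]
    if ts <= 0 <= te:                       # worlds already inside the window at t = 0
        for j in window:
            v[n] += v[j]
            v[j] = 0.0
    for t in range(te):
        mat = m_plus if ts <= t + 1 <= te else m_minus
        v = [sum(v[i] * mat[i][j] for i in range(n + 1)) for j in range(n + 1)]
    return v[n], v
```

For the example, `window_probability([[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]], [1, 0, 0], {0, 1}, 2, 3)` returns approximately 0.96 together with the final vector (0, 0, 0.04, 0.96).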
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with a single observation at t0 (left) and two observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where the true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$ M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix} $$
[Figure: states s1, s2, s3 over t = 0, 1, 2, 3]
106
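Multiple Observations
Class vectors over (S1−, S2−, S3−, S1+, S2+, S3+), where ∎ marks the classes that have intersected the window (¬∎: not intersected):
t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)
$$ M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix} $$
with M as before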
107
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' theorem:
$$ P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\, P(\blacksquare)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)} $$
With the class vector (S1−, S2−, S3−, S1+, S2+, S3+) = (0, 0, 0.04, 0.48, 0.16, 0.32) at t = 3:
$$ P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89 $$
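The 2|S| classes and the Bayes step can be sketched by storing the class vector as [S1−, …, Sn−, S1+, …, Sn+]. The function names are hypothetical, time steps are assumed to be integers starting at t = 0, and a world switches to the "+" class when it enters a window state at a time inside T.

```python
def propagate_classes(M, start, window, ts, te, steps):
    """Class vector [S1-, ..., Sn-, S1+, ..., Sn+] after `steps` transitions.
    Si-: in state si, window not yet intersected; Si+: in si, window intersected."""
    n = len(M)
    v = [0.0] * (2 * n)
    for i, p in enumerate(start):
        v[i + n * (ts <= 0 <= te and i in window)] += p
    for t in range(steps):
        in_window = ts <= t + 1 <= te           # is the target time inside T?
        nxt = [0.0] * (2 * n)
        for i in range(n):
            for j in range(n):
                # not-yet-hit worlds switch to "+" when they enter a window state during T
                nxt[j + n * (in_window and j in window)] += v[i] * M[i][j]
                nxt[n + j] += v[n + i] * M[i][j]   # hit worlds stay hit
        v = nxt
    return v

def prob_hit_given_state(v, state):
    """Bayes: P(hit | observed in `state`) = P(hit and state) / P(state)."""
    n = len(v) // 2
    return v[n + state] / (v[state] + v[n + state])
```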
108
Summary
› Pros
  – Allows answering time-parametrized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: index over the (time, location) space; label P_oid]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
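One common way to draw samples that are consistent with a later observation is to reweight each transition row with backward probabilities (the probability of still reaching the observed state). This forward-backward sampling sketch is an assumption about how such an adaptation can look, not necessarily the exact method of the cited paper; it reuses the three-state example, and all names are illustrative.

```python
import random

def backward_probs(M, obs_state, steps):
    """beta[t][i] = Pr(in obs_state at time `steps` | in state i at time t)."""
    n = len(M)
    beta = [[0.0] * n for _ in range(steps + 1)]
    beta[steps][obs_state] = 1.0
    for t in range(steps - 1, -1, -1):
        beta[t] = [sum(M[i][j] * beta[t + 1][j] for j in range(n)) for i in range(n)]
    return beta

def sample_path(M, beta, start_state, rng):
    """Forward sampling with the adapted rows M[i][j] * beta[t+1][j] / beta[t][i]."""
    path = [start_state]
    for t in range(len(beta) - 1):
        i = path[-1]
        weights = [M[i][j] * beta[t + 1][j] for j in range(len(M))]
        path.append(rng.choices(range(len(M)), weights=weights)[0])
    return path

def estimate_window(M, start_state, obs_state, steps, window, ts, te, n_samples=20_000, seed=1):
    """Monte-Carlo estimate of Pr(window hit during [ts, te] | both observations)."""
    beta = backward_probs(M, obs_state, steps)
    if beta[0][start_state] == 0.0:
        raise ValueError("observations are inconsistent with the model")
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        path = sample_path(M, beta, start_state, rng)
        hits += any(path[t] in window for t in range(ts, te + 1))
    return hits / n_samples
```

Every sampled path ends in the observed state, and the estimate converges to the exact value 0.32 / 0.36 ≈ 0.89 obtained with the class vectors.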
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of sub-events, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
35
Equivalent Worlds: An Intuition
Observation 3
We only require the number of objects in the query region; information about the concrete result objects can be discarded.
[Figure: query region q and objects C, D, H, I]
36
[Figure: query region q and objects A, B, C annotated with the probabilities 0.8, 0.2 and 0.4]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
37
[Figure: the same setting with the objects A, B, C replaced by x]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects, i.e., substitute A, B, C by x
39
Generating Functions
Each monomial c · x^k implies that the probability of having exactly k results equals c.
40
Generating Functions: Formally
For each object oi, let pi denote the probability that oi is inside the query region. Consider the following generating function [2]:
$$ F(x) = \prod_{i} \bigl( (1 - p_i) + p_i\, x \bigr) $$
In the expanded polynomial, the coefficient of the monomial x^k equals the probability that exactly k objects are inside the query region.
[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
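Expanding the generating function is a sequence of multiplications by linear polynomials, which takes O(n²) time for n objects. A minimal sketch, assuming independent objects with known in-region probabilities p_i; `count_distribution` is an illustrative name.

```python
def count_distribution(probs):
    """Coefficients of prod_i ((1 - p_i) + p_i x): entry k = Pr(exactly k objects in the region)."""
    coeffs = [1.0]                              # the empty product
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1 - p)               # object outside: exponent unchanged
            nxt[k + 1] += c * p                 # object inside: exponent + 1
        coeffs = nxt
    return coeffs
```

For the probabilities 0.8, 0.2 and 0.4, `count_distribution([0.8, 0.2, 0.4])` gives approximately [0.096, 0.472, 0.368, 0.064].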
41
Count Queries on Uncertain Data
[Figure: query region q and objects A, B, C annotated with the probabilities 0.8, 0.2 and 0.4]
Example:
$$ F(x) = (0.2 + 0.8x)(0.8 + 0.2x)(0.6 + 0.4x) = (0.16 + 0.68x + 0.16x^2)(0.6 + 0.4x) = 0.096 + 0.472x + 0.368x^2 + 0.064x^3 $$
45
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time.
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.
III. The probability of a class can be computed in polynomial time.
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and unbiased
Evaluate the query predicate on each world
The distribution of the sampled results is an unbiased approximation of the true distribution of results
48
Sampling Example
[Figure: query region q and objects A, B, C annotated with the probabilities 0.8, 0.2 and 0.4]
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimations
49
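Sampling Confidences
[Figure: query region q and objects A, B, C annotated with the probabilities 0.8, 0.2 and 0.4]
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of the estimators, e.g., the Wald test:
$$ \hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{|S|}} $$
where z_{1−α/2} is the (1 − α/2) percentile of the standard normal distribution.
At significance level α, the true probability lies in the interval [0.442, 0.638] (the true probability is marked in the figure).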
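Sampling possible worlds and attaching a Wald interval takes only a few lines. The sketch assumes independent objects with the probabilities 0.8, 0.2 and 0.4; the estimate p̂ = 0.54 from 100 worlds and z = 1.96 (α = 0.05) used below are hypothetical inputs, chosen because they reproduce the interval [0.442, 0.638] quoted above.

```python
import math
import random

def sample_counts(probs, n_worlds, seed=0):
    """Estimate Pr(exactly k objects in the region) from n_worlds sampled possible worlds."""
    rng = random.Random(seed)
    counts = [0] * (len(probs) + 1)
    for _ in range(n_worlds):
        k = sum(1 for p in probs if rng.random() < p)   # materialize one possible world
        counts[k] += 1
    return [c / n_worlds for c in counts]

def wald_interval(p_hat, n, z=1.96):
    """Wald interval p_hat +/- z * sqrt(p_hat (1 - p_hat) / n)."""
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half
```

Here `wald_interval(0.54, 100)` gives approximately (0.442, 0.638), an interval that contains the exact probability 0.472 of exactly one result.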
50
Uncertain Spatial Data Management: Summary
› Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
› Data cleaning: "best guess" answers; unreliable results; biased results
› Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
› Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip side
53
Model of a trajectory
[Figure: a 3D trajectory in (X, Y, Time) space up to the present time, and its projection onto the (X, Y) plane, the 2D route]
The approximation does not capture acceleration/deceleration.
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes.
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
54
Mobility Models
› Sequence of (location, time) updates (e.g., GPS-based)
› Electronic maps + traffic distribution patterns + set of points (to be visited) ⇒ full trajectory
› (location, time, velocity vector) updates
55
Modeling Uncertainty: Location Uncertainty Models
› Various sources' imprecision (GPS, sensors)
› Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (a.k.a. space-time prisms)
  • Assumes exact location samples
  • Unknown motion, but a bounded maximum speed between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
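Membership in a bead (space-time prism) reduces to two reachability checks. This sketch assumes exact samples, a known speed bound v_max and free-space Euclidean motion; `in_bead` is a hypothetical name.

```python
import math

def in_bead(p1, t1, p2, t2, v_max, p, t):
    """True if location p at time t lies in the bead between the samples (p1, t1) and (p2, t2):
    p is reachable from p1 by time t, and p2 is still reachable from p by time t2."""
    if not t1 <= t <= t2:
        return False
    return (math.dist(p1, p) <= v_max * (t - t1)
            and math.dist(p, p2) <= v_max * (t2 - t))
```

The set of all such (p, t) between two samples is the bead; chaining the beads of consecutive samples yields the necklace.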
57
Modeling Uncertain Trajectories: Model 2
Constraints: motion along the road network; uncertainty restricted to road segments
[Figure: road segments with the samples p1 and p2, a point q, and the time instants t1, t, t2]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- the impact of uncertainty on the (structure of the) answer
- the impact of constraints, e.g., possible routes to be taken
Processing algorithms
- the impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Q_nn(q) Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tbte]
A-nn(q) [(Tri1 [tbt1]) (Tri2[t1t2]) hellip (Trim[tm-1te])]
T = tb
T = te
X
Y
T = t1
Trq Tr1 Tr2Tr3
Given a collection of moving objects trajectories Tr1 Tr2 hellip TrN
Example assuming that Trq is the querying trajectory between [tbte] then- Tr1 (black) is the NN throughout [tbt1]- Tr2 (red) is the NN throughout [t1te]
Observation The answer to the query is time-paremeterized
Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002
62
Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant
UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]
T = te
X
YT = tb
T = t1
Trq Tr1 Tr2Tr3
T = tb1
T = t11
Observations
adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability
= what should the structure ofthe answer to UQ_nn(q) look like
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-basedProbabilistic Answer to a Continuous NN query) tree
Tri11 tb t11 D11
Tri12l t12(l-1) t12 D12k12
[TrQ tb te]
Tri1 tb t1 D1 Tri2 t1 t2 D2 Trim t(m-1) te Dm
Tri12 t11 t12 D12 Tri1k1 t1(k-1) t11 D1k1 Trik1 t(k-1) t21 Dm1 Trim1 t(m1-1) te Dmkm
Tri121 t11 t121 D121
Root parameters of the query-Querying trajectory Trq - Time-interval of interest [tbte]
Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)
Rmin gt Rmax
Rmin
Rmax
Q
Tr1
Tr2
Tr3
Tr4
Tr5
Rd1
1
14
Trq = 2D point Q The rest are possible-locations bounded by circles
Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN
The probability that the location of Tri is within distance Rd from Q is given by
The probability that a given object Trj is the NN of Q is given by
ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)
65
When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example
Rmin
Rmax
TrQ
Tr1
Tr2
Tr3
Tr4
Tr5
Z1
Z2
dist(Z1Z2) lt Rmax
We can no longer ldquoprunerdquo Tr4 from consideration
The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq
NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip
66
1
x
y
0
2r
1r2
pdf(Tr2)
pdf(Trq) pdf(Tr1)
Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for
DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq
67
1
x
y
0-40
+40
4r
34r2
pdf(Tr2 - Trq)
pdf(Tr1 - Trq)
Let Viq = Vi + (-Vq)
Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin at a function of t is
hyperbola since A gt 0
Now we have a collection of such distance functions (for each i)
69
The continuous aspect of the probabilistic NN queries
Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)
Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points
dist
TR1
TR2
LE12
tb t11 t12
Example
Lower envelope of two distance-trajectories
critical time-point
NOTE time-dimension on horizontal axis
70
The continuous aspect of the probabilistic NN queries
Q how to efficiently construct the lower envelope of the whole collection of distance-functions
A divide-and-conquer approach in the spirit of Merge-Sortdist
TR1
TR2
LE12
tb t11 t12
dist
TR3
TR4
LE34
tb tet31dist
TR1
TR2
LE1234
tb tet11 t12
TR3
TR4
t31
t1new t2new
Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)
Now how about the IPAC-NN tree
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially). The threshold is 4r because each relative location lies within 2r of its expected position, so such an object is always at least as far from Trq as the object defining the envelope.
Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR1 to TR7 over [tb, te] with critical times t1, t2, t3; the lower envelope, a band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
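A sampled approximation of this pruning criterion is sketched below. It assumes the distance functions are available as Python callables and checks only a finite set of time instants; an exact version would evaluate the envelope at its critical time-points instead.

```python
def prune_candidates(dist_funcs, r, times):
    """Indices of trajectories that may be NN at some sampled time.

    dist_funcs: list of callables t -> expected distance d_iq(t).
    A trajectory survives if, at some sampled t, it lies within 4r of the lower envelope.
    """
    keep = set()
    for t in times:
        values = [f(t) for f in dist_funcs]
        envelope = min(values)
        keep.update(i for i, v in enumerate(values) if v <= envelope + 4 * r)
    return sorted(keep)

r = 0.5
funcs = [lambda t: 1.0, lambda t: 2.5, lambda t: 4.0, lambda t: 6.0 - t]   # hypothetical
times = [k * 0.5 for k in range(11)]      # 0.0, 0.5, ..., 5.0
print(prune_candidates(funcs, r, times))  # [0, 1, 3]
```

The third function stays 3.0 above the envelope (more than 4r = 2.0) at every sampled time and is pruned, while the fourth approaches the envelope near t = 5.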
Given two trajectory samples $(t_i, p_i)$ and $(t_{i+1}, p_{i+1})$ of a moving object $a$ on a road network, the set of possible paths $PP_i(a)$ between $t_i$ and $t_{i+1}$ consists of all paths along the routes (sequences of edges) that connect $p_i$ and $p_{i+1}$ and whose minimum time cost is not greater than $t_{i+1} - t_i$, i.e., $PP_i(a) = \{P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ T_{\min}(P) \le t_{i+1} - t_i\}$.
Example: if the pdf of selecting a possible path is uniform, each $P \in PP_i(a)$ is taken with probability $1 / |PP_i(a)|$.
Given a path $P_j \in PP_i(a)$, the possible locations $PL_j(a, t)$ of the moving object $a$ with respect to $P_j$ at $t \in [t_i, t_{i+1}]$ are the set of all positions $p \in P_j$ from which $a$ can reach $p_i$ (respectively $p_{i+1}$) within the time period $t - t_i$ (respectively $t_{i+1} - t$).
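Enumerating $PP_i(a)$ on a small graph can be sketched with a depth-first search that abandons a partial path as soon as its minimum travel time exceeds $t_{i+1} - t_i$. The graph, node names and edge times below are hypothetical, and travel times are assumed non-negative.

```python
def possible_paths(graph, src, dst, budget):
    """Simple paths src -> dst whose minimum travel time is <= budget.

    graph: dict node -> {neighbour: minimum travel time of the edge}.
    Returns a list of (path, minimum_time) pairs.
    """
    found = []

    def dfs(node, path, cost):
        if cost > budget:
            return                      # non-negative costs: the path can only get slower
        if node == dst:
            found.append((path, cost))
            return
        for nxt, w in graph[node].items():
            if nxt not in path:         # keep paths simple (no revisits)
                dfs(nxt, path + [nxt], cost + w)

    dfs(src, [src], 0.0)
    return found

# Hypothetical network between two samples taken 6 time units apart
graph = {
    "p_i": {"v3": 2.0, "v5": 3.0},
    "v3": {"p_i": 2.0, "p_next": 3.0},
    "v5": {"p_i": 3.0, "p_next": 3.0},
    "p_next": {},
}
paths = possible_paths(graph, "p_i", "p_next", budget=6.0)
uniform = {tuple(p): 1 / len(paths) for p, _ in paths}   # uniform pdf over possible paths
print(uniform)
```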
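Combine the two probabilities…
[Figure: possible paths and possible locations of object a when t = 4; m marks a point on a possible path]
Probability of falling inside segment $m p_1$: 0.5 × 0.5 = 0.25 (at t = 4)
Probability of an object $a$ falling inside some segment $S$ at a given time instant $t$:
$\Pr(a(t) \in S) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr\big(a(t) \in S \cap PL_P(a, t)\big)$, e.g., with a uniform pdf over the possible locations.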
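Under a uniform pdf over the possible locations, the inner probability is simply the fraction of $PL_P(a, t)$ covered by $S$. The sketch below assumes each path's possible locations and the segment $S$ are given as intervals of a 1D coordinate along that path; the numbers are hypothetical and only reproduce the 0.5 × 0.5 = 0.25 pattern.

```python
def prob_in_region(cases):
    """Pr(a(t) in S) = sum over paths of Pr(P) * Pr(a(t) in S | P), uniform pdf on PL_P.

    cases: list of (path_probability, (pl_lo, pl_hi), segment) where (pl_lo, pl_hi) is the
    possible-location interval on that path and segment is (s_lo, s_hi), or None if S
    does not lie on the path.
    """
    total = 0.0
    for p_path, (lo, hi), seg in cases:
        if seg is None:
            continue
        if hi > lo:
            overlap = max(0.0, min(hi, seg[1]) - max(lo, seg[0]))
            total += p_path * overlap / (hi - lo)
        elif seg[0] <= lo <= seg[1]:
            total += p_path          # degenerate interval: the location is a single point
    return total

# Two equally likely paths; S covers half of the possible locations on the first one
cases = [(0.5, (0.0, 4.0), (2.0, 4.0)), (0.5, (1.0, 3.0), None)]
print(prob_in_region(cases))   # 0.25
```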
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and their probabilities)
  – The probabilistic location function
Example: observe the two possible sequences of going from position $p_i$ to position $p_{i+1}$. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
Processing
› Data structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D for each edge
  – Trajectories list
    › Stores samples ordered by time-stamp
    › Assigns a pointer to the set of possible paths
    › Earliest-arrival and latest-departure times stored for vertices
    › On-disk, retrieved on demand
› Network expansion
  – Expansion tree: a tree rooted at q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search for leaf entries whose maximum time interval intersects $t_q$
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" from q along road-network segments (vertices A to E), annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
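The network-expansion step amounts to a Dijkstra search from q that stops at network distance r. A minimal sketch on a hypothetical undirected graph (both directions listed): it returns the vertices reached within distance r and every edge touching one of them, since such an edge lies at least partly within range.

```python
import heapq

def network_expansion(graph, q, r):
    """Bounded Dijkstra from q: vertices with network distance < r, and edges touching them.

    graph: dict vertex -> {neighbour: edge length}; list both directions for undirected roads.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                            # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    # every edge leaving a reached vertex lies at least partly within range r
    edges = sorted({tuple(sorted((u, v))) for u in dist for v in graph[u]})
    return dist, edges

graph = {
    "q": {"A": 2, "C": 4},
    "A": {"q": 2, "B": 3},
    "B": {"A": 3},
    "C": {"q": 4, "D": 5},
    "D": {"C": 5},
}
dist, edges = network_expansion(graph, "q", r=6)
print(dist)    # D is not reached: its network distance 9 exceeds r
```

The candidate edges would then be used to fetch the per-edge movement R-trees in the filtering step.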
› Temporal-continuous query
  – Only calculate the qualification probabilities (QPs) at a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
  – It is easy to prove that the actual QP is monotonic between consecutive critical time instants
  – The QPs at the critical time instants therefore define an envelope function for the actual QPs
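Because the QP is monotonic between consecutive critical instants, the values at the two surrounding critical instants bound the QP at any time in between. A small sketch, assuming the critical instants and their QPs are already known (the values below are hypothetical):

```python
from bisect import bisect_right

def qp_bounds(critical, t):
    """Lower/upper bound of the QP at time t from (time, QP) pairs sorted by time."""
    times = [c[0] for c in critical]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t outside the critical time range")
    k = bisect_right(times, t)
    if k == len(times):               # t equals the last critical instant
        return critical[-1][1], critical[-1][1]
    a, b = critical[k - 1][1], critical[k][1]
    return min(a, b), max(a, b)       # monotonic in between: QP lies between endpoint values

critical = [(0.0, 0.1), (2.0, 0.5), (5.0, 0.3)]
print(qp_bounds(critical, 3.0))   # (0.3, 0.5)
```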
Uncertainty – the flip-side
› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up the Douglas-Peucker Line-Simplification Algorithm", SSHD 1997
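For reference, the classic recursive Douglas-Peucker simplification (not the faster variant from the cited paper) keeps a point only if it deviates from the current approximating segment by more than the acceptable error `eps`. A sketch for 2D points:

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # projection parameter, clamped to the segment
    u = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))

def douglas_peucker(points, eps):
    """Simplify a polyline so that no dropped point is farther than eps from the result."""
    if len(points) < 3:
        return list(points)
    dmax, idx = -1.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= eps:
        return [points[0], points[-1]]          # the whole run is within tolerance
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right                    # avoid duplicating the split point

print(douglas_peucker([(0, 0), (1, 0), (2, 5), (3, 0), (4, 0)], 1.0))
# [(0, 0), (2, 5), (4, 0)]
```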
Uncertainty – the other flip-side
› What are the important properties that should not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just another "Z" axis: one cannot travel back in time
  – How long may the data at the sink remain "un-fresh"?
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
So far…
Stochastic methods for uncertain spatial data: probabilistic results (yes), time dimension (no)
vs.
Geometric models for uncertain spatio-temporal data: probabilistic results (no), time dimension (yes)
Merge the approaches
› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443
A highway example…
[Figure: position (pos) of a car on a highway over time (t), a spatio-temporal query window Q, and the possible positions bounded by the speeds vmin and vmax]
› What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution of positions between the vmin and vmax bounds, the car is inside Q with probability 50% at one time instant and 30% at another
› Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
  – This violates the speed constraints: the two time instants are treated as unrelated, although the car's position at one instant constrains its position at the other
› Consideration of dependency: = 0.5
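The gap between 0.65 and 0.5 can be reproduced with a toy model in which one unknown constant speed determines the car's position at both instants, so both events are evaluated in the same possible world. The ten equally likely speeds and the sets below are hypothetical and chosen only to match the 50% and 30% above.

```python
from fractions import Fraction

# Hypothetical world model: one unknown constant speed, ten equally likely values 0..9
p_speed = Fraction(1, 10)
in_q_first = {0, 1, 2, 3, 4}   # speeds placing the car inside Q at the first instant (50%)
in_q_second = {2, 3, 4}        # speeds placing the car inside Q at the second instant (30%)

p1 = p_speed * len(in_q_first)
p2 = p_speed * len(in_q_second)

# Independence assumption: treats the two instants as unrelated random events
independent = 1 - (1 - p1) * (1 - p2)
# Dependency: the same speed (possible world) determines both positions
dependent = p_speed * len(in_q_first | in_q_second)

print(float(independent), float(dependent))   # 0.65 0.5
```

Exact fractions avoid floating-point noise, so the two results can be compared directly.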
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
A simple example
› Whenever the wooden board is hit, the ball stays in its hole or drops into one of the neighbouring holes with certain probabilities (inner holes: 0.4 to each neighbour, 0.2 to stay)
› At the border of the wooden board these probabilities are different (0.6 towards the inner neighbour, 0.4 to stay)
› This model is usually learned or given by experts
› Initial position: the ball is in the third hole
› After the first hit: 0.4, 0.2, 0.4 (second, third and fourth hole)
› After the second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (first to fifth hole)
› After the 40th hit: approximately 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08
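How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$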
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
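The propagation above is a sequence of vector-matrix products. A dependency-free sketch that builds M and applies it hit by hit; it also checks that (0.08, 0.12, …, 0.12, 0.08) is a fixed point of M, i.e., the stationary distribution the chain approaches after many hits.

```python
def transition_matrix(n=9):
    """Board example: row = from bucket, column = to bucket."""
    m = [[0.0] * n for _ in range(n)]
    m[0][0], m[0][1] = 0.4, 0.6                  # left border
    m[n - 1][n - 2], m[n - 1][n - 1] = 0.6, 0.4  # right border
    for i in range(1, n - 1):                    # inner holes
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m

def step(dist, m):
    """One hit: the row vector dist times the matrix m."""
    n = len(dist)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]

M = transition_matrix()
start = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # ball in the third hole
after1 = step(start, M)
after2 = step(after1, M)
dist = start
for _ in range(40):
    dist = step(dist, M)

stationary = [0.08] + [0.12] * 7 + [0.08]
print([round(x, 2) for x in after2])   # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
print(max(abs(a - b) for a, b in zip(step(stationary, M), stationary)) < 1e-12)   # True
```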
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
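One straightforward way to learn the transition probabilities, sketched here as an assumption rather than as the authors' procedure, is to count observed transitions per tick in historical state sequences and normalize each row (maximum-likelihood estimation). Storing only observed transitions keeps the matrix sparse.

```python
from collections import Counter, defaultdict

def estimate_transitions(sequences):
    """Maximum-likelihood transition probabilities from state sequences (one state per tick).

    Returns a sparse mapping from_state -> {to_state: probability}; unseen
    transitions are simply absent (probability 0).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: c / total for b, c in row.items()}
    return probs

# Hypothetical historical data, already mapped to states at a fixed tick length
history = [["s1", "s3", "s2", "s1"], ["s3", "s3", "s2"]]
probs = estimate_transitions(history)
print(probs["s3"])   # s2 with probability 2/3, s3 with probability 1/3
```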
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2, s3 given by M, unrolled over t = 0, 1, 2, 3]
Note: we have an exponential number of possible paths the car might take.
› Starting in s1 at t = 0, the state distribution is (1, 0, 0); at t = 1 it is (0, 0, 1); at t = 2 it is (0, 0.8, 0.2); plain propagation gives (0.48, 0.16, 0.36) at t = 3
› Summing the window mass of both time instants would count worlds twice: the 0.8 in s2 at t = 2 has already hit the window, so only the remaining 0.2 in s3 is propagated further, giving (0, 0.16, 0.04) at t = 3
› Result = 0.8 + 0.16 = 0.96
› A solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
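ST-Window Queries
States s1, s2, s3 plus an absorbing state s4 for the trajectories that have already hit the window. $M^-$ is used for transitions into time instants outside T, $M^+$ for transitions into time instants inside T (it redirects every transition into s1 or s2 to s4):
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$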
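The two-matrix construction can be written generically for any transition matrix and set of window states. In this sketch (names illustrative), state indices are 0-based, so s1 and s2 are `{0, 1}`, and the extra absorbing state is appended as the last index.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def window_probability(M, start, window_states, T):
    """P(object visits a state in window_states at some tick t in T = (t_start, t_end))."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    # M^-: original transitions, plus the absorbing 'hit' state
    M_minus = [row + [0.0] for row in M] + [absorbing]
    # M^+: transitions into window states are redirected to the hit state
    M_plus = [[0.0 if j in window_states else p for j, p in enumerate(row)]
              + [sum(row[j] for j in window_states)] for row in M] + [absorbing]
    t_start, t_end = T
    v = list(start) + [0.0]
    if t_start <= 0:                      # already inside the window at t = 0
        v[n] = sum(v[j] for j in window_states)
        for j in window_states:
            v[j] = 0.0
    for t in range(1, t_end + 1):
        v = vec_mat(v, M_plus if t >= t_start else M_minus)
    return v[n], v

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
p_hit, final = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, (2, 3))
print(round(p_hit, 2))   # 0.96
```

Only matrix-vector products are needed, which is why the approach scales with sparse matrices.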
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques
[Figure: location space over time with a single observation at t0 (left) and with two observations at t0 and t1 (right)]
Multiple Observations
› We need to track where true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
  – One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
Multiple Observations
State vectors over the classes $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$ (-: window not yet intersected, +: window intersected):
t = 0: (1, 0, 0, 0, 0, 0); t = 1: (0, 0, 1, 0, 0, 0); t = 2: (0, 0, 0.2, 0, 0.8, 0); t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \quad M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$
› Now, what is the probability that the trajectory passed the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' theorem:
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare) \cdot P(\blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
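The construction with 2|S| classes can be sketched as follows (illustrative names; indices 0 to n-1 are the $S_i^-$ classes and n to 2n-1 the $S_i^+$ classes). Conditioning on the second observation then only needs the two entries for $s_3$.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def split_matrices(M, window_states):
    """M^- and M^+ over 2|S| classes: indices 0..n-1 are S_i^-, n..2n-1 are S_i^+."""
    n = len(M)
    zero = [0.0] * n
    M_minus = [row + zero for row in M] + [zero + row for row in M]
    M_plus = []
    for row in M:   # from S_i^-: moving into a window state switches to the + class
        M_plus.append([0.0 if j in window_states else p for j, p in enumerate(row)]
                      + [p if j in window_states else 0.0 for j, p in enumerate(row)])
    for row in M:   # from S_i^+: the window has been intersected, stay in the + class
        M_plus.append(zero + row)
    return M_minus, M_plus

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
M_minus, M_plus = split_matrices(M, {0, 1})        # window states s1, s2
v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]                 # first observation: s1 at t = 0
for t in (1, 2, 3):
    v = vec_mat(v, M_plus if t in (2, 3) else M_minus)   # T = [2, 3]

# Second observation: o is seen in s3 at t = 3 (S_3^- at index 2, S_3^+ at index 5)
p_hit_given_s3 = v[5] / (v[2] + v[5])
print(round(p_hit_given_s3, 2))   # 0.89
```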
Summary
› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – The mapping from time to ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: uncertain object o in the (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
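A plain Monte-Carlo estimate for the window query on states s1, s2, s3 with T = [2, 3] can be sketched by sampling state sequences from M. This sketch conditions only on the start state; it does not include the adaptation of the transition matrices needed to make samples conform with later observations.

```python
import random

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]

def sample_trajectory(M, start_state, steps, rng):
    """Draw one possible world: a state sequence of length steps + 1."""
    states = [start_state]
    for _ in range(steps):
        row = M[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states

def mc_window_probability(M, start_state, window_states, T, n_samples, seed=0):
    """Fraction of sampled worlds that visit window_states at some tick in T."""
    rng = random.Random(seed)
    t_start, t_end = T
    hits = 0
    for _ in range(n_samples):
        traj = sample_trajectory(M, start_state, t_end, rng)
        if any(traj[t] in window_states for t in range(t_start, t_end + 1)):
            hits += 1
    return hits / n_samples

est = mc_window_probability(M, 0, {0, 1}, (2, 3), n_samples=20000)
print(est)   # close to the exact value 0.96 obtained with matrix multiplications
```

The estimate's standard error shrinks with the square root of the number of samples, which is the usual trade-off against the exact matrix approach.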
Event Queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of sub-events, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
Generating Functions
[Figure: query object q and query region; three objects A, B and C, annotated with the probabilities 0.8, 0.2 and 0.4 of lying inside the region]
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects, i.e., substitute A, B, C by x
Each monomial $c_k x^k$ implies that the probability of having exactly $k$ results equals $c_k$
Generating Functions Formally
For each object $o_i$, let $p_i$ be the probability that $o_i$ is inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_{i} \big( (1 - p_i) + p_i \, x \big)$$
In the expanded polynomial, the coefficient of monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.
[2] Jian Li, Barna Saha, and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
Count Queries on Uncertain Data
Example: for the three objects with probabilities 0.8, 0.2 and 0.4,
$$F(x) = (0.2 + 0.8x)(0.8 + 0.2x)(0.6 + 0.4x) = (0.16 + 0.68x + 0.16x^2)(0.6 + 0.4x) = 0.096 + 0.472x + 0.368x^2 + 0.064x^3$$
i.e., exactly 0, 1, 2 or 3 objects are inside the query region with probabilities 0.096, 0.472, 0.368 and 0.064.
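The expansion can be computed by multiplying in one linear factor at a time, which takes O(N²) arithmetic operations for N objects instead of enumerating all 2^N possible worlds. A sketch for the three objects with probabilities 0.8, 0.2 and 0.4:

```python
def count_distribution(probs):
    """Coefficients of prod_i ((1 - p_i) + p_i x): entry k = P(exactly k objects in region)."""
    coeffs = [1.0]                        # the empty product is the polynomial 1
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)       # object outside the region: exponent unchanged
            nxt[k + 1] += c * p           # object inside the region: multiply by x
        coeffs = nxt
    return coeffs

dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])   # [0.096, 0.472, 0.368, 0.064]
```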
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
Approximated Results: Sampling
› Materialize a set S of possible worlds
› Samples are drawn independently and unbiased
› Evaluate the query predicate on each world
› The distribution of sampled results is an unbiased approximation of the true distribution of results
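Sampling Example
[Figure: the same query region and objects A, B and C with probabilities 0.8, 0.2 and 0.4]
› Drawing 100 possible worlds yields estimators of the result probabilities, which can be compared to the exact probabilities
› There is no indication of the reliability or confidence of the estimations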
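Drawing possible worlds for this example is straightforward because the objects are independent. A sketch (seed and sample sizes are arbitrary): with only 100 worlds the estimates vary noticeably between runs, while many worlds bring them close to the exact values 0.096, 0.472, 0.368 and 0.064.

```python
import random
from collections import Counter

def sample_count_distribution(probs, n_worlds, seed=0):
    """Estimate P(exactly k objects in the region) from n_worlds sampled possible worlds."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_worlds):
        # one possible world: each object is inside independently with probability p
        k = sum(rng.random() < p for p in probs)
        counts[k] += 1
    return [counts[k] / n_worlds for k in range(len(probs) + 1)]

probs = [0.8, 0.2, 0.4]
print(sample_count_distribution(probs, 100))      # noisy estimates from 100 worlds
print(sample_count_distribution(probs, 200000))   # close to the exact distribution
```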
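Sampling Confidences
[Figure: the same query region and objects A, B and C]
› Drawing 100 possible worlds yields estimators $\hat{p}$ of the result probabilities
› Use statistical methods to assess the quality of estimators, e.g., the Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
where $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$ percentile of the standard normal distribution
› At a significance level of α = 0.05, the true probability lies in the interval [0.442, 0.638]
[Figure: the confidence interval compared with the true probability]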
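The interval above can be reproduced with the Wald formula. The estimate $\hat{p} = 0.54$ below is inferred from the midpoint of [0.442, 0.638] and is an assumption; `statistics.NormalDist` from the Python standard library supplies the normal percentile.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(p_hat, n, alpha=0.05):
    """Wald confidence interval p_hat +/- z_(1 - alpha/2) * sqrt(p_hat (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)    # (1 - alpha/2) percentile of N(0, 1)
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

lo, hi = wald_interval(0.54, 100)       # estimate from 100 sampled worlds (assumed value)
print(round(lo, 3), round(hi, 3))       # 0.442 0.638
```

The Wald interval is the simplest choice; it becomes unreliable for estimates close to 0 or 1 or for very small samples.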
Uncertain Spatial Data Management: Summary
› Motivation: flood of geo-spatial data, enriched with additional contexts (text, social, multimedia), with inherent uncertainty
› Data cleaning: "best guess" answers, unreliable results, biased results
› Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries (example: generating functions)
› Approximations: Monte-Carlo sampling, probabilistic guarantees
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip-side
Model of a Trajectory
[Figure: 3D trajectory in (X, Y, Time) space up to the present time, and its projection onto the (X, Y) plane as a 2D route]
› The approximation does not capture acceleration/deceleration
› Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
Mobility Models
› Sequence of (location, time) updates (e.g., GPS-based)
› Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory
› (location, time, velocity vector) updates
Modeling Uncertainty: Location Uncertainty Models
› Various sources' imprecision (GPS, sensors)
› Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in-between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
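With exact samples and a bound vmax on the speed, a bead is the set of (x, y, t) points reachable from the first sample and from which the second sample is still reachable in time. A minimal membership test under these assumptions (2D locations, Euclidean distance, free-space motion):

```python
import math

def in_bead(p, sample1, sample2, vmax):
    """Is p = (x, y, t) inside the bead (space-time prism) spanned by two exact samples?

    sample1, sample2: (x, y, t) with t1 < t2; vmax: maximum speed between the samples.
    """
    (x, y, t), (x1, y1, t1), (x2, y2, t2) = p, sample1, sample2
    if not t1 <= t <= t2:
        return False
    reachable_from_first = math.hypot(x - x1, y - y1) <= vmax * (t - t1)
    can_reach_second = math.hypot(x2 - x, y2 - y) <= vmax * (t2 - t)
    return reachable_from_first and can_reach_second

s1, s2 = (0.0, 0.0, 0.0), (10.0, 0.0, 10.0)
print(in_bead((5.0, 6.0, 5.0), s1, s2, 2.0))   # True: both distances are about 7.8 <= 10
```

A necklace is then simply a sequence of such beads between consecutive samples.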
Modeling Uncertain Trajectories: Model 2
› Constraints: motion along the road network
› Uncertainty restricted to road segments
[Figure: samples p1 at t1 and p2 at t2 on a road network, query q, time t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
› Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
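In this model the expected location at any instant is interpolated from the samples, and the true location lies within a disk of constant radius r around it. The sketch below assumes linear interpolation between (x, y, t) samples; the "possibly"/"definitely" labels for a disk-shaped query region follow from the constant bound (the uncertainty disk overlaps the query disk versus lies entirely inside it).

```python
import math
from bisect import bisect_right

def expected_location(samples, t):
    """Linear interpolation of (x, y, t) samples sorted by time (at least two samples)."""
    times = [s[2] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t outside the trajectory's time span")
    k = min(bisect_right(times, t), len(samples) - 1)
    (x0, y0, t0), (x1, y1, t1) = samples[k - 1], samples[k]
    w = (t - t0) / (t1 - t0)
    return x0 + w * (x1 - x0), y0 + w * (y1 - y0)

def classify_in_range(samples, r, t, centre, radius):
    """'definitely', 'possibly' or 'no' for a disk query (centre, radius) at time t."""
    x, y = expected_location(samples, t)
    d = math.hypot(x - centre[0], y - centre[1])
    if d + r <= radius:
        return "definitely"      # the whole uncertainty disk lies inside the query disk
    if d - r <= radius:
        return "possibly"        # the two disks overlap
    return "no"

samples = [(0.0, 0.0, 0.0), (10.0, 0.0, 10.0)]
print(classify_in_range(samples, 1.0, 5.0, (8.0, 0.0), 2.0))   # possibly
```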
Processing Spatio-Temporal Queries for Uncertain Trajectories
› Need to jointly consider
  – Syntax: the impact of uncertainty on the (structure of the) answer, and the impact of constraints (e.g., possible routes to be taken)
  – Processing algorithms: the impact of the motion + uncertainty coupling on filtering/pruning/refinement
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
Continuous NN for Trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:
$Q_{nn}(q)$: retrieve the nearest neighbor of the moving object whose trajectory is Trq during $[t_b, t_e]$
$A_{nn}(q) = [(Tr_{i_1}, [t_b, t_1]), (Tr_{i_2}, [t_1, t_2]), \ldots, (Tr_{i_m}, [t_{m-1}, t_e])]$
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) space, marked at T = tb, T = t1 and T = te]
Example: assuming that Trq is the querying trajectory during [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some location uncertainty associated with its location at each time instant, consider the variant
$UQ_{nn}(q)$: retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: the previous example with uncertainty added; trajectories Trq, Tr1, Tr2, Tr3 marked at T = tb, tb1, t11, t1 and te]
Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only object with a non-zero NN probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN probability
- Specifically, at t11 all three trajectories have a non-zero NN probability
=> What should the structure of the answer to $UQ_{nn}(q)$ look like?
Structure of the Answer to Probabilistic NN Queries for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: IPAC-NN tree with root [TrQ, tb, te]; children (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); each child has its own children over sub-intervals, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]
Root: parameters of the query, i.e., the querying trajectory Trq and the time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent
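One possible in-memory representation of this tree is sketched below. The class and field names are hypothetical, and `descriptor` is only a placeholder for the per-node D values in the figure, whose exact content the slide does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class IpacNode:
    trajectory: str                      # e.g. "Tr1"; the root stores the querying trajectory
    t_start: float
    t_end: float
    descriptor: Optional[object] = None  # placeholder for the node's D (not specified here)
    children: List["IpacNode"] = field(default_factory=list)

    def add(self, child: "IpacNode") -> "IpacNode":
        # a child answers for a sub-interval of its parent's interval
        if not (self.t_start <= child.t_start < child.t_end <= self.t_end):
            raise ValueError("child interval must lie within the parent's interval")
        self.children.append(child)
        return child

    def answer_at(self, t: float) -> List[str]:
        """Trajectories of the children whose interval contains t."""
        return [c.trajectory for c in self.children if c.t_start <= t <= c.t_end]

root = IpacNode("Trq", 0.0, 10.0)
first = root.add(IpacNode("Tr1", 0.0, 4.0))
root.add(IpacNode("Tr2", 4.0, 10.0))
first.add(IpacNode("Tr3", 1.0, 4.0))     # best candidate while Tr1 is excluded
print(root.answer_at(2.0), first.answer_at(2.0))   # ['Tr1'] ['Tr3']
```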
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: crisp query point Q and the possible locations of Tr1 to Tr5, bounded by circles; radii Rmin, Rmax and Rd around Q]
Trq = 2D point Q; the rest are possible locations bounded by circles.
Observation:
- Any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN of Q (Rmax being the smallest maximum distance from Q over all objects)
- Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \iint_{\|(x, y) - Q\| \le R_d} pdf_i(x, y) \, dx \, dy$$
The probability that a given object Trj is the NN of Q is given by
$$P^{NN}_j = \int_{R_{min}}^{R_{max}} \frac{d P_j(R_d)}{d R_d} \prod_{i \ne j} \big(1 - P_i(R_d)\big) \, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
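The NN probability described above can be cross-checked numerically. The sketch below replaces the integral with a Monte-Carlo estimate (a swapped-in technique, not the evaluation method of the slides), assuming each object's possible locations follow a uniform pdf on a disk, and applies the Rmax pruning first.

```python
import math
import random

def uniform_in_disk(cx, cy, r, rng):
    """Uniform sample from a disk (sqrt keeps the density uniform over the area)."""
    rho = r * math.sqrt(rng.random())
    phi = 2 * math.pi * rng.random()
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)

def nn_probabilities(q, disks, n_samples=20000, seed=1):
    """Monte-Carlo estimate of P(object i is the NN of the crisp query point q).

    disks: list of (cx, cy, r) uncertainty disks with uniform pdfs.
    Objects whose minimum distance from q exceeds R_max are pruned (probability 0).
    """
    qx, qy = q
    centre_d = [math.hypot(cx - qx, cy - qy) for cx, cy, _ in disks]
    r_max = min(d + r for d, (_, _, r) in zip(centre_d, disks))   # smallest maximum distance
    candidates = [i for i, (d, (_, _, r)) in enumerate(zip(centre_d, disks)) if d - r <= r_max]
    rng = random.Random(seed)
    wins = [0] * len(disks)
    for _ in range(n_samples):
        best, best_d = None, float("inf")
        for i in candidates:          # one possible world for the candidates
            x, y = uniform_in_disk(*disks[i], rng)
            d = math.hypot(x - qx, y - qy)
            if d < best_d:
                best, best_d = i, d
        wins[best] += 1
    return [w / n_samples for w in wins], candidates

probs, cands = nn_probabilities((0.0, 0.0), [(2.0, 0.0, 1.0), (0.0, 2.5, 1.0), (10.0, 0.0, 1.0)])
print(cands)   # [0, 1]: the third object is pruned by the R_max test
```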
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query location TrQ and objects Tr1 to Tr5 with radii Rmin and Rmax; a point Z1 in TrQ's region and a point Z2 in Tr4's region with dist(Z1, Z2) < Rmax]
› We can no longer "prune" Tr4 from consideration
› The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
› NOTE: in effect, this is a quadruple integration instead of the double one (cf. previous slide)…
[Figure: uniform pdfs of the possible locations of Trq, Tr1 and Tr2 over the (x, y) plane]
Example: assuming a uniform pdf of possible locations, with two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.
Main observation: let $V_i$ and $V_q$ denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\|V_i - V_q\| \le R_d)$.
Define $V_{iq} = V_i - V_q$ (cross-correlation) as another random variable, $V_{iq} = V_i + (-V_q)$, which, due to independence, implies that the pdf of $V_{iq}$ is the convolution of the pdfs of $V_i$ and $-V_q$.
[Figure: convolved pdfs pdf(Tr1 - Trq) and pdf(Tr2 - Trq) over the (x, y) plane; for uniform disks of radius r, each is supported on a disk of radius 2r]
Let $V_{iq} = V_i + (-V_q)$.
Property 1: IF pdf($V_i$) has a centroid $C_i$ coinciding with $E(V_i)$ AND pdf($V_q$) has a centroid $C_q$ coinciding with $E(V_q)$, THEN their convolution pdf($V_{iq}$) has a centroid $C_{iq} = C_i + (-C_q)$ coinciding with $E(V_{iq})$.
Property 2: assume that pdf($V_i$) and pdf($V_q$) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf($V_{iq}$) is also rotationally symmetric around its centroid $C_{iq}$.
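Property 1 and the support of the convolved pdf can be checked empirically. The sketch assumes uniform disks of radius r for both $V_i$ and $V_q$ (centres chosen arbitrarily) and samples $V_{iq} = V_i - V_q$ directly: the sample mean should approach $C_i - C_q$, and no sample should lie farther than 2r from it.

```python
import math
import random

def uniform_in_disk(cx, cy, r, rng):
    rho = r * math.sqrt(rng.random())   # sqrt gives a uniform density over the area
    phi = 2 * math.pi * rng.random()
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)

def sample_relative(ci, cq, r, n, seed=7):
    """n samples of V_iq = V_i - V_q, with V_i and V_q independent, uniform on disks of radius r."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        xi, yi = uniform_in_disk(*ci, r, rng)
        xq, yq = uniform_in_disk(*cq, r, rng)
        samples.append((xi - xq, yi - yq))
    return samples

r = 0.5
ci, cq = (3.0, 1.0), (1.0, -1.0)            # hypothetical centroids C_i and C_q
pts = sample_relative(ci, cq, r, 50000)
mean_x = sum(x for x, _ in pts) / len(pts)
mean_y = sum(y for _, y in pts) / len(pts)
expected = (ci[0] - cq[0], ci[1] - cq[1])   # Property 1: C_iq = C_i + (-C_q) = (2, 2)
max_dev = max(math.hypot(x - expected[0], y - expected[1]) for x, y in pts)
print(round(mean_x, 2), round(mean_y, 2), max_dev <= 2 * r + 1e-9)
```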
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin at a function of t is
hyperbola since A gt 0
Now we have a collection of such distance functions (for each i)
69
The continuous aspect of the probabilistic NN queries
Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)
Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points
dist
TR1
TR2
LE12
tb t11 t12
Example
Lower envelope of two distance-trajectories
critical time-point
NOTE time-dimension on horizontal axis
70
The continuous aspect of the probabilistic NN queries
Q how to efficiently construct the lower envelope of the whole collection of distance-functions
A divide-and-conquer approach in the spirit of Merge-Sortdist
TR1
TR2
LE12
tb t11 t12
dist
TR3
TR4
LE34
tb tet31dist
TR1
TR2
LE1234
tb tet11 t12
TR3
TR4
t31
t1new t2new
Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)
Now how about the IPAC-NN tree
71
The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)
Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree
dist dist
lower envelope
TR1TR2
TR3
TR4
TR5
TR6
tb te
2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )
4r
TR7
t1 t2 t3
72
Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie
Example if pdf of selecting a possible path is uniform
Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie
73
Combine the two probabilitieshellip
Possible pathsPossible locations when t=4
a
m
Probability of falling inside segment mp1
0505 = 025 (at t = 4)
Probability of an object a falling inside some segment S at given time instant t
)(
))(Pr()Pr())(Pr(tPPp
p
a
tPLSpSta eg uniform pdf
74
ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory
representation we needndash The collection of all the possible paths (and probabilities)
ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi
to position pi+1
At the time-instant t = 4 the object can
be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )
75
Processing
rsaquo Data Structuresndash Edge hash-table
rsaquo Small in-memoryndash Movement R-tree
rsaquo 1D for each edgendash Trajectories List
rsaquo Store samples ordered by time-stamp
rsaquo Assign pointer to set of possible paths
rsaquo Earliest-arriving and Latest-departure stored for vertices
rsaquo On-disk retrieve on-demand
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway example…
[Figure: position (pos) of a car over time (t) on a highway; its possible positions are bounded by the speeds vmin and vmax; query region Q]
› What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution… the car is inside Q with probability 50% at one time instant and with probability 30% at another
› Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65
– Violation of the speed constraints
› Consideration of the dependency: 0.5
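To see why the two numbers differ, here is a minimal Monte-Carlo sketch under a hypothetical model (not the one on the slide): the car starts at position 0 with an unknown but constant speed v, uniform in [vmin, vmax] = [0, 1], and Q is the stretch [0.7, 2.0] checked at t = 1.0 and t = 1.4. These values are chosen only so that the marginals come out as 0.3 and 0.5. Because a car that is in Q at t = 1.0 is necessarily still in Q at t = 1.4, the events are nested and the true probability is 0.5, whereas multiplying as if independent yields 0.65.

```python
import random

def in_q(pos, lo=0.7, hi=2.0):
    """Hypothetical query area Q: the highway stretch [lo, hi]."""
    return lo <= pos <= hi

def simulate(n=200_000, times=(1.0, 1.4), seed=7):
    """Hypothetical model: constant but unknown speed v ~ U[vmin, vmax] = U[0, 1], start at 0."""
    rng = random.Random(seed)
    hits = [0] * len(times)
    union = 0
    for _ in range(n):
        v = rng.uniform(0.0, 1.0)
        inside = [in_q(v * t) for t in times]  # same v at both times: positions are dependent
        for k, flag in enumerate(inside):
            hits[k] += flag
        union += any(inside)
    marginals = [h / n for h in hits]
    independent = 1.0
    for p in marginals:
        independent *= 1.0 - p
    return marginals, 1.0 - independent, union / n

marginals, p_independent, p_true = simulate()
print(marginals, p_independent, p_true)  # roughly [0.3, 0.5], 0.65, 0.5
```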
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› They provide a sound mathematical model that can describe the uncertain location of an object over time
› There are many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
A simple example
› Whenever the wooden board is hit, the ball stays in its hole or drops into one of the neighbouring holes with certain probabilities (stay: 0.2, each neighbour: 0.4)
› At the border of the wooden board these probabilities are different (stay: 0.4, inner neighbour: 0.6)
› This model is usually learned or given by experts
A simple example
› Initial position: bucket 3
› After the first hit: 0.4, 0.2, 0.4 (buckets 2 to 4)
› After the second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (buckets 1 to 5)
› After the 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08 (buckets 1 to 9)
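How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (row: from bucket, column: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$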
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
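The board example can be reproduced with a few lines of NumPy. This sketch assumes the transition matrix M given above (row = from bucket, column = to bucket) and propagates the distribution with row-vector products x · M; `board_matrix` is a hypothetical helper name.

```python
import numpy as np

def board_matrix(n=9):
    """Wooden-board chain: inner buckets stay with 0.2 and move to each neighbour with 0.4;
    the two border buckets stay with 0.4 and move inwards with 0.6."""
    M = np.zeros((n, n))
    for i in range(1, n - 1):
        M[i, i - 1], M[i, i], M[i, i + 1] = 0.4, 0.2, 0.4
    M[0, 0], M[0, 1] = 0.4, 0.6
    M[n - 1, n - 2], M[n - 1, n - 1] = 0.6, 0.4
    return M

M = board_matrix()
x0 = np.zeros(9)
x0[2] = 1.0                                         # initial position: bucket 3

x1 = x0 @ M                                         # after the first hit: (0, 0.4, 0.2, 0.4, 0, ...)
x2 = x1 @ M                                         # after the second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, ...)
x40 = x0 @ np.linalg.matrix_power(M, 40)            # after the 40th hit
stationary = x0 @ np.linalg.matrix_power(M, 5000)   # long-run distribution

print(np.round(x40, 2))
print(np.round(stationary, 2))                      # (0.08, 0.12, ..., 0.12, 0.08)
```

Since every bucket has a self-loop and all buckets communicate, the chain is irreducible and aperiodic, so repeated multiplication converges to the unique stationary distribution (0.08, 0.12, …, 0.12, 0.08); detailed balance confirms it (0.08 · 0.6 = 0.12 · 0.4 at the borders). The 40th-hit vector on the slide approximates this limit.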
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2 and s3 with the transition probabilities of M, unrolled over t = 0, 1, 2, 3]
› Note: there is an exponential number of possible paths the car might take
› Starting in s1 at t = 0, i.e., with (1, 0, 0):
– t = 1: $(1, 0, 0) \cdot M = (0, 0, 1)$
– t = 2: $(0, 0, 1) \cdot M = (0, 0.8, 0.2)$
– t = 3: $(0, 0.8, 0.2) \cdot M = (0.48, 0.16, 0.36)$
› Worlds that are in s1 or s2 at t = 2 or t = 3 already satisfy the query, so only the remaining worlds are propagated: $(0, 0, 0.2) \cdot M = (0, 0.16, 0.04)$, of which $(0, 0, 0.04)$ remains at t = 3; Result = 1 – 0.04 = 0.96
› Solution based on matrix multiplications: introduce a new state for the "winner" trajectories and two matrices
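ST - Window Queries
› States s1, s2, s3 plus a new state s4 that absorbs the "winner" trajectories, i.e., those that have already intersected the window
› Outside T the matrix $M^-$ is used; inside T the matrix $M^+$ redirects every transition into s1 or s2 to s4
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$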
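The M−/M+ construction can be sketched as follows, assuming the three-state matrix above and treating s1 and s2 as the window; `window_query` is a hypothetical helper, and states are 0-indexed (s1 = 0). Outside T the extended matrix is M−; inside T it is M+, which moves the probability of every transition into a window state to the absorbing state.

```python
import numpy as np

# Three-state example (rows: from, columns: to).
M = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.4],
              [0.0, 0.8, 0.2]])

def window_query(M, start, window_states, T):
    """Probability that a chain starting in `start` at t = 0 is in one of
    `window_states` at some time t in the closed interval T = (t_from, t_to)."""
    n = M.shape[0]
    absorbing = n  # extra state collecting the trajectories that already hit the window
    M_minus = np.zeros((n + 1, n + 1))
    M_minus[:n, :n] = M
    M_minus[absorbing, absorbing] = 1.0
    M_plus = M_minus.copy()
    for s in window_states:
        # Inside T, a transition into a window state is redirected to the absorbing state.
        M_plus[:n, absorbing] += M_plus[:n, s]
        M_plus[:n, s] = 0.0
    t_from, t_to = T
    if t_from <= 0 <= t_to and start in window_states:
        return 1.0
    x = np.zeros(n + 1)
    x[start] = 1.0
    for t in range(1, t_to + 1):
        x = x @ (M_plus if t >= t_from else M_minus)
    return x[absorbing]

print(window_query(M, start=0, window_states=[0, 1], T=(2, 3)))  # about 0.96
```

Each time step costs one sparse vector-matrix product, so the exponential number of paths is never enumerated.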
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time with one observation at t0 (left) and with two observations at t0 and t1 (right)]
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
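Multiple Observations
› States ordered as $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$: the "−" classes have not intersected the window (¬∎), the "+" classes have (∎)
$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0, 0, 0) \xrightarrow{M^+} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^+} (0, 0, 0.04, 0.48, 0.16, 0.32)$$
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix} \qquad M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$
with $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$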
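Bayes' Theorem (∎: the trajectory has intersected the window)
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\, P(\blacksquare)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$$
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
› With $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+) = (0, 0, 0.04, 0.48, 0.16, 0.32)$: $P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$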
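The six-state version and the final Bayes step can be checked numerically. This NumPy sketch orders the states as (S1−, S2−, S3−, S1+, S2+, S3+) and derives M+ from the block matrix M− by moving the probability of each transition from an Si− state into s1 or s2 to the corresponding Sj+ column; the variable names are illustrative.

```python
import numpy as np

M = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.4],
              [0.0, 0.8, 0.2]])
n = M.shape[0]
window = [0, 1]   # s1 and s2 form the query window (0-indexed)
T = (2, 3)        # query time interval

# State order: S1-, S2-, S3- (window not hit yet), S1+, S2+, S3+ (window hit).
Z = np.zeros((n, n))
M_minus = np.block([[M, Z], [Z, M]])
M_plus = np.block([[M, Z], [Z, M]])
for j in window:
    # Inside T, moving from some S_i- into window state s_j means the window is hit: go to S_j+.
    M_plus[:n, n + j] += M_plus[:n, j]
    M_plus[:n, j] = 0.0

x = np.zeros(2 * n)
x[0] = 1.0  # first observation: s1 at t = 0, window not hit yet
for t in range(1, T[1] + 1):
    x = x @ (M_plus if T[0] <= t <= T[1] else M_minus)

observed = 2  # second observation: the object is in s3 at t = 3
p_hit_given_obs = x[n + observed] / (x[observed] + x[n + observed])  # Bayes' theorem
print(np.round(x, 2), round(p_hit_given_obs, 2))  # p is about 0.89
```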
Summary
› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index entries in the (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate the result probability as the ratio of the samples that satisfy the query to the total number of drawn samples
› But how can we draw samples efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
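One standard way to draw trajectories that agree with a later observation is to re-weight every transition by the probability of still reaching the observed state (a backward pass) and then sample forward. The sketch below applies this to the three-state example, conditioning on "in s3 at t = 3"; it illustrates the idea of adapting transition matrices under that assumption and is not necessarily the exact procedure of the cited paper. `adapted_matrices` and `sample_trajectory` are hypothetical names.

```python
import numpy as np

# Three-state example (rows: from, columns: to); states are 0-indexed (s1 = 0).
M = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.4],
              [0.0, 0.8, 0.2]])

def adapted_matrices(M, t_obs, obs_state):
    """Per-step transition matrices conditioned on observing obs_state at time t_obs.

    beta[t][i] = P(in obs_state at t_obs | in state i at t) is computed backwards;
    each transition i -> j is then re-weighted by beta[t + 1][j] and renormalized.
    """
    n = M.shape[0]
    beta = [None] * (t_obs + 1)
    beta[t_obs] = np.eye(n)[obs_state]
    for t in range(t_obs - 1, -1, -1):
        beta[t] = M @ beta[t + 1]
    mats = []
    for t in range(t_obs):
        A = M * beta[t + 1][None, :]
        row_sums = A.sum(axis=1, keepdims=True)
        # States that cannot reach the observation are never visited; avoid dividing by zero.
        A = np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)
        mats.append(A)
    return mats

def sample_trajectory(mats, start, rng):
    """Draw one state sequence that is consistent with the observation."""
    path = [start]
    for A in mats:
        path.append(int(rng.choice(len(A), p=A[path[-1]])))
    return path

rng = np.random.default_rng(1)
mats = adapted_matrices(M, t_obs=3, obs_state=2)                    # second observation: s3 at t = 3
samples = [sample_trajectory(mats, 0, rng) for _ in range(10_000)]  # first observation: s1 at t = 0

# Monte-Carlo estimate of the window query "in s1 or s2 during T = [2, 3]".
hits = sum(any(p[t] in (0, 1) for t in (2, 3)) for p in samples)
print(hits / len(samples))  # should be close to 0.32 / 0.36 (about 0.89)
```

Every sample ends in the observed state, so no draws are wasted on trajectories that contradict the observation, and the estimate agrees with the exact Bayes computation for the same example.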
Event Queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of sub-events, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443.
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
Generating Functions
[Figure: query region q and three uncertain objects, each anonymized as x; probabilities 0.8, 0.2 and 0.4]
› Main idea: use polynomial multiplication to enumerate possible results
› Observation 3: anonymize objects – substitute A, B, C by x
› Each monomial $c \cdot x^k$ implies that the probability of having exactly k results equals c
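Generating Functions Formally
› For each object $o_i$, let $p_i$ be the probability that $o_i$ is inside the query region (objects are mutually independent). Consider the following generating function [2]:
$$\mathcal{F}(x) = \prod_{i} \bigl( (1 - p_i) + p_i \, x \bigr)$$
› In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly k objects are inside the query region
[2] Jian Li, Barna Saha and Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)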
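Expanding the generating function is a plain polynomial multiplication that runs in O(n²) for n objects. The sketch assumes mutually independent objects and, purely for illustration, reads the figure's values 0.8, 0.2 and 0.4 as the probabilities of lying inside q; `count_distribution` is a hypothetical name.

```python
def count_distribution(probs):
    """Coefficients of F(x) = prod_i ((1 - p_i) + p_i x): entry k is the probability
    that exactly k of the independent objects lie inside the query region."""
    coeffs = [1.0]  # the empty product F(x) = 1
    for p in probs:
        new = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += c * (1.0 - p)  # object outside: degree unchanged
            new[k + 1] += c * p      # object inside: multiply by x
        coeffs = new
    return coeffs

print([round(c, 3) for c in count_distribution([0.8, 0.2, 0.4])])
# [0.096, 0.472, 0.368, 0.064]
```

The number of coefficients grows only linearly, which is why this fits the paradigm of equivalent worlds: all worlds with the same count form one class.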
Count Queries on Uncertain Data
[Figure: query region q and uncertain objects A, B and C; probabilities 0.8, 0.2 and 0.4]
Example: expanding the generating function of A, B and C step by step
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
Approximated Results: Sampling
› Materialize a set S of possible worlds
› Samples are drawn independently and unbiased
› Evaluate the query predicate on each world
› The distribution of sampled results is an unbiased approximation of the true distribution of results
Sampling Example
[Figure: query region q and uncertain objects A, B and C; probabilities 0.8, 0.2 and 0.4]
› Drawing 100 possible worlds may yield the following estimators
› Compare to the exact probabilities
› No indication of the reliability or confidence of the estimations
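Sampling Confidences
[Figure: query region q and uncertain objects A, B and C; probabilities 0.8, 0.2 and 0.4]
› Drawing 100 possible worlds may yield the following estimators
› Use statistical methods to assess the quality of the estimators, e.g., the Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{|S|}}$$
where $\hat{p}$ is the estimated probability, |S| the number of sampled worlds, and $z_{1-\alpha/2}$ the $(1-\alpha/2)$-percentile of the standard normal distribution
› At a significance level of α, the true probability lies (with confidence 1 − α) in the interval [0.442, 0.638]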
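The Wald interval is straightforward to compute. In this sketch, the estimate p̂ = 0.54 and z = 1.96 (α = 0.05) are hypothetical inputs that happen to reproduce the interval [0.442, 0.638] for |S| = 100; `sample_count_probability` shows how such an estimate is obtained by materializing possible worlds, again assuming independent objects with the figure's probabilities. Both function names are illustrative.

```python
import math
import random

def wald_interval(p_hat, n, z=1.96):
    """Wald interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n); z = 1.96 for alpha = 0.05."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def sample_count_probability(probs, k, n_worlds, seed=0):
    """Estimate P(exactly k objects inside q) from n_worlds sampled possible worlds,
    assuming independent objects with inside-probabilities probs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_worlds):
        count = sum(rng.random() < p for p in probs)  # materialize one possible world
        hits += (count == k)
    return hits / n_worlds

p_hat = sample_count_probability([0.8, 0.2, 0.4], k=1, n_worlds=100)
print(p_hat, wald_interval(p_hat, 100))

lo, hi = wald_interval(0.54, 100)  # hypothetical estimate from |S| = 100 worlds
print(round(lo, 3), round(hi, 3))  # 0.442 0.638
```

The interval half-width shrinks with 1/√|S|, so quadrupling the number of sampled worlds halves the uncertainty.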
Uncertain Spatial Data Management: Summary
› Motivation: flood of geo-spatial data, enriched with additional contexts (text, social, multimedia); inherent uncertainty
› Data cleaning: "best guess" answers; unreliable results; biased results
› Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries (example: generating functions)
› Approximations: Monte-Carlo sampling; probabilistic guarantees
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion): semantics and processing
  • Part 2: Range (road-network constraints): semantics and processing
• Uncertainty – the flip-side
Model of a trajectory
[Figure: 3D trajectory in (X, Y, Time) up to the present time, and its projection onto the (X, Y) plane as a 2D route]
› The approximation does not capture acceleration/deceleration
› Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
Mobility Models
› Sequence of (location, time) updates (e.g., GPS-based)
› Electronic maps + traffic distribution patterns + set of (to-be-visited) points ⇒ full trajectory
› (location, time, velocity vector) updates
Modeling Uncertainty: Location Uncertainty Models
› Imprecision from various sources (GPS, sensors)
› Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (AKA space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in-between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
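Under Model 1, a location is possible at time t only if it is reachable from the earlier sample and still allows reaching the later sample without exceeding the maximum speed; the bead is the intersection of the two corresponding cones. A minimal check of this condition, assuming Euclidean free-space motion; `in_bead` is a hypothetical helper.

```python
import math

def in_bead(x, t, p1, t1, p2, t2, vmax):
    """True if 2D location x at time t is possible for an object sampled exactly at
    (p1, t1) and (p2, t2) whose speed never exceeds vmax (intersection of two cones)."""
    if not (t1 <= t <= t2):
        return False
    reachable_from_p1 = math.dist(x, p1) <= vmax * (t - t1)
    can_reach_p2 = math.dist(x, p2) <= vmax * (t2 - t)
    return reachable_from_p1 and can_reach_p2

# Samples (0, 0) at t = 0 and (4, 0) at t = 4, maximum speed 1.5.
print(in_bead((2, 2), 2, (0, 0), 0, (4, 0), 4, 1.5))    # True
print(in_bead((2, 2.5), 2, (0, 0), 0, (4, 0), 4, 1.5))  # False
```

A necklace is then simply the sequence of beads between consecutive samples.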
Modeling Uncertain Trajectories: Model 2
› Constraints: motion along the road network
› Uncertainty restricted to road segments
[Figure: possible locations between p1 (at t1) and p2 (at t2) along the road network, with query q]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
Modeling Uncertain Trajectories: Model 3 – Sheared Cylinders
› Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
Processing Spatio-Temporal Queries for Uncertain Trajectories
› Need to jointly consider
– Syntax: the impact of uncertainty on the (structure of the) answer
– The impact of constraints, e.g., possible routes to be taken
› Processing algorithms: impact of the motion + uncertainty coupling on filtering/pruning/refinement
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
Continuous NN for trajectories
› Given a collection of moving-object trajectories Tr1, Tr2, …, TrN
› Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
› A_nn(q) = [(Tr_i1, [tb, t1]), (Tr_i2, [t1, t2]), …, (Tr_im, [t_{m-1}, te])]
[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y) over time, at T = tb, T = t1 and T = te]
› Example: assuming that Trq is the querying trajectory during [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
› Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
Impact of Uncertainty
› Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant
› UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: the trajectories of the previous example with uncertainty, at T = tb, tb1, t11, t1 and te]
› Observations: adding uncertainty to the previous example implies that
– Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN-probability
– Specifically, at t11 all three trajectories have a non-zero NN-probability
⇒ What should the structure of the answer to UQ_nn(q) look like?
Structure of the Answer to a Probabilistic NN-Query for Trajectories
› We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level 1: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper levels refine each interval, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]
› Root: parameters of the query – the querying trajectory Trq and the time interval of interest [tb, te]
› Children: assuming the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and uncertain objects Tr1-Tr5 with circular uncertainty regions; distances Rmin, Rmax and Rd]
› Trq = 2D point Q; the other objects have possible locations bounded by circles
› Observation: any object whose minimum distance from Q is > Rmax (the smallest maximum distance of any object from Q) has zero probability of being the NN to Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN
› The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \iint_{\lVert (x, y) - Q \rVert \le R_d} \mathrm{pdf}_i(x, y)\, dx\, dy$$
› The probability that a given object Trj is the NN of Q is given by
$$P^{NN}_j = \int_{0}^{R_{max}} \frac{d P_j(R_d)}{d R_d} \prod_{i \ne j} \bigl(1 - P_i(R_d)\bigr)\, dR_d$$
› i.e., Trj is the NN if it is at distance Rd AND all the others are at distances > Rd, integrating over all possible Rd (NOTE: the upper bound is Rmax…)
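The NN probabilities can also be approximated by sampling instead of numerical integration. This sketch assumes uniform pdfs over uncertainty disks (as in the figure), first prunes every object whose minimum distance exceeds Rmax, and then counts how often each remaining object is the nearest in sampled worlds; `nn_probabilities` is a hypothetical helper.

```python
import math
import random

def sample_in_disk(center, radius, rng):
    """Uniform sample from a disk (the square root makes the density uniform in area)."""
    r = radius * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return center[0] + r * math.cos(phi), center[1] + r * math.sin(phi)

def nn_probabilities(q, objects, n=50_000, seed=0):
    """Monte-Carlo estimate of P(object j is the NN of the crisp query point q).

    objects: list of (center, radius) of uniform uncertainty disks.
    """
    # R_max: smallest maximum distance of any object from q.
    r_max = min(math.dist(q, c) + r for c, r in objects)
    # Objects whose minimum distance exceeds R_max can never be the NN (probability 0).
    candidates = [j for j, (c, r) in enumerate(objects) if math.dist(q, c) - r <= r_max]
    rng = random.Random(seed)
    wins = [0] * len(objects)
    for _ in range(n):
        dists = {j: math.dist(q, sample_in_disk(*objects[j], rng)) for j in candidates}
        wins[min(dists, key=dists.get)] += 1
    return [w / n for w in wins]

objects = [((1.0, 0.0), 0.5), ((0.0, 1.2), 0.5), ((4.0, 0.0), 0.5)]
print(nn_probabilities((0.0, 0.0), objects))  # the third object is pruned and gets 0.0
```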
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query location TrQ and objects Tr1-Tr5; points Z1 and Z2 with dist(Z1, Z2) < Rmax]
› We can no longer "prune" Tr4 from consideration
› The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
› NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
[Figure: uniform pdfs of the possible locations of Trq, Tr1 and Tr2]
› Example: assuming a uniform pdf of possible locations, the figure shows two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq
› Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$P\bigl(\lVert V_i - V_q \rVert \le R_d\bigr)$$
› Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq); due to independence, the pdf of Viq is the convolution of the pdfs of Vi and −Vq
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq)]
Let Viq = Vi + (−Vq)
› Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)
› Property 2: assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs); THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
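Rather than computing the convolution in closed form, one can sample Viq = Vi + (−Vq) directly. This sketch assumes uniform disks for Vi and Vq and estimates both P(‖Vi − Vq‖ ≤ Rd) and the mean of Viq, which by Property 1 should be close to Ci − Cq; the function name and parameters are hypothetical.

```python
import math
import random

def sample_in_disk(center, radius, rng):
    """Uniform sample from a disk."""
    r = radius * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return center[0] + r * math.cos(phi), center[1] + r * math.sin(phi)

def within_distance_probability(ci, cq, radius, rd, n=100_000, seed=0):
    """Estimate P(||V_i - V_q|| <= rd) by sampling V_iq = V_i + (-V_q), whose pdf is the
    convolution of pdf(V_i) and pdf(-V_q); also return the sample mean of V_iq."""
    rng = random.Random(seed)
    inside, sx, sy = 0, 0.0, 0.0
    for _ in range(n):
        xi, yi = sample_in_disk(ci, radius, rng)
        xq, yq = sample_in_disk(cq, radius, rng)
        dx, dy = xi - xq, yi - yq          # one sample of V_iq
        sx, sy = sx + dx, sy + dy
        inside += math.hypot(dx, dy) <= rd
    return inside / n, (sx / n, sy / n)

p, centroid = within_distance_probability(ci=(3.0, 1.0), cq=(0.0, 0.0), radius=1.0, rd=3.0)
print(round(p, 3), [round(c, 2) for c in centroid])  # centroid close to C_i - C_q = (3, 1)
```

The support of Viq is a disk of radius 2r around Ci − Cq, so the probability is exactly 0 below ‖Ci − Cq‖ − 2r and exactly 1 above ‖Ci − Cq‖ + 2r.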
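The continuous aspect of the probabilistic NN queries
› Let $V^x_{iq} = v^x_i - v^x_q$ and $V^y_{iq} = v^y_i - v^y_q$ denote the corresponding components of the relative velocity of the object whose expected location is along TRiq
› Then, writing $(x_{iq}, y_{iq})$ for the expected relative location at t = 0, the distance of the expected location along TRiq from the origin as a function of t is
$$d_{iq}(t) = \sqrt{(x_{iq} + V^x_{iq} t)^2 + (y_{iq} + V^y_{iq} t)^2} = \sqrt{A t^2 + B t + C}$$
with $A = (V^x_{iq})^2 + (V^y_{iq})^2$, $B = 2(x_{iq} V^x_{iq} + y_{iq} V^y_{iq})$ and $C = x_{iq}^2 + y_{iq}^2$: a hyperbola, since A > 0
› Now we have a collection of such distance functions (one for each i)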
The continuous aspect of the probabilistic NN queries
› Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions $S_{df} = \{d_{1q}(t), d_{2q}(t), \dots, d_{Nq}(t)\}$ (NOTE: excluding $d_{qq}(t) \equiv 0$)
› Observation: two distance functions $d_{iq}(t)$ and $d_{jq}(t)$ can intersect at most twice within the query's time interval of interest [tb, te] (NOTE: setting $d_{iq}(t) = d_{jq}(t)$ and squaring both sides yields a quadratic equation in t). Call such pairwise intersections critical time points
[Figure: example lower envelope LE12 of the distance functions of TR1 and TR2, with critical time points t11 and t12 after tb; time on the horizontal axis]
The continuous aspect of the probabilistic NN queries
› Q: how can we efficiently construct the lower envelope of the whole collection of distance functions?
› A: a divide-and-conquer approach in the spirit of Merge-Sort
[Figure: the lower envelopes LE12 (of TR1, TR2) and LE34 (of TR3, TR4) over [tb, te] are merged into LE1234, creating the new critical time points t1new and t2new]
› Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
› Now, how about the IPAC-NN tree?
› The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially), since the actual distance of each object deviates from its expected distance by at most 2r (r for the object plus r for the query)
› Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: lower envelope and 2nd lower envelope (nodes at level 2 of the IPAC-NN tree) of the distance functions of TR1-TR7 over [tb, te], with breakpoints t1, t2, t3 and a band of width 4r above the lower envelope]
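For small N, the ranked lower envelopes can be computed by brute force: collect all pairwise critical time points, then rank the distance functions at the midpoint of every elementary interval (the ranking cannot change in between). This sketch assumes linear motion of the expected locations; it runs in O(N² log N) rather than the O(N log N) merge-based construction described above, and all names and example trajectories are hypothetical.

```python
import math

def distance_coeffs(p_i, v_i, p_q, v_q):
    """(A, B, C) with d_iq(t)^2 = A t^2 + B t + C for expected locations p + v t."""
    x, y = p_i[0] - p_q[0], p_i[1] - p_q[1]
    vx, vy = v_i[0] - v_q[0], v_i[1] - v_q[1]
    return vx * vx + vy * vy, 2.0 * (x * vx + y * vy), x * x + y * y

def distance_at(c, t):
    A, B, C = c
    return math.sqrt(max(A * t * t + B * t + C, 0.0))

def critical_times(ci, cj, tb, te):
    """Times in (tb, te) with d_iq(t) = d_jq(t): equal squares give an (at most) quadratic equation."""
    a, b, c = ci[0] - cj[0], ci[1] - cj[1], ci[2] - cj[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
    return sorted(t for t in roots if tb < t < te)

def ranked_envelopes(coeffs, tb, te, levels=2):
    """Brute force: [(t_start, t_end, ids of the `levels` closest objects)] per elementary interval."""
    cuts = {tb, te}
    for i in range(len(coeffs)):
        for j in range(i + 1, len(coeffs)):
            cuts.update(critical_times(coeffs[i], coeffs[j], tb, te))
    cuts = sorted(cuts)
    pieces = []
    for t0, t1 in zip(cuts, cuts[1:]):
        mid = 0.5 * (t0 + t1)  # the ranking cannot change strictly between critical times
        order = sorted(range(len(coeffs)), key=lambda k: distance_at(coeffs[k], mid))
        pieces.append((t0, t1, order[:levels]))
    return pieces

# Static query at the origin; three objects with linear expected motion over [0, 10].
q_pos, q_vel = (0.0, 0.0), (0.0, 0.0)
coeffs = [distance_coeffs((-5.0, 1.0), (1.0, 0.0), q_pos, q_vel),
          distance_coeffs((0.0, 3.0), (0.0, 0.0), q_pos, q_vel),
          distance_coeffs((5.0, 2.0), (-1.0, 0.0), q_pos, q_vel)]
pieces = ranked_envelopes(coeffs, 0.0, 10.0)
for t0, t1, ranks in pieces:
    print(f"[{t0:.3f}, {t1:.3f}]  NN: {ranks[0]}  2nd: {ranks[1]}")
```

The first column of ranks is the lower envelope (level 1 of the IPAC-NN tree), the second column the 2nd lower envelope (level 2).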
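› Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.,
$$PP_i(a) = \{\, P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mincost}(P) \le t_{i+1} - t_i \,\}$$
› Example: if the pdf of selecting a possible path is uniform, each path $P \in PP_i(a)$ is selected with probability $1 / |PP_i(a)|$
› Given a path $P_j \in PP_i(a)$, the possible locations of a moving object a with respect to $P_j$ at $t \in [t_i, t_{i+1}]$ form the set of all the positions $p \in P_j$ from which a can reach $p_i$ (respectively $p_{i+1}$) within the time period $t - t_i$ (respectively $t_{i+1} - t$), i.e.,
$$PL_j(t) = \{\, p \in P_j \mid \mathrm{mincost}(p, p_i) \le t - t_i,\ \mathrm{mincost}(p, p_{i+1}) \le t_{i+1} - t \,\}$$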
Combine the two probabilities…
[Figure: possible paths of object a and its possible locations at t = 4; point m]
› Probability of falling inside segment m p1: 0.5 · 0.5 = 0.25 (at t = 4)
› Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr\bigl(a(t) \in S \cap PL_P(t) \mid P\bigr)$$
e.g., with a uniform pdf over each $PL_P(t)$
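The combination of path and location probabilities can be sketched for a simplified setting: each possible path is represented by its length and a maximum speed (so its minimum time cost is length / vmax), positions are measured along the path starting at pi, and both the path choice and the location within PL are uniform. These modelling choices, the helper names and the example numbers are assumptions for illustration; the example is set up to reproduce the 0.5 · 0.5 = 0.25 computation.

```python
def possible_interval(length, vmax, ti, tnext, t):
    """PL(t) on one path as an interval of path coordinates (0 = p_i, length = p_{i+1}),
    for an object that never exceeds vmax; None if empty."""
    lo = max(0.0, length - vmax * (tnext - t))  # must still be able to reach p_{i+1} by t_{i+1}
    hi = min(length, vmax * (t - ti))           # cannot be farther from p_i than vmax * (t - t_i)
    return (lo, hi) if lo <= hi else None

def prob_in_segment(paths, segment_on_path, ti, tnext, t):
    """Pr(a(t) in S) = sum over possible paths P of Pr(P) * Pr(a(t) in S and PL_P(t) | P).

    paths: {path_id: (length, vmax)}; segment_on_path: {path_id: (s0, s1)}, the part of S on P.
    Assumes a uniform pdf over the possible paths and over each PL_P(t).
    """
    # PP_i: keep only paths whose minimum time cost length / vmax fits into [t_i, t_{i+1}].
    possible = {pid: lv for pid, lv in paths.items() if lv[0] / lv[1] <= tnext - ti}
    total = 0.0
    for pid, (length, vmax) in possible.items():
        pl = possible_interval(length, vmax, ti, tnext, t)
        seg = segment_on_path.get(pid)
        if pl is None or seg is None:
            continue
        lo, hi = pl
        if hi > lo:
            overlap = max(0.0, min(hi, seg[1]) - max(lo, seg[0]))
            total += overlap / (hi - lo) / len(possible)
        elif seg[0] <= lo <= seg[1]:
            total += 1.0 / len(possible)  # PL(t) degenerates to a single point inside S
    return total

# Two possible paths of length 8 with vmax = 2 (minimum time 4) and one impossible detour.
paths = {"P1": (8.0, 2.0), "P2": (8.0, 2.0), "P3": (20.0, 2.0)}
print(prob_in_segment(paths, {"P1": (4.0, 6.0)}, ti=0.0, tnext=6.0, t=4.0))  # 0.25
```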
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– The probabilistic location function
› Example: observe the two possible sequences of going from position pi to position pi+1
– At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
Processing
› Data structures
– Edge hash table
   › Small, in-memory
– Movement R-tree
   › 1D, one for each edge
– Trajectories list
   › Store samples ordered by time stamp
   › Assign a pointer to the set of possible paths
   › Earliest-arriving and latest-departure times stored for vertices
   › On disk; retrieved on demand
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" along road-network segments from q through vertices A-E, with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
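The network-expansion step is essentially a Dijkstra search cut off at radius r. In this sketch, `graph` is a hypothetical adjacency dictionary with edge lengths; `expansion_tree` returns the tree as parent pointers, and `candidate_edges` returns the edges that touch a vertex within network distance r (i.e., whose network distance to q is smaller than r), whose movement R-trees would then be fetched in the filtering step. Both names are illustrative.

```python
import heapq

def expansion_tree(graph, q, r):
    """Bounded Dijkstra from q: {vertex: (network distance, parent)} for every vertex
    whose network distance to q is smaller than r. graph: {u: [(v, length), ...]}."""
    tree = {q: (0.0, None)}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > tree[u][0]:
            continue  # stale heap entry
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < r and (v not in tree or nd < tree[v][0]):
                tree[v] = (nd, u)
                heapq.heappush(heap, (nd, v))
    return tree

def candidate_edges(graph, tree):
    """Edges with at least one endpoint in the expansion tree, i.e., network distance < r."""
    return {(u, v) for u in tree for v, _ in graph.get(u, [])}

graph = {"q": [("A", 2), ("C", 4)], "A": [("q", 2), ("B", 3)], "B": [("A", 3), ("E", 1)],
         "C": [("q", 4), ("D", 5)], "D": [("C", 5)], "E": [("B", 1)]}
tree = expansion_tree(graph, "q", r=6)
edges = candidate_edges(graph, tree)
print(sorted(tree))  # ['A', 'B', 'C', 'q']
```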
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
› Worlds that have already entered the window at t = 2 (probability 0.8) are set aside; only the remaining mass (0, 0, 0.2) is propagated, giving (0, 0.16, 0.04) at t = 3
› Result = 0.8 + 0.16 = 0.96
› A solution based purely on matrix multiplications introduces a new state for the "winner" trajectories (those that have hit the window) and two matrices
103
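ST - Window Queries
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
› States s1, s2, s3 plus the absorbing "winner" state s4. M^- is used for ticks outside T, while M^+ redirects every transition into a window state (s1 or s2) to s4.
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$
› The probability mass in s4 at t = 3 is the result, 0.96.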
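The M^-/M^+ scheme above can be sketched in plain Python. The sketch assumes the query window is given as a set of 0-indexed states and the interval as integer ticks. `window_query_probability` is a hypothetical helper, not code from the cited paper. For the running example it yields 0.96.

```python
def vec_mat(vec, mat):
    """Row vector times matrix (both as plain lists)."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]


def window_query_probability(M, start, window_states, t_begin, t_end):
    """P(the object is in a window state at some tick t in [t_begin, t_end]).

    An absorbing 'hit' state is appended. Ticks outside the interval use
    M_minus (M plus the absorbing state); ticks inside use M_plus, which
    redirects every transition into a window state to the hit state.
    """
    n = len(M)
    hit = n
    absorbing_row = [0.0] * n + [1.0]
    M_minus = [row + [0.0] for row in M] + [absorbing_row]
    M_plus = []
    for row in M:
        new_row = [0.0] * (n + 1)
        for j, p in enumerate(row):
            new_row[hit if j in window_states else j] += p
        M_plus.append(new_row)
    M_plus.append(absorbing_row)

    if t_begin <= 0 and start in window_states:
        return 1.0  # already inside the window at the first tick
    vec = [0.0] * (n + 1)
    vec[start] = 1.0
    for t in range(1, t_end + 1):
        # The transition that leads into tick t uses M_plus iff t lies in T.
        vec = vec_mat(vec, M_plus if t >= t_begin else M_minus)
    return vec[hit]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
p = window_query_probability(M, start=0, window_states={0, 1}, t_begin=2, t_end=3)
print(round(p, 2))  # 0.96
```

As a sanity check, restricting the interval to t = 3 alone gives 0.48 + 0.16 = 0.64. That is the plain marginal of s1 and s2 at t = 3.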
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, extrapolated from a single observation at t0 vs. constrained by observations at t0 and t1]
105
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class S_i^- corresponding to worlds where o is located in state s_i and o has not intersected the window
– One class S_i^+ corresponding to worlds where o is located in state s_i and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
(ticks t = 0, 1, 2, 3; states s1, s2, s3)
106
Multiple Observations
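› State order: (S1-, S2-, S3-, S1+, S2+, S3+). The "+" classes (■) have intersected the window; the "-" classes (¬■) have not.
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix} \qquad M^+ = \begin{pmatrix} M_{\neg W} & M_W \\ 0 & M \end{pmatrix}$$
› Here $M_W$ keeps only the transitions of M into the window states s1 and s2, and $M_{\neg W}$ keeps only those into s3:
$$M_W = \begin{pmatrix} 0 & 0 & 0 \\ 0.6 & 0 & 0 \\ 0 & 0.8 & 0 \end{pmatrix} \qquad M_{\neg W} = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0.4 \\ 0 & 0 & 0.2 \end{pmatrix}$$
› Propagation (M^- into t = 1, M^+ into t = 2 and t = 3):
$$(1, 0, 0, 0, 0, 0) \to (0, 0, 1, 0, 0, 0) \to (0, 0, 0.2, 0, 0.8, 0) \to (0, 0, 0.04, 0.48, 0.16, 0.32)$$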
107
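Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window (■), given that the object was observed in s3 at t = 3?
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\, P(\blacksquare)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$$
› The t = 3 vector over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$ is $(0, 0, 0.04, 0.48, 0.16, 0.32)$. It gives $P(s_3 \wedge \blacksquare) = P(S_3^+) = 0.32$ and $P(s_3 \wedge \neg\blacksquare) = P(S_3^-) = 0.04$, so
$$P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$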
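A sketch of the 2|S|-class construction followed by the Bayes step. It assumes 0-indexed states, an exact observed state at the last tick, and the window and interval of the running example. The function name is hypothetical.

```python
def vec_mat(vec, mat):
    """Row vector times matrix (both as plain lists)."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]


def hit_given_observation(M, start, window_states, t_begin, t_end, observed_state):
    """P(window hit | object observed in observed_state at tick t_end).

    Indices 0..n-1 are the classes S_i^- (window not intersected yet),
    indices n..2n-1 the classes S_i^+ (window already intersected).
    """
    n = len(M)
    size = 2 * n
    M_minus = [[0.0] * size for _ in range(size)]
    M_plus = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j, p in enumerate(M[i]):
            # Outside the query interval no world changes its class.
            M_minus[i][j] += p
            M_minus[n + i][n + j] += p
            # Inside it, entering a window state moves S^- mass to S^+.
            M_plus[i][n + j if j in window_states else j] += p
            M_plus[n + i][n + j] += p

    vec = [0.0] * size
    already_hit = t_begin <= 0 and start in window_states
    vec[n + start if already_hit else start] = 1.0
    for t in range(1, t_end + 1):
        vec = vec_mat(vec, M_plus if t >= t_begin else M_minus)

    hit = vec[n + observed_state]    # P(s_obs and window intersected)
    miss = vec[observed_state]       # P(s_obs and window not intersected)
    return hit / (hit + miss), vec


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
prob, final = hit_given_observation(M, start=0, window_states={0, 1},
                                    t_begin=2, t_end=3, observed_state=2)
print([round(x, 2) for x in final])  # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]
print(round(prob, 2))                # 0.89
```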
108
Summary
› Pros
– Answers time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is based purely on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping continuous time to discrete ticks might not always be an adequate model
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the spatio-temporal (ST) space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most x of the possible trajectories of o may intersect Q)
[Figure: index entries P_oid in the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
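The idea of adapting the transition matrices can be sketched with backward probabilities. Each step is conditioned on still being able to reach the next observation, which yields per-tick matrices from which samples are drawn forward. Every sample then agrees with the observations. This is one standard way to realise the idea and is assumed here for illustration only; the details in the cited paper may differ, and all names are hypothetical. With the running example (start in s1 at t = 0, observed in s3 at t = 3), the sampled estimate should land close to 0.32 / 0.36 ≈ 0.89.

```python
import random


def adapted_matrices(M, t_end, target):
    """Per-tick transition matrices conditioned on being in `target` at t_end.

    beta[t][s] = P(in `target` at t_end | in s at t). The adapted matrix for
    the step t -> t+1 has entries M[s][j] * beta[t+1][j] / beta[t][s].
    """
    n = len(M)
    beta = [[0.0] * n for _ in range(t_end + 1)]
    beta[t_end][target] = 1.0
    for t in range(t_end - 1, -1, -1):
        for s in range(n):
            beta[t][s] = sum(M[s][j] * beta[t + 1][j] for j in range(n))
    adapted = []
    for t in range(t_end):
        rows = []
        for s in range(n):
            if beta[t][s] == 0.0:
                rows.append([0.0] * n)  # s cannot lead to the observation
            else:
                rows.append([M[s][j] * beta[t + 1][j] / beta[t][s]
                             for j in range(n)])
        adapted.append(rows)
    return adapted


def sample_trajectory(adapted, start, rng):
    """Draw one state sequence that is consistent with the observation."""
    path = [start]
    for step_matrix in adapted:
        row = step_matrix[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
adapted = adapted_matrices(M, t_end=3, target=2)   # observed in s3 at t = 3
rng = random.Random(42)
samples = [sample_trajectory(adapted, 0, rng) for _ in range(20000)]
# Fraction of samples that are in s1 or s2 at some tick of T = [2, 3].
estimate = sum(any(path[t] in (0, 1) for t in (2, 3)) for path in samples) / len(samples)
print(round(estimate, 2))
```

The start state must itself be consistent with the observation. Otherwise its adapted row is all-zero and `random.choices` raises an error.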
112
Event Queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of sub-events. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity search on uncertain spatio-temporal data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-nearest neighbor queries on uncertain moving object trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
38
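[Figure: query region q and three objects, all substituted by x, annotated with the values 0.8, 0.2 and 0.4]
Generating Functions
› Main idea: use polynomial multiplication to enumerate the possible results
› Observation 3: anonymize objects, i.e., substitute A, B, C by x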
39
Generating Functions
› Each monomial $c_k x^k$ of the expanded polynomial implies that the probability of having exactly $k$ results equals $c_k$
40
Generating Functions Formally
› For each object $o_i$, let $p_i$ be the probability that $o_i$ lies inside the query region. Consider the following generating function [2]:
$$F(x) = \prod_i \bigl((1 - p_i) + p_i\, x\bigr)$$
› In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region
[2] Jian Li, Barna Saha, and Amol Deshpande. A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
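A minimal sketch of the generating-function approach. The product is expanded one factor at a time, so n independent objects cost O(n²) operations. The values 0.8, 0.2 and 0.4 come from the example figure and are assumed here to be the objects' probabilities of lying inside q. The function name is hypothetical.

```python
def count_distribution(probabilities):
    """Coefficients of prod_i ((1 - p_i) + p_i x).

    Entry k of the result is the probability that exactly k of the
    independent objects lie inside the query region.
    """
    coeffs = [1.0]  # the constant polynomial 1
    for p in probabilities:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)   # object outside: degree unchanged
            nxt[k + 1] += c * p       # object inside: multiply by x
        coeffs = nxt
    return coeffs


dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])  # [0.096, 0.472, 0.368, 0.064]
```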
41
Count Queries on Uncertain Data
Example
[Figure: query region q with objects A, B, C annotated with the values 0.8, 0.2 and 0.4]
Count Queries on Uncertain Data
› Example, reading 0.8, 0.2 and 0.4 as the probabilities of the three objects lying inside q:
$$F(x) = (0.2 + 0.8x)(0.8 + 0.2x)(0.6 + 0.4x) = (0.16 + 0.68x + 0.16x^2)(0.6 + 0.4x) = 0.096 + 0.472x + 0.368x^2 + 0.064x^3$$
45
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. The traditional query on certain data can be answered in polynomial time.
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.
III. The probability of each class can be computed in polynomial time.
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
› Materialize a set S of possible worlds
› Samples are drawn independently and unbiased
› Evaluate the query predicate on each world
› The distribution of the sampled results is an unbiased approximation of the true distribution of results
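The sampling steps can be sketched for the count query of the example. The sketch assumes independent objects with probabilities 0.8, 0.2 and 0.4 of lying inside q (the values in the example figure). Each iteration materializes one possible world and evaluates the count on it. The function name is hypothetical.

```python
import random


def sample_count_distribution(probabilities, num_worlds, seed=0):
    """Monte-Carlo estimate of P(exactly k objects inside the query region)."""
    rng = random.Random(seed)
    histogram = [0] * (len(probabilities) + 1)
    for _ in range(num_worlds):
        # Materialize one possible world: object i is inside with prob. p_i.
        count = sum(rng.random() < p for p in probabilities)
        histogram[count] += 1
    return [h / num_worlds for h in histogram]


estimate = sample_count_distribution([0.8, 0.2, 0.4], num_worlds=100_000, seed=1)
print([round(e, 3) for e in estimate])
```

Under the same independence assumption, the exact expansion of the generating function gives (0.096, 0.472, 0.368, 0.064). The estimates should approach these values as `num_worlds` grows.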
48
[Figure: query region q with objects A, B, C annotated with the values 0.8, 0.2 and 0.4]
Sampling Example
› Drawing 100 possible worlds may yield the following estimators
› Compare them to the exact probabilities
› No indication of the reliability or confidence of the estimates
49
Sampling Confidences
› Drawing 100 possible worlds may yield the following estimators
› Use statistical methods to assess the quality of the estimators
› E.g., the Wald test: $\hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$, where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution
› At a significance level of $\alpha = 0.05$, the true probability is in the interval [0.442, 0.638]
[Figure: the interval compared with the true probability]
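The Wald interval can be computed with the Python standard library alone. The count of 54 successes out of 100 worlds is an assumption, inferred as the midpoint of the interval quoted above. With α = 0.05 it yields [0.442, 0.638].

```python
from statistics import NormalDist


def wald_interval(successes, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)   # the (1 - alpha/2) percentile
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half_width, p_hat + half_width


low, high = wald_interval(successes=54, n=100)
print(round(low, 3), round(high, 3))  # 0.442 0.638
```

The Wald interval is known to behave poorly for small samples or probabilities close to 0 or 1. Score-based intervals such as Wilson's are a common alternative in those cases.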
50
Uncertain Spatial Data Management: Summary
› Motivation: flood of geo-spatial data, enriched with additional contexts (text, social, multimedia); inherent uncertainty
› Data cleaning: "best guess" answers; unreliable results; biased results
› Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries (example: generating functions)
› Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty: the flip side
53
Model of a Trajectory
[Figure: 3D trajectory in (X, Y, Time) up to the present time, and its projection onto the plane as a 2D route]
› The approximation does not capture acceleration/deceleration
› Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman. Kinetic space-time prisms. GIS 2011
54
Mobility Models
› Sequence of (location, time) updates (e.g., GPS-based)
› Electronic maps + traffic distribution patterns + set of (to-be-visited) points ⇒ full trajectory
› (location, time, velocity vector) updates
55
Modeling Uncertainty: Location Uncertainty Models
› Imprecision from various sources (GPS, sensors)
› Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze. On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (a.k.a. space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov. Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li. Uncertain Range Queries for Necklaces. MDM 2010
57
Modeling Uncertain Trajectories: Model 2
› Constraints: motion along the road network
› Uncertainty restricted to road segments
[Figure: road network with samples p1 and p2 at times t1 and t2, a query q and a time instant t]
Bart Kuijpers, Walied Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
› Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain. Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
› Syntax: the impact of uncertainty on the (structure of the) answer, and the impact of constraints (e.g., possible routes to be taken)
› Processing algorithms: the impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for Trajectories
Given a collection of moving-object trajectories Tr1, Tr2, …, TrN:
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tr_i1, [tb, t1]), (Tr_i2, [t1, t2]), …, (Tr_im, [t(m-1), te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) over time, from T = tb via T = t1 to T = te]
Example: assuming that Trq is the querying trajectory during [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen. Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some location uncertainty associated with it at each time instant, consider the variant:
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: uncertain trajectories Trq, Tr1, Tr2, Tr3 with the time instants tb, t1, tb1, t11 and te marked]
Observations: adding uncertainty to the previous example implies that
– Tr1 (blue) is the only object with a non-zero NN probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN probability
– Specifically, at t11 all three trajectories have a non-zero NN probability
⇒ What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree with root [TrQ, tb, te]. Its children (Tr_i1, tb, t1, D1), (Tr_i2, t1, t2, D2), …, (Tr_im, t(m-1), te, Dm) partition [tb, te]. Their own children, e.g., (Tr_i11, tb, t11, D11) and (Tr_i12, t11, t12, D12), partition each child's interval further]
Root: parameters of the query
– Querying trajectory Trq
– Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q, the uncertainty disks of Tr1 to Tr5, and the radii Rmin, Rmax and Rd around Q]
› Trq = 2D point Q; the possible locations of the other objects are bounded by circles
› Observation: any object whose minimum distance from Q is greater than Rmax (the smallest maximum distance from Q over all objects) has zero probability of being the NN of Q. Example: Tr4 cannot have a non-zero probability of being Q's NN
› The probability that the location of Tr_i is within distance R_d from Q is given by
$$P_i(R_d) = \iint_{\lVert (x, y) - Q \rVert \le R_d} \mathrm{pdf}_i(x, y)\, dx\, dy$$
› The probability that a given object Tr_j is the NN of Q is given by
$$P^{NN}_j = \int_0^{R_{\max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \ne j} \bigl(1 - P_i(R_d)\bigr)\, dR_d$$
› I.e., Tr_j is the NN if it is at distance R_d AND all the others are at distances > R_d, integrated over all possible R_d (NOTE: the upper bound is R_max)
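A numerical sketch of the two formulas. It assumes (as in the later example) a uniform pdf over each object's uncertainty disk. P_i(R_d) then reduces to a circle-intersection (lens) area divided by the disk area, and the outer integral is approximated by a midpoint sum over R_d. All names are hypothetical.

```python
import math


def lens_area(R, r, c):
    """Intersection area of two disks with radii R and r and centre distance c."""
    if R <= 0 or r <= 0 or c >= R + r:
        return 0.0
    if c <= abs(R - r):
        return math.pi * min(R, r) ** 2
    cos1 = max(-1.0, min(1.0, (c * c + R * R - r * r) / (2 * c * R)))
    cos2 = max(-1.0, min(1.0, (c * c + r * r - R * R) / (2 * c * r)))
    kite = (-c + R + r) * (c + R - r) * (c - R + r) * (c + R + r)
    return R * R * math.acos(cos1) + r * r * math.acos(cos2) - 0.5 * math.sqrt(max(0.0, kite))


def within_prob(q, center, radius, d):
    """P_i(d): probability that a uniform-disk object lies within distance d of q."""
    return lens_area(d, radius, math.dist(q, center)) / (math.pi * radius ** 2)


def nn_probabilities(q, objects, steps=4000):
    """P^NN_j = integral of dP_j(d) * prod_{i != j} (1 - P_i(d)) for d in [0, R_max]."""
    r_max = min(math.dist(q, c) + r for c, r in objects)  # smallest max distance
    probs = [0.0] * len(objects)
    prev = [within_prob(q, c, r, 0.0) for c, r in objects]
    for s in range(1, steps + 1):
        d = r_max * s / steps
        mid = r_max * (s - 0.5) / steps
        cur = [within_prob(q, c, r, d) for c, r in objects]
        at_mid = [within_prob(q, c, r, mid) for c, r in objects]
        for j in range(len(objects)):
            others = 1.0
            for i in range(len(objects)):
                if i != j:
                    others *= 1.0 - at_mid[i]
            probs[j] += (cur[j] - prev[j]) * others
        prev = cur
    return probs


objects = [((3.0, 0.0), 1.0), ((0.0, 4.0), 1.0), ((10.0, 0.0), 1.0)]
probs = nn_probabilities((0.0, 0.0), objects)
print([round(p, 3) for p in probs])
```

The third object's minimum distance (9) exceeds R_max (4), so its probability is exactly zero, matching the pruning observation. The probabilities of the others sum to 1 up to the discretization error.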
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable. Example:
[Figure: uncertain TrQ and Tr1 to Tr5 with radii Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]
› We can no longer "prune" Tr4 from consideration
› The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
› NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)
66
[Figure: uniform pdfs over disks of radius r for Trq, Tr1 and Tr2 in the (x, y) plane]
Example: assuming a uniform pdf of possible locations; the figure marks two of the points (one per disk) that have to be considered when evaluating the actual probability of Tr1 being within distance Rd from Trq
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$
Define Viq = Vi - Vq (cross-correlation) as another random variable, Viq = Vi + (-Vq). Due to independence, the pdf of Viq is the convolution of the pdfs of Vi and -Vq
67
[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) in the (x, y) plane; extent 4r]
Let Viq = Vi + (-Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids, with the rotation axis parallel to the vertical axis (the values of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
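A quick Monte-Carlo check of Property 1 for two independent uniform disks of the same radius r (an assumption matching the example). The samples of V_iq = V_i + (-V_q) should average to C_i - C_q and stay within distance 2r of it. Sampling is used here only to illustrate the property; the centres and all names are hypothetical.

```python
import math
import random


def sample_uniform_disk(center, radius, rng):
    """Uniform sample from a disk; the sqrt keeps the area density uniform."""
    rho = radius * math.sqrt(rng.random())
    phi = 2 * math.pi * rng.random()
    return center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi)


def difference_samples(c_i, c_q, r, n, seed=0):
    """Samples of V_iq = V_i + (-V_q) for independent uniform disks of radius r."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        xi, yi = sample_uniform_disk(c_i, r, rng)
        xq, yq = sample_uniform_disk(c_q, r, rng)
        samples.append((xi - xq, yi - yq))
    return samples


c_i, c_q, r = (5.0, 2.0), (1.0, -1.0), 1.0
pts = difference_samples(c_i, c_q, r, n=200_000)
mean = (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
centre = (c_i[0] - c_q[0], c_i[1] - c_q[1])           # C_i + (-C_q) = (4, 3)
max_offset = max(math.hypot(x - centre[0], y - centre[1]) for x, y in pts)
print(mean, max_offset)   # mean close to (4, 3), max_offset at most 2r
```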
68
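The continuous aspect of the probabilistic NN queries
› Let $V^x_{iq} = v^x_i - v^x_q$ and $V^y_{iq} = v^y_i - v^y_q$ denote the corresponding components of the velocity of the object whose expected location moves along $TR_{iq}$
› Then the distance of the expected location along $TR_{iq}$ from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = (V^x_{iq})^2 + (V^y_{iq})^2, \quad B = 2\,(X_{iq} V^x_{iq} + Y_{iq} V^y_{iq}), \quad C = X_{iq}^2 + Y_{iq}^2$$
where $(X_{iq}, Y_{iq})$ is the expected relative location at t = 0
› This is (a branch of) a hyperbola, since A > 0
› Now we have a collection of such distance functions (one for each i)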
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions $S_{df} = \{d_{1q}(t), d_{2q}(t), \dots, d_{Nq}(t)\}$ (NOTE: excluding $d_{qq}(t) \equiv 0$)
Observation: within the time interval of interest [tb, te] of the query, two distance functions $d_{iq}(t)$ and $d_{jq}(t)$ can intersect at most twice. (NOTE: setting $d_{iq}(t) = d_{jq}(t)$ and squaring both sides yields a quadratic equation in t.) Call such pairwise intersections critical time points.
[Figure: example lower envelope LE12 of the two distance trajectories TR1 and TR2 (distance vs. time, time on the horizontal axis), with critical time points t11 and t12 after tb]
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: With a divide-and-conquer approach in the spirit of merge sort
[Figure: lower envelope LE12 of TR1 and TR2 (critical time points t11, t12) and LE34 of TR3 and TR4 (critical time point t31), merged into LE1234 with the new critical time points t1new and t2new over [tb, te]]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
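Returning to the lower envelope itself, a brute-force sketch can make the critical time points concrete. It assumes linear relative motion of the expected locations, so that each squared distance is a quadratic A t² + B t + C. It compares squared distances, which preserves the ordering. It also collects all O(N²) pairwise critical times instead of performing the O(N log N) divide-and-conquer merge. All names and trajectories are hypothetical.

```python
import math


def squared_distance_coeffs(rel_pos, rel_vel):
    """(A, B, C) such that d(t)^2 = A t^2 + B t + C for linear relative motion."""
    (x, y), (vx, vy) = rel_pos, rel_vel
    return vx * vx + vy * vy, 2 * (x * vx + y * vy), x * x + y * y


def critical_times(f, g, tb, te, eps=1e-12):
    """Crossing times of two distance functions inside (tb, te); at most two."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted(t for t in roots if tb < t < te)


def lower_envelope(funcs, tb, te):
    """Brute-force lower envelope as a list of (index, start, end) pieces."""
    cuts = {tb, te}
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            cuts.update(critical_times(funcs[i], funcs[j], tb, te))
    cuts = sorted(cuts)
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        # Comparing squared distances preserves the ordering of distances.
        winner = min(range(len(funcs)),
                     key=lambda k: funcs[k][0] * mid * mid + funcs[k][1] * mid + funcs[k][2])
        if pieces and pieces[-1][0] == winner:
            pieces[-1] = (winner, pieces[-1][1], hi)   # extend the current piece
        else:
            pieces.append((winner, lo, hi))
    return pieces


tr1 = squared_distance_coeffs((1.0, 0.0), (0.0, 0.0))    # stays at distance 1
tr2 = squared_distance_coeffs((-3.0, 0.5), (1.0, 0.0))   # passes close to the query
envelope = lower_envelope([tr1, tr2], 0.0, 6.0)
print([(k, round(s, 3), round(e, 3)) for k, s, e in envelope])
# [(0, 0.0, 2.134), (1, 2.134, 3.866), (0, 3.866, 6.0)]
```

The two crossings at 3 ∓ √0.75 are the critical time points where the NN switches.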
71
The lower envelope provides a continuous pruning criterion. Any trajectory whose distance function lies more than 4r above the lower envelope (r = radius of the uncertainty disk) has zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance function, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te] with critical time points t1, t2, t3; the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in level 2 of the IPAC-NN tree)]
72
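Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e.,
$$PP_i(a) = \{P \mid P \text{ connects } p_i \text{ and } p_{i+1} \wedge \mathrm{mincost}(P) \le t_{i+1} - t_i\}$$
where mincost denotes the minimum travel time along a path.
Example: if the pdf of selecting a possible path is uniform, each $P \in PP_i(a)$ is taken with probability $1 / |PP_i(a)|$
Given a path $P_j \in PP_i(a)$, the possible locations of a moving object a with respect to $P_j$ at $t \in [t_i, t_{i+1}]$ are the set of all positions $p \in P_j$ from which a can reach $p_i$ (respectively $p_{i+1}$) within the time period $t - t_i$ (respectively $t_{i+1} - t$), i.e.,
$$PL_{P_j}(t) = \{p \in P_j \mid \mathrm{mincost}(p, p_i) \le t - t_i \wedge \mathrm{mincost}(p, p_{i+1}) \le t_{i+1} - t\}$$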
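A small sketch of the possible-path definition. It assumes the two samples sit on vertices and that each edge carries its minimum travel time. It enumerates simple paths by depth-first search with a time budget of t(i+1) - t(i). This is exponential in general and meant only for illustration. The graph and vertex names are hypothetical.

```python
def possible_paths(graph, source, target, budget):
    """All simple source-target paths whose minimum travel time is <= budget.

    graph: {vertex: [(neighbour, min_travel_time), ...]}
    Returns a list of (path, cost) pairs.
    """
    results = []

    def dfs(vertex, path, cost):
        if vertex == target:
            results.append((list(path), cost))
            return
        for nxt, w in graph.get(vertex, []):
            if nxt not in path and cost + w <= budget:
                path.append(nxt)
                dfs(nxt, path, cost + w)
                path.pop()

    dfs(source, [source], 0.0)
    return results


graph = {
    "v1": [("v5", 3.0), ("v3", 2.0)],
    "v5": [("v4", 2.0)],
    "v3": [("v4", 2.0), ("v2", 4.0)],
    "v2": [("v4", 1.0)],
}
paths = possible_paths(graph, "v1", "v4", budget=6.0)
# Uniform pdf over the possible paths.
path_prob = {tuple(p): 1 / len(paths) for p, _ in paths}
print(path_prob)  # {('v1', 'v5', 'v4'): 0.5, ('v1', 'v3', 'v4'): 0.5}
```

The detour via v2 needs 7 time units and is therefore excluded by the budget of 6.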
73
Combine the two probabilities…
[Figure: possible paths of object a and its possible locations when t = 4, with a point m on the first path]
› Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
› Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\bigl(S \cap PL_p(t)\bigr)$$
e.g., with a uniform pdf
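A sketch of the combination formula with a uniform location pdf. It assumes each path's possible locations at time t are given as a list of (edge, start, end) pieces along the edges, and that S is one such piece. The numbers are hypothetical but mirror the 0.5 · 0.5 = 0.25 of the example.

```python
def prob_in_segment(paths, segment):
    """Pr(a(t) in S) = sum over paths p of Pr(p) * Pr(S ∩ PL_p(t)).

    paths: list of (path_probability, pieces), where pieces is a list of
    (edge_id, lo, hi) intervals forming PL_p(t); the location pdf is
    uniform over their total length. segment: (edge_id, lo, hi) for S.
    """
    s_edge, s_lo, s_hi = segment
    total = 0.0
    for path_prob, pieces in paths:
        length = sum(hi - lo for _, lo, hi in pieces)
        if length <= 0:
            continue  # degenerate (point) locations are ignored in this sketch
        inside = sum(max(0.0, min(hi, s_hi) - max(lo, s_lo))
                     for edge, lo, hi in pieces if edge == s_edge)
        total += path_prob * inside / length
    return total


paths = [
    (0.5, [("v1v5", 2.0, 6.0)]),                        # path via v5
    (0.5, [("v1v3", 3.0, 5.0), ("v3v4", 0.0, 1.0)]),    # path via v3
]
print(prob_in_segment(paths, ("v1v5", 4.0, 8.0)))  # 0.25
```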
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– The probabilistic location function
Example: observe the two possible sequences for going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
– Edge hash table
  › Small, in-memory
– Movement R-tree
  › 1D, one for each edge
– Trajectories list
  › Stores samples ordered by timestamp
  › Assigns a pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On-disk, retrieved on demand
76
› Network expansion
– Expansion tree: a tree rooted at q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time intervals intersect tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: road network with vertices A to E around q; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query; the expansion tree "bubbling" along road-network segments]
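The network-expansion step can be sketched as a Dijkstra search that stops at network distance r. The sketch assumes an undirected road graph given as adjacency lists with edge lengths. The edges touching a reached vertex are the (possibly only partially covered) edges whose movement R-trees would be fetched during filtering. The graph and function names are hypothetical.

```python
import heapq


def expansion_tree(graph, q, r):
    """Bounded Dijkstra from q.

    graph: {vertex: [(neighbour, length), ...]}, listing each undirected
    edge in both directions. Returns (reached, edges): reached maps every
    vertex with network distance < r to (distance, parent); edges is the
    set of edges with at least one end point in reached.
    """
    reached = {q: (0.0, None)}
    heap = [(0.0, q)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > reached[v][0]:
            continue  # stale heap entry
        for w, length in graph.get(v, []):
            nd = d + length
            if nd < r and (w not in reached or nd < reached[w][0]):
                reached[w] = (nd, v)
                heapq.heappush(heap, (nd, w))
    edges = {tuple(sorted((v, w))) for v in reached for w, _ in graph.get(v, [])}
    return reached, edges


graph = {
    "q": [("A", 2.0), ("B", 5.0)],
    "A": [("q", 2.0), ("C", 2.0)],
    "B": [("q", 5.0), ("D", 1.0)],
    "C": [("A", 2.0), ("E", 4.0)],
    "D": [("B", 1.0)],
    "E": [("C", 4.0)],
}
reached, edges = expansion_tree(graph, "q", r=4.5)
print(sorted(reached))  # ['A', 'C', 'q']
print(sorted(edges))    # [('A', 'C'), ('A', 'q'), ('B', 'q'), ('C', 'E')]
```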
77
Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty: the flip side
› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error; save on data size
John Hershberger, Jack Snoeyink. "Speeding Up the Douglas-Peucker Line-Simplification Algorithm." SSHD 1997
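For reference, the classic Douglas-Peucker simplification can be sketched as follows. This is the straightforward recursive O(n²) version, not the faster variant from the citation. It treats the samples as purely spatial points, so timestamps are simply dropped. The tolerance `epsilon` plays the role of the acceptable error; the track data are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep the farthest point if it deviates more than epsilon, then recurse."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears only once


track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
simplified = douglas_peucker(track, epsilon=0.5)
print(simplified)  # [(0, 0), (2, -0.1), (3, 5), (5, 7)]
```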
79
Uncertainty: the other flip side
› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just a "Z" axis: one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
› Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
› Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
82
Merge the approaches
› Idea
– Discretize time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest-neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments." IEEE TKDE 16(9), 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series." SSDBM 2009: 435-443
83
A highway example…
[Figure: position (pos) of a car over time (t), with a query window Q]
What is the probability that the car is in some area at least once during some time interval?
84
A highway example…
[Figure: as before, with the possible positions bounded by the minimum and maximum speeds vmin and vmax]
85
A highway example…
[Figure: possible positions bounded by vmin and vmax, and the query window Q]
Assuming a uniform distribution…
86
A highway example…
[Figure: the probabilities 50% and 30% of the car being inside Q]
Independence assumption: $1 - (1 - 0.5)(1 - 0.3) = 0.65$
87
A highway example…
Independence assumption: $1 - (1 - 0.5)(1 - 0.3) = 0.65$
› Violation of the speed constraints
88
A highway example…
Independence assumption: $1 - (1 - 0.5)(1 - 0.3) = 0.65$
Consideration of the dependency: 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: transition probabilities 0.4 / 0.2 / 0.4 in the interior and 0.4 / 0.6 at the border]
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
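How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$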
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\, M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)\, M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\, M^{40}$ is close to the stationary distribution $(0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
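The board example can be reproduced in plain Python. The sketch assumes the nine buckets are indexed 0 to 8, with the ball starting in the third one as in the initial vector. `board_matrix` and `step` are hypothetical helper names. Iterating far beyond 40 hits reveals the long-run (stationary) distribution.

```python
def board_matrix(n=9, move=0.4, stay=0.2, border_move=0.6, border_stay=0.4):
    """Transition matrix of the board example (rows: from, columns: to)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = move, stay, move
    M[0][0], M[0][1] = border_stay, border_move
    M[n - 1][n - 2], M[n - 1][n - 1] = border_move, border_stay
    return M


def step(dist, M):
    """One hit of the board: row vector times transition matrix."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


M = board_matrix()
dist = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # initial position
after = {}
for hit in range(1, 41):
    dist = step(dist, M)
    after[hit] = dist

long_run = after[40]
for _ in range(500):
    long_run = step(long_run, M)

print([round(x, 2) for x in after[1]])   # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in after[2]])   # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in long_run])   # [0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08]
```

`after[40]` can be printed the same way. It is close to, but not exactly equal to, the long-run distribution.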
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
39
[Figure: query region q; three uncertain objects, anonymized as x, with probabilities 0.8, 0.2 and 0.4]
Generating Functions
Main idea: use polynomial multiplication to enumerate possible results
Observation 3: anonymize objects, i.e., substitute A, B, C by x
Each monomial $c\,x^k$ implies that the probability of having exactly $k$ results equals $c$
40
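For each object $o_i$, let $p_i$ be the probability that $o_i$ lies inside the query region. Consider the following generating function [2]:
$$\mathcal{F}(x) = \prod_{i} \bigl((1 - p_i) + p_i\,x\bigr)$$
In the expanded polynomial, the coefficient of the monomial $x^k$ equals the probability that exactly $k$ objects are inside the query region.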
Generating Functions Formally
[2] Jian Li Barna Saha and Amol Deshpande A Unified Approach to Ranking in Probabilistic Databases PVLDB 2(1) 502-513 (2009)
41
Example
[Figure: query region q; uncertain objects A, B, C; probabilities 0.8, 0.2 and 0.4]
Count Queries on Uncertain Data
42
[Figure: query region q; uncertain objects A, B, C; probabilities 0.8, 0.2 and 0.4]
Example
Count Queries on Uncertain Data
43
[Figure: query region q; uncertain objects A, B, C; probabilities 0.8, 0.2 and 0.4]
Example
Count Queries on Uncertain Data
44
[Figure: query region q; uncertain objects A, B, C; probabilities 0.8, 0.2 and 0.4]
Example
Count Queries on Uncertain Data
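The coefficients of the generating function can be computed by multiplying in one linear factor per object. The sketch below is a minimal Python illustration, assuming the objects are mutually independent and that the values 0.8, 0.2 and 0.4 in the example figure are the probabilities of A, B and C lying inside q (their order does not matter, since the product is commutative). The function name `count_distribution` is hypothetical.

```python
from typing import List

def count_distribution(probs: List[float]) -> List[float]:
    """Coefficients of prod_i ((1 - p_i) + p_i * x), lowest degree first.

    Entry k is the probability that exactly k of the (independent)
    objects lie inside the query region.
    """
    poly = [1.0]  # the constant polynomial 1
    for p in probs:
        nxt = [0.0] * (len(poly) + 1)
        for k, c in enumerate(poly):
            nxt[k] += c * (1.0 - p)   # object outside q: degree unchanged
            nxt[k + 1] += c * p       # object inside q: degree + 1
        poly = nxt
    return poly

# Assumed probabilities of A, B, C lying inside q (read off the figure)
dist = count_distribution([0.8, 0.2, 0.4])
print([round(c, 3) for c in dist])   # [0.096, 0.472, 0.368, 0.064]
```

Each multiplication step touches every coefficient once, so n objects cost O(n²) operations and yield n + 1 classes of equivalent worlds (one per count), in line with condition II of the Paradigm of Equivalent Worlds.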
45
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time.
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.
III. The probability of a class can be computed in polynomial time.
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and unbiased
Evaluate the query predicate on each world
The distribution of the sampled results is an unbiased approximation of the true distribution of results
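A minimal sketch of the sampling approach for the same count query, assuming independent objects and the same hypothetical inside-q probabilities (0.8, 0.2, 0.4) as in the generating-function example. Each iteration materializes one possible world and evaluates the count on it; the function name is hypothetical.

```python
import random
from collections import Counter

def sample_count_distribution(probs, n_worlds, seed=0):
    """Estimate P(count = k) from n_worlds sampled possible worlds.

    In each world, every object independently lies inside the query
    region with its probability p (independence is assumed here)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_worlds):
        k = sum(1 for p in probs if rng.random() < p)  # evaluate query on one world
        counts[k] += 1
    return [counts[k] / n_worlds for k in range(len(probs) + 1)]

est = sample_count_distribution([0.8, 0.2, 0.4], n_worlds=100)
print(est)   # relative frequencies of the counts 0, 1, 2, 3
```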
48
[Figure: query region q; uncertain objects A, B, C; probabilities 0.8, 0.2 and 0.4]
Sampling Example
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimates
49
[Figure: query region q; uncertain objects A, B, C; probabilities 0.8, 0.2 and 0.4]
Sampling Confidences
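Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of the estimators
E.g., the Wald test: $\hat{p} \pm z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
where $\hat{p}$ is the estimated probability, $n$ the number of sampled worlds, and $z_{1-\alpha/2}$ the $(1-\alpha/2)$ percentile of the standard normal distribution
At significance level $\alpha$, the resulting confidence interval for the true probability is [0.442, 0.638]
True probability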
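The Wald interval can be computed directly. In this sketch the estimate p̂ = 0.54 from n = 100 sampled worlds and α = 0.05 are assumptions (the slide does not state them); they are chosen because they reproduce the interval [0.442, 0.638]. Clamping to [0, 1] is an added convenience, not part of the Wald formula, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def wald_interval(p_hat: float, n: int, alpha: float = 0.05):
    """Wald interval p_hat +/- z_{1-alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # percentile of N(0, 1)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

lo, hi = wald_interval(0.54, 100)   # hypothetical estimate from 100 worlds
print(round(lo, 3), round(hi, 3))   # 0.442 0.638
```

The Wald interval collapses to a single point when p̂ is 0 or 1, so it is least reliable for very small or very large probabilities.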
50
Uncertain Spatial Data Management Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data Cleaning: "best guess" answers; unreliable results; biased results
Paradigm of Equivalent Worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip-side
53
Model of a trajectory
[Figure: a 3D trajectory in (X, Y, Time) space up to the present time, and its 2D route projection]
The approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
54
Mobility Models
› Sequence of (location, time) updates (e.g., GPS-based)
› Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
› (location, time, velocity vector) updates
55
Modeling Uncertainty: Location Uncertainty Models
Imprecision from various sources (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: cones, beads, necklaces (AKA space-time prisms)
  • Assumes exact location samples
  • Unknown motion in between samples, with a bounded maximum speed
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
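For Model 1, membership in a bead can be tested from the two samples and the speed bound alone. The sketch below assumes free-space Euclidean motion, exact samples and a known maximum speed vmax; the function names and coordinates are hypothetical. The second function uses the fact that a point can be visited at some time between the samples exactly when the two travel times together fit into t2 - t1, which describes an ellipse with foci at the samples.

```python
import math

def possible_at(p, t, s1, t1, s2, t2, vmax):
    """True if the object may be at point p at time t, given the exact samples
    (s1, t1) and (s2, t2) and the maximum speed vmax (inside the bead)."""
    if not t1 <= t <= t2:
        return False
    return (math.dist(p, s1) <= vmax * (t - t1)
            and math.dist(p, s2) <= vmax * (t2 - t))

def possibly_visited(p, s1, t1, s2, t2, vmax):
    """True if p lies in the bead's spatial projection: the ellipse with foci
    s1, s2 given by |p - s1| + |p - s2| <= vmax * (t2 - t1)."""
    return math.dist(p, s1) + math.dist(p, s2) <= vmax * (t2 - t1)

# Hypothetical samples: (0, 0) at t = 0 and (4, 0) at t = 4, vmax = 2
print(possible_at((2, 0), 2, (0, 0), 0, (4, 0), 4, 2))     # True
print(possibly_visited((2, 4), (0, 0), 0, (4, 0), 4, 2))   # False
```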
57
Modeling Uncertain Trajectories: Model 2
Constraints
› Motion along the road network
› Uncertainty restricted to road segments
[Figure: road-network example with points p1, p2, q and times t1, t2, t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g., possible routes to be taken
Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq, throughout [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) space at T = tb, T = t1 and T = te]
Given a collection of moving-object trajectories Tr1, Tr2, …, TrN
Example: assuming that Trq is the querying trajectory over [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being the nearest neighbor of the moving object whose trajectory is Trq, within [tb, te]
[Figure: uncertain trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) space at T = tb, tb1, t1, t11 and te]
Observations
Adding uncertainty to the previous example implies:
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree with root [TrQ, tb, te]; level-1 nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); deeper nodes such as (Tri11, tb, t11, D11), (Tri12, t11, t12, D12) and (Tri121, t11, t121, D121) refine the sub-intervals]
Root: parameters of the query, i.e., the querying trajectory Trq and the time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: crisp query point Q; uncertainty disks of Tr1 to Tr5; radii Rmin, Rmax and Rd]
Trq = 2D point Q; the rest are possible-location regions bounded by circles
Observation: any object whose minimum distance from Q is > Rmax has a "0" probability of being the NN of Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i^{WD}(R_d) = \iint_{\lVert (x,y) - Q \rVert \le R_d} pdf_i(x, y)\,dx\,dy$$
The probability that a given object Trj is the NN of Q is given by
$$P_j^{NN} = \int_{0}^{R_{max}} \frac{d P_j^{WD}(R_d)}{d R_d} \prod_{k \neq j} \bigl(1 - P_k^{WD}(R_d)\bigr)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the others are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
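The pruning rule and the NN probability can be sketched as follows. This is an illustration under assumptions: uncertainty regions are disks with independent, uniform pdfs, and instead of evaluating the integral numerically the sketch estimates P(Trj is NN) by Monte Carlo sampling, a swapped-in technique. All names and coordinates are hypothetical.

```python
import math
import random

def prune_candidates(q, disks):
    """disks: {name: ((cx, cy), r)}. Keep objects whose NN probability can be > 0.

    R_max = smallest maximum distance from q to any disk; an object whose
    minimum distance from q exceeds R_max can never be the nearest neighbor."""
    r_max = min(math.dist(q, c) + r for c, r in disks.values())
    return {name for name, (c, r) in disks.items()
            if max(0.0, math.dist(q, c) - r) <= r_max}

def sample_in_disk(rng, c, r):
    # uniform point in a disk (sqrt makes the density uniform in area)
    rho, phi = r * math.sqrt(rng.random()), 2 * math.pi * rng.random()
    return (c[0] + rho * math.cos(phi), c[1] + rho * math.sin(phi))

def nn_probabilities(q, disks, n=20000, seed=0):
    """Monte Carlo estimate of P(object is the NN of q), assuming independent,
    uniform pdfs over the disks."""
    rng = random.Random(seed)
    wins = dict.fromkeys(disks, 0)
    for _ in range(n):
        d = {k: math.dist(q, sample_in_disk(rng, c, r))
             for k, (c, r) in disks.items()}
        wins[min(d, key=d.get)] += 1
    return {k: w / n for k, w in wins.items()}

disks = {"Tr1": ((2, 0), 1), "Tr2": ((0, 2.5), 1), "Tr4": ((6, 0), 1)}
print(prune_candidates((0, 0), disks))   # Tr4 is pruned
print(nn_probabilities((0, 0), disks))
```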
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query location TrQ; uncertainty disks of Tr1 to Tr5; radii Rmin, Rmax; points Z1, Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within_distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over disks of diameter 2r]
Example: assuming a uniform pdf of possible locations, the figure shows two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$
Define Viq = Vi - Vq (cross-correlation) as another random variable: Viq = Vi + (-Vq), which, due to independence, implies that the pdf of Viq is the convolution of the pdfs of Vi and -Vq
67
[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq), supported on disks of diameter 4r]
Let Viq = Vi + (-Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2: assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdf's). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
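A discrete, one-dimensional analogue makes Property 1 easy to check: the pmf of Vi - Vq is obtained by convolving the pmf of Vi with that of -Vq, and its mean equals the difference of the means. The sketch assumes independent variables on an integer grid; the 2-D case works the same way by subtracting coordinate tuples. Names and values are hypothetical.

```python
from collections import defaultdict

def difference_pmf(pmf_i, pmf_q):
    """pmf of V_iq = V_i - V_q for independent discrete V_i and V_q,
    i.e. the convolution of pmf_i with the pmf of -V_q."""
    out = defaultdict(float)
    for a, pa in pmf_i.items():
        for b, pb in pmf_q.items():
            out[a - b] += pa * pb
    return dict(out)

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

# Hypothetical 1-D example: V_i uniform on {3, 4, 5}, V_q uniform on {0, 1, 2}
pmf_i = {x: 1 / 3 for x in (3, 4, 5)}
pmf_q = {x: 1 / 3 for x in (0, 1, 2)}
pmf_iq = difference_pmf(pmf_i, pmf_q)
print(sorted(pmf_iq.items()))                      # triangular pmf on 1..5
print(mean(pmf_iq), mean(pmf_i) - mean(pmf_q))     # centroids agree (Property 1)
```

The resulting pmf is also symmetric around its mean, the 1-D counterpart of Property 2.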
68
The continuous aspect of the probabilistic NN queries
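Let Vx_iq = vx_i - vx_q and Vy_iq = vy_i - vy_q denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{(x_{iq} + V_{x,iq}\,t)^2 + (y_{iq} + V_{y,iq}\,t)^2} = \sqrt{A t^2 + B t + C},$$
where $(x_{iq}, y_{iq})$ is the expected relative location at $t = 0$, $A = V_{x,iq}^2 + V_{y,iq}^2$, $B = 2(x_{iq}V_{x,iq} + y_{iq}V_{y,iq})$ and $C = x_{iq}^2 + y_{iq}^2$: a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)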
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t); squaring both sides yields an equation of degree at most two in t). Call such pairwise intersections critical time-points
[Figure: example lower envelope LE12 of two distance trajectories TR1 and TR2 (distance vs. time, with time on the horizontal axis), switching at the critical time-points t11 and t12 after tb]
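Squaring both distance functions shows why at most two critical time-points exist: d_iq(t)² - d_jq(t)² is a polynomial of degree at most two in t. A minimal sketch, assuming each distance function is represented by the coefficients (A, B, C) of d(t)² = At² + Bt + C, computed from a hypothetical relative start position and relative velocity; all names are hypothetical.

```python
import math

def dist_coeffs(rel_pos, rel_vel):
    """Coefficients (A, B, C) of d(t)^2 = A t^2 + B t + C for the expected
    relative location rel_pos + t * rel_vel (relative to the query object)."""
    (x, y), (vx, vy) = rel_pos, rel_vel
    return vx * vx + vy * vy, 2 * (x * vx + y * vy), x * x + y * y

def critical_time_points(fi, fj, tb, te, eps=1e-12):
    """Times in [tb, te] where d_iq(t) = d_jq(t); at most two, because
    d_iq^2 - d_jq^2 is a polynomial of degree <= 2 in t."""
    a, b, c = (u - v for u, v in zip(fi, fj))
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted({t for t in roots if tb <= t <= te})

f1 = dist_coeffs((0, 2), (0, 0))    # hypothetical: stays at distance 2
f2 = dist_coeffs((-3, 0), (1, 0))   # hypothetical: passes the query at t = 3
print(critical_time_points(f1, f2, 0, 10))   # [1.0, 5.0]
```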
70
The continuous aspect of the probabilistic NN queries
Q: How can we efficiently construct the lower envelope of the whole collection of distance functions?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: the lower envelopes LE12 (of TR1, TR2, with critical points t11, t12) and LE34 (of TR3, TR4, with critical point t31) are merged into LE1234, creating the new critical points t1new and t2new]
Complexity of the lower-envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
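The merge step can be sketched as below, assuming each distance function is given by the coefficients (A, B, C) of its square, so comparisons avoid square roots. Each envelope is a list of (trajectory id, start, end) pieces. For readability the merge locates pieces with linear scans, so this sketch does not attain the O(N log N) bound stated on the slide; names and coefficients are hypothetical.

```python
import math

def sq_dist(f, t):
    a, b, c = f
    return a * t * t + b * t + c   # d(t)^2; comparing d^2 is equivalent to comparing d

def crossings(f, g, lo, hi, eps=1e-12):
    """Times strictly inside (lo, hi) where the two distance functions meet."""
    a, b, c = (u - v for u, v in zip(f, g))
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        s = math.sqrt(disc) if disc >= 0 else None
        roots = [] if s is None else [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted({t for t in roots if lo < t < hi})

def merge(env1, env2, funcs):
    """Merge two lower envelopes over the same interval into one."""
    cuts = sorted({t for _, s, e in env1 + env2 for t in (s, e)})
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        i = next(k for k, s, e in env1 if s <= mid <= e)
        j = next(k for k, s, e in env2 if s <= mid <= e)
        pts = [lo] + crossings(funcs[i], funcs[j], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            m = (a + b) / 2
            win = i if sq_dist(funcs[i], m) <= sq_dist(funcs[j], m) else j
            if out and out[-1][0] == win and out[-1][2] == a:
                out[-1] = (win, out[-1][1], b)   # extend the current piece
            else:
                out.append((win, a, b))
    return out

def lower_envelope(ids, funcs, tb, te):
    """Divide and conquer in the spirit of merge sort."""
    if len(ids) == 1:
        return [(ids[0], tb, te)]
    half = len(ids) // 2
    return merge(lower_envelope(ids[:half], funcs, tb, te),
                 lower_envelope(ids[half:], funcs, tb, te), funcs)

# Hypothetical squared-distance coefficients (A, B, C) of three trajectories
funcs = {1: (0, 0, 4), 2: (1, -4, 4), 3: (1, 0, 0)}
print(lower_envelope([1, 2, 3], funcs, 0, 4))   # [(3, 0, 1.0), (2, 1.0, 4)]
```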
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r (r = radius of the uncertainty disk) above the lower envelope cannot have a non-zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te] with critical points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the 4r pruning band]
72
Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti
Example: if the pdf of selecting a possible path is uniform, then Pr(Pj) = 1/|PPi(a)| for each Pj ∈ PPi(a)
Given a path Pj ∈ PPi(a), the possible locations of a moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)
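The possible-path set can be enumerated with a depth-first search that stops as soon as the accumulated minimum travel time exceeds t_{i+1} - t_i. The sketch assumes a directed graph whose edge weights are minimum travel times and, for simplicity, that both samples lie on vertices; the graph, weights and names are hypothetical (the vertex names only echo the slides).

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths src -> dst whose minimum travel time is <= budget.

    graph: {vertex: {neighbor: min_travel_time}} (hypothetical format)."""
    paths = []

    def dfs(v, path, cost):
        if v == dst:
            paths.append(list(path))
            return
        for w, c in graph.get(v, {}).items():
            if w not in path and cost + c <= budget:   # prune by time budget
                path.append(w)
                dfs(w, path, cost + c)
                path.pop()

    dfs(src, [src], 0.0)
    return paths

def uniform_path_probabilities(paths):
    """Pr(P_j) = 1 / |PP_i(a)| when path selection is uniform."""
    return [1.0 / len(paths)] * len(paths) if paths else []

graph = {"v1": {"v5": 3, "v3": 2, "v6": 5},
         "v5": {"v2": 3}, "v3": {"v4": 2}, "v4": {"v2": 2}, "v6": {"v2": 5}}
pp = possible_paths(graph, "v1", "v2", budget=6)   # t_{i+1} - t_i = 6
print(pp, uniform_path_probabilities(pp))
```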
73
Combine the two probabilities…
[Figure: possible paths of object a and its possible locations when t = 4; point m]
Probability of falling inside segment mp1 (at t = 4): 0.5 · 0.5 = 0.25
Probability of an object a falling inside some segment S at a given time instant t, e.g., with a uniform pdf:
$$\Pr(a(t) \in S) = \sum_{P_j \in PP_i(a)} \Pr(P_j) \cdot \Pr\bigl(PL_a^{P_j}(t) \in S\bigr)$$
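Combining the two probabilities can be sketched in a few lines, assuming that along each possible path the location is uniform over the possible-location interval and that this interval is derived from a maximum speed vmax. The path length, times and speed below are hypothetical values chosen to reproduce the 0.5 · 0.5 = 0.25 example; function names are hypothetical too.

```python
def possible_interval(path_len, t, ti, ti1, vmax):
    """Possible positions x on a path (x = 0 at p_i, x = path_len at p_{i+1}) at
    time t: reachable from p_i within t - ti and able to reach p_{i+1} within
    ti1 - t, at speed <= vmax."""
    return max(0.0, path_len - vmax * (ti1 - t)), min(path_len, vmax * (t - ti))

def prob_in_segment(path_probs, intervals, segments):
    """Pr(a(t) in S) = sum_j Pr(P_j) * Pr(PL_a^{P_j}(t) in S), assuming the
    location is uniform over the possible-location interval of each path.

    segments[j] is the part of S on path P_j as (lo, hi), or None."""
    total = 0.0
    for p_path, (lo, hi), seg in zip(path_probs, intervals, segments):
        if seg is None or lo > hi:
            continue
        if hi == lo:                       # degenerate: a single possible point
            total += p_path if seg[0] <= lo <= seg[1] else 0.0
            continue
        overlap = max(0.0, min(hi, seg[1]) - max(lo, seg[0]))
        total += p_path * overlap / (hi - lo)
    return total

iv = possible_interval(path_len=6, t=4, ti=0, ti1=8, vmax=1)   # (2, 4)
print(prob_in_segment([0.5, 0.5], [iv, (0.0, 4.0)], [(3, 4), None]))   # 0.25
```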
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– The probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1
At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, one for each edge
  – Trajectories list
    › Stores samples ordered by time-stamp
    › Assigns a pointer to the set of possible paths
    › Earliest-arrival and latest-departure times stored for vertices
    › On-disk, retrieved on demand
76
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search the leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree rooted at q "bubbling" along road-network segments among vertices A to E, annotated with earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]
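The network-expansion step is essentially a Dijkstra search cut off at network distance r. A minimal sketch, assuming an adjacency-dict road graph with edge lengths (hypothetical format, names and values); the per-edge movement R-tree lookups of the filtering step are not shown.

```python
import heapq

def expansion_tree(graph, q, r):
    """Bounded Dijkstra from q. Returns the edges (u, v) whose near endpoint u
    has network distance < r from q, plus the distances of reached vertices.

    graph: {u: {v: length}} (hypothetical adjacency format)."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = []
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph.get(u, {}).items():
            edges.append((u, v))          # edge lies (at least partly) within r
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return edges, dist

graph = {"q": {"A": 2, "B": 5}, "A": {"C": 2}, "B": {"D": 1}, "C": {"E": 4}}
edges, dist = expansion_tree(graph, "q", r=4.5)
print(edges)   # [('q', 'A'), ('q', 'B'), ('A', 'C'), ('C', 'E')]
```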
77
Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty – the flip-side
› Data reduction
– Why transmit every single location point?
› Bandwidth
› Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
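The classic Douglas-Peucker algorithm illustrates the "define an acceptable error" idea. This is the basic recursive version, not the speed-up of the cited paper, and its worst case is quadratic; names and points are hypothetical. It bounds the per-point deviation but does not by itself preserve topological properties.

```python
import math

def point_segment_dist(p, a, b):
    """Euclidean distance from point p to segment ab."""
    ax, ay = a
    dx, dy = b[0] - ax, b[1] - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((p[0] - ax) * dx + (p[1] - ay) * dy) / seg_len2))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def douglas_peucker(points, eps):
    """Keep the endpoints; recursively keep the farthest point if it deviates
    by more than eps (the acceptable error) from the chord."""
    if len(points) < 3:
        return list(points)
    dmax, idx = max((point_segment_dist(points[i], points[0], points[-1]), i)
                    for i in range(1, len(points) - 1))
    if dmax <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right          # do not duplicate the split point

pts = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
print(douglas_peucker(pts, 0.5))   # [(0, 0), (2, -0.1), (3, 5), (5, 7)]
```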
79
Uncertainty – the other flip-side
› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after the reduction)
› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
› Stochastic methods for uncertain spatial data (criteria: probabilistic results, time dimension)
vs.
› Geometric models for uncertain spatio-temporal data (criteria: probabilistic results, time dimension)
82
Merge the approaches
› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest-neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parameterized queries
R. Cheng, D. Kalashnikov and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127 (2004)
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443
83
A highway example…
[Figure: position (pos) of a car over time (t), with a query window Q]
What is the probability that the car is in some area at least once during some time interval?
84
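A highway example…
[Figure: position (pos) over time (t); query window Q; possible positions bounded by the minimum and maximum speeds vmin and vmax]
What is the probability that the car is in some area at least once during some time interval?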
85
A highway example…
[Figure: position (pos) over time (t); possible positions bounded by vmin and vmax; query window Q]
Assuming a uniform distribution…
86
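A highway example…
[Figure: position (pos) over time (t); query window Q; probabilities 50% and 30% of the car being inside Q at two time instants]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65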
87
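A highway example…
[Figure: position (pos) over time (t); query window Q; probabilities 50% and 30% of the car being inside Q at two time instants]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Violation of the speed constraints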
88
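A highway example…
[Figure: position (pos) over time (t); query window Q; probabilities 50% and 30% of the car being inside Q at two time instants]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Consideration of the dependency: 0.5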
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model, which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
J. L. Doob (1953). Stochastic Processes. Wiley
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: interior hole: stay 0.2, move to either neighbour 0.4; border hole: stay 0.4, move inward 0.6]
92
A simple example
› Initial position: bucket 3
› After the first hit: 0.4 | 0.2 | 0.4 (buckets 2 to 4)
› After the second hit: 0.16 | 0.16 | 0.36 | 0.16 | 0.16 (buckets 1 to 5)
› After the 40th hit: 0.08 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.08
93
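How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket; columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$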
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{2} = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
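The hits can be replayed by repeated vector-matrix multiplication. A minimal pure-Python sketch with dense lists for readability (a real system would use sparse matrices); the function names are hypothetical.

```python
def board_matrix(n=9):
    """Transition matrix of the example board: an interior hole keeps the ball
    with 0.2 and passes it left/right with 0.4 each; a border hole keeps it
    with 0.4 and passes it inward with 0.6."""
    M = [[0.0] * n for _ in range(n)]
    M[0][0], M[0][1] = 0.4, 0.6
    M[n - 1][n - 1], M[n - 1][n - 2] = 0.4, 0.6
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

def step(dist, M):
    """One hit of the board: row vector times transition matrix M."""
    return [sum(dist[i] * M[i][j] for i in range(len(M))) for j in range(len(M))]

M = board_matrix()
dist = [0.0] * 9
dist[2] = 1.0                       # initial position: bucket 3
for hit in range(1, 41):
    dist = step(dist, M)
    if hit in (1, 2, 40):
        print(hit, [round(p, 2) for p in dist])
```

The vector (0.08, 0.12, …, 0.12, 0.08) is the stationary distribution of M (it satisfies πM = π), which is why the distribution after many hits approaches it.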
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state-transition graph over s1, s2, s3, labeled with the transition probabilities of M]
Note: there is an exponential number of possible paths the car might take
ST - Window Queries
Initial distribution (t = 0): (1, 0, 0); $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
98
99
ST - Window Queries
Initial distribution (t = 0): (1, 0, 0); $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 1: (0, 0, 1)
100
ST - Window Queries
Initial distribution (t = 0): (1, 0, 0); $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
101
ST - Window Queries
Initial distribution (t = 0): (1, 0, 0); $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)
102
ST - Window Queries
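Initial distribution (t = 0): (1, 0, 0); $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3, continuing only the worlds that have not hit the window at t = 2: (0, 0.16, 0.04)
Result = 0.96 (only the 0.04 that stays in s3 never hits the window)
› The solution based on matrix multiplications introduces a new state for the "winner" trajectories (those that have already hit the window) and two matrices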
103
ST - Window Queries
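$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$
States: s1, s2, s3, plus the new absorbing state s4 for the trajectories that have already hit the window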
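The M⁻/M⁺ construction can be sketched generically. This is an illustration assuming ticks t = 0, 1, …, a single initial distribution, and a window given by a set of states and a set of ticks T; M⁻ is applied on ticks outside T and M⁺ on ticks inside T. Function names are hypothetical.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def window_probability(M, init, window, T):
    """P(object is in a window state at some tick t in T), for a Markov chain
    with transition matrix M and initial distribution init at t = 0.

    An extra absorbing state collects the trajectories that already hit the
    window. M_minus: M plus the absorbing state (window inactive). M_plus:
    additionally redirects every transition into a window state to it."""
    n = len(M)
    M_minus = [list(row) + [0.0] for row in M] + [[0.0] * n + [1.0]]
    M_plus = [[0.0 if j in window else M[i][j] for j in range(n)]
              + [sum(M[i][j] for j in window)] for i in range(n)]
    M_plus.append([0.0] * n + [1.0])
    hit0 = 0 in T
    v = [0.0 if hit0 and i in window else p for i, p in enumerate(init)]
    v.append(sum(init[i] for i in window) if hit0 else 0.0)
    for t in range(1, max(T) + 1):
        v = vec_mat(v, M_plus if t in T else M_minus)
    return v[-1]

M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]   # states s1, s2, s3 -> 0, 1, 2
print(round(window_probability(M, [1, 0, 0], window={0, 1}, T=range(2, 4)), 2))  # 0.96
```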
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space vs. time; left: one observation at t0; right: observations at t0 and t1]
105
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to the worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to the worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
Multiple Observations
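Classes, in this order: S1-, S2-, S3- (not ∎: window not yet hit) and S1+, S2+, S3+ (∎: window hit)
t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
$$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$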
107
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' Theorem:
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\,P(\blacksquare)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$$
With the distribution at t = 3 over (S1-, S2-, S3-, S1+, S2+, S3+) = (0, 0, 0.04, 0.48, 0.16, 0.32):
$$P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
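The 2|S|-class bookkeeping and the final Bayes step can be sketched as below, assuming a single observation at t_obs and the same conventions as the window-query example (states 0..n-1 are the classes Si-, states n..2n-1 the classes Si+). Function names are hypothetical.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def hit_probability_given_obs(M, init, window, T, obs_state, t_obs):
    """P(window hit during T | object observed in obs_state at t_obs)."""
    n, W = len(M), set(window)
    # M_minus: window inactive, both halves evolve with M (block diagonal)
    M_minus = [[M[i % n][j % n] if (i < n) == (j < n) else 0.0
                for j in range(2 * n)] for i in range(2 * n)]
    # M_plus: from S_i^-, entering a window state moves the world to S_j^+
    M_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            M_plus[i][n + j if j in W else j] += M[i][j]
            M_plus[n + i][n + j] += M[i][j]
    v = [0.0] * (2 * n)
    for i, p in enumerate(init):
        v[n + i if (0 in T and i in W) else i] += p
    for t in range(1, t_obs + 1):
        v = vec_mat(v, M_plus if t in T else M_minus)
    hit, miss = v[n + obs_state], v[obs_state]
    if hit + miss == 0:
        raise ValueError("the observation has zero probability")
    return hit / (hit + miss)          # Bayes' theorem

M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
p = hit_probability_given_obs(M, [1, 0, 0], {0, 1}, range(2, 4), obs_state=2, t_obs=3)
print(round(p, 2))   # 0.89
```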
108
Summary
› Pros
– Allows answering time-parameterized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index entries for an object o in (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate the result probability by the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can we draw samples efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
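One standard way to adapt the transition matrices, sketched here as an assumption rather than as the exact method of the cited paper, is to condition the chain on the observation with backward probabilities beta[t][i] = P(observation | state i at t). Sampling the next state proportionally to M[i][j]·beta[t+1][j] then yields only trajectories consistent with the observation. Names are hypothetical; the example reuses the three-state chain of the window-query slides.

```python
import random

def backward_probs(M, obs_state, t_obs):
    """beta[t][i] = P(object in obs_state at t_obs | object in state i at t)."""
    n = len(M)
    beta = [[0.0] * n for _ in range(t_obs + 1)]
    beta[t_obs][obs_state] = 1.0
    for t in range(t_obs - 1, -1, -1):
        beta[t] = [sum(M[i][j] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    return beta

def sample_conditioned(M, beta, start, rng):
    """One trajectory consistent with the observation: the next state j is drawn
    with probability proportional to M[s][j] * beta[t + 1][j] (the adapted
    transition matrix, up to the normalizer beta[t][s])."""
    path, s = [start], start
    for t in range(len(beta) - 1):
        weights = [M[s][j] * beta[t + 1][j] for j in range(len(M))]
        s = rng.choices(range(len(M)), weights=weights)[0]
        path.append(s)
    return path

def estimate_hit(M, start, obs_state, t_obs, window, T, n=20000, seed=0):
    """Monte Carlo estimate of P(window hit during T | in obs_state at t_obs)."""
    beta = backward_probs(M, obs_state, t_obs)
    if beta[0][start] == 0:
        raise ValueError("observation is impossible from the start state")
    rng = random.Random(seed)
    hits = sum(any(path[t] in window for t in T)
               for path in (sample_conditioned(M, beta, start, rng)
                            for _ in range(n)))
    return hits / n

M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
print(estimate_hit(M, start=0, obs_state=2, t_obs=3, window={0, 1}, T=range(2, 4)))
```

The estimate should be close to the exact value 0.32 / 0.36 ≈ 0.89 obtained with Bayes' theorem for the same observation.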
112
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Event queries
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob (1953). Stochastic Processes. Wiley
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127 (2004)
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
40
For each object let Consider the following generating function [2]
in the expanded polynomial the coefficient of monomial equals the probability that exactly objects are inside the query region
Generating Functions Formally
[2] Jian Li Barna Saha and Amol Deshpande A Unified Approach to Ranking in Probabilistic Databases PVLDB 2(1) 502-513 (2009)
41
Example
q
08 02
04
H
C A
B
H3
Count Queries on Uncertain Data
42
q
08 02
04
H
C A
B
H3
Example =
Count Queries on Uncertain Data
43
q
08 02
04
H
C A
B
H3
Example = =
Count Queries on Uncertain Data
44
q
08 02
04
H
C A
B
H3
Example = =
Count Queries on Uncertain Data
45
The Paradigm of Equivalent Worlds
A query predicate and an uncertain database DB we can answer on DB in PTIME if the following three conditions are satisfied
I A traditional query on certain data can be answered in polynomial time
II We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III The probability of a class can be computed in polynomial time
46
The Paradigm of Equivalent Worlds
47
Approximated Results Sampling
Materialize a set S of possible worlds
Samples drawn independent and unbiased
Evaluated the query predicate on each world
Distribution of sampled results is an unbiased approximation of the true distribution of results
48
q
08 02
04
H
C A
B
H3
Sampling Example
Drawing 100 possible worlds may yield the followingestimators
Compare to the exact probabilities
No indication of reliability or confidence of estimations
49
q
08 02
04
H
C A
B
H3
Sampling Confidences
Drawing 100 possible worlds may yield the followingestimators
Use statistical methods to assess the quality of estimators
Eg Wald-Test
Where is the percentile of the standard normal distribution
At a significance level of the true probability is in the interval [0442 0638]
True probability
50
Uncertain Spatial Data Management Summary
Motivation Flood of geo-spatial data Enriched with additional contexts (text social multimedia) Inherent uncertainty
Data Cleaning ldquoBest guessrdquo answers Unreliable results Biased results
Paradigm of Equivalent Worlds Efficient solution for the most prominent types of spatial queries Example Generating Functions
Approximations Monte-Carlo sampling Probabilistic guarantees
51
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
52
bull Modelsbull Motionbull Location Uncertainty
bull Queriesbull Part 1 NN (free-space motion)
bull Semantics and Processing
bull Part 2 Range (road-networks constraints)bull Semantics and Processing
bull Uncertainty ndash the flip-side
53
Model of a trajectory
Y
X
Time
Present time
2d-ROUTE
3d-TRAJECTORY
Approximation does not capture accelerationdeceleration
Point Objects NOTE the model needs to be augmented for objects with extent eg hurricanes
Bart Kuijpers Harvey J Miller Walied Othman Kinetic space-time prisms GIS 2011
54
Mobility Models
sequence of (locationtime)updates (eg GPS-based)
Electronic maps + traffic distributionpatterns + set of (to be visited) point=gt full trajectory
(locationtime Velocity Vector )updates
now -gt
55
Modeling Uncertainty: Location Uncertainty Models
Imprecision from various sources (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (a.k.a. space-time prisms)
  • Assumes exact location samples
  • Unknown motion in between samples, with a bounded (maximum) speed
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
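Here is a minimal sketch of bead (space-time prism) membership. A position is possible at time t if it can be reached from the earlier sample and can still reach the later sample without exceeding the maximum speed. The sketch assumes free-space Euclidean motion and a known speed bound vmax. The function name is hypothetical.

```python
import math


def in_bead(point, t, sample1, sample2, vmax):
    """Is (point, t) inside the bead between exact samples (p1, t1) and (p2, t2)
    for an object whose speed never exceeds vmax?"""
    (p1, t1), (p2, t2) = sample1, sample2
    if not t1 <= t <= t2:
        return False
    reachable_from_p1 = math.dist(p1, point) <= vmax * (t - t1)
    can_still_reach_p2 = math.dist(point, p2) <= vmax * (t2 - t)
    return reachable_from_p1 and can_still_reach_p2


s1, s2 = ((0.0, 0.0), 0.0), ((10.0, 0.0), 10.0)
print(in_bead((5.0, 6.0), 5.0, s1, s2, vmax=2.0))
```

Each cross-section of the bead at a fixed t is the intersection of two disks. A necklace is a chain of such beads, one per pair of consecutive samples.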
57
Modeling Uncertain Trajectories: Model 2
Constraints
Motion along the road network
Uncertainty restricted to road segments
[Figure: samples p1 (at t1) and p2 (at t2) on a road network, a point q, and the possible locations at time t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3: Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
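Under the sheared-cylinder model, testing whether a position is possible comes down to a distance check against the expected location at time t. The sketch assumes one linear motion segment, given as a start position and velocity valid on [t0, t1], and an uncertainty disk of radius r. A multi-segment trajectory would apply the same test per segment. The names are hypothetical.

```python
import math


def expected_position(p0, velocity, t0, t):
    """Expected location under a linear motion plan starting at p0 at time t0."""
    return tuple(c + v * (t - t0) for c, v in zip(p0, velocity))


def in_sheared_cylinder(point, t, p0, velocity, t0, t1, r):
    """Possible-location test: within distance r of the expected location at t."""
    if not t0 <= t <= t1:
        return False
    return math.dist(point, expected_position(p0, velocity, t0, t)) <= r


print(in_sheared_cylinder((4.0, 0.9), 4.0, (0.0, 0.0), (1.0, 0.0), 0.0, 10.0, 1.0))
```

Bead membership depends on two samples. Here, by contrast, the bound r is the same at every instant, which is why the model looks like a cylinder sheared along the motion direction.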
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints (e.g., possible routes to be taken)
Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Given a collection of moving object trajectories Tr1, Tr2, …, TrN:
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tr_i1, [tb, t1]), (Tr_i2, [t1, t2]), …, (Tr_im, [t(m-1), te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) space at times T = tb, T = t1, T = te]
Example: assuming that Trq is the querying trajectory during [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
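The shape of a time-parameterized answer is easiest to see with a brute-force sketch. This is not the algorithm of Tao et al. It assumes linear trajectories given as (x0, y0, vx, vy), finds the NN at each instant of a uniform time grid, and merges consecutive instants with the same NN into intervals. Interval boundaries are therefore only as accurate as the grid resolution. All names and data are hypothetical.

```python
import math


def position(traj, t):
    """Linear trajectory (x0, y0, vx, vy): location at time t."""
    x0, y0, vx, vy = traj
    return (x0 + vx * t, y0 + vy * t)


def continuous_nn(query, trajectories, tb, te, steps=100):
    """Time-parameterized NN answer [(name, start, end), ...] on a uniform grid
    of steps + 1 time instants in [tb, te]."""
    answer = []
    for k in range(steps + 1):
        t = tb + (te - tb) * k / steps
        q = position(query, t)
        nn = min(trajectories, key=lambda name: math.dist(q, position(trajectories[name], t)))
        if answer and answer[-1][0] == nn:
            answer[-1][2] = t          # same NN: extend the current interval
        else:
            answer.append([nn, t, t])  # a new trajectory becomes the NN
    return [tuple(entry) for entry in answer]


trajs = {"Tr1": (1.0, 0.0, 1.0, 0.0), "Tr2": (3.0, 0.0, -1.0, 0.0)}
print(continuous_nn((0.0, 0.0, 0.0, 0.0), trajs, 0.0, 2.0, steps=8))
```

In this toy input, Tr1 moves away from the stationary query and Tr2 approaches it, so the answer switches from Tr1 to Tr2 around t = 1.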
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: the previous example with uncertainty added; labelled time instants T = tb, tb1, t1, t11, te]
Observations: adding uncertainty to the previous example implies that
- Tr1 (blue) is the only trajectory with a non-zero NN probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN probability
- Specifically, at t11 all three trajectories have a non-zero NN probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level 1: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper levels refine a parent's interval, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]
Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and objects Tr1 to Tr5 with circular uncertainty regions; distances Rmin, Rmax and Rd]
Trq = 2D point Q; the rest are possible locations bounded by circles
Observation:
- Any object whose minimum distance from Q is > Rmax (the smallest maximum distance from Q over all objects) has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \iint_{\lVert x - Q \rVert \le R_d} pdf_i(x)\, dx$$
The probability that a given object Trj is the NN of Q is given by
$$P_j^{NN} = \int_{R_{min}}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{k \ne j} \bigl(1 - P_k(R_d)\bigr)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the others are at distances > Rd, integrated over all possible values of Rd (NOTE: the upper bound is Rmax…)
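The integral above rarely has a closed form, so this sketch estimates it by sampling rather than integrating. It assumes a crisp query point q, independent objects, and uniform pdfs over disks given as (center, radius). None of that is prescribed by the slide. The Rmax pruning step follows the observation above: objects whose minimum distance exceeds Rmax are never sampled and keep probability 0.

```python
import math
import random


def sample_in_disk(center, radius, rng):
    """Uniform sample from a disk (the square root keeps the density uniform)."""
    r = radius * math.sqrt(rng.random())
    angle = 2 * math.pi * rng.random()
    return (center[0] + r * math.cos(angle), center[1] + r * math.sin(angle))


def nn_probabilities(q, disks, n=20_000, seed=0):
    """Estimate P(object is the NN of the crisp point q) for uniform disk regions.
    disks: {name: (center, radius)} (hypothetical input format)."""
    min_d = {k: max(0.0, math.dist(q, c) - r) for k, (c, r) in disks.items()}
    max_d = {k: math.dist(q, c) + r for k, (c, r) in disks.items()}
    r_max = min(max_d.values())                            # Rmax
    candidates = [k for k in disks if min_d[k] <= r_max]   # the rest have probability 0
    rng = random.Random(seed)
    wins = dict.fromkeys(disks, 0)
    for _ in range(n):
        points = {k: sample_in_disk(*disks[k], rng) for k in candidates}
        winner = min(points, key=lambda k: math.dist(q, points[k]))
        wins[winner] += 1
    return {k: wins[k] / n for k in disks}


disks = {"Tr1": ((2.0, 0.0), 1.0), "Tr2": ((-2.0, 0.0), 1.0), "Tr3": ((10.0, 0.0), 1.0)}
print(nn_probabilities((0.0, 0.0), disks))
```

In the toy input, Tr3 lies entirely beyond Rmax = 3 and is pruned. Tr1 and Tr2 are symmetric around q, so each gets roughly half of the probability mass.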
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query region TrQ and objects Tr1 to Tr5, with points Z1 and Z2 such that dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs of the possible locations of Trq, Tr1 and Tr2 over disks of diameter 2r]
Example: assuming a uniform pdf of possible locations, evaluating the actual probability of Tr1 being within distance Rd from Trq requires considering pairs of points, one from each of the two uncertainty regions
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$P\bigl(\lVert V_i - V_q \rVert \le R_d\bigr)$$
Define Viq = Vi - Vq (cross-correlation) as another random variable, Viq = Vi + (-Vq). Because Vi and Vq are independent, the pdf of Viq is the convolution of the pdfs of Vi and -Vq
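The reduction can be checked numerically. Instead of evaluating the quadruple integral, this sketch samples Vi and Vq independently and compares the length of their difference with Rd. The uniform pdfs over disks given as (center, radius) are an illustrative assumption. With two disks of radius r, the difference Viq = Vi + (-Vq) is supported on a disk of radius 2r, i.e., diameter 4r.

```python
import math
import random


def sample_in_disk(center, radius, rng):
    """Uniform sample from a disk (the square root keeps the density uniform)."""
    r = radius * math.sqrt(rng.random())
    angle = 2 * math.pi * rng.random()
    return (center[0] + r * math.cos(angle), center[1] + r * math.sin(angle))


def prob_within(disk_i, disk_q, rd, n=20_000, seed=0):
    """Estimate P(||Vi - Vq|| <= rd) by sampling the difference Viq = Vi + (-Vq)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        xi, yi = sample_in_disk(*disk_i, rng)
        xq, yq = sample_in_disk(*disk_q, rng)
        hits += math.hypot(xi - xq, yi - yq) <= rd
    return hits / n


far_i, near_q = ((5.0, 0.0), 1.0), ((0.0, 0.0), 1.0)
print(prob_within(far_i, near_q, 5.0))
```

In this example the difference always has length strictly between 3 and 7, because the centers are 5 apart and the radii sum to 2. Any Rd outside that range therefore gives exactly 0 or 1.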
67
[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq); each difference has a support of diameter 4r]
Let Viq = Vi + (-Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids (with respect to the vertical axis, i.e., the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
68
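The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi - vxq and Vyiq = vyi - vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq, and let (xiq, yiq) be that expected location at t = 0
Then the distance of the expected location along TRiq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{(x_{iq} + V_{x_{iq}}\, t)^2 + (y_{iq} + V_{y_{iq}}\, t)^2} = \sqrt{A t^2 + B t + C}$$
$$A = V_{x_{iq}}^2 + V_{y_{iq}}^2, \qquad B = 2\,(x_{iq} V_{x_{iq}} + y_{iq} V_{y_{iq}}), \qquad C = x_{iq}^2 + y_{iq}^2$$
This is (a branch of) a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)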
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: throughout the time interval of interest for the query, [tb, te], two distance functions diq(t) and djq(t) can intersect at most twice (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time-points
Example: [Figure: lower envelope LE12 of two distance trajectories TR1 and TR2, starting at tb, with critical time-points t11 and t12; time on the horizontal axis, distance on the vertical axis]
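Both observations above can be sketched in code, assuming each object's expected location moves linearly. Every distance function is represented by the coefficients (A, B, C) of its square, d(t)² = A t² + B t + C. Two distance functions are equal exactly when their squares are equal, since both are non-negative. The difference of the squares is a polynomial of degree at most two, so it has at most two roots, which is why there are at most two critical time-points per pair. The function names are hypothetical.

```python
import math


def distance_coefficients(p_i, v_i, p_q, v_q):
    """(A, B, C) with d_iq(t)^2 = A t^2 + B t + C, from the relative position
    at t = 0 and the relative velocity of object i with respect to q."""
    x, y = p_i[0] - p_q[0], p_i[1] - p_q[1]
    vx, vy = v_i[0] - v_q[0], v_i[1] - v_q[1]
    return (vx * vx + vy * vy, 2 * (x * vx + y * vy), x * x + y * y)


def critical_times(f, g, tb, te, eps=1e-12):
    """Times in (tb, te) where two distance functions intersect (at most two)."""
    a, b, c = (fi - gi for fi, gi in zip(f, g))
    if abs(a) < eps:                       # degree <= 1
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted({t for t in roots if tb < t < te})


f1 = distance_coefficients((1.0, 0.0), (1.0, 0.0), (0.0, 0.0), (0.0, 0.0))
f2 = distance_coefficients((3.0, 0.0), (-1.0, 0.0), (0.0, 0.0), (0.0, 0.0))
print(f1, f2, critical_times(f1, f2, 0.0, 2.0))
```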
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: lower envelope LE12 of TR1 and TR2 (critical time-points t11, t12) and lower envelope LE34 of TR3 and TR4 (critical time-point t31), both over [tb, te], are merged into LE1234, which introduces new critical time-points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
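Here is a sketch of the merge-sort-style construction. Each distance function is represented by the coefficients (A, B, C) of its square, d(t)² = A t² + B t + C, and comparing squared distances preserves the ordering. A merge step walks the union of both envelopes' breakpoints. It adds the at most two crossings of the two active functions inside each sub-interval and keeps the lower function on every piece. The work per merge is linear in the sizes of the two envelopes. The names and the toy functions are hypothetical.

```python
import math


def value(f, t):
    """Squared distance A t^2 + B t + C (same ordering as the distance)."""
    a, b, c = f
    return a * t * t + b * t + c


def crossings(f, g, lo, hi, eps=1e-12):
    """Roots of f - g strictly inside (lo, hi); at most two."""
    a, b, c = (fi - gi for fi, gi in zip(f, g))
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        roots = [] if disc < 0 else [(-b - math.sqrt(disc)) / (2 * a),
                                     (-b + math.sqrt(disc)) / (2 * a)]
    return sorted({t for t in roots if lo < t < hi})


def piece_at(envelope, lo, hi):
    """Name of the envelope piece covering the sub-interval [lo, hi]."""
    return next(name for name, t0, t1 in envelope if t0 <= lo and hi <= t1)


def merge(env1, env2, funcs):
    """Merge two lower envelopes, given as lists of (name, start, end)."""
    times = sorted({t for env in (env1, env2) for _, t0, t1 in env for t in (t0, t1)})
    result = []
    for lo, hi in zip(times, times[1:]):
        f, g = piece_at(env1, lo, hi), piece_at(env2, lo, hi)
        cuts = [lo, *crossings(funcs[f], funcs[g], lo, hi), hi]
        for a, b in zip(cuts, cuts[1:]):
            mid = (a + b) / 2
            winner = f if value(funcs[f], mid) <= value(funcs[g], mid) else g
            if result and result[-1][0] == winner:
                result[-1] = (winner, result[-1][1], b)  # extend the previous piece
            else:
                result.append((winner, a, b))
    return result


def lower_envelope(names, funcs, tb, te):
    """Divide and conquer: split the set, build both halves, merge."""
    if len(names) == 1:
        return [(names[0], tb, te)]
    mid = len(names) // 2
    return merge(lower_envelope(names[:mid], funcs, tb, te),
                 lower_envelope(names[mid:], funcs, tb, te), funcs)


FUNCS = {"TR1": (1.0, 2.0, 1.0), "TR2": (1.0, -6.0, 9.0), "TR3": (0.0, 0.0, 25.0)}
print(lower_envelope(["TR1", "TR2", "TR3"], FUNCS, 0.0, 2.0))
```

Running the same construction again with the envelope's pieces removed would give the next-ranked envelope. The sketch does not implement that.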
Now, how about the IPAC-NN tree?
71
The lower envelope provides a continuous pruning criterion: any trajectory whose distance function lies more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te] with the lower envelope, the 4r pruning band, time points t1, t2, t3, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
72
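Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e. (mintime: minimum travel time),
$$PP_i(a) = \{\, P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mintime}(P) \le t_{i+1} - t_i \,\}$$
Example: if the pdf of selecting a possible path is uniform, then $\Pr(P) = 1/|PP_i(a)|$ for every $P \in PP_i(a)$
Given a path Pj ∈ PPi(a), the possible locations of a moving object a with respect to Pj at t ∈ [ti, ti+1] are all the positions p ∈ Pj that a can reach from pi within time t - ti and from which it can reach pi+1 within time ti+1 - t, i.e.,
$$PL_{P_j}^{a}(t) = \{\, p \in P_j \mid \mathrm{mintime}(p_i, p) \le t - t_i \ \wedge\ \mathrm{mintime}(p, p_{i+1}) \le t_{i+1} - t \,\}$$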
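The definition of the possible-path set can be sketched as a bounded depth-first search. Several assumptions go into it. The samples pi and pi+1 sit on vertices rather than inside edges, edge weights are minimum travel times, and only simple paths are considered. The toy network borrows vertex names from the figure, but its edges and travel times are made up.

```python
def possible_paths(graph, start, goal, budget):
    """All simple paths start -> goal whose minimum travel time is <= budget.
    graph: {vertex: {neighbour: travel_time}} (hypothetical road network)."""
    paths = []

    def extend(path, elapsed):
        node = path[-1]
        if node == goal:
            paths.append((list(path), elapsed))
            return
        for nxt, cost in graph.get(node, {}).items():
            if nxt not in path and elapsed + cost <= budget:  # prune over-budget paths
                path.append(nxt)
                extend(path, elapsed + cost)
                path.pop()

    extend([start], 0.0)
    return paths


def uniform_path_probabilities(paths):
    """Uniform pdf over the possible paths: Pr(P) = 1 / |PP|."""
    return {tuple(path): 1 / len(paths) for path, _ in paths}


ROAD = {
    "v1": {"v5": 4.0, "v3": 2.0},
    "v3": {"v4": 2.0, "v6": 3.0},
    "v4": {"v2": 3.0},
    "v5": {"v2": 3.0},
    "v6": {"v2": 5.0},
}
print(uniform_path_probabilities(possible_paths(ROAD, "v1", "v2", budget=8.0)))
```

With a time budget of 8, the detour through v6 (travel time 10) is excluded, and the two remaining paths get probability 0.5 each.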
73
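Combine the two probabilities…
[Figure: possible paths of object a from pi to pi+1, and its possible locations when t = 4; m marks the end of the possible-location range]
Probability of falling inside segment m p1: 0.5 × 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr\bigl(a(t) \in S \mid P\bigr)$$
where the second factor is evaluated over the possible locations $PL_P^{a}(t)$, e.g., with a uniform pdf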
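The combination formula can be sketched as follows. As in the uniform-pdf example, the location is assumed uniform over each path's possible-location interval, with positions measured as distance along the path. The input format and the interval values are hypothetical. They were chosen so that the result reproduces 0.5 × 0.5 = 0.25.

```python
def prob_in_segment(paths, segment):
    """Pr(a(t) in S) = sum over paths P of Pr(P) * Pr(a(t) in S | P).

    paths:   list of (path_probability, (lo, hi)), where [lo, hi] is the
             possible-location interval on that path at time t (uniform pdf).
    segment: {path_index: (s_lo, s_hi)}, the extent of S on each path that
             contains S (paths without S are omitted)."""
    total = 0.0
    for idx, (p_path, (lo, hi)) in enumerate(paths):
        if idx not in segment:
            continue
        s_lo, s_hi = segment[idx]
        if hi == lo:  # the location on this path is known exactly
            inside = 1.0 if s_lo <= lo <= s_hi else 0.0
        else:
            inside = max(0.0, min(hi, s_hi) - max(lo, s_lo)) / (hi - lo)
        total += p_path * inside
    return total


paths = [(0.5, (2.0, 6.0)), (0.5, (0.0, 3.0))]
print(prob_in_segment(paths, {0: (4.0, 8.0)}))  # 0.25
```

Here half of the first path's possible-location interval overlaps S, and the second path does not contain S at all.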
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
- The collection of all the possible paths (and their probabilities)
- The probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
- Edge hash table
  › Small, in memory
- Movement R-tree
  › 1D, one for each edge
- Trajectories list
  › Store samples ordered by time stamp
  › Assign a pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On disk; retrieved on demand
76
› Network expansion
- Expansion tree: a tree rooted in q that contains the edges whose network distances from q are smaller than r
› Filtering
- Fetch the movement R-trees of the edges in the expansion tree
- Search for leaf entries whose maximum time interval intersects tq
- The objects pointed to by those leaf entries are candidates
› Refinement
- The candidate set is considerably smaller than the original trajectory dataset
- Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" along road-network segments from q over vertices A to E, annotated with earliest/latest times of arrival; one possible path is shown that is definitely not part of the answer to the range query]
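A bounded Dijkstra search is one natural way to build such an expansion tree. The sketch assumes an undirected road graph given as an adjacency dict with edge lengths, which is a hypothetical format. It keeps only vertices whose network distance from q is strictly smaller than r. Edges that lie only partially within r would additionally need clipping, which the sketch does not do.

```python
import heapq


def expansion_tree(graph, q, r):
    """Bounded Dijkstra from q.

    Returns ({vertex: network distance}, [(parent, child), ...]) for all vertices
    whose network distance from q is smaller than r.
    graph: {vertex: {neighbour: edge_length}}, undirected edges listed both ways."""
    dist, parent = {q: 0.0}, {}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, [(parent[v], v) for v in parent]


ROADS = {
    "q": {"A": 1.0},
    "A": {"q": 1.0, "B": 2.0, "D": 4.0},
    "B": {"A": 2.0, "C": 5.0},
    "C": {"B": 5.0},
    "D": {"A": 4.0},
}
distances, tree = expansion_tree(ROADS, "q", 4.0)
print(distances, tree)
```

In a filtering step, the edges of this tree would be the ones whose movement R-trees get fetched.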
77
Temporal-continuous query
- Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
- It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
- The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty: the flip side
› Data reduction
- Why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error; save on data size
John Hershberger, Jack Snoeyink: "Speeding up the Douglas-Peucker Line-simplification Algorithm". SSHD 1997
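The classic recursive Douglas-Peucker algorithm illustrates "define an acceptable error, save on data size". It is not the faster variant by Hershberger and Snoeyink cited above. It also treats the trajectory as a plain 2D polyline and ignores timestamps.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def douglas_peucker(points, epsilon):
    """Keep the farthest point if it deviates by more than epsilon, and recurse."""
    if len(points) < 3:
        return list(points)
    distances = [point_segment_distance(p, points[0], points[-1]) for p in points[1:-1]]
    index = max(range(len(distances)), key=distances.__getitem__) + 1
    if distances[index - 1] <= epsilon:
        return [points[0], points[-1]]  # all points within the acceptable error
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


spike = [(0, 0), (1, 0), (2, 5), (3, 0), (4, 0)]
print(douglas_peucker(spike, 1.0))
```

A larger epsilon discards more points. The question raised next, which properties must survive the reduction, is not addressed by this purely geometric criterion.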
79
Uncertainty: the other flip side
› What are the important properties that must not be distorted by the reduction?
- Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
- It is not just a "Z" axis… one cannot travel back in time
- How long may the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
82
Merge the approaches
› Idea
- Discretize time
- For each time, a pdf (pmf) over the possible positions is given
› Examples
- Nearest neighbour queries on uncertain moving objects
- Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parameterized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009, 435-443
83
A highway example…
[Figure: position (pos) of a car over time (t), with a query window Q]
What is the probability that the car is in some area at least once during some time interval?
84
A highway example…
[Figure: as before, with the car's minimum and maximum speeds vmin and vmax]
What is the probability that the car is in some area at least once during some time interval?
85
A highway example…
[Figure: as before, with vmin, vmax and Q]
Assuming a uniform distribution…
86
A highway example…
[Figure: as before, with probabilities 0.5 and 0.3 of the car being inside Q]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
87
A highway example…
[Figure: as before]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Violation of the speed constraints
88
A highway example…
[Figure: as before]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Consideration of dependency: 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
- Discrete time + discrete space (e.g., Markov chain)
- Discrete time + continuous space (e.g., Harris chain)
- Continuous time + discrete space (e.g., Markov process)
- Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: interior hole: 0.4 to each neighbour, 0.2 to stay; border hole: 0.6 inward, 0.4 to stay]
92
A simple example
› Initial position: bucket 3
› After first hit: 0.4, 0.2, 0.4 (buckets 2 to 4)
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (buckets 1 to 5)
› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08
93
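How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket; columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$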
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
(with M the transition matrix from the previous slide)
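The distributions above can be reproduced in a few lines of plain Python by repeatedly multiplying a row vector by M. The matrix builder encodes the probabilities of the board example: in the interior, 0.4 left, 0.2 stay, 0.4 right; at the border, 0.4 stay, 0.6 inward. The helper names are ours.

```python
def step(distribution, matrix):
    """One hit of the board: row vector times transition matrix."""
    n = len(matrix)
    return [sum(distribution[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def board_matrix(n=9, stay=0.2, move=0.4, border_stay=0.4, border_move=0.6):
    """Transition matrix of the wooden-board example (rows: from, columns: to)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = move, stay, move
    m[0][0], m[0][1] = border_stay, border_move
    m[n - 1][n - 1], m[n - 1][n - 2] = border_stay, border_move
    return m


M = board_matrix()
distribution = [0.0] * 9
distribution[2] = 1.0  # the ball starts in bucket 3
for hit in range(1, 41):
    distribution = step(distribution, M)
    if hit in (1, 2, 40):
        print(hit, [round(p, 2) for p in distribution])
```

As the number of hits grows, the distribution approaches the stationary distribution (0.08, 0.12, …, 0.12, 0.08). One can check that M leaves this vector unchanged.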
95
Fusion of Model and Reality
› Discretization of time and space
- E.g., treat intersections as states and add additional states on long streets
- The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
- Transition probabilities from one state to another are learned from historical data (very sparse matrix)
- The transition matrix can change over time and for different object groups
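Here is a minimal sketch of the estimation step. It assumes the historical trajectories have already been mapped to state sequences with one state per tick. Counting the observed transitions and normalising each row gives the maximum-likelihood estimate. A dict of dicts keeps the matrix sparse. Time-varying matrices and per-group matrices are left out.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Maximum-likelihood transition probabilities from state sequences.
    Returns a sparse matrix {from_state: {to_state: probability}}."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: c / total for b, c in row.items()}
    return matrix


history = [["s1", "s3", "s2", "s1"], ["s3", "s3", "s2"]]  # hypothetical data
print(estimate_transitions(history))
```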
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
97
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2, s3 with the transition probabilities of M]
Note: there is an exponential number of possible paths the car might take
ST-Window Queries
Start distribution at t = 0: (1, 0, 0); M = ((0, 0, 1), (0.6, 0, 0.4), (0, 0.8, 0.2))
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
98
99
ST-Window Queries
Start distribution at t = 0: (1, 0, 0); M = ((0, 0, 1), (0.6, 0, 0.4), (0, 0.8, 0.2))
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 1: (0, 0, 1)
100
ST-Window Queries
Start distribution at t = 0: (1, 0, 0); M = ((0, 0, 1), (0.6, 0, 0.4), (0, 0.8, 0.2))
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
101
ST-Window Queries
Start distribution at t = 0: (1, 0, 0); M = ((0, 0, 1), (0.6, 0, 0.4), (0, 0.8, 0.2))
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)
102
ST-Window Queries
Start distribution at t = 0: (1, 0, 0); M = ((0, 0, 1), (0.6, 0, 0.4), (0, 0.8, 0.2))
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2); the mass 0.8 in s2 already hits the window
t = 3, propagating only the worlds that have not hit the window yet: (0, 0.16, 0.04); the mass 0.16 in s2 hits the window at t = 3, so Result = 1 - 0.04 = 0.96
› The solution based on matrix multiplications introduces a new state for the trajectories that hit the window, and two matrices
103
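ST-Window Queries
States s1, s2, s3, plus a new absorbing state s4 for the trajectories that hit the window
$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0.6 & 0 & 0.4 & 0\\ 0 & 0.8 & 0.2 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0 & 0 & 0.4 & 0.6\\ 0 & 0 & 0.2 & 0.8\\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$
M- is applied at time steps outside T; M+ is applied at time steps inside T and redirects every transition into s1 or s2 to s4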
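The two-matrix scheme can be sketched with plain Python lists. The matrices in the code are M- and M+ from this slide. Applying M+ exactly at the time steps inside T = [2, 3] and M- elsewhere is our reading of the slide. The function name is hypothetical.

```python
def step(vec, matrix):
    """Row vector times matrix."""
    n = len(matrix)
    return [sum(vec[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# States s1, s2, s3 plus an absorbing state s4 for worlds that hit the window.
M_MINUS = [[0, 0, 1, 0], [0.6, 0, 0.4, 0], [0, 0.8, 0.2, 0], [0, 0, 0, 1]]
M_PLUS = [[0, 0, 1, 0], [0, 0, 0.4, 0.6], [0, 0, 0.2, 0.8], [0, 0, 0, 1]]


def window_probability(start, t_end, window_times):
    """P(the object is in s1 or s2 at some t in window_times), for t = 1 .. t_end."""
    vec = start
    for t in range(1, t_end + 1):
        vec = step(vec, M_PLUS if t in window_times else M_MINUS)
    return vec[3]  # mass collected in the absorbing "hit" state s4


print(window_probability([1, 0, 0, 0], 3, {2, 3}))
```

The cost is one sparse matrix-vector product per time step, whereas enumerating the exponentially many paths would be far more expensive.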
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0, and with two observations at t0 and t1]
105
Multiple Observations
› We need to track where true hit worlds are located
- 2|S| classes of equivalent worlds
- One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
- One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
M = ((0, 0, 1), (0.6, 0, 0.4), (0, 0.8, 0.2))
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
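Multiple Observations
Distribution over the classes (S1-, S2-, S3-, S1+, S2+, S3+), where "+" (∎) marks worlds that have hit the window and "-" (¬∎) worlds that have not:
t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)
$$M^{-} = \begin{pmatrix} M & 0\\ 0 & M \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} M_{\text{out}} & M_{\text{in}}\\ 0 & M \end{pmatrix}$$
where M_in keeps only the columns of M for the states inside the window (s1, s2), M_out keeps only the column for the state outside it (s3), all other columns are set to 0, and M = ((0, 0, 1), (0.6, 0, 0.4), (0, 0.8, 0.2))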
107
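› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' theorem:
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\, P(\blacksquare)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(\blacksquare \wedge s_3) + P(\neg\blacksquare \wedge s_3)}$$
With the distribution at t = 3 over (S1-, S2-, S3-, S1+, S2+, S3+) = (0, 0, 0.04, 0.48, 0.16, 0.32):
$$P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$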
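Here is a sketch of the same computation that propagates the two classes directly instead of building the 6 × 6 matrices. A world in Si- moves to Sj+ when it enters a window state at a window time. Worlds in the "+" classes stay there. The final line applies Bayes' theorem for the observation "seen in s3 at t = 3". The window definition comes from the running example, and the names are ours.

```python
M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
WINDOW_STATES = {0, 1}   # s1 and s2 (0-based indices)
WINDOW_TIMES = {2, 3}


def step_classes(minus, plus, t):
    """Propagate the 'not yet hit' (S-) and 'hit' (S+) classes by one tick."""
    n = len(M)
    new_minus, new_plus = [0.0] * n, [0.0] * n
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            new_plus[j] += plus[i] * p            # hit worlds stay hit
            if t in WINDOW_TIMES and j in WINDOW_STATES:
                new_plus[j] += minus[i] * p       # entering the window: S- -> S+
            else:
                new_minus[j] += minus[i] * p
    return new_minus, new_plus


minus, plus = [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]   # start in s1 at t = 0
for t in range(1, 4):
    minus, plus = step_classes(minus, plus, t)

# Observation: the object is seen in s3 (index 2) at t = 3.
p_hit_given_s3 = plus[2] / (plus[2] + minus[2])
print(round(p_hit_given_s3, 2))
```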
108
Summary
› Pros
- Allows answering time-parameterized queries according to possible-worlds semantics
- Considers location dependencies over time
- Scales up very well, since it is purely based on sparse matrix multiplications
- Natively extendable to uncertain observations
- Seems to work adequately on real-world data (more validation needed)
› Cons
- Discrete time and space
- Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most x% of the possible trajectories of o may intersect Q)
[Figure: R-tree entries for an object o over the time/location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
- Draw a sufficiently high number of samples
- Approximate result probability = number of samples that satisfy the query / total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
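One way to draw samples that conform to a later observation is to reweight each transition by the probability of still reaching the observation, computed in a backward pass. This is a standard conditioning technique, shown here on the running example: start in s1 at t = 0, seen in s3 at t = 3. It is not necessarily the exact adaptation used in the cited paper, and the names are hypothetical.

```python
import random

M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]


def backward_weights(matrix, end_state, steps):
    """beta[t][s] = P(object is in end_state at time `steps` | state s at time t)."""
    n = len(matrix)
    beta = [[0.0] * n for _ in range(steps + 1)]
    beta[steps][end_state] = 1.0
    for t in range(steps - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(matrix[i][j] * beta[t + 1][j] for j in range(n))
    return beta


def sample_conditioned(matrix, start, end_state, steps, rng):
    """Sample a path that ends in end_state using the adapted transitions
    M'(i, j) = M(i, j) * beta[t+1][j] / beta[t][i]; random.choices normalises
    the weights, which performs the division by beta[t][i]."""
    beta = backward_weights(matrix, end_state, steps)
    path = [start]
    for t in range(steps):
        i = path[-1]
        weights = [matrix[i][j] * beta[t + 1][j] for j in range(len(matrix))]
        path.append(rng.choices(range(len(matrix)), weights=weights)[0])
    return path


rng = random.Random(0)
paths = [sample_conditioned(M, 0, 2, 3, rng) for _ in range(20_000)]
hits = sum(any(path[t] in (0, 1) for t in (2, 3)) for path in paths)
print(hits / len(paths))  # Monte-Carlo estimate of P(window hit | seen in s3 at t = 3)
```

Every sampled path ends in s3 by construction. The fraction of paths that hit the window approximates the exact conditional probability 8/9 ≈ 0.89 from the Bayes computation.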
112
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Event queries
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
- Cleaning
- Approximation
- Paradigm of equivalent worlds
› Geometric models can be used
- when no probabilistic model is present
- to find possible answers (no confidence)
› Probabilistic management of UST data
- based on stochastic processes
- allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob (1953). Stochastic Processes. Wiley
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
Distribution of sampled results is an unbiased approximation of the true distribution of results
48
q
08 02
04
H
C A
B
H3
Sampling Example
Drawing 100 possible worlds may yield the followingestimators
Compare to the exact probabilities
No indication of reliability or confidence of estimations
49
q
08 02
04
H
C A
B
H3
Sampling Confidences
Drawing 100 possible worlds may yield the followingestimators
Use statistical methods to assess the quality of estimators
Eg Wald-Test
Where is the percentile of the standard normal distribution
At a significance level of the true probability is in the interval [0442 0638]
True probability
50
Uncertain Spatial Data Management Summary
Motivation Flood of geo-spatial data Enriched with additional contexts (text social multimedia) Inherent uncertainty
Data Cleaning ldquoBest guessrdquo answers Unreliable results Biased results
Paradigm of Equivalent Worlds Efficient solution for the most prominent types of spatial queries Example Generating Functions
Approximations Monte-Carlo sampling Probabilistic guarantees
51
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
52
bull Modelsbull Motionbull Location Uncertainty
bull Queriesbull Part 1 NN (free-space motion)
bull Semantics and Processing
bull Part 2 Range (road-networks constraints)bull Semantics and Processing
bull Uncertainty ndash the flip-side
53
Model of a trajectory
Y
X
Time
Present time
2d-ROUTE
3d-TRAJECTORY
Approximation does not capture accelerationdeceleration
Point Objects NOTE the model needs to be augmented for objects with extent eg hurricanes
Bart Kuijpers Harvey J Miller Walied Othman Kinetic space-time prisms GIS 2011
54
Mobility Models
sequence of (locationtime)updates (eg GPS-based)
Electronic maps + traffic distributionpatterns + set of (to be visited) point=gt full trajectory
(locationtime Velocity Vector )updates
now -gt
55
Modeling UncertaintyLocation Uncertainty Models
Various Sourcesrsquo Imprecision(GPS Sensors)
Quest for data reduction
Ralph Lange Harald Weinschrott Lars Geiger Andreacute Blessing Frank Duumlrr Kurt Rothermel Hinrich Schuumltze On a Generic Uncertainty Model for Position Information QuaCon 2009
56
Modeling Uncertain Trajectories
bull Model 1Cones Beads Necklaces (AKA space-time prisms)
bull Assumed exact location samplesbull Unknown (but bounded) maximum speed in-betweensamples
Reynold Cheng Sunil Prabhakar Dmitri V Kalashnikov Querying Imprecise Data in Moving Object Environments ICDE 2003G Trajcevski A Choudhary O Wolfson G Li Uncertan Range Queries for Necklases MDM 2010
57
Modeling Uncertain TrajectoriesModel 2
Constraints
Motion along road network
Uncertainty restricted to Road Segments
p1 p2
q
t2t1
t
Bart Kuijpers Walied Othman Modeling uncertainty of moving objects on road networks via space-time prisms International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain TrajectoriesModel 3Sheared Cylinders
Constant bound on location-error at anytime-instant
G Trajcevski O Wolfson K Hinrichs S Chamberlain Managing Moving Objects Databases with Ucertainty ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly considerSyntax- The impact of uncertainty on the (structure of the) answer- The impact of constraints
Processing Algorithms- Impact of the motion+uncertainty coupling on filteringprunningrefinement
eg possible routes to be taken
6060
Example inside range (disk) around ldquoQrdquo between t1 and t2
1 What is the probability of a path taken
2 What is the earliestlatest arrival at a given vertex (consequently along edge)
Kai Zheng Goce Trajcevski Xiaofang Zhou Peter Scheuermann Probabilistic range queries for uncertain trajectories on road networks EDBT 2011
61
Continuous NN for trajectories
Q_nn(q) Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tbte]
A-nn(q) [(Tri1 [tbt1]) (Tri2[t1t2]) hellip (Trim[tm-1te])]
T = tb
T = te
X
Y
T = t1
Trq Tr1 Tr2Tr3
Given a collection of moving objects trajectories Tr1 Tr2 hellip TrN
Example assuming that Trq is the querying trajectory between [tbte] then- Tr1 (black) is the NN throughout [tbt1]- Tr2 (red) is the NN throughout [t1te]
Observation The answer to the query is time-paremeterized
Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002
62
Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant
UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]
T = te
X
YT = tb
T = t1
Trq Tr1 Tr2Tr3
T = tb1
T = t11
Observations
adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability
= what should the structure ofthe answer to UQ_nn(q) look like
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-basedProbabilistic Answer to a Continuous NN query) tree
Tri11 tb t11 D11
Tri12l t12(l-1) t12 D12k12
[TrQ tb te]
Tri1 tb t1 D1 Tri2 t1 t2 D2 Trim t(m-1) te Dm
Tri12 t11 t12 D12 Tri1k1 t1(k-1) t11 D1k1 Trik1 t(k-1) t21 Dm1 Trim1 t(m1-1) te Dmkm
Tri121 t11 t121 D121
Root parameters of the query-Querying trajectory Trq - Time-interval of interest [tbte]
Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)
Rmin gt Rmax
Rmin
Rmax
Q
Tr1
Tr2
Tr3
Tr4
Tr5
Rd1
1
14
Trq = 2D point Q The rest are possible-locations bounded by circles
Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN
The probability that the location of Tri is within distance Rd from Q is given by
The probability that a given object Trj is the NN of Q is given by
ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)
65
When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example
Rmin
Rmax
TrQ
Tr1
Tr2
Tr3
Tr4
Tr5
Z1
Z2
dist(Z1Z2) lt Rmax
We can no longer ldquoprunerdquo Tr4 from consideration
The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq
NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip
66
1
x
y
0
2r
1r2
pdf(Tr2)
pdf(Trq) pdf(Tr1)
Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for
DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq
67
[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) in the x-y plane, each supported on a disk of diameter 4r]
Let Viq = Vi + (-Vq).
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq).
Property 2: Assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids (with respect to the vertical axis, i.e., the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
68
The continuous aspect of the probabilistic NN queries
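Let Vxiq = vxi - vxq and Vyiq = vyi - vyq denote the corresponding components of the relative velocity of the object whose expected location moves along TRiq.
Then the distance of the expected location along TRiq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = V_{x,iq}^2 + V_{y,iq}^2,\; B = 2\,(x_{iq} V_{x,iq} + y_{iq} V_{y,iq}),\; C = x_{iq}^2 + y_{iq}^2$$
where (x_iq, y_iq) is the expected location along TRiq at t = 0. The graph of d_iq(t) is a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).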
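Assume the expected relative location along TRiq is (x_iq, y_iq) at t = 0 and moves with the constant relative velocity (Vxiq, Vyiq). Then the squared distance is a quadratic in t. A minimal Python sketch of its coefficients follows (the helper names are hypothetical):

```python
import math

def distance_coefficients(x0, y0, vx, vy):
    """Coefficients (A, B, C) with d_iq(t)^2 = A t^2 + B t + C for the relative motion
    (x0 + vx*t, y0 + vy*t) of the expected location along TR_iq."""
    a = vx * vx + vy * vy          # A > 0 whenever the relative velocity is non-zero
    b = 2.0 * (x0 * vx + y0 * vy)
    c = x0 * x0 + y0 * y0
    return a, b, c

def distance_at(coeffs, t):
    """Evaluate d_iq(t) from its squared-distance coefficients."""
    a, b, c = coeffs
    return math.sqrt(max(a * t * t + b * t + c, 0.0))  # guard against tiny negative rounding
```

For example, starting at (3, -4) with relative velocity (1, 2), the object is at (5, 0) when t = 2, which gives distance 5.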
69
The continuous aspect of the probabilistic NN queries
Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), ..., dNq(t)} (NOTE: excluding dqq(t) = 0).
Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te]. (NOTE: setting diq(t) = djq(t) and squaring both sides yields a quadratic equation in t.) Such pairwise intersections are called critical time-points.
[Figure (example): lower envelope LE12 of the two distance-trajectories TR1 and TR2 starting at tb, with critical time-points t11 and t12; time on the horizontal axis, distance on the vertical axis]
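Distances are non-negative, so diq(t) = djq(t) holds exactly when the squared distances agree. That condition is a quadratic equation with at most two real roots. The hypothetical Python helper below returns the critical time-points inside [tb, te]. It takes the (A, B, C) coefficients of the squared distances as input.

```python
import math

def critical_time_points(ci, cj, tb, te, eps=1e-12):
    """Times in [tb, te] where d_iq(t) = d_jq(t), given squared-distance coefficients
    ci = (A_i, B_i, C_i) and cj = (A_j, B_j, C_j). Squaring both sides gives
    (A_i - A_j) t^2 + (B_i - B_j) t + (C_i - C_j) = 0, hence at most two roots."""
    a, b, c = ci[0] - cj[0], ci[1] - cj[1], ci[2] - cj[2]
    if abs(a) < eps:
        if abs(b) < eps:
            return []            # identical or never-crossing functions: no isolated crossing
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []            # the two functions never meet
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]
```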
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: With a divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: the lower envelopes LE12 (of TR1 and TR2, critical time-points t11 and t12) and LE34 (of TR3 and TR4, critical time-point t31) over [tb, te] are merged into LE1234, which introduces the new critical time-points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories).
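Below is a possible Python sketch of the Merge-Sort-style construction. It assumes each distance function is given by the (A, B, C) coefficients of its square, and it represents an envelope as a list of (index, start, end) pieces. For brevity, the merge step finds the active piece of each envelope by linear search. As a result, this sketch does not achieve the O(N log N) bound stated on the slide; a two-pointer sweep over both envelopes would. All names are illustrative.

```python
import math

def sq(f, t):
    """Squared distance A t^2 + B t + C (same ordering as the distance itself)."""
    a, b, c = f
    return a * t * t + b * t + c

def crossings(f, g, lo, hi):
    """Roots of sq(f) - sq(g) strictly inside (lo, hi); there are at most two."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        roots = [] if disc < 0 else [(-b - math.sqrt(disc)) / (2 * a),
                                     (-b + math.sqrt(disc)) / (2 * a)]
    return sorted(t for t in roots if lo < t < hi)

def merge(funcs, e1, e2):
    """Merge two lower envelopes (lists of (index, start, end)) over the same interval."""
    cuts = sorted({t for _, s, e in e1 + e2 for t in (s, e)})
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        i = next(k for k, s, e in e1 if s <= mid <= e)   # active function of each envelope
        j = next(k for k, s, e in e2 if s <= mid <= e)
        pts = [lo] + crossings(funcs[i], funcs[j], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            m = (a + b) / 2
            k = i if sq(funcs[i], m) <= sq(funcs[j], m) else j
            if out and out[-1][0] == k and out[-1][2] == a:
                out[-1] = (k, out[-1][1], b)             # extend the previous piece
            else:
                out.append((k, a, b))                    # a new critical time-point
    return out

def lower_envelope(funcs, ids, tb, te):
    """Divide and conquer: split the functions, recurse on both halves, merge."""
    if len(ids) == 1:
        return [(ids[0], tb, te)]
    mid = len(ids) // 2
    return merge(funcs, lower_envelope(funcs, ids[:mid], tb, te),
                 lower_envelope(funcs, ids[mid:], tb, te))
```

With d0(t) = |t|, d1(t) = 2 and d2(t) = |t - 5| on [0, 5], the envelope consists of d0 on [0, 2], then d1 on [2, 3], then d2 on [3, 5].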
Now, what about the IPAC-NN tree?
71
The lower envelope provides a continuous pruning criterion. Any trajectory whose distance function lies farther than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR1 to TR7 over [tb, te] with times t1, t2, t3; the lower envelope, the 4r band above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
72
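Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e.,
$$PP_i(a) = \{\, P : P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mincost}(P) \le t_{i+1} - t_i \,\}$$
Example: if the pdf of selecting a possible path is uniform, then Pr(Pj) = 1 / |PPi(a)| for every Pj ∈ PPi(a).
Given a path Pj ∈ PPi(a), the Possible Locations of the moving object a with respect to Pj at time t ∈ [ti, ti+1] are the set of all positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t - ti (respectively ti+1 - t), i.e.,
$$PL_{P_j}(t) = \{\, p \in P_j : \mathrm{mincost}(p, p_i) \le t - t_i,\ \mathrm{mincost}(p, p_{i+1}) \le t_{i+1} - t \,\}$$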
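Here is a minimal Python sketch of the possible-path set under simplifying assumptions: pi and pi+1 are vertices, and the road network is a small adjacency dict with a minimum travel time per edge. The sketch enumerates simple paths by depth-first search, pruning on the time budget ti+1 - ti, and then assigns them uniform probabilities. The graph and function names are hypothetical. A real network would call for a bounded shortest-path search rather than exhaustive enumeration.

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths src -> dst whose minimum travel time is <= budget (= t_{i+1} - t_i).
    graph maps a vertex to a list of (neighbour, minimum travel time) pairs."""
    paths = []

    def dfs(node, path, cost):
        if cost > budget:
            return                       # prune: this prefix is already too slow
        if node == dst:
            paths.append((path, cost))
            return
        for nxt, w in graph.get(node, []):
            if nxt not in path:          # simple paths only
                dfs(nxt, path + [nxt], cost + w)

    dfs(src, [src], 0.0)
    return paths

def uniform_path_probabilities(paths):
    """Uniform pdf over the possible paths: Pr(P) = 1 / |PP_i(a)|."""
    return {tuple(p): 1.0 / len(paths) for p, _ in paths}
```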
73
Combine the two probabilities...
[Figure: possible paths of object a and its possible locations at t = 4; m marks a point on a path]
Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t (e.g., with a uniform pdf):
$$\Pr\big(a(t) \in S\big) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\big(S \cap PL_p(t)\big)$$
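The combination formula can be sketched in Python as follows. The sketch makes three assumptions that do not come from the slides: each possible path is reduced to a 1D offset axis, PL_p(t) is an interval on that axis with a uniform location pdf, and S is given per path as an interval. The numbers in the usage check are hypothetical but mirror the 0.5 · 0.5 = 0.25 example.

```python
def prob_in_segment(paths, seg):
    """Pr(a(t) in S) = sum over possible paths p of Pr(p) * Pr(a(t) in S | p).

    paths: list of (Pr(p), (lo, hi)), where (lo, hi) is the possible-location interval
           PL_p(t) along path p (1D offsets), with a uniform location pdf inside it.
    seg:   dict mapping path index -> (s_lo, s_hi), the part of S lying on that path.
    Degenerate (zero-length) possible-location sets are ignored in this sketch."""
    total = 0.0
    for k, (p_path, (lo, hi)) in enumerate(paths):
        if k not in seg or hi <= lo:
            continue
        s_lo, s_hi = seg[k]
        overlap = max(0.0, min(hi, s_hi) - max(lo, s_lo))
        total += p_path * overlap / (hi - lo)   # uniform pdf over PL_p(t)
    return total
```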
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and their probabilities)
  – The probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
› Data structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, one for each edge
  – Trajectories list
    › Store samples ordered by time-stamp
    › Assign a pointer to the set of possible paths
    › Earliest-arriving and latest-departure times stored for vertices
    › On-disk, retrieved on demand
76
› Network expansion
  – Expansion tree: a tree rooted at q that contains the edges whose network distances from q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search for leaf entries whose maximum time interval intersects tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" from q along the road-network segments A, B, C, D and E, annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
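The network-expansion step can be illustrated with a Dijkstra-style search that stops at network distance r. This is a minimal Python sketch using the standard `heapq` module. The graph format and the function name are hypothetical, and the sketch returns the reached edges as a set rather than as an explicit tree.

```python
import heapq

def network_expansion(graph, q, r):
    """Dijkstra-style expansion from q that stops at network distance r.

    graph: vertex -> list of (neighbour, edge length), listed in both directions for
    undirected roads. Returns (dist, edges): dist holds the vertices reached within r;
    edges holds every edge (as a frozenset of its endpoints) incident to such a vertex,
    i.e., every edge whose network distance from q is smaller than r."""
    dist = {q: 0.0}
    edges = set()
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                               # stale heap entry
        for v, w in graph.get(u, []):
            edges.add(frozenset((u, v)))           # u is within r, so edge (u, v) qualifies
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist, edges
```

The movement R-trees of the returned edges are then the ones fetched in the filtering step.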
77
Temporal-continuous query
  – Only calculate the qualification probabilities (QPs) at a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
  – It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty – the flip-side
› Data reduction – why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up the Douglas-Peucker Line-Simplification Algorithm", SSHD 1997
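For reference, a plain recursive Douglas-Peucker simplification in Python is sketched below. It keeps a point only if that point deviates from the current chord by more than the acceptable error eps. This is the classic algorithm, not the Hershberger-Snoeyink speed-up cited above. It also treats the input as purely spatial points and ignores timestamps.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, eps):
    """Keep the farthest point if it deviates by more than eps from the chord between the
    first and last points, then recurse on both halves; otherwise keep only the endpoints."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_distance(p, points[0], points[-1]) for p in points[1:-1]]
    k = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[k - 1] <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: k + 1], eps)
    return left[:-1] + douglas_peucker(points[k:], eps)
```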
79
Uncertainty – the other flip-side
› What are the important properties that should not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just a "Z" axis... one cannot travel back in time
  – How long may the data at the sink remain "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far...
Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
82
Merge the approaches
› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443
83
A highway example...
[Figure: position (pos) of a car over time (t) on a highway, with a query region Q]
What is the probability that the car is in some area at least once during some time interval?
84
A highway example...
[Figure: position (pos) over time (t) with the query region Q and the speed bounds vmin and vmax]
What is the probability that the car is in some area at least once during some time interval?
85
A highway example...
[Figure: position (pos) over time (t) with the speed bounds vmin and vmax and the query region Q]
Assuming a uniform distribution...
86
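A highway example...
[Figure: position (pos) over time (t) with the query region Q; the car is inside Q with probability 50% at one time instant and 30% at another]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65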
87
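A highway example...
[Figure: position (pos) over time (t) with the query region Q; the car is inside Q with probability 50% at one time instant and 30% at another]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Violation of the speed constraints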
88
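A highway example...
[Figure: position (pos) over time (t) with the query region Q; the car is inside Q with probability 50% at one time instant and 30% at another]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Consideration of dependency: = 0.5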
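The two numbers on this slide can be checked in a few lines of Python. The dependency case is modelled here under one assumption that is consistent with the slide's result of 0.5: every world that reaches Q at the 30% instant also reaches it at the 50% instant, so the two events are nested. The function names are hypothetical.

```python
def at_least_once_independent(probs):
    """1 - prod(1 - p): correct only if the per-instant events were independent."""
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p
    return 1.0 - miss

def at_least_once_nested(probs):
    """If every world inside Q at some instant is also inside Q at the instant with the
    largest probability (nested events), the union probability is simply the maximum."""
    return max(probs)
```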
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts
[Figure: transition probabilities 0.4 (left), 0.2 (stay), 0.4 (right) in the interior of the board; 0.4 (stay) and 0.6 (inwards) at the border]
92
A simple example
› Initial position: bucket 3
› After first hit: 0.4 / 0.2 / 0.4 (buckets 2, 3, 4)
› After second hit: 0.16 / 0.16 / 0.36 / 0.16 / 0.16 (buckets 1 to 5)
› After 40th hit: ≈ 0.08 / 0.12 / 0.12 / 0.12 / 0.12 / 0.12 / 0.12 / 0.12 / 0.08 (buckets 1 to 9)
93
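How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$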
94
How can we model this?
› First hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› Second hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M² = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› 40th hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M⁴⁰ ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
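Below is a minimal pure-Python sketch of the wooden-board chain (the function names are hypothetical). It builds M, propagates the initial row vector one hit at a time, and can be used to check that (0.08, 0.12, ..., 0.12, 0.08) is a stationary distribution of M. That stationary distribution is the one the chain approaches as the number of hits grows.

```python
def step(dist, m):
    """One hit: row vector times transition matrix, (dist . M)_j = sum_i dist_i * M[i][j]."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]

def board_matrix(n=9, stay=0.2, move=0.4, border_stay=0.4, border_move=0.6):
    """Transition matrix M of the wooden-board example (rows: from bucket, cols: to bucket)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = move, stay, move
    m[0][0], m[0][1] = border_stay, border_move
    m[n - 1][n - 1], m[n - 1][n - 2] = border_stay, border_move
    return m

def distribution_after(hits, start=2, n=9):
    """Distribution over the buckets after a number of hits; start=2 is bucket 3."""
    m = board_matrix(n)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(hits):
        dist = step(dist, m)
    return dist
```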
95
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012.
97
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2 and s3 with the transition probabilities of M]
Note: there is an exponential number of possible paths the car might take.
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M as on the previous slide]
Distribution at t = 0: (1, 0, 0)
98
99
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M]
Distribution at t = 0: (1, 0, 0)
Distribution at t = 1: (0, 0, 1)
100
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M]
Distribution at t = 0: (1, 0, 0)
Distribution at t = 1: (0, 0, 1)
Distribution at t = 2: (0, 0.8, 0.2)
101
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M]
Distribution at t = 0: (1, 0, 0)
Distribution at t = 1: (0, 0, 1)
Distribution at t = 2: (0, 0.8, 0.2)
Distribution at t = 3: (0.48, 0.16, 0.36)
102
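ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M]
Distribution at t = 0: (1, 0, 0)
Distribution at t = 1: (0, 0, 1)
Distribution at t = 2: (0, 0.8, 0.2), of which 0.8 (in s2) already hits the window
Distribution at t = 3, propagating only the worlds that have not yet hit the window: (0, 0.16, 0.04)
Result = 0.8 + 0.16 = 0.96
› A solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices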
103
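ST-Window Queries
States s1, s2, s3 plus a new absorbing state s4 that collects the winner trajectories (worlds that have already hit the window). M⁻ is used for transitions into time instants outside T. M⁺ is used for transitions into time instants inside T, where moving into s1 or s2 leads to s4:
$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
(1, 0, 0, 0) · M⁻ = (0, 0, 1, 0); then · M⁺ = (0, 0, 0.2, 0.8); then · M⁺ = (0, 0, 0.04, 0.96)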
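The two-matrix technique can be sketched in Python as below, under the assumptions stated in the docstring. M⁻ adds an absorbing "winner" state, and M⁺ additionally redirects every transition into a window state to it. With the example M, start state s1 and T = [2, 3], the absorbing state ends with probability 0.96. The names are hypothetical.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def window_query(m, start, window_states, t_window):
    """P(the object is in one of window_states at some time instant in t_window).

    Adds an absorbing state (index n) for the "winner" worlds. M_minus keeps the plain
    transitions; M_plus redirects every transition into a window state to the absorbing
    state. Assumes t_window[0] >= 1, so the start state itself is never checked."""
    n = len(m)
    m_minus = [list(row) + [0.0] for row in m] + [[0.0] * n + [1.0]]
    m_plus = [[0.0 if j in window_states else m[i][j] for j in range(n)]
              + [sum(m[i][j] for j in window_states)] for i in range(n)]
    m_plus.append([0.0] * n + [1.0])
    t_lo, t_hi = t_window
    v = [0.0] * (n + 1)
    v[start] = 1.0
    for t in range(1, t_hi + 1):              # transition from t - 1 to t
        v = vec_mat(v, m_plus if t >= t_lo else m_minus)
    return v[n]
```

Because only matrix-vector products are involved, the cost grows linearly with the length of the time horizon, even though the number of possible paths grows exponentially.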
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, first with a single observation at t0, then with two observations at t0 and t1]
105
Multiple Observations
› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si⁻ corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si⁺ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3]
106
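Multiple Observations
[Figure: states s1, s2 and s3 unrolled over t = 0, 1, 2, 3; classes S1⁻, S2⁻, S3⁻ (window not yet intersected, ¬∎) and S1⁺, S2⁺, S3⁺ (window intersected, ∎)]
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$
Distributions over (S1⁻, S2⁻, S3⁻, S1⁺, S2⁺, S3⁺):
t = 0: (1, 0, 0, 0, 0, 0); t = 1: (0, 0, 1, 0, 0, 0); t = 2: (0, 0, 0.2, 0, 0.8, 0); t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)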
107
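› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' Theorem (∎: the trajectory intersects the query window; obs: the object is observed in s3 at t = 3):
$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare)\,P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
Here P(obs ∧ ∎) = P(S3⁺) = 0.32 and P(obs ∧ ¬∎) = P(S3⁻) = 0.04, taken from the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1⁻, S2⁻, S3⁻, S1⁺, S2⁺, S3⁺) at t = 3.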
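The following Python sketch combines the 2|S|-class construction with the Bayes step (the names are hypothetical). It assumes the state order (S1⁻, ..., Sn⁻, S1⁺, ..., Sn⁺) and uses M⁻ = diag(M, M) outside T. It builds M⁺ by sending S_i⁻ to S_j⁺ whenever s_j is a window state, and it conditions on a single observation at time t_obs. With the example, it reproduces 0.32 / (0.32 + 0.04) ≈ 0.89.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def split_matrices(m, window_states):
    """M_minus and M_plus over the 2|S| classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            # Outside T the hit flag is simply carried along: M_minus = diag(M, M).
            m_minus[i][j] = m[i][j]
            m_minus[n + i][n + j] = m[i][j]
            # Inside T, moving into a window state turns a "-" world into a "+" world.
            m_plus[i][n + j if j in window_states else j] += m[i][j]
            m_plus[n + i][n + j] = m[i][j]     # "+" worlds stay "+"
    return m_minus, m_plus

def hit_given_observation(m, start, window_states, t_window, t_obs, s_obs):
    """P(window intersected | object observed in state s_obs at time t_obs), via Bayes."""
    n = len(m)
    m_minus, m_plus = split_matrices(m, window_states)
    v = [0.0] * (2 * n)
    v[start] = 1.0
    t_lo, t_hi = t_window
    for t in range(1, t_obs + 1):               # transition from t - 1 to t
        v = vec_mat(v, m_plus if t_lo <= t <= t_hi else m_minus)
    hit, miss = v[n + s_obs], v[s_obs]          # P(obs and hit), P(obs and not hit)
    return hit / (hit + miss)
```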
108
Summary
› Pros
  – Allows answering time-parametrized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index over the time/location space, with leaf entries labelled Poid]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
kNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
112
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Event queries
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] Doob, J. L. (1953). Stochastic Processes. Wiley.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443.
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
42
Count Queries on Uncertain Data
[Figure (example): query range q, uncertain objects A, B and C, probability labels 0.8, 0.2 and 0.4, and labels H and H3]
43
Count Queries on Uncertain Data
[Figure (example, continued): query range q, uncertain objects A, B and C, probability labels 0.8, 0.2 and 0.4, and labels H and H3]
44
Count Queries on Uncertain Data
[Figure (example, continued): query range q, uncertain objects A, B and C, probability labels 0.8, 0.2 and 0.4, and labels H and H3]
45
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time.
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.
III. The probability of a class can be computed in polynomial time.
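Count queries are a standard case where conditions I-III hold. Grouping the possible worlds by the number k of objects inside the range gives |DB| + 1 classes. The probabilities of these classes are the coefficients of a product of generating functions. Below is a minimal Python sketch that assumes independent objects. The probabilities 0.8, 0.2 and 0.4 in the usage check are hypothetical: they echo the figure labels of the count-query example, whose exact assignment to objects is not recoverable.

```python
def count_distribution(probs):
    """Distribution of the number of objects inside the query range, assuming independent
    objects that lie in the range with probabilities probs. Multiplies the generating
    functions prod_i ((1 - p_i) + p_i * x); coefficient k is P(count = k)."""
    coeffs = [1.0]
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)     # object outside the range
            nxt[k + 1] += c * p         # object inside the range
        coeffs = nxt
    return coeffs
```

Each multiplication costs O(|DB|), so the whole distribution is obtained in O(|DB|²) time, which is polynomial as condition II requires.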
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and without bias
Evaluate the query predicate on each world
The distribution of sampled results is an unbiased approximation of the true distribution of results
48
Sampling Example
[Figure: query range q, uncertain objects A, B and C, probability labels 0.8, 0.2 and 0.4, and labels H and H3]
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimations
49
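Sampling Confidences
[Figure: the same example, marking the true probability]
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of the estimators
E.g., the Wald test: for an estimate p̂ obtained from n samples, the confidence interval is
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$$
where z_{1-α/2} is the (1 - α/2)-percentile of the standard normal distribution
At a significance level of α, the true probability lies in the interval [0.442, 0.638]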
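The Wald interval can be computed with the Python standard library (`statistics.NormalDist`, Python 3.8+). As an illustration, 54 hits out of n = 100 sampled worlds at α = 0.05 reproduce the interval [0.442, 0.638]. That hit count and the value of α are inferred from the interval's midpoint and width, not stated in the source.

```python
import math
from statistics import NormalDist

def wald_interval(successes, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n sampled worlds."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)     # (1 - alpha/2)-quantile of N(0, 1)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half
```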
50
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data, enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip-side
53
Model of a trajectory
[Figure: 3D trajectory in (X, Y, Time) space up to the present time, and its projection onto the X-Y plane, the 2D route]
The approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
54
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory
(location, time, velocity vector) updates
55
Modeling Uncertainty: Location Uncertainty Models
Various sources of imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
57
Modeling Uncertain Trajectories: Model 2
Constraints:
Motion along the road network
Uncertainty restricted to road segments
[Figure: road network with positions p1, p2 and q, and times t1, t2 and t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints, e.g., possible routes to be taken
Processing algorithms
- Impact of the motion+uncertainty coupling on filtering/pruning/refinement
60
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, ..., TrN:
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), ..., (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y) space from T = tb to T = te, with T = t1 marked]
Example: assuming that Trq is the querying trajectory during [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, ..., TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y) space from T = tb to T = te, with T = tb1, T = t11 and T = t1 marked]
Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree with root [TrQ, tb, te]; level-1 nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), ..., (Trim, t(m-1), te, Dm); each node's children cover sub-intervals of its interval, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), and below them (Tri121, t11, t121, D121)]
Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]
Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)
Rmin gt Rmax
Rmin
Rmax
Q
Tr1
Tr2
Tr3
Tr4
Tr5
Rd1
1
14
Trq = 2D point Q The rest are possible-locations bounded by circles
Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN
The probability that the location of Tri is within distance Rd from Q is given by
The probability that a given object Trj is the NN of Q is given by
ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)
65
When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example
Rmin
Rmax
TrQ
Tr1
Tr2
Tr3
Tr4
Tr5
Z1
Z2
dist(Z1Z2) lt Rmax
We can no longer ldquoprunerdquo Tr4 from consideration
The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq
NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip
66
1
x
y
0
2r
1r2
pdf(Tr2)
pdf(Trq) pdf(Tr1)
Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for
DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq
67
1
x
y
0-40
+40
4r
34r2
pdf(Tr2 - Trq)
pdf(Tr1 - Trq)
Let Viq = Vi + (-Vq)
Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin at a function of t is
hyperbola since A gt 0
Now we have a collection of such distance functions (for each i)
69
The continuous aspect of the probabilistic NN queries
Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)
Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points
dist
TR1
TR2
LE12
tb t11 t12
Example
Lower envelope of two distance-trajectories
critical time-point
NOTE time-dimension on horizontal axis
70
The continuous aspect of the probabilistic NN queries
Q how to efficiently construct the lower envelope of the whole collection of distance-functions
A divide-and-conquer approach in the spirit of Merge-Sortdist
TR1
TR2
LE12
tb t11 t12
dist
TR3
TR4
LE34
tb tet31dist
TR1
TR2
LE1234
tb tet11 t12
TR3
TR4
t31
t1new t2new
Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)
Now how about the IPAC-NN tree
71
The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)
Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree
dist dist
lower envelope
TR1TR2
TR3
TR4
TR5
TR6
tb te
2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )
4r
TR7
t1 t2 t3
72
Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie
Example if pdf of selecting a possible path is uniform
Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie
73
Combine the two probabilitieshellip
Possible pathsPossible locations when t=4
a
m
Probability of falling inside segment mp1
0505 = 025 (at t = 4)
Probability of an object a falling inside some segment S at given time instant t
)(
))(Pr()Pr())(Pr(tPPp
p
a
tPLSpSta eg uniform pdf
74
ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory
representation we needndash The collection of all the possible paths (and probabilities)
ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi
to position pi+1
At the time-instant t = 4 the object can
be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )
75
Processing
rsaquo Data Structuresndash Edge hash-table
rsaquo Small in-memoryndash Movement R-tree
rsaquo 1D for each edgendash Trajectories List
rsaquo Store samples ordered by time-stamp
rsaquo Assign pointer to set of possible paths
rsaquo Earliest-arriving and Latest-departure stored for vertices
rsaquo On-disk retrieve on-demand
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty: the other flip-side
› Which important properties must not be distorted by the reduction?
  - Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  - It is not just a "Z" axis... one cannot travel back in time
  - How long may the data at the sink remain "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far...
Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
82
Merge the approaches
› Idea
  - Discretize the time
  - For each time instant, a pdf (pmf) over the possible positions is given
› Examples
  - Nearest neighbour queries on uncertain moving objects
  - Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443
83
A highway example...
[Figure: position (pos) of a car on a highway over time (t), with a query window Q]
What is the probability that the car is in some area at least once during some time?
84
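A highway example...
[Figure: position (pos) over time (t) with the query window Q; the car's possible positions are bounded by the speeds vmin and vmax]
What is the probability that the car is in some area at least once during some time?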
85
A highway example...
[Figure: position (pos) over time (t); possible positions bounded by vmin and vmax, and the query window Q]
Assuming a uniform distribution...
86
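A highway example...
[Figure: position (pos) over time (t); the car lies in Q with probability 50% at one time instant and 30% at another]
Independence assumption: $1 - (1 - 0.5)(1 - 0.3) = 0.65$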
87
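A highway example...
[Figure: position (pos) over time (t); the car lies in Q with probability 50% at one time instant and 30% at another]
Independence assumption: $1 - (1 - 0.5)(1 - 0.3) = 0.65$
Violation of the speed constraints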
88
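A highway example...
[Figure: position (pos) over time (t); the car lies in Q with probability 50% at one time instant and 30% at another]
Independence assumption: $1 - (1 - 0.5)(1 - 0.3) = 0.65$
Consideration of dependency: $0.5$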
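Both numbers follow from inclusion-exclusion, P(A ∪ B) = P(A) + P(B) − P(A ∩ B), where A and B are the events that the car is in Q at the two time instants. Independence sets P(A ∩ B) = P(A)·P(B); the dependent value 0.5 forces P(A ∩ B) = 0.3, i.e., every world in which the car is in Q with the 30% event is also one of the 50% worlds. A small arithmetic check:

```python
def p_at_least_once(p_a, p_b, p_ab):
    """Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_ab


P_A, P_B = 0.5, 0.3
INDEPENDENT = p_at_least_once(P_A, P_B, P_A * P_B)  # equals 1 - (1 - 0.5)(1 - 0.3)
NESTED = p_at_least_once(P_A, P_B, P_B)             # the 30% event lies inside the 50% event
```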
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  - Discrete time + discrete space (e.g., Markov chain)
  - Discrete time + continuous space (e.g., Harris chain)
  - Continuous time + discrete space (e.g., Markov process)
  - Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
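A simple example
› Whenever the wooden board is hit, the ball stays in its hole or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts
[Figure: board of holes; from an inner hole the ball stays with probability 0.2 and moves to each neighbour with 0.4; from a border hole it stays with 0.4 and moves inward with 0.6]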
92
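A simple example
› Initial position: the ball is in the third hole
› After first hit: 0.4 / 0.2 / 0.4 in the second, third and fourth holes
› After second hit: 0.16 / 0.16 / 0.36 / 0.16 / 0.16 in the first five holes
› After 40th hit: ≈ 0.08 / 0.12 / 0.12 / 0.12 / 0.12 / 0.12 / 0.12 / 0.12 / 0.08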
93
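How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}
$$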
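Because M follows a simple pattern, it can also be generated instead of typed. This plain-Python sketch (no NumPy) uses the slide's probabilities as default values of illustrative parameters (the parameter names are not from the source) and checks that every row is a probability distribution over the next bucket.

```python
def board_transition_matrix(n=9, move=0.4, stay=0.2, border_stay=0.4, border_move=0.6):
    """Row i = distribution of the next bucket when the ball is in bucket i."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][i], m[i][i + 1] = border_stay, border_move
        elif i == n - 1:
            m[i][i - 1], m[i][i] = border_move, border_stay
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = move, stay, move
    return m


def is_row_stochastic(m, tol=1e-9):
    """Every entry is non-negative and every row sums to 1."""
    return all(min(row) >= 0 and abs(sum(row) - 1.0) < tol for row in m)


M = board_transition_matrix()
```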
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)\,M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
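The three products above are repeated vector-matrix multiplications, $x_{k+1} = x_k M$. A minimal sketch that reproduces the first and second hit. The vector shown for the 40th hit coincides with the chain's stationary distribution (0.08, 0.12, ..., 0.12, 0.08), so the sketch checks convergence to that vector after many hits instead of asserting exact 40-step values.

```python
def step(x, m):
    """One hit: x'[j] = sum_i x[i] * m[i][j] (row vector times matrix)."""
    n = len(m)
    return [sum(x[i] * m[i][j] for i in range(n)) for j in range(n)]


def after_hits(x, m, hits):
    """Distribution over the buckets after `hits` hits, i.e. x * M^hits."""
    for _ in range(hits):
        x = step(x, m)
    return x


# Transition matrix M of the board example (9 buckets).
M = [[0.0] * 9 for _ in range(9)]
M[0][0], M[0][1] = 0.4, 0.6
M[8][7], M[8][8] = 0.6, 0.4
for i in range(1, 8):
    M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4

START = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # the ball starts in the third bucket
```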
95
Fusion of Model and Reality
› Discretization of time and space
  - E.g., treat intersections as states and add additional states on long streets
  - The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  - Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  - The transition matrix can change over time and for different object groups
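One straightforward way to learn the transition probabilities from historical data, assumed here for illustration rather than taken from the source, is to count the observed state-to-state moves per tick and normalise each row (the maximum-likelihood estimate). Storing only observed transitions in nested dicts keeps the very sparse matrix sparse. The state names and sequences are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Sparse maximum-likelihood transition estimate from state sequences.

    sequences: iterable of lists of states, one state per tick.
    Returns dict: from_state -> {to_state: probability}.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in counts.items()
    }


HISTORY = [
    ["s1", "s3", "s2", "s1", "s3"],
    ["s2", "s3", "s3"],
]
P = estimate_transitions(HISTORY)
```

Separate matrices for different times of day or object groups would simply be estimated from the corresponding subsets of the history.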
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: transition graph over the states s1, s2, s3 with the probabilities of M]
Note: there is an exponential number of possible paths the car might take
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3 with the transition probabilities of M]
t = 0: $(1, 0, 0)$
98
99
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3 with the transition probabilities of M]
t = 0: $(1, 0, 0)$; t = 1: $(0, 0, 1)$
100
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3 with the transition probabilities of M]
t = 0: $(1, 0, 0)$; t = 1: $(0, 0, 1)$; t = 2: $(0, 0.8, 0.2)$
101
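ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3 with the transition probabilities of M]
t = 0: $(1, 0, 0)$; t = 1: $(0, 0, 1)$; t = 2: $(0, 0.8, 0.2)$; t = 3: $(0.48, 0.16, 0.36)$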
102
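ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3 with the transition probabilities of M]
t = 0: $(1, 0, 0)$; t = 1: $(0, 0, 1)$; t = 2: $(0, 0.8, 0.2)$; t = 3, continuing only the worlds that have not yet entered the window (s3 at t = 2): $(0, 0.16, 0.04)$
Result = 0.96 (only 0.04 of the worlds never enter s1 or s2 during T)
› A solution based on matrix multiplications introduces a new (absorbing) state for the trajectories that hit the window, and two matrices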
103
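ST - Window Queries
States s1, s2, s3 plus the absorbing state s4 for the trajectories that have already hit the window; M⁻ is applied for time steps outside T, M⁺ for time steps inside T:
$$
M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad
M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}
$$
$$
(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)
$$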
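The construction of M⁻ and M⁺ is mechanical: M⁻ adds an absorbing state to M, and M⁺ redirects every transition that enters a window state into that absorbing state. A plain-Python sketch, assuming the window is given as a set of 0-based state indices, time starts at t = 0, and M⁺ is used for every step that lands inside T:

```python
def step(x, m):
    """Row vector times matrix: x'[j] = sum_i x[i] * m[i][j]."""
    n = len(m)
    return [sum(x[i] * m[i][j] for i in range(n)) for j in range(n)]


def window_matrices(m, window):
    """Build M- and M+ over the states of m plus one absorbing 'hit' state."""
    n = len(m)
    absorbing = [0.0] * n + [1.0]
    m_minus = [list(row) + [0.0] for row in m] + [absorbing]
    m_plus = []
    for row in m:
        # Transitions into the window are redirected to the absorbing state.
        new_row = [0.0 if j in window else p for j, p in enumerate(row)]
        new_row.append(sum(p for j, p in enumerate(row) if j in window))
        m_plus.append(new_row)
    m_plus.append(absorbing)
    return m_minus, m_plus


def window_query(m, start, window, t_from, t_to):
    """P(object is in a window state at least once during [t_from, t_to])."""
    m_minus, m_plus = window_matrices(m, window)
    x = list(start) + [0.0]
    if t_from <= 0 <= t_to:  # the object may already be inside at t = 0
        x[-1] = sum(x[j] for j in window)
        for j in window:
            x[j] = 0.0
    for t in range(1, t_to + 1):
        x = step(x, m_plus if t >= t_from else m_minus)
    return x[-1]


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
WINDOW = {0, 1}  # s1 and s2 as 0-based indices
```

Each time step costs one sparse vector-matrix product, so the exponential number of possible paths never has to be enumerated.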
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time; left: a single observation at t0; right: observations at t0 and t1]
105
Multiple Observations
› We need to track where the true-hit worlds are located
  - 2|S| classes of equivalent worlds
  - One class $S_i^-$ corresponding to the worlds where o is located in state si and o has not intersected the window
  - One class $S_i^+$ corresponding to the worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
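Multiple Observations
Classes $S_1^-, S_2^-, S_3^-$ (window not yet intersected, ¬∎) and $S_1^+, S_2^+, S_3^+$ (window intersected, ∎):
$$
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad
M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad
M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
$$
Distributions over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$:
t = 0: $(1, 0, 0, 0, 0, 0)$; t = 1: $(0, 0, 1, 0, 0, 0)$; t = 2: $(0, 0, 0.2, 0, 0.8, 0)$; t = 3: $(0, 0, 0.04, 0.48, 0.16, 0.32)$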
107
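Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
$$
P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\,P(\blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} \approx 0.89
$$
where ∎ denotes the event that the trajectory intersects the window; 0.32 and 0.04 are the probabilities of the classes $S_3^+$ and $S_3^-$ in the distribution at t = 3, $(0, 0, 0.04, 0.48, 0.16, 0.32)$ over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$.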
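Tracking the 2|S| classes and applying Bayes' theorem can be sketched the same way: the first |S| entries of the vector hold the classes S⁻ and the last |S| entries the classes S⁺. This sketch assumes that the observation is made at the last time instant of the window (t = 3 here) and that the window does not include t = 0; it reproduces the vector at t = 3 and the value 0.32 / 0.36 ≈ 0.89.

```python
def step(x, m):
    n = len(m)
    return [sum(x[i] * m[i][j] for i in range(n)) for j in range(n)]


def class_matrices(m, window):
    """M- = [[M, 0], [0, M]] and M+ over the classes S1-..Sn-, S1+..Sn+."""
    n = len(m)
    zeros = [0.0] * n
    m_minus = [list(row) + zeros for row in m] + [zeros + list(row) for row in m]
    m_plus = []
    for row in m:
        not_hit = [0.0 if j in window else p for j, p in enumerate(row)]
        hit = [p if j in window else 0.0 for j, p in enumerate(row)]
        m_plus.append(not_hit + hit)  # entering the window moves a world to S+
    m_plus += [zeros + list(row) for row in m]  # S+ worlds stay in S+
    return m_minus, m_plus


def hit_given_observation(m, start, window, t_from, t_obs, obs_state):
    """P(window intersected | object observed in obs_state at t_obs).

    Assumes t_obs is the last instant of the window [t_from, t_obs], t_from >= 1.
    """
    n = len(m)
    m_minus, m_plus = class_matrices(m, window)
    x = [0.0] * (2 * n)
    x[start] = 1.0  # start in class S_start^- (window not yet intersected)
    for t in range(1, t_obs + 1):
        x = step(x, m_plus if t >= t_from else m_minus)
    hit, miss = x[n + obs_state], x[obs_state]  # P(obs and hit), P(obs and no hit)
    return hit / (hit + miss), x


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
PROB, CLASSES = hit_given_observation(M, 0, {0, 1}, 2, 3, 2)
```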
108
Summary
› Pros
  - Allows answering time-parametrized queries according to possible-worlds semantics
  - Considers location dependencies over time
  - Scales up very well, since it is purely based on sparse matrix multiplications
  - Natively extendable to uncertain observations
  - Seems to work adequately on real-world data (more validation needed)
› Cons
  - Discrete time and space
  - Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index over the time and location dimensions, with entries labelled Poid]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  - Draw a sufficiently high number of samples
  - Approximate the result probability by the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
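Sampling can be illustrated on the three-state chain of the window-query example. To make samples conform with an observation, the sketch adapts the transition probabilities by weighting M[i][j] with the probability of still reaching the observed state from j. This is a standard way to condition a Markov chain on its end point; it is an assumption here and not necessarily the exact construction of the cited paper. The observation is assumed to be made at t_to.

```python
import random


def backward_weights(m, horizon, obs_state=None):
    """beta[t][i] = P(object in obs_state at `horizon` | object in state i at t).

    With obs_state=None there is no observation and every weight is 1.
    """
    n = len(m)
    if obs_state is None:
        last = [1.0] * n
    else:
        last = [1.0 if i == obs_state else 0.0 for i in range(n)]
    beta = [None] * horizon + [last]
    for t in range(horizon - 1, -1, -1):
        beta[t] = [sum(m[i][j] * beta[t + 1][j] for j in range(n)) for i in range(n)]
    return beta


def sample_path(m, start, beta, rng):
    """One trajectory drawn with the adapted probabilities m[i][j]*beta[t+1][j]/beta[t][i]."""
    path = [start]
    for t in range(len(beta) - 1):
        i = path[-1]
        weights = [m[i][j] * beta[t + 1][j] for j in range(len(m))]
        path.append(rng.choices(range(len(m)), weights=weights)[0])  # normalises weights
    return path


def estimate_window_prob(m, start, window, t_from, t_to, obs_state=None,
                         samples=20000, seed=1):
    """Share of sampled trajectories in a window state at some t in [t_from, t_to]."""
    rng = random.Random(seed)
    beta = backward_weights(m, t_to, obs_state)
    if beta[0][start] == 0.0:
        raise ValueError("the observation cannot be reached from the start state")
    hits = 0
    for _ in range(samples):
        path = sample_path(m, start, beta, rng)
        hits += any(path[t] in window for t in range(t_from, t_to + 1))
    return hits / samples


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
PLAIN = estimate_window_prob(M, 0, {0, 1}, 2, 3)
CONDITIONED = estimate_window_prob(M, 0, {0, 1}, 2, 3, obs_state=2)
```

Without an observation the estimate should approach the exact 0.96 of the window query; conditioned on the object being seen in s3 at t = 3 it should approach 0.32 / 0.36 ≈ 0.89 from the Bayes example, up to sampling error.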
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  - Cleaning
  - Approximation
  - Paradigm of equivalent worlds
› Geometric models can be used
  - when no probabilistic model is present
  - to find possible answers (no confidence)
› Probabilistic management of UST data
  - based on stochastic processes
  - allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
43
Count Queries on Uncertain Data
[Figure: count-query example with query object q and uncertain objects A, B, C, annotated with the probabilities 0.8, 0.2 and 0.4]
45
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|
III. The probability of a class can be computed in polynomial time
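Count queries illustrate the three conditions: if objects qualify independently, the possible worlds can be grouped by how many objects qualify (|DB| + 1 classes), and the class probabilities are the coefficients of the generating function $\prod_i (1 - p_i + p_i x)$. A sketch assuming independent objects; the probabilities 0.8, 0.2 and 0.4 are borrowed from the labels of the count-query figure purely for illustration.

```python
def count_distribution(probs):
    """Coefficients of prod_i (1 - p_i + p_i * x).

    Entry k is the probability that exactly k of the independent objects
    satisfy the query predicate; runs in O(n^2) time for n objects.
    """
    coeffs = [1.0]
    for p in probs:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p)  # object does not qualify
            nxt[k + 1] += c * p      # object qualifies
        coeffs = nxt
    return coeffs


DIST = count_distribution([0.8, 0.2, 0.4])
```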
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and without bias
Evaluate the query predicate on each world
The distribution of the sampled results is an unbiased approximation of the true distribution of results
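A minimal sketch of this sampling scheme for a hypothetical count query: each sampled world decides independently for every object whether it qualifies, the predicate (here, the count) is evaluated per world, and the relative frequencies estimate the result distribution. The probabilities, seed and sample size are arbitrary illustrations.

```python
import random
from collections import Counter


def sample_count_distribution(probs, n_worlds, seed=0):
    """Monte-Carlo estimate of P(count = k) from n_worlds sampled possible worlds."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_worlds):
        world = [rng.random() < p for p in probs]  # materialise one possible world
        counts[sum(world)] += 1                    # evaluate the query on that world
    return [counts[k] / n_worlds for k in range(len(probs) + 1)]


ESTIMATE = sample_count_distribution([0.8, 0.2, 0.4], 100_000)
```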
48
Sampling Example
[Figure: the count-query example with query object q and uncertain objects A, B, C (probabilities 0.8, 0.2, 0.4)]
Drawing 100 possible worlds may yield the following estimators
Compare to the exact probabilities
No indication of the reliability or confidence of the estimates
49
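Sampling Confidences
[Figure: the count-query example with query object q and uncertain objects A, B, C; the true probability is marked]
Drawing 100 possible worlds may yield the following estimators
Use statistical methods to assess the quality of the estimators, e.g., the Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution
At significance level α, the confidence interval for the true probability is [0.442, 0.638]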
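The Wald interval takes a few lines with the standard library. The inputs below are assumptions rather than values from the slide: an estimate of 0.54 from n = 100 sampled worlds at α = 0.05 reproduces the interval [0.442, 0.638] quoted above.

```python
import math
from statistics import NormalDist


def wald_interval(p_hat, n, alpha=0.05):
    """(1 - alpha) Wald confidence interval for a probability estimated from n samples."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2) standard-normal percentile
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


LOW, HIGH = wald_interval(0.54, 100)
```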
50
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty: the flip-side
53
Model of a trajectory
[Figure: a 3D trajectory in (X, Y, Time) up to the present time, and its projection onto the X-Y plane, the 2D route]
Approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
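Between consecutive (location, time) samples the model connects the points by straight segments at constant speed, which is why it cannot capture acceleration or deceleration. A sketch of evaluating such a piecewise-linear trajectory, assuming the samples are (x, y, t) tuples with strictly increasing t (the sample values are made up):

```python
from bisect import bisect_right


def location_at(samples, t):
    """Position at time t on the piecewise-linear trajectory through (x, y, t) samples."""
    times = [s[2] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t outside the sampled time range")
    i = bisect_right(times, t)
    if i == len(samples):  # t equals the last sample time
        return samples[-1][0], samples[-1][1]
    (x0, y0, t0), (x1, y1, t1) = samples[i - 1], samples[i]
    f = (t - t0) / (t1 - t0)  # constant speed within each segment
    return x0 + f * (x1 - x0), y0 + f * (y1 - y0)


SAMPLES = [(0.0, 0.0, 0.0), (10.0, 0.0, 10.0), (10.0, 10.0, 20.0)]
```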
54
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
(location, time, velocity vector) updates
[Figure: the three models along a time axis up to "now"]
55
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumed exact location samples
  • Unknown (but speed-bounded) motion in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
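Under Model 1, a point can be visited at time t only if it is reachable from the earlier sample in the elapsed time and the later sample is still reachable from it in the remaining time, both at the maximum speed. A sketch of that membership test for one bead, assuming exact samples given as (x, y, t) and a known vmax; the example values are made up.

```python
import math


def in_bead(point, t, sample_a, sample_b, vmax):
    """Can the object be at `point` at time t, given exact samples a and b?

    sample_a, sample_b: (x, y, t) with t_a <= t_b; vmax: maximum speed.
    The point must be reachable from a within t - t_a, and b must be
    reachable from the point within t_b - t (the space-time prism, or bead).
    """
    (xa, ya, ta), (xb, yb, tb) = sample_a, sample_b
    if not ta <= t <= tb:
        return False
    from_a = math.hypot(point[0] - xa, point[1] - ya) <= vmax * (t - ta)
    to_b = math.hypot(xb - point[0], yb - point[1]) <= vmax * (tb - t)
    return from_a and to_b


A, B, VMAX = (0.0, 0.0, 0.0), (10.0, 0.0, 10.0), 2.0
```

A necklace is then simply the sequence of beads between consecutive samples.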
57
Modeling Uncertain Trajectories: Model 2
Constraints:
  - Motion along a road network
  - Uncertainty restricted to road segments
[Figure: road-network space-time prism between the samples p1 at t1 and p2 at t2, with a query point q]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3 (Sheared Cylinders)
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
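With a constant error bound r, the uncertainty region at time t is a disk of radius r around the expected location, and the trajectory as a whole sweeps a sheared cylinder. A sketch of the "possibly at" test, assuming the expected location is obtained by linear interpolation between (x, y, t) samples (hypothetical values):

```python
import math
from bisect import bisect_right


def expected_location(samples, t):
    """Linearly interpolated expected location at time t."""
    times = [s[2] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t outside the sampled time range")
    i = bisect_right(times, t)
    if i == len(samples):
        return samples[-1][0], samples[-1][1]
    (x0, y0, t0), (x1, y1, t1) = samples[i - 1], samples[i]
    f = (t - t0) / (t1 - t0)
    return x0 + f * (x1 - x0), y0 + f * (y1 - y0)


def possibly_at(samples, r, point, t):
    """True if point lies in the cylinder's cross-section (disk of radius r) at time t."""
    ex, ey = expected_location(samples, t)
    return math.hypot(point[0] - ex, point[1] - ey) <= r


SAMPLES = [(0.0, 0.0, 0.0), (10.0, 0.0, 10.0)]
```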
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
  - The impact of uncertainty on the (structure of the) answer
  - The impact of constraints, e.g., possible routes to be taken
Processing algorithms
  - Impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside the range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
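Question 2 can be answered with two shortest-path searches: for two consecutive samples at times t_i and t_next, the earliest arrival at a vertex v is t_i plus the minimum travel time from the first sample, and the latest departure is t_next minus the minimum travel time from v to the second sample; v lies on some possible route only if earliest ≤ latest. A sketch assuming both samples sit on vertices of an undirected graph with hypothetical minimum travel times:

```python
import heapq


def travel_times(graph, source):
    """Dijkstra: minimum travel time from source to every reachable vertex."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue
        for v, w in graph[u]:
            if d + w < best.get(v, float("inf")):
                best[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return best


def arrival_windows(graph, p_i, t_i, p_next, t_next):
    """vertex -> (earliest arrival, latest departure) for vertices on a possible route."""
    from_start = travel_times(graph, p_i)
    to_end = travel_times(graph, p_next)  # undirected graph: distances are symmetric
    windows = {}
    for v in graph:
        if v in from_start and v in to_end:
            earliest = t_i + from_start[v]
            latest = t_next - to_end[v]
            if earliest <= latest:
                windows[v] = (earliest, latest)
    return windows


GRAPH = {
    "a": [("b", 2.0), ("c", 5.0)],
    "b": [("a", 2.0), ("c", 2.0)],
    "c": [("b", 2.0), ("a", 5.0), ("d", 1.0)],
    "d": [("c", 1.0)],
}
```

With samples at a (t = 0) and c (t = 6), the detour vertex d is still reachable with zero slack; with t = 5 for the second sample it drops out.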
61
Continuous NN for trajectories
Given a collection of moving-object trajectories Tr1, Tr2, ..., TrN:
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq, between [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), ..., (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) over time, between T = tb and T = te, with T = t1 marked]
Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, ..., TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq, between [tb, te]
[Figure: the previous example with uncertain trajectories Tr1, Tr2, Tr3 and the time instants tb, tb1, t11, t1, te]
Observations: adding uncertainty to the previous example implies
- Tr1 (blue) has the exclusive non-zero NN probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN probability
- Specifically, at t11 all three trajectories have a non-zero NN probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree. Root [TrQ, tb, te]; children (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), ..., (Trim, t(m-1), te, Dm); deeper nodes of the same form, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121)]
Root: parameters of the query
  - Querying trajectory Trq
  - Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: crisp query point Q and the uncertainty disks of Tr1, ..., Tr5, with the radii Rmin, Rmax and Rd; the minimum distance of Tr4 from Q exceeds Rmax]
Trq = 2D point Q; the others are possible locations bounded by circles
Observation:
- Any object whose min-distance from Q is > Rmax (the smallest of the objects' maximum distances from Q) has a "0" probability of being a NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \int_{\|x - Q\| \le R_d} \mathrm{pdf}_i(x)\, dx$$
The probability that a given object Trj is the NN of Q is given by
$$P_j^{NN} = \int_0^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{k \ne j} \bigl(1 - P_k(R_d)\bigr)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax...)
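The integral above can also be estimated by sampling, which is a handy sanity check. A sketch assuming a crisp query point and uniformly distributed locations inside each object's disk (the disks are made up); objects whose minimum distance exceeds Rmax are pruned exactly as in the observation and receive probability 0.

```python
import math
import random


def sample_in_disk(center, radius, rng):
    """Uniform random point in a disk (the sqrt keeps the density uniform in area)."""
    r = radius * math.sqrt(rng.random())
    a = 2.0 * math.pi * rng.random()
    return center[0] + r * math.cos(a), center[1] + r * math.sin(a)


def nn_probabilities(q, disks, samples=20000, seed=0):
    """Monte-Carlo estimate of P(object i is the NN of the crisp point q).

    disks: list of (center, radius) uncertainty regions with uniform pdfs.
    """
    rng = random.Random(seed)
    # Rmax: the smallest of the objects' maximum distances from q.
    r_max = min(math.dist(q, c) + rad for c, rad in disks)
    # Objects whose minimum distance exceeds Rmax can never be the NN.
    candidates = [i for i, (c, rad) in enumerate(disks) if math.dist(q, c) - rad <= r_max]
    wins = [0] * len(disks)
    for _ in range(samples):
        dists = {i: math.dist(q, sample_in_disk(*disks[i], rng)) for i in candidates}
        wins[min(dists, key=dists.get)] += 1
    return [w / samples for w in wins]


DISKS = [((3.0, 0.0), 1.0), ((-3.0, 0.0), 1.0), ((10.0, 0.0), 1.0)]
PROBS = nn_probabilities((0.0, 0.0), DISKS)
```

The two symmetric disks should each win roughly half of the samples, while the distant third disk is pruned before sampling.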
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable... Example:
[Figure: uncertain query location TrQ and the disks of Tr1, ..., Tr5 with Rmin and Rmax; a point Z1 in TrQ and a point Z2 in Tr4 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq.
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)...
66
[Figure: uniform pdfs of possible locations for Trq, Tr1 and Tr2 in the x-y plane, each over a disk of radius r]
Example: a uniform pdf of possible locations is assumed; the figure marks two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\|V_i - V_q\| \le R_d)$.
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq), which, due to independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and −Vq.
67
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq) in the x-y plane, each supported on a disk of radius 2r]
Let Viq = Vi + (−Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq).
Property 2: Assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids (the rotation axis being the vertical axis that carries the pdf values). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
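Property 1 and the definition of Viq can be checked numerically. This sketch assumes independent, uniformly distributed locations in disks (hypothetical centres and radii), samples Viq = Vi − Vq, and estimates both its mean and $P(\|V_{iq}\| \le R_d)$, the quantity needed on the previous slides.

```python
import math
import random


def sample_in_disk(center, radius, rng):
    """Uniform random point in a disk."""
    r = radius * math.sqrt(rng.random())
    a = 2.0 * math.pi * rng.random()
    return center[0] + r * math.cos(a), center[1] + r * math.sin(a)


def sample_difference(ci, ri, cq, rq, n, seed=0):
    """n samples of V_iq = V_i - V_q for independent uniform disks."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        xi, yi = sample_in_disk(ci, ri, rng)
        xq, yq = sample_in_disk(cq, rq, rng)
        out.append((xi - xq, yi - yq))
    return out


def within_distance(diffs, rd):
    """Monte-Carlo estimate of P(||V_i - V_q|| <= rd)."""
    return sum(math.hypot(dx, dy) <= rd for dx, dy in diffs) / len(diffs)


DIFFS = sample_difference((5.0, 0.0), 1.0, (1.0, 1.0), 1.0, 20000)
MEAN = (sum(d[0] for d in DIFFS) / len(DIFFS), sum(d[1] for d in DIFFS) / len(DIFFS))
```

The sample mean should sit close to Ci − Cq = (4, −1), and every sample lies within 2r of it.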
68
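The continuous aspect of the probabilistic NN queries
Let $V^x_{iq} = v^x_i - v^x_q$ and $V^y_{iq} = v^y_i - v^y_q$ denote the corresponding components of the velocity of the object whose expected location is along TRiq.
Then the distance of the expected location along TRiq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = (V^x_{iq})^2 + (V^y_{iq})^2, \quad B = 2\,(x_{iq} V^x_{iq} + y_{iq} V^y_{iq}), \quad C = x_{iq}^2 + y_{iq}^2$$
where $(x_{iq}, y_{iq})$ is the expected relative location at t = 0; its graph is a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).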
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), ..., dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: setting diq(t) = djq(t) and squaring yields a quadratic equation in t). Call such pairwise intersections critical time points.
[Figure: example lower envelope LE12 of the two distance trajectories TR1 and TR2, with the critical time points t11 and t12 after tb; time on the horizontal axis]
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: the envelopes LE12 (of TR1 and TR2, critical points t11, t12) and LE34 (of TR3 and TR4, critical point t31) are merged into LE1234 over [tb, te], creating the new critical points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
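The sketch below computes the lower envelope of such distance functions, but with a simpler quadratic-time method than the divide-and-conquer construction above (a substitute chosen for brevity): it collects all pairwise critical time points, i.e., the roots of the quadratic $d_i(t)^2 = d_j(t)^2$, and picks the closest object on each resulting sub-interval. Positions and velocities are relative to the query trajectory and hypothetical.

```python
import math


def distance_coeffs(rel_pos, rel_vel):
    """(A, B, C) with d(t)^2 = A t^2 + B t + C for relative location rel_pos + rel_vel * t."""
    (x, y), (vx, vy) = rel_pos, rel_vel
    return vx * vx + vy * vy, 2.0 * (x * vx + y * vy), x * x + y * y


def distance_at(coeffs, t):
    a, b, c = coeffs
    return math.sqrt(max(a * t * t + b * t + c, 0.0))


def critical_times(ci, cj, tb, te):
    """Times in (tb, te) where d_i(t) = d_j(t): roots of a quadratic, so at most two."""
    a, b, c = (u - v for u, v in zip(ci, cj))
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
    return sorted(r for r in roots if tb < r < te)


def lower_envelope(funcs, tb, te):
    """Pieces (object index, start, end) of the lower envelope on [tb, te].

    Uses all O(N^2) pairwise critical times; the ranking cannot change between
    consecutive critical times, so evaluating each sub-interval's midpoint suffices.
    """
    cuts = {tb, te}
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            cuts.update(critical_times(funcs[i], funcs[j], tb, te))
    cuts = sorted(cuts)
    pieces = []
    for start, end in zip(cuts, cuts[1:]):
        mid = 0.5 * (start + end)
        k = min(range(len(funcs)), key=lambda i: distance_at(funcs[i], mid))
        if pieces and pieces[-1][0] == k:
            pieces[-1] = (k, pieces[-1][1], end)  # extend the current piece
        else:
            pieces.append((k, start, end))
    return pieces


FUNCS = [
    distance_coeffs((-5.0, 1.0), (1.0, 0.0)),  # passes close to the query object
    distance_coeffs((0.0, 3.0), (0.0, 0.0)),   # keeps a constant distance of 3
]
PIECES = lower_envelope(FUNCS, 0.0, 10.0)
```

Removing the winning piece on each sub-interval and repeating the same procedure on the remaining functions yields the ranked (2nd, 3rd, ...) envelopes.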
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1, ..., TR7 over [tb, te] with critical times t1, t2, t3; the lower envelope, the 4r pruning band, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
72
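Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.,
$$PP_i(a) = \{P \mid P \text{ connects } p_i \text{ and } p_{i+1} \wedge \mathrm{cost}_{\min}(P) \le t_{i+1} - t_i\}$$
Example: if the pdf of selecting a possible path is uniform, each path in PPi(a) is taken with probability $1/|PP_i(a)|$.
Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t), i.e.,
$$PL^a_{P_j}(t) = \{p \in P_j \mid \mathrm{cost}_{\min}(p, p_i) \le t - t_i \wedge \mathrm{cost}_{\min}(p, p_{i+1}) \le t_{i+1} - t\}$$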
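The definition of PPi can be turned into a depth-first enumeration of simple routes whose minimum time cost fits into ti+1 − ti, followed by the uniform path probability of the example. A sketch on a hypothetical graph whose edge weights are minimum travel times; the enumeration is exponential in the worst case, so it is only meant for small networks.

```python
def possible_paths(graph, p_i, p_next, budget):
    """All simple vertex paths from p_i to p_next with total min time cost <= budget."""
    paths = []

    def dfs(node, path, cost):
        if node == p_next:
            paths.append(tuple(path))
            return
        for nxt, w in graph[node]:
            if nxt not in path and cost + w <= budget:
                path.append(nxt)
                dfs(nxt, path, cost + w)
                path.pop()

    dfs(p_i, [p_i], 0.0)
    return paths


def uniform_path_probabilities(paths):
    """Uniform pdf over the possible paths: each one gets 1 / |PP_i|."""
    return {p: 1.0 / len(paths) for p in paths}


GRAPH = {
    "v1": [("v5", 3.0), ("v3", 1.0), ("v2", 3.0)],
    "v2": [("v1", 3.0), ("v5", 3.0)],
    "v3": [("v1", 1.0), ("v4", 1.0)],
    "v4": [("v3", 1.0), ("v5", 1.0)],
    "v5": [("v1", 3.0), ("v2", 3.0), ("v4", 1.0)],
}
PATHS = possible_paths(GRAPH, "v1", "v5", 4.0)
```

With a time budget of 4, the route through v2 (cost 6) is excluded, leaving two possible paths with probability 0.5 each.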
73
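Combine the two probabilities...
[Figure: possible paths and the possible locations of object a at t = 4, with a point m on one of the paths]
Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{P \in PP(t)} \Pr(P) \cdot \Pr\bigl(a(t) \in S \mid P\bigr)$$
where $\Pr(a(t) \in S \mid P)$ follows from the pdf over the possible locations $PL^a_P(t)$, e.g., a uniform pdf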
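The sum combines, for every possible path, the path probability with the probability that the object's possible location on that path falls in S. A sketch assuming, as in the example, a uniform pdf over each possible-location interval, with positions measured as offsets along the path; the numbers reproduce the 0.5 · 0.5 = 0.25 case but are otherwise hypothetical.

```python
def overlap_fraction(pl, seg):
    """P(location in seg | path) for a uniform pdf over the interval pl = (lo, hi)."""
    if seg is None:  # S does not lie on this path
        return 0.0
    lo, hi = pl
    s_lo, s_hi = seg
    if hi == lo:  # degenerate interval: the location is known exactly
        return 1.0 if s_lo <= lo <= s_hi else 0.0
    return max(0.0, min(hi, s_hi) - max(lo, s_lo)) / (hi - lo)


def prob_in_segment(paths):
    """Sum over possible paths of Pr(path) * Pr(location in S | path).

    paths: list of (path probability, possible-location interval, S on that path or None).
    """
    return sum(p * overlap_fraction(pl, seg) for p, pl, seg in paths)


EXAMPLE = [
    (0.5, (2.0, 6.0), (4.0, 6.0)),  # S covers half of the possible locations on path 1
    (0.5, (1.0, 3.0), None),        # S is not on path 2
]
```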
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  - The collection of all the possible paths (and their probabilities)
  - A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
rsaquo Data Structuresndash Edge hash-table
rsaquo Small in-memoryndash Movement R-tree
rsaquo 1D for each edgendash Trajectories List
rsaquo Store samples ordered by time-stamp
rsaquo Assign pointer to set of possible paths
rsaquo Earliest-arriving and Latest-departure stored for vertices
rsaquo On-disk retrieve on-demand
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
$t=0$: $(1, 0, 0)$
$t=1$: $(0, 0, 1)$
$t=2$: $(0, 0.8, 0.2)$
$t=3$: $(0, 0.8, 0.2) \cdot M = (0.48, 0.16, 0.36)$ (the plain distribution at t = 3; it does not reveal whether the car visited s1 or s2 at least once)
102
ST - Window Queries
$t=0$: $(1, 0, 0)$
$t=1$: $(0, 0, 1)$
$t=2$: $(0, 0.8, 0.2)$; the mass 0.8 in s1 and s2 has hit the window, so only the non-hit mass $(0, 0, 0.2)$ is propagated further
$t=3$: $(0, 0, 0.2) \cdot M = (0, 0.16, 0.04)$; the mass 0.16 in s2 hits the window, while 0.04 never does
Result $= 0.8 + 0.16 = 1 - 0.04 = 0.96$
› The solution based on matrix multiplications introduces a new state for the "winner" trajectories (those that have already hit the window) and two matrices
103
ST - Window Queries
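States s1, s2, s3 plus an absorbing state s4 that collects the trajectories that have hit the window:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$M^-$ is used for steps into time instants outside T, and $M^+$ for steps into time instants inside T:
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$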
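The Python sketch below implements the two-matrix construction. It appends the absorbing "hit" state s4 to a transition matrix and builds two matrices from it. M^- holds the ordinary transitions. In M^+, mass that enters a window state is redirected to s4. The start vector is multiplied by M^- for steps into time instants before T and by M^+ for steps into T.

The sketch makes a few simplifications:
- The names `split_matrices` and `window_query` are hypothetical.
- Dense lists stand in for the sparse matrices a real system would use.
- Window states are given as 0-based indices.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def split_matrices(M, window):
    """Build M^- and M^+ over the states plus one absorbing 'hit' state."""
    n = len(M)
    hit_row = [0.0] * n + [1.0]
    # M^-: ordinary transitions; the hit state stays absorbing.
    m_minus = [list(row) + [0.0] for row in M] + [hit_row]
    m_plus = []
    for row in M:
        # M^+: probability mass entering a window state is redirected to 'hit'.
        new_row = [0.0 if j in window else p for j, p in enumerate(row)]
        new_row.append(sum(p for j, p in enumerate(row) if j in window))
        m_plus.append(new_row)
    m_plus.append(hit_row)
    return m_minus, m_plus


def window_query(M, start, window, t_from, t_to):
    """P(at least one visit to a window state during [t_from, t_to])."""
    if t_from == 0 and start in window:
        return 1.0
    m_minus, m_plus = split_matrices(M, window)
    v = [0.0] * (len(M) + 1)
    v[start] = 1.0
    for t in range(1, t_to + 1):
        # Use M^+ for steps that land inside the query interval.
        v = vec_mat(v, m_plus if t >= t_from else m_minus)
    return v[-1]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
result = window_query(M, start=0, window={0, 1}, t_from=2, t_to=3)
```

Each time step costs one vector-matrix product, so the cost grows linearly with the query horizon rather than exponentially. For the example, the sketch reproduces the vectors above and returns 0.96, up to floating-point rounding.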
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space vs. time, with one observation at t0 (left) and with observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
  – One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: time-expanded transition graph of s1, s2, s3 over t = 0, 1, 2, 3]
106
Multiple Observations
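The classes are ordered $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$. The $S^-$ classes correspond to $\neg\blacksquare$ (window not hit yet) and the $S^+$ classes to $\blacksquare$ (window hit).
$t=0$: $(1, 0, 0, 0, 0, 0)$
$t=1$: $(0, 0, 1, 0, 0, 0)$
$t=2$: $(0, 0, 0.2, 0, 0.8, 0)$
$t=3$: $(0, 0, 0.04, 0.48, 0.16, 0.32)$
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$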
107
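Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given that the object was observed in s3 at t = 3?
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\,P(\blacksquare)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
Here 0.32 ($S_3^+$) and 0.04 ($S_3^-$) are taken from the distribution at t = 3, $(0, 0, 0.04, 0.48, 0.16, 0.32)$ over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$.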
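The Python sketch below covers the six-class bookkeeping and the final Bayes step. The classes are ordered (S_1^-, …, S_n^-, S_1^+, …, S_n^+). M^- keeps each world's +/− flag. M^+ moves mass that enters a window state into the corresponding + class. The function names are hypothetical. The sketch also assumes the observation time coincides with the end of the query interval, as in the example (observation in s3 at t = 3).

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def class_matrices(M, window):
    """M^- and M^+ over the 2n classes (S_1^-..S_n^-, S_1^+..S_n^+)."""
    n = len(M)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            m_minus[i][j] = p            # S_i^- -> S_j^-
            m_minus[n + i][n + j] = p    # S_i^+ -> S_j^+
            m_plus[n + i][n + j] = p     # a hit world stays a hit world
            if j in window:
                m_plus[i][n + j] = p     # entering the window: S_i^- -> S_j^+
            else:
                m_plus[i][j] = p         # still outside: S_i^- -> S_j^-
    return m_minus, m_plus


def hit_prob_given_obs(M, start, window, t_from, t_obs, obs_state):
    """Class distribution at t_obs and P(window hit | object in obs_state)."""
    n = len(M)
    m_minus, m_plus = class_matrices(M, window)
    v = [0.0] * (2 * n)
    v[start] = 1.0  # assumes the start state does not already count as a hit
    for t in range(1, t_obs + 1):
        v = vec_mat(v, m_plus if t >= t_from else m_minus)
    hit, miss = v[n + obs_state], v[obs_state]
    # Bayes: P(hit | obs) = P(hit and obs) / (P(hit and obs) + P(no hit and obs))
    return v, hit / (hit + miss)


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
dist, p = hit_prob_given_obs(M, start=0, window={0, 1}, t_from=2, t_obs=3, obs_state=2)
```

For the example, this gives the t = 3 vector (0, 0, 0.04, 0.48, 0.16, 0.32) and 0.32 / 0.36 ≈ 0.89, up to floating-point rounding.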
108
Summary
› Pros
  – Allows answering time-parametrized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – The mapping from time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index over the (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate the result probability by the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can we draw samples efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
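One standard way to adapt a transition matrix to an observation is backward reweighting. Let β_t(i) be the probability of reaching the observed state at the observation time, starting from state i at time t. The adapted step is then A_t(i, j) = M(i, j) β_{t+1}(j) / β_t(i), and every path sampled from the A_t ends in the observed state.

The Python sketch below applies this construction to the running example. It is our own illustration, which we believe is close in spirit to the cited paper. The function names, sample size and seed are arbitrary choices.

```python
import random

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]  # running example


def adapted_matrices(M, obs_state, t_obs):
    """Per-step transition matrices conditioned on obs_state at time t_obs.

    A_t[i][j] = M[i][j] * beta_{t+1}[j] / beta_t[i], where beta_t[i] is the
    probability of reaching obs_state at t_obs from state i at time t.
    """
    n = len(M)
    beta = [1.0 if j == obs_state else 0.0 for j in range(n)]
    mats = [None] * t_obs
    for t in range(t_obs - 1, -1, -1):
        prev = [sum(M[i][j] * beta[j] for j in range(n)) for i in range(n)]
        mats[t] = [[M[i][j] * beta[j] / prev[i] if prev[i] > 0 else 0.0
                    for j in range(n)] for i in range(n)]
        beta = prev
    return mats


def sample_path(mats, start, rng):
    """Draw one path; the start must be able to reach the observation."""
    path = [start]
    for A in mats:
        row = A[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def estimate_hit_prob(M, start, window, t_from, obs_state, t_obs,
                      n_samples=20000, seed=1):
    """Monte-Carlo estimate of P(window hit during [t_from, t_obs] | observation)."""
    rng = random.Random(seed)
    mats = adapted_matrices(M, obs_state, t_obs)
    hits = sum(any(s in window for s in sample_path(mats, start, rng)[t_from:])
               for _ in range(n_samples))
    return hits / n_samples


estimate = estimate_hit_prob(M, start=0, window={0, 1}, t_from=2,
                             obs_state=2, t_obs=3)
```

The example starts in s1 at t = 0 and observes the object in s3 at t = 3, with window {s1, s2} during [2, 3]. Here the estimate should come close to the exact value 8/9 ≈ 0.89 obtained with Bayes' theorem.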
112
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Event queries
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715–728
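The Python sketch below gives a rough idea of how such a query can be evaluated on a Markovian location stream. It is not Lahar's actual algorithm or syntax. It computes the probability that a location sequence contains the subevents office, coffee room, office in this order, with arbitrary locations in between. To do so it tracks a distribution over pairs (location, number of subevents matched so far). This is a special case of running a regular-expression automaton alongside the Markov chain. The layout, labels and transition probabilities are hypothetical.

```python
def pattern_probability(M, labels, start, steps, pattern):
    """P(the chain's label sequence contains `pattern` in order).

    Tracks (location, number of pattern elements matched so far); greedy
    matching suffices to decide whether a subsequence occurs.
    """
    m = len(pattern)

    def advance(k, loc):
        return k + 1 if k < m and labels[loc] == pattern[k] else k

    dist = {(start, advance(0, start)): 1.0}
    for _ in range(steps):
        nxt = {}
        for (loc, k), p in dist.items():
            for j, q in enumerate(M[loc]):
                if q > 0:
                    key = (j, advance(k, j))
                    nxt[key] = nxt.get(key, 0.0) + p * q
        dist = nxt
    # Probability mass of all states in which the whole pattern was matched.
    return sum(p for (_, k), p in dist.items() if k == m)


# Hypothetical indoor layout: office, hallway, coffee room.
labels = ["office", "hall", "coffee"]
M = [[0.5, 0.5, 0.0],
     [0.3, 0.2, 0.5],
     [0.0, 0.6, 0.4]]
p_coffee = pattern_probability(M, labels, start=0, steps=6,
                               pattern=["office", "coffee", "office"])
```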
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3): 205–216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity search on uncertain spatio-temporal data. SISAP 2013, 43–49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170–181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-nearest neighbor queries on uncertain moving object trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112–1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009, 435–443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715–728.
44
Count Queries on Uncertain Data
[Figure: count-query example with a query object q and uncertain objects A, B, C, annotated with the probabilities 0.8, 0.2 and 0.4]
45
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time.
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.
III. The probability of a class can be computed in polynomial time.
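For count queries, generating functions satisfy all three conditions. The Python sketch below assumes that object i satisfies the query predicate independently with probability p_i. The classes of equivalent worlds are "exactly k objects qualify" for k = 0, …, N, so there are polynomially many. Expanding the product of (1 − p_i) + p_i·x one factor at a time yields the probability of every class in O(N²) time. The probabilities 0.8, 0.2 and 0.4 used below are illustrative.

```python
def count_distribution(probs):
    """dist[k] = P(exactly k objects satisfy the predicate).

    Expands the generating function prod_i ((1 - p_i) + p_i * x); object i
    satisfies the predicate independently with probability p_i.
    """
    dist = [1.0]  # coefficients of the empty product
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, c in enumerate(dist):
            new[k] += c * (1 - p)  # object does not satisfy the predicate
            new[k + 1] += c * p    # object satisfies the predicate
        dist = new
    return dist


dist = count_distribution([0.8, 0.2, 0.4])
```

Up to floating-point rounding, `dist` is [0.096, 0.472, 0.368, 0.064].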
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and unbiased
Evaluate the query predicate on each world
The distribution of the sampled results is an unbiased approximation of the true distribution of results
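The Python sketch below applies sampling to the same independent-object count query. Each iteration materializes one possible world by flipping a biased coin per object. The relative frequencies of the observed counts then form the estimate. The function name, the seed and the independence assumption are ours.

```python
import random


def sample_count_distribution(probs, n_worlds, seed=0):
    """Monte-Carlo estimate of P(count = k) from n_worlds sampled worlds."""
    rng = random.Random(seed)
    hist = [0] * (len(probs) + 1)
    for _ in range(n_worlds):
        # Materialize one possible world: each object satisfies the
        # predicate independently with its own probability.
        k = sum(rng.random() < p for p in probs)
        hist[k] += 1
    return [h / n_worlds for h in hist]


estimate = sample_count_distribution([0.8, 0.2, 0.4], n_worlds=100)
```

With only 100 worlds the estimates can deviate noticeably from the exact distribution, which for these probabilities is (0.096, 0.472, 0.368, 0.064). The error shrinks roughly with the square root of the number of worlds.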
48
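Sampling Example
[Figure: count-query example with a query object q and uncertain objects A, B, C, annotated with the probabilities 0.8, 0.2 and 0.4]
Drawing 100 possible worlds may yield the following estimators:
Compare them to the exact probabilities:
No indication of the reliability or confidence of the estimations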
49
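Sampling Confidences
[Figure: count-query example with a query object q and uncertain objects A, B, C, annotated with the probabilities 0.8, 0.2 and 0.4]
Drawing 100 possible worlds may yield the following estimators:
Use statistical methods to assess the quality of the estimators
E.g., the Wald test, with the confidence interval
$$\hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution and n is the number of sampled worlds
At a significance level of $\alpha$, the resulting confidence interval for the true probability is [0.442, 0.638]
[Figure: confidence interval compared with the true probability]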
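The Wald interval can be computed with Python's standard library. The inputs below, 54 of 100 sampled worlds satisfying the predicate and α = 0.05, are our assumptions. The slide's own estimate is not legible in this copy, but these values reproduce the quoted interval [0.442, 0.638].

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, alpha=0.05):
    """Wald confidence interval for a probability estimated from n samples."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lo, hi = wald_interval(54, 100)
print(round(lo, 3), round(hi, 3))  # 0.442 0.638
```

The Wald interval is known to be unreliable when p̂ is close to 0 or 1 or when n is small. Score-based intervals such as the Wilson interval are a common alternative.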
50
Uncertain Spatial Data Management Summary
› Motivation
  – Flood of geo-spatial data
  – Enriched with additional contexts (text, social, multimedia)
  – Inherent uncertainty
› Data cleaning
  – "Best guess" answers
  – Unreliable results
  – Biased results
› Paradigm of equivalent worlds
  – Efficient solution for the most prominent types of spatial queries
  – Example: generating functions
› Approximations
  – Monte-Carlo sampling
  – Probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip-side
53
Model of a trajectory
[Figure: 3D trajectory in (X, Y, Time) space up to the present time, and the corresponding 2D route in the (X, Y) plane]
The approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman. Kinetic space-time prisms. GIS 2011
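A trajectory of this kind is commonly stored as a sequence of (t, x, y) samples, with straight-line motion in between. This is also why the approximation cannot capture acceleration or deceleration. The Python sketch below assumes this piecewise-linear model. The function name is hypothetical.

```python
from bisect import bisect_right


def location_at(samples, t):
    """Linearly interpolated (x, y) at time t.

    samples: list of (t, x, y) tuples sorted by time.
    """
    times = [s[0] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t outside the sampled time span")
    k = bisect_right(times, t)
    if k == len(samples):  # t equals the last time stamp
        return samples[-1][1:]
    (t0, x0, y0), (t1, x1, y1) = samples[k - 1], samples[k]
    f = (t - t0) / (t1 - t0)  # fraction of the segment already travelled
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))


trajectory = [(0, 0, 0), (10, 10, 0), (20, 10, 10)]
midpoint = location_at(trajectory, 15)
```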
54
Mobility Models
› Sequence of (location, time) updates (e.g., GPS-based)
› Electronic maps + traffic distribution patterns + set of (to-be-visited) points ⇒ full trajectory
› (location, time, velocity vector) updates
55
Modeling Uncertainty: Location Uncertainty Models
Imprecision from various sources (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze. On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (a.k.a. space-time prisms)
  • Assumed exact location samples
  • Unknown (but bounded) maximum speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov. Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li. Uncertain Range Queries for Necklaces. MDM 2010
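Under Model 1, consider two exact samples (p1, t1) and (p2, t2). A point p is a possible location at time t between them exactly when two conditions hold at speed at most v_max: p can be reached from p1 in time t − t1, and p2 can still be reached from p in time t2 − t. The Python sketch below implements this bead (space-time prism) membership test using Euclidean distance. The names are hypothetical.

```python
import math


def in_bead(p, t, p1, t1, p2, t2, v_max):
    """True if (p, t) lies in the bead spanned by samples (p1, t1), (p2, t2)."""
    if not t1 <= t <= t2:
        return False
    reachable_from_p1 = math.dist(p, p1) <= v_max * (t - t1)
    can_reach_p2 = math.dist(p2, p) <= v_max * (t2 - t)
    return reachable_from_p1 and can_reach_p2


inside = in_bead((5, 5), 5, (0, 0), 0, (10, 0), 10, v_max=1.5)
```

At a fixed t, the possible locations are therefore the intersection of two disks, which is what gives the bead its shape.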
57
Modeling Uncertain Trajectories: Model 2
Constraints:
Motion along a road network
Uncertainty restricted to road segments
[Figure: samples p1 at t1 and p2 at t2 on the road network, a location q, and time t]
Bart Kuijpers, Walied Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain. Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider
› Syntax
  – The impact of uncertainty on the (structure of the) answer
  – The impact of constraints (e.g., possible routes to be taken)
› Processing algorithms
  – The impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm−1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in the (X, Y) plane at the times T = tb, T = t1 and T = te]
Example: assuming that Trq is the querying trajectory during [tb, te], then
  – Tr1 (black) is the NN throughout [tb, t1]
  – Tr2 (red) is the NN throughout [t1, te]
Observation: The answer to the query is time-parameterized.
Yufei Tao, Dimitris Papadias, Qiongmao Shen. Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some location uncertainty associated with its location at each time instant, consider the variant
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: the previous example with uncertainty; trajectories Trq, Tr1, Tr2, Tr3 with the time instants tb, tb1, t11, t1 and te]
Observations: adding uncertainty to the previous example implies that
  – Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
  – Starting at tb1, Tr3 (green) also has a non-zero NN-probability
  – Specifically, at t11 all three trajectories have a non-zero NN-probability
⇒ What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: IPAC-NN tree. The root [TrQ, tb, te] has the children (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m−1), te, Dm); each child is refined by its own children over sub-intervals of its interval, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), and deeper levels refine further, e.g., (Tri121, t11, t121, D121)]
Root: parameters of the query
  – Querying trajectory Trq
  – Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
64
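Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., has no uncertainty)
[Figure: crisp query point Q and the uncertainty disks of Tr1 to Tr5, with the radii Rmin, Rmax and Rd; one annotation reads Rmin > Rmax]
Trq is the 2D point Q; the possible locations of the other objects are bounded by circles.
Observation: Any object whose minimum distance from Q is greater than Rmax (the smallest maximum distance from Q over all objects) has a 0 probability of being the NN of Q.
Example: Tr4 cannot have a non-zero probability of being Q's NN.
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \int_{\lVert x - Q \rVert \le R_d} f_i(x)\,dx,$$
where $f_i$ is the pdf of the possible locations of Tri.
The probability that a given object Trj is the NN of Q is given by
$$P_{NN}(Tr_j) = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \neq j} \bigl(1 - P_i(R_d)\bigr)\,dR_d,$$
i.e., Trj is the NN if it is at distance Rd AND all the others are at distances > Rd, integrated over all possible values of Rd (NOTE: the upper bound is Rmax…).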
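The two integrals can be evaluated numerically. For illustration, the Python sketch below assumes that every object's possible locations are uniform on a disk. Then P_i(R_d) is the area where the query circle and the disk intersect, divided by the disk area. The NN probability is integrated on a grid over [0, Rmax]. The grid size and the function names are arbitrary choices.

```python
import math


def _acos(x):
    return math.acos(max(-1.0, min(1.0, x)))  # guard against rounding errors


def lens_area(d, r, R):
    """Intersection area of two circles with radii r, R and centre distance d."""
    if d >= r + R:
        return 0.0
    if d <= abs(R - r):
        return math.pi * min(r, R) ** 2
    part_r = r * r * _acos((d * d + r * r - R * R) / (2 * d * r))
    part_R = R * R * _acos((d * d + R * R - r * r) / (2 * d * R))
    kite = 0.5 * math.sqrt((-d + r + R) * (d + r - R) * (d - r + R) * (d + r + R))
    return part_r + part_R - kite


def within(d, radius, rd):
    """P(dist(Q, Tr) <= rd) for Tr uniform on a disk whose centre is d from Q."""
    return lens_area(d, rd, radius) / (math.pi * radius ** 2)


def nn_probabilities(objects, steps=2000):
    """objects: list of (centre distance from Q, disk radius). Returns P(NN)."""
    r_max = min(d + rad for d, rad in objects)  # some object is surely this close
    grid = [r_max * k / steps for k in range(steps + 1)]
    cdf = [[within(d, rad, r) for r in grid] for d, rad in objects]
    result = []
    for j in range(len(objects)):
        total = 0.0
        for k in range(steps):
            others = 1.0
            for i in range(len(objects)):
                if i != j:  # all other objects farther than the current radius
                    others *= 1.0 - 0.5 * (cdf[i][k] + cdf[i][k + 1])
            total += (cdf[j][k + 1] - cdf[j][k]) * others
        result.append(total)
    return result


probs = nn_probabilities([(2.0, 1.0), (2.5, 1.0), (20.0, 1.0)])
```

An object whose disk lies entirely beyond Rmax contributes a factor of 1 to the others and receives probability 0 itself. This matches the pruning observation for Tr4.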
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query TrQ and objects Tr1 to Tr5 with the radii Rmin and Rmax; two possible locations Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq.
NOTE: in effect, this is a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform possible-location pdfs of Trq, Tr1 and Tr2 in the (x, y) plane]
Example: assuming a uniform pdf of possible locations, the figure marks two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.
Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$P\bigl(\lVert V_i - V_q \rVert \le R_d\bigr).$$
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq). Due to independence, the pdf of Viq is then a convolution of the pdfs of Vi and −Vq.
67
[Figure: pdfs of Tr1 − Trq and Tr2 − Trq in the (x, y) plane]
Let Viq = Vi + (−Vq).
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq).
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
68
The continuous aspect of the probabilistic NN queries
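Let $V^x_{iq} = v^x_i - v^x_q$ and $V^y_{iq} = v^y_i - v^y_q$ denote the corresponding components of the velocity of the object whose expected location is along TRiq.
Then the distance of the expected location along TRiq from the origin as a function of t is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = (V^x_{iq})^2 + (V^y_{iq})^2,\ B = 2\,(X_{iq} V^x_{iq} + Y_{iq} V^y_{iq}),\ C = X_{iq}^2 + Y_{iq}^2,$$
where $(X_{iq}, Y_{iq})$ denotes the expected relative location at t = 0. The graph of $d_{iq}(t)$ is a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).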
69
The continuous aspect of the probabilistic NN queries
Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions $S_{df} = \{d_{1q}(t), d_{2q}(t), \ldots, d_{Nq}(t)\}$ (NOTE: excluding $d_{qq}(t) \equiv 0$).
Observation: Two distance functions $d_{iq}(t)$ and $d_{jq}(t)$ can intersect at most twice throughout the time interval of interest for the query, $[t_b, t_e]$ (NOTE: set $d_{iq}(t) = d_{jq}(t)$; squaring both sides yields a quadratic equation in t). Call such pairwise intersections critical time-points.
Example: [Figure: lower envelope LE12 of the two distance trajectories TR1 and TR2, with the critical time-points t11 and t12 after tb; time on the horizontal axis, distance on the vertical axis]
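The Python sketch below computes the distance functions and their critical time-points, assuming linear motion of the expected locations. Squaring d_iq(t) gives A t² + B t + C. Setting d_iq(t) = d_jq(t) therefore reduces to a quadratic equation, which is why two functions cross at most twice. The function names are hypothetical.

```python
import math


def distance_coeffs(pos_i, vel_i, pos_q, vel_q):
    """(A, B, C) with d_iq(t)^2 = A t^2 + B t + C for linear relative motion."""
    x, y = pos_i[0] - pos_q[0], pos_i[1] - pos_q[1]
    vx, vy = vel_i[0] - vel_q[0], vel_i[1] - vel_q[1]
    return (vx * vx + vy * vy, 2 * (x * vx + y * vy), x * x + y * y)


def distance(coeffs, t):
    a, b, c = coeffs
    return math.sqrt(max(0.0, a * t * t + b * t + c))


def critical_time_points(ci, cj, tb, te):
    """Times in [tb, te] where d_iq(t) = d_jq(t) (at most two)."""
    a, b, c = (ci[k] - cj[k] for k in range(3))
    if abs(a) < 1e-12:  # the difference is linear (or constant)
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted({t for t in roots if tb <= t <= te})


# Object i passes a static query point; object j keeps a constant distance 2.
ci = distance_coeffs((-5, 1), (1, 0), (0, 0), (0, 0))
cj = distance_coeffs((2, 0), (0, 0), (0, 0), (0, 0))
crossings = critical_time_points(ci, cj, 0, 10)
```

In this example the two functions cross at t = 5 ∓ √3.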
70
The continuous aspect of the probabilistic NN queries
Q: How can we efficiently construct the lower envelope of the whole collection of distance functions?
A: A divide-and-conquer approach in the spirit of Merge-Sort:
[Figure: the lower envelopes LE12 (of TR1 and TR2, critical time-points t11, t12) and LE34 (of TR3 and TR4, critical time-point t31) over [tb, te] are merged into LE1234, which introduces the new critical time-points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
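The Python sketch below follows the divide-and-conquer construction. It assumes each distance function is given by its coefficients (A, B, C), with d(t) = √(A t² + B t + C).

- An envelope is a list of (start, end, index) pieces.
- Merging two envelopes splits [tb, te] at all their breakpoints and at the critical time-points of the two active functions, then keeps the smaller function on each piece.
- For clarity, the merge finds the active pieces by linear scans, so this sketch does not attain the O(N log N) bound. A two-pointer sweep would be needed for that.
- The names are hypothetical.

```python
import math


def dist(f, t):
    a, b, c = f
    return math.sqrt(max(0.0, a * t * t + b * t + c))


def crossings(f, g, lo, hi):
    """Roots of d_f(t)^2 = d_g(t)^2 strictly inside (lo, hi)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        s = math.sqrt(disc) if disc >= 0 else None
        roots = [] if s is None else [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted({r for r in roots if lo < r < hi})


def merge(env1, env2, funcs):
    cuts = sorted({t for piece in env1 + env2 for t in piece[:2]})
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        i = next(p[2] for p in env1 if p[0] <= mid <= p[1])  # active in env1
        j = next(p[2] for p in env2 if p[0] <= mid <= p[1])  # active in env2
        pts = [lo] + crossings(funcs[i], funcs[j], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            m = (a + b) / 2
            k = i if dist(funcs[i], m) <= dist(funcs[j], m) else j
            if out and out[-1][2] == k and out[-1][1] == a:
                out[-1] = (out[-1][0], b, k)  # extend the previous piece
            else:
                out.append((a, b, k))
    return out


def lower_envelope(funcs, tb, te, idx=None):
    idx = list(range(len(funcs))) if idx is None else idx
    if len(idx) == 1:
        return [(tb, te, idx[0])]
    half = len(idx) // 2
    return merge(lower_envelope(funcs, tb, te, idx[:half]),
                 lower_envelope(funcs, tb, te, idx[half:]), funcs)


funcs = [(0, 0, 4), (1, -10, 26), (0, 0, 100)]
envelope = lower_envelope(funcs, 0.0, 10.0)
```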
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) from the lower envelope has a 0 probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR1 to TR7 over [tb, te] with the lower envelope, a band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree); time points t1, t2, t3]
72
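Given two trajectory samples $(t_i, p_i)$ and $(t_{i+1}, p_{i+1})$ of a moving object a on a road network, the set of possible paths $PP_i$ between $t_i$ and $t_{i+1}$ consists of all the paths along the routes (sequences of edges) that connect $p_i$ and $p_{i+1}$ and whose minimum time costs are not greater than $t_{i+1} - t_i$, i.e.,
$$PP_i(a) = \{P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{cost}_{min}(P) \le t_{i+1} - t_i\}$$
Example: if the pdf of selecting a possible path is uniform, then $\Pr(P_j) = 1/|PP_i(a)|$ for every $P_j \in PP_i(a)$.
Given a path $P_j \in PP_i(a)$, the possible locations of a moving object a with respect to $P_j$ at $t \in [t_i, t_{i+1}]$ are the set of all positions $p \in P_j$ from which a can reach $p_i$ (respectively $p_{i+1}$) within the time period $t - t_i$ (respectively $t_{i+1} - t$), i.e.,
$$PL_j^a(t) = \{p \in P_j \mid \mathrm{cost}_{min}(p, p_i) \le t - t_i \ \wedge\ \mathrm{cost}_{min}(p, p_{i+1}) \le t_{i+1} - t\}$$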
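The possible paths can be enumerated with a depth-first search, sketched below in Python. It extends simple paths from p_i and prunes a branch as soon as its accumulated minimum travel time exceeds t_{i+1} − t_i. The graph, its directed edges and its travel times are hypothetical. Restricting the search to simple (cycle-free) paths is our assumption.

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths src -> dst whose minimum travel time is <= budget.

    graph: {node: [(neighbour, minimum travel time), ...]} (directed).
    """
    found = []

    def dfs(node, path, cost):
        if node == dst:
            found.append((list(path), cost))
            return
        for nxt, w in graph.get(node, []):
            # Prune as soon as the time budget t_{i+1} - t_i is exceeded.
            if nxt not in path and cost + w <= budget:
                path.append(nxt)
                dfs(nxt, path, cost + w)
                path.pop()

    dfs(src, [src], 0.0)
    return found


# Hypothetical network; edge weights are minimum travel times.
graph = {"v1": [("v5", 4.0), ("v3", 2.0), ("v2", 3.0)],
         "v3": [("v4", 1.0)], "v4": [("v5", 1.0)], "v2": [("v5", 3.0)]}
paths = possible_paths(graph, "v1", "v5", budget=5.0)
# Uniform pdf over the possible paths, as in the example above.
path_prob = {tuple(p): 1 / len(paths) for p, _ in paths}
```

Here the detour through v2 needs 6 time units and is excluded. The two remaining paths each get probability 0.5.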
73
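Combine the two probabilities…
[Figure: possible paths and possible locations of object a when t = 4 (labels a, m)]
Probability of falling inside segment m p1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{P \in PP(t)} \Pr(P) \cdot \Pr\bigl(p \in S \mid p \in PL_P^a(t)\bigr),$$
e.g., with a uniform pdf over the possible locations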
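The Python sketch below combines the two probabilities. For simplicity, we assume the object moves at speed at most v_max along each path, in place of per-edge minimum time costs. Its possible locations on a path of a given length then form an interval of distances from p_i, and we use a uniform pdf over that interval. The example values are hypothetical but reproduce the slide's 0.5 · 0.5 = 0.25 arithmetic.

```python
def possible_interval(length, v_max, t, t_i, t_j):
    """Distances s from p_i on a path that are possible locations at time t.

    s must be reachable from p_i within t - t_i, and p_{i+1} (at distance
    `length`) must be reachable from s within t_j - t, at speed <= v_max.
    """
    lo = max(0.0, length - v_max * (t_j - t))
    hi = min(length, v_max * (t - t_i))
    return (lo, hi) if lo <= hi else None


def prob_in_segment(paths, v_max, t, t_i, t_j):
    """paths: [(path probability, path length, (s_lo, s_hi) of S or None)]."""
    total = 0.0
    for p_path, length, seg in paths:
        interval = possible_interval(length, v_max, t, t_i, t_j)
        if interval is None or seg is None:
            continue  # no possible location, or S does not lie on this path
        lo, hi = interval
        if hi > lo:
            # Uniform pdf over the possible locations on this path.
            frac = max(0.0, min(hi, seg[1]) - max(lo, seg[0])) / (hi - lo)
        else:
            frac = 1.0 if seg[0] <= lo <= seg[1] else 0.0
        total += p_path * frac
    return total


paths = [(0.5, 10.0, (0.0, 4.0)), (0.5, 10.0, None)]
print(prob_in_segment(paths, v_max=2.0, t=4.0, t_i=0.0, t_j=10.0))  # 0.25
```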
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and their probabilities)
  – A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
› Data structures
  – Edge hash table
    › Small, in memory
  – Movement R-tree
    › 1D for each edge
  – Trajectories list
    › Stores samples ordered by timestamp
    › Assigns a pointer to the set of possible paths
    › Earliest-arrival and latest-departure times stored for vertices
    › On disk, retrieved on demand
76
› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search for leaf entries whose maximum time interval intersects tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" from q along road-network segments (vertices A to E), annotated with earliest/latest times of arrival; one possible path is marked as definitely not part of the answer to the range query]
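The expansion and filtering steps are sketched below in Python. A Dijkstra search bounded by r yields the vertices of the expansion tree, and every edge leaving one of them is scanned. Objects whose time interval on that edge intersects tq become candidates for refinement. A plain dictionary stands in for the per-edge movement R-trees, and all names and data are hypothetical. The refinement step (computing qualification probabilities) is omitted.

```python
import heapq


def expansion(graph, q, r):
    """Network distances from q for all vertices closer than r (Dijkstra)."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def range_candidates(graph, movements, q, r, tq):
    """Filtering step: objects on expanded edges whose time interval meets tq.

    movements: {(u, v): [(object id, t_start, t_end), ...]}, a flat stand-in
    for the per-edge movement R-trees.
    """
    reached = expansion(graph, q, r)
    edges = {(u, v) for u in reached for v, _ in graph.get(u, [])}
    lo, hi = tq
    return {obj for e in edges for obj, t0, t1 in movements.get(e, [])
            if t0 <= hi and lo <= t1}


graph = {"q": [("A", 1.0)], "A": [("B", 2.0), ("C", 5.0)], "B": [("D", 1.0)]}
movements = {("A", "B"): [("o1", 0, 5)], ("C", "E"): [("o2", 0, 5)],
             ("q", "A"): [("o3", 10, 12)]}
candidates = range_candidates(graph, movements, "q", r=3.5, tq=(2, 4))
```

Edges are included when their near end lies within distance r, so an edge that only partially falls inside the range is still scanned.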
77
Temporal-continuous query
  – Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
  – It is easy to prove that the actual QP is monotonic between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty – the flip-side
› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink. "Speeding up the Douglas-Peucker Line-simplification Algorithm". SSHD 1997
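For reference, the Python sketch below implements the classic recursive Douglas-Peucker simplification. This is the baseline that the cited paper speeds up, not the paper's faster variant. The algorithm keeps the endpoints and finds the point farthest from the segment joining them. It recurses on both halves only if that distance exceeds the acceptable error eps. The sketch measures plain 2D distance; for trajectories, a time-aware distance may be preferable.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    if a == b:
        return math.dist(p, a)
    (x, y), (x1, y1), (x2, y2) = p, a, b
    return abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / math.dist(a, b)


def douglas_peucker(points, eps):
    """Simplify a polyline so that no dropped point is farther than eps."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right  # the split point appears only once


simplified = douglas_peucker([(0, 0), (1, 0.1), (2, 0), (3, 0.1), (4, 0)], eps=0.5)
```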
79
Uncertainty – the other flip-side
› What are the important properties that should not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just another "Z" axis… one cannot travel back in time
  – How long may the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data
  – Probabilistic results: yes
  – Time dimension: no
vs.
Geometric models for uncertain spatio-temporal data
  – Probabilistic results: no
  – Time dimension: yes
82
Merge the approaches
› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009, 435–443
83
A highway example…
[Figure: position (pos) vs. time (t) of a car on a highway, with a query region Q]
What is the probability that the car is in some area at least once during some time interval?
84
A highway example…
[Figure: as before, with the possible positions bounded by the minimum and maximum speeds vmin and vmax]
85
A highway example…
[Figure: as before, with vmin, vmax and Q]
Assuming a uniform distribution…
86
A highway example…
[Figure: the car is inside Q with probability 50% at one time instant and 30% at another]
Independence assumption: $1 - (1 - 0.5)(1 - 0.3) = 0.65$
87
A highway example…
Independence assumption: $1 - (1 - 0.5)(1 - 0.3) = 0.65$
Violation of the speed constraints
88
A highway example…
Independence assumption: $1 - (1 - 0.5)(1 - 0.3) = 0.65$
Consideration of dependency: $= 0.5$
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
A simple example
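› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: from an interior hole the ball stays with probability 0.2 and moves left or right with probability 0.4 each; from a border hole it stays with probability 0.4 and moves inward with probability 0.6]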
92
A simple example
› Initial position: bucket 3
› After the first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After the second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After the 40th hit: ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
93
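How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (row: from bucket; column: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$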
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
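The board example can be reproduced with a short Python sketch. It builds the 9 × 9 matrix M from the interior probabilities (0.4, 0.2, 0.4) and the border probabilities (0.4 stay, 0.6 inward), then multiplies the start vector by M once per hit. The function names are hypothetical. With repeated hits the distribution converges to the stationary distribution (0.08, 0.12, …, 0.12, 0.08), which is the vector the slides show for the 40th hit.

```python
def board_matrix(n=9, stay=0.2, side=0.4, border_stay=0.4, border_in=0.6):
    """Transition matrix of the board example (row: from bucket, column: to)."""
    M = [[0.0] * n for _ in range(n)]
    M[0][0], M[0][1] = border_stay, border_in
    M[n - 1][n - 1], M[n - 1][n - 2] = border_stay, border_in
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = side, stay, side
    return M


def after_hits(v, M, hits):
    """Distribution of the ball after `hits` hits, i.e. v * M^hits."""
    for _ in range(hits):
        v = [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M))]
    return v


M = board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # ball in bucket 3
second = after_hits(start, M, 2)
```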
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity search on uncertain spatio-temporal data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-nearest neighbor queries on uncertain moving object trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
45
The Paradigm of Equivalent Worlds
Given a query predicate and an uncertain database DB, we can answer the query on DB in PTIME if the following three conditions are satisfied:
I. A traditional query on certain data can be answered in polynomial time.
II. We can identify a partitioning of all possible worlds into classes of equivalent worlds such that the number of classes is polynomial in |DB|.
III. The probability of each class can be computed in polynomial time.
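As a minimal sketch of condition II (the summary of this part names generating functions as an example), consider the probability that an object o is among the k nearest neighbours of q. Assume o's own position is fixed and every other object i is independently closer to q than o with a known probability p_i (hypothetical inputs). All worlds with the same number of closer objects are then equivalent, so there are only N + 1 classes, and their probabilities are the coefficients of the generating function ∏_i ((1 − p_i) + p_i·x), which expands in O(N²) time.

```python
from typing import List


def count_distribution(p_closer: List[float]) -> List[float]:
    """Return c with c[m] = P(exactly m other objects are closer to q than o).

    p_closer[i] is the (assumed independent) probability that object i is
    closer to q than o. Expanding prod_i ((1 - p_i) + p_i * x) yields one
    coefficient per equivalence class "exactly m objects are closer".
    """
    coeffs = [1.0]  # empty product: the single class m = 0 has probability 1
    for p in p_closer:
        nxt = [0.0] * (len(coeffs) + 1)
        for m, c in enumerate(coeffs):
            nxt[m] += c * (1.0 - p)  # object i is farther: count unchanged
            nxt[m + 1] += c * p  # object i is closer: count grows by one
        coeffs = nxt
    return coeffs


def prob_in_knn(p_closer: List[float], k: int) -> float:
    """P(o is among the k nearest neighbours of q) = P(fewer than k objects are closer)."""
    return sum(count_distribution(p_closer)[:k])


# Hypothetical example: two other objects, closer than o with probability 0.5 and 0.2.
classes = count_distribution([0.5, 0.2])
p_nn = prob_in_knn([0.5, 0.2], k=1)
```

With these hypothetical inputs, `classes` is approximately [0.4, 0.5, 0.1], so o is the nearest neighbour with probability of about 0.4; only three classes are evaluated instead of four possible worlds, and the gap grows exponentially with N.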
46
The Paradigm of Equivalent Worlds
47
Approximated Results: Sampling
Materialize a set S of possible worlds
Samples are drawn independently and without bias
Evaluate the query predicate on each world
The distribution of the sampled results is an unbiased approximation of the true distribution of results
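The sampling procedure above can be sketched in a few lines. The model here is hypothetical: each object has a handful of alternative locations with probabilities, objects are mutually independent, and the query predicate is "o is the nearest neighbour of q". The estimate for each object is the fraction of sampled worlds in which it satisfies the predicate.

```python
import math
import random
from typing import Dict, List, Tuple

Point = Tuple[float, float]
# Hypothetical discrete model: object id -> list of (alternative location, probability).
Model = Dict[str, List[Tuple[Point, float]]]


def sample_world(model: Model, rng: random.Random) -> Dict[str, Point]:
    """Draw one possible world: an independent location choice per object."""
    world = {}
    for oid, alternatives in model.items():
        locations = [loc for loc, _ in alternatives]
        weights = [w for _, w in alternatives]
        world[oid] = rng.choices(locations, weights=weights, k=1)[0]
    return world


def estimate_nn_probability(model: Model, q: Point, n: int, seed: int = 0) -> Dict[str, float]:
    """Fraction of n sampled worlds in which each object is the nearest neighbour of q."""
    rng = random.Random(seed)
    hits = {oid: 0 for oid in model}
    for _ in range(n):
        world = sample_world(model, rng)
        nn = min(world, key=lambda oid: math.dist(q, world[oid]))
        hits[nn] += 1
    return {oid: h / n for oid, h in hits.items()}


model: Model = {
    "A": [((1.0, 0.0), 0.8), ((5.0, 0.0), 0.2)],
    "B": [((2.0, 0.0), 0.4), ((6.0, 0.0), 0.6)],
}
estimate = estimate_nn_probability(model, (0.0, 0.0), n=100)
```

For this toy model the exact probabilities are 0.92 for A and 0.08 for B (A is the NN unless it is at (5, 0) while B is at (2, 0)), so a larger n should bring the estimates closer to those values.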
48
[Figure: query q and uncertain objects A, B, C with alternative locations and their probabilities]
Sampling Example
Drawing 100 possible worlds may yield the following estimators:
Compare to the exact probabilities:
No indication of the reliability or confidence of the estimations
49
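[Figure: query q and uncertain objects A, B, C with alternative locations and their probabilities]
Sampling Confidences
Drawing 100 possible worlds may yield the following estimators:
Use statistical methods to assess the quality of the estimators, e.g., the Wald test:
$\hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{p}(1-\hat{p})/n}$
where $\hat{p}$ is the estimated probability, n is the number of sampled worlds, and $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution.
At significance level α, the interval [0.442, 0.638] contains the true probability with confidence 1 − α.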
True probability
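The Wald interval can be computed with the standard library alone. The estimator values on the slide are not recoverable from the text; this sketch assumes an estimate p̂ = 0.54 from n = 100 sampled worlds at α = 0.05, values chosen because they reproduce the quoted interval [0.442, 0.638].

```python
import math
from statistics import NormalDist
from typing import Tuple


def wald_interval(p_hat: float, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Wald interval p_hat +/- z_{1-alpha/2} * sqrt(p_hat (1 - p_hat) / n), clipped to [0, 1]."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # (1 - alpha/2) percentile of N(0, 1)
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


low, high = wald_interval(0.54, 100)  # assumed inputs, see above
```

Because the half-width shrinks with 1/√n, quadrupling the number of sampled worlds roughly halves the interval.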
50
Uncertain Spatial Data Management: Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data cleaning: "best guess" answers; unreliable results; biased results
Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
Approximations: Monte Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip side
53
Model of a trajectory
[Figure: a 3D trajectory in (X, Y, Time) space up to the present time, and its projection onto the X-Y plane as a 2D route]
Approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers Harvey J Miller Walied Othman Kinetic space-time prisms GIS 2011
54
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
(location, time, velocity vector) updates
[Figure: motion extrapolated from "now" onwards]
55
Modeling Uncertainty: Location Uncertainty Models
Imprecision from various sources (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (a.k.a. space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
57
Modeling Uncertain Trajectories: Model 2
Constraints:
• Motion along the road network
• Uncertainty restricted to road segments
[Figure: samples p1 at t1 and p2 at t2 on a road network, a location q, and the time axis t]
Bart Kuijpers Walied Othman Modeling uncertainty of moving objects on road networks via space-time prisms International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
  – the impact of uncertainty on the (structure of the) answer
  – the impact of constraints (e.g., possible routes to be taken)
Processing algorithms
  – the impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng Goce Trajcevski Xiaofang Zhou Peter Scheuermann Probabilistic range queries for uncertain trajectories on road networks EDBT 2011
61
Continuous NN for trajectories
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Tr_q during [tb, te]
A_nn(q): [(Tr_i1, [tb, t1]), (Tr_i2, [t1, t2]), …, (Tr_im, [t(m−1), te])]
[Figure: query trajectory Trq and trajectories Tr1, Tr2, Tr3 in (X, Y, T) space, at T = tb, t1, te]
Given a collection of moving object trajectories Tr1, Tr2, …, TrN.
Example: assuming that Trq is the querying trajectory during [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]
Observation: The answer to the query is time-parameterized.
Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tr_i has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Tr_q during [tb, te]
[Figure: the previous example with uncertain trajectories, at T = tb, tb1, t1, t11, te]
Observations
Adding uncertainty to the previous example implies:
- Tr1 (blue) is the only trajectory with a non-zero NN probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN probability
- Specifically, at t11 all three trajectories have a non-zero NN probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree. Root: [Tr_Q, tb, te]. Level 1: (Tr_i1, tb, t1, D1), (Tr_i2, t1, t2, D2), …, (Tr_im, t(m−1), te, Dm). Lower levels, e.g.: (Tr_i11, tb, t11, D11), (Tr_i12, t11, t12, D12), …, (Tr_i121, t11, t121, D121)]
Root: parameters of the query
- querying trajectory Tr_q
- time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Tr_q within a sub-interval of the interval bounded by the parent
64
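Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and the circular uncertainty regions of Tr1 to Tr5, with radii Rmin, Rmax and Rd; the minimum distance of Tr4 from Q exceeds Rmax]
Tr_q = 2D point Q; the rest are possible-location regions bounded by circles.
Observation: Rmax is the smallest maximum distance from Q over all objects, so some object surely lies within Rmax of Q. Any object whose minimum distance from Q is > Rmax therefore has zero probability of being the NN of Q. Example: Tr4 cannot have a non-zero probability of being Q's NN.
The probability that the location of Tr_i is within distance Rd of Q is given by
$P_i(R_d) = \iint_{\lVert (x,y) - Q \rVert \le R_d} \mathrm{pdf}_i(x, y)\,dx\,dy$
The probability that a given object Tr_j is the NN of Q is given by
$P^{NN}_j = \int_{R_{min}}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \ne j} \bigl(1 - P_i(R_d)\bigr)\,dR_d$
i.e., Tr_j is the NN if it is at distance Rd AND all the others are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)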
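A discretized sketch of this NN-probability computation (Tr_j is the NN if it is at distance Rd and all others are farther than Rd), under the assumption that each object's pdf is approximated by weighted sample locations, so the integral over Rd becomes a sum over those locations; distance ties are assumed not to occur. It also applies the Rmax pruning rule, with Rmax taken as the smallest maximum distance from Q over all objects. All object names and coordinates are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
# Hypothetical discretized pdf: list of (possible location, probability) pairs.
DiscretePdf = List[Tuple[Point, float]]


def candidates(q: Point, objects: Dict[str, DiscretePdf]) -> Set[str]:
    """Prune objects whose minimum distance to q exceeds R_max."""
    min_d = {k: min(math.dist(q, p) for p, _ in pdf) for k, pdf in objects.items()}
    max_d = {k: max(math.dist(q, p) for p, _ in pdf) for k, pdf in objects.items()}
    r_max = min(max_d.values())  # some object is surely within r_max of q
    return {k for k in objects if min_d[k] <= r_max}


def prob_farther(q: Point, pdf: DiscretePdf, r: float) -> float:
    """P(dist(q, object) > r), i.e. 1 - P_i(r)."""
    return sum(w for p, w in pdf if math.dist(q, p) > r)


def nn_probabilities(q: Point, objects: Dict[str, DiscretePdf]) -> Dict[str, float]:
    """Sum over j's locations of P(j there) * prod_{i != j} P(i farther than that distance)."""
    keep = candidates(q, objects)
    result = {k: 0.0 for k in objects}  # pruned objects keep probability 0
    for j in keep:
        for p, w in objects[j]:
            r = math.dist(q, p)
            prod = 1.0
            for i in keep:
                if i != j:
                    prod *= prob_farther(q, objects[i], r)
            result[j] += w * prod
    return result


objs: Dict[str, DiscretePdf] = {
    "A": [((1.0, 0.0), 1.0)],
    "B": [((3.0, 0.0), 0.5), ((0.5, 0.0), 0.5)],
    "C": [((10.0, 0.0), 1.0)],
}
probs = nn_probabilities((0.0, 0.0), objs)
```

Here Rmax = 1 (object A), so C is pruned, and A and B each end up with NN probability 0.5. Dropping pruned objects from the product is exact: for any distance below Rmax their "farther" probability is 1.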
65
When Tr_q (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable. Example:
[Figure: uncertain query TrQ and objects Tr1 to Tr5 with radii Rmin and Rmax; two possible locations Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Tr_j, being within distance Rd of Tr_q.
NOTE: in effect, this is a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over disks of radius r in the x-y plane]
Example: assuming uniform pdfs of the possible locations, consider two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd of Tr_q.
Main observation: Let V_i and V_q denote the 2D random variables representing the possible locations of Tr_i and Tr_q. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$.
Define V_iq = V_i − V_q (cross-correlation) as another random variable, V_iq = V_i + (−V_q). Due to independence, the pdf of V_iq is the convolution of the pdfs of V_i and −V_q.
67
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq), each supported on a disk of diameter 4r]
Let V_iq = V_i + (−V_q).
Property 1: If pdf(V_i) has a centroid C_i coinciding with E(V_i) AND pdf(V_q) has a centroid C_q coinciding with E(V_q), THEN their convolution pdf(V_iq) has a centroid C_iq = C_i + (−C_q) coinciding with E(V_iq).
Property 2: Assume that pdf(V_i) and pdf(V_q) are rotationally symmetric around their centroids (the axis of rotation being the vertical axis that carries the pdf values). THEN pdf(V_iq) is also rotationally symmetric around its centroid C_iq.
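A small discrete illustration of the convolution and of Property 1, assuming hypothetical 2D probability mass functions in place of continuous pdfs: the pmf of V_iq = V_i + (−V_q) is obtained by pairing every location of V_i with every location of V_q (independence makes the joint probability a product), and its centroid equals C_i − C_q.

```python
from collections import defaultdict
from typing import Dict, Tuple

Point = Tuple[float, float]
Pmf = Dict[Point, float]  # hypothetical discrete 2D pmf: location -> probability


def difference_pmf(pmf_i: Pmf, pmf_q: Pmf) -> Pmf:
    """pmf of V_iq = V_i + (-V_q) for independent V_i and V_q (discrete convolution)."""
    out: Dict[Point, float] = defaultdict(float)
    for (xi, yi), wi in pmf_i.items():
        for (xq, yq), wq in pmf_q.items():
            out[(xi - xq, yi - yq)] += wi * wq  # independence: joint = product
    return dict(out)


def centroid(pmf: Pmf) -> Point:
    """Expected value E(V) of a discrete 2D pmf."""
    return (sum(x * w for (x, _), w in pmf.items()),
            sum(y * w for (_, y), w in pmf.items()))


v_i = {(2.0, 1.0): 0.5, (4.0, 1.0): 0.5}
v_q = {(0.0, 0.0): 0.25, (0.0, 2.0): 0.75}
v_iq = difference_pmf(v_i, v_q)
```

Here C_i = (3, 1) and C_q = (0, 1.5), and the centroid of `v_iq` is (3, −0.5), as Property 1 predicts; this is simply linearity of expectation.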
68
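The continuous aspect of the probabilistic NN queries
Let V_x,iq = v_x,i − v_x,q and V_y,iq = v_y,i − v_y,q denote the corresponding components of the velocity of the object whose expected location moves along TR_iq.
Then the distance of the expected location along TR_iq from the origin, as a function of t, is
$d_{iq}(t) = \sqrt{A t^2 + B t + C}$ with $A = V_{x,iq}^2 + V_{y,iq}^2$, $B = 2(x_{iq} V_{x,iq} + y_{iq} V_{y,iq})$, $C = x_{iq}^2 + y_{iq}^2$,
where (x_iq, y_iq) is the expected relative location at t = 0. The graph of d_iq(t) is a hyperbola (branch), since A > 0.
Now we have a collection of such distance functions (one for each i).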
69
The continuous aspect of the probabilistic NN queries
Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d_1q(t), d_2q(t), …, d_Nq(t)} (NOTE: excluding d_qq(t) ≡ 0).
Observation: Two distance functions d_iq(t) and d_jq(t) can intersect at most twice within the time interval of interest [tb, te] (NOTE: setting d_iq(t) = d_jq(t) and squaring both sides gives a quadratic equation in t). Call such pairwise intersections critical time points.
[Figure: example lower envelope LE12 of two distance functions TR1 and TR2 starting at tb, with critical time points t11 and t12; time on the horizontal axis, distance on the vertical axis]
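Why at most two intersections: both distances are non-negative, so d_iq(t) = d_jq(t) holds exactly when d_iq(t)² = d_jq(t)², and with linear relative motion each squared distance is a quadratic A·t² + B·t + C. The sketch below computes the critical time points under that assumption; (x0, y0) and (vx, vy), the relative position at t = 0 and the relative velocity, are hypothetical parameter names.

```python
import math
from typing import List, Tuple

Coeffs = Tuple[float, float, float]  # d(t)^2 = A t^2 + B t + C


def sq_dist_coeffs(x0: float, y0: float, vx: float, vy: float) -> Coeffs:
    """Coefficients of d(t)^2 for the relative position (x0 + vx t, y0 + vy t)."""
    return vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0


def critical_time_points(f: Coeffs, g: Coeffs, tb: float, te: float,
                         eps: float = 1e-12) -> List[float]:
    """Times in [tb, te] where d_f(t) = d_g(t): at most two, unless f and g coincide."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:  # degenerate case: the difference is linear (or constant) in t
        if abs(b) < eps:
            return []  # parallel or identical functions: no isolated crossing
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return []  # the two distance functions never meet
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


f = sq_dist_coeffs(-5.0, 1.0, 1.0, 0.0)  # passes the query point at t = 5
g = sq_dist_coeffs(0.0, 3.0, 0.0, 0.0)  # stays at distance 3
ts = critical_time_points(f, g, 0.0, 10.0)
```

In this example the two functions cross at t = 5 − √8 and t = 5 + √8, the two critical time points.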
70
The continuous aspect of the probabilistic NN queries
Q: How can we efficiently construct the lower envelope of the whole collection of distance functions?
A: A divide-and-conquer approach in the spirit of merge sort
[Figure: the lower envelopes LE12 (of TR1, TR2) and LE34 (of TR3, TR4) over [tb, te] are merged into LE1234, which creates new critical time points t1new and t2new]
Complexity of the lower-envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
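A sketch of the divide-and-conquer idea, assuming each distance function is given by the coefficients (A, B, C) of d(t)² = A·t² + B·t + C. The two halves are enveloped recursively; merging walks over the union of their breakpoints, splits each sub-interval at the (at most two) crossings of the two competing functions, and keeps the lower one. This illustrates the merge step only and is not tuned to meet the O(N log N) bound; the identifiers are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Coeffs = Tuple[float, float, float]  # d(t)^2 = A t^2 + B t + C
Piece = Tuple[str, float, float]  # (trajectory id, start time, end time)


def dist(c: Coeffs, t: float) -> float:
    return math.sqrt(max(0.0, c[0] * t * t + c[1] * t + c[2]))


def crossings(f: Coeffs, g: Coeffs) -> List[float]:
    """Real roots of d_f(t)^2 = d_g(t)^2 (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        return [] if abs(b) < 1e-12 else [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return []
    s = math.sqrt(disc)
    return [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]


def owner(env: List[Piece], t: float) -> str:
    """Trajectory id of the envelope piece covering time t."""
    for tid, s, e in env:
        if s <= t <= e:
            return tid
    raise ValueError("t lies outside the envelope")


def merge(funcs: Dict[str, Coeffs], left: List[Piece], right: List[Piece]) -> List[Piece]:
    """Merge two lower envelopes defined over the same time interval."""
    cuts = sorted({t for _, s, e in left + right for t in (s, e)})
    out: List[Piece] = []
    for a, b in zip(cuts, cuts[1:]):
        f, g = owner(left, (a + b) / 2), owner(right, (a + b) / 2)
        inner = sorted({t for t in crossings(funcs[f], funcs[g]) if a < t < b})
        pts = [a] + inner + [b]
        for s, e in zip(pts, pts[1:]):
            m = (s + e) / 2
            win = f if dist(funcs[f], m) <= dist(funcs[g], m) else g
            if out and out[-1][0] == win:
                out[-1] = (win, out[-1][1], e)  # extend the previous piece
            else:
                out.append((win, s, e))
    return out


def lower_envelope(funcs: Dict[str, Coeffs], ids: List[str], tb: float, te: float) -> List[Piece]:
    """Envelope each half recursively, then merge the two envelopes."""
    if len(ids) == 1:
        return [(ids[0], tb, te)]
    half = len(ids) // 2
    return merge(funcs, lower_envelope(funcs, ids[:half], tb, te),
                 lower_envelope(funcs, ids[half:], tb, te))


funcs = {"Tr1": (1.0, -10.0, 26.0), "Tr2": (0.0, 0.0, 9.0), "Tr3": (0.0, 0.0, 10000.0)}
env = lower_envelope(funcs, ["Tr1", "Tr2", "Tr3"], 0.0, 10.0)
```

In this hypothetical setting Tr1 passes close to the query while Tr2 stays at distance 3 and Tr3 stays far away, so the envelope switches Tr2, Tr1, Tr2 at t = 5 − √8 and t = 5 + √8.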
71
The lower envelope provides a continuous pruning criterion: any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) has zero probability of being the NN to Tr_q (e.g., Tr7 in the figure, and Tr6 initially). Each relative location V_iq is uncertain within a disk of radius 2r, so such a trajectory is always farther away than the object that defines the envelope.
Observation 3: IF the lower envelope is removed from the (distance functions × time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: lower envelope and 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) of the distance functions TR1 to TR7 over [tb, te], with a band of width 4r and time points t1, t2, t3]
72
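Given two trajectory samples (t_i, p_i) and (t_{i+1}, p_{i+1}) of a moving object a on a road network, the set of possible paths PP_i between t_i and t_{i+1} consists of all the paths along the routes (sequences of edges) that connect p_i and p_{i+1} and whose minimum time costs are not greater than t_{i+1} − t_i, i.e.,
$PP_i(a) = \{P : P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{cost}_{min}(P) \le t_{i+1} - t_i\}$
Example: if the pdf of selecting a possible path is uniform, $\Pr(P) = 1/|PP_i(a)|$ for every $P \in PP_i(a)$.
Given a path P_j ∈ PP_i(a), the possible locations of a given moving object a with respect to P_j at t ∈ [t_i, t_{i+1}] are the set of all the positions p ∈ P_j that a can reach from p_i within the time period t − t_i and from which it can still reach p_{i+1} within t_{i+1} − t, i.e.,
$PL_{P_j}(a, t) = \{p \in P_j : \mathrm{cost}_{min}(p_i, p) \le t - t_i \wedge \mathrm{cost}_{min}(p, p_{i+1}) \le t_{i+1} - t\}$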
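Enumerating PP_i on a toy road network can be sketched as a depth-first search that abandons a prefix as soon as its minimum travel time exceeds t_{i+1} − t_i. The graph, its edge travel times and the restriction to simple (cycle-free) paths are assumptions of this sketch; the uniform path pdf follows the example above.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]  # vertex -> {neighbour: minimum travel time}


def possible_paths(g: Graph, src: str, dst: str, budget: float) -> List[List[str]]:
    """All simple paths src -> dst whose minimum travel time is <= budget (= t_{i+1} - t_i)."""
    result: List[List[str]] = []

    def dfs(v: str, path: List[str], cost: float) -> None:
        if cost > budget:
            return  # prune: this prefix already exceeds the time budget
        if v == dst:
            result.append(path[:])
            return
        for w, c in g.get(v, {}).items():
            if w not in path:  # simple paths only (assumption of this sketch)
                path.append(w)
                dfs(w, path, cost + c)
                path.pop()

    dfs(src, [src], 0.0)
    return result


def uniform_path_probabilities(paths: List[List[str]]) -> Dict[Tuple[str, ...], float]:
    """Uniform pdf over the possible paths: Pr(P) = 1 / |PP_i|."""
    return {tuple(p): 1.0 / len(paths) for p in paths}


# Hypothetical network: two routes from v1 to v2, via v5 or via v3 and v4.
g: Graph = {
    "v1": {"v5": 4.0, "v3": 3.0},
    "v5": {"v1": 4.0, "v2": 3.0},
    "v3": {"v1": 3.0, "v4": 2.0},
    "v4": {"v3": 2.0, "v2": 2.0},
    "v2": {},
}
paths = possible_paths(g, "v1", "v2", budget=7.5)
```

Both routes need 7 time units, so with a budget of 7.5 each is a possible path with probability 0.5; with a budget of 6 neither is.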
73
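Combine the two probabilities…
[Figure: the possible paths of object a, and its possible locations when t = 4, e.g., between m and p1]
Probability of falling inside segment m p1: 0.5 × 0.5 = 0.25 (at t = 4), i.e., the probability of the path times the probability of being in that part of the path's possible locations
Probability of an object a falling inside some segment S at a given time instant t:
$\Pr(a(t) \in S) = \sum_{P \in PP(t)} \Pr(P) \cdot \Pr\bigl(a(t) \in PL_P^a(t) \cap S\bigr)$, e.g., with a uniform pdf over each set of possible locations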
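Combining the two probabilities can be sketched as follows, assuming each possible-location set is an interval of positions along its path and the location is uniform over that interval, so the inner probability is a length ratio. The inputs are hypothetical and mimic the 0.5 × 0.5 case above.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # positions measured along a path (hypothetical 1D offsets)


def overlap(a: Interval, b: Interval) -> float:
    """Length of the intersection of two intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def prob_in_segment(cases: List[Tuple[float, Interval, Optional[Interval]]]) -> float:
    """Pr(a in S at t) = sum over possible paths P of Pr(P) * Pr(location on P falls into S).

    Each case is (Pr(P), possible-location interval PL on P at time t, part of S on P),
    with None when S does not lie on that path.
    """
    total = 0.0
    for p_path, (lo, hi), seg in cases:
        if seg is None:
            continue  # S is not on this path
        if hi > lo:
            total += p_path * overlap((lo, hi), seg) / (hi - lo)  # uniform location on PL
        elif seg[0] <= lo <= seg[1]:
            total += p_path  # degenerate PL: a single, certain position
    return total


# Two equally likely paths; on the first, S covers half of the possible locations.
p = prob_in_segment([(0.5, (0.0, 2.0), (0.0, 1.0)), (0.5, (3.0, 5.0), None)])
```

The result is 0.5 × 0.5 + 0.5 × 0 = 0.25, matching the slide's example.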
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – the collection of all the possible paths (and their probabilities)
  – the probabilistic location function
Example: observe the two possible sequences of going from position p_i to position p_{i+1}. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
› Data structures
  – Edge hash table
    › small, in-memory
  – Movement R-tree
    › 1D for each edge
  – Trajectories list
    › store samples ordered by timestamp
    › assign a pointer to the set of possible paths
    › earliest-arrival and latest-departure times stored for vertices
    › on disk, retrieved on demand
76
› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search the leaf entries whose maximum time interval intersects t_q
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: road network around q with vertices A to E; annotations: earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query; the expansion tree "bubbling" along road-network segments]
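The network-expansion and filtering steps can be sketched with Dijkstra's algorithm bounded by r. A plain list of (object, edge, time interval) entries stands in for the per-edge movement R-trees, an edge counts as inside the range when its start vertex is closer than r, and the road network, distances and entries are all hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # vertex -> {neighbour: network length}
Edge = Tuple[str, str]


def expansion_edges(g: Graph, q: str, r: float) -> Set[Edge]:
    """Edges (u, v) whose start vertex u has network distance < r from q (Dijkstra)."""
    dist: Dict[str, float] = {q: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, q)]
    edges: Set[Edge] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u] or d >= r:
            continue  # stale heap entry, or vertex outside the query range
        for v, w in g.get(u, {}).items():
            edges.add((u, v))  # the edge starts inside the range
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return edges


def filter_candidates(entries: List[Tuple[str, Edge, Tuple[float, float]]],
                      edges: Set[Edge], tq: Tuple[float, float]) -> Set[str]:
    """Objects with a movement entry on an expanded edge whose time interval meets tq."""
    return {oid for oid, e, (ts, te) in entries
            if e in edges and ts <= tq[1] and tq[0] <= te}


g: Graph = {"q": {"A": 1.0, "B": 4.0}, "A": {"q": 1.0, "C": 2.0}, "B": {"q": 4.0},
            "C": {"A": 2.0, "D": 5.0}, "D": {"C": 5.0}}
edges = expansion_edges(g, "q", r=3.5)
entries = [("o1", ("q", "A"), (0.0, 5.0)), ("o2", ("D", "C"), (0.0, 5.0)),
           ("o3", ("A", "C"), (6.0, 9.0))]
cands = filter_candidates(entries, edges, tq=(2.0, 4.0))
```

Only o1 survives filtering: o2 moves on an edge outside the expansion tree, and o3's time interval misses t_q. The refinement step would then compute qualification probabilities for o1 alone.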
77
Temporal-continuous query
  – Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
  – It is easy to prove that the actual QP is monotonic between consecutive critical time instants
  – The QPs at the critical time instants therefore define an envelope function for the actual QPs
78
Uncertainty – the flip side
› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up the Douglas-Peucker Line-simplification Algorithm", SSHD 1997
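The cited work speeds up Douglas-Peucker line simplification, a map-based way to drop trajectory points while keeping every removed point within an acceptable error ε of the simplified polyline. A minimal recursive version of the classic algorithm (not the speed-up from the cited paper) looks like this; the sample trajectory and ε values are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def douglas_peucker(points: List[Point], epsilon: float) -> List[Point]:
    """Keep the farthest point if it deviates by more than epsilon, then recurse on both halves."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]  # all intermediate points are within the error bound
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # do not duplicate the split point


trajectory = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.0), (3.0, 4.0), (4.0, 0.0), (5.0, 0.1), (6.0, 0.0)]
simplified = douglas_peucker(trajectory, epsilon=0.5)
```

With ε = 0.5 the two small wiggles are dropped and the spike at (3, 4) is kept, leaving five of the seven points; the dropped points become bounded location uncertainty (at most ε), which is exactly the flip side the slide refers to.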
79
Uncertainty – the other flip side
› What are the important properties that must not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after the reduction)
› What is the impact/role of time in the picture?
  – It is not just another "Z" axis… one cannot travel back in time
  – How long may the data at the sink remain "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data
  – Probabilistic results: yes
  – Time dimension: no
vs.
Geometric models for uncertain spatio-temporal data
  – Probabilistic results: no
  – Time dimension: yes
82
Merge the approaches
› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE 16(9): 1112-1127, 2004
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443
83
A highway example…
[Figure: position (pos) of a car on a highway over time (t), and a query region Q]
What is the probability that the car is in some area at least once during some time interval?
84
A highway example…
[Figure: as before, with the possible positions bounded by the minimum and maximum speeds vmin and vmax]
85
A highway example…
[Figure: the region between vmin and vmax, and the query region Q]
Assuming a uniform distribution…
86
A highway example…
[Figure: the car is inside Q with probability 50% at one time instant and 30% at another]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
87
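A highway example…
[Figure: as before]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
Violation of the speed constraints: treating the two time instants independently combines positions that the car cannot reach from each other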
88
A highway example…
[Figure: as before]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
Consideration of dependency: = 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
J. L. Doob (1953). Stochastic Processes. Wiley.
91
A simple example
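› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: an inner hole keeps the ball with probability 0.2 and passes it to each neighbour with probability 0.4; a border hole keeps it with probability 0.4 and passes it inward with probability 0.6]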
92
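A simple example
› Initial position: the third hole (probability 1)
› After first hit: 0.4 / 0.2 / 0.4 in holes 2 to 4
› After second hit: 0.16 / 0.16 / 0.36 / 0.16 / 0.16 in holes 1 to 5
› After 40th hit: approximately 0.08 in each border hole and 0.12 in each of the seven inner holes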
93
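How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$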
94
How can we model this?
› First hit: (0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)
› Second hit: (0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)
› 40th hit: (0 0 1 0 0 0 0 0 0) · M^40 ≈ (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)
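The matrix computations of this slide can be reproduced with plain Python lists, one row-vector-times-M multiplication per hit. The zero-based bucket indexing and the helper names are assumptions of this sketch.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def board_matrix(n: int = 9) -> Matrix:
    """Transition matrix M of the board example (rows: from bucket, columns: to bucket)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4  # inner hole
    m[0][0], m[0][1] = 0.4, 0.6  # left border
    m[n - 1][n - 1], m[n - 1][n - 2] = 0.4, 0.6  # right border
    return m


def step(x: Vector, m: Matrix) -> Vector:
    """One hit: row vector times matrix, x' = x * M."""
    n = len(x)
    return [sum(x[i] * m[i][j] for i in range(n)) for j in range(n)]


def after_hits(x: Vector, m: Matrix, hits: int) -> Vector:
    for _ in range(hits):
        x = step(x, m)
    return x


M = board_matrix()
x0 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # the ball starts in the third hole
x2 = after_hits(x0, M, 2)
```

`x2` reproduces the second-hit vector (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0). Iterating far beyond 40 hits converges to (0.08, 0.12, …, 0.12, 0.08), the stationary distribution that the rounded 40th-hit vector on the slide approximates.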
95
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$ (rows: from state, columns: to state)
[Figure: state graph over s1, s2, s3 with the transition probabilities of M]
Note: there is an exponential number of possible paths the car might take.
ST - Window Queries
› The car is in s1 at t = 0: x(0) = (1, 0, 0); same M and question as on the previous slide
[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
98
99
ST - Window Queries
› t = 1: x(1) = x(0) · M = (0, 0, 1)
100
ST - Window Queries
› t = 2: x(2) = x(1) · M = (0, 0.8, 0.2)
101
ST - Window Queries
› t = 3: x(3) = x(2) · M = (0.48, 0.16, 0.36)
102
ST - Window Queries
› Worlds that are in s1 or s2 at t = 2 (probability 0.8) are already results; simply adding the s1/s2 probabilities at t = 2 and t = 3 would count worlds that are in the window at both times twice
› Only the remaining mass (0, 0, 0.2) is propagated to t = 3: (0, 0, 0.2) · M = (0, 0.16, 0.04)
› Result = 0.8 + 0.16 = 0.96
› A solution based on matrix multiplications introduces a new state for the "winner" trajectories and two matrices
103
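ST - Window Queries
$M^- = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0.6 & 0 & 0.4 & 0\\ 0 & 0.8 & 0.2 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}$, $M^+ = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0 & 0 & 0.4 & 0.6\\ 0 & 0 & 0.2 & 0.8\\ 0 & 0 & 0 & 1 \end{pmatrix}$
States s1, s2, s3 plus the new absorbing state s4 for trajectories that have already hit the window. M− is used for steps into time instants outside T; M+, which redirects every transition into s1 or s2 to s4, is used for steps into time instants inside T:
(1, 0, 0, 0) · M− = (0, 0, 1, 0); (0, 0, 1, 0) · M+ = (0, 0, 0.2, 0.8); (0, 0, 0.2, 0.8) · M+ = (0, 0, 0.04, 0.96)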
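The M−/M+ construction can be sketched as follows. States s1, s2, s3 map to indices 0, 1, 2, and the absorbing "hit" state s4 is appended as the last index; M+ is applied to every step that enters a time instant inside T and M− to all other steps. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]

# Transition matrix of the example (rows: from, columns: to); s1, s2, s3 -> indices 0, 1, 2.
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def split_matrices(m: Matrix, window: Set[int]) -> Tuple[Matrix, Matrix]:
    """Return (M-, M+), both extended by an absorbing 'hit' state as the last index."""
    n = len(m)
    absorbing = [0.0] * n + [1.0]
    m_minus = [row[:] + [0.0] for row in m] + [absorbing]
    m_plus = []
    for row in m:
        new_row = [0.0 if j in window else p for j, p in enumerate(row)]
        new_row.append(sum(p for j, p in enumerate(row) if j in window))  # redirect to 'hit'
        m_plus.append(new_row)
    m_plus.append(absorbing[:])
    return m_minus, m_plus


def window_probability(m: Matrix, start: int, window: Set[int], times: Set[int]) -> float:
    """P(the object is in a window state at some t in `times`), starting in `start` at t = 0."""
    if 0 in times and start in window:
        return 1.0
    m_minus, m_plus = split_matrices(m, window)
    size = len(m) + 1
    x = [0.0] * size
    x[start] = 1.0
    for t in range(1, max(times) + 1):
        mat = m_plus if t in times else m_minus  # the time instant being entered decides
        x = [sum(x[i] * mat[i][j] for i in range(size)) for j in range(size)]
    return x[-1]


p = window_probability(M, start=0, window={0, 1}, times={2, 3})
```

`p` comes out as 0.96 (up to floating-point rounding), the result of the slide, and the whole computation is a chain of (sparse) matrix multiplications rather than an enumeration of paths.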
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with a single observation at t0, and with two observations at t0 and t1]
105
Multiple Observations
› We need to track where the true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class S_i− corresponding to worlds where o is located in state s_i and o has not intersected the window
  – One class S_i+ corresponding to worlds where o is located in state s_i and o has intersected the window
$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 over t = 0, 1, 2, 3]
106
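Multiple Observations
Classes in the order (S1−, S2−, S3−, S1+, S2+, S3+), where "−" marks worlds that have not hit the window (¬∎) and "+" marks worlds that have (∎):
x(0) = (1, 0, 0, 0, 0, 0), x(1) = (0, 0, 1, 0, 0, 0), x(2) = (0, 0, 0.2, 0, 0.8, 0), x(3) = (0, 0, 0.04, 0.48, 0.16, 0.32)
$M^- = \begin{pmatrix} M & 0\\ 0 & M \end{pmatrix}$
$$M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0.4 & 0.6 & 0 & 0\\
0 & 0 & 0.2 & 0 & 0.8 & 0\\
0 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0.6 & 0 & 0.4\\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$
with $M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$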
107
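Bayes' Theorem (∎ = the trajectory hits the query window)
$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare) \cdot P(\blacksquare)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
With x(3) = (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1−, S2−, S3−, S1+, S2+, S3+): 0.32 / (0.32 + 0.04) ≈ 0.89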
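The 2|S|-class bookkeeping and the final Bayes step can be sketched in the same style. Classes are ordered (S1−, S2−, S3−, S1+, S2+, S3+), and conditioning on the second observation (object seen in s3 at t = 3) divides the hit mass in s3 by all the mass in s3. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]  # s1, s2, s3 -> indices 0, 1, 2


def class_matrices(m: Matrix, window: Set[int]) -> Tuple[Matrix, Matrix]:
    """M- and M+ over the 2|S| classes ordered (S_1-, ..., S_n-, S_1+, ..., S_n+)."""
    n = len(m)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = m[i][j]
            m_minus[i][j] = p  # outside T: a "-" world stays "-"
            m_minus[n + i][n + j] = p  # ... and a "+" world stays "+"
            m_plus[n + i][n + j] = p  # inside T: once hit, always hit
            if j in window:
                m_plus[i][n + j] = p  # entering the window during T makes it a hit
            else:
                m_plus[i][j] = p
    return m_minus, m_plus


def propagate(x: List[float], mats: List[Matrix]) -> List[float]:
    """Apply one row-vector-times-matrix step per matrix in `mats`."""
    for mat in mats:
        x = [sum(x[i] * mat[i][j] for i in range(len(x))) for j in range(len(x))]
    return x


m_minus, m_plus = class_matrices(M, window={0, 1})
# Observed in s1 at t = 0; T = [2, 3], so the steps into t = 1, 2, 3 use M-, M+, M+.
x3 = propagate([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], [m_minus, m_plus, m_plus])
# Second observation: the object is seen in s3 (index 2) at t = 3. Bayes' theorem:
p_hit_given_s3 = x3[3 + 2] / (x3[2] + x3[3 + 2])
```

`x3` matches the slide's (0, 0, 0.04, 0.48, 0.16, 0.32), and `p_hit_given_s3` is 0.32 / 0.36 = 8/9 ≈ 0.89.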
108
Summary
› Pros
  – Allows answering time-parametrized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: index entries P_oid in (time × location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate the result probability as the ratio between the number of samples that satisfy the query and the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
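The slide only names the idea of adapting the transition matrices. One standard way to do this, offered here as an assumption and not necessarily the cited paper's exact construction, is backward conditioning: with β_t(i) = P(observation at t_end | state s_i at t), the adapted step t → t + 1 uses M'_t(i, j) = M(i, j)·β_{t+1}(j) / β_t(i), and forward sampling with these matrices draws trajectories exactly from the distribution conditioned on the observations. The example reuses the three-state chain from the window-query slides; function names are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def adapted_matrices(m: Matrix, target: int, steps: int) -> List[Matrix]:
    """Per-step transition matrices conditioned on being in `target` after `steps` steps."""
    n = len(m)
    beta = [1.0 if i == target else 0.0 for i in range(n)]  # beta at t = steps
    mats: List[Matrix] = []
    for _ in range(steps):
        prev = [sum(m[i][j] * beta[j] for j in range(n)) for i in range(n)]  # one step earlier
        mats.append([[m[i][j] * beta[j] / prev[i] if prev[i] > 0 else 0.0 for j in range(n)]
                     for i in range(n)])
        beta = prev
    mats.reverse()  # mats[t] is used for the transition from t to t + 1
    return mats


def sample_path(mats: List[Matrix], start: int, rng: random.Random) -> List[int]:
    """Forward-sample one trajectory with the adapted matrices."""
    path = [start]
    for mat in mats:
        path.append(rng.choices(range(len(mat)), weights=mat[path[-1]], k=1)[0])
    return path


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
mats = adapted_matrices(M, target=2, steps=3)  # observed in s1 at t = 0 and in s3 at t = 3
rng = random.Random(7)
paths = [sample_path(mats, 0, rng) for _ in range(20000)]
# Window query on the sampled worlds: in s1 or s2 at t = 2 or t = 3.
estimate = sum(any(p[t] in (0, 1) for t in (2, 3)) for p in paths) / len(paths)
```

Every sampled trajectory ends in s3, and the window-query estimate should be close to 8/9 ≈ 0.89, the exact value from the Bayes computation on the multiple-observation slides.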
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
46
The Paradigm of Equivalent Worlds
47
Approximated Results Sampling
Materialize a set S of possible worlds
Samples drawn independent and unbiased
Evaluated the query predicate on each world
Distribution of sampled results is an unbiased approximation of the true distribution of results
48
q
08 02
04
H
C A
B
H3
Sampling Example
Drawing 100 possible worlds may yield the followingestimators
Compare to the exact probabilities
No indication of reliability or confidence of estimations
49
q
08 02
04
H
C A
B
H3
Sampling Confidences
Drawing 100 possible worlds may yield the followingestimators
Use statistical methods to assess the quality of estimators
Eg Wald-Test
Where is the percentile of the standard normal distribution
At a significance level of the true probability is in the interval [0442 0638]
True probability
50
Uncertain Spatial Data Management Summary
Motivation Flood of geo-spatial data Enriched with additional contexts (text social multimedia) Inherent uncertainty
Data Cleaning ldquoBest guessrdquo answers Unreliable results Biased results
Paradigm of Equivalent Worlds Efficient solution for the most prominent types of spatial queries Example Generating Functions
Approximations Monte-Carlo sampling Probabilistic guarantees
51
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
52
bull Modelsbull Motionbull Location Uncertainty
bull Queriesbull Part 1 NN (free-space motion)
bull Semantics and Processing
bull Part 2 Range (road-networks constraints)bull Semantics and Processing
bull Uncertainty ndash the flip-side
53
Model of a trajectory
Y
X
Time
Present time
2d-ROUTE
3d-TRAJECTORY
Approximation does not capture accelerationdeceleration
Point Objects NOTE the model needs to be augmented for objects with extent eg hurricanes
Bart Kuijpers Harvey J Miller Walied Othman Kinetic space-time prisms GIS 2011
54
Mobility Models
sequence of (locationtime)updates (eg GPS-based)
Electronic maps + traffic distributionpatterns + set of (to be visited) point=gt full trajectory
(locationtime Velocity Vector )updates
now -gt
55
Modeling UncertaintyLocation Uncertainty Models
Various Sourcesrsquo Imprecision(GPS Sensors)
Quest for data reduction
Ralph Lange Harald Weinschrott Lars Geiger Andreacute Blessing Frank Duumlrr Kurt Rothermel Hinrich Schuumltze On a Generic Uncertainty Model for Position Information QuaCon 2009
56
Modeling Uncertain Trajectories
bull Model 1Cones Beads Necklaces (AKA space-time prisms)
bull Assumed exact location samplesbull Unknown (but bounded) maximum speed in-betweensamples
Reynold Cheng Sunil Prabhakar Dmitri V Kalashnikov Querying Imprecise Data in Moving Object Environments ICDE 2003G Trajcevski A Choudhary O Wolfson G Li Uncertan Range Queries for Necklases MDM 2010
57
Modeling Uncertain TrajectoriesModel 2
Constraints
Motion along road network
Uncertainty restricted to Road Segments
p1 p2
q
t2t1
t
Bart Kuijpers Walied Othman Modeling uncertainty of moving objects on road networks via space-time prisms International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain TrajectoriesModel 3Sheared Cylinders
Constant bound on location-error at anytime-instant
G Trajcevski O Wolfson K Hinrichs S Chamberlain Managing Moving Objects Databases with Ucertainty ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly considerSyntax- The impact of uncertainty on the (structure of the) answer- The impact of constraints
Processing Algorithms- Impact of the motion+uncertainty coupling on filteringprunningrefinement
eg possible routes to be taken
6060
Example inside range (disk) around ldquoQrdquo between t1 and t2
1 What is the probability of a path taken
2 What is the earliestlatest arrival at a given vertex (consequently along edge)
Kai Zheng Goce Trajcevski Xiaofang Zhou Peter Scheuermann Probabilistic range queries for uncertain trajectories on road networks EDBT 2011
61
Continuous NN for trajectories
Q_nn(q) Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tbte]
A-nn(q) [(Tri1 [tbt1]) (Tri2[t1t2]) hellip (Trim[tm-1te])]
T = tb
T = te
X
Y
T = t1
Trq Tr1 Tr2Tr3
Given a collection of moving objects trajectories Tr1 Tr2 hellip TrN
Example assuming that Trq is the querying trajectory between [tbte] then- Tr1 (black) is the NN throughout [tbt1]- Tr2 (red) is the NN throughout [t1te]
Observation The answer to the query is time-paremeterized
Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002
62
Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant
UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]
[Figure: uncertain trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) space over time, with the instants T = tb, T = tb1, T = t1, T = t11, T = te marked]
Observations: adding uncertainty to the previous example implies:
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree]
[TrQ, tb, te]
  Tri1, tb, t1, D1 | Tri2, t1, t2, D2 | ... | Trim, t(m-1), te, Dm
    Tri11, tb, t11, D11 | Tri12, t11, t12, D12 | ... | Tri1k1, t1(k-1), t11, D1k1 | ... | Trik1, t(k-1), t21, Dm1 | ... | Trim1, t(m1-1), te, Dmkm
      Tri121, t11, t121, D121 | ... | Tri12l, t12(l-1), t12, D12k12
Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]
Children: (with the parent excluded) the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: crisp query point Q, the uncertainty disks of Tr1 to Tr5, and the radii Rmin, Rmax and Rd around Q; the minimum distance of Tr4 exceeds Rmax]
Trq = 2D point Q; the possible locations of the other objects are bounded by circles
Observation: any object whose minimum distance from Q is > Rmax (the smallest maximum distance from Q over all objects) has a "0" probability of being the NN of Q. Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \iint_{\lVert (x, y) - Q \rVert \le R_d} pdf_i(x, y)\,dx\,dy$$
The probability that a given object Trj is the NN of Q is given by
$$P^{NN}_j = \int_{R_{min}}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{k \ne j} \bigl(1 - P_k(R_d)\bigr)\,dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the others are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax...)
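Below is a numerical sketch of the two formulas above. It covers the special case (our assumption) where possible locations are uniform on disks. In that case P_i(Rd) is the area of a disk-disk intersection divided by the disk area. The integral is approximated on a fixed grid of Rd values, so the grid size trades accuracy for speed. `nn_probabilities` and its helpers are hypothetical names.

```python
import math


def lens_area(d, big_r, r):
    """Area of the intersection of two disks with radii big_r and r whose centres are d apart."""
    if d >= big_r + r:
        return 0.0
    if d <= abs(big_r - r):
        return math.pi * min(big_r, r) ** 2
    c1 = max(-1.0, min(1.0, (d * d + big_r * big_r - r * r) / (2 * d * big_r)))
    c2 = max(-1.0, min(1.0, (d * d + r * r - big_r * big_r) / (2 * d * r)))
    k = 0.5 * math.sqrt(max(0.0, (-d + big_r + r) * (d + big_r - r)
                             * (d - big_r + r) * (d + big_r + r)))
    return big_r * big_r * math.acos(c1) + r * r * math.acos(c2) - k


def within_prob(q, center, r, rd):
    """P_i(Rd): probability that a location uniform on the disk (center, r) is within rd of q."""
    return lens_area(math.dist(q, center), rd, r) / (math.pi * r * r)


def nn_probabilities(q, disks, steps=2000):
    """Approximate P(object j is the NN of the crisp point q) for each disk j = (center, r)."""
    hi = max(math.dist(q, c) + r for c, r in disks)  # beyond this every P_i equals 1
    grid = [hi * m / steps for m in range(steps + 1)]
    cdf = [[within_prob(q, c, r, rd) for rd in grid] for c, r in disks]
    probs = []
    for j in range(len(disks)):
        total = 0.0
        for m in range(steps):
            mass_j = cdf[j][m + 1] - cdf[j][m]  # P(dist_j falls in [R_m, R_m+1])
            others = 1.0
            for k in range(len(disks)):
                if k != j:
                    # all other objects are farther away (CDF averaged over the step)
                    others *= 1.0 - 0.5 * (cdf[k][m] + cdf[k][m + 1])
            total += mass_j * others
        probs.append(total)
    return probs
```

Take Q at the origin and unit disks centred at (3, 0), (-3, 0) and (10, 0). The first two objects each get 0.5. The third gets 0, because its minimum distance 9 exceeds Rmax = 4.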
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable... Example:
[Figure: uncertain query TrQ and the uncertainty disks of Tr1 to Tr5, with radii Rmin and Rmax; Z1 and Z2 are possible locations]
dist(Z1, Z2) < Rmax
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)...
66
[Figure: uniform pdfs of Trq, Tr1 and Tr2 over the (x, y) plane; each is supported on a disk of diameter 2r with height 1/(πr²)]
Example: assuming a uniform pdf of possible locations, every pair of points (one from each disk) has to be taken into account when evaluating the actual probability of Tr1 being within distance Rd from Trq
Main Observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$
Define Viq = Vi - Vq (cross-correlation) as another random variable: Viq = Vi + (-Vq), which, due to independence, implies that the pdf of Viq is the convolution of the pdfs of Vi and -Vq
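Here is a Monte Carlo sketch of the main observation. Sampling V_i and V_q independently and taking their difference produces a draw from the convolution pdf of V_iq. That lets us estimate P(|V_i - V_q| <= Rd) without evaluating the quadruple integral. The slides treat the convolution analytically, so the sampling here only serves as a check. Uniform disks and the names below are assumptions for illustration.

```python
import math
import random


def sample_disk(center, r, rng):
    """Uniform sample from a disk (the square root makes the density uniform in area)."""
    rho = r * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return (center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi))


def prob_within(ci, ri, cq, rq, rd, n=20000, seed=0):
    """Monte Carlo estimate of P(|V_i - V_q| <= rd) for independent uniform disks.

    Each sample of V_i - V_q is a draw from the convolution of pdf(V_i) and pdf(-V_q);
    its support is the disk of radius ri + rq around C_i - C_q.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        xi, yi = sample_disk(ci, ri, rng)
        xq, yq = sample_disk(cq, rq, rng)
        if math.hypot(xi - xq, yi - yq) <= rd:
            hits += 1
    return hits / n
```

With C_i = (5, 0), C_q = (0, 0) and r = 1 for both objects, the estimate is 0 for Rd < 3 and 1 for Rd > 7. This matches a support of radius 2r around C_i - C_q.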
67
[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) over the (x, y) plane; each is supported on a disk of diameter 4r]
Let Viq = Vi + (-Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi), AND pdf(Vq) has a centroid Cq coinciding with E(Vq),
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids, with respect to the vertical axis (the value of the pdf's). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi - vxq and Vyiq = vyi - vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin, as a function of t, is $d_{iq}(t) = \sqrt{A t^2 + B t + C}$, where B and C depend on the initial relative position and $A = V_{xiq}^2 + V_{yiq}^2$
(a hyperbola, since A > 0)
Now we have a collection of such distance functions (for each i)
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), ..., dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time points
Example:
[Figure: lower envelope LE12 of the two distance trajectories TR1 and TR2, with time on the horizontal axis and dist on the vertical axis, between tb, t11 and t12; each crossing of TR1 and TR2 is a critical time point]
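The slide's formula for the distance did not survive extraction, so the following worked form is a reconstruction. It assumes the expected location along TRiq moves linearly, starting at $(x_{iq}^0, y_{iq}^0)$ at $t = 0$ with velocity $(V_{xiq}, V_{yiq})$. It shows why the curve is a hyperbola and why two distance functions cross at most twice:

$$d_{iq}(t)^2 = \left(x_{iq}^0 + V_{xiq}\,t\right)^2 + \left(y_{iq}^0 + V_{yiq}\,t\right)^2 = A\,t^2 + B\,t + C$$
$$A = V_{xiq}^2 + V_{yiq}^2, \qquad B = 2\left(x_{iq}^0 V_{xiq} + y_{iq}^0 V_{yiq}\right), \qquad C = \left(x_{iq}^0\right)^2 + \left(y_{iq}^0\right)^2$$

Completing the square for $A > 0$ gives

$$d_{iq}(t)^2 - A\left(t + \frac{B}{2A}\right)^2 = C - \frac{B^2}{4A} \ge 0$$

The right-hand side is non-negative by the Cauchy-Schwarz inequality ($B^2 \le 4AC$). So in the $(t, d)$ plane, the curve $d = d_{iq}(t)$ is one branch of a hyperbola, degenerating to two half-lines when the expected locations meet. For two objects $i$ and $j$ with coefficients $(A_i, B_i, C_i)$ and $(A_j, B_j, C_j)$:

$$d_{iq}(t) = d_{jq}(t) \iff (A_i - A_j)\,t^2 + (B_i - B_j)\,t + (C_i - C_j) = 0$$

Both distances are non-negative, so comparing their squares loses no solutions. Unless all three coefficient differences vanish (identical functions), this quadratic has at most two real roots. That gives at most two critical time points per pair.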
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
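[Figure: the lower envelopes LE12 (of TR1 and TR2, critical points t11, t12) and LE34 (of TR3 and TR4, critical point t31) over [tb, te] are merged into LE1234, which introduces the new critical points t1new and t2new]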
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
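Here is a Python sketch of the divide-and-conquer construction. It assumes each distance function is given by the coefficients (A, B, C) of d² = At² + Bt + C. An envelope is a list of (id, t_start, t_end) pieces. Merging two envelopes cuts the time axis at both envelopes' breakpoints, plus the critical time points of the two functions active in each slice. This is an illustration with hypothetical names. It is not tuned for the stated O(N log N) bound and does not handle numerical ties carefully.

```python
import math


def dist(f, t):
    """d(t) for squared-distance coefficients f = (A, B, C): d(t)^2 = A t^2 + B t + C."""
    a, b, c = f
    return math.sqrt(max(a * t * t + b * t + c, 0.0))


def crossings(f, g, lo, hi):
    """Critical time points in (lo, hi) where d_f(t) = d_g(t); at most two."""
    a, b, c = (f[k] - g[k] for k in range(3))
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
    return sorted({t for t in roots if lo < t < hi})


def merge(env1, env2, funcs):
    """Merge two lower envelopes, each a list of (id, t_start, t_end) covering the same interval."""
    cuts = sorted({t for piece in env1 + env2 for t in piece[1:]})
    out, p1, p2 = [], 0, 0
    for s, e in zip(cuts, cuts[1:]):
        while env1[p1][2] <= s:
            p1 += 1
        while env2[p2][2] <= s:
            p2 += 1
        i, j = env1[p1][0], env2[p2][0]  # the two functions active on (s, e)
        pts = [s] + crossings(funcs[i], funcs[j], s, e) + [e]
        for a, b in zip(pts, pts[1:]):
            mid = 0.5 * (a + b)
            w = i if dist(funcs[i], mid) <= dist(funcs[j], mid) else j
            if out and out[-1][0] == w and out[-1][2] == a:
                out[-1] = (w, out[-1][1], b)  # extend the current piece
            else:
                out.append((w, a, b))
    return out


def lower_envelope(ids, funcs, tb, te):
    """Divide and conquer: split the set, build both envelopes recursively, merge them."""
    if len(ids) == 1:
        return [(ids[0], tb, te)]
    half = len(ids) // 2
    return merge(lower_envelope(ids[:half], funcs, tb, te),
                 lower_envelope(ids[half:], funcs, tb, te), funcs)
```

For d² = t², d² = (t - 4)² and d² = 1 on [0, 6], the envelope is object 0 on [0, 1], object 2 on [1, 3], object 1 on [3, 5] and object 2 on [5, 6].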
Now, how about the IPAC-NN tree?
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk; each difference Vi - Vq is confined to a disk of radius 2r) cannot have a non-zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te] with critical time points t1, t2, t3; the lower envelope, a band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
72
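Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e.,
$$PP_i(a) = \{P \mid P \text{ connects } p_i \text{ and } p_{i+1} \wedge \mathrm{mincost}(P) \le t_{i+1} - t_i\}$$
Example: if the pdf of selecting a possible path is uniform, each path in PPi(a) is selected with probability 1/|PPi(a)|
Given a path Pj ∈ PPi(a), the Possible Locations of a moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all positions p ∈ Pj that a can reach from pi within the time period t - ti and from which a can reach pi+1 within ti+1 - t, i.e.,
$$PL_j(a, t) = \{p \in P_j \mid \mathrm{mincost}(p_i, p) \le t - t_i \wedge \mathrm{mincost}(p, p_{i+1}) \le t_{i+1} - t\}$$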
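This sketch computes PPi(a) with the uniform path pdf from the example. For simplicity it assumes that both samples lie on vertices, and that each edge is annotated with its minimum traversal time (e.g., length divided by the speed limit). The graph format and function names are hypothetical. A depth-first search prunes every prefix whose minimum time already exceeds ti+1 - ti.

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths from src to dst whose minimum travel time is <= budget.

    graph[u][v] is the minimum time needed to traverse edge (u, v),
    e.g. edge length / maximum speed.
    """
    paths = []

    def dfs(node, path, cost):
        if node == dst:
            paths.append(list(path))
            return
        for nxt, w in graph.get(node, {}).items():
            if nxt not in path and cost + w <= budget:  # prune prefixes that are already too slow
                path.append(nxt)
                dfs(nxt, path, cost + w)
                path.pop()

    dfs(src, [src], 0.0)
    return paths


def uniform_path_pdf(paths):
    """Uniform probability of selecting each possible path."""
    return {tuple(p): 1.0 / len(paths) for p in paths}
```

Consider a hypothetical graph where v1-v5-v4 needs 5 time units, v1-v3-v4 needs 4 and v1-v2-v4 needs 8. A budget of 6 yields two possible paths, each with probability 0.5.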
73
Combine the two probabilities...
[Figure: possible paths, and possible locations of object a when t = 4]
Probability of falling inside segment [m, p1]: 0.5 × 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{P_j \in PP_i(a)} \Pr(P_j) \cdot \Pr\bigl(a(t) \in PL_j(a, t) \cap S \mid P_j\bigr)$$
(e.g., with a uniform pdf)
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need:
  - The collection of all the possible paths (and their probabilities)
  - The Probabilistic Location Function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
  - Edge hash-table
    › Small, in-memory
  - Movement R-tree
    › 1D for each edge
  - Trajectories list
    › Store samples ordered by time-stamp
    › Assign a pointer to the set of possible paths
    › Earliest-arriving and latest-departure times stored for vertices
    › On-disk, retrieved on demand
76
› Network expansion
  - Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
  - Fetch the movement R-trees of the edges in the expansion tree
  - Search the leaf entries whose maximum time interval intersects tq
  - The objects pointed to by those leaf entries are candidates
› Refinement
  - The candidate set is considerably smaller than the original trajectory dataset
  - Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" from q along road-network segments (vertices A to E), annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
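The network-expansion step can be sketched as a Dijkstra search from q that stops at network distance r. The sketch assumes q is a vertex, and it returns the set of touched edges rather than an explicit tree. The adjacency-dictionary format and the names are hypothetical. The filtering step would then look up the movement R-trees of exactly these edges.

```python
import heapq
import math


def expansion_tree(graph, q, r):
    """Dijkstra-style network expansion from vertex q up to network distance r.

    Returns (dist, edges): the network distance of every vertex reached within r,
    and the undirected edges touched by the expansion (their start vertex is closer
    than r, so at least part of each edge lies within range).
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            edges.add(tuple(sorted((u, v))))
            nd = d + w
            if nd < r and nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist, edges
```

On a path a-b-c-d with edge weights 2 and r = 5, the expansion reaches a, b and c. It also touches the edge c-d, because c is only 4 away from q = a.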
77
Temporal-continuous query
  - Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
  - It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
  - The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty: the flip-side
› Data reduction
  - Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
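For the data-reduction side, here is a plain recursive Douglas-Peucker simplification. It is the classic version with quadratic worst-case time, not the faster variant of Hershberger and Snoeyink. The error is the distance to the segment between the two endpoints, and timestamps are ignored. A trajectory-specific variant would instead measure the error at synchronized time instants (an alternative we mention but do not implement). The tolerance eps plays the role of the "acceptable error".

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, eps):
    """Keep a subset of points such that every dropped point is within eps
    of the simplified polyline."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [a, b]  # every intermediate point is within tolerance
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right  # the split point appears in both halves; keep it once
```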
79
Uncertainty: the other flip-side
› What are the important properties that must not be distorted by the reduction?
  - Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  - Isn't it just the "Z" axis? ... but one cannot travel back in time
  - How long can the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far...
› Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
› Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
82
Merge the approaches
› Idea
  - Discretize time
  - For each time, a pdf (pmf) over the possible positions is given
› Examples
  - Nearest neighbour queries on uncertain moving objects
  - Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443
83
A highway example...
[Figure: position (pos) of a car over time (t) with a query window Q; the possible positions are bounded by the speeds vmin and vmax]
› What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution, the figure gives probabilities of 50% and 30% of being inside Q at two time instants
› Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
  - Violation of the speed constraints
› Consideration of the dependency: = 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings:
  - Discrete time + discrete space (e.g., Markov chain)
  - Discrete time + continuous space (e.g., Harris chain)
  - Continuous time + discrete space (e.g., Markov process)
  - Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley
91
A simple example
› Whenever the wooden board is hit, the ball stays in its hole or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: an inner hole (stay 0.2, left 0.4, right 0.4) and a border hole (stay 0.4, move inward 0.6)]
92
A simple example
› Initial position: (0, 0, 1, 0, 0, 0, 0, 0, 0)
› After the first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After the second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After the 40th hit: ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
93
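How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket; columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$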
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
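The hits above are repeated vector-matrix products. This pure-Python sketch rebuilds M from its description: inner buckets stay with 0.2 and move 0.4 to each side, border buckets stay with 0.4 and move inward with 0.6. It then propagates the initial distribution. The helper names are hypothetical.

```python
def galton_matrix(n=9):
    """Transition matrix of the example: m[i][j] = P(ball moves from bucket i to bucket j)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[i][0], m[i][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][i - 1], m[i][i] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(dist, m):
    """One hit: multiply the row vector dist by the matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


def after_hits(start, m, hits):
    """Distribution over the buckets after the given number of hits."""
    dist = list(start)
    for _ in range(hits):
        dist = step(dist, m)
    return dist
```

`after_hits(start, M, 2)` returns (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0) up to floating-point rounding. After many hits, the distribution approaches the stationary distribution (0.08, 0.12, ..., 0.12, 0.08).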
95
Fusion of Model and Reality
› Discretization of time and space
  - E.g., treat intersections as states and add additional states on long streets
  - The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  - Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  - The transition matrix can change over time and for different object groups
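The slides do not say which estimator is used for the transition probabilities. A common choice, sketched here as an assumption, is the maximum-likelihood estimate obtained by counting observed transitions between consecutive ticks. Storing only the observed transitions keeps the matrix sparse. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Maximum-likelihood estimate of a (sparse) transition matrix from historical
    state sequences, one state per tick: P(a -> b) = #(a -> b) / #(a -> anything).
    Returns {from_state: {to_state: probability}}; unseen transitions are absent.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()} for a, c in counts.items()}
```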
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2, s3 labelled with the transition probabilities]
Note: we have an exponential number of possible paths the car might take
ST - Window Queries
› Given the states and transition probabilities above, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
[Figure: the state graph of s1, s2, s3 unrolled over t = 0, 1, 2, 3]
› t = 0: the car is in s1, i.e., (1, 0, 0)
› t = 1: (1, 0, 0) · M = (0, 0, 1)
› t = 2: (0, 0, 1) · M = (0, 0.8, 0.2); the worlds in s2 (probability 0.8) hit the window
› t = 3: plain propagation would give (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36), but the worlds that already hit the window must not be counted again; only the 0.2 of probability mass remaining in s3 is propagated, giving (0, 0.16, 0.04)
Result = 0.8 + 0.16 = 0.96
› Solution based on matrix multiplications: introduce a new state for the winner trajectories, and two matrices
103
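ST - Window Queries
States s1, s2, s3, plus the absorbing "winner" state s4:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$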
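This sketches the matrix-based solution. An absorbing "winner" state is appended to the chain. M- is used for transitions that land outside the query time interval. M+ redirects every transition into s1 or s2 to the winner state and is used for transitions that land inside it. The function names and the list-of-lists representation are hypothetical. States are 0-indexed, so s1 and s2 are {0, 1}.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def window_matrices(m, window):
    """Extend m (n states) with an absorbing 'hit' state n.

    m_minus keeps the original transitions (used outside the query time interval);
    m_plus redirects every transition into a window state to the hit state
    (used for transitions that land inside the query time interval).
    """
    n = len(m)
    absorbing = [0.0] * n + [1.0]
    m_minus = [list(row) + [0.0] for row in m] + [absorbing]
    m_plus = [[0.0 if j in window else p for j, p in enumerate(row)]
              + [sum(p for j, p in enumerate(row) if j in window)] for row in m]
    m_plus.append(absorbing)
    return m_minus, m_plus


def window_probability(m, start, window, t_from, t_to):
    """P(the object is in a window state at least once during [t_from, t_to])."""
    m_minus, m_plus = window_matrices(m, window)
    hit0 = sum(start[j] for j in window) if t_from == 0 else 0.0
    v = [0.0 if (t_from == 0 and j in window) else p for j, p in enumerate(start)] + [hit0]
    for t in range(1, t_to + 1):  # transition from t - 1 to t
        v = vec_mat(v, m_plus if t >= t_from else m_minus)
    return v[-1]
```

`window_probability(M, [1, 0, 0], {0, 1}, 2, 3)` reproduces the result 0.96, up to floating-point rounding.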
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 (left) and two observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where the true hit worlds are located
  - 2|S| classes of equivalent worlds
  - One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
  - One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
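Multiple Observations
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, split into the classes S1-, S2-, S3- (window not hit, not ∎) and S1+, S2+, S3+ (window hit, ∎)]
Classes in the order (S1-, S2-, S3-, S1+, S2+, S3+):
$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0, 0, 0) \xrightarrow{M^+} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^+} (0, 0, 0.04, 0.48, 0.16, 0.32)$$
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$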
107
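Bayes' Theorem (∎ = the trajectory passes the query window; obs = the observation of the object)
$$P(\blacksquare \mid obs) = \frac{P(obs \mid \blacksquare)\,P(\blacksquare)}{P(obs)} = \frac{P(\blacksquare \wedge obs)}{P(obs \wedge \blacksquare) + P(obs \wedge \neg\blacksquare)}$$
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3?
With the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+):
$$P(\blacksquare \mid obs) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$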
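This extends the same idea to the 2|S| classes. The sketch assumes that the object is outside the window at t = 0, and that the second observation is a single state at a single time. After propagating with M- and M+, Bayes' theorem reduces to normalizing the two entries of the observed state. The names are hypothetical.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def class_matrices(m, window):
    """Transitions over the 2|S| classes: S_i^- = index i (window not hit yet),
    S_i^+ = index n + i (window already hit)."""
    n = len(m)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = m[i][j]
            # outside the query time interval every world keeps its class
            m_minus[i][j] = p
            m_minus[n + i][n + j] = p
            # inside it, entering a window state turns a miss world into a hit world
            m_plus[i][n + j if j in window else j] = p
            m_plus[n + i][n + j] = p
    return m_minus, m_plus


def hit_given_observation(m, start, window, t_from, t_to, obs_state, t_obs):
    """P(window hit | object observed in obs_state at time t_obs), via Bayes' theorem."""
    n = len(m)
    m_minus, m_plus = class_matrices(m, window)
    v = list(start) + [0.0] * n  # t = 0 counts as "not hit" (assumes t_from >= 1)
    for t in range(1, t_obs + 1):
        v = vec_mat(v, m_plus if t_from <= t <= t_to else m_minus)
    hit, miss = v[n + obs_state], v[obs_state]
    return hit / (hit + miss)
```

The running example starts in s1, with window {s1, s2} during [2, 3] and the object seen in s3 at t = 3. For it, the function yields 0.32 / 0.36 ≈ 0.89.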
108
Summary
› Pros
  - Allows answering time-parametrized queries according to possible worlds semantics
  - Considers location dependencies over time
  - Scales up very well, since it is purely based on sparse matrix multiplications
  - Natively extendable for uncertain observations
  - Seems to work adequately on real-world data (more validation needed)
› Cons
  - Discrete time and space
  - Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: index over the (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  - Draw a sufficiently high number of samples
  - Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
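Here is one standard way to draw samples that conform with a later observation. It is an illustration, not necessarily the exact construction of the cited paper. First compute backward weights beta_t(i) = P(observation | X_t = i). Then sample forward with the adapted, time-dependent transition matrix P_t(i -> j) = M[i][j] · beta_{t+1}(j) / beta_t(i). The names and the single end-point observation are assumptions.

```python
import random


def backward_weights(m, obs_state, steps):
    """beta[t][i] = P(X_steps = obs_state | X_t = i) for t = 0..steps."""
    n = len(m)
    beta = [[0.0] * n for _ in range(steps + 1)]
    beta[steps][obs_state] = 1.0
    for t in range(steps - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(m[i][j] * beta[t + 1][j] for j in range(n))
    return beta


def sample_path(m, beta, start_state, steps, rng):
    """Draw X_0..X_steps conditioned on the final observation, using the adapted
    transition matrix P_t(i -> j) = m[i][j] * beta[t+1][j] / beta[t][i]."""
    if beta[0][start_state] == 0.0:
        raise ValueError("the observations are inconsistent with the model")
    n = len(m)
    path = [start_state]
    for t in range(steps):
        i = path[-1]
        weights = [m[i][j] * beta[t + 1][j] / beta[t][i] for j in range(n)]
        path.append(rng.choices(range(n), weights=weights)[0])
    return path


def estimate_hit(m, start_state, obs_state, steps, window, t_from, t_to, count=20000, seed=1):
    """Monte Carlo estimate of P(window hit | observations) from conditioned samples."""
    rng = random.Random(seed)
    beta = backward_weights(m, obs_state, steps)
    hits = 0
    for _ in range(count):
        path = sample_path(m, beta, start_state, steps, rng)
        if any(path[t] in window for t in range(t_from, t_to + 1)):
            hits += 1
    return hits / count
```

The running example starts in s1 and has the object seen in s3 at t = 3. There, every sampled trajectory ends in s3. The fraction of samples that visit s1 or s2 during [2, 3] is close to the exact value 0.89.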
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  - Cleaning
  - Approximation
  - Paradigm of equivalent worlds
› Geometric models can be used
  - when no probabilistic model is present
  - to find possible answers (no confidence)
› Probabilistic management of UST data
  - based on stochastic processes
  - allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
47
Approximated Results: Sampling
› Materialize a set S of possible worlds
› Samples are drawn independently and unbiased
› Evaluate the query predicate on each world
› The distribution of the sampled results is an unbiased approximation of the true distribution of results
48
Sampling Example
[Figure: query object q and uncertain objects A, B, C (labels H, H3; probabilities 0.8, 0.2, 0.4)]
› Drawing 100 possible worlds may yield the following estimators
› Compare to the exact probabilities
› No indication of the reliability or confidence of the estimations
49
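Sampling Confidences
[Figure: query object q and uncertain objects A, B, C (labels H, H3; probabilities 0.8, 0.2, 0.4)]
› Drawing 100 possible worlds may yield the following estimators
› Use statistical methods to assess the quality of the estimators
› E.g., the Wald test: for an estimator $\hat{p}$ obtained from $|S|$ sampled worlds, the true probability lies, at significance level $\alpha$, in
$$\left[\hat{p} - z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{|S|}},\; \hat{p} + z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{|S|}}\right]$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution
› At a significance level of $\alpha$, the true probability is in the interval [0.442, 0.638]
[Figure: the interval with the true probability marked]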
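Here is a sketch of sampling with a Wald interval. The slide's symbols for the significance level and the estimator were lost. We assume α = 0.05 (z ≈ 1.96) and an estimate of 0.54 from 100 worlds. These values reproduce the interval [0.442, 0.638], but they are our reconstruction. `sample_world` and `predicate` are hypothetical callbacks that stand in for world materialization and query evaluation.

```python
import math
import random


def wald_interval(hits, n, z=1.96):
    """Wald interval p_hat +- z * sqrt(p_hat * (1 - p_hat) / n), clipped to [0, 1].
    z = 1.96 corresponds to a two-sided significance level of 0.05."""
    p_hat = hits / n
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half), min(1.0, p_hat + half)


def sample_and_bound(sample_world, predicate, n, z=1.96, seed=0):
    """Materialize n possible worlds, evaluate the query predicate on each,
    and return the estimate together with its Wald interval."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if predicate(sample_world(rng)))
    return wald_interval(hits, n, z)
```

`wald_interval(54, 100)` gives an estimate of 0.54 and bounds that round to 0.442 and 0.638.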
50
Uncertain Spatial Data Management: Summary
› Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
› Data cleaning: "best guess" answers; unreliable results; biased results
› Paradigm of equivalent worlds: efficient solution for the most prominent types of spatial queries; example: generating functions
› Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty: the flip-side
53
Model of a trajectory
[Figure: 3d-TRAJECTORY in (X, Y, Time) space up to the present time, and its 2d-ROUTE projection onto the (X, Y) plane]
› Approximation does not capture acceleration/deceleration
› Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
54
Mobility Models
› Sequence of (location, time) updates (e.g., GPS-based)
› Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
› (location, time, velocity vector) updates
[Figure: timeline up to "now ->"]
55
Modeling Uncertainty: Location Uncertainty Models
› Various sources' imprecision (GPS, sensors)
› Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
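This sketches the two geometric tests behind beads. It assumes exact samples (p1, t1) and (p2, t2) and a maximum speed vmax. A horizontal slice of the bead at time t is the intersection of two disks. The bead's projection onto the plane is an ellipse with foci p1 and p2 and major-axis length vmax · (t2 - t1). The function names are hypothetical.

```python
import math


def possible_at(p, p1, t1, p2, t2, vmax, t):
    """Can the object be at p at time t, given exact samples (p1, t1), (p2, t2)
    and maximum speed vmax?  (horizontal slice of the bead)"""
    return (t1 <= t <= t2
            and math.dist(p, p1) <= vmax * (t - t1)   # reachable from p1 in time
            and math.dist(p, p2) <= vmax * (t2 - t))  # can still reach p2 in time


def possibly_visited(p, p1, t1, p2, t2, vmax):
    """Can the object visit p at some time in [t1, t2]?  (projection of the bead:
    the ellipse with foci p1, p2 and major-axis length vmax * (t2 - t1))"""
    return math.dist(p, p1) + math.dist(p, p2) <= vmax * (t2 - t1)
```

With p1 = (0, 0) at t1 = 0, p2 = (4, 0) at t2 = 4 and vmax = 2, the point (2, 3) may have been visited, but (2, 4) cannot have been.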
57
Modeling Uncertain Trajectories: Model 2
Constraints
Motion along road network
Uncertainty restricted to Road Segments
p1 p2
q
t2t1
t
Bart Kuijpers Walied Othman Modeling uncertainty of moving objects on road networks via space-time prisms International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain TrajectoriesModel 3Sheared Cylinders
Constant bound on location-error at anytime-instant
G Trajcevski O Wolfson K Hinrichs S Chamberlain Managing Moving Objects Databases with Ucertainty ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly considerSyntax- The impact of uncertainty on the (structure of the) answer- The impact of constraints
Processing Algorithms- Impact of the motion+uncertainty coupling on filteringprunningrefinement
eg possible routes to be taken
6060
Example inside range (disk) around ldquoQrdquo between t1 and t2
1 What is the probability of a path taken
2 What is the earliestlatest arrival at a given vertex (consequently along edge)
Kai Zheng Goce Trajcevski Xiaofang Zhou Peter Scheuermann Probabilistic range queries for uncertain trajectories on road networks EDBT 2011
61
Continuous NN for trajectories
Q_nn(q) Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tbte]
A-nn(q) [(Tri1 [tbt1]) (Tri2[t1t2]) hellip (Trim[tm-1te])]
T = tb
T = te
X
Y
T = t1
Trq Tr1 Tr2Tr3
Given a collection of moving objects trajectories Tr1 Tr2 hellip TrN
Example assuming that Trq is the querying trajectory between [tbte] then- Tr1 (black) is the NN throughout [tbt1]- Tr2 (red) is the NN throughout [t1te]
Observation The answer to the query is time-paremeterized
Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002
62
Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant
UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]
T = te
X
YT = tb
T = t1
Trq Tr1 Tr2Tr3
T = tb1
T = t11
Observations
adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability
= what should the structure ofthe answer to UQ_nn(q) look like
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-basedProbabilistic Answer to a Continuous NN query) tree
Tri11 tb t11 D11
Tri12l t12(l-1) t12 D12k12
[TrQ tb te]
Tri1 tb t1 D1 Tri2 t1 t2 D2 Trim t(m-1) te Dm
Tri12 t11 t12 D12 Tri1k1 t1(k-1) t11 D1k1 Trik1 t(k-1) t21 Dm1 Trim1 t(m1-1) te Dmkm
Tri121 t11 t121 D121
Root parameters of the query-Querying trajectory Trq - Time-interval of interest [tbte]
Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)
Rmin gt Rmax
Rmin
Rmax
Q
Tr1
Tr2
Tr3
Tr4
Tr5
Rd1
1
14
Trq = 2D point Q The rest are possible-locations bounded by circles
Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN
The probability that the location of Tri is within distance Rd from Q is given by
The probability that a given object Trj is the NN of Q is given by
ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)
65
When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example
Rmin
Rmax
TrQ
Tr1
Tr2
Tr3
Tr4
Tr5
Z1
Z2
dist(Z1Z2) lt Rmax
We can no longer ldquoprunerdquo Tr4 from consideration
The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq
NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip
66
1
x
y
0
2r
1r2
pdf(Tr2)
pdf(Trq) pdf(Tr1)
Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for
DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq
67
1
x
y
0-40
+40
4r
34r2
pdf(Tr2 - Trq)
pdf(Tr1 - Trq)
Let Viq = Vi + (-Vq)
Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin at a function of t is
hyperbola since A gt 0
Now we have a collection of such distance functions (for each i)
69
The continuous aspect of the probabilistic NN queries
Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)
Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points
dist
TR1
TR2
LE12
tb t11 t12
Example
Lower envelope of two distance-trajectories
critical time-point
NOTE time-dimension on horizontal axis
70
The continuous aspect of the probabilistic NN queries
Q how to efficiently construct the lower envelope of the whole collection of distance-functions
A divide-and-conquer approach in the spirit of Merge-Sortdist
TR1
TR2
LE12
tb t11 t12
dist
TR3
TR4
LE34
tb tet31dist
TR1
TR2
LE1234
tb tet11 t12
TR3
TR4
t31
t1new t2new
Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)
Now how about the IPAC-NN tree
71
The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)
Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree
dist dist
lower envelope
TR1TR2
TR3
TR4
TR5
TR6
tb te
2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )
4r
TR7
t1 t2 t3
72
Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie
Example if pdf of selecting a possible path is uniform
Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie
73
Combine the two probabilitieshellip
Possible pathsPossible locations when t=4
a
m
Probability of falling inside segment mp1
0505 = 025 (at t = 4)
Probability of an object a falling inside some segment S at given time instant t
)(
))(Pr()Pr())(Pr(tPPp
p
a
tPLSpSta eg uniform pdf
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
› Data structures
  – Edge hash-table
      › Small, in-memory
  – Movement R-tree
      › 1D, one for each edge
  – Trajectories list
      › Stores samples ordered by time-stamp
      › Assigns a pointer to the set of possible paths
      › Earliest-arriving and latest-departure times stored for vertices
      › On-disk, retrieved on demand
76
› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search for leaf entries whose maximum time interval intersects tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" from q along road-network segments through vertices A–E, annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
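The expansion and filtering steps might look roughly like this. It is a simplified sketch, not the cited system: a bounded Dijkstra search stands in for the expansion tree, q is assumed to be a vertex, and a plain per-edge list of (object id, earliest arrival, latest departure) entries stands in for each edge's 1D movement R-tree. All names and numbers are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Dict[str, float]]       # vertex -> {neighbour: edge length}
Entry = Tuple[str, float, float]          # (object id, earliest arrival, latest departure)


def expansion_edges(g: Graph, q: str, r: float) -> Set[frozenset]:
    """Edges whose network distance to q is smaller than r (bounded Dijkstra)."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges: Set[frozenset] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                       # stale heap entry
        for v, w in g[u].items():
            edges.add(frozenset((u, v)))   # u is closer than r, hence so is edge (u, v)
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return edges


def filter_candidates(edges: Set[frozenset], movement: Dict[frozenset, List[Entry]],
                      tq: Tuple[float, float]) -> Set[str]:
    """Objects whose time interval on a reached edge intersects the query interval tq."""
    cands = set()
    for e in edges:
        for oid, t_early, t_late in movement.get(e, []):
            if t_early <= tq[1] and tq[0] <= t_late:
                cands.add(oid)
    return cands


g: Graph = {"q": {"A": 1}, "A": {"q": 1, "B": 5}, "B": {"A": 5}}
movement = {
    frozenset(("A", "B")): [("o1", 2.0, 4.0), ("o2", 8.0, 9.0)],
    frozenset(("q", "A")): [("o3", 0.0, 1.0)],
}
edges = expansion_edges(g, "q", r=3.0)
print(filter_candidates(edges, movement, tq=(3.0, 5.0)))
```

The surviving candidates would then go to the refinement step, where their qualification probabilities are computed.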
77
Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
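Because the QP is monotonic between consecutive critical time instants, its value at any other time is bracketed by the QPs at the two surrounding critical instants. A minimal sketch, assuming the critical instants and their QPs have already been computed (the numbers below are hypothetical):

```python
import bisect
from typing import List, Tuple


def qp_bounds(crit_times: List[float], crit_qps: List[float], t: float) -> Tuple[float, float]:
    """Lower/upper bound on the qualification probability at time t.

    Relies on the QP being monotonic between consecutive critical instants,
    so it lies between the QPs at the two surrounding critical instants.
    """
    if not crit_times[0] <= t <= crit_times[-1]:
        raise ValueError("t lies outside the range of critical time instants")
    k = bisect.bisect_left(crit_times, t)
    if crit_times[k] == t:
        return crit_qps[k], crit_qps[k]        # exact value at a critical instant
    a, b = crit_qps[k - 1], crit_qps[k]
    return min(a, b), max(a, b)


times = [0.0, 2.0, 5.0, 7.0]
qps = [0.0, 0.6, 0.9, 0.3]
print(qp_bounds(times, qps, 3.5))
```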
78
Uncertainty – the flip-side
› Data reduction
– Why transmit every single location point?
› Bandwidth
› Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up the Douglas-Peucker Line-Simplification Algorithm", SSHD 1997
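For reference, the classic recursive Douglas-Peucker simplification (not the accelerated variant from the cited paper) keeps a point only if it deviates from the simplified polyline by more than an acceptable error eps. A minimal sketch on 2D points; the track below is hypothetical:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    u = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))


def douglas_peucker(points: List[Point], eps: float) -> List[Point]:
    """Keep the farthest point if it deviates more than eps, then recurse on both halves."""
    if len(points) < 3:
        return list(points)
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right                 # do not duplicate the split point


track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
print(douglas_peucker(track, eps=0.5))
```

Plain Douglas-Peucker bounds the positional error only; on its own it does not guarantee topological properties (the simplified line may, for example, cross itself or another line).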
79
Uncertainty – the other flip-side
› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– Is it not just a "Z" axis? … one cannot travel back in time
– How long may the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
82
Merge the approaches
› Idea
– Discretize time
– For each time instant, a pdf (pmf) over the possible positions is given
› Examples
– Nearest-neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar, "Querying imprecise data in moving object environments," IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435–443
83
A highway example…
[Figure: position (pos) of a car on a highway over time (t), with a query region Q; the car's possible positions are bounded by the speeds vmin and vmax]
› What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution…, the probabilities of the car being inside Q at the two considered time instants are 50% and 30%
› Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65
– This violates the speed constraints
› Consideration of dependency: 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes exist for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts
[Figure: inner hole: move left 0.4, stay 0.2, move right 0.4; border hole: stay 0.4, move inwards 0.6]
92
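A simple example
[Figure: distribution of the ball over the nine buckets]
› Initial position: the ball is in bucket 3
› After first hit: 0.4, 0.2, 0.4 (buckets 2–4)
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (buckets 1–5)
› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08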
93
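How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$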
94
How can we model this?
› First hit:
$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$
› Second hit:
$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{2} = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$
› 40th hit:
$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
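The propagation can be reproduced with plain Python lists (a sketch; the function names are ours). It prints the distributions after the first, second and 40th hit, and checks that the vector shown for the 40th hit, (0.08, 0.12, …, 0.12, 0.08), is the stationary distribution of M (π · M = π), i.e., the distribution the chain approaches as the number of hits grows.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def step(v: Vector, m: Matrix) -> Vector:
    """One hit: multiply the row vector v by the transition matrix m."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def board_matrix(n: int = 9) -> Matrix:
    """Transition matrix of the board example (rows: from bucket, columns: to bucket)."""
    m = [[0.0] * n for _ in range(n)]
    m[0][0], m[0][1] = 0.4, 0.6                    # left border: stay 0.4, inwards 0.6
    m[n - 1][n - 2], m[n - 1][n - 1] = 0.6, 0.4    # right border
    for i in range(1, n - 1):                      # inner buckets: left 0.4, stay 0.2, right 0.4
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


M = board_matrix()
v0 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # ball starts in bucket 3
v1 = step(v0, M)
v2 = step(v1, M)
v40 = v0
for _ in range(40):
    v40 = step(v40, M)
print([round(x, 2) for x in v1])
print([round(x, 2) for x in v2])
print([round(x, 2) for x in v40])

# The vector shown for the 40th hit is the stationary distribution: pi · M = pi.
pi = [0.08] + [0.12] * 7 + [0.08]
print(all(abs(a - b) < 1e-12 for a, b in zip(step(pi, M), pi)))
```

In practice M is very sparse, so a sparse-matrix representation would replace the dense lists.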
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
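One simple way to learn the transition probabilities from historical data is maximum-likelihood counting of the observed transitions, stored sparsely. The slides do not specify the estimator; this sketch, its state names and its toy trajectories are our own assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List

SparseMatrix = Dict[str, Dict[str, float]]   # from-state -> {to-state: probability}


def estimate_transitions(trajectories: List[List[str]]) -> SparseMatrix:
    """Maximum-likelihood estimate: P(s -> s') = count(s -> s') / count(s -> *).

    Each trajectory is a sequence of discretized states, one per tick.
    Only observed transitions are stored, so the result stays sparse.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for states in trajectories:
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {
        a: {b: n / sum(row.values()) for b, n in row.items()}
        for a, row in counts.items()
    }


history = [["s1", "s3", "s2", "s1"], ["s2", "s3", "s3", "s2"], ["s3", "s2", "s3"]]
M_hat = estimate_transitions(history)
print(M_hat["s3"])
```

A separate matrix could be estimated per time of day or per object group, in line with the remark that the transition matrix can change over time and for different object groups.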
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph of s1, s2, s3 with the transition probabilities of M, unrolled over t = 0, 1, 2, 3]
Note: we have an exponential number of possible paths the car might take.
State distributions over (s1, s2, s3), starting in s1:
– t = 0: (1, 0, 0)
– t = 1: (0, 0, 1)
– t = 2: (0, 0.8, 0.2)
– t = 3, propagating all worlds: (0.48, 0.16, 0.36)
– t = 3, propagating only the worlds that have not yet been in s1 or s2 (the mass 0.2 in s3 at t = 2): (0, 0.16, 0.04)
Result = 0.8 + 0.16 = 0.96
› Solution based on matrix multiplications: introduce a new state for the "winner" trajectories (those that have already hit the window) and two matrices
103
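ST - Window Queries
States s1, s2, s3 and a new absorbing state s4 for the trajectories that have hit the window; M⁻ is used for ticks outside T, M⁺ for ticks inside T:
$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$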
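The two-matrix technique can be sketched as follows. This is our own minimal sketch of the idea on the slide, using 0-based state indices (s1 → 0) and plain Python lists instead of sparse matrices; the function names are hypothetical. It builds M⁻ and M⁺ from M and the set of window states, then multiplies.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def vec_mat(v: List[float], m: Matrix) -> List[float]:
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def window_matrices(m: Matrix, window: Set[int]) -> Tuple[Matrix, Matrix]:
    """Add an absorbing 'hit' state (last index) to the transition matrix m.

    M_minus: for ticks outside the query interval (m unchanged, hit state absorbing).
    M_plus:  for ticks inside the interval; every transition into a window
             state is redirected to the hit state.
    """
    n = len(m)
    minus = [row + [0.0] for row in m] + [[0.0] * n + [1.0]]
    plus = [[0.0 if j in window else p for j, p in enumerate(row)]
            + [sum(p for j, p in enumerate(row) if j in window)] for row in m]
    plus.append([0.0] * n + [1.0])
    return minus, plus


def window_query(m: Matrix, start: List[float], window: Set[int], t_from: int, t_to: int) -> float:
    """Probability of being in a window state at least once during [t_from, t_to]."""
    minus, plus = window_matrices(m, window)
    if t_from == 0:   # worlds that start inside the window are hits right away
        v = [0.0 if j in window else p for j, p in enumerate(start)] + [sum(start[j] for j in window)]
    else:
        v = start + [0.0]
    for t in range(1, t_to + 1):                 # transition from tick t-1 to tick t
        v = vec_mat(v, plus if t >= t_from else minus)
    return v[-1]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]   # s1, s2, s3 -> indices 0, 1, 2
print(window_query(M, [1.0, 0.0, 0.0], window={0, 1}, t_from=2, t_to=3))
```

The number of multiplications grows linearly with the query horizon, instead of enumerating the exponentially many paths.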
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with a single observation at t0 (left) and with two observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
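Multiple Observations
[Figure: the classes S1−, S2−, S3− (window not intersected, ¬∎) and S1+, S2+, S3+ (window intersected, ∎) unrolled over t = 0, 1, 2, 3]
State vectors over (S1−, S2−, S3−, S1+, S2+, S3+):
– t = 0: (1, 0, 0, 0, 0, 0)
– t = 1: (0, 0, 1, 0, 0, 0)
– t = 2: (0, 0, 0.2, 0, 0.8, 0)
– t = 3: (0, 0.16, 0.04, 0.48, 0, 0.32)
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} M_{\neg W} & M_{W} \\ 0 & M \end{pmatrix}$$
where M_W keeps the entries of M whose target state lies in the query window and M_¬W keeps the remaining entries.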
107
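Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare) \cdot P(\blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$$
With the state vector (0, 0.16, 0.04, 0.48, 0, 0.32) over (S1−, S2−, S3−, S1+, S2+, S3+) at t = 3, P(s3 ∧ ∎) = 0.32 (class S3+) and P(s3 ∧ ¬∎) = 0.04 (class S3−), so
$$P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$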
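The 2|S|-class technique and the Bayes step can be sketched the same way. Assumptions on our part: states are 0-based (s1 → 0), the window states are s1 and s2, the window is checked only at t = 2, and M⁺ is built by redirecting every "−" world that enters a window state to the corresponding "+" class. With these choices the sketch reproduces the t = 3 state vector (0, 0.16, 0.04, 0.48, 0, 0.32) and the result ≈ 0.89.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def vec_mat(v: List[float], m: Matrix) -> List[float]:
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def split_matrices(m: Matrix, window: Set[int]) -> Tuple[Matrix, Matrix]:
    """Matrices over the 2|S| classes (S1-, ..., Sn-, S1+, ..., Sn+).

    M_minus = [[M, 0], [0, M]]: no window check at this tick.
    M_plus: a '-' world that moves into a window state becomes a '+' world;
            '+' worlds stay '+' worlds.
    """
    n = len(m)
    zero = [0.0] * n
    minus = [row + zero for row in m] + [zero + row for row in m]
    plus = [[0.0 if j in window else p for j, p in enumerate(row)]
            + [p if j in window else 0.0 for j, p in enumerate(row)] for row in m]
    plus += [zero + row for row in m]
    return minus, plus


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
minus, plus = split_matrices(M, window={0, 1})    # assumed window states: s1, s2
v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]                # t = 0: object observed in s1
for step_matrix in (minus, plus, minus):          # window assumed to be checked at t = 2 only
    v = vec_mat(v, step_matrix)
print([round(x, 2) for x in v])                   # state vector at t = 3

# Second observation: o is seen in s3 at t = 3 (class S3- is index 2, S3+ is index 5).
p_hit_given_s3 = v[5] / (v[5] + v[2])             # Bayes: P(s3 and hit) / P(s3)
print(round(p_hit_given_s3, 2))
```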
108
Summary
› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: R-tree over the (time, location) space, with leaf entries Poid]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate the result probability by the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can we draw samples efficiently such that they conform to the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
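A naive baseline for drawing samples that conform to the observations is rejection sampling: simulate forward from the first observation and discard every sample that contradicts the later one. The sketch below does exactly that for the running three-state example, under our own assumptions (window: s1 or s2 at t = 2; observations: s1 at t = 0 and s3 at t = 3). It is not the matrix-adaptation technique of the cited paper, whose point is to avoid wasting rejected samples.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]


def sample_path(m: Matrix, start: int, steps: int, rng: random.Random) -> List[int]:
    """Draw one possible trajectory (a sequence of state indices) from the chain."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choices(range(len(m)), weights=m[path[-1]])[0])
    return path


def rejection_estimate(m: Matrix, n_samples: int, seed: int = 0) -> Optional[float]:
    """Monte-Carlo estimate of P(in s1 or s2 at t=2 | s1 at t=0, s3 at t=3)."""
    rng = random.Random(seed)
    accepted = hits = 0
    for _ in range(n_samples):
        path = sample_path(m, start=0, steps=3, rng=rng)
        if path[3] != 2:             # reject worlds that contradict the observation s3 at t = 3
            continue
        accepted += 1
        hits += path[2] in (0, 1)    # the world intersects the window
    return hits / accepted if accepted else None


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
print(rejection_estimate(M, 20_000))   # should be close to 0.32 / 0.36 ≈ 0.89
```

Here only about 36% of the samples survive; with later observations that are unlikely under the model, the acceptance rate can collapse, which motivates adapting the transition matrices instead.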
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› A query language to formulate probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216, 2013.
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43–49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112–1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435–443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728.
48
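Sampling Example
[Figure: query object q and uncertain objects A, B, C and H (alternative H3 labelled), with probabilities 0.8, 0.2 and 0.4]
Drawing 100 possible worlds may yield the following estimators.
Compare these to the exact probabilities.
No indication of the reliability or confidence of the estimations.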
49
Sampling Confidences
[Figure: the same setting as on the previous slide, with the true probability marked]
Drawing 100 possible worlds may yield the following estimators.
Use statistical methods to assess the quality of the estimators.
E.g., Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution.
At a significance level of α, the true probability is in the interval [0.442, 0.638].
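The Wald interval is a one-liner with Python's standard library. The inputs are our assumption: 54 of the 100 sampled worlds satisfying the predicate and α = 0.05, chosen because they reproduce the interval [0.442, 0.638] quoted on the slide.

```python
from statistics import NormalDist
from typing import Tuple


def wald_interval(successes: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Wald confidence interval p_hat ± z_(1-alpha/2) * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)     # (1 - alpha/2) percentile of N(0, 1)
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half


lo, hi = wald_interval(54, 100, alpha=0.05)
print(round(lo, 3), round(hi, 3))
```

The Wald interval is known to be unreliable for estimates close to 0 or 1 or for small n; alternatives such as the Wilson score interval behave better there.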
50
Uncertain Spatial Data Management: Summary
› Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty
› Data cleaning
– "Best guess" answers
– Unreliable results
– Biased results
› Paradigm of equivalent worlds
– Efficient solution for the most prominent types of spatial queries
– Example: generating functions
› Approximations
– Monte-Carlo sampling
– Probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip-side
53
Model of a trajectory
[Figure: a 3D trajectory in (X, Y, Time) up to the present time, and its 2D route (projection onto the X-Y plane)]
The approximation does not capture acceleration/deceleration.
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes.
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
54
Mobility Models
› Sequence of (location, time) updates (e.g., GPS-based)
› Electronic maps + traffic distribution patterns + set of (to be visited) points => full trajectory
› (location, time, velocity vector) updates
[Figure: timeline marked "now →"]
55
Modeling Uncertainty: Location Uncertainty Models
› Imprecision from various sources (GPS, sensors)
› Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) location in between samples, limited by a maximum speed
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
57
Modeling Uncertain Trajectories: Model 2
› Constraints: motion along the road network
› Uncertainty restricted to road segments
[Figure: samples p1 (at t1) and p2 (at t2) on a road network, query point q, time t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3: Sheared Cylinders
› Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
› Syntax
– The impact of uncertainty on the (structure of the) answer
– The impact of constraints, e.g., the possible routes to be taken
› Processing algorithms
– The impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Given a collection of moving-object trajectories Tr1, Tr2, …, TrN:
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm−1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T), with time instants T = tb, t1, te]
Example: assuming that Trq is the querying trajectory during [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized.
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: the previous example with uncertainty added; trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T), with time instants T = tb, tb1, t11, t1, te]
Observations: adding uncertainty to the previous example implies
– Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN-probability
– Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN-queries for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree:
[Figure: IPAC-NN tree. Root [TrQ, tb, te]; level-1 nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m−1), te, Dm); deeper nodes such as (Tri11, tb, t11, D11), (Tri12, t11, t12, D12) and (Tri121, t11, t121, D121) refine sub-intervals of their parents]
Root: parameters of the query
– Querying trajectory Trq
– Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN of Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., has no uncertainty)
[Figure: crisp query point Q and objects Tr1–Tr5 whose possible locations are bounded by circles; the radii Rmin, Rmax and Rd are marked]
Trq = 2D point Q; the possible locations of the rest are bounded by circles.
Observation:
– Any object whose min-distance from Q is greater than Rmax has a "0" probability of being the NN of Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \int_{\lVert x - Q \rVert \le R_d} \mathrm{pdf}_i(x)\, dx$$
The probability that a given object Trj is the NN of Q is given by
$$P_{NN}(Tr_j) = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \ne j} \bigl(1 - P_i(R_d)\bigr)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible values of Rd (NOTE: the upper bound is Rmax…).
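The two integrals can be evaluated numerically. This sketch rests on our own assumptions: every object's possible locations are uniformly distributed in a disk (the circles in the figure), so P_i(R_d) is the area of a circle-disk intersection divided by the disk area; Rmax is taken as the smallest maximum distance min_i(c_i + r_i), beyond which some object is certainly closer; and the outer integral is approximated by a simple Riemann sum. The object positions are hypothetical.

```python
import math
from typing import List, Tuple

Disk = Tuple[float, float]   # (distance of the disk centre from Q, disk radius r)


def prob_within(R: float, disk: Disk) -> float:
    """P_i(R): probability that a point uniform in the disk lies within distance R of Q."""
    c, r = disk
    if R <= 0 or R + r <= c:
        return 0.0
    if c + r <= R:
        return 1.0
    if c + R <= r:
        return (R * R) / (r * r)               # query circle lies inside the disk
    # area of the lens formed by the two intersecting circles
    a1 = R * R * math.acos(max(-1.0, min(1.0, (c * c + R * R - r * r) / (2 * c * R))))
    a2 = r * r * math.acos(max(-1.0, min(1.0, (c * c + r * r - R * R) / (2 * c * r))))
    a3 = 0.5 * math.sqrt(max(0.0, (-c + R + r) * (c + R - r) * (c - R + r) * (c + R + r)))
    return (a1 + a2 - a3) / (math.pi * r * r)


def nn_probabilities(disks: List[Disk], steps: int = 4000) -> List[float]:
    """P_NN(Tr_j) ≈ sum over R-steps of dP_j(R) * prod_{i != j} (1 - P_i(R))."""
    r_max = min(c + r for c, r in disks)       # beyond Rmax some object is surely closer
    h = r_max / steps
    probs = [0.0] * len(disks)
    for k in range(steps):
        lo, hi, mid = k * h, (k + 1) * h, (k + 0.5) * h
        p_mid = [prob_within(mid, d) for d in disks]
        for j, d in enumerate(disks):
            dp = prob_within(hi, d) - prob_within(lo, d)
            if dp > 0:
                others = math.prod(1 - p for i, p in enumerate(p_mid) if i != j)
                probs[j] += dp * others
    return probs


print([round(p, 3) for p in nn_probabilities([(5, 1), (5, 1), (10, 1)])])
```

The third object's min-distance (9) exceeds Rmax (6), so it gets probability 0, exactly as the observation above predicts; the two symmetric objects share the remaining probability.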
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query TrQ and objects Tr1–Tr5, with radii Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd of Trq.
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane, each with support of diameter 2r]
Example: assuming a uniform pdf of possible locations; two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd of Trq are marked in the figure.
Main observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$P\bigl(\lVert V_i - V_q \rVert \le R_d\bigr)$$
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq), which, due to independence, implies that the pdf of Viq is a convolution of the pdfs of Vi and −Vq.
67
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq) over the (x, y) plane, each with support of diameter 4r]
Let Viq = Vi + (−Vq).
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq).
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
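A Monte-Carlo sketch of the quantity P(‖V_i − V_q‖ ≤ R_d), assuming both possible-location regions are uniform disks of the same radius r (the centres and radii below are hypothetical). Instead of evaluating the convolution analytically, it samples V_iq = V_i + (−V_q) directly; the sample mean should approach C_i − C_q, as Property 1 states, and the support of V_iq is a disk of diameter 4r.

```python
import math
import random
from typing import Tuple

Point = Tuple[float, float]


def sample_disk(center: Point, r: float, rng: random.Random) -> Point:
    """Uniform sample from a disk (sqrt on the radius keeps the density uniform)."""
    rho, phi = r * math.sqrt(rng.random()), 2 * math.pi * rng.random()
    return center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi)


def prob_within_distance(ci: Point, cq: Point, r: float, rd: float,
                         n: int = 100_000, seed: int = 1) -> Tuple[float, Point]:
    """Estimate P(||V_i - V_q|| <= R_d) by sampling V_iq = V_i + (-V_q).

    Also returns the sample mean of V_iq, which by Property 1 should be close to C_i - C_q.
    """
    rng = random.Random(seed)
    inside, sx, sy = 0, 0.0, 0.0
    for _ in range(n):
        xi, yi = sample_disk(ci, r, rng)
        xq, yq = sample_disk(cq, r, rng)
        dx, dy = xi - xq, yi - yq
        sx, sy = sx + dx, sy + dy
        inside += math.hypot(dx, dy) <= rd
    return inside / n, (sx / n, sy / n)


p, mean = prob_within_distance(ci=(3.0, 0.0), cq=(0.0, 0.0), r=1.0, rd=3.0)
print(round(p, 2), [round(m, 1) for m in mean])
```

Sampling trades exactness for simplicity; an exact evaluation would integrate the convolved pdf over the disk of radius R_d.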
68
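The continuous aspect of the probabilistic NN queries
Let Vx_iq = vx_i − vx_q and Vy_iq = vy_i − vy_q denote the corresponding components of the velocity of the object whose expected location is along TRiq.
Then, with (x_iq, y_iq) denoting the expected relative location at t = 0, the distance of the expected location along TRiq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{(x_{iq} + V^{x}_{iq}\, t)^2 + (y_{iq} + V^{y}_{iq}\, t)^2} = \sqrt{A t^2 + B t + C}$$
with A = (Vx_iq)² + (Vy_iq)², B = 2(x_iq·Vx_iq + y_iq·Vy_iq) and C = x_iq² + y_iq²; its graph is a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).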
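The coefficients A, B and C follow directly from the expected locations and velocities. A minimal sketch, assuming constant velocities and expected locations p_i, p_q given at t = 0; the inputs are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def distance_coeffs(p_i: Vec, v_i: Vec, p_q: Vec, v_q: Vec) -> Tuple[float, float, float]:
    """Coefficients (A, B, C) with d_iq(t) = sqrt(A t^2 + B t + C).

    p_*: expected locations at t = 0, v_*: (constant) velocities.
    """
    x, y = p_i[0] - p_q[0], p_i[1] - p_q[1]        # relative expected location at t = 0
    vx, vy = v_i[0] - v_q[0], v_i[1] - v_q[1]      # Vx_iq, Vy_iq
    return vx * vx + vy * vy, 2 * (x * vx + y * vy), x * x + y * y


def d_iq(coeffs: Tuple[float, float, float], t: float) -> float:
    """Distance of the expected relative location from the origin at time t."""
    a, b, c = coeffs
    return math.sqrt(max(a * t * t + b * t + c, 0.0))


abc = distance_coeffs(p_i=(3.0, 4.0), v_i=(0.0, -1.0), p_q=(0.0, 0.0), v_q=(1.0, 0.0))
print(abc, d_iq(abc, 0.0), d_iq(abc, 2.0))
```

A is zero only when both objects move with the same velocity, in which case d_iq is constant.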
69
The continuous aspect of the probabilistic NN queries
Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0).
Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: set diq(t) = djq(t); squaring both sides yields a quadratic equation in t). Call such pair-wise intersections critical time-points.
[Figure: example lower envelope LE12 of the two distance functions of TR1 and TR2, starting at tb, with critical time-points t11 and t12; time on the horizontal axis]
70
The continuous aspect of the probabilistic NN queries
Q how to efficiently construct the lower envelope of the whole collection of distance-functions
A divide-and-conquer approach in the spirit of Merge-Sortdist
TR1
TR2
LE12
tb t11 t12
dist
TR3
TR4
LE34
tb tet31dist
TR1
TR2
LE1234
tb tet11 t12
TR3
TR4
t31
t1new t2new
Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)
Now how about the IPAC-NN tree
71
The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)
Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree
dist dist
lower envelope
TR1TR2
TR3
TR4
TR5
TR6
tb te
2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )
4r
TR7
t1 t2 t3
72
Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie
Example if pdf of selecting a possible path is uniform
Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie
73
Combine the two probabilitieshellip
Possible pathsPossible locations when t=4
a
m
Probability of falling inside segment mp1
0505 = 025 (at t = 4)
Probability of an object a falling inside some segment S at given time instant t
)(
))(Pr()Pr())(Pr(tPPp
p
a
tPLSpSta eg uniform pdf
74
ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory
representation we needndash The collection of all the possible paths (and probabilities)
ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi
to position pi+1
At the time-instant t = 4 the object can
be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )
75
Processing
rsaquo Data Structuresndash Edge hash-table
rsaquo Small in-memoryndash Movement R-tree
rsaquo 1D for each edgendash Trajectories List
rsaquo Store samples ordered by time-stamp
rsaquo Assign pointer to set of possible paths
rsaquo Earliest-arriving and Latest-departure stored for vertices
rsaquo On-disk retrieve on-demand
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time instant in the interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2, s3 with the transition probabilities of M]
Note: there is an exponential number of possible paths the car might take.
ST - Window Queries
› The car is observed in s1 at t = 0, i.e., the initial distribution is (1, 0, 0). The distribution is propagated over t = 0, 1, 2, 3 by multiplying with M:
– t = 1: $(1, 0, 0) \cdot M = (0, 0, 1)$
– t = 2: $(0, 0, 1) \cdot M = (0, 0.8, 0.2)$
– t = 3: $(0, 0.8, 0.2) \cdot M = (0.48, 0.16, 0.36)$
› Summing these marginals over T would count the worlds that are in the window at both t = 2 and t = 3 twice.
› Instead, propagate only the worlds that have not yet intersected the window: (0, 0, 0.2) at t = 2 leaves $(0, 0, 0.04)$ at t = 3. Result = 1 − 0.04 = 0.96.
› The solution based on matrix multiplications introduces a new state for the winning trajectories (those that have already intersected the window) and two matrices.
103
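ST - Window Queries
› States s1, s2, s3 plus a new absorbing state s4 for the trajectories that have already intersected the window.
› $M^-$ is used for transitions into time steps outside T. $M^+$ is used for transitions into time steps inside T, where any move into s1 or s2 is redirected to s4:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$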
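Below is a minimal Python sketch of the two-matrix solution, assuming the 3-state matrix M and T = [2, 3] from the example. The helper names extend and window_query are hypothetical. The absorbing hit state collects every world that has entered s1 or s2 during T, so the result is read off a single vector entry instead of enumerating the exponentially many paths.

```python
def vec_mat(p, m):
    """Row vector p times matrix m."""
    return [sum(p[i] * m[i][j] for i in range(len(p))) for j in range(len(m[0]))]


def extend(m, window, inside):
    """(n+1)x(n+1) matrix over s1..sn plus an absorbing hit state (last index).
    inside=False gives M^-; inside=True gives M^+, where mass moving into a
    window state is redirected to the hit state."""
    n = len(m)
    out = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            target = n if inside and j in window else j
            out[i][target] += m[i][j]
    out[n][n] = 1.0
    return out


def window_query(m, p0, window, t_start, t_end):
    """P(object is in a window state at some t in [t_start, t_end]); assumes t_start >= 1."""
    n = len(m)
    m_minus, m_plus = extend(m, window, False), extend(m, window, True)
    p = list(p0) + [0.0]
    for t in range(1, t_end + 1):
        p = vec_mat(p, m_plus if t >= t_start else m_minus)
    return p[n], p


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
prob, dist = window_query(M, [1, 0, 0], window={0, 1}, t_start=2, t_end=3)
print(round(prob, 2), [round(x, 2) for x in dist])   # 0.96 [0.0, 0.0, 0.04, 0.96]
```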
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate.
› This is not really of interest, since cars do not move randomly.
› With two observations, we have to introduce more artificial states and adapt the techniques.
[Figure: location space over time, with a single observation at t0 (left) and with observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 over t = 0, 1, 2, 3]
106
Multiple Observations
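Distribution over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$ for t = 0, 1, 2, 3 (∎: worlds that have intersected the window; ¬∎: worlds that have not):
$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0, 0, 0) \xrightarrow{M^+} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^+} (0, 0, 0.04, 0.48, 0.16, 0.32)$$
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \quad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}, \quad M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$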
107
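Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 at t = 3?
› Let W denote "the trajectory intersects the window" and O denote "the object is observed in s3 at t = 3":
$$P(W \mid O) = \frac{P(O \mid W) \cdot P(W)}{P(O)} = \frac{P(W \wedge O)}{P(O)} = \frac{P(W \wedge O)}{P(O \wedge W) + P(O \wedge \neg W)}$$
› From the distribution at t = 3, $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+) = (0, 0, 0.04, 0.48, 0.16, 0.32)$:
$$P(W \mid O) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$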
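The 2|S| equivalence classes and the Bayes step can be sketched together. This illustration assumes the running example: window states s1 and s2, T = [2, 3], and an observation of s3 at t = 3. The names split_matrices and window_given_observation are ours. M^+ is built so that a "−" world moving into a window state becomes a "+" world, while "+" worlds stay "+".

```python
def vec_mat(p, m):
    """Row vector p times matrix m."""
    return [sum(p[i] * m[i][j] for i in range(len(p))) for j in range(len(m[0]))]


def split_matrices(m, window):
    """Matrices over the 2n classes (S1-, ..., Sn-, S1+, ..., Sn+).
    M^- = diag(M, M); in M^+ a '-' world that moves into a window state becomes '+'."""
    n = len(m)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            m_minus[i][j] = m_minus[n + i][n + j] = m[i][j]
            m_plus[i][n + j if j in window else j] = m[i][j]
            m_plus[n + i][n + j] = m[i][j]          # '+' worlds stay '+'
    return m_minus, m_plus


def window_given_observation(m, p0, window, t_start, t_end, obs_time, obs_state):
    """P(window hit | object observed in obs_state at obs_time) via Bayes' theorem.
    Assumes t_start >= 1 (the start state itself is not checked against the window)."""
    n = len(m)
    m_minus, m_plus = split_matrices(m, window)
    p = list(p0) + [0.0] * n
    for t in range(1, obs_time + 1):
        p = vec_mat(p, m_plus if t_start <= t <= t_end else m_minus)
    hit, miss = p[n + obs_state], p[obs_state]     # P(W and O), P(not W and O)
    return hit / (hit + miss), p


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
prob, dist = window_given_observation(M, [1, 0, 0], {0, 1}, 2, 3, obs_time=3, obs_state=2)
print(round(prob, 2))   # 0.89
```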
108
Summary
› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect model
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed.
› Index structure based on an R-tree indexing the spatio-temporal space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x% of the possible trajectories of o may intersect Q)
[Figure: index over the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries.
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
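This slide names the adaptation of transition matrices as the solution but does not spell it out. The sketch assumes one construction that makes every sampled trajectory agree with a later observation. It reweights each transition by the backward probability of still reaching the observed state. This is not necessarily the construction used in the cited paper.

The sketch estimates the window probability of the running example by sampling, so the estimate can be compared with the exact value 0.32 / 0.36 ≈ 0.89. The same sampler could serve queries such as kNN that lack a closed form. All function names are hypothetical.

```python
import random


def backward(m, horizon, obs_state):
    """beta[t][i] = P(object is in obs_state at `horizon` | it is in state i at t)."""
    n = len(m)
    beta = [[0.0] * n for _ in range(horizon + 1)]
    beta[horizon][obs_state] = 1.0
    for t in range(horizon - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(m[i][j] * beta[t + 1][j] for j in range(n))
    return beta


def adapted_matrices(m, beta):
    """Time-dependent matrices M_t(i, j) = M(i, j) * beta[t+1][j] / beta[t][i]:
    every trajectory sampled from them agrees with the observation."""
    n = len(m)
    mats = []
    for t in range(len(beta) - 1):
        mt = [[0.0] * n for _ in range(n)]
        for i in range(n):
            if beta[t][i] > 0:
                for j in range(n):
                    mt[i][j] = m[i][j] * beta[t + 1][j] / beta[t][i]
        mats.append(mt)
    return mats


def sample_path(p0, beta0, mats, rng):
    n = len(p0)
    # Initial state reweighted by how likely it is to reach the observation.
    state = rng.choices(range(n), weights=[p0[i] * beta0[i] for i in range(n)])[0]
    path = [state]
    for mt in mats:
        state = rng.choices(range(n), weights=mt[state])[0]
        path.append(state)
    return path


def mc_window_probability(m, p0, window, t_start, t_end, obs_time, obs_state,
                          samples=20000, seed=1):
    """Monte-Carlo estimate; assumes t_end <= obs_time."""
    rng = random.Random(seed)
    beta = backward(m, obs_time, obs_state)
    mats = adapted_matrices(m, beta)
    hits = 0
    for _ in range(samples):
        path = sample_path(p0, beta[0], mats, rng)
        hits += any(path[t] in window for t in range(t_start, t_end + 1))
    return hits / samples   # ratio of samples satisfying the query


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
estimate = mc_window_probability(M, [1, 0, 0], {0, 1}, 2, 3, obs_time=3, obs_state=2)
print(estimate)   # should be close to the exact value 0.32 / 0.36
```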
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of sub-events, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443.
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
49
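Sampling Confidences
[Figure: query q with uncertain objects A, B, C and H (probability labels 0.8, 0.2, 0.4)]
› Drawing 100 possible worlds may yield the following estimators.
› Use statistical methods to assess the quality of the estimators, e.g., the Wald test:
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentile of the standard normal distribution.
› At a significance level of α, the interval [0.442, 0.638] contains the true probability with confidence 1 − α.
[Figure: estimators compared with the true probability]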
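The Wald interval can be computed directly with Python's standard library. The inputs below are our assumption: 54 of 100 sampled worlds satisfy the query, and α = 0.05. The slide does not state these values, but they reproduce its interval [0.442, 0.638].

```python
from statistics import NormalDist


def wald_interval(successes, n, alpha=0.05):
    """Wald confidence interval p_hat +/- z_{1-alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)   # percentile of the standard normal
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half


low, high = wald_interval(54, 100)
print(round(low, 3), round(high, 3))   # 0.442 0.638
```

The interval shrinks with 1/sqrt(n), so halving its width requires about four times as many sampled worlds.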
50
Uncertain Spatial Data Management: Summary
› Motivation
– Flood of geo-spatial data
– Enriched with additional contexts (text, social, multimedia)
– Inherent uncertainty
› Data cleaning
– "Best guess" answers
– Unreliable results
– Biased results
› Paradigm of equivalent worlds
– Efficient solution for the most prominent types of spatial queries
– Example: generating functions
› Approximations
– Monte-Carlo sampling
– Probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and processing
  • Part 2: Range (road-network constraints)
    • Semantics and processing
• Uncertainty – the flip-side
53
Model of a trajectory
[Figure: a 3d-TRAJECTORY in (X, Y, Time) up to the present time, and its 2d-ROUTE projected onto the X-Y plane]
› The approximation does not capture acceleration/deceleration.
› Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes.
Bart Kuijpers, Harvey J. Miller, Walied Othman. Kinetic space-time prisms. GIS 2011
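A trajectory built from (location, time) samples is commonly approximated by linear interpolation between consecutive samples. This is why acceleration and deceleration are not captured. The minimal sketch below assumes samples sorted by strictly increasing time. position_at is a hypothetical name.

```python
from bisect import bisect_right


def position_at(samples, t):
    """samples: list of (x, y, t) with strictly increasing t. Linear interpolation between
    the two enclosing samples, i.e. constant speed on each segment."""
    times = [s[2] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled period")
    k = bisect_right(times, t) - 1
    if k == len(samples) - 1:            # t equals the last time stamp
        return samples[-1][0], samples[-1][1]
    (x0, y0, t0), (x1, y1, t1) = samples[k], samples[k + 1]
    f = (t - t0) / (t1 - t0)
    return x0 + f * (x1 - x0), y0 + f * (y1 - y0)


samples = [(0, 0, 0), (10, 0, 10), (10, 20, 20)]
print(position_at(samples, 15))   # (10.0, 10.0)
```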
54
Mobility Models
› Sequence of (location, time) updates (e.g., GPS-based)
› Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
› (location, time, velocity vector) updates
[Figure: the three mobility models, with the current time marked as "now"]
55
Modeling Uncertainty: Location Uncertainty Models
› Various sources' imprecision (GPS, sensors)
› Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze. On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (a.k.a. space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov. Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li. Uncertain Range Queries for Necklaces. MDM 2010
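Under the bead model (exact samples and a bounded maximum speed), a location q is possible at time t exactly when two conditions hold. It must be reachable from the first sample by time t, and the second sample must still be reachable from it. At a fixed t, the possible locations therefore form the intersection of two disks.

Below is a minimal sketch of that membership test, assuming Euclidean free-space motion. in_bead and its parameters are our names.

```python
import math


def in_bead(p1, t1, p2, t2, vmax, q, t):
    """True if location q at time t lies in the space-time prism (bead) between the
    exact samples (p1, t1) and (p2, t2) under maximum speed vmax."""
    if not t1 <= t <= t2:
        return False
    return (math.dist(p1, q) <= vmax * (t - t1)        # reachable from p1 by time t
            and math.dist(q, p2) <= vmax * (t2 - t))   # p2 still reachable from q


print(in_bead((0, 0), 0, (10, 0), 10, 2, (5, 8), 5))   # True
```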
57
Modeling Uncertain Trajectories: Model 2
› Constraints
– Motion along the road network
– Uncertainty restricted to road segments
[Figure: samples p1 at t1 and p2 at t2 on a road network, and a query q at time t]
Bart Kuijpers, Walied Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
› Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain. Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
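With a constant error bound r, the possible locations at any instant form a disk of radius r around the expected location. The sketch below classifies one instant against a disk-shaped query region, in the spirit of possibly/definitely answer semantics. The function and label names are our assumptions.

```python
import math


def classify(expected, r, center, radius):
    """Relation between the uncertainty disk (expected location, error bound r) and the
    query disk (center, radius) at one time instant."""
    d = math.dist(expected, center)
    if d + r <= radius:
        return "definitely"   # every possible location is inside the query disk
    if d - r <= radius:
        return "possibly"     # some, but not necessarily all, possible locations are inside
    return "no"


print(classify((5, 0), 1, (0, 0), 5))   # possibly
```

For a whole trajectory, the same test is applied to the expected location at each time instant of interest.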
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
› Syntax
– The impact of uncertainty on the (structure of the) answer
– The impact of constraints, e.g., possible routes to be taken
› Processing algorithms
– Impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside the range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Given a collection of moving object trajectories Tr1, Tr2, …, TrN:
› Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
› A_nn(q) = [(Tr_i1, [tb, t1]), (Tr_i2, [t1, t2]), …, (Tr_im, [t_(m-1), te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) over time, with T = tb, T = t1 and T = te marked]
Example: assuming that Trq is the querying trajectory during [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized.
Yufei Tao, Dimitris Papadias, Qiongmao Shen. Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
› UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: the previous example with uncertain trajectories; T = tb, T = tb1, T = t11, T = t1 and T = te marked]
Observations: adding uncertainty to the previous example implies
– Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN-probability
– Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN-query for Trajectories
› We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level 1: (Tr_i1, tb, t1, D1), (Tr_i2, t1, t2, D2), …, (Tr_im, t_(m-1), te, Dm). Level 2, e.g., below Tr_i1: (Tr_i11, tb, t11, D11), (Tr_i12, t11, t12, D12), …, (Tr_i1k1, t1(k-1), t1, D1k1). Level 3, e.g., below Tr_i12: (Tr_i121, t11, t121, D121), …, (Tr_i12l, t12(l-1), t12, D12k12)]
› Root: parameters of the query
– Querying trajectory Trq
– Time interval of interest [tb, te]
› Children: the trajectories that, if the parent is excluded, have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: crisp query point Q and objects Tr1 to Tr5 with circular uncertainty regions; distances Rmin, Rmax and Rd marked]
› Trq = 2D point Q. The rest are possible locations bounded by circles.
› Observation: any object whose min-distance from Q is > Rmax (the smallest max-distance from Q over all objects) has a "0" probability of being a NN to Q. Example: Tr4 cannot have a non-zero probability of being Q's NN.
› The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \int_{\|x - Q\| \le R_d} f_i(x)\, dx$$
where $f_i$ is the pdf of the location of Tri.
› The probability that a given object Trj is the NN of Q is given by
$$P_j^{NN} = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{k \ne j} \bigl(1 - P_k(R_d)\bigr)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
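The integral above can be cross-checked with a Monte-Carlo estimate. The sketch assumes uniformly distributed locations inside each circle, since the slide does not fix the pdf. It first prunes every object whose min-distance from Q exceeds Rmax. It then counts which of the remaining objects is nearest in each sampled world. Names and object data are hypothetical.

```python
import math
import random


def sample_disk(center, radius, rng):
    """Uniform point in a disk (sqrt keeps the area density uniform)."""
    rho = radius * math.sqrt(rng.random())
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi)


def nn_probabilities(q, objects, samples=20000, seed=7):
    """objects: name -> (center, radius). Monte-Carlo estimate of P(object is Q's NN)."""
    rng = random.Random(seed)
    # Rmax: smallest max-distance from Q; objects whose min-distance exceeds it are pruned.
    r_max = min(math.dist(q, c) + r for c, r in objects.values())
    candidates = {k: v for k, v in objects.items() if math.dist(q, v[0]) - v[1] <= r_max}
    wins = dict.fromkeys(objects, 0)
    for _ in range(samples):
        dists = {k: math.dist(q, sample_disk(c, r, rng)) for k, (c, r) in candidates.items()}
        wins[min(dists, key=dists.get)] += 1
    return {k: w / samples for k, w in wins.items()}


objects = {"Tr1": ((2, 0), 1), "Tr2": ((0, 3), 1), "Tr4": ((10, 0), 1)}
print(nn_probabilities((0, 0), objects))
```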
65
› When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query TrQ and objects Tr1 to Tr5; points Z1 and Z2 with dist(Z1, Z2) < Rmax]
› We can no longer "prune" Tr4 from consideration.
› The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq.
› NOTE: in effect, this is a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over (x, y), each with value 1/(πr²) on a disk of diameter 2r]
Example: assuming a uniform pdf of possible locations, the figure marks two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.
› Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\|V_i - V_q\| \le R_d)$.
› Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq). Due to independence, the pdf of Viq is a convolution of the pdfs of Vi and −Vq.
67
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq) over (x, y); the difference of two uniform disks of radius r has a support of diameter 4r]
› Let Viq = Vi + (−Vq).
› Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq).
› Property 2: assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids (with respect to the vertical axis, i.e., the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
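Here is a quick Monte-Carlo sketch of Viq = Vi + (−Vq) for two uniform disks of radius r, assumed for illustration. It estimates P(‖Viq‖ ≤ Rd). It also lets two things be checked empirically: Property 1 (the centroid of Viq lies at Ci − Cq) and the support of radius 2r. within_distance is a hypothetical name.

```python
import math
import random


def sample_disk(center, radius, rng):
    """Uniform point in a disk (sqrt keeps the area density uniform)."""
    rho = radius * math.sqrt(rng.random())
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi)


def within_distance(ci, cq, r, rd, samples=50000, seed=3):
    """Samples V_iq = V_i + (-V_q) for uniform disks of radius r around ci and cq.
    Returns (estimate of P(|V_iq| <= rd), empirical centroid of V_iq,
    largest observed distance of V_iq from ci - cq)."""
    rng = random.Random(seed)
    expected = (ci[0] - cq[0], ci[1] - cq[1])    # Property 1: C_iq = C_i + (-C_q)
    hits, sx, sy, spread = 0, 0.0, 0.0, 0.0
    for _ in range(samples):
        (xi, yi), (xq, yq) = sample_disk(ci, r, rng), sample_disk(cq, r, rng)
        dx, dy = xi - xq, yi - yq
        hits += math.hypot(dx, dy) <= rd
        sx, sy = sx + dx, sy + dy
        spread = max(spread, math.dist((dx, dy), expected))
    return hits / samples, (sx / samples, sy / samples), spread


p, centroid, spread = within_distance((3, 1), (1, 0), r=1, rd=2)
print(p, centroid, spread)
```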
68
The continuous aspect of the probabilistic NN queries
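› Let $V_{x,iq} = v_{x,i} - v_{x,q}$ and $V_{y,iq} = v_{y,i} - v_{y,q}$ denote the corresponding components of the (relative) velocity of the object whose expected location is along TR_iq.
› Then the distance of the expected location along TR_iq from the origin as a function of t is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = V_{x,iq}^2 + V_{y,iq}^2,\ B = 2\,(x_{iq} V_{x,iq} + y_{iq} V_{y,iq}),\ C = x_{iq}^2 + y_{iq}^2$$
where $(x_{iq}, y_{iq})$ is the relative expected location at t = 0. In the (t, d) plane this is a hyperbola, since A > 0.
› Now we have a collection of such distance functions (one for each i).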
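The coefficients of the squared distance can be computed directly. The inputs are the relative expected location at t = 0, written (x, y) here (our notation), and the relative velocity (vx, vy). Below is a sketch, checked against the Euclidean distance computed point by point.

```python
import math


def distance_coefficients(x, y, vx, vy):
    """Relative expected location (x, y) at t = 0 and relative velocity (vx, vy).
    Returns (A, B, C) with d(t)^2 = A t^2 + B t + C."""
    return vx * vx + vy * vy, 2 * (x * vx + y * vy), x * x + y * y


def distance(coeffs, t):
    a, b, c = coeffs
    return math.sqrt(max(a * t * t + b * t + c, 0.0))   # guard tiny negative rounding


coeffs = distance_coefficients(3, -4, 1, 2)
print(coeffs, distance(coeffs, 0))   # (5, -10, 25) 5.0
```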
69
The continuous aspect of the probabilistic NN queries
› Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions $S_{df} = \{d_{1q}(t), d_{2q}(t), \ldots, d_{Nq}(t)\}$ (NOTE: excluding $d_{qq}(t) \equiv 0$).
› Observation: throughout the time interval of interest for the query [tb, te], two distance functions $d_{iq}(t)$ and $d_{jq}(t)$ can intersect at most twice. (NOTE: setting $d_{iq}(t) = d_{jq}(t)$ and squaring both sides yields a quadratic equation in t.) Call such pairwise intersections critical time-points.
[Figure: example lower envelope LE12 of two distance trajectories TR1 and TR2, with critical time-points t11 and t12 after tb; time on the horizontal axis, dist on the vertical axis]
70
The continuous aspect of the probabilistic NN queries
› Q: how to efficiently construct the lower envelope of the whole collection of distance functions?
› A: a divide-and-conquer approach in the spirit of Merge-Sort
[Figure: the envelopes LE12 (of TR1, TR2) and LE34 (of TR3, TR4) are merged into LE1234, which introduces new critical time-points t1new and t2new]
› Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
› Now, how about the IPAC-NN tree?
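Below is a compact Python sketch of the divide-and-conquer construction. It works on squared distances; the square root is monotonic, so the envelope is unchanged. Each envelope is a list of (trajectory id, start, end). Merging two envelopes walks over their breakpoints and inserts the at most two critical time-points per pair of functions.

This is an illustration, not the authors' implementation, and it makes no attempt to meet the stated O(N log N) bound. The coefficients in the example are hypothetical. Two of them are constant distance functions (A = 0), which keeps the result easy to check by hand.

```python
import math


def squared(f, t):
    a, b, c = f
    return a * t * t + b * t + c


def crossings(f, g, lo, hi):
    """Critical time-points in (lo, hi) where two distance functions meet. Squaring
    d_f(t) = d_g(t) gives a quadratic in t, so there are at most two."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted(t for t in roots if lo < t < hi)


def merge(env1, env2, funcs):
    """Pointwise minimum of two envelopes, each a list of (id, start, end) on the same span."""
    cuts = sorted({s for _, s, _ in env1 + env2} | {e for _, _, e in env1 + env2})
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        f = next(i for i, s, e in env1 if s <= mid <= e)
        g = next(i for i, s, e in env2 if s <= mid <= e)
        points = [lo] + crossings(funcs[f], funcs[g], lo, hi) + [hi]
        for a, b in zip(points, points[1:]):
            m = (a + b) / 2
            winner = f if squared(funcs[f], m) <= squared(funcs[g], m) else g
            if pieces and pieces[-1][0] == winner:
                pieces[-1] = (winner, pieces[-1][1], b)     # extend the previous piece
            else:
                pieces.append((winner, a, b))
    return pieces


def lower_envelope(ids, funcs, tb, te):
    """Divide and conquer: split the trajectories, build both envelopes, merge them."""
    if len(ids) == 1:
        return [(ids[0], tb, te)]
    half = len(ids) // 2
    return merge(lower_envelope(ids[:half], funcs, tb, te),
                 lower_envelope(ids[half:], funcs, tb, te), funcs)


# Hypothetical squared-distance coefficients (A, B, C) of three distance functions.
funcs = {1: (0, 0, 1), 2: (1, -4, 4), 3: (0, 0, 4)}
print(lower_envelope([1, 2, 3], funcs, 0, 4))   # [(1, 0, 1.0), (2, 1.0, 3.0), (1, 3.0, 4)]
```

The pieces of the result are the nodes of the first level of an IPAC-NN tree for this example.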
71
› The lower envelope provides a continuous pruning criterion. Any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially).
› Observation 3: IF the lower envelope is removed from the (distance functions × time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR1 to TR7 over [tb, te] with critical time-points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree), and the 4r pruning band]
72
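› Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.,
$$PP_i(a) = \{P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ T_{min}(P) \le t_{i+1} - t_i\}$$
where $T_{min}(P)$ denotes the minimum time cost of traversing P.
› Example: if the pdf of selecting a possible path is uniform, then $\Pr(P_j) = 1 / |PP_i(a)|$ for each $P_j \in PP_i(a)$.
› Given a path $P_j \in PP_i(a)$, the possible locations of a moving object a with respect to Pj at $t \in [t_i, t_{i+1}]$ are the set of all positions $p \in P_j$ from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t), i.e.,
$$PL_{P_j}(a, t) = \{p \in P_j \mid T_{min}(p_i, p) \le t - t_i \wedge T_{min}(p, p_{i+1}) \le t_{i+1} - t\}$$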
73
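Combine the two probabilities…
[Figure: possible paths, and possible locations of object a when t = 4; point m on a possible path]
› Probability of falling inside segment [m, p1] at t = 4: 0.5 · 0.5 = 0.25
› Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr\bigl(p \in S \mid p \in PL_P(a, t)\bigr)$$
e.g., with a uniform pdf over the possible locations.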
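Combining path probabilities with possible-location probabilities is an application of the law of total probability. The sketch follows the slide's uniform-pdf example. It assumes that each path's possible locations and the segment S are given as offset intervals along that path. The function name and numbers are hypothetical; they mirror the 0.5 · 0.5 = 0.25 case.

```python
def prob_in_segment(paths):
    """Pr(a(t) in S) = sum over possible paths P of Pr(P) * Pr(a(t) in S | P).
    paths: list of (pr_path, (pl_start, pl_end), segment) where the possible locations
    PL and the segment S are offset intervals along that path (segment=None if S is
    not on the path). Assumes a uniform pdf over PL."""
    total = 0.0
    for pr_path, (pl0, pl1), seg in paths:
        if seg is None:
            continue
        if pl1 == pl0:                       # a single possible location
            total += pr_path if seg[0] <= pl0 <= seg[1] else 0.0
            continue
        overlap = max(0.0, min(pl1, seg[1]) - max(pl0, seg[0]))
        total += pr_path * overlap / (pl1 - pl0)
    return total


# Two equally likely paths; on the first one the segment covers half of the
# possible locations, the second path does not contain the segment.
print(prob_in_segment([(0.5, (2.0, 6.0), (4.0, 6.0)), (0.5, (1.0, 3.0), None)]))  # 0.25
```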
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– The probabilistic location function
› Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
› Data structures
– Edge hash-table: small, in-memory
– Movement R-tree: 1D, one for each edge
– Trajectories list: stores samples ordered by time-stamp; assigns a pointer to the set of possible paths; earliest-arriving and latest-departure times are stored for vertices; on-disk, retrieved on demand
76
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree rooted in q "bubbling" along road-network segments to vertices A to E, annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
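The network-expansion step can be sketched as Dijkstra's algorithm cut off at network distance r. The sketch assumes that q coincides with a vertex. It treats an edge as part of the expansion tree if at least one of its endpoints is reached. The graph and function names are hypothetical.

```python
import heapq
import math


def expansion(graph, q, r):
    """graph: node -> list of (neighbour, length). Dijkstra from q restricted to network
    distance < r; returns the reached vertices and the edges with at least one reached endpoint."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                        # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < r and nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    edges = sorted({tuple(sorted((u, v))) for u in dist for v, _ in graph.get(u, [])})
    return dist, edges


def undirected(edge_list):
    graph = {}
    for u, v, w in edge_list:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


roads = undirected([("q", "a", 2), ("a", "b", 3), ("q", "c", 4), ("c", "d", 5), ("d", "e", 1)])
print(expansion(roads, "q", 6))
```

The movement R-trees of the returned edges would then be searched in the filtering step.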
77
Temporal-continuous query
– Only calculate the qualification probabilities (QPs) at a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty – the flip-side
› Data reduction
– Why transmit every single location point? Bandwidth, energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink. "Speeding up the Douglas-Peucker Line-Simplification Algorithm". SSHD 1997
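The cited line-simplification problem is the one solved by the Douglas-Peucker algorithm. Below is a minimal recursive sketch of the basic algorithm, not the speed-up from the cited paper. It treats points as 2D and ignores time stamps. epsilon plays the role of the acceptable error.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    if a == b:
        return math.dist(p, a)
    (x0, y0), (x1, y1), (x2, y2) = p, a, b
    return abs((x2 - x1) * (y1 - y0) - (x1 - x0) * (y2 - y1)) / math.dist(a, b)


def douglas_peucker(points, epsilon):
    """Keep the endpoints; recurse on the farthest point if it deviates more than epsilon."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    index, dmax = max(
        ((i, perpendicular_distance(points[i], a, b)) for i in range(1, len(points) - 1)),
        key=lambda item: item[1],
    )
    if dmax <= epsilon:
        return [a, b]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right            # the split point appears only once


print(douglas_peucker([(0, 0), (1, 0.1), (2, 0), (2, 2)], 0.5))   # [(0, 0), (2, 0), (2, 2)]
```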
79
Uncertainty – the other flip-side
› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long can the data at the sink remain "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
› Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
› Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
82
Merge the approaches
› Idea
– Discretize time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009, 435-443
83
A highway example…
[Figure: position (pos) of a car on a highway over time (t), with a query region Q; the possible positions are bounded by the minimum and maximum speeds vmin and vmax]
› What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution of the possible positions, the car is in Q with probability 50% at one time instant and 30% at the other.
› Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65. However, treating the time instants independently admits combinations of positions that violate the speed constraints.
› Consideration of dependency: 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
50
Uncertain Spatial Data Management Summary
Motivation: flood of geo-spatial data; enriched with additional contexts (text, social, multimedia); inherent uncertainty
Data Cleaning: “best guess” answers; unreliable results; biased results
Paradigm of Equivalent Worlds: efficient solutions for the most prominent types of spatial queries; example: generating functions
Approximations: Monte-Carlo sampling; probabilistic guarantees
51
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
52
• Models
  • Motion
  • Location Uncertainty
• Queries
  • Part 1: NN (free-space motion)
    • Semantics and Processing
  • Part 2: Range (road-network constraints)
    • Semantics and Processing
• Uncertainty – the flip-side
53
Model of a trajectory
[Figure: a 3d-TRAJECTORY in (X, Y, Time) space up to the present time, and its projection onto the X-Y plane as a 2d-ROUTE]
Approximation does not capture acceleration/deceleration
Point objects. NOTE: the model needs to be augmented for objects with extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
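A minimal sketch of this point-object model, assuming a trajectory is stored as time-ordered (x, y, t) samples and positions in between are linearly interpolated (which is exactly why acceleration/deceleration is not captured). The function name and sample format are illustrative, not taken from the cited work.

```python
from bisect import bisect_right

def position_at(samples, t):
    """Interpolated (x, y) of a point object at time t.

    samples: list of (x, y, t) tuples sorted by t (illustrative format).
    Between two samples the object is assumed to move at constant velocity.
    """
    times = [s[2] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the trajectory's time span")
    i = bisect_right(times, t) - 1
    if i == len(samples) - 1:          # t equals the last timestamp
        return samples[-1][0], samples[-1][1]
    (x0, y0, t0), (x1, y1, t1) = samples[i], samples[i + 1]
    f = (t - t0) / (t1 - t0)           # fraction of the segment elapsed
    return x0 + f * (x1 - x0), y0 + f * (y1 - y0)

trajectory = [(0.0, 0.0, 0.0), (10.0, 0.0, 10.0), (10.0, 5.0, 20.0)]
print(position_at(trajectory, 5.0))   # (5.0, 0.0)
print(position_at(trajectory, 15.0))  # (10.0, 2.5)
```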
54
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic-distribution patterns + set of (to-be-visited) points => full trajectory
(location, time, velocity vector) updates
[Figure label: now ->]
55
Modeling Uncertainty: Location Uncertainty Models
Various sources' imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (a.k.a. space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
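The bead (space-time prism) between two exact samples can be tested point-wise: a location is possible at time t if it is reachable from the earlier sample and can still reach the later one, both at the maximum speed. A minimal sketch, assuming Euclidean free-space motion and a known speed bound v_max; `in_bead` and `in_necklace` are hypothetical helper names.

```python
import math

def in_bead(x, t, p_i, t_i, p_j, t_j, v_max):
    """True if location x is possible at time t between two exact samples.

    The bead is the intersection of the forward cone from (p_i, t_i) and
    the backward cone from (p_j, t_j); v_max is the assumed speed bound.
    """
    if not t_i <= t <= t_j:
        return False
    return (math.dist(x, p_i) <= v_max * (t - t_i) and
            math.dist(x, p_j) <= v_max * (t_j - t))

def in_necklace(x, t, samples, v_max):
    """A necklace is the union of the beads between consecutive samples.

    samples: list of ((x, y), t) pairs sorted by t (illustrative format).
    """
    return any(in_bead(x, t, p, s, q, u, v_max)
               for (p, s), (q, u) in zip(samples, samples[1:]))

samples = [((0.0, 0.0), 0.0), ((10.0, 0.0), 10.0)]
print(in_necklace((5.0, 5.0), 5.0, samples, v_max=1.5))  # True
print(in_necklace((5.0, 6.0), 5.0, samples, v_max=1.5))  # False
```

With v_max equal to the minimum speed needed to connect the two samples (here 1.0), the bead collapses to the straight segment between them.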
57
Modeling Uncertain Trajectories: Model 2
Constraints:
Motion along road network
Uncertainty restricted to road segments
[Figure: samples p1 (at t1) and p2 (at t2) on a road network, and a possible location q at time t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints (e.g., possible routes to be taken)
Processing Algorithms
- Impact of the motion+uncertainty coupling on filtering/pruning/refinement
60
Example: inside range (disk) around “Q” between t1 and t2
1. What is the probability of a path being taken?
2. What are the earliest/latest arrival times at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq, between [tb, te]
A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space; time instants T = tb, T = t1, T = te]
Given a collection of moving-object trajectories Tr1, Tr2, …, TrN
Example: assuming that Trq is the querying trajectory between [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq, between [tb, te]
[Figure: the previous example with uncertainty added; time instants T = tb, T = tb1, T = t11, T = t1, T = te]
Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level 1: (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm). Deeper levels refine a parent's interval, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), (Tri121, t11, t121, D121), …]
Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and the uncertainty disks of Tr1 to Tr5; circles of radius Rmin, Rmax, and Rd around Q]
Trq = 2D point Q; the rest are possible-location regions bounded by circles
Observation:
- Any object whose minimum distance from Q is > Rmax (the smallest maximum distance from Q over all objects) has a “0” probability of being the NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \Pr\big(\mathrm{dist}(Tr_i, Q) \le R_d\big)$$
The probability that a given object Trj is the NN of Q is given by
$$p_j = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{k \ne j} \big(1 - P_k(R_d)\big)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
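The integral above rarely has a closed form, so a Monte Carlo estimate is a convenient cross-check. A sketch, assuming uniform uncertainty disks (as in the figure) and a crisp query point; the helper names and the (center, radius) format are our own. It also computes the R_max bound and uses it to prune objects before sampling.

```python
import math
import random

def sample_disk(center, r, rng):
    """Draw a point uniformly from a disk (sqrt gives uniform area density)."""
    rho = r * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi)

def r_max(q, objects):
    """R_max: the smallest maximum distance from q over all objects."""
    return min(math.dist(q, c) + r for c, r in objects)

def nn_probabilities(q, objects, n_samples=20000, seed=1):
    """Monte Carlo estimate of Pr(object j is the NN of the crisp point q).

    objects: list of (center, radius) uniform uncertainty disks.
    Objects whose minimum distance exceeds R_max are pruned (probability 0).
    """
    bound = r_max(q, objects)
    cand = [j for j, (c, r) in enumerate(objects) if math.dist(q, c) - r <= bound]
    rng = random.Random(seed)
    wins = [0] * len(objects)
    for _ in range(n_samples):
        dists = {j: math.dist(q, sample_disk(*objects[j], rng)) for j in cand}
        wins[min(dists, key=dists.get)] += 1
    return [w / n_samples for w in wins]

objects = [((2.0, 0.0), 1.0), ((0.0, 3.0), 1.0), ((10.0, 0.0), 1.0)]
probs = nn_probabilities((0.0, 0.0), objects)
print([round(p, 2) for p in probs])
```

The third object lies entirely beyond R_max = 3, so, like Tr4 in the figure, it never wins.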
65
When Trq (the instantaneous location) has uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query region TrQ and Tr1 to Tr5 with radii Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer “prune” Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs of possible locations, pdf(Trq), pdf(Tr1), pdf(Tr2), over disks of diameter 2r in the x-y plane]
Example: assuming a uniform pdf of possible locations; the figure shows two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq
Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$\Pr\big(\lVert V_i - V_q \rVert \le R_d\big)$$
Define Viq = Vi - Vq (cross-correlation) as another random variable, Viq = Vi + (-Vq), which, due to independence, implies that the pdf of Viq is the convolution of the pdfs of Vi and -Vq
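For reference, the convolution and the target probability written out (a sketch in our own notation: f_{V_i}, f_{V_q}, f_{V_{iq}} are the pdfs, and V_i, V_q are assumed independent). Once f_{V_{iq}} is known, the quadruple integral reduces to a double integral over the disk of radius R_d around the origin.

```latex
f_{V_{iq}}(z) = \int_{\mathbb{R}^2} f_{V_i}(z + w)\, f_{V_q}(w)\, dw
\qquad
\Pr\big(\lVert V_i - V_q \rVert \le R_d\big) = \int_{\lVert z \rVert \le R_d} f_{V_{iq}}(z)\, dz
```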
67
[Figure: pdfs of the differences, pdf(Tr1 - Trq) and pdf(Tr2 - Trq), in the x-y plane; support of width 4r]
Let Viq = Vi + (-Vq)
Property 1: If pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
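Let Vx_iq = vx_i - vx_q and Vy_iq = vy_i - vy_q denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = Vx_{iq}^2 + Vy_{iq}^2, \quad B = 2\,(x_{iq}\, Vx_{iq} + y_{iq}\, Vy_{iq}), \quad C = x_{iq}^2 + y_{iq}^2$$
where (x_iq, y_iq) denotes the expected relative location at t = 0
(a hyperbola, since A > 0)
Now we have a collection of such distance functions (one for each i)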
69
The continuous aspect of the probabilistic NN queries
Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: Two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t); squaring both sides yields a quadratic equation in t). Call such pairwise intersections critical time-points
Example: [Figure: lower envelope LE12 of two distance-trajectories TR1 and TR2; dist on the vertical axis, time on the horizontal axis; critical time-points t11 and t12 after tb]
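Setting the squared distances equal gives a quadratic in t, which is why there are at most two critical time-points. A minimal sketch, assuming the expected relative positions move linearly (constant relative velocity); `dist_coeffs` returns the A, B, C coefficients of d_iq(t)^2, and all names are illustrative.

```python
import math

def dist_coeffs(x0, y0, vx, vy):
    """(A, B, C) with d(t)^2 = A t^2 + B t + C for the expected relative
    position (x0 + vx t, y0 + vy t), i.e. constant relative velocity."""
    return vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0

def distance(coeffs, t):
    a, b, c = coeffs
    return math.sqrt(max(a * t * t + b * t + c, 0.0))  # clamp rounding noise

def critical_time_points(ci, cj, tb, te, eps=1e-12):
    """Times in [tb, te] with d_i(t) = d_j(t). Squaring both (non-negative)
    sides gives a quadratic in t, hence at most two solutions. Identical
    functions (all coefficients equal) are reported as no crossing."""
    a, b, c = (u - w for u, w in zip(ci, cj))
    if abs(a) < eps:                      # degenerate: linear or constant
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]

c1 = dist_coeffs(-5.0, 1.0, 1.0, 0.0)   # passes close to the query
c2 = dist_coeffs(0.0, 3.0, 0.0, 0.0)    # static relative to the query
ts = critical_time_points(c1, c2, 0.0, 10.0)
print([round(t, 3) for t in ts])  # [2.172, 7.828]
```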
70
The continuous aspect of the probabilistic NN queries
Q: How to efficiently construct the lower envelope of the whole collection of distance functions?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: LE12 (lower envelope of TR1 and TR2, critical time-points t11, t12) and LE34 (lower envelope of TR3 and TR4, critical time-point t31) over [tb, te] are merged into LE1234, which introduces the new critical time-points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
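The merge-sort style construction can be sketched as follows, assuming each distance function is given by the (A, B, C) coefficients of its square (comparing squared distances preserves the order). For brevity, the piece lookup inside `merge` is a linear scan; an O(N log N) implementation would walk both envelopes with two pointers. All names are illustrative.

```python
import math

def sq(f, t):
    """Squared distance A t^2 + B t + C; same ordering as the distance."""
    a, b, c = f
    return a * t * t + b * t + c

def crossings(f, g, lo, hi, eps=1e-12):
    """Times strictly inside (lo, hi) where f and g intersect (at most two)."""
    a, b, c = (u - w for u, w in zip(f, g))
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        roots = [] if disc < 0 else [(-b - math.sqrt(disc)) / (2.0 * a),
                                     (-b + math.sqrt(disc)) / (2.0 * a)]
    return sorted(t for t in roots if lo < t < hi)

def merge(funcs, left, right):
    """Merge two lower envelopes, each a list of (start, end, index) pieces."""
    cuts = sorted({p for s, e, _ in left + right for p in (s, e)})
    out = []
    for s, e in zip(cuts, cuts[1:]):
        mid = (s + e) / 2.0
        i = next(k for s0, e0, k in left if s0 <= mid <= e0)
        j = next(k for s0, e0, k in right if s0 <= mid <= e0)
        pts = [s] + crossings(funcs[i], funcs[j], s, e) + [e]
        for u, w in zip(pts, pts[1:]):
            m = (u + w) / 2.0
            k = i if sq(funcs[i], m) <= sq(funcs[j], m) else j
            if out and out[-1][2] == k:
                out[-1] = (out[-1][0], w, k)   # extend the previous piece
            else:
                out.append((u, w, k))
    return out

def lower_envelope(funcs, tb, te, idx=None):
    """Divide and conquer: split the functions, recurse, merge the halves."""
    idx = list(range(len(funcs))) if idx is None else idx
    if len(idx) == 1:
        return [(tb, te, idx[0])]
    half = len(idx) // 2
    return merge(funcs, lower_envelope(funcs, tb, te, idx[:half]),
                 lower_envelope(funcs, tb, te, idx[half:]))

funcs = [(1.0, -10.0, 26.0), (0.0, 0.0, 9.0), (1.0, -16.0, 65.0)]
for start, end, k in lower_envelope(funcs, 0.0, 10.0):
    print(f"[{start:.3f}, {end:.3f}] -> TR{k + 1}")
# [0.000, 2.172] -> TR2
# [2.172, 6.500] -> TR1
# [6.500, 10.000] -> TR3
```

The piece boundaries of the result are exactly the critical time-points that survive on the envelope.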
71
The lower envelope provides a continuous-pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te] with time-points t1, t2, t3; the lower envelope, a band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
72
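Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e.,
$$PP_i(a) = \{\, P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mintime}(P) \le t_{i+1} - t_i \,\}$$
Example: if the pdf of selecting a possible path is uniform, then Pr(P) = 1 / |PPi(a)| for every P ∈ PPi(a)
Given a path Pj ∈ PPi(a), the Possible Locations of the moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all positions p ∈ Pj that a can reach from pi within the time period t - ti and from which it can reach pi+1 within ti+1 - t, i.e.,
$$PL_{P_j}(t) = \{\, p \in P_j \mid \mathrm{mintime}(p_i, p) \le t - t_i,\ \mathrm{mintime}(p, p_{i+1}) \le t_{i+1} - t \,\}$$
where mintime(·) denotes the minimum travel time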
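Both definitions can be sketched directly. This assumes a small road graph given as a dict of minimum travel times (hypothetical format) and measures positions along a path by the minimum travel time from pi; the depth-first enumeration is exponential in the worst case and only meant to illustrate the definition.

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths src -> dst whose minimum travel time is <= budget.

    graph: {vertex: {neighbor: min_travel_time}} (illustrative format).
    Depth-first search that abandons a branch once it exceeds the budget.
    """
    paths, stack = [], [(src, [src], 0.0)]
    while stack:
        v, path, cost = stack.pop()
        if v == dst:
            paths.append((path, cost))
            continue
        for w, c in graph[v].items():
            if w not in path and cost + c <= budget:
                stack.append((w, path + [w], cost + c))
    return paths

def possible_locations(path_cost, t, t_i, t_j):
    """Offsets s along a path (minimum travel time from p_i) where the object
    can be at time t: s <= t - t_i (reachable from p_i) and
    path_cost - s <= t_j - t (can still reach p_{i+1}). None if empty."""
    lo = max(0.0, path_cost - (t_j - t))
    hi = min(path_cost, t - t_i)
    return (lo, hi) if lo <= hi else None

road = {"p": {"a": 2.0, "b": 3.0}, "a": {"q": 4.0}, "b": {"q": 5.0}, "q": {}}
for path, cost in possible_paths(road, "p", "q", budget=8.0):
    print(path, cost, possible_locations(cost, 4.0, 0.0, 8.0))
```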
73
Combine the two probabilities…
[Figure: possible paths, and possible locations of object a when t = 4; point m on the path towards p1]
Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr\big(a(t) \in S\big) = \sum_{P \in PP_i(a)} \Pr(P) \cdot \Pr\big(PL_P(t) \in S\big)$$
(e.g., with a uniform pdf)
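Combining the two probabilities is a weighted sum over the possible paths. A sketch, assuming a uniform pdf over each possible-location interval and, for simplicity, a segment S that lies on a single path; the input format and function name are hypothetical. The usage line reproduces the 0.5 · 0.5 = 0.25 example.

```python
def prob_in_segment(paths, segment):
    """Pr(a(t) in S) = sum over possible paths P of Pr(P) * Pr(PL_P(t) in S).

    paths: list of (path_prob, path_id, (lo, hi)), where (lo, hi) is the
    possible-location interval on that path at time t (illustrative format).
    segment: (path_id, s_lo, s_hi); locations are assumed uniform on (lo, hi).
    """
    seg_path, s_lo, s_hi = segment
    total = 0.0
    for p_prob, pid, (lo, hi) in paths:
        if pid != seg_path:
            continue                      # S does not lie on this path
        if hi == lo:                      # a single possible position
            frac = 1.0 if s_lo <= lo <= s_hi else 0.0
        else:
            overlap = max(0.0, min(hi, s_hi) - max(lo, s_lo))
            frac = overlap / (hi - lo)
        total += p_prob * frac
    return total

# Two equally likely possible paths (uniform path pdf); S covers half of
# the possible-location interval on the first one.
paths = [(0.5, "P1", (0.0, 2.0)), (0.5, "P2", (0.0, 3.0))]
print(prob_in_segment(paths, ("P1", 1.0, 2.0)))  # 0.25
```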
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need:
  – The collection of all the possible paths (and their probabilities)
  – The Probabilistic Location Function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data Structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, for each edge
  – Trajectories list
    › Store samples ordered by time-stamp
    › Assign pointer to set of possible paths
    › Earliest-arriving and latest-departure times stored for vertices
    › On-disk; retrieve on demand
76
› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time interval intersects tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree “bubbling” along road-network segments around q (vertices A to E), annotated with earliest/latest times of arrival; also shown is a possible path that is definitely not part of the answer to the range query]
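The network-expansion step is essentially Dijkstra's algorithm stopped at distance r. A minimal sketch, assuming an adjacency dict with edge lengths (hypothetical format); it returns the reached vertices together with their parent in the expansion tree. Edges leaving a reached vertex may be only partially within r; a full implementation would keep those partial edges as well.

```python
import heapq

def expansion_tree(graph, q, r):
    """Bounded Dijkstra from q. Returns {vertex: (network_distance, parent)}
    for every vertex closer than r; (parent, vertex) pairs are the tree edges.

    graph: {vertex: {neighbor: edge_length}} (illustrative format).
    """
    best = {q: (0.0, None)}
    heap = [(0.0, q)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > best[v][0]:
            continue                       # stale heap entry
        for w, length in graph[v].items():
            nd = d + length
            if nd < r and (w not in best or nd < best[w][0]):
                best[w] = (nd, v)
                heapq.heappush(heap, (nd, w))
    return best

edges = [("q", "A", 2.0), ("q", "B", 3.0), ("A", "C", 2.0), ("B", "D", 4.0), ("C", "E", 5.0)]
network = {}
for u, v, length in edges:                 # undirected road network
    network.setdefault(u, {})[v] = length
    network.setdefault(v, {})[u] = length
tree = expansion_tree(network, "q", r=5.0)
print(sorted(tree))  # ['A', 'B', 'C', 'q']
```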
77
Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty – the flip-side
› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: Speeding up Douglas-Peucker Line-simplification Algorithm. SSHD 1997
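For reference, a plain recursive Douglas-Peucker sketch with an acceptable error eps. This is the classic algorithm, not the speed-up from the cited paper, and it simplifies the 2D route only; handling time properly (e.g., with a time-synchronized distance) would be a further assumption.

```python
import math

def point_segment_dist(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))               # clamp the projection to the segment
    return math.dist(p, (ax + u * dx, ay + u * dy))

def douglas_peucker(points, eps):
    """Keep the point farthest from the chord if it is more than eps away
    and recurse on both halves; otherwise drop all interior points."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_dist(p, points[0], points[-1]) for p in points[1:-1]]
    k = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[k - 1] <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: k + 1], eps)
    return left[:-1] + douglas_peucker(points[k:], eps)

route = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 0)]
print(douglas_peucker(route, 0.5))  # [(0, 0), (2, 0), (3, 3), (4, 0)]
```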
79
Uncertainty – the other flip-side
› What are the important properties that should not be distorted by reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just a “Z” axis… one cannot travel back in time
  – How long can the data at the sink be “un-fresh”?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension
vs.
Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension
82
Merge the approaches
› Idea
  – Discretize time
  – For each time, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parameterized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443
83
A highway example…
[Figure: position (pos) over time (t) of a car on a highway, with a query region Q]
What is the probability that the car is in some area at least once during some time interval?
84
A highway example…
[Figure: as before, with the possible positions bounded by the minimum and maximum speeds vmin and vmax]
What is the probability that the car is in some area at least once during some time interval?
85
A highway example…
[Figure: possible positions bounded by vmin and vmax, with query region Q]
Assuming a uniform distribution…
86
A highway example…
[Figure: query region Q, with probabilities 50% and 30% of the car being inside at two time instants]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
87
A highway example…
[Figure: query region Q, with probabilities 50% and 30% of the car being inside at two time instants]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Violation of the speed constraints
88
A highway example…
[Figure: query region Q, with probabilities 50% and 30% of the car being inside at two time instants]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Consideration of dependency: 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
J. L. Doob (1953): Stochastic Processes. Wiley
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: transition probabilities 0.4 / 0.2 / 0.4 (left / stay / right) for an interior hole; 0.6 / 0.4 (move inward / stay) at the border]
92
A simple example
› Initial position: bucket 3
› After first hit: 0.4 | 0.2 | 0.4 (buckets 2 to 4)
› After second hit: 0.16 | 0.16 | 0.36 | 0.16 | 0.16 (buckets 1 to 5)
› After 40th hit: 0.08 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.08 (buckets 1 to 9)
93
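How can we model this?
› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket; columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$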
94
How can we model this?
› First hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› Second hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) · M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› 40th hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M^40 ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
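The hit-by-hit evolution is a repeated row-vector times matrix product. A pure-Python sketch reproducing the first two hits; `board_matrix` is an illustrative helper that builds the 9-bucket matrix M from the probabilities of the example.

```python
def step(dist, M):
    """One hit: row vector times transition matrix (rows = from, cols = to)."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]

def board_matrix(n=9):
    """Interior holes: stay 0.2, move left/right 0.4 each.
    Border holes: stay 0.4, move inward 0.6."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[0][0], M[0][1] = 0.4, 0.6
        elif i == n - 1:
            M[i][i - 1], M[i][i] = 0.6, 0.4
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

M = board_matrix()
v0 = [0.0, 0.0, 1.0] + [0.0] * 6        # ball starts in bucket 3
v1 = step(v0, M)
v2 = step(v1, M)
v40 = v0
for _ in range(40):
    v40 = step(v40, M)
print([round(x, 2) for x in v1])  # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in v2])  # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in v40])
```

As the number of hits grows, the vector converges to the stationary distribution (0.08, 0.12, …, 0.12, 0.08), which follows from detailed balance: 0.6 · π1 = 0.4 · π2 at each border.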
95
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
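The slide does not prescribe an estimator; a natural choice, assumed here, is the maximum-likelihood estimate obtained by counting observed transitions between consecutive ticks. Storing only the observed transitions keeps the matrix sparse; fitting separate matrices per time slot or object group would follow the same pattern. The function name and input format are illustrative.

```python
from collections import Counter, defaultdict

def estimate_transitions(sequences):
    """Maximum-likelihood estimate of P(next state = b | current state = a).

    sequences: historical state sequences, one state per tick. Only observed
    transitions are stored, so the result is a sparse dict of dicts.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    M = {}
    for a, row in counts.items():
        total = sum(row.values())
        M[a] = {b: n / total for b, n in row.items()}
    return M

history = [["s1", "s3", "s2", "s1"], ["s1", "s3", "s3", "s2"]]
M = estimate_transitions(history)
print(M["s3"])
```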
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state-transition graph over s1, s2, s3 corresponding to M]
Note: there is an exponential number of possible paths the car might take
ST - Window Queries
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3; the car starts in s1]
› t = 0: (1, 0, 0)
› t = 1: (1, 0, 0) · M = (0, 0, 1)
› t = 2: (0, 0, 1) · M = (0, 0.8, 0.2)
› t = 3: (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36)
› Worlds that are in s1 or s2 at t = 2 (probability 0.8) are already results; propagating only the remaining mass (0, 0, 0.2) gives (0, 0.16, 0.04) at t = 3. Only 0.04 of the worlds avoid s1 and s2 throughout T = [2, 3], so Result = 1 - 0.04 = 0.96
› The solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
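Add an absorbing state s4 for the winner trajectories. M- is applied for transitions into ticks outside T, and M+ for transitions into ticks inside T, where every transition into s1 or s2 is redirected to s4:
$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$
(state order: s1, s2, s3, s4)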
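The two-matrix scheme takes only a few lines of code. A sketch, assuming ticks are numbered from t = 0, the window is given as a set of state indices plus an interval T = [t1, t2], and M- / M+ are built as described above; `window_prob` and `vec_mat` are illustrative names. For the example it returns 0.96.

```python
def vec_mat(v, M):
    """Row vector times matrix (rows of M = from-state, columns = to-state)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def window_prob(M, start, window, T):
    """Pr(the object is in one of the `window` states at some tick in T = [t1, t2]).

    An absorbing 'hit' state (index n) collects the winner trajectories.
    M_minus is used for transitions into ticks outside T, M_plus inside T.
    """
    n = len(M)
    t1, t2 = T
    M_minus = [list(row) + [0.0] for row in M] + [[0.0] * n + [1.0]]
    M_plus = []
    for row in M:
        redirected = [0.0 if j in window else p for j, p in enumerate(row)]
        redirected.append(sum(p for j, p in enumerate(row) if j in window))
        M_plus.append(redirected)
    M_plus.append([0.0] * n + [1.0])
    v = list(start) + [0.0]
    if t1 == 0:                            # worlds already inside at t = 0
        hit = sum(v[j] for j in window)
        v = [0.0 if j in window else x for j, x in enumerate(v[:n])] + [hit]
    for t in range(1, t2 + 1):
        v = vec_mat(v, M_plus if t >= t1 else M_minus)
    return v[n], v

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
p, v = window_prob(M, [1.0, 0.0, 0.0], window={0, 1}, T=(2, 3))
print(round(p, 2))  # 0.96
```

Because only sparse matrix-vector products are involved, the cost grows linearly with the number of ticks instead of with the exponential number of paths.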
104
Multiple Observations
› So far we had only one observation, from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space vs. time; left: a single observation at t0; right: observations at t0 and t1]
105
Multiple Observations
› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
Multiple Observations
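[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
Classes of equivalent worlds, in this order: S1-, S2-, S3- (window not intersected, ¬∎) and S1+, S2+, S3+ (window intersected, ∎)
$$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}, \qquad M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
› t = 0: (1, 0, 0, 0, 0, 0)
› t = 1 (M-): (0, 0, 1, 0, 0, 0)
› t = 2 (M+): (0, 0, 0.2, 0, 0.8, 0)
› t = 3 (M-): (0, 0.16, 0.04, 0.48, 0, 0.32)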
107
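› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' Theorem, with O denoting this observation:
$$P(\blacksquare \mid O) = \frac{P(O \mid \blacksquare)\, P(\blacksquare)}{P(O)} = \frac{P(\blacksquare \wedge O)}{P(O)} = \frac{P(O \wedge \blacksquare)}{P(O \wedge \blacksquare) + P(O \wedge \neg\blacksquare)}$$
With the distribution (0, 0.16, 0.04, 0.48, 0, 0.32) over S1-, S2-, S3-, S1+, S2+, S3+, we have P(O ∧ ∎) = P(S3+) = 0.32 and P(O ∧ ¬∎) = P(S3-) = 0.04:
$$P(\blacksquare \mid O) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$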
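The same computation with the 2|S| classes, as a sketch: both matrices are built from M, and the set of window ticks is a parameter. Using {2} as the window tick, which is our reading of the transition sequence above, reproduces the vector (0, 0.16, 0.04, 0.48, 0, 0.32); with T = [2, 3] the 0.16 would move from S2- to S2+, and the conditional probability 0.32 / (0.32 + 0.04) is the same either way. Names are illustrative.

```python
def step(v, M):
    """Row vector times square matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(v))]

def class_matrices(M, window):
    """2n x 2n matrices over the classes S1-..Sn-, S1+..Sn+.

    M_minus = diag(M, M): outside the window ticks, classes are kept.
    M_plus: a '-' world moving into a window state becomes '+';
    '+' worlds stay '+'.
    """
    n = len(M)
    zero = [0.0] * n
    M_minus = [list(r) + zero for r in M] + [zero + list(r) for r in M]
    M_plus = [[0.0 if j in window else p for j, p in enumerate(r)] +
              [p if j in window else 0.0 for j, p in enumerate(r)] for r in M]
    M_plus += [zero + list(r) for r in M]
    return M_minus, M_plus

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
M_minus, M_plus = class_matrices(M, window={0, 1})    # window states s1, s2
window_ticks = {2}                                    # assumed window tick
v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]                    # t = 0: in s1, class S1-
for t in range(1, 4):
    v = step(v, M_plus if t in window_ticks else M_minus)
# Observation O: the object is seen in s3 at t = 3 -> compare S3+ with S3-.
p_hit_given_obs = v[5] / (v[5] + v[2])
print([round(x, 2) for x in v])   # [0.0, 0.16, 0.04, 0.48, 0.0, 0.32]
print(round(p_hit_given_obs, 2))  # 0.89
```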
108
Summary
› Pros
  – Allows answering time-parameterized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the “intelligence” and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index over the (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
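A naive Monte-Carlo sketch for the earlier window query, drawing possible worlds directly from the Markov chain. It handles only the single initial observation; conditioning the samples on later observations requires the transition-matrix adaptation of the cited paper, which is not shown here. With 20000 samples the estimate should land close to the exact 0.96. Names and the seed are illustrative.

```python
import random

def sample_world(M, start_state, ticks, rng):
    """Draw one possible world: the states at t = 0, 1, ..., ticks."""
    states = [start_state]
    for _ in range(ticks):
        states.append(rng.choices(range(len(M)), weights=M[states[-1]])[0])
    return states

def mc_window_prob(M, start_state, window, T, n=20000, seed=7):
    """Result probability ~ (#sampled worlds satisfying the query) / n."""
    rng = random.Random(seed)
    t1, t2 = T
    hits = 0
    for _ in range(n):
        world = sample_world(M, start_state, t2, rng)
        hits += any(s in window for s in world[t1:t2 + 1])
    return hits / n

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
estimate = mc_window_prob(M, 0, window={0, 1}, T=(2, 3))
print(estimate)
```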
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob (1953): Stochastic Processes. Wiley
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
51
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
52
bull Modelsbull Motionbull Location Uncertainty
bull Queriesbull Part 1 NN (free-space motion)
bull Semantics and Processing
bull Part 2 Range (road-networks constraints)bull Semantics and Processing
bull Uncertainty ndash the flip-side
53
Model of a trajectory
Y
X
Time
Present time
2d-ROUTE
3d-TRAJECTORY
Approximation does not capture accelerationdeceleration
Point Objects NOTE the model needs to be augmented for objects with extent eg hurricanes
Bart Kuijpers Harvey J Miller Walied Othman Kinetic space-time prisms GIS 2011
54
Mobility Models
sequence of (locationtime)updates (eg GPS-based)
Electronic maps + traffic distributionpatterns + set of (to be visited) point=gt full trajectory
(locationtime Velocity Vector )updates
now -gt
55
Modeling UncertaintyLocation Uncertainty Models
Various Sourcesrsquo Imprecision(GPS Sensors)
Quest for data reduction
Ralph Lange Harald Weinschrott Lars Geiger Andreacute Blessing Frank Duumlrr Kurt Rothermel Hinrich Schuumltze On a Generic Uncertainty Model for Position Information QuaCon 2009
56
Modeling Uncertain Trajectories
bull Model 1Cones Beads Necklaces (AKA space-time prisms)
bull Assumed exact location samplesbull Unknown (but bounded) maximum speed in-betweensamples
Reynold Cheng Sunil Prabhakar Dmitri V Kalashnikov Querying Imprecise Data in Moving Object Environments ICDE 2003G Trajcevski A Choudhary O Wolfson G Li Uncertan Range Queries for Necklases MDM 2010
57
Modeling Uncertain TrajectoriesModel 2
Constraints
Motion along road network
Uncertainty restricted to Road Segments
p1 p2
q
t2t1
t
Bart Kuijpers Walied Othman Modeling uncertainty of moving objects on road networks via space-time prisms International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain TrajectoriesModel 3Sheared Cylinders
Constant bound on location-error at anytime-instant
G Trajcevski O Wolfson K Hinrichs S Chamberlain Managing Moving Objects Databases with Ucertainty ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly considerSyntax- The impact of uncertainty on the (structure of the) answer- The impact of constraints
Processing Algorithms- Impact of the motion+uncertainty coupling on filteringprunningrefinement
eg possible routes to be taken
6060
Example inside range (disk) around ldquoQrdquo between t1 and t2
1 What is the probability of a path taken
2 What is the earliestlatest arrival at a given vertex (consequently along edge)
Kai Zheng Goce Trajcevski Xiaofang Zhou Peter Scheuermann Probabilistic range queries for uncertain trajectories on road networks EDBT 2011
61
Continuous NN for trajectories
Q_nn(q) Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tbte]
A-nn(q) [(Tri1 [tbt1]) (Tri2[t1t2]) hellip (Trim[tm-1te])]
T = tb
T = te
X
Y
T = t1
Trq Tr1 Tr2Tr3
Given a collection of moving objects trajectories Tr1 Tr2 hellip TrN
Example assuming that Trq is the querying trajectory between [tbte] then- Tr1 (black) is the NN throughout [tbt1]- Tr2 (red) is the NN throughout [t1te]
Observation The answer to the query is time-paremeterized
Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002
62
Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant
UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]
T = te
X
YT = tb
T = t1
Trq Tr1 Tr2Tr3
T = tb1
T = t11
Observations
adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability
= what should the structure ofthe answer to UQ_nn(q) look like
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-basedProbabilistic Answer to a Continuous NN query) tree
Tri11 tb t11 D11
Tri12l t12(l-1) t12 D12k12
[TrQ tb te]
Tri1 tb t1 D1 Tri2 t1 t2 D2 Trim t(m-1) te Dm
Tri12 t11 t12 D12 Tri1k1 t1(k-1) t11 D1k1 Trik1 t(k-1) t21 Dm1 Trim1 t(m1-1) te Dmkm
Tri121 t11 t121 D121
Root parameters of the query-Querying trajectory Trq - Time-interval of interest [tbte]
Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)
Rmin gt Rmax
Rmin
Rmax
Q
Tr1
Tr2
Tr3
Tr4
Tr5
Rd1
1
14
Trq = 2D point Q The rest are possible-locations bounded by circles
Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN
The probability that the location of Tri is within distance Rd from Q is given by
The probability that a given object Trj is the NN of Q is given by
ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)
65
When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example
Rmin
Rmax
TrQ
Tr1
Tr2
Tr3
Tr4
Tr5
Z1
Z2
dist(Z1Z2) lt Rmax
We can no longer ldquoprunerdquo Tr4 from consideration
The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq
NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip
66
[Figure: uniform location pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) in the x–y plane (extent 2r).]
Example assuming a uniform pdf of possible locations; the figure shows two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq.
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $\Pr(\lVert V_i - V_q \rVert \le R_d)$.
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq); due to independence, the pdf of Viq is the convolution of the pdfs of Vi and −Vq.
67
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq) in the x–y plane; the difference pdfs have extent 4r.]
Let Viq = Vi + (-Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq).
Property 2: Assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
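Because Viq = Vi + (−Vq), the within-distance probability can also be estimated by sampling the difference variable directly. The sketch below assumes independent uniform disks for Tri and Trq and uses plain Monte Carlo in place of the analytic convolution (function names are illustrative); it also returns the sample mean of Viq, which by Property 1 should approach Ci − Cq.

```python
import math
import random


def sample_disk(rng, cx, cy, rad):
    """Uniform sample from a disk (the square root keeps the density uniform in area)."""
    rho = rad * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)


def prob_within(obj, query, rd, n=50_000, seed=7):
    """Monte Carlo estimate of Pr(|V_i - V_q| <= rd) for independent uniform disks,
    plus the sample mean of V_iq = V_i + (-V_q)."""
    rng = random.Random(seed)
    hits, sx, sy = 0, 0.0, 0.0
    for _ in range(n):
        xi, yi = sample_disk(rng, *obj)
        xq, yq = sample_disk(rng, *query)
        dx, dy = xi - xq, yi - yq          # one sample of V_iq
        sx, sy = sx + dx, sy + dy
        if dx * dx + dy * dy <= rd * rd:
            hits += 1
    return hits / n, (sx / n, sy / n)


p, mean = prob_within((3.0, 0.0, 1.0), (0.0, 0.0, 1.0), rd=3.0)
print(p, mean)   # the mean should be close to C_i - C_q = (3, 0)
```

With both disks of radius r, every sample of Viq lies within 2r of Ciq, which is why the difference pdfs in the figure have extent 4r.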
68
The continuous aspect of the probabilistic NN queries
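Let Vxiq = vxi − vxq and Vyiq = vyi − vyq denote the corresponding components of the velocity of the object whose expected location moves along TRiq.
Then the distance of the expected location along TRiq from the origin, as a function of t, is $d_{iq}(t) = \sqrt{A t^2 + B t + C}$ with $A = V_{xiq}^2 + V_{yiq}^2$, $B = 2(x_0 V_{xiq} + y_0 V_{yiq})$ and $C = x_0^2 + y_0^2$, where $(x_0, y_0)$ is the expected relative location at t = 0:
a hyperbola, since A > 0.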
Now we have a collection of such distance functions (for each i)
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for the set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0).
Observation: two distance functions diq(t) and djq(t) can intersect at most twice within the query's time interval of interest [tb, te] (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time points.
Example: [Figure: lower envelope LE12 of two distance functions TR1 and TR2 starting at tb, with critical time points t11 and t12; time on the horizontal axis, distance on the vertical axis.]
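The "at most twice" bound follows because both distances are non-negative, so diq(t) = djq(t) is equivalent to diq(t)² = djq(t)², a polynomial equation of degree at most 2. A minimal sketch, assuming linear motion of the expected locations relative to the query (function names and the example objects are illustrative):

```python
import math


def dist_coeffs(x0, y0, vx, vy):
    """(A, B, C) with d(t)**2 = A*t**2 + B*t + C for the relative expected
    position (x0 + vx*t, y0 + vy*t) of an object with respect to the query."""
    return vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0


def critical_times(fi, fj, tb, te, eps=1e-12):
    """Times in [tb, te] where d_iq(t) == d_jq(t), i.e. roots of d_i^2 - d_j^2 = 0."""
    a, b, c = fi[0] - fj[0], fi[1] - fj[1], fi[2] - fj[2]
    if abs(a) < eps:                      # degree <= 1 (identical functions: no roots reported)
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
    return sorted({t for t in roots if tb <= t <= te})


# Object 1 passes the query from left to right; object 2 stays at distance 3.
f1 = dist_coeffs(-5.0, 1.0, 1.0, 0.0)   # d^2 = t^2 - 10 t + 26
f2 = dist_coeffs(0.0, 3.0, 0.0, 0.0)    # d^2 = 9
print(critical_times(f1, f2, 0.0, 10.0))  # two crossings, at 5 - sqrt(8) and 5 + sqrt(8)
```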
70
The continuous aspect of the probabilistic NN queries
Q: How can we efficiently construct the lower envelope of the whole collection of distance functions?
A: A divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: LE12 (from TR1 and TR2, critical points t11, t12) and LE34 (from TR3 and TR4, critical point t31) over [tb, te] are merged into LE1234, which introduces the new critical points t1new and t2new.]
Complexity of the lower-envelope construction: O(N log N) (N = number of trajectories).
Now, how about the IPAC-NN tree?
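The merge-sort structure can be sketched as follows, assuming each distance function is given by its coefficients (A, B, C) of d(t)² = At² + Bt + C. Envelopes are lists of (start, end, id) pieces; merging two envelopes cuts the time axis at every breakpoint of either input plus every crossing of the two active functions, then keeps the lower one on each piece. All names are illustrative, and the linear `_active` lookup means this simple version does not attain the O(N log N) bound from the slide.

```python
import math


def _crossings(f, g, lo, hi, eps=1e-12):
    """Critical time points of f and g strictly inside (lo, hi)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        roots = [] if disc < 0 else [(-b - math.sqrt(disc)) / (2 * a),
                                     (-b + math.sqrt(disc)) / (2 * a)]
    return sorted({t for t in roots if lo < t < hi})


def _dist(f, t):
    return math.sqrt(max(f[0] * t * t + f[1] * t + f[2], 0.0))


def _active(env, t):
    """Id of the function that forms the envelope at time t."""
    for start, end, idx in env:
        if start <= t < end:
            return idx
    return env[-1][2]


def _merge(env1, env2, funcs):
    cuts = sorted({p for s, e, _ in env1 + env2 for p in (s, e)})
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        i, j = _active(env1, mid), _active(env2, mid)
        pts = [lo] + _crossings(funcs[i], funcs[j], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            m = (a + b) / 2
            k = i if _dist(funcs[i], m) <= _dist(funcs[j], m) else j
            if out and out[-1][2] == k:
                out[-1] = (out[-1][0], b, k)      # extend the current piece
            else:
                out.append((a, b, k))
    return out


def lower_envelope(ids, funcs, tb, te):
    """Divide and conquer over the ids, as in Merge-Sort."""
    if len(ids) == 1:
        return [(tb, te, ids[0])]
    half = len(ids) // 2
    return _merge(lower_envelope(ids[:half], funcs, tb, te),
                  lower_envelope(ids[half:], funcs, tb, te), funcs)


funcs = [(1.0, -10.0, 26.0), (0.0, 0.0, 9.0), (0.0, 0.0, 16.0)]
env = lower_envelope([0, 1, 2], funcs, 0.0, 10.0)
print(env)   # pieces (start, end, id); function 2 never appears
```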
71
The lower envelope provides a continuous pruning criterion: any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance functions × time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR1–TR7 over [tb, te] with times t1, t2, t3; the lower envelope, a band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree).]
72
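Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi between ti and ti+1 consists of all paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time cost is not greater than ti+1 − ti, i.e., $PP_i(a) = \{P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{cost}_{min}(P) \le t_{i+1} - t_i\}$.
Example: if the pdf of selecting a possible path is uniform, each path in PPi(a) is chosen with probability 1/|PPi(a)|.
Given a path Pj ∈ PPi(a), the possible locations of moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all positions p ∈ Pj from which a can reach pi (respectively pi+1) within time t − ti (respectively ti+1 − t), i.e., $PL_j(t) = \{p \in P_j \mid \mathrm{cost}_{min}(p_i, p) \le t - t_i \wedge \mathrm{cost}_{min}(p, p_{i+1}) \le t_{i+1} - t\}$.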
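Both definitions can be sketched in a few lines, assuming that the minimum time cost of a stretch of road is its length divided by a single maximum speed vmax and that only simple paths are considered (both are assumptions for illustration; all names and the toy graph are hypothetical). Possible locations on a path are returned as an interval of offsets measured from pi along that path.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency lists from (u, v, length) triples."""
    g = defaultdict(list)
    for u, v, length in edges:
        g[u].append((v, length))
        g[v].append((u, length))
    return g


def possible_paths(g, src, dst, max_len):
    """All simple paths src -> dst with total length <= max_len (= vmax * (t_{i+1} - t_i))."""
    found = []

    def dfs(node, path, length):
        if node == dst:
            found.append((tuple(path), length))
            return
        for nxt, w in g[node]:
            if nxt not in path and length + w <= max_len:   # prune: too slow to make it
                path.append(nxt)
                dfs(nxt, path, length + w)
                path.pop()

    dfs(src, [src], 0.0)
    return found


def possible_locations(path_len, vmax, ti, ti1, t):
    """Offsets x from p_i reachable from p_i by time t and from which p_{i+1} is reachable by t_{i+1}."""
    lo = max(0.0, path_len - vmax * (ti1 - t))
    hi = min(path_len, vmax * (t - ti))
    return (lo, hi) if lo <= hi else None


edges = [("A", "B", 2), ("B", "D", 2), ("A", "C", 3), ("C", "D", 3), ("A", "E", 5), ("E", "D", 5)]
g = build_graph(edges)
vmax, ti, ti1 = 1.0, 0.0, 8.0
paths = possible_paths(g, "A", "D", vmax * (ti1 - ti))
for nodes, length in paths:
    prob = 1.0 / len(paths)          # uniform choice among the possible paths
    print(nodes, prob, possible_locations(length, vmax, ti, ti1, t=4.0))
```

The route through E is too long to be driven within ti+1 − ti and is therefore not a possible path.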
73
Combine the two probabilities…
[Figure: possible paths and possible locations of object a when t = 4 (labels a, m).]
Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\big(a(t) \in S \cap PL_p(t) \mid p\big)$, e.g., with a uniform pdf over $PL_p(t)$.
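A minimal sketch of this sum, assuming a uniform pdf over each path's possible-location interval and representing both PL_p(t) and S as intervals of offsets along the path (names are illustrative; degenerate, single-point location sets are skipped). The toy numbers are chosen to mirror the 0.5 · 0.5 example rather than taken from the figure.

```python
def overlap(a, b):
    """Length of the overlap of two closed intervals; None stands for an empty interval."""
    if a is None or b is None:
        return 0.0
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def prob_in_segment(candidates):
    """candidates: (Pr(path), PL_p(t) interval, S interval on that path or None)."""
    total = 0.0
    for p_path, pl, seg in candidates:
        width = pl[1] - pl[0]
        if width > 0:                                    # uniform pdf over PL_p(t)
            total += p_path * overlap(pl, seg) / width
    return total


candidates = [
    (0.5, (0.0, 2.0), (1.0, 2.0)),   # S covers half of this path's possible locations
    (0.5, (2.0, 4.0), None),         # S does not lie on the other path
]
print(prob_in_segment(candidates))
```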
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– the collection of all possible paths (and their probabilities)
– a probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At time instant t = 4, the object can be in certain segments either along edge v1v5 or along edge(s) v1v3 (+ v3v4).
75
Processing
› Data structures
  – Edge hash table
    › Small, in memory
  – Movement R-tree
    › 1D, one for each edge
  – Trajectories list
    › Store samples ordered by timestamp
    › Assign a pointer to the set of possible paths
    › Earliest-arrival and latest-departure times stored for vertices
    › On disk, retrieved on demand
76
› Network expansion
  – Expansion tree: a tree rooted at q that contains the edges whose network distance to q is smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time interval intersects tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" from q along road-network segments A–E, with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query.]
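The network-expansion step is essentially a Dijkstra search cut off at distance r. A minimal sketch, assuming an adjacency-list graph with positive edge lengths (names are illustrative); as a simplification it only reports vertices strictly closer than r and the tree edges used to reach them, so edges that are only partially within r at the boundary are not handled.

```python
import heapq


def expansion_tree(graph, q, r):
    """Bounded Dijkstra from q: {vertex: network distance} for vertices closer than r,
    plus the (parent, child) edges of the expansion tree."""
    dist = {q: 0.0}
    settled, tree_edges = set(), []
    heap = [(0.0, q, None)]
    while heap:
        d, u, parent = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if parent is not None:
            tree_edges.append((parent, u))
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v, u))
    return {u: dist[u] for u in settled}, tree_edges


graph = {
    "q": [("A", 1.0), ("C", 4.0)],
    "A": [("q", 1.0), ("B", 2.0)],
    "B": [("A", 2.0)],
    "C": [("q", 4.0), ("D", 1.0)],
    "D": [("C", 1.0)],
}
reached, tree = expansion_tree(graph, "q", 3.5)
print(reached, tree)
```

The movement R-trees of the reached edges would then be probed in the filtering step.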
77
Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
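Monotonicity between consecutive critical instants means the two neighbouring QP values bound the QP at any time in between. A minimal sketch of that envelope lookup (function name and example values are illustrative):

```python
import bisect


def qp_bounds(crit_times, crit_qps, t):
    """(lower, upper) bound on the QP at time t from the QPs at sorted critical instants."""
    if not crit_times[0] <= t <= crit_times[-1]:
        raise ValueError("t lies outside the critical time instants")
    k = bisect.bisect_left(crit_times, t)
    if crit_times[k] == t:
        return crit_qps[k], crit_qps[k]          # QP computed exactly here
    left, right = crit_qps[k - 1], crit_qps[k]
    return min(left, right), max(left, right)    # QP is monotonic in between


times = [0.0, 2.0, 5.0]
qps = [0.0, 0.6, 0.3]
print(qp_bounds(times, qps, 1.0), qp_bounds(times, qps, 4.0))
```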
78
Uncertainty – the flip side
› Data reduction – why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink, "Speeding up the Douglas-Peucker Line-Simplification Algorithm", SSHD 1997
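For reference, here is the basic recursive Douglas-Peucker simplification in Python: keep the point farthest from the chord between the endpoints if it deviates by more than the acceptable error eps, and recurse on both halves. This is the plain textbook version, not the faster variant from the cited paper; names are illustrative.

```python
import math


def perp_dist(p, a, b):
    """Distance from point p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)


def douglas_peucker(points, eps):
    """Simplify a polyline so that no dropped point deviates by more than eps."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perp_dist(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right          # the split point appears only once


track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
print(douglas_peucker(track, 1.0))
```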
79
Uncertainty – the other flip side
› Which important properties must not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just a "Z" axis… one cannot travel back in time
  – How long may the data at the sink remain "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data
Probabilistic results
Time dimension
vs.
Geometric models for uncertain spatio-temporal data
Probabilistic results
Time dimension
82
Merge the approaches
› Idea
  – Discretize time
  – For each time, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest-neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112–1127, 2004.
J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009: 435–443.
83
A highway example…
[Figure: position (pos) of a car on a highway over time (t), with a query window Q.]
What is the probability that the car is in some area at least once during some time interval?
[Figure: the car's possible positions are bounded by the speeds vmin and vmax; assuming a uniform distribution…, the car is inside Q with probability 50% at one time instant and 30% at the next.]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65 (violation of the speed constraints)
Consideration of dependency: 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes exist for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
J. L. Doob. Stochastic Processes. Wiley, 1953.
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: interior hole: 0.4 left, 0.2 stay, 0.4 right; border hole: 0.4 stay, 0.6 inward.]
92
A simple example
› Initial position: the ball is in the third hole
› After first hit: 0.4 | 0.2 | 0.4
› After second hit: 0.16 | 0.16 | 0.36 | 0.16 | 0.16
› After 40th hit: 0.08 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.08
93
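How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket; columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$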
94
How can we model this?
› First hit: (0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)
› Second hit: (0 0.4 0.2 0.4 0 0 0 0 0) · M = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)
› 40th hit: (0 0 1 0 0 0 0 0 0) · M^40 ≈ (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)
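The same computation in plain Python, as a sketch: the matrix is built from the slide's rules and each hit is one row-vector/matrix product. The vector (0.08, 0.12, …, 0.12, 0.08) listed for the 40th hit is the stationary distribution of M, which the code checks via π · M = π; the chain approaches it from bucket 3, so the printed 40-step vector may still differ slightly from it.

```python
def step(dist, m):
    """One hit of the board: row vector times transition matrix."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]


def board_matrix(n=9):
    """0.4 / 0.2 / 0.4 for interior holes; 0.4 stay / 0.6 inward at the two borders."""
    m = [[0.0] * n for _ in range(n)]
    m[0][0], m[0][1] = 0.4, 0.6
    m[n - 1][n - 1], m[n - 1][n - 2] = 0.4, 0.6
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


M = board_matrix()
history = [[0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]   # ball starts in bucket 3
for _ in range(40):
    history.append(step(history[-1], M))
print([round(x, 2) for x in history[1]])    # first hit
print([round(x, 2) for x in history[2]])    # second hit
print([round(x, 3) for x in history[40]])   # 40th hit

stationary = [0.08] + [0.12] * 7 + [0.08]
print(max(abs(a - b) for a, b in zip(step(stationary, M), stationary)))
```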
95
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 s
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
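A minimal sketch of learning transition probabilities from historical data, assuming the history has already been discretized into one state per tick (the function name and the sample sequences are illustrative). Only observed transitions are stored, which keeps the matrix sparse; a per-group or per-time-of-day matrix would simply be estimated from the corresponding subset of sequences.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Row-normalised transition frequencies from state sequences (one state per tick)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(row.values()) for b, c in row.items()} for a, row in counts.items()}


history = [["s1", "s3", "s2", "s1"], ["s2", "s3", "s3", "s2"]]
print(estimate_transitions(history))
```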
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: transition graph over the states s1, s2, s3 with the probabilities of M.]
Note: we have an exponential number of possible paths the car might take.
ST - Window Queries (step by step)
[Figure: the state graph unrolled over t = 0, 1, 2, 3 for the states s1, s2, s3.]
Starting in s1, the state distribution evolves as
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)
Counting each trajectory only once it enters s1 or s2 within T: the 0.8 in s2 at t = 2 are hits, and the remaining 0.2 in s3 moves to (0, 0.16, 0.04) at t = 3, so Result = 0.8 + 0.16 = 0.96.
› A solution based on matrix multiplications introduces a new state for the "winner" trajectories and two matrices.
103
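ST - Window Queries
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
(1, 0, 0, 0) · M⁻ = (0, 0, 1, 0); · M⁺ = (0, 0, 0.2, 0.8); · M⁺ = (0, 0, 0.04, 0.96)
States: s1, s2, s3, and the new state s4 for the "winner" trajectories.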
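The two matrices can be derived mechanically from M and the set of window states. A sketch, assuming ticks are integers and the query interval starts after t = 0 (function names are illustrative): M⁻ adds the absorbing hit state without changing anything else, while M⁺ redirects every transition into a window state to the hit state. With the example M, this reproduces the matrices and the result 0.96 above.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def hit_matrices(m, window):
    """M_minus ignores the window; M_plus sends transitions into window states to the hit state."""
    n = len(m)
    m_minus = [list(row) + [0.0] for row in m] + [[0.0] * n + [1.0]]
    m_plus = []
    for row in m:
        new_row = [0.0 if j in window else p for j, p in enumerate(row)]
        new_row.append(sum(p for j, p in enumerate(row) if j in window))
        m_plus.append(new_row)
    m_plus.append([0.0] * n + [1.0])          # the hit state is absorbing
    return m_minus, m_plus


def window_probability(m, start, window, t_begin, t_end):
    """Pr(object is in a window state at some tick in [t_begin, t_end]); assumes t_begin >= 1."""
    m_minus, m_plus = hit_matrices(m, window)
    v = list(start) + [0.0]
    for t in range(1, t_end + 1):
        v = vec_mat(v, m_plus if t >= t_begin else m_minus)   # step from t-1 to t
    return v[-1], v


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
prob, final = window_probability(M, [1.0, 0.0, 0.0], {0, 1}, 2, 3)   # s1, s2 = states 0, 1
print(prob)   # the slide's result is 0.96
```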
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space vs. time with one observation at t0 (left) and two observations at t0 and t1 (right).]
105
Multiple Observations
› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: the state graph unrolled over t = 0, 1, 2, 3 for the states s1, s2, s3.]
106
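Multiple Observations
Classes, in order: S1−, S2−, S3− (window not intersected, ¬∎) and S1+, S2+, S3+ (window intersected, ∎)
t = 0 … 3 (M⁻ for the first step, M⁺ for the steps into T = [2, 3]):
(1, 0, 0, 0, 0, 0) → (0, 0, 1, 0, 0, 0) → (0, 0, 0.2, 0, 0.8, 0) → (0, 0, 0.04, 0.48, 0.16, 0.32)
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad
M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}, \qquad
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$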
107
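› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' theorem, with ∎ = "the trajectory intersects the window" and O = "o is observed in s3":
$$P(\blacksquare \mid O) = \frac{P(O \mid \blacksquare)\,P(\blacksquare)}{P(O)} = \frac{P(\blacksquare \wedge O)}{P(O \wedge \blacksquare) + P(O \wedge \neg\blacksquare)}$$
With the t = 3 distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over S1−, S2−, S3−, S1+, S2+, S3+: P(∎ | O) = 0.32 / (0.32 + 0.04) ≈ 0.89.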
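The 2|S|-class construction and the Bayes step fit in a short sketch. Assumptions: integer ticks, classes indexed S1−…Sn− as 0…n−1 and S1+…Sn+ as n…2n−1, and function names that are illustrative. It conditions on the later observation with Bayes' theorem, as on the slide, rather than building the adapted transition matrices mentioned later for sampling.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def class_matrices(m, window):
    """M_minus / M_plus over the 2|S| classes S_i- (0..n-1) and S_i+ (n..2n-1)."""
    n = len(m)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = m[i][j]
            m_minus[i][j] += p                                # no hit outside the query time
            m_minus[n + i][n + j] += p
            m_plus[i][n + j if j in window else j] += p       # entering the window => hit
            m_plus[n + i][n + j] += p                         # once hit, always hit
    return m_minus, m_plus


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
m_minus, m_plus = class_matrices(M, window={0, 1})
v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]             # o observed in s1 at t = 0
for t in range(1, 4):                          # query interval T = [2, 3]
    v = vec_mat(v, m_plus if t >= 2 else m_minus)
print([round(x, 2) for x in v])

observed = 2                                   # o observed in s3 at t = 3
posterior = v[3 + observed] / (v[observed] + v[3 + observed])
print(round(posterior, 2))                     # 0.32 / (0.32 + 0.04)
```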
108
Summary
› Pros
  – Allows answering time-parametrized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: R-tree over time and location, with leaf entries Poid.]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate the result probability as the ratio of the samples that satisfy the query to the total number of drawn samples
› But how can we draw samples efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3): 205–216, 2013.
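As a sanity check of the sampling idea, the sketch below estimates the earlier window query (s1 or s2 during T = [2, 3], start in s1) by drawing random paths from M and counting the fraction that hit the window; the estimate should land near the exact 0.96. It samples forward from a single observation only; drawing samples that also conform to later observations (the adapted transition matrices) is not shown, and all names are illustrative.

```python
import random

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]


def sample_path(rng, m, start, steps):
    """One random state sequence of length steps + 1 drawn from the Markov chain."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choices(range(len(m)), weights=m[path[-1]])[0])
    return path


def mc_window_probability(m, start, window, t_begin, t_end, n=20000, seed=1):
    """Fraction of sampled paths that visit a window state at some tick in [t_begin, t_end]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        path = sample_path(rng, m, start, t_end)
        if any(path[t] in window for t in range(t_begin, t_end + 1)):
            hits += 1
    return hits / n


print(mc_window_probability(M, 0, {0, 1}, 2, 3))
```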
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD 2008: 715–728.
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3): 205–216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity search on uncertain spatio-temporal data. SISAP 2013: 43–49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-nearest neighbor queries on uncertain moving object trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112–1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009: 435–443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD 2008: 715–728.
adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability
= what should the structure ofthe answer to UQ_nn(q) look like
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-basedProbabilistic Answer to a Continuous NN query) tree
Tri11 tb t11 D11
Tri12l t12(l-1) t12 D12k12
[TrQ tb te]
Tri1 tb t1 D1 Tri2 t1 t2 D2 Trim t(m-1) te Dm
Tri12 t11 t12 D12 Tri1k1 t1(k-1) t11 D1k1 Trik1 t(k-1) t21 Dm1 Trim1 t(m1-1) te Dmkm
Tri121 t11 t121 D121
Root parameters of the query-Querying trajectory Trq - Time-interval of interest [tbte]
Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)
Rmin gt Rmax
Rmin
Rmax
Q
Tr1
Tr2
Tr3
Tr4
Tr5
Rd1
1
14
Trq = 2D point Q The rest are possible-locations bounded by circles
Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN
The probability that the location of Tri is within distance Rd from Q is given by
The probability that a given object Trj is the NN of Q is given by
ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)
65
When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example
Rmin
Rmax
TrQ
Tr1
Tr2
Tr3
Tr4
Tr5
Z1
Z2
dist(Z1Z2) lt Rmax
We can no longer ldquoprunerdquo Tr4 from consideration
The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq
NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip
66
1
x
y
0
2r
1r2
pdf(Tr2)
pdf(Trq) pdf(Tr1)
Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for
DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq
67
1
x
y
0-40
+40
4r
34r2
pdf(Tr2 - Trq)
pdf(Tr1 - Trq)
Let Viq = Vi + (-Vq)
Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin at a function of t is
hyperbola since A gt 0
Now we have a collection of such distance functions (for each i)
69
The continuous aspect of the probabilistic NN queries
Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)
Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points
dist
TR1
TR2
LE12
tb t11 t12
Example
Lower envelope of two distance-trajectories
critical time-point
NOTE time-dimension on horizontal axis
70
The continuous aspect of the probabilistic NN queries
Q how to efficiently construct the lower envelope of the whole collection of distance-functions
A divide-and-conquer approach in the spirit of Merge-Sortdist
TR1
TR2
LE12
tb t11 t12
dist
TR3
TR4
LE34
tb tet31dist
TR1
TR2
LE1234
tb tet11 t12
TR3
TR4
t31
t1new t2new
Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)
Now how about the IPAC-NN tree
71
The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)
Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree
dist dist
lower envelope
TR1TR2
TR3
TR4
TR5
TR6
tb te
2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )
4r
TR7
t1 t2 t3
72
Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie
Example if pdf of selecting a possible path is uniform
Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie
73
Combine the two probabilitieshellip
Possible pathsPossible locations when t=4
a
m
Probability of falling inside segment mp1
0505 = 025 (at t = 4)
Probability of an object a falling inside some segment S at given time instant t
)(
))(Pr()Pr())(Pr(tPPp
p
a
tPLSpSta eg uniform pdf
74
ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory
representation we needndash The collection of all the possible paths (and probabilities)
ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi
to position pi+1
At the time-instant t = 4 the object can
be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )
75
Processing
rsaquo Data Structuresndash Edge hash-table
rsaquo Small in-memoryndash Movement R-tree
rsaquo 1D for each edgendash Trajectories List
rsaquo Store samples ordered by time-stamp
rsaquo Assign pointer to set of possible paths
rsaquo Earliest-arriving and Latest-departure stored for vertices
rsaquo On-disk retrieve on-demand
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway example…
[Figure: position (pos) of a car over time (t) and a query area Q; the car's possible positions are bounded by its minimum and maximum speeds vmin and vmax]
What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution over the possible positions, the car is in Q with probability 50% at one time instant and 30% at another
› Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
– This combines positions at the two time instants that violate the speed constraints
› Consideration of the dependency: 0.5
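The gap between 0.65 and 0.5 can be reproduced with a toy possible-worlds sketch. The ten equally likely worlds below are hypothetical and only chosen to match the slide's marginals (0.5 and 0.3). Mathematically, P(A ∪ B) = P(A) = 0.5 means that (almost) every world that is in Q at the second instant is also in Q at the first, so the worlds are constructed that way.

```python
def p_at_least_once_independent(probs):
    """P(at least one hit) if the per-instant hit events were independent."""
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p
    return 1.0 - miss

# Hypothetical possible worlds: 10 equally likely trajectories, each a pair
# (in Q at the first instant?, in Q at the second instant?). The 3 worlds in
# Q at the second instant are among the 5 in Q at the first one.
worlds = [(True, True)] * 3 + [(True, False)] * 2 + [(False, False)] * 5

p_first = sum(w[0] for w in worlds) / len(worlds)              # 0.5
p_second = sum(w[1] for w in worlds) / len(worlds)             # 0.3
p_exact = sum(w[0] or w[1] for w in worlds) / len(worlds)      # 0.5

print(p_at_least_once_independent([p_first, p_second]))  # ≈ 0.65
print(p_exact)                                           # 0.5
```

Counting worlds respects the dependency between time instants; multiplying marginals does not.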
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes exist for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953): Stochastic Processes. Wiley
A simple example
› Whenever the wooden board is hit, the ball stays in its hole or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts
[Figure: inner hole: stay 0.2, left 0.4, right 0.4; border hole: stay 0.4, inward 0.6]
› Initial position: the ball is in the third hole (probability 1)
› After the first hit: 0.4, 0.2, 0.4 in the holes around the initial one
› After the second hit: 0.16, 0.16, 0.36, 0.16, 0.16
› After the 40th hit: approaching 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08 (the stationary distribution of the chain)
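How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$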
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40}$, which approaches $(0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
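The row-vector-times-matrix steps above can be checked with a short pure-Python sketch (no NumPy assumed); the helper names are hypothetical. The values shown for the 40th hit coincide with the chain's stationary distribution. Because the ball starts near the left border, the exact distribution after 40 hits still differs from it in the second decimal, so the sketch also iterates much longer to recover the stationary values.

```python
def step(dist, M):
    """One hit of the board: row vector times transition matrix (dist · M)."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]

def board_matrix(n=9):
    """Transition matrix of the slide: inside, 0.4 left / 0.2 stay / 0.4 right;
    at the two border holes, 0.4 stay / 0.6 inward."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    M[0][0], M[0][1] = 0.4, 0.6
    M[n - 1][n - 1], M[n - 1][n - 2] = 0.4, 0.6
    return M

M = board_matrix()
v0 = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # ball starts in the third hole
v1 = step(v0, M)                    # first hit
v2 = step(v1, M)                    # second hit

v40 = v0
for _ in range(40):
    v40 = step(v40, M)

# Iterating much longer approximates the stationary distribution.
stationary = v0
for _ in range(5000):
    stationary = step(stationary, M)

print([round(x, 2) for x in v2])  # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
print([round(x, 3) for x in v40])
print([round(x, 3) for x in stationary])  # ≈ 0.08 at the borders, 0.12 inside
```

Each step only touches the non-zero entries' neighbourhood in principle; a sparse representation would exploit that for large state spaces.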
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and differ between object groups
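Learning transition probabilities from historical data can be sketched by counting observed transitions between consecutive ticks and normalizing each row. This is the relative-frequency (maximum-likelihood) estimate; the slides do not say which estimator is used, so treat it as an assumption. Storing only observed transitions in nested dictionaries keeps the matrix sparse. The state names and trajectories below are hypothetical.

```python
from collections import defaultdict

def estimate_transitions(trajectories):
    """Relative-frequency transition probabilities from observed state sequences.

    Returns a sparse matrix as {from_state: {to_state: probability}}; only
    transitions that were actually observed get an entry.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for states in trajectories:
        for a, b in zip(states, states[1:]):   # consecutive ticks
            counts[a][b] += 1
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: c / total for b, c in row.items()}
    return matrix

# Hypothetical historical data: one state per tick (e.g., every 20 s).
history = [
    ["A", "A", "B", "C"],
    ["A", "B", "B", "C"],
    ["B", "C", "C"],
]
M = estimate_transitions(history)
print(M["A"])  # {'A': 0.333..., 'B': 0.666...}
```

Time-dependent or per-group matrices could be obtained by running the same estimate on the corresponding subsets of the history.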
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012
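ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph with s1 → s3 (1.0), s2 → s1 (0.6), s2 → s3 (0.4), s3 → s2 (0.8), s3 → s3 (0.2)]
Note: the number of possible paths the car might take grows exponentially with the number of time steps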
ST-Window Queries
[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
› t = 0: the car is observed in s1: (1, 0, 0)
› t = 1: (1, 0, 0) · M = (0, 0, 1)
› t = 2: (0, 0, 1) · M = (0, 0.8, 0.2); the 0.8 of the worlds that are in s1 or s2 hit the window
› t = 3: propagating the whole distribution would give (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36), but per-time-instant probabilities cannot simply be added, since a world may hit the window at both t = 2 and t = 3. Instead, only the 0.2 of non-hit worlds in s3 are propagated: 0.2 · 0.8 = 0.16 move to s2 (new hits) and 0.2 · 0.2 = 0.04 stay in s3
› Result = 0.8 + 0.16 = 0.96
› The solution based on matrix multiplications introduces a new state for the winning (window-hitting) trajectories and two matrices
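ST-Window Queries
› A new absorbing state s4 collects the trajectories that have already hit the window
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
› M^- is used for transitions into time instants outside T; M^+, used for transitions into time instants inside T, redirects every transition into s1 or s2 to s4
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$
› The probability mass in s4 at t = 3, i.e., 0.96, is the query result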
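The two-matrix construction generalizes to any set of window states. A minimal sketch, assuming a row-stochastic matrix M given as nested lists, integer time steps and a single exact observation at t = 0; the function names are hypothetical, and the extra last state index plays the role of s4.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def window_query(M, start, window_states, T):
    """P(object is in a window state at some t in T = (t_start, t_end)).

    Adds an absorbing 'hit' state (index n). M_minus is M extended by that
    state; M_plus additionally redirects every transition into a window
    state to the hit state. M_plus is used for steps arriving at a time
    inside T, M_minus otherwise.
    """
    n = len(M)
    t_start, t_end = T
    M_minus = [row + [0.0] for row in M] + [[0.0] * n + [1.0]]
    M_plus = []
    for i in range(n):
        row = [0.0 if j in window_states else M[i][j] for j in range(n)]
        row.append(sum(M[i][j] for j in window_states))
        M_plus.append(row)
    M_plus.append([0.0] * n + [1.0])

    if t_start == 0 and start in window_states:
        return 1.0  # already inside the window at t = 0
    v = [0.0] * (n + 1)
    v[start] = 1.0
    for t in range(1, t_end + 1):
        v = vec_mat(v, M_plus if t >= t_start else M_minus)
    return v[n]

# Example from the slides (states s1, s2, s3 -> indices 0, 1, 2).
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
print(window_query(M, start=0, window_states={0, 1}, T=(2, 3)))  # ≈ 0.96
```

The cost is one matrix-vector product per time step, independent of the exponential number of paths.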
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is of limited interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 (left) and with observations at t0 and t1 (right)]
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to the worlds where o is located in state s_i and has not intersected the window
– One class $S_i^+$ corresponding to the worlds where o is located in state s_i and has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
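Multiple Observations
[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3; ∎ marks worlds that have intersected the window, ¬∎ worlds that have not]
State order: $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$
$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0, 0, 0) \xrightarrow{M^+} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^+} (0, 0, 0.04, 0.48, 0.16, 0.32)$$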
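› Now, what is the probability that the trajectory passes the query window, given that the object was seen in s3 at t = 3?
Bayes' Theorem, with ∎ denoting the event that the trajectory has intersected the window:
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\,P(\blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$$
From (0, 0, 0.04, 0.48, 0.16, 0.32) over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$: $P(s_3 \wedge \blacksquare) = P(S_3^+) = 0.32$ and $P(s_3 \wedge \neg\blacksquare) = P(S_3^-) = 0.04$, so
$$P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$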
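The 2|S| construction and the Bayes step can be combined in one sketch. It builds M^- and M^+ from M and a set of window states instead of writing them by hand, assuming the state order (S1-, …, Sn-, S1+, …, Sn+) used above; the function names are hypothetical.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def expanded_matrices(M, window_states):
    """Build the 2|S|-state matrices over (S1-, ..., Sn-, S1+, ..., Sn+).

    Si- : object in s_i and has NOT yet intersected the window
    Si+ : object in s_i and HAS intersected the window
    """
    n = len(M)
    size = 2 * n
    M_minus = [[0.0] * size for _ in range(size)]
    M_plus = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            # Outside T the hit flag is simply carried along (block diagonal).
            M_minus[i][j] = p
            M_minus[n + i][n + j] = p
            # Inside T, entering a window state sets the hit flag.
            M_plus[i][n + j if j in window_states else j] += p
            M_plus[n + i][n + j] = p
    return M_minus, M_plus

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
n = len(M)
M_minus, M_plus = expanded_matrices(M, window_states={0, 1})

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # observed in s1 at t = 0
for t in range(1, 4):                # T = [2, 3]
    v = vec_mat(v, M_plus if t >= 2 else M_minus)
# v ≈ (0, 0, 0.04, 0.48, 0.16, 0.32)

# Second observation: the object is in s3 at t = 3. Bayes' theorem:
# P(hit | s3) = P(S3+) / (P(S3+) + P(S3-))
s3_minus, s3_plus = 2, n + 2
p_hit_given_s3 = v[s3_plus] / (v[s3_plus] + v[s3_minus])
print(round(p_hit_given_s3, 2))   # 0.89
```

Conditioning on the second observation is just a renormalization over the two classes that agree with it.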
Summary
› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping continuous time to discrete ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the spatio-temporal (ST) space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index over the time and location dimensions]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte Carlo sampling
– Draw a sufficiently high number of samples
– Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013
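One standard way to adapt the transition matrices so that every sample is consistent with both observations is a backward pass that reweights M by the probability of still reaching the observed end state (a Doob h-transform). Whether this matches the cited paper in detail is an assumption. To keep the sketch checkable, the query predicate is the window predicate from the earlier example, whose exact answer 0.32 / 0.36 ≈ 0.89 was derived above; a kNN predicate would be evaluated on the same samples. The function names, seed and sample count are hypothetical.

```python
import random

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def adapted_matrices(M, t_end, end_state):
    """Transition matrices conditioned on being in end_state at t_end.

    beta[t][i] = P(in end_state at t_end | in state i at t); the adapted
    matrix for the step t -> t+1 is M[i][j] * beta[t+1][j] / beta[t][i].
    """
    n = len(M)
    beta = [[0.0] * n for _ in range(t_end + 1)]
    beta[t_end][end_state] = 1.0
    for t in range(t_end - 1, -1, -1):
        beta[t] = [sum(M[i][j] * beta[t + 1][j] for j in range(n)) for i in range(n)]
    return [[[M[i][j] * beta[t + 1][j] / beta[t][i] if beta[t][i] > 0 else 0.0
              for j in range(n)] for i in range(n)]
            for t in range(t_end)]

def sample_path(adapted, start, rng):
    """Draw one trajectory that is consistent with both observations."""
    path = [start]
    for Mt in adapted:
        path.append(rng.choices(range(len(Mt)), weights=Mt[path[-1]])[0])
    return path

rng = random.Random(42)
adapted = adapted_matrices(M, t_end=3, end_state=2)    # seen in s3 at t = 3
n_samples = 20000
hits = 0
for _ in range(n_samples):
    path = sample_path(adapted, start=0, rng=rng)      # seen in s1 at t = 0
    hits += any(s in (0, 1) for s in path[2:4])        # in s1 or s2 during T = [2, 3]
print(hits / n_samples)   # close to the exact 0.32 / 0.36 ≈ 0.89
```

No sample is wasted on trajectories that contradict the second observation, which is what makes sampling efficient here.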
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
Model of a trajectory
[Figure: a 3D trajectory in (X, Y, Time) up to the present time, and its projection onto the (X, Y) plane, the 2D route]
› The approximation does not capture acceleration/deceleration
› Point objects; NOTE: the model needs to be augmented for objects with an extent, e.g., hurricanes
Bart Kuijpers, Harvey J. Miller, Walied Othman: Kinetic space-time prisms. GIS 2011
Mobility Models
› Sequence of (location, time) updates (e.g., GPS-based)
› Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
› (location, time, velocity vector) updates
Modeling Uncertainty: Location Uncertainty Models
› Various sources: imprecision (GPS, sensors)
› Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze: On a Generic Uncertainty Model for Position Information. QuaCon 2009
Modeling Uncertain Trajectories
› Model 1: Cones, Beads, Necklaces (a.k.a. space-time prisms)
– Assumes exact location samples
– Unknown (but bounded by a maximum) speed in-between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
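The bead between two consecutive samples can be expressed as a membership test. A minimal sketch, assuming Euclidean (free-space) motion and a known maximum speed v_max; the function name and the sample values are hypothetical.

```python
import math

def in_bead(p1, t1, p2, t2, v_max, q, t):
    """Is location q possible at time t between the samples (p1, t1) and (p2, t2)?

    Space-time prism ("bead"): q must be reachable from p1 within t - t1 and
    p2 must still be reachable from q within t2 - t, at speed at most v_max.
    """
    if not t1 <= t <= t2:
        return False
    return (math.dist(p1, q) <= v_max * (t - t1)
            and math.dist(q, p2) <= v_max * (t2 - t))

print(in_bead((0, 0), 0, (10, 0), 10, 1.5, (5, 3), 5))  # True
print(in_bead((0, 0), 0, (10, 0), 10, 1.5, (5, 6), 5))  # False
```

At a fixed t the possible region is the intersection of two disks; projected over the whole interval, the bead becomes an ellipse with foci p1 and p2. A necklace is the sequence of beads between consecutive samples.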
Modeling Uncertain Trajectories: Model 2
› Constraints: motion along a road network
› Uncertainty restricted to road segments
[Figure: road network with points p1, p2, q and time instants t1, t2, t]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9), 2009
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
› Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
Processing Spatio-Temporal Queries for Uncertain Trajectories
› Need to jointly consider:
– Syntax: the impact of uncertainty on the (structure of the) answer; the impact of constraints, e.g., possible routes to be taken
– Processing algorithms: the impact of the motion + uncertainty coupling on filtering/pruning/refinement
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
Continuous NN for trajectories
› Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN
› Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
› A_nn(q) = [(Tr_i1, [tb, t1]), (Tr_i2, [t1, t2]), …, (Tr_im, [t_(m-1), te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y) space at T = tb, T = t1 and T = te]
Example: assuming that Trq is the querying trajectory during [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant
› UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: Trq, Tr1, Tr2, Tr3 in (X, Y) space at T = tb, tb1, t1, t11 and te]
Observations: adding uncertainty to the previous example implies that
– Tr1 (blue) is the only trajectory with a non-zero NN probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN probability
– Specifically, at t11 all three trajectories have a non-zero NN probability
=> What should the structure of the answer to UQ_nn(q) look like?
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree. Root: [TrQ, tb, te]. Level 1: (Tr_i1, tb, t1, D1), (Tr_i2, t1, t2, D2), …, (Tr_im, t_(m-1), te, Dm). Level 2, e.g., below the first child: (Tr_i11, tb, t11, D11), (Tr_i12, t11, t12, D12), …, (Tr_i1k1, t_1(k-1), t1, D1k1). Deeper levels analogously, e.g., (Tr_i121, t11, t121, D121), …, (Tr_i12l, t_12(l-1), t12, D12k12)]
› Root: the parameters of the query, i.e., the querying trajectory Trq and the time interval of interest [tb, te]
› Children: if the parent is excluded, the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and the uncertainty disks of Tr1, …, Tr5, with radii Rmin, Rmax and Rd around Q; annotation "Rmin > Rmax"]
› Trq = 2D point Q; the rest are possible locations bounded by circles
› Observation: any object whose minimum distance from Q is > Rmax has a "0" probability of being a NN to Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN
› The probability that the location of Tri is within distance Rd from Q is given by
$$P_i(R_d) = \int_{\|x - Q\| \le R_d} \mathrm{pdf}_i(x)\, dx$$
› The probability that a given object Trj is the NN of Q is given by
$$P_j^{NN} = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{k \ne j} \big(1 - P_k(R_d)\big)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd (NOTE: the upper bound is Rmax…)
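Instead of evaluating the integral numerically, the NN probabilities can be estimated by Monte Carlo sampling, a swapped-in technique rather than the one on the slide, which also makes the pruning rule easy to check. The sketch assumes uniform pdfs over disks and takes Rmax to be the smallest maximum distance from Q to any object, which is what makes the rule valid; all names and values are hypothetical.

```python
import math
import random

def sample_in_disk(center, radius, rng):
    """Uniform sample from a disk (the sqrt keeps the density uniform in area)."""
    r = radius * math.sqrt(rng.random())
    phi = 2 * math.pi * rng.random()
    return (center[0] + r * math.cos(phi), center[1] + r * math.sin(phi))

def nn_probabilities(q, disks, n_samples=20000, seed=1):
    """Monte Carlo estimate of P(object i is the NN of the crisp point q)."""
    rng = random.Random(seed)
    wins = [0] * len(disks)
    for _ in range(n_samples):
        dists = [math.dist(q, sample_in_disk(c, r, rng)) for c, r in disks]
        wins[dists.index(min(dists))] += 1
    return [w / n_samples for w in wins]

def r_max(q, disks):
    """Smallest maximum distance from q to any object."""
    return min(math.dist(q, c) + r for c, r in disks)

def prunable(q, disks):
    """Objects whose minimum distance from q exceeds Rmax: NN probability 0."""
    bound = r_max(q, disks)
    return [i for i, (c, r) in enumerate(disks)
            if max(0.0, math.dist(q, c) - r) > bound]

q = (0.0, 0.0)
disks = [((3.0, 0.0), 1.0), ((0.0, 4.0), 1.5), ((10.0, 0.0), 1.0)]
print(r_max(q, disks))      # 4.0
print(prunable(q, disks))   # [2]
probs = nn_probabilities(q, disks)
print(probs)                # the third entry is 0.0
```

The pruned object never wins a single sample, matching the observation that its NN probability is exactly 0.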
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain TrQ and Tr1, …, Tr5 with radii Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]
› We can no longer "prune" Tr4 from consideration
› The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. the previous slide)…
[Figure: uniform pdfs of the possible locations of Trq, Tr1 and Tr2 in the (x, y) plane]
Example: assuming a uniform pdf of the possible locations; the figure highlights two of the points that have to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$P(\|V_i - V_q\| \le R_d)$$
Define Viq = Vi - Vq (cross-correlation) as another random variable, Viq = Vi + (-Vq); due to the independence of Vi and Vq, the pdf of Viq is the convolution of the pdfs of Vi and -Vq
[Figure: the resulting pdfs of Tr1 - Trq and Tr2 - Trq in the (x, y) plane]
Let Viq = Vi + (-Vq)
› Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
› Property 2: assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids (with respect to the vertical axis, i.e., the value of the pdfs); THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
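The continuous aspect of the probabilistic NN queries
› Let Vxiq = vxi - vxq and Vyiq = vyi - vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
› Then the distance of the expected location along TRiq from the origin as a function of t is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = V_{x_{iq}}^2 + V_{y_{iq}}^2, \quad B = 2\,(x_{iq} V_{x_{iq}} + y_{iq} V_{y_{iq}}), \quad C = x_{iq}^2 + y_{iq}^2$$
where (x_iq, y_iq) is the expected relative location at t = 0; the graph of d_iq(t) is (a branch of) a hyperbola, since A > 0
› Now we have a collection of such distance functions (one for each i)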
The continuous aspect of the probabilistic NN queries
› Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
› Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: setting diq(t) = djq(t) and squaring both sides gives a quadratic equation in t). Call such pairwise intersections critical time points
Example: lower envelope of two distance trajectories
[Figure: distance over time (time on the horizontal axis) for TR1 and TR2, their lower envelope LE12, and the critical time points t11 and t12 after tb]
The continuous aspect of the probabilistic NN queries
› Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
› A: With a divide-and-conquer approach in the spirit of Merge-Sort
[Figure: LE12, the lower envelope of TR1 and TR2 (critical points t11, t12), and LE34, the lower envelope of TR3 and TR4 (critical point t31), are merged over [tb, te] into LE1234 with new critical points t1new and t2new]
› Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
› Now, how about the IPAC-NN tree?
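The merge-sort-style construction can be sketched on squared distances, which have the same lower envelope as the distances themselves because the square root is monotonic. The sketch assumes each squared distance is given by its coefficients (A, B, C) from the formula above and merges two envelopes with a two-pointer sweep, so each merge is linear in the sizes of the two envelopes. The function names and example motions are hypothetical.

```python
import math

def coeffs(x0, y0, vx, vy):
    """(A, B, C) of the squared distance (x0 + vx t)^2 + (y0 + vy t)^2."""
    return (vx * vx + vy * vy, 2 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)

def value(f, t):
    return f[0] * t * t + f[1] * t + f[2]

def crossings(f, g, lo, hi):
    """Times strictly inside (lo, hi) where f(t) == g(t); at most two."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted(t for t in roots if lo < t < hi)

def merge(funcs, left, right):
    """Merge two envelopes, each a list of contiguous (start, end, id) pieces."""
    cuts = sorted({p[0] for p in left} | {p[0] for p in right} | {left[-1][1]})
    merged, i, j = [], 0, 0
    for s, e in zip(cuts, cuts[1:]):
        while left[i][1] <= s:    # advance to the piece active on [s, e]
            i += 1
        while right[j][1] <= s:
            j += 1
        f, g = left[i][2], right[j][2]
        pts = [s] + crossings(funcs[f], funcs[g], s, e) + [e]
        for a, b in zip(pts, pts[1:]):
            mid = (a + b) / 2
            best = f if value(funcs[f], mid) <= value(funcs[g], mid) else g
            if merged and merged[-1][2] == best:
                merged[-1] = (merged[-1][0], b, best)   # extend the last piece
            else:
                merged.append((a, b, best))
    return merged

def lower_envelope(funcs, ids, tb, te):
    """Divide and conquer over the functions whose indices are in ids."""
    if len(ids) == 1:
        return [(tb, te, ids[0])]
    half = len(ids) // 2
    return merge(funcs,
                 lower_envelope(funcs, ids[:half], tb, te),
                 lower_envelope(funcs, ids[half:], tb, te))

# Hypothetical relative motions (x0, y0, vx, vy) of TR1..TR4 w.r.t. Trq.
funcs = [coeffs(1, 0, 1, 0), coeffs(5, 0, -1, 0),
         coeffs(0, 3, 0.5, 0), coeffs(8, 2, -1, -0.3)]
env = lower_envelope(funcs, list(range(len(funcs))), 0.0, 10.0)
for start, end, idx in env:
    print(f"TR{idx + 1} is closest on [{start:.2f}, {end:.2f}]")
```

The piece boundaries of the result are exactly the critical time points of the envelope; the same merge applied to the leftovers would give the next ranked envelope.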
› The lower envelope provides a continuous pruning criterion: any trajectory whose distance function lies more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being NN to Trq (e.g., Tr7 in the figure, and Tr6 initially), since each actual distance deviates from the distance between the expected locations by at most 2r
› Observation 3: IF the lower envelope is removed from the (distance functions × time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1, …, TR7 over [tb, te] with the lower envelope, the 2nd lower envelope (nodes in level 2 of the IPAC-NN tree), a band of width 4r, and time points t1, t2, t3]
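› Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti, i.e., with mincost denoting the minimum travel time,
$$PP_i(a) = \{P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mincost}(P) \le t_{i+1} - t_i\}$$
› Example: if the pdf of selecting a possible path is uniform, $\Pr(P) = 1 / |PP_i(a)|$ for every $P \in PP_i(a)$
› Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all positions p ∈ Pj that a can reach from pi within the time period t - ti and from which it can still reach pi+1 within ti+1 - t, i.e.,
$$PL_{P_j}(a, t) = \{p \in P_j \mid \mathrm{mincost}(p_i, p) \le t - t_i \,\wedge\, \mathrm{mincost}(p, p_{i+1}) \le t_{i+1} - t\}$$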
Combine the two probabilities…
[Figure: possible paths of object a, and its possible locations at t = 4, with a point m]
› Probability of falling inside the segment mp1 at t = 4: 0.5 · 0.5 = 0.25
› Probability of an object a falling inside some segment S at a given time instant t (e.g., with a uniform pdf):
$$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\big(PL_p^a(t) \in S\big)$$
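Both steps can be sketched on a toy network: a depth-first search enumerates the simple paths whose minimum travel time fits the time budget, and the combination formula then averages the per-path location probabilities under a uniform path pdf. The graph, travel times and per-path probabilities below are hypothetical, chosen so that the result matches the 0.5 · 0.5 = 0.25 example.

```python
def possible_paths(graph, src, dst, budget):
    """Simple paths src -> dst whose minimum travel time is <= budget.

    graph maps each vertex to {neighbour: minimum travel time of that edge}.
    """
    paths = []

    def dfs(v, path, cost):
        if v == dst:
            paths.append(tuple(path))
            return
        for w, c in graph.get(v, {}).items():
            if w not in path and cost + c <= budget:   # prune over-budget paths
                path.append(w)
                dfs(w, path, cost + c)
                path.pop()

    dfs(src, [src], 0.0)
    return paths

def prob_in_segment(paths, p_loc_in_s):
    """Pr(a(t) in S) = sum_p Pr(p) * Pr(PL_p(t) in S), with uniform Pr(p)."""
    return sum(p_loc_in_s(p) for p in paths) / len(paths)

graph = {"v1": {"v5": 6, "v3": 3, "v2": 5},
         "v3": {"v4": 2},
         "v4": {"v5": 2},
         "v2": {"v5": 4}}
paths = possible_paths(graph, "v1", "v5", budget=8)
print(paths)  # [('v1', 'v5'), ('v1', 'v3', 'v4', 'v5')]

# Hypothetical: S lies on edge v1-v5 and, given that path, a is in S with
# probability 0.5 at the query time; S is not on the other path.
print(prob_in_segment(paths, lambda p: 0.5 if p == ("v1", "v5") else 0.0))  # 0.25
```

The path v1-v2-v5 needs 9 time units and is therefore excluded from the possible paths, which is the pruning effect of the time budget ti+1 - ti.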
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
Processing
› Data structures
– Edge hash table: small, in memory
– Movement R-tree: 1D, one for each edge
– Trajectories list: stores the samples ordered by timestamp; assigns a pointer to the set of possible paths; the earliest-arrival and latest-departure times are stored for the vertices; kept on disk and retrieved on demand
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects with tq
– The objects pointed to by those leaf entries are the candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: road network with vertices A–E around q; the expansion tree "bubbling" along road-network segments, annotated with earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]
Temporal-continuous query
– Only calculate the QPs for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Event queries
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
54
Mobility Models
Sequence of (location, time) updates (e.g., GPS-based)
Electronic maps + traffic distribution patterns + set of (to-be-visited) points => full trajectory
(location, time, velocity vector) updates
55
Modeling Uncertainty: Location Uncertainty Models
Various sources of imprecision (GPS, sensors)
Quest for data reduction
Ralph Lange, Harald Weinschrott, Lars Geiger, André Blessing, Frank Dürr, Kurt Rothermel, Hinrich Schütze. On a Generic Uncertainty Model for Position Information. QuaCon 2009
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
• Assumed exact location samples
• Unknown (but bounded) maximum speed in between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov. Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li. Uncertain Range Queries for Necklaces. MDM 2010
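Membership in a bead (the intersection of the two cones spanned by consecutive exact samples) can be tested directly from the speed bound. A minimal sketch, assuming Euclidean space, exact samples, and a known maximum speed `v_max`; `in_bead` is a hypothetical helper name.

```python
import math


def in_bead(p, t, sample1, sample2, v_max):
    """True if location p at time t is consistent with both exact samples.

    sample1 = (t1, p1) and sample2 = (t2, p2) with t1 <= t2; points are (x, y).
    The bead is the intersection of the future cone of sample1 and the past
    cone of sample2 under the speed bound v_max.
    """
    (t1, p1), (t2, p2) = sample1, sample2
    if not t1 <= t <= t2:
        return False
    reachable_from_p1 = math.dist(p1, p) <= v_max * (t - t1)
    can_still_reach_p2 = math.dist(p, p2) <= v_max * (t2 - t)
    return reachable_from_p1 and can_still_reach_p2


s1, s2 = (0, (0, 0)), (10, (10, 0))
print(in_bead((5, 8), 5, s1, s2, v_max=2))  # True
print(in_bead((5, 9), 5, s1, s2, v_max=2))  # False
```

A necklace is then a chain of such beads, one per pair of consecutive samples.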
57
Modeling Uncertain Trajectories: Model 2
Constraints:
Motion along the road network
Uncertainty restricted to road segments
[Figure: samples p1 at t1 and p2 at t2 on a road network, query point q, time axis t]
Bart Kuijpers, Walied Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain. Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
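For the sheared-cylinder model, a point is a possible location exactly when it lies within the error bound r of the expected location at that time. The sketch below assumes the expected route is a piecewise-linear polyline of (t, x, y) vertices; the helper names are hypothetical.

```python
import math


def expected_location(trajectory, t):
    """Linearly interpolated expected (x, y) at time t; trajectory is [(t, x, y), ...] by time."""
    for (ta, xa, ya), (tb, xb, yb) in zip(trajectory, trajectory[1:]):
        if ta <= t <= tb:
            f = 0.0 if tb == ta else (t - ta) / (tb - ta)
            return (xa + f * (xb - xa), ya + f * (yb - ya))
    raise ValueError("t lies outside the trajectory's time span")


def possibly_at(trajectory, r, p, t):
    """True if p lies in the disk of radius r around the expected location at time t."""
    return math.dist(expected_location(trajectory, t), p) <= r


route = [(0, 0.0, 0.0), (10, 10.0, 0.0)]
print(possibly_at(route, 1.0, (5.0, 0.9), 5))  # True
print(possibly_at(route, 1.0, (5.0, 1.5), 5))  # False
```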
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints
Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement
e.g., possible routes to be taken
60
Example: inside a range (disk) around “Q” between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space between T = tb and T = te, with T = t1 marked]
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN
Example: assuming that Trq is the querying trajectory during [tb, te], then
- Tr1 (black) is the NN throughout [tb, t1]
- Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen. Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: trajectories Trq, Tr1, Tr2, Tr3 in (X, Y, T) space, with T = tb, tb1, t11, t1, te marked]
Observations: adding uncertainty to the previous example implies
- Tr1 (blue) is the only trajectory with a non-zero NN-probability up to tb1
- Starting at tb1, Tr3 (green) also has a non-zero NN-probability
- Specifically, at t11 all three trajectories have a non-zero NN-probability
=> What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree with root [TrQ, tb, te]; first-level nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); their children, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12), …; deeper levels, e.g., (Tri121, t11, t121, D121)]
Root: parameters of the query
- Querying trajectory Trq
- Time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and the uncertainty disks of Tr1 to Tr5, with radii Rmin, Rmax, and Rd; annotation “Rmin > Rmax”]
Trq = 2D point Q; the rest are possible locations bounded by circles
Observation:
- Any object whose min-distance from Q is > Rmax has a “0” probability of being the NN of Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
The probability that a given object Trj is the NN of Q is given by
i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
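The two integrals described above can be evaluated numerically. This sketch assumes each object's location is uniform on a disk and writes $P_i(R)$ for the probability that object i lies within distance R of Q (the area of the intersection of the disk of radius R around Q with object i's disk, divided by the area of object i's disk). It approximates $P_j^{NN} = \int_0^{R_{max}} \frac{dP_j(R)}{dR} \prod_{i \ne j} \bigl(1 - P_i(R)\bigr)\, dR$ with a midpoint rule, taking $R_{max}$ as the smallest maximum distance; all names and the step count are illustrative.

```python
import math


def circle_overlap(d, r1, r2):
    """Area of the intersection of two disks with radii r1, r2 whose centers are d apart."""
    if d >= r1 + r2:
        return 0.0
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2  # one disk lies inside the other
    c1 = max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    c2 = max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    k = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    return r1 * r1 * math.acos(c1) + r2 * r2 * math.acos(c2) - 0.5 * math.sqrt(max(0.0, k))


def within(q, obj, rd):
    """P_i(rd): probability that an object uniform on obj = (center, radius) is within rd of q."""
    center, r = obj
    return circle_overlap(math.dist(q, center), rd, r) / (math.pi * r * r)


def nn_probabilities(q, objects, steps=2000):
    """Midpoint-rule approximation of P(object j is the NN of the crisp query point q)."""
    r_max = min(math.dist(q, c) + r for c, r in objects)  # no NN can be farther than this
    probs = [0.0] * len(objects)
    for k in range(steps):
        lo, hi = r_max * k / steps, r_max * (k + 1) / steps
        mid = 0.5 * (lo + hi)
        for j, obj in enumerate(objects):
            dp = within(q, obj, hi) - within(q, obj, lo)  # P(distance of j in (lo, hi])
            if dp == 0.0:
                continue
            others = 1.0
            for i, other in enumerate(objects):
                if i != j:
                    others *= 1.0 - within(q, other, mid)  # all others farther than mid
            probs[j] += dp * others
    return probs


objects = [((3.0, 0.0), 1.0), ((-3.0, 0.0), 1.0), ((10.0, 0.0), 1.0)]
probs = nn_probabilities((0.0, 0.0), objects)
print([round(p, 3) for p in probs])  # the two symmetric objects share the mass; the far one gets 0.0
```

The third object is never evaluated as a candidate, since its minimum distance exceeds Rmax, mirroring the pruning observation above.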
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query TrQ and the uncertainty disks of Tr1 to Tr5, with radii Rmin and Rmax; two points Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer “prune” Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs pdf(Trq), pdf(Tr1), pdf(Tr2) over disks of diameter 2r in the x-y plane]
Example assuming a uniform pdf of possible locations, showing two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq
Main Observation: Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for P(‖Vi − Vq‖ ≤ Rd)
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq), which, due to independence, implies that the pdf of Viq is the convolution of the pdfs of Vi and −Vq
67
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq), each supported on a disk of diameter 4r]
Let Viq = Vi + (−Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids (about the vertical axis, i.e., the axis of the pdf values). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
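Property 1 can be checked empirically by sampling. The sketch below assumes two independent uniform disks of equal radius r (as in the example) and estimates the mean of Viq = Vi − Vq, which should approach Ci − Cq, while every sample stays within 2r of that centroid; it is an illustration, not a proof, and all names are illustrative.

```python
import math
import random


def sample_disk(center, r, rng):
    """Uniform sample from a disk (the square root makes the density uniform in area)."""
    rho = r * math.sqrt(rng.random())
    phi = 2 * math.pi * rng.random()
    return (center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi))


def sample_difference(ci, cq, r, n, seed=1):
    """n samples of V_iq = V_i - V_q for independent uniform disks of radius r."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        xi, yi = sample_disk(ci, r, rng)
        xq, yq = sample_disk(cq, r, rng)
        out.append((xi - xq, yi - yq))
    return out


ci, cq, r = (5.0, 2.0), (1.0, 1.0), 1.0
samples = sample_difference(ci, cq, r, 50000)
mean = (sum(x for x, _ in samples) / len(samples),
        sum(y for _, y in samples) / len(samples))
print(mean)  # approximately (4.0, 1.0) = C_i - C_q
```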
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi − vxq and Vyiq = vyi − vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin, as a function of t, is
(a hyperbola, since A > 0)
Now we have a collection of such distance functions (one for each i)
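The distance function referred to above can be derived from the relative motion. With relative expected position (X0, Y0) at t = 0 and relative velocity (Vxiq, Vyiq), the squared distance is d_iq(t)² = A t² + B t + C with A = Vxiq² + Vyiq², B = 2(X0·Vxiq + Y0·Vyiq), and C = X0² + Y0². Completing the square gives d² − A(t + B/(2A))² = C − B²/(4A), a hyperbola in the (t, d) plane whenever A > 0. A minimal sketch; the names X0, Y0 and the helper functions are assumptions for illustration.

```python
import math


def distance_coefficients(rel_pos, rel_vel):
    """(A, B, C) such that d_iq(t)^2 = A t^2 + B t + C.

    rel_pos = (X0, Y0): expected position of object i relative to the query at t = 0.
    rel_vel = (Vx_iq, Vy_iq): relative velocity, as defined above.
    """
    (x0, y0), (vx, vy) = rel_pos, rel_vel
    a = vx * vx + vy * vy
    b = 2.0 * (x0 * vx + y0 * vy)
    c = x0 * x0 + y0 * y0
    return a, b, c


def distance_at(coeffs, t):
    """d_iq(t); the max() guards against tiny negative values from rounding."""
    a, b, c = coeffs
    return math.sqrt(max(0.0, a * t * t + b * t + c))


coeffs = distance_coefficients((3.0, 4.0), (-3.0, -4.0))
print(coeffs)  # (25.0, -50.0, 25.0)
print([distance_at(coeffs, t) for t in (0, 1, 2)])  # [5.0, 0.0, 5.0]
```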
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) = 0)
Observation: throughout the time interval of interest for the query [tb, te], two distance functions diq(t) and djq(t) can intersect at most twice (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time-points
[Figure (example): lower envelope LE12 of two distance trajectories TR1 and TR2, with critical time-points t11 and t12 after tb; time on the horizontal axis, dist on the vertical axis]
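Because distances are non-negative, diq(t) = djq(t) holds exactly when the squared distances agree, which is a quadratic equation in t; this is why two distance functions cross at most twice. A sketch of computing the critical time-points, assuming the coefficient representation d(t)² = A t² + B t + C (an assumption of this example, not notation from the slides):

```python
import math


def critical_time_points(fi, fj, tb, te, eps=1e-12):
    """Times in [tb, te] at which two distance functions are equal.

    fi, fj: (A, B, C) with d(t)^2 = A t^2 + B t + C. Since distances are
    non-negative, d_i(t) = d_j(t) exactly when d_i(t)^2 = d_j(t)^2.
    """
    a, b, c = fi[0] - fj[0], fi[1] - fj[1], fi[2] - fj[2]
    if abs(a) < eps:
        if abs(b) < eps:
            return []  # identical functions or a constant gap: no isolated crossing
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]


print(critical_time_points((1, 0, 0), (0, 0, 1), -2, 2))  # [-1.0, 1.0]
```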
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: LE12 of TR1 and TR2 (critical points t11, t12), LE34 of TR3 and TR4 (critical point t31), and their merge LE1234 over [tb, te] with new critical points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
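The merge-sort-style construction can be sketched as below. Each envelope is a list of (start, end, index) pieces over [tb, te]; merging cuts the time axis at all piece boundaries and at the (at most two) crossings of the two active functions, then keeps the lower one per sub-interval. This is an illustrative sketch under the same assumed (A, B, C) representation: the linear scan in `active` keeps it short but makes each merge quadratic in the number of pieces, so a two-pointer sweep would be needed to reach the O(N log N) bound stated above.

```python
import math


def sq(f, t):
    """Squared distance A t^2 + B t + C; it orders functions like the distance itself."""
    a, b, c = f
    return a * t * t + b * t + c


def crossings(f, g, lo, hi, eps=1e-12):
    """Crossing times of f and g strictly inside (lo, hi); there are at most two."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        s = math.sqrt(disc) if disc >= 0 else None
        roots = [] if s is None else [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted(t for t in roots if lo < t < hi)


def merge(env1, env2, funcs):
    """Merge two envelopes given as lists of (start, end, index) pieces over one span."""
    cuts = sorted({s for s, _, _ in env1 + env2} | {e for _, e, _ in env1 + env2})

    def active(env, t):
        return next(i for s, e, i in env if s <= t <= e)

    merged = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = 0.5 * (lo + hi)
        f, g = active(env1, mid), active(env2, mid)
        pts = [lo] + crossings(funcs[f], funcs[g], lo, hi) + [hi]
        for s, e in zip(pts, pts[1:]):
            m = 0.5 * (s + e)
            best = f if sq(funcs[f], m) <= sq(funcs[g], m) else g
            if merged and merged[-1][2] == best:
                merged[-1] = (merged[-1][0], e, best)  # extend the previous piece
            else:
                merged.append((s, e, best))
    return merged


def lower_envelope(funcs, tb, te, idx=None):
    """Divide and conquer: envelope of each half, then merge (as in merge sort)."""
    idx = list(range(len(funcs))) if idx is None else idx
    if len(idx) == 1:
        return [(tb, te, idx[0])]
    half = len(idx) // 2
    left = lower_envelope(funcs, tb, te, idx[:half])
    right = lower_envelope(funcs, tb, te, idx[half:])
    return merge(left, right, funcs)


# d0 = |t|, d1 = 1, d2 = |t - 2| on [-1, 3]
funcs = [(1, 0, 0), (0, 0, 1), (1, -4, 4)]
print(lower_envelope(funcs, -1, 3))  # [(-1, 1.0, 0), (1.0, 3, 2)]
```

Removing the pieces of this envelope and recomputing on the leftovers yields the next-ranked envelope, in the spirit of the following slide.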
71
The lower envelope provides a continuous-pruning criterion, i.e., any trajectory whose distance function is further than 4r (r = radius of the uncertainty disk) above the lower envelope has a zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree
[Figure: lower envelope and 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) of TR1 to TR7 over [tb, te], with time-points t1, t2, t3 and the 4r pruning band]
72
Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e.,
Example: if the pdf of selecting a possible path is uniform,
Given a path Pj ∈ PPi(a), the Possible Locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] is the set of all the positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t − ti (respectively ti+1 − t), i.e.,
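Enumerating the possible paths can be sketched with a depth-first search that prunes any partial route whose minimum travel time already exceeds ti+1 − ti. This sketch assumes both samples sit on vertices of a directed graph annotated with minimum travel times, and it uses the uniform path pdf from the example, Pr(P) = 1/|PPi(a)|; the graph and all names are hypothetical.

```python
def possible_paths(graph, src, dst, budget):
    """All simple paths src -> dst whose minimum travel time is <= budget.

    graph maps each vertex to {neighbor: minimum travel time of that edge}.
    """
    result = []

    def dfs(v, path, cost):
        if v == dst:
            result.append((list(path), cost))
            return
        for w, c in graph.get(v, {}).items():
            if w not in path and cost + c <= budget:  # prune routes that are already too slow
                path.append(w)
                dfs(w, path, cost + c)
                path.pop()

    dfs(src, [src], 0.0)
    return result


def uniform_path_pdf(paths):
    """Pr(P) = 1 / |PP| for every possible path P."""
    return {tuple(p): 1.0 / len(paths) for p, _ in paths}


graph = {"p1": {"v1": 1.0}, "v1": {"v5": 2.0, "v3": 1.0},
         "v5": {"p2": 2.0}, "v3": {"v4": 1.0}, "v4": {"p2": 1.0}}
paths = possible_paths(graph, "p1", "p2", budget=6.0)  # t_{i+1} - t_i = 6
print(uniform_path_pdf(paths))  # each of the two paths gets probability 0.5
```

Pruning on the running cost is valid only because travel times are non-negative.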
73
Combine the two probabilities…
[Figure: possible paths, and possible locations of object a when t = 4, with point m]
Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a \in S, t) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\bigl(PL_p(a, t) \in S\bigr)$$
e.g., with a uniform pdf
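Combining the two probabilities amounts to a weighted sum over the possible paths. The sketch below assumes that, on each path, the possible locations at time t form an interval of offsets along the path with a uniform pdf; the example numbers are hypothetical and chosen to mirror the 0.5 · 0.5 = 0.25 case above.

```python
def overlap_fraction(interval, segment):
    """Share of a uniform possible-location interval (offsets along a path) covered by segment."""
    lo, hi = interval
    s_lo, s_hi = segment
    if hi <= lo:  # degenerate interval: a single possible position
        return 1.0 if s_lo <= lo <= s_hi else 0.0
    return max(0.0, min(hi, s_hi) - max(lo, s_lo)) / (hi - lo)


def prob_in_segment(candidates):
    """Sum over possible paths of Pr(path) * Pr(possible location on that path lies in S).

    candidates: (Pr(path), possible-location interval at time t, S as an offset
    interval on that path, or None if S is not on the path).
    """
    total = 0.0
    for p_path, interval, segment in candidates:
        if segment is not None:
            total += p_path * overlap_fraction(interval, segment)
    return total


# Hypothetical numbers: two equally likely paths; S covers half of the
# possible locations on the first path and is not on the second one.
print(prob_in_segment([(0.5, (0.0, 10.0), (5.0, 10.0)), (0.5, (0.0, 8.0), None)]))  # 0.25
```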
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1
At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
– Edge hash table
› Small, in-memory
– Movement R-tree
› 1D for each edge
– Trajectories list
› Store samples ordered by time stamp
› Assign pointer to set of possible paths
› Earliest-arrival and latest-departure times stored for vertices
› On-disk, retrieved on demand
76
› Network expansion
– Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects tq
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree “bubbling” along road-network segments from q over vertices A, B, C, D, E, annotated with the earliest/latest times of arrival; one possible path is marked as definitely not part of the answer to the range query]
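The network expansion step can be sketched as a Dijkstra search bounded by r. Here an edge counts as inside the expansion tree when its nearer endpoint is closer than r to q, which is an assumption of this sketch (as is r > 0); the road graph and names are hypothetical.

```python
import heapq
import math


def expansion_tree(graph, q, r):
    """Bounded Dijkstra from q over an undirected road graph.

    graph maps each vertex to {neighbor: edge length}, with every edge listed in
    both directions. Returns the network distances of the vertices closer than r
    and the set of edges that start at such a vertex.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            edges.add(tuple(sorted((u, v))))  # nearer endpoint u is within distance < r
            nd = d + w
            if nd < r and nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist, edges


roads = {"q": {"a": 1}, "a": {"q": 1, "b": 2, "d": 4}, "b": {"a": 2, "c": 5},
         "c": {"b": 5}, "d": {"a": 4, "e": 1}, "e": {"d": 1}}
dist, edges = expansion_tree(roads, "q", r=4)
print(dist)  # {'q': 0.0, 'a': 1.0, 'b': 3.0}
```

The returned edges would then be the ones whose movement R-trees are fetched in the filtering step.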
77
Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– Easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty – the flip side
› Data reduction
– Why transmit every single location point?
› Bandwidth
› Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink. “Speeding up the Douglas-Peucker Line-Simplification Algorithm.” SSHD 1997
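For reference, the classic recursive Douglas-Peucker simplification (not the accelerated variant of the cited paper) can be sketched as follows: keep the endpoints, find the interior point farthest from the chord, and recurse on both halves only if that distance exceeds the acceptable error eps. Names are illustrative.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (ax + t * dx, ay + t * dy))


def douglas_peucker(points, eps):
    """Simplify a polyline so that every dropped point is within eps of the result."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    split, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], a, b)
        if d > dmax:
            split, dmax = i, d
    if dmax <= eps:
        return [a, b]
    left = douglas_peucker(points[: split + 1], eps)
    right = douglas_peucker(points[split:], eps)
    return left[:-1] + right  # the split vertex appears in both halves


track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
print(douglas_peucker(track, eps=0.5))  # [(0, 0), (2, -0.1), (3, 5), (5, 7)]
```

Note that this purely spatial criterion ignores time stamps and topology, which is exactly the concern raised on the next slide.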
79
Uncertainty – the other flip side
› What are the important properties not to be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just a “Z” axis… one cannot travel back in time
– How long may the data at the sink be “un-fresh”?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data
Probabilistic results
Time dimension
vs.
Geometric models for uncertain spatio-temporal data
Probabilistic results
Time dimension
82
Merge the approaches
› Idea
– Discretize time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. “Querying imprecise data in moving object environments.” IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. “Probabilistic Similarity Search for Uncertain Time Series.” SSDBM 2009: 435-443
83
A highway example…
[Figure: position (pos) of a car over time (t), with a query region Q]
What is the probability that the car is in some area at least once during some time interval?
84
A highway example…
[Figure: as above, with speed bounds vmin and vmax]
What is the probability that the car is in some area at least once during some time interval?
85
A highway example…
[Figure: as above, with speed bounds vmin and vmax and query region Q]
Assuming uniform distribution…
86
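A highway example…
[Figure: query region Q; the car's possible positions at two time instants overlap Q with probabilities 50% and 30%]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65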
87
A highway example…
[Figure: as above, with probabilities 50% and 30%]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
Violation of the speed constraints
88
A highway example…
[Figure: as above, with probabilities 50% and 30%]
Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
Consideration of dependency: 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts
[Figure: interior hole: stay 0.2, move to either neighbour 0.4; border hole: stay 0.4, move inward 0.6]
92
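A simple example
› Initial position
› After first hit
› After second hit
› After 40th hit
[Figure: ball distribution after the first hit (0.4, 0.2, 0.4), after the second hit (0.16, 0.16, 0.36, 0.16, 0.16), and after the 40th hit (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)]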
93
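How can we model this?
› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket; columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$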
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)\,M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
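The board example can be reproduced with a few lines of Python. This sketch builds M from the probabilities given above and applies v ↦ v M once per hit; the helper names are illustrative.

```python
def board_matrix(n=9):
    """Board transition matrix: stay 0.2 and move to each neighbour 0.4;
    at the two borders, stay 0.4 and move inward 0.6."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            m[0][0], m[0][1] = 0.4, 0.6
        elif i == n - 1:
            m[i][i - 1], m[i][i] = 0.6, 0.4
        else:
            m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(v, m):
    """One hit: row vector times matrix, v' = v M."""
    n = len(m)
    return [sum(v[i] * m[i][j] for i in range(n)) for j in range(n)]


M = board_matrix()
v0 = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # ball in the third bucket
v1 = step(v0, M)
v2 = step(v1, M)
print([round(x, 2) for x in v1])  # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in v2])  # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
```

Iterating `step` 40 times gives the distribution after the 40th hit. The vector (0.08, 0.12, …, 0.12, 0.08) is a stationary distribution of this M (it satisfies v M = v), and because the chain is irreducible and has self-loops, the distribution approaches it as the number of hits grows.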
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
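Estimating the transition probabilities from historical data can be sketched as simple maximum-likelihood counting: P(next = j | current = i) is the number of observed i → j transitions divided by all transitions leaving i. The sketch assumes the historical trajectories are already mapped to state sequences at the model's tick length, and it stores only observed transitions, which keeps the matrix sparse; names are illustrative.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Maximum-likelihood estimate P(j | i) = count(i -> j) / count(i -> any).

    sequences: historical state sequences, one per trajectory, sampled once per tick.
    Returns a sparse dict {i: {j: probability}} holding only observed transitions.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: n / total for b, n in row.items()}
    return model


history = [["s1", "s3", "s2"], ["s3", "s3", "s2", "s1"]]
print(estimate_transitions(history)["s3"])  # s3 -> s2 with 2/3, s3 -> s3 with 1/3
```

Estimating separate matrices per time of day or per object group, as the slide suggests, would amount to partitioning `history` before calling the function.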
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
97
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2, s3 labeled with the transition probabilities of M]
Note: we have an exponential number of possible paths the car might take
ST-Window Queries
[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3, with the transition probabilities of M]
– t = 0: (1, 0, 0)
– t = 1: (1, 0, 0) M = (0, 0, 1)
– t = 2: (0, 0, 1) M = (0, 0.8, 0.2); the probability 0.8 in s2 already hits the window
– t = 3: propagating the full vector, (0, 0.8, 0.2) M = (0.48, 0.16, 0.36), would count the worlds that already hit the window at t = 2 a second time; propagating only the mass still outside the window gives (0, 0, 0.2) M = (0, 0.16, 0.04)
Result = 0.8 + 0.16 = 0.96
› Solution: based on matrix multiplications; it introduces a new state for the “winner” trajectories (those that have already hit the window) and two matrices
103
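ST-Window Queries
States s1, s2, s3, s4:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0)\,M^- = (0, 0, 1, 0), \quad (0, 0, 1, 0)\,M^+ = (0, 0, 0.2, 0.8), \quad (0, 0, 0.2, 0.8)\,M^+ = (0, 0, 0.04, 0.96)$$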
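The window query with an absorbing state can be sketched as follows. The sketch derives M− and M+ from M and the set of window states (in M+, every transition into a window state is redirected to the absorbing state) and applies M+ for every transition into a time instant inside T; it assumes T starts at t ≥ 1, and all names are illustrative.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def absorbing_matrices(m, window):
    """Build (M-, M+) over the states of m plus one absorbing 'hit' state."""
    n = len(m)
    absorbing_row = [0.0] * n + [1.0]
    minus = [list(row) + [0.0] for row in m] + [absorbing_row]
    plus = []
    for row in m:
        # In M+, every transition into a window state goes to the hit state instead.
        new_row = [0.0 if j in window else p for j, p in enumerate(row)]
        new_row.append(sum(p for j, p in enumerate(row) if j in window))
        plus.append(new_row)
    plus.append(list(absorbing_row))
    return minus, plus


def window_probability(m, window, v0, t_from, t_to):
    """P(the object is in a window state at some t in [t_from, t_to]); assumes t_from >= 1."""
    minus, plus = absorbing_matrices(m, window)
    v = list(v0) + [0.0]
    for t in range(1, t_to + 1):
        v = vec_mat(v, plus if t >= t_from else minus)  # transition into time t
    return v[-1]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
print(round(window_probability(M, {0, 1}, [1.0, 0.0, 0.0], 2, 3), 2))  # 0.96
```

The cost is one vector-matrix product per time step, independent of the exponential number of paths.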
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with a single observation at t0 and with two observations at t0 and t1]
105
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
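Multiple Observations
[Figure: the states s1, s2, s3 unrolled over t = 0, 1, 2, 3; classes S1-, S2-, S3- (¬■, window not yet intersected) and S1+, S2+, S3+ (■, window intersected)]
State vectors over (S1-, S2-, S3-, S1+, S2+, S3+):
$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0, 0, 0) \xrightarrow{M^+} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^+} (0, 0, 0.04, 0.48, 0.16, 0.32)$$
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$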
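The same idea extends to the 2|S| classes of equivalent worlds. The sketch below builds the block matrices from M (in M+, a not-yet-hit world that enters a window state moves to the corresponding "+" class, while "+" worlds simply follow M) and propagates the example; names are illustrative.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def two_class_matrices(m, window):
    """(M-, M+) over the classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(m)
    minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = m[i][j]
            minus[i][j] = p            # not yet hit -> not yet hit
            minus[n + i][n + j] = p    # already hit -> still hit
            plus[n + i][n + j] = p     # already hit -> still hit
            if j in window:
                plus[i][n + j] = p     # entering the window turns a miss into a hit
            else:
                plus[i][j] = p         # still outside the window
    return minus, plus


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
minus, plus = two_class_matrices(M, window={0, 1})
v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
for t in (1, 2, 3):  # query window T = [2, 3]
    v = vec_mat(v, plus if t >= 2 else minus)
print([round(x, 2) for x in v])  # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]
```

Conditioning on an observation at t = 3 then only needs the two entries for the observed state, as in the Bayes step.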
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
55
Modeling UncertaintyLocation Uncertainty Models
Various Sourcesrsquo Imprecision(GPS Sensors)
Quest for data reduction
Ralph Lange Harald Weinschrott Lars Geiger Andreacute Blessing Frank Duumlrr Kurt Rothermel Hinrich Schuumltze On a Generic Uncertainty Model for Position Information QuaCon 2009
56
Modeling Uncertain Trajectories
bull Model 1Cones Beads Necklaces (AKA space-time prisms)
bull Assumed exact location samplesbull Unknown (but bounded) maximum speed in-betweensamples
Reynold Cheng Sunil Prabhakar Dmitri V Kalashnikov Querying Imprecise Data in Moving Object Environments ICDE 2003G Trajcevski A Choudhary O Wolfson G Li Uncertan Range Queries for Necklases MDM 2010
57
Modeling Uncertain TrajectoriesModel 2
Constraints
Motion along road network
Uncertainty restricted to Road Segments
p1 p2
q
t2t1
t
Bart Kuijpers Walied Othman Modeling uncertainty of moving objects on road networks via space-time prisms International Journal of Geographical Information Science 23(9) (2009)
58
Modeling Uncertain TrajectoriesModel 3Sheared Cylinders
Constant bound on location-error at anytime-instant
G Trajcevski O Wolfson K Hinrichs S Chamberlain Managing Moving Objects Databases with Ucertainty ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly considerSyntax- The impact of uncertainty on the (structure of the) answer- The impact of constraints
Processing Algorithms- Impact of the motion+uncertainty coupling on filteringprunningrefinement
eg possible routes to be taken
6060
Example inside range (disk) around ldquoQrdquo between t1 and t2
1 What is the probability of a path taken
2 What is the earliestlatest arrival at a given vertex (consequently along edge)
Kai Zheng Goce Trajcevski Xiaofang Zhou Peter Scheuermann Probabilistic range queries for uncertain trajectories on road networks EDBT 2011
61
Continuous NN for trajectories
Q_nn(q) Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tbte]
A-nn(q) [(Tri1 [tbt1]) (Tri2[t1t2]) hellip (Trim[tm-1te])]
T = tb
T = te
X
Y
T = t1
Trq Tr1 Tr2Tr3
Given a collection of moving objects trajectories Tr1 Tr2 hellip TrN
Example assuming that Trq is the querying trajectory between [tbte] then- Tr1 (black) is the NN throughout [tbt1]- Tr2 (red) is the NN throughout [t1te]
Observation The answer to the query is time-paremeterized
Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002
62
Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant
UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]
T = te
X
YT = tb
T = t1
Trq Tr1 Tr2Tr3
T = tb1
T = t11
Observations
adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability
= what should the structure ofthe answer to UQ_nn(q) look like
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-basedProbabilistic Answer to a Continuous NN query) tree
Tri11 tb t11 D11
Tri12l t12(l-1) t12 D12k12
[TrQ tb te]
Tri1 tb t1 D1 Tri2 t1 t2 D2 Trim t(m-1) te Dm
Tri12 t11 t12 D12 Tri1k1 t1(k-1) t11 D1k1 Trik1 t(k-1) t21 Dm1 Trim1 t(m1-1) te Dmkm
Tri121 t11 t121 D121
Root parameters of the query-Querying trajectory Trq - Time-interval of interest [tbte]
Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)
Rmin gt Rmax
Rmin
Rmax
Q
Tr1
Tr2
Tr3
Tr4
Tr5
Rd1
1
14
Trq = 2D point Q The rest are possible-locations bounded by circles
Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN
The probability that the location of Tri is within distance Rd from Q is given by
The probability that a given object Trj is the NN of Q is given by
ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)
65
When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example
Rmin
Rmax
TrQ
Tr1
Tr2
Tr3
Tr4
Tr5
Z1
Z2
dist(Z1Z2) lt Rmax
We can no longer ldquoprunerdquo Tr4 from consideration
The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq
NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip
66
1
x
y
0
2r
1r2
pdf(Tr2)
pdf(Trq) pdf(Tr1)
Example: a uniform pdf of possible locations is assumed; the figure marks two of the points that are used when evaluating the actual probability of Tr1 being within distance Rd of Trq.
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for P(‖Vi − Vq‖ ≤ Rd).
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq); due to the independence of Vi and Vq, the pdf of Viq is the convolution of the pdfs of Vi and −Vq.
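The convolution step can be illustrated numerically on a grid. This sketch assumes both location pdfs are discretized into pmfs on square grids (here uniform disks) and uses zero-padded FFTs from NumPy for a linear 2D convolution; the function names are hypothetical.

```python
import numpy as np


def disk_pmf(n, radius_cells):
    """Uniform pmf on an n x n grid, supported on cells within radius of the centre."""
    c = (n - 1) / 2
    y, x = np.mgrid[0:n, 0:n]
    mask = ((x - c) ** 2 + (y - c) ** 2 <= radius_cells ** 2).astype(float)
    return mask / mask.sum()


def pmf_of_difference(pmf_i, pmf_q):
    """pmf of V_iq = V_i + (-V_q): convolve pmf_i with the pmf of -V_q.

    Flipping both axes of pmf_q gives the pmf of -V_q. Padding to the full
    output size makes the FFT product a linear (not circular) convolution.
    """
    neg_q = pmf_q[::-1, ::-1]
    shape = (pmf_i.shape[0] + neg_q.shape[0] - 1,
             pmf_i.shape[1] + neg_q.shape[1] - 1)
    out = np.fft.irfft2(np.fft.rfft2(pmf_i, s=shape) * np.fft.rfft2(neg_q, s=shape),
                        s=shape)
    return np.clip(out, 0.0, None)  # drop tiny negative FFT round-off
```

For two n x n inputs, the zero offset of V_iq sits at index (n − 1, n − 1) of the result, so summing the cells within Rd (in cell units) of that index gives a discretized P(‖Vi − Vq‖ ≤ Rd). For two disks of radius r, the support of the result is a disk of radius 2r.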
67
[Figure: the difference pdfs pdf(Tr1 − Trq) and pdf(Tr2 − Trq) in the x–y plane; each is supported on a disk of diameter 4r.]
Let Viq = Vi + (−Vq).
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq),
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq).
Property 2: assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids, with respect to the vertical axis (the axis of the pdf values). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi − vxq and Vyiq = vyi − vyq denote the corresponding components of the (relative) velocity of the object whose expected location moves along TRiq.
Then the distance of the expected location along TRiq from the origin, as a function of t, has the form diq(t) = √(A·t² + B·t + C) with A = Vxiq² + Vyiq²:
a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).
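For objects moving linearly, the squared distance is a quadratic polynomial in t. Writing the relative expected position at t = 0 as (x0, y0) (notation introduced here), d_iq(t)² = A·t² + B·t + C with A = Vxiq² + Vyiq², B = 2(x0·Vxiq + y0·Vyiq) and C = x0² + y0²; for A > 0 the graph of √(A·t² + B·t + C) is one branch of a hyperbola. A small sketch computing these coefficients, assuming constant velocities (hypothetical names):

```python
import math


def distance_coefficients(p_i, v_i, p_q, v_q):
    """Return (A, B, C) with d_iq(t)**2 = A*t**2 + B*t + C.

    p_* are expected 2D positions at t = 0, v_* constant 2D velocities
    (assumption); the relative motion is (p_i - p_q) + (v_i - v_q) * t.
    """
    x0, y0 = p_i[0] - p_q[0], p_i[1] - p_q[1]
    vx, vy = v_i[0] - v_q[0], v_i[1] - v_q[1]
    a = vx * vx + vy * vy
    b = 2 * (x0 * vx + y0 * vy)
    c = x0 * x0 + y0 * y0
    return a, b, c


def distance_at(coeffs, t):
    """Evaluate d(t); max(..., 0) guards against tiny negative round-off."""
    a, b, c = coeffs
    return math.sqrt(max(a * t * t + b * t + c, 0.0))
```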
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0).
Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the query's time interval of interest [tb, te] (NOTE: setting diq(t) = djq(t) and squaring both sides yields an equation of degree at most two in t). Call such pairwise intersections critical time points.
[Figure (example): the lower envelope LE12 of two distance trajectories TR1 and TR2, starting at tb, with the critical time points t11 and t12; time on the horizontal axis, distance on the vertical axis.]
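Because both distances are non-negative, diq(t) = djq(t) exactly when their squares are equal, which is a polynomial equation of degree at most two. A sketch that finds the critical time points in (tb, te), assuming each distance function is given by the (A, B, C) coefficients of d(t)² (a representation chosen here for illustration):

```python
import math


def critical_time_points(f, g, tb, te, eps=1e-12):
    """Times in (tb, te) where d_f(t) and d_g(t) meet.

    f, g: (A, B, C) with d(t)**2 = A*t**2 + B*t + C. Equality of the
    distances is equality of the squares, a polynomial of degree <= 2,
    so at most two points are returned.
    """
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        if abs(b) < eps:
            return []  # identical functions (c == 0) or never equal
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted({r for r in roots if tb < r < te})
```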
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
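A: With a divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: LE12 is the lower envelope of TR1 and TR2 (critical time points t11, t12); LE34 is the lower envelope of TR3 and TR4 over [tb, te] (critical time point t31); merging them yields LE1234, which introduces the new critical time points t1new and t2new.]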
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories).
Now, what about the IPAC-NN tree?
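A compact sketch of the Merge-Sort-style construction, again representing each distance function by the (A, B, C) coefficients of d(t)². The envelope is a list of (t_start, t_end, index) pieces; merging two envelopes only needs the critical time points between the two functions active on each common piece. The helper names are hypothetical, and `active` uses a linear scan for readability, so this sketch does not reach the O(N log N) bound; a faithful version walks both envelopes in lockstep.

```python
import math


def dist(f, t):
    a, b, c = f
    return math.sqrt(max(a * t * t + b * t + c, 0.0))


def crossings(f, g, lo, hi, eps=1e-12):
    """Roots of d_f(t)**2 = d_g(t)**2 strictly inside (lo, hi)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted({r for r in roots if lo < r < hi})


def active(env, t):
    """Index of the function on the envelope at time t (linear scan)."""
    for t0, t1, idx in env:
        if t0 <= t <= t1:
            return idx
    raise ValueError("t outside envelope")


def merge(env1, env2, funcs):
    """Lower envelope of two envelopes defined over the same [tb, te]."""
    cuts = sorted({t for seg in env1 + env2 for t in seg[:2]})
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        i, j = active(env1, mid), active(env2, mid)
        pts = [lo] + crossings(funcs[i], funcs[j], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            m = (a + b) / 2
            k = i if dist(funcs[i], m) <= dist(funcs[j], m) else j
            if out and out[-1][2] == k:
                out[-1] = (out[-1][0], b, k)  # extend the previous piece
            else:
                out.append((a, b, k))
    return out


def lower_envelope(funcs, tb, te, idxs=None):
    """Divide and conquer: split the functions, recurse, merge."""
    if idxs is None:
        idxs = list(range(len(funcs)))
    if len(idxs) == 1:
        return [(tb, te, idxs[0])]
    half = len(idxs) // 2
    return merge(lower_envelope(funcs, tb, te, idxs[:half]),
                 lower_envelope(funcs, tb, te, idxs[half:]), funcs)
```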
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially). The reason: each relative location lies within 2r of its expected position, so the actual distances of two objects can deviate from their expected distances by at most 4r in total.
Observation 3: IF the lower envelope is removed from the (distance-function × time) space, THEN the lower envelope of the leftovers corresponds to the “grandchildren” of the root of the IPAC-NN tree.
[Figure: distance functions TR1–TR7 over [tb, te] with critical times t1, t2, t3, showing the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree).]
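The 4r pruning rule at a single time instant can be sketched as follows. It assumes every object's uncertainty disk has the same radius r, so each relative location lies within 2r of its expected position, and it represents distance functions by (A, B, C) coefficients of d(t)² as an illustrative choice (hypothetical names).

```python
import math


def dist(f, t):
    a, b, c = f
    return math.sqrt(max(a * t * t + b * t + c, 0.0))


def nn_candidates(funcs, t, r):
    """Indices that may still be the NN at time t.

    Assumption: all uncertainty disks have radius r, so each true distance
    lies in [d_i(t) - 2r, d_i(t) + 2r]. An object more than 4r above the
    lower envelope is farther than the envelope object in every world.
    """
    d = [dist(f, t) for f in funcs]
    envelope = min(d)
    return [i for i, di in enumerate(d) if di - envelope <= 4 * r]
```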
72
Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti.
Example: if the pdf of selecting a possible path is uniform, each path in PPi(a) is selected with probability 1/|PPi(a)|.
Given a path Pj ∈ PPi(a), the Possible Locations of a moving object a with respect to Pj at time t ∈ [ti, ti+1] are the set of all positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t).
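A minimal sketch of both definitions on a toy road network. Assumptions not in the slides: each edge has a length, the minimum time cost of a path is its length divided by a known maximum speed, and only simple paths are considered; all names are hypothetical.

```python
def possible_paths(graph, src, dst, max_time, speed):
    """Simple paths src -> dst whose minimum travel time is <= max_time.

    graph: {vertex: {neighbour: edge_length}}; speed is an assumed maximum
    speed, so a path of length L needs at least L / speed time units.
    """
    out = []

    def dfs(v, path, length):
        if length / speed > max_time:
            return  # over the time budget: no extension can qualify
        if v == dst:
            out.append((list(path), length))
            return
        for w, l in graph.get(v, {}).items():
            if w not in path:
                path.append(w)
                dfs(w, path, length + l)
                path.pop()

    dfs(src, [src], 0.0)
    return out


def possible_location_interval(path_length, t, t_i, t_next, speed):
    """Offsets x along a path (x = 0 at p_i) reachable from p_i within
    t - t_i and from which p_{i+1} is reachable within t_next - t."""
    lo = max(0.0, path_length - speed * (t_next - t))
    hi = min(path_length, speed * (t - t_i))
    return (lo, hi) if lo <= hi else None
```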
73
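Combine the two probabilities…
[Figure: the possible paths of object a and its possible locations at t = 4, with a point m on one of the paths.]
Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t (e.g., with a uniform pdf):
\Pr(a(t) \in S) = \sum_{p \in PP(a)} \Pr(p) \cdot \Pr\big(a(t) \in S \cap PL_p(a,t) \mid p\big)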
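The combination rule can be checked against the 0.5 · 0.5 = 0.25 example. This sketch assumes each path's possible-location set at time t is a single interval of offsets along that path, with a uniform pdf on it, and that the segment S is given on one path; real road-network segments may be shared by several paths. The function name is hypothetical.

```python
def prob_in_segment(paths, segment):
    """Pr(a(t) in S) = sum over paths of Pr(path) * Pr(location in S | path).

    paths: list of (path_probability, path_id, (lo, hi)), where (lo, hi) is
    the possible-location interval on that path at time t (uniform pdf assumed).
    segment: (path_id, s_lo, s_hi), an interval S on one of the paths.
    """
    seg_path, s_lo, s_hi = segment
    total = 0.0
    for p_path, path_id, (lo, hi) in paths:
        if path_id != seg_path:
            continue
        if hi == lo:  # degenerate interval: the location is exactly lo
            total += p_path * (1.0 if s_lo <= lo <= s_hi else 0.0)
            continue
        overlap = max(0.0, min(hi, s_hi) - max(lo, s_lo))
        total += p_path * overlap / (hi - lo)
    return total
```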
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need:
  – the collection of all the possible paths (and their probabilities)
  – the probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1.
At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
› Data structures
  – Edge hash table
    › small, in memory
  – Movement R-tree
    › 1D, one for each edge
  – Trajectories list
    › stores samples ordered by timestamp
    › assigns a pointer to the set of possible paths
    › earliest-arrival and latest-departure times stored for vertices
    › on disk, retrieved on demand
76
› Network expansion
  – Expansion tree: a tree rooted at q that contains the edges whose network distances from q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search for leaf entries whose maximum time interval intersects tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree “bubbling” from q along road-network segments through the vertices A–E, annotated with earliest/latest times of arrival; one possible path is marked as definitely not part of the answer to the range query.]
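The expansion and filtering steps can be sketched with a pruned Dijkstra search and a per-edge interval list. Assumptions: the graph is a dict of edge lengths, and a plain list of (t_enter, t_leave, object) entries per edge stands in for the per-edge 1D movement R-tree described earlier; the names are hypothetical.

```python
import heapq


def expansion_tree(graph, q, r):
    """Edges whose start vertex is within network distance < r of q.

    graph: {u: {v: length}}. Dijkstra stops expanding vertices at distance
    >= r; an included edge may lie only partly inside the range.
    """
    best = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, float("inf")) or d >= r:
            continue  # stale heap entry or outside the query range
        for v, length in graph.get(u, {}).items():
            edges.add((u, v))
            nd = d + length
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return edges


def filter_candidates(edge_index, edges, tq):
    """Objects whose time interval on some expansion-tree edge intersects tq.

    edge_index: {edge: [(t_enter, t_leave, obj_id), ...]}, a plain list
    used here in place of a per-edge 1D R-tree (assumption).
    """
    q0, q1 = tq
    return {oid for e in edges for (t0, t1, oid) in edge_index.get(e, [])
            if t0 <= q1 and q0 <= t1}
```

The returned candidates would then go to the refinement step, where their qualification probabilities are computed.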
77
Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
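Given the QPs at the critical time instants, monotonicity between consecutive instants immediately bounds the QP at any intermediate time. A minimal sketch of that envelope lookup (hypothetical names; the QP values themselves are assumed to be computed elsewhere):

```python
from bisect import bisect_right


def qp_bounds(critical_times, critical_qps, t):
    """(lower, upper) bound on the QP at time t.

    Uses the stated property that the QP is monotonic between consecutive
    critical time instants, so it lies between the two endpoint values.
    critical_times must be sorted and t must lie within their range.
    """
    if not critical_times[0] <= t <= critical_times[-1]:
        raise ValueError("t outside the critical-time range")
    k = bisect_right(critical_times, t) - 1
    if critical_times[k] == t:  # exact value known at a critical instant
        return critical_qps[k], critical_qps[k]
    a, b = critical_qps[k], critical_qps[k + 1]
    return min(a, b), max(a, b)
```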
78
Uncertainty – the flip side
› Data reduction: why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: “Speeding Up the Douglas-Peucker Line-Simplification Algorithm”, SSHD 1997
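For reference, the classic recursive Douglas-Peucker simplification that the cited work speeds up can be sketched as follows. This is the plain version (quadratic in the worst case), not the Hershberger-Snoeyink speed-up; `eps` plays the role of the acceptable error, and the function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))  # clamp the projection onto the segment
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))


def douglas_peucker(points, eps):
    """Keep the endpoints; recursively keep the farthest point if it is > eps away."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [a, b]
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right  # the split point appears only once
```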
79
Uncertainty – the other flip side
› What are the important properties that must not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just a “Z” axis… one cannot travel back in time
  – How long may the data at the sink remain “un-fresh”?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data
– Probabilistic results ✓
– Time dimension ✗
vs.
Geometric models for uncertain spatio-temporal data
– Probabilistic results ✗
– Time dimension ✓
82
Merge the approaches
› Idea
  – Discretize time
  – For each time instant, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest-neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov and S. Prabhakar: “Querying imprecise data in moving object environments”, IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: “Probabilistic Similarity Search for Uncertain Time Series”, SSDBM 2009, 435–443
83
A highway example…
[Figure: position (pos) of a car on a highway over time (t), with a query region Q; the possible positions are bounded by the minimum and maximum speeds vmin and vmax.]
What is the probability that the car is in some area at least once during some time interval?
Assuming a uniform distribution, the car is inside Q with probability 50% at one time instant and 30% at another:
› Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65, but this violates the speed constraints
› Consideration of dependency: = 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: transition probabilities: from an inner hole, 0.4 to each neighbour and 0.2 to stay; from a border hole, 0.6 to the inner neighbour and 0.4 to stay.]
92
A simple example
› Initial position
› After the first hit: 0.4, 0.2, 0.4 (left neighbour, same hole, right neighbour)
› After the second hit: 0.16, 0.16, 0.36, 0.16, 0.16
› After the 40th hit: 0.08 in the two border holes, 0.12 in each of the other holes
93
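How can we model this?
› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket; columns: to bucket):
M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}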
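The chain above is easy to reproduce: a row vector of bucket probabilities is multiplied by M once per hit. A sketch with NumPy; the helper names are hypothetical and the default parameters are read off the matrix.

```python
import numpy as np


def board_matrix(n=9, stay=0.2, move=0.4, border_stay=0.4):
    """Row-stochastic matrix: M[i, j] = P(next bucket j | current bucket i)."""
    M = np.zeros((n, n))
    for i in range(1, n - 1):
        M[i, i - 1], M[i, i], M[i, i + 1] = move, stay, move
    M[0, 0], M[0, 1] = border_stay, 1 - border_stay
    M[n - 1, n - 1], M[n - 1, n - 2] = border_stay, 1 - border_stay
    return M


def distribution_after(M, start, hits):
    """Row vector of bucket probabilities after `hits` steps: start @ M**hits."""
    return start @ np.linalg.matrix_power(M, hits)
```

Starting from bucket 3, one and two hits give (0, 0.4, 0.2, 0.4, 0, …) and (0.16, 0.16, 0.36, 0.16, 0.16, 0, …). The long-run distribution (0.08 at the two border holes, 0.12 elsewhere) matches the values reported for the 40th hit; `distribution_after(M, start, 40)` shows how close 40 hits come.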
94
How can we model this?
› First hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› Second hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) · M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› 40th hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M^{40} ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
95
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
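One simple way to learn transition probabilities from historical data is maximum-likelihood counting of observed state-to-state transitions per tick. This is a sketch under that assumption (not necessarily the estimator used by the authors); a dict of dicts keeps the matrix sparse, and the function name is hypothetical.

```python
from collections import defaultdict


def estimate_transitions(sequences):
    """Maximum-likelihood transition probabilities from observed state sequences.

    sequences: iterable of lists of states, one state per tick.
    Returns {from_state: {to_state: probability}}; only observed
    transitions are stored, so the result stays sparse.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    model = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: c / total for b, c in row.items()}
    return model
```

Fitting separate models per time-of-day slot or per object group is one way to reflect that the transition matrix can change over time and across groups.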
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
97
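ST - Window Queries
› Given the following states and transition probabilities, and a car that is in s1 at t = 0, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
[Figure: state graph over s1, s2 and s3 whose edges carry the transition probabilities of M.]
Note: we have an exponential number of possible paths the car might take.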
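The exponential blow-up is easy to see in a brute-force baseline that enumerates every state sequence. This sketch assumes the car starts in s1 at t = 0 and uses states 0, 1, 2 for s1, s2, s3; it is meant only as a correctness reference for small inputs, and the function name is hypothetical.

```python
from itertools import product

# Transition matrix M: rows = from (s1, s2, s3), columns = to.
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def window_prob_bruteforce(M, start, window_states, t_from, t_to):
    """P(object is in window_states at some t in [t_from, t_to]).

    Enumerates all |S|**t_to state sequences, i.e. exponentially many.
    """
    n = len(M)
    total = 0.0
    for seq in product(range(n), repeat=t_to):
        path = (start,) + seq  # path[t] = state at time t
        p = 1.0
        for a, b in zip(path, path[1:]):
            p *= M[a][b]
            if p == 0.0:
                break  # impossible path
        if p and any(path[t] in window_states for t in range(t_from, t_to + 1)):
            total += p
    return total
```

For this example `window_prob_bruteforce(M, 0, {0, 1}, 2, 3)` sums the three qualifying paths (0.48 + 0.32 + 0.16).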
ST - Window Queries
[Figure: the state graph unrolled over t = 0, 1, 2, 3.]
› Starting from (1, 0, 0) at t = 0, repeated multiplication with M gives the state distributions (0, 0, 1) at t = 1, (0, 0.8, 0.2) at t = 2 and (0.48, 0.16, 0.36) at t = 3
› These per-time distributions cannot simply be added up over the window: at t = 2 the car is in s1 or s2 with probability 0.8, and only the remaining worlds (0, 0, 0.2) need to be continued, giving (0, 0.16, 0.04) at t = 3. Result = 0.8 + 0.16 = 0.96 (= 1 − 0.04)
› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices
103
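ST - Window Queries
M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}
(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)
[Figure: states s1, s2, s3 and the new absorbing state s4 for the winner trajectories.]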
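The two-matrix technique can be sketched generically: M− is M extended with an absorbing "hit" state, and M+ additionally redirects every transition into a window state to that hit state. A NumPy sketch under the assumptions that time advances in ticks from t = 0 and that the start state is not checked at t = 0 (hypothetical names):

```python
import numpy as np


def window_matrices(M, window_states):
    """M_minus and M_plus over the original states plus one absorbing hit state.

    M_minus: M unchanged, hit state absorbing (used outside the window).
    M_plus:  transitions into a window state go to the hit state instead.
    """
    n = M.shape[0]
    m_minus = np.zeros((n + 1, n + 1))
    m_minus[:n, :n] = M
    m_minus[n, n] = 1.0
    m_plus = m_minus.copy()
    for j in window_states:
        m_plus[:n, n] += m_plus[:n, j]
        m_plus[:n, j] = 0.0
    return m_minus, m_plus


def window_probability(M, start, window_states, t_from, t_to):
    """P(in window_states at some t in [t_from, t_to]), assuming t_from >= 1."""
    m_minus, m_plus = window_matrices(M, window_states)
    v = np.zeros(M.shape[0] + 1)
    v[start] = 1.0
    for t in range(1, t_to + 1):
        v = v @ (m_plus if t >= t_from else m_minus)
    return v[-1], v
```

Only sparse matrix-vector products are needed, so the cost grows linearly with the number of ticks instead of exponentially.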
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with a single observation at t0 (left) and two observations at t0 and t1 (right).]
105
Multiple Observations
› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
[Figure: states s1, s2, s3 over t = 0, 1, 2, 3.]
106
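Multiple Observations
Classes in the order (S1−, S2−, S3−, S1+, S2+, S3+); the “−” classes have not hit the window (¬■), the “+” classes have (■).
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \quad
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \quad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}
(1, 0, 0, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0, 0, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^{+}} (0, 0, 0.04, 0.48, 0.16, 0.32)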
107
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes’ theorem:
P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\, P(\blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}
With the class distribution (S1−, S2−, S3−, S1+, S2+, S3+) = (0, 0, 0.04, 0.48, 0.16, 0.32):
P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89
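The 2|S|-class construction and the Bayes step can be combined in one sketch. Classes are ordered (S1−, …, Sn−, S1+, …, Sn+); M− applies M within each half, and M+ moves worlds that enter a window state from the "−" half into the "+" half. Names are hypothetical, and the start state is assumed not to be checked against the window at t = 0.

```python
import numpy as np


def split_matrices(M, window_states):
    """M_minus / M_plus over 2|S| classes (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = M.shape[0]
    m_minus = np.zeros((2 * n, 2 * n))
    m_minus[:n, :n] = M      # outside the window: no class change
    m_minus[n:, n:] = M
    m_plus = m_minus.copy()
    for j in window_states:  # inside the window: entering s_j means a hit
        m_plus[:n, n + j] += m_plus[:n, j]
        m_plus[:n, j] = 0.0
    return m_minus, m_plus


def class_distribution(M, start, window_states, t_from, t_to):
    """Distribution over the 2|S| classes at time t_to."""
    n = M.shape[0]
    m_minus, m_plus = split_matrices(M, window_states)
    v = np.zeros(2 * n)
    v[start] = 1.0           # assumption: start state not checked at t = 0
    for t in range(1, t_to + 1):
        v = v @ (m_plus if t_from <= t else m_minus)
    return v


def hit_given_observed(v, state):
    """Bayes: P(hit | seen in state) = P(state+) / (P(state+) + P(state-))."""
    n = len(v) // 2
    return v[n + state] / (v[n + state] + v[state])
```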
108
Summary
› Pros
  – Allows answering time-parametrized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the “intelligence” and enable probabilistic pruning (e.g., at most x of the possible trajectories of o may intersect Q)
[Figure: index over the time × location space, with per-object leaf entries labelled Poid.]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform to the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
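One way to adapt transition matrices so that every sample conforms to a later observation is backward filtering (a Doob h-transform): each step is reweighted by the probability of still reaching the observed state. The sketch below uses that standard construction for a single observation at time T; it illustrates the idea and is not claimed to be the exact algorithm of the cited paper. Names are hypothetical.

```python
import numpy as np


def adapted_matrices(M, T, obs_state):
    """Time-dependent transition matrices conditioned on X_T = obs_state.

    beta[t, i] = P(X_T = obs_state | X_t = i) is computed backwards, and
    A_t[i, j] = M[i, j] * beta[t+1, j] / beta[t, i].
    """
    n = M.shape[0]
    beta = np.zeros((T + 1, n))
    beta[T, obs_state] = 1.0
    for t in range(T - 1, -1, -1):
        beta[t] = M @ beta[t + 1]
    mats = []
    for t in range(T):
        with np.errstate(divide="ignore", invalid="ignore"):
            A = M * beta[t + 1][None, :] / beta[t][:, None]
        A[~np.isfinite(A)] = 0.0  # rows of states that cannot reach the observation
        mats.append(A)
    return mats


def sample_paths(mats, start, n_samples, rng):
    """Draw paths X_0..X_T; every path agrees with the observation at T."""
    paths = np.empty((n_samples, len(mats) + 1), dtype=int)
    paths[:, 0] = start
    for t, A in enumerate(mats):
        cum = np.cumsum(A[paths[:, t]], axis=1)  # per-sample CDF rows
        cum /= cum[:, -1:]                       # last column exactly 1.0
        u = 1.0 - rng.random(n_samples)          # u in (0, 1]
        paths[:, t + 1] = (u[:, None] > cum).sum(axis=1)
    return paths
```

With the three-state example, observation s3 at t = 3 and the window [2, 3], the fraction of sampled paths that hit the window approaches the 0.32 / 0.36 ≈ 0.89 obtained with Bayes' theorem.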
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43–49.
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435–443.
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728.
56
Modeling Uncertain Trajectories
• Model 1: Cones, Beads, Necklaces (AKA space-time prisms)
  • Assumes exact location samples
  • Unknown (but bounded) maximum speed between samples
Reynold Cheng, Sunil Prabhakar, Dmitri V. Kalashnikov: Querying Imprecise Data in Moving Object Environments. ICDE 2003
G. Trajcevski, A. Choudhary, O. Wolfson, G. Li: Uncertain Range Queries for Necklaces. MDM 2010
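A bead (space-time prism) between two exact samples can be tested point-wise. This sketch assumes the speed bound vmax is known (the slide only states that it is bounded) and uses Euclidean distance in the plane; `in_bead` is a hypothetical name.

```python
import math


def in_bead(x, t, p1, t1, p2, t2, vmax):
    """Is location x possible at time t between exact samples (p1, t1) and (p2, t2)?

    x must be reachable from p1 within t - t1, and p2 must be reachable
    from x within t2 - t, moving at speed <= vmax (assumed known).
    """
    if not t1 <= t <= t2:
        return False
    return (math.dist(x, p1) <= vmax * (t - t1)
            and math.dist(x, p2) <= vmax * (t2 - t))
```

A necklace is then a chain of such beads, one per pair of consecutive samples.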
57
Modeling Uncertain Trajectories: Model 2
Constraints
› Motion along the road network
› Uncertainty restricted to road segments
[Figure: road-network space-time prism with the locations p1, p2 and q and the times t1, t2 and t.]
Bart Kuijpers, Walied Othman: Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9), 2009
58
Modeling Uncertain Trajectories: Model 3: Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Moving Objects Databases with Uncertainty. ACM TODS 2004
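Under the sheared-cylinder model, a location is possible at time t if it lies within the error bound r of the expected location at t. This sketch assumes piecewise-linear interpolation between timestamped expected positions (at least two samples); the names are hypothetical.

```python
import math
from bisect import bisect_right


def expected_position(samples, t):
    """Linear interpolation between (t_k, (x_k, y_k)) samples sorted by time."""
    times = [s[0] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t outside the trajectory's time span")
    k = min(bisect_right(times, t), len(times) - 1)
    (ta, (xa, ya)), (tb, (xb, yb)) = samples[k - 1], samples[k]
    u = (t - ta) / (tb - ta)
    return xa + u * (xb - xa), ya + u * (yb - ya)


def possibly_at(samples, t, point, r):
    """True if point is within the constant error bound r of the expected location."""
    return math.dist(expected_position(samples, t), point) <= r
```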
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
– the impact of uncertainty on the (structure of the) answer
– the impact of constraints
Processing algorithms
– the impact of the motion + uncertainty coupling on filtering/pruning/refinement, e.g., the possible routes to be taken
60
Example: inside range (disk) around “Q” between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
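Earliest arrival and latest departure per vertex follow from two shortest-path searches: forward from the first sample and backward, on the reversed graph, from the second. A sketch assuming edge weights are minimum travel times and that both samples lie on vertices (hypothetical names):

```python
import heapq


def shortest_times(graph, source):
    """Dijkstra on minimum travel times: graph = {u: {v: min_time}}."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if d + w < best.get(v, float("inf")):
                best[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return best


def arrival_windows(graph, p1, t1, p2, t2):
    """{vertex: (earliest arrival, latest departure)} between two samples.

    earliest = t1 + time(p1 -> v); latest = t2 - time(v -> p2), computed on
    the reversed graph. Vertices with earliest > latest are unreachable.
    """
    reverse = {}
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            reverse.setdefault(v, {})[u] = w
    forward, backward = shortest_times(graph, p1), shortest_times(reverse, p2)
    windows = {}
    for v in forward.keys() & backward.keys():
        early, late = t1 + forward[v], t2 - backward[v]
        if early <= late:
            windows[v] = (early, late)
    return windows
```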
61
Continuous NN for trajectories
Given a collection of moving-object trajectories Tr1, Tr2, …, TrN:
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
A_nn(q) = [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants T = tb, T = t1 and T = te marked.]
Example: assuming that Trq is the querying trajectory during [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized.
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: uncertain trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with the time instants tb, tb1, t11, t1 and te marked.]
Observations: adding uncertainty to the previous example implies that
– Tr1 (blue) is the only object with a non-zero NN probability up to tb1
– starting at tb1, Tr3 (green) also has a non-zero NN probability
– specifically, at t11 all three trajectories have a non-zero NN probability
⇒ What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-basedProbabilistic Answer to a Continuous NN query) tree
Tri11 tb t11 D11
Tri12l t12(l-1) t12 D12k12
[TrQ tb te]
Tri1 tb t1 D1 Tri2 t1 t2 D2 Trim t(m-1) te Dm
Tri12 t11 t12 D12 Tri1k1 t1(k-1) t11 D1k1 Trik1 t(k-1) t21 Dm1 Trim1 t(m1-1) te Dmkm
Tri121 t11 t121 D121
Root parameters of the query-Querying trajectory Trq - Time-interval of interest [tbte]
Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)
Rmin gt Rmax
Rmin
Rmax
Q
Tr1
Tr2
Tr3
Tr4
Tr5
Rd1
1
14
Trq = 2D point Q The rest are possible-locations bounded by circles
Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN
The probability that the location of Tri is within distance Rd from Q is given by
The probability that a given object Trj is the NN of Q is given by
ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)
65
When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example
Rmin
Rmax
TrQ
Tr1
Tr2
Tr3
Tr4
Tr5
Z1
Z2
dist(Z1Z2) lt Rmax
We can no longer ldquoprunerdquo Tr4 from consideration
The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq
NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip
66
1
x
y
0
2r
1r2
pdf(Tr2)
pdf(Trq) pdf(Tr1)
Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for
DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq
67
1
x
y
0-40
+40
4r
34r2
pdf(Tr2 - Trq)
pdf(Tr1 - Trq)
Let Viq = Vi + (-Vq)
Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin at a function of t is
hyperbola since A gt 0
Now we have a collection of such distance functions (for each i)
69
The continuous aspect of the probabilistic NN queries
Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)
Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points
dist
TR1
TR2
LE12
tb t11 t12
Example
Lower envelope of two distance-trajectories
critical time-point
NOTE time-dimension on horizontal axis
70
The continuous aspect of the probabilistic NN queries
Q how to efficiently construct the lower envelope of the whole collection of distance-functions
A divide-and-conquer approach in the spirit of Merge-Sortdist
TR1
TR2
LE12
tb t11 t12
dist
TR3
TR4
LE34
tb tet31dist
TR1
TR2
LE1234
tb tet11 t12
TR3
TR4
t31
t1new t2new
Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)
Now how about the IPAC-NN tree
71
The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)
Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree
dist dist
lower envelope
TR1TR2
TR3
TR4
TR5
TR6
tb te
2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )
4r
TR7
t1 t2 t3
72
Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie
Example if pdf of selecting a possible path is uniform
Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie
73
Combine the two probabilitieshellip
Possible pathsPossible locations when t=4
a
m
Probability of falling inside segment mp1
0505 = 025 (at t = 4)
Probability of an object a falling inside some segment S at given time instant t
)(
))(Pr()Pr())(Pr(tPPp
p
a
tPLSpSta eg uniform pdf
74
ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory
representation we needndash The collection of all the possible paths (and probabilities)
ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi
to position pi+1
At the time-instant t = 4 the object can
be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )
75
Processing
rsaquo Data Structuresndash Edge hash-table
rsaquo Small in-memoryndash Movement R-tree
rsaquo 1D for each edgendash Trajectories List
rsaquo Store samples ordered by time-stamp
rsaquo Assign pointer to set of possible paths
rsaquo Earliest-arriving and Latest-departure stored for vertices
rsaquo On-disk retrieve on-demand
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
› Whenever the wooden board is hit, the ball either stays in its hole or drops into one of the neighbouring holes, with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: interior hole: stay 0.2, move left 0.4, move right 0.4; border hole: stay 0.4, move inward 0.6]
92
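A simple example
[Figure: probability of the ball being in each of the nine buckets]
› Initial position: bucket 3 (probability 1)
› After first hit: 0.4, 0.2, 0.4 in buckets 2, 3, 4
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16 in buckets 1 to 5
› After 40th hit: approximately 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08 in buckets 1 to 9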
93
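How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (row: "from" bucket, column: "to" bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$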
94
How can we model this?
› First hit: (0 0 1 0 0 0 0 0 0) · M = (0 0.4 0.2 0.4 0 0 0 0 0)
› Second hit: (0 0 1 0 0 0 0 0 0) · M² = (0.16 0.16 0.36 0.16 0.16 0 0 0 0)
› 40th hit: (0 0 1 0 0 0 0 0 0) · M⁴⁰ ≈ (0.08 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.08)
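The board example can be simulated directly by repeated vector-matrix multiplication. The sketch below is plain Python (no libraries) with buckets indexed 0 to 8 instead of 1 to 9; the helper names are illustrative. Iterated long enough, it converges to the stationary distribution (0.08, 0.12, …, 0.12, 0.08); the exact vector after 40 hits may still deviate somewhat from these rounded values.

```python
def board_matrix(n=9):
    """Row-stochastic transition matrix of the board (row = from bucket, column = to bucket)."""
    m = [[0.0] * n for _ in range(n)]
    m[0][0], m[0][1] = 0.4, 0.6                  # left border: stay 0.4, move inward 0.6
    m[n - 1][n - 2], m[n - 1][n - 1] = 0.6, 0.4  # right border
    for i in range(1, n - 1):                    # interior: left 0.4, stay 0.2, right 0.4
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m


def step(v, m):
    """One hit of the board: row vector v times transition matrix m."""
    n = len(m)
    return [sum(v[i] * m[i][j] for i in range(n)) for j in range(n)]


def after_hits(v, m, hits):
    """Distribution over buckets after `hits` hits, starting from distribution v."""
    for _ in range(hits):
        v = step(v, m)
    return v


M = board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # ball initially in bucket 3 (index 2)
print([round(x, 2) for x in after_hits(start, M, 1)])
# [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in after_hits(start, M, 2)])
# [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in after_hits(start, M, 40)])    # may still differ from the limit
print([round(x, 2) for x in after_hits(start, M, 2000)])
# [0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08]
```

A dense list-of-lists is enough here; for road networks with many states, the sparse representation mentioned on the next slide would be the natural choice.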
95
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
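One simple way to learn such a matrix, assuming the historical trajectories have already been mapped to sequences of states (one per tick), is maximum-likelihood counting: count the observed transitions and normalize each row. The sketch stores only non-zero entries, in line with the remark that the matrix is very sparse. The state names and the sample history are hypothetical, and this is not necessarily how the cited works estimate their parameters.

```python
from collections import defaultdict


def estimate_transitions(trajectories):
    """Maximum-likelihood estimate of a sparse transition matrix.

    trajectories: iterable of state sequences, one state per tick.
    Returns {from_state: {to_state: probability}}; unobserved transitions are omitted.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for states in trajectories:
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: c / total for b, c in row.items()}
    return matrix


# Hypothetical history over three states (e.g., intersections A, B, C)
history = [
    ["A", "A", "B", "C"],
    ["A", "B", "B", "C"],
    ["B", "C", "C", "A"],
]
M = estimate_transitions(history)
print(M["B"])  # {'C': 0.75, 'B': 0.25}
```

A time-dependent or group-specific matrix, as the slide suggests, could be obtained by running the same estimator on the corresponding subset of the history.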
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
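ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph with transitions s1 → s3 (1.0), s2 → s1 (0.6), s2 → s3 (0.4), s3 → s2 (0.8), s3 → s3 (0.2)]
Note: the number of possible paths the car might take grows exponentially with time.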
ST - Window Queries
› Initial distribution at t = 0: (1 0 0) over (s1, s2, s3); the transition matrix M is as above
[Figure: the state graph unrolled over time steps t = 0, 1, 2, 3]
98
99
ST - Window Queries
› Starting from (1 0 0) at t = 0: t = 1: (0 0 1)
[Figure: the state graph unrolled over time steps t = 0, 1, 2, 3]
100
ST - Window Queries
› t = 1: (0 0 1)
› t = 2: (0 0.8 0.2)
[Figure: the state graph unrolled over time steps t = 0, 1, 2, 3]
101
ST - Window Queries
› Propagating with M: t = 1: (0 0 1), t = 2: (0 0.8 0.2), t = 3: (0.48 0.16 0.36)
[Figure: the state graph unrolled over time steps t = 0, 1, 2, 3]
102
ST - Window Queries
› t = 1: (0 0 1)
› t = 2: (0 0.8 0.2); the probability mass 0.8 in s2 has already entered the window
› t = 3: propagating only the remaining mass (0 0 0.2) gives (0 0.16 0.04); a further 0.16 enters the window in s2
› Result = 0.8 + 0.16 = 0.96
› A solution based on matrix multiplications introduces a new state for the trajectories that have already hit the window (the "winner" trajectories) and two matrices
103
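ST - Window Queries
States s1, s2, s3 plus a new absorbing state s4 for the trajectories that have already hit the window; M− is used for steps that end outside T and M+ for steps that end inside T:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad
M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
t = 0: (1 0 0 0); t = 1 (via M−): (0 0 1 0); t = 2 (via M+): (0 0 0.2 0.8); t = 3 (via M+): (0 0 0.04 0.96)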
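The two-matrix construction can be written as a small Python sketch. Assumptions: states are 0-based indices (s1 = 0, s2 = 1, s3 = 2), the extra absorbing state s4 is appended as the last index, M− is applied for steps that end before the query interval and M+ for steps that end inside it; the function names are illustrative.

```python
def vec_mat(v, m):
    """Row vector v times matrix m (lists of lists)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def window_matrices(m, window):
    """Extend an n-state chain by an absorbing state n ('has hit the window').

    m_minus: plain chain, used for steps ending outside the query interval.
    m_plus:  mass entering a window state is redirected to the hit state.
    """
    n = len(m)
    absorbing = [0.0] * n + [1.0]
    m_minus = [list(row) + [0.0] for row in m] + [absorbing]
    m_plus = []
    for row in m:
        new_row = [0.0] * (n + 1)
        for j, p in enumerate(row):
            new_row[n if j in window else j] += p
        m_plus.append(new_row)
    m_plus.append(list(absorbing))
    return m_minus, m_plus


def window_query(m, start, window, t_begin, t_end):
    """State vector at t_end; its last entry is P(in window at some t in [t_begin, t_end])."""
    n = len(m)
    m_minus, m_plus = window_matrices(m, window)
    v = [0.0] * (n + 1)
    for i, p in enumerate(start):
        v[n if (t_begin <= 0 and i in window) else i] += p
    for t in range(1, t_end + 1):
        v = vec_mat(v, m_plus if t >= t_begin else m_minus)
    return v


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
v = window_query(M, start=[1.0, 0.0, 0.0], window={0, 1}, t_begin=2, t_end=3)
print([round(x, 2) for x in v])  # [0.0, 0.0, 0.04, 0.96]
```

Because the hit state is absorbing, each world is counted at most once, which is what turns "at least once during T" into a single matrix chain instead of an enumeration of exponentially many paths.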
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time; left: a single observation at t0; right: observations at t0 and t1]
105
Multiple Observations
› We need to track where the true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
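Multiple Observations
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3; classes S1−, S2−, S3− (not hit, ¬∎) and S1+, S2+, S3+ (hit, ∎)]
State vectors over (S1−, S2−, S3−, S1+, S2+, S3+):
t = 0: (1 0 0 0 0 0); t = 1: (0 0 1 0 0 0); t = 2: (0 0 0.2 0 0.8 0); t = 3: (0 0 0.04 0.48 0.16 0.32)
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad
M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}, \qquad
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$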
107
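› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' Theorem, with ∎ = "the trajectory hits the window" and O = "the object is observed in s3 at t = 3":
$$P(\blacksquare \mid O) = \frac{P(O \mid \blacksquare)\,P(\blacksquare)}{P(O)} = \frac{P(\blacksquare \wedge O)}{P(O)} = \frac{P(\blacksquare \wedge O)}{P(O \wedge \blacksquare) + P(O \wedge \neg\blacksquare)}$$
Using the t = 3 vector (0 0 0.04 0.48 0.16 0.32), i.e., P(S3+) = 0.32 and P(S3−) = 0.04:
$$P(\blacksquare \mid O) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
[Figure: classes S1−, S2−, S3−, S1+, S2+, S3+]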
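The same idea with 2|S| classes, followed by the Bayes step, can be sketched as follows. Classes S1−…Sn− occupy indices 0…n−1 and S1+…Sn+ indices n…2n−1; the observation is assumed to be "o is in state s3 at t = 3", as in the example, and the function names are illustrative.

```python
def vec_mat(v, m):
    """Row vector v times matrix m (lists of lists)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def expanded_matrices(m, window):
    """2|S| x 2|S| matrices over the classes S1-..Sn- (0..n-1) and S1+..Sn+ (n..2n-1)."""
    n = len(m)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = m[i][j]
            # Outside the query interval: block-diagonal (M 0; 0 M), no class changes.
            m_minus[i][j] = p
            m_minus[n + i][n + j] = p
            # Inside the interval: entering a window state moves a '-' world to '+'.
            m_plus[i][n + j if j in window else j] += p
            m_plus[n + i][n + j] += p
    return m_minus, m_plus


def propagate(m, start, window, t_begin, t_end, t_obs):
    """Distribution over the 2|S| classes at time t_obs."""
    n = len(m)
    m_minus, m_plus = expanded_matrices(m, window)
    v = [0.0] * (2 * n)
    for i, p in enumerate(start):
        v[n + i if (t_begin <= 0 <= t_end and i in window) else i] += p
    for t in range(1, t_obs + 1):
        v = vec_mat(v, m_plus if t_begin <= t <= t_end else m_minus)
    return v


def hit_probability_given_state(v, n, state):
    """Bayes: P(hit | observed in state) = P(S+) / (P(S+) + P(S-))."""
    hit, miss = v[n + state], v[state]
    return hit / (hit + miss)


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
v3 = propagate(M, [1.0, 0.0, 0.0], window={0, 1}, t_begin=2, t_end=3, t_obs=3)
print([round(x, 2) for x in v3])                              # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]
print(round(hit_probability_given_state(v3, 3, state=2), 2))  # 0.89
```

Summing the "+" half of the vector (0.48 + 0.16 + 0.32) recovers the unconditioned result 0.96, so the single-absorbing-state version is the special case in which the final location is never observed.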
108
Summary
› Pros
  – Allows answering time-parametrized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping continuous time to discrete ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the spatio-temporal (ST) space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: R-tree over the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
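For comparison, a naive Monte-Carlo estimator for the running window-query example can be sketched as below. It draws trajectories forward from the chain and, to respect an observation, simply rejects samples that contradict it. This rejection step is the inefficiency that the adapted transition matrices of the cited work are designed to avoid; the sketch does not implement that adaptation, and its function names are illustrative.

```python
import random

# Chain of the ST window query example: s1, s2, s3 -> indices 0, 1, 2
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def sample_path(m, start, steps, rng):
    """Draw one possible world: a state index for each tick 0..steps."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choices(range(len(m)), weights=m[path[-1]])[0])
    return path


def monte_carlo_window(m, start, window, t_begin, t_end, n_samples,
                       observation=None, seed=0):
    """Fraction of sampled worlds that visit `window` during [t_begin, t_end].

    observation: optional (t_obs, state); worlds contradicting it are rejected.
    """
    rng = random.Random(seed)
    steps = max(t_end, observation[0] if observation else 0)
    hits = accepted = 0
    for _ in range(n_samples):
        path = sample_path(m, start, steps, rng)
        if observation and path[observation[0]] != observation[1]:
            continue  # naive rejection: this world does not conform with the observation
        accepted += 1
        if any(path[t] in window for t in range(t_begin, t_end + 1)):
            hits += 1
    if accepted == 0:
        raise ValueError("no sampled world conforms with the observation")
    return hits / accepted


print(monte_carlo_window(M, 0, {0, 1}, 2, 3, 20000))                      # close to 0.96
print(monte_carlo_window(M, 0, {0, 1}, 2, 3, 20000, observation=(3, 2)))  # close to 0.89
```

Here roughly 36% of the samples survive the observation; with longer horizons or rarer observations the acceptance rate collapses, which motivates conditioning the transition matrices instead.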
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728.
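As an illustration of how a sequence-of-subevents query can be evaluated over a Markov model, the sketch below runs a forward pass over the product of a room-level Markov chain and a small automaton for "office, then coffee room, then office again". This is not Lahar's evaluation algorithm; the rooms, the transition matrix, and the function name are hypothetical.

```python
def pattern_probability(m, start_dist, steps, office, coffee):
    """P(the trajectory matches 'office ... coffee room ... office') within `steps` ticks.

    Forward pass over the product of the Markov chain with a 4-state automaton:
    q0 = office not yet seen, q1 = office seen, q2 = coffee room seen after the office,
    q3 = back in the office afterwards (accepting, absorbing).
    """
    def advance(q, room):
        if q == 0 and room == office:
            return 1
        if q == 1 and room == coffee:
            return 2
        if q == 2 and room == office:
            return 3
        return q

    dist = {}  # (room, automaton state) -> probability mass
    for room, p in enumerate(start_dist):
        if p > 0:
            key = (room, advance(0, room))
            dist[key] = dist.get(key, 0.0) + p
    for _ in range(steps):
        new_dist = {}
        for (room, q), p in dist.items():
            for nxt, p_move in enumerate(m[room]):
                if p_move > 0:
                    key = (nxt, advance(q, nxt))
                    new_dist[key] = new_dist.get(key, 0.0) + p * p_move
        dist = new_dist
    return sum(p for (room, q), p in dist.items() if q == 3)


# Hypothetical rooms: 0 = Joe's office, 1 = corridor, 2 = coffee room
M_rooms = [[0.5, 0.5, 0.0],
           [0.0, 0.0, 1.0],
           [1.0, 0.0, 0.0]]
print(pattern_probability(M_rooms, [1.0, 0.0, 0.0], 3, office=0, coffee=2))  # 0.5
```

The product construction plays the same role as the Si−/Si+ classes of the window query: the automaton state records how much of the pattern a world has already matched, so correlated steps are handled without enumerating paths.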
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob (1953). Stochastic Processes. Wiley.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013).
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013, 43-49.
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments." IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series." SSDBM 2009, 435-443.
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728.
57
Modeling Uncertain Trajectories: Model 2
Constraints:
› Motion along the road network
› Uncertainty restricted to road segments
[Figure: positions p1 and p2 observed at times t1 and t2 on a road network, a location q, and the time axis t]
Bart Kuijpers, Walied Othman. Modeling uncertainty of moving objects on road networks via space-time prisms. International Journal of Geographical Information Science 23(9) (2009).
58
Modeling Uncertain Trajectories: Model 3 (Sheared Cylinders)
› Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain. Managing Uncertainty in Moving Objects Databases. ACM TODS 2004.
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
› Syntax
  – The impact of uncertainty on the (structure of the) answer
  – The impact of constraints, e.g., the possible routes to be taken
› Processing algorithms
  – Impact of the motion + uncertainty coupling on filtering/pruning/refinement
60
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann. Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011.
61
Continuous NN for trajectories
Given a collection of moving-object trajectories Tr1, Tr2, …, TrN:
› Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
› A-nn(q): [(Tri1, [tb, t1]), (Tri2, [t1, t2]), …, (Trim, [tm-1, te])]
[Figure: trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with time instants T = tb, T = t1 and T = te]
Example: assuming that Trq is the querying trajectory during [tb, te], then
– Tr1 (black) is the NN throughout [tb, t1]
– Tr2 (red) is the NN throughout [t1, te]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen. Continuous Nearest Neighbor Search. VLDB 2002.
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tri has some uncertainty associated with its location at each time instant, consider the variant:
› UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq during [tb, te]
[Figure: uncertain trajectories Trq, Tr1, Tr2 and Tr3 in (X, Y, T) space, with time instants T = tb, T = tb1, T = t11, T = t1 and T = te]
Observations: adding uncertainty to the previous example implies that
– Tr1 (blue) has the exclusive non-zero NN-probability up to tb1
– Starting at tb1, Tr3 (green) also has a non-zero NN-probability
– Specifically, at t11 all three trajectories have a non-zero NN-probability
⇒ What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: IPAC-NN tree with root [TrQ, tb, te]; first-level nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); deeper nodes such as (Tri11, tb, t11, D11), (Tri12, t11, t12, D12) and (Tri121, t11, t121, D121) partition the interval of their parent]
› Root: parameters of the query, i.e., the querying trajectory Trq and the time interval of interest [tb, te]
› Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and the uncertainty disks of Tr1 to Tr5; distances Rmin, Rmax and Rd from Q; annotation "Rmin > Rmax"]
› Trq = 2D point Q; the rest are possible locations bounded by circles
› Observation: any object whose minimum distance from Q is > Rmax has a "0" probability of being the NN of Q
  – Example: Tr4 cannot have a non-zero probability of being Q's NN
› The probability that the location of Tri is within distance Rd from Q is given by:
› The probability that a given object Trj is the NN of Q is given by:
› i.e., Trj is the NN if it is at distance Rd AND all the rest are at distances > Rd, integrating over all possible Rd's (NOTE: the upper bound is Rmax…)
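The two formulas announced on this slide did not survive extraction. One standard way to write them, consistent with the verbal description, is given below; it assumes independent objects, a crisp query point Q, pdf_i as the location pdf of Tri, and Rmax as the smallest maximum distance from Q over all objects (our reading of the figure).

```latex
P_i(R_d) = \Pr\big(\operatorname{dist}(Tr_i, Q) \le R_d\big)
         = \iint_{\lVert (x, y) - Q \rVert \le R_d} \mathrm{pdf}_i(x, y)\, dx\, dy

P_{NN}(Tr_j) = \int_{0}^{R_{max}} \frac{d P_j(R_d)}{d R_d}
               \prod_{i \neq j} \big(1 - P_i(R_d)\big)\, dR_d
```

Under that reading of Rmax, the integrand vanishes for Rd > Rmax, because at least one object is then certainly within distance Rd of Q, which is why Rmax serves as the upper bound.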
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query TrQ and objects Tr1 to Tr5 with distances Rmin and Rmax; points Z1 and Z2 with dist(Z1, Z2) < Rmax]
› We can no longer "prune" Tr4 from consideration
› The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over the (x, y) plane]
Example: assuming a uniform pdf of possible locations, pairs of points (one from each pdf) have to be considered when evaluating the actual probability of Tr1 being within distance Rd from Trq
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for P(‖Vi − Vq‖ ≤ Rd).
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq); due to independence, the pdf of Viq is the convolution of the pdfs of Vi and −Vq.
67
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq) over the (x, y) plane]
Let Viq = Vi + (−Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
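Property 1 and the size of the support can be checked numerically. The sketch below assumes uniform pdfs on disks of radius r (the example of the previous slide) with hypothetical centroids, and samples Viq = Vi − Vq: the sample mean approaches Ci − Cq, and every sample lies within 2r of it, i.e., the convolution is supported on a disk of diameter 4r.

```python
import math
import random


def sample_disk(center, r, rng):
    """Uniform sample from a disk of radius r (square root of a uniform variate for the radius)."""
    rho = r * math.sqrt(rng.random())
    phi = 2 * math.pi * rng.random()
    return center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi)


def sample_difference(c_i, c_q, r, n, seed=0):
    """Samples of V_iq = V_i - V_q for independent V_i, V_q uniform on disks of radius r."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        xi, yi = sample_disk(c_i, r, rng)
        xq, yq = sample_disk(c_q, r, rng)
        samples.append((xi - xq, yi - yq))
    return samples


c_i, c_q, r = (5.0, 2.0), (1.0, 1.0), 1.0  # hypothetical centroids and radius
pts = sample_difference(c_i, c_q, r, 50000)
mean = (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
c_iq = (c_i[0] - c_q[0], c_i[1] - c_q[1])
max_dist = max(math.dist(p, c_iq) for p in pts)
print(mean)      # close to C_iq = (4.0, 1.0)   (Property 1)
print(max_dist)  # never exceeds 2r = 2.0      (support: disk of diameter 4r)
```

The 2r bound is just the triangle inequality; the Monte-Carlo mean only approximates the centroid, so it is a sanity check of Property 1 rather than a proof.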
68
The continuous aspect of the probabilistic NN queries
› Let Vxiq = vxi - vxq and Vyiq = vyi - vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
› Then the distance of the expected location along TRiq from the origin as a function of t is a hyperbola, since A > 0
› Now we have a collection of such distance functions (one for each i)
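The formula itself is missing from the transcript. Assuming constant relative velocity during the query interval and writing (x⁰iq, y⁰iq) for the relative expected position at t = 0 (our notation), the distance takes the following form:

```latex
d_{iq}(t) = \sqrt{\big(x^0_{iq} + V^x_{iq}\, t\big)^2 + \big(y^0_{iq} + V^y_{iq}\, t\big)^2}
          = \sqrt{A t^2 + B t + C},
\qquad A = (V^x_{iq})^2 + (V^y_{iq})^2,\quad
B = 2\big(x^0_{iq} V^x_{iq} + y^0_{iq} V^y_{iq}\big),\quad
C = (x^0_{iq})^2 + (y^0_{iq})^2
```

Completing the square gives d² − A(t + B/2A)² = C − B²/4A ≥ 0, which is a hyperbola in the (t, d) plane (degenerating to a pair of lines when the relative path passes through the origin).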
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: setting diq(t) = djq(t) and squaring both sides yields an equation of degree at most two in t). Call such pairwise intersections critical time points.
[Figure: example lower envelope LE12 of two distance trajectories TR1 and TR2, with critical time points t11 and t12 after tb; time on the horizontal axis, distance on the vertical axis]
70
The continuous aspect of the probabilistic NN queries
Q: How can we efficiently construct the lower envelope of the whole collection of distance functions?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: the lower envelopes LE12 (of TR1 and TR2, critical time points t11, t12) and LE34 (of TR3 and TR4, critical time point t31) over [tb, te] are merged into LE1234, which introduces new critical time points t1new and t2new]
Complexity of the lower-envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
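The time-parametrized NN answer A-nn(q) is exactly the lower envelope of these distance functions. The sketch below computes it in a simpler way than the O(N log N) merge on the slide: it collects all pairwise critical time points (roots of the quadratic obtained by equating two squared distances) and picks the minimum in each elementary interval, which costs O(N²) pairs. It works with squared distances, which have the same lower envelope; the coefficients follow d²(t) = A t² + B t + C, and the function names and example trajectories are hypothetical.

```python
import math


def sq_dist_coeffs(p, v):
    """(A, B, C) of d(t)^2 = A t^2 + B t + C for relative position p + v t."""
    (x, y), (vx, vy) = p, v
    return vx * vx + vy * vy, 2 * (x * vx + y * vy), x * x + y * y


def critical_times(f, g, tb, te):
    """Times in (tb, te) where two squared-distance functions cross (at most two)."""
    a, b, c = (f[k] - g[k] for k in range(3))
    if abs(a) < 1e-12:
        roots = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted(t for t in roots if tb < t < te)


def lower_envelope(funcs, tb, te):
    """Time-parametrized NN answer as [(index, t_start, t_end), ...]."""
    cuts = {tb, te}
    for i in range(len(funcs)):
        for j in range(i + 1, len(funcs)):
            cuts.update(critical_times(funcs[i], funcs[j], tb, te))
    cuts = sorted(cuts)
    answer = []
    for t0, t1 in zip(cuts, cuts[1:]):
        tm = (t0 + t1) / 2  # no crossing inside (t0, t1), so the midpoint decides
        best = min(range(len(funcs)),
                   key=lambda k: funcs[k][0] * tm * tm + funcs[k][1] * tm + funcs[k][2])
        if answer and answer[-1][0] == best:
            answer[-1] = (best, answer[-1][1], t1)  # merge adjacent intervals
        else:
            answer.append((best, t0, t1))
    return answer


# Hypothetical example: object 0 stays at distance 1 from the query,
# object 1 starts at distance 3 and passes through the query's position at t = 3.
f0 = sq_dist_coeffs((1.0, 0.0), (0.0, 0.0))
f1 = sq_dist_coeffs((3.0, 0.0), (-1.0, 0.0))
print(lower_envelope([f0, f1], 0.0, 6.0))
# [(0, 0.0, 2.0), (1, 2.0, 4.0), (0, 4.0, 6.0)]
```

Every breakpoint of the envelope is a pairwise crossing, so this brute-force version returns the same intervals as the divide-and-conquer construction; it simply does not exploit the merge structure to stay at O(N log N).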
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r (r = radius of the uncertainty disk) above the lower envelope cannot have a non-zero probability of being the NN to Trq (e.g., Tr7 in the figure, and Tr6 initially)
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [tb, te] with critical time points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in level 2 of the IPAC-NN tree), and a band of width 4r]
72
Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi(a) between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti, i.e., PPi(a) = {P : P connects pi and pi+1 and its minimum travel time is at most ti+1 − ti}.
Example: if the pdf of selecting a possible path is uniform, each path P ∈ PPi(a) has probability 1/|PPi(a)|.
Given a path Pj ∈ PPi(a), the possible locations of a given moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all positions p ∈ Pj from which a can reach pi (respectively pi+1) within time period t − ti (respectively ti+1 − t).
73
Combine the two probabilities…
[Figure: possible paths of object a and its possible locations when t = 4; segment from m to p1]
› Probability of falling inside segment mp1: 0.5 · 0.5 = 0.25 (at t = 4)
› Probability of an object a falling inside some segment S at a given time instant t (e.g., with a uniform pdf):
$$\Pr\big(a(t) \in S\big) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\big(a(t) \in S \mid p\big)$$
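The combination of path probabilities and possible locations can be sketched in a few lines. Assumptions (not from the slides): each possible path is treated as a 1D interval measured from pi, the object may travel at any speed up to a hypothetical vmax, locations are uniformly distributed over the possible-location interval, and the path lengths and segment positions are made-up numbers chosen to reproduce the 0.5 · 0.5 = 0.25 example.

```python
def possible_interval(length, vmax, t_i, t_next, t):
    """Possible locations at time t on a path of the given length, as an interval of
    distances from p_i: reachable from p_i within t - t_i, and still able to reach
    p_{i+1} (at distance `length`) within t_next - t, moving at speed <= vmax."""
    lo = max(0.0, length - vmax * (t_next - t))
    hi = min(length, vmax * (t - t_i))
    return (lo, hi) if lo <= hi else None


def prob_in_segment(paths, segments, vmax, t_i, t_next, t):
    """Pr(a in S at t) = sum over possible paths P of Pr(P) * Pr(a in S | P), with
    locations uniformly distributed over each path's possible-location interval.

    paths:    {path_id: (probability, length)}
    segments: {path_id: (start, end)} position of S along the path (missing = S not on path)
    """
    total = 0.0
    for path_id, (p_path, length) in paths.items():
        interval = possible_interval(length, vmax, t_i, t_next, t)
        seg = segments.get(path_id)
        if interval is None or seg is None:
            continue
        lo, hi = interval
        if hi == lo:  # a single possible position
            frac = 1.0 if seg[0] <= lo <= seg[1] else 0.0
        else:
            frac = max(0.0, min(hi, seg[1]) - max(lo, seg[0])) / (hi - lo)
        total += p_path * frac
    return total


# Hypothetical numbers: two equally likely paths (uniform pdf over paths), vmax = 2,
# samples at t_i = 0 and t_next = 10, query time t = 4, S covers [0, 4] on path P1 only.
paths = {"P1": (0.5, 10.0), "P2": (0.5, 12.0)}
segments = {"P1": (0.0, 4.0)}
print(possible_interval(10.0, 2.0, 0.0, 10.0, 4.0))           # (0.0, 8.0)
print(prob_in_segment(paths, segments, 2.0, 0.0, 10.0, 4.0))  # 0.25
```

Replacing the uniform fraction with another location pdf only changes how `frac` is computed; the outer sum over possible paths stays the same.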
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – The collection of all the possible paths (and their probabilities)
  – The probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
› Data structures
  – Edge hash table
    › Small, in memory
  – Movement R-tree
    › 1D for each edge
  – Trajectories list
    › Stores samples ordered by time stamp
    › Assigns a pointer to the set of possible paths
    › Earliest-arrival and latest-departure times stored for vertices
    › On disk; retrieved on demand
76
› Network expansion
  – Expansion tree: a tree rooted at q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search for leaf entries whose maximum time interval intersects tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: road network with vertices A to E around q, annotated with earliest/latest arrival times; a possible path that is definitely not part of the answer to the range query; the expansion tree "bubbling" along road-network segments]
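The network expansion step is essentially Dijkstra's algorithm stopped at distance r. A minimal sketch, assuming an adjacency-list graph with non-negative edge lengths; it returns vertices with their predecessors (the expansion tree) but does not clip edges that are only partially within r, and the graph below is hypothetical.

```python
import heapq


def network_expansion(graph, q, r):
    """Expansion tree: vertices whose network distance from q is smaller than r.

    graph: {vertex: [(neighbour, edge_length), ...]} with non-negative lengths.
    Returns {vertex: (distance, predecessor)}; predecessor links form the tree rooted at q.
    """
    dist, pred = {q: 0.0}, {q: None}
    heap, settled = [(0.0, q)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v], pred[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return {v: (dist[v], pred[v]) for v in settled}


# Hypothetical undirected road network around q
edges = [("q", "A", 2), ("q", "B", 5), ("A", "C", 2), ("C", "D", 4), ("B", "E", 1)]
graph = {}
for a, b, w in edges:
    graph.setdefault(a, []).append((b, w))
    graph.setdefault(b, []).append((a, w))
print(network_expansion(graph, "q", 6.0))  # q, A, C and B are within distance 6
```

The filtering step would then look up the movement R-trees of the tree's edges; vertices E and D are excluded here because their network distance from q is not smaller than r.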
77
Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty: the flip side
› Data reduction: why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink. "Speeding Up the Douglas-Peucker Line-Simplification Algorithm." SSHD 1997
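For reference, the classic recursive Douglas-Peucker simplification that the cited paper speeds up can be sketched as follows. This is the textbook version, not Hershberger and Snoeyink's accelerated variant, and it treats a trajectory as a purely spatial polyline (ignoring time, which the next slide warns about).

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))  # clamp the projection onto the segment
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))


def douglas_peucker(points, eps):
    """Keep the farthest point from the chord if it deviates more than eps, then recurse."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [a, b]
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right  # the split point appears in both halves once


track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
print(douglas_peucker(track, 0.5))  # [(0, 0), (2, -0.1), (3, 5), (5, 7)]
```

The parameter eps is the "acceptable error" of the slide: every dropped point lies within eps of the simplified polyline, which bounds the spatial uncertainty introduced by the reduction.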
79
Uncertainty: the other flip side
› What are the important properties that must not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just a "Z" axis… one cannot travel back in time
  – How long can the data at the sink remain before it becomes "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
› Stochastic methods for uncertain spatial data
  – Probabilistic results
  – No time dimension
vs.
› Geometric models for uncertain spatio-temporal data
  – No probabilistic results
  – Time dimension
82
Merge the approaches
› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments." IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. "Probabilistic Similarity Search for Uncertain Time Series." SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
58
Modeling Uncertain Trajectories: Model 3, Sheared Cylinders
Constant bound on the location error at any time instant
G. Trajcevski, O. Wolfson, K. Hinrichs, S. Chamberlain: Managing Uncertainty in Moving Objects Databases. ACM TODS 2004
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly consider:
Syntax
- The impact of uncertainty on the (structure of the) answer
- The impact of constraints
Processing algorithms
- Impact of the motion + uncertainty coupling on filtering/pruning/refinement
e.g., possible routes to be taken
60
Example: inside the range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and consequently along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Given a collection of moving object trajectories Tr_1, Tr_2, …, Tr_N:
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Tr_q, between [t_b, t_e]
A_nn(q) = [(Tr_i1, [t_b, t_1]), (Tr_i2, [t_1, t_2]), …, (Tr_im, [t_{m-1}, t_e])]
[Figure: trajectories Tr_q, Tr_1, Tr_2, Tr_3 in (X, Y, T) space between T = t_b and T = t_e, with T = t_1 marked]
Example: assuming that Tr_q is the querying trajectory between [t_b, t_e], then
- Tr_1 (black) is the NN throughout [t_b, t_1]
- Tr_2 (red) is the NN throughout [t_1, t_e]
Observation: The answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr_1, Tr_2, …, Tr_N, where each Tr_i has some uncertainty associated with its location at each time instant, consider the variant:
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Tr_q, between [t_b, t_e]
[Figure: uncertain trajectories Tr_q, Tr_1, Tr_2, Tr_3 in (X, Y, T) space between T = t_b and T = t_e, with T = t_b1, t_11 and t_1 marked]
Observations: adding uncertainty to the previous example implies
- Tr_1 (blue) is the only trajectory with a non-zero NN-probability up to t_b1
- Starting at t_b1, Tr_3 (green) also has a non-zero NN-probability
- Specifically, at t_11 all three trajectories have a non-zero NN-probability
⇒ What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree. Root: [Tr_Q, t_b, t_e]. Level 1: (Tr_i1, t_b, t_1, D_1), (Tr_i2, t_1, t_2, D_2), …, (Tr_im, t_{m-1}, t_e, D_m). Deeper levels refine sub-intervals of their parent, e.g., (Tr_i11, t_b, t_11, D_11), (Tr_i12, t_11, t_12, D_12), (Tr_i121, t_11, t_121, D_121), …]
Root: parameters of the query
- Querying trajectory Tr_q
- Time interval of interest [t_b, t_e]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Tr_q within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and the uncertainty disks of Tr_1, …, Tr_5, with the radii R_min, R_max and R_d around Q]
Tr_q = 2D point Q; the rest are possible-location regions bounded by circles
Observation:
- Any object whose minimum distance from Q is > R_max (the smallest maximum distance of any object from Q) has a "0" probability of being the NN to Q
- Example: Tr_4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tr_i is within distance R_d from Q is given by
$$P^{WD}_i(Q, R_d) = \iint_{\lVert (x, y) - Q \rVert \le R_d} pdf_i(x, y)\,dx\,dy$$
The probability that a given object Tr_j is the NN of Q is given by
$$P^{NN}_j(Q) = \int_{0}^{R_{max}} \frac{d\,P^{WD}_j(Q, R_d)}{d R_d} \prod_{i \ne j} \left(1 - P^{WD}_i(Q, R_d)\right) dR_d$$
i.e., Tr_j is the NN if it is at distance R_d AND all the rest are at distances > R_d, integrating over all possible R_d's (NOTE: the upper bound is R_max…)
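A minimal sketch of the pruning rule and of the NN-probability, assuming every object's possible locations form a uniform disk and taking R_max as the smallest maximum distance of any object from Q. Instead of evaluating the integral numerically, it estimates P^NN_j(Q) by Monte Carlo sampling of possible worlds (a swapped-in technique, not the method of the slides); the function names and disk parameters are hypothetical.

```python
import math
import random

def min_max_dist(q, center, radius):
    """Minimum and maximum distance from the crisp point q to a disk."""
    d = math.dist(q, center)
    return max(0.0, d - radius), d + radius

def prune_candidates(q, disks):
    """Drop every object whose minimum distance exceeds R_max, the smallest
    maximum distance of any object from q (NN-probability 0)."""
    r_max = min(min_max_dist(q, c, r)[1] for c, r in disks.values())
    return {oid: (c, r) for oid, (c, r) in disks.items()
            if min_max_dist(q, c, r)[0] <= r_max}

def sample_in_disk(rng, center, radius):
    """Uniform sample inside a disk (the sqrt makes the density uniform in area)."""
    rho = radius * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi)

def nn_probabilities(q, disks, n_samples=20000, seed=0):
    """Monte Carlo estimate of P^NN_j(Q) for the objects that survive pruning."""
    rng = random.Random(seed)
    candidates = prune_candidates(q, disks)
    wins = dict.fromkeys(candidates, 0)
    for _ in range(n_samples):
        # one possible world: an independent location for every candidate
        dists = {oid: math.dist(q, sample_in_disk(rng, c, r))
                 for oid, (c, r) in candidates.items()}
        wins[min(dists, key=dists.get)] += 1
    return {oid: w / n_samples for oid, w in wins.items()}

# Hypothetical objects: id -> ((x, y) of the disk center, radius)
disks = {"Tr1": ((2.0, 0.0), 1.0), "Tr2": ((0.0, 2.5), 1.0), "Tr4": ((9.0, 9.0), 1.0)}
probs = nn_probabilities((0.0, 0.0), disks)
print(sorted(probs))  # ['Tr1', 'Tr2']: Tr4 is pruned before sampling
```

Pruning first keeps the sampling cost proportional to the number of objects that can actually win.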
65
When Tr_q (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query location Tr_Q and objects Tr_1, …, Tr_5 with the radii R_min and R_max; two points Z1 and Z2 with dist(Z1, Z2) < R_max]
We can no longer "prune" Tr_4 from consideration
The main problem now becomes how to calculate the probability of an object, say Tr_j, being within_distance R_d from Tr_q
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs pdf(Tr_q), pdf(Tr_1) and pdf(Tr_2) over their uncertainty disks in the (x, y) plane]
Example: assuming a uniform pdf of possible locations; the figure marks two of the points to be used when evaluating the actual probability of Tr_1 being within_distance R_d from Tr_q
Main observation: Let V_i and V_q denote the 2D random variables representing the possible locations of Tr_i and Tr_q. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$
Define V_iq = V_i - V_q (cross-correlation) as another random variable: V_iq = V_i + (-V_q), which, due to the independence of V_i and V_q, implies that the pdf of V_iq is the convolution of the pdfs of V_i and -V_q
67
[Figure: the convolved pdfs pdf(Tr_1 - Tr_q) and pdf(Tr_2 - Tr_q) in the (x, y) plane, each with a support of width 4r]
Let V_iq = V_i + (-V_q)
Property 1: If pdf(V_i) has a centroid C_i coinciding with E(V_i) AND pdf(V_q) has a centroid C_q coinciding with E(V_q), THEN their convolution pdf(V_iq) has a centroid C_iq = C_i + (-C_q) coinciding with E(V_iq)
Property 2: Assume that pdf(V_i) and pdf(V_q) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(V_iq) is also rotationally symmetric around its centroid C_iq
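Because V_iq is a single 2D random variable, the within_distance probability becomes a question about one pdf. The sketch below assumes both uncertainty regions are uniform disks of the same radius r (hypothetical centroids and radius) and samples V_iq = V_i - V_q instead of computing the convolution analytically; it checks Property 1 empirically and shows that the support stays within 2r of C_iq.

```python
import math
import random

def sample_uniform_disk(rng, center, radius):
    """Uniform sample inside a disk."""
    rho = radius * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi)

def sample_difference(rng, c_i, c_q, r, n):
    """n samples of V_iq = V_i - V_q for independent uniform disks of radius r.
    Their distribution is the convolution of pdf(V_i) and pdf(-V_q)."""
    samples = []
    for _ in range(n):
        xi, yi = sample_uniform_disk(rng, c_i, r)
        xq, yq = sample_uniform_disk(rng, c_q, r)
        samples.append((xi - xq, yi - yq))
    return samples

def within_distance_prob(samples, r_d):
    """Estimate P(|V_iq| <= R_d): a question about one 2D variable."""
    return sum(math.hypot(x, y) <= r_d for x, y in samples) / len(samples)

rng = random.Random(42)
c_i, c_q, r = (3.0, 1.0), (1.0, 0.0), 0.5          # hypothetical centroids and radius
samples = sample_difference(rng, c_i, c_q, r, 50000)
mean = tuple(sum(s[k] for s in samples) / len(samples) for k in range(2))
c_iq = (c_i[0] - c_q[0], c_i[1] - c_q[1])          # Property 1: C_iq = C_i + (-C_q)
spread = max(math.dist(s, c_iq) for s in samples)  # support stays within 2r of C_iq
p_wd = within_distance_prob(samples, 2.0)
print(mean, c_iq, spread, p_wd)
```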
68
The continuous aspect of the probabilistic NN queries
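Let V_x,iq = v_x,i - v_x,q and V_y,iq = v_y,i - v_y,q denote the corresponding components of the velocity of the object whose expected location is along TR_iq
Then the distance of the expected location along TR_iq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = V_{x,iq}^2 + V_{y,iq}^2, \quad B = 2\,(x_{iq} V_{x,iq} + y_{iq} V_{y,iq}), \quad C = x_{iq}^2 + y_{iq}^2$$
where (x_iq, y_iq) denotes the expected location along TR_iq at t = 0; the graph of d_iq(t) is (a branch of) a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)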
69
The continuous aspect of the probabilistic NN queries
Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d_1q(t), d_2q(t), …, d_Nq(t)} (NOTE: excluding d_qq(t) ≡ 0)
Observation: Two distance functions d_iq(t) and d_jq(t) can intersect at most twice throughout the time interval of interest for the query [t_b, t_e] (NOTE: setting d_iq(t) = d_jq(t) and squaring both sides yields an equation that is at most quadratic in t). Call such pairwise intersections critical time points
[Figure: example lower envelope LE_12 of two distance trajectories TR_1 and TR_2, starting at t_b, with the critical time points t_11 and t_12; time on the horizontal axis, dist on the vertical axis]
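Assuming the expected locations move linearly (positions at t = 0 and constant velocities, all hypothetical inputs), both facts can be sketched in a few lines: d_iq(t)^2 is a quadratic in t, so equating two such functions yields at most two critical time points.

```python
import math

def distance_coeffs(p_i, v_i, p_q, v_q):
    """(A, B, C) with d_iq(t)^2 = A t^2 + B t + C for linear motion;
    p_* are expected locations at t = 0 and v_* are velocities."""
    x, y = p_i[0] - p_q[0], p_i[1] - p_q[1]      # relative position at t = 0
    vx, vy = v_i[0] - v_q[0], v_i[1] - v_q[1]    # V_x,iq and V_y,iq
    return vx * vx + vy * vy, 2.0 * (x * vx + y * vy), x * x + y * y

def distance(coeffs, t):
    """d_iq(t); for A > 0 its graph is a branch of a hyperbola."""
    a, b, c = coeffs
    return math.sqrt(max(0.0, a * t * t + b * t + c))

def critical_time_points(ci, cj, t_b, t_e, eps=1e-12):
    """Times in [t_b, t_e] where d_iq(t) = d_jq(t). Equating the squared
    distances gives an (at most) quadratic equation: at most two roots."""
    a, b, c = (ci[k] - cj[k] for k in range(3))
    if abs(a) < eps:                              # degenerate: linear equation
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        roots = [] if disc < 0 else [(-b - math.sqrt(disc)) / (2.0 * a),
                                     (-b + math.sqrt(disc)) / (2.0 * a)]
    return sorted(t for t in roots if t_b <= t <= t_e)

# Hypothetical setting: the query object rests at the origin,
# object 1 starts at (10, 0) with velocity (-1, 0), object 2 rests at (5, 0).
still = (0.0, 0.0)
c1 = distance_coeffs((10.0, 0.0), (-1.0, 0.0), still, still)
c2 = distance_coeffs((5.0, 0.0), still, still, still)
print(c1, critical_time_points(c1, c2, 0.0, 20.0))   # (1.0, -20.0, 100.0) [5.0, 15.0]
```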
70
The continuous aspect of the probabilistic NN queries
Q: How to efficiently construct the lower envelope of the whole collection of distance functions?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: the lower envelope LE_12 of TR_1 and TR_2 (critical time points t_11, t_12) and the lower envelope LE_34 of TR_3 and TR_4 (critical time point t_31), both over [t_b, t_e], are merged into LE_1234, which introduces the new critical time points t_1new and t_2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
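A compact sketch of the Merge-Sort-style construction, assuming each distance function is given by the coefficients (A, B, C) of d(t)^2 = A t^2 + B t + C (comparing squared distances preserves the ordering). An envelope is a list of (start, end, index) pieces; merging two envelopes only needs the critical time points of the two functions that are active on each elementary interval. This is an illustrative implementation, not the one from the cited work, and the example functions are hypothetical.

```python
def sq_dist(f, t):
    """Squared distance A t^2 + B t + C; it orders objects exactly like d(t)."""
    a, b, c = f
    return a * t * t + b * t + c

def crossings(f, g, lo, hi, eps=1e-12):
    """Critical time points of f and g strictly inside (lo, hi): at most two."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            roots = []
        else:
            s = disc ** 0.5
            roots = [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
    return sorted(t for t in roots if lo < t < hi)

def merge(env1, env2, funcs):
    """Merge two lower envelopes over the same [t_b, t_e].
    An envelope is a list of contiguous pieces (start, end, function index)."""
    cuts = sorted({s for s, _, _ in env1} | {s for s, _, _ in env2} | {env1[-1][1]})
    out, p1, p2 = [], 0, 0
    for lo, hi in zip(cuts, cuts[1:]):
        while env1[p1][1] <= lo:
            p1 += 1
        while env2[p2][1] <= lo:
            p2 += 1
        i, j = env1[p1][2], env2[p2][2]            # active functions on [lo, hi]
        pts = [lo] + crossings(funcs[i], funcs[j], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            m = 0.5 * (a + b)
            k = i if sq_dist(funcs[i], m) <= sq_dist(funcs[j], m) else j
            if out and out[-1][2] == k:
                out[-1] = (out[-1][0], b, k)       # same function: extend the piece
            else:
                out.append((a, b, k))
    return out

def lower_envelope(funcs, idxs, t_b, t_e):
    """Divide and conquer in the spirit of Merge-Sort."""
    if len(idxs) == 1:
        return [(t_b, t_e, idxs[0])]
    half = len(idxs) // 2
    return merge(lower_envelope(funcs, idxs[:half], t_b, t_e),
                 lower_envelope(funcs, idxs[half:], t_b, t_e), funcs)

# Hypothetical distance functions (A, B, C): |10 - t|, a constant 5, a constant 7
funcs = [(1.0, -20.0, 100.0), (0.0, 0.0, 25.0), (0.0, 0.0, 49.0)]
print(lower_envelope(funcs, [0, 1, 2], 0.0, 20.0))
# [(0.0, 5.0, 1), (5.0, 15.0, 0), (15.0, 20.0, 1)]
```

Since any two functions cross at most twice, an envelope of N functions has at most 2N - 1 pieces, so each merge is linear apart from the breakpoint sort used here for brevity; a two-way merge of the breakpoint lists would remove that extra factor.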
Now, how about the IPAC-NN tree?
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Tr_q (e.g., Tr_7 in the figure, and Tr_6 initially)
Observation 3: IF the lower envelope is removed from the (distance function × time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR_1, …, TR_7 over [t_b, t_e] with the times t_1, t_2, t_3; the lower envelope, the 4r pruning band above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
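A sampling-based sketch of the pruning rule and of Observation 3, assuming the expected-distance functions are available as Python callables and evaluating them only on a discrete time grid (so, unlike the exact envelope construction, it can miss events between grid points). At each sampled time, the second-smallest distance is the lower envelope of the leftovers once the envelope itself is removed. The functions, radius and grid are hypothetical.

```python
def ranked_envelopes(dist_funcs, times, levels=2):
    """Per sampled time, the indices of the `levels` closest objects.
    Level 1 is the lower envelope; level 2 is the lower envelope of the
    leftovers once the first one has been removed (Observation 3)."""
    result = []
    for t in times:
        order = sorted(range(len(dist_funcs)), key=lambda i: dist_funcs[i](t))
        result.append(order[:levels])
    return result

def prune(dist_funcs, times, r):
    """Objects within 4r of the lower envelope at some sampled time.
    The others cannot be the NN there: each uncertain distance deviates from
    its expected value by at most 2r (disks of radius r around both objects)."""
    keep = set()
    for t in times:
        d = [f(t) for f in dist_funcs]
        low = min(d)
        keep.update(i for i, di in enumerate(d) if di - low <= 4 * r)
    return keep

# Hypothetical expected-distance functions and uncertainty radius
funcs = [lambda t: abs(10.0 - t), lambda t: 5.0, lambda t: 30.0]
times = [float(k) for k in range(21)]
print(prune(funcs, times, r=1.0))            # {0, 1}: object 2 is always > 4r above
print(ranked_envelopes(funcs, times)[10])    # [0, 1] at t = 10
```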
72
Given two trajectory samples (t_i, p_i) and (t_{i+1}, p_{i+1}) of a moving object a on a road network, the set of possible paths PP_i(a) between t_i and t_{i+1} consists of all the paths along the routes (sequences of edges) that connect p_i and p_{i+1} and whose minimum time costs are not greater than t_{i+1} - t_i
Example: if the pdf of selecting a possible path is uniform, each path in PP_i(a) is selected with probability 1/|PP_i(a)|
Given a path P_j ∈ PP_i(a), the Possible Locations of a moving object a with respect to P_j at t ∈ [t_i, t_{i+1}] is the set of all positions p ∈ P_j from which a can reach p_i (respectively p_{i+1}) within the time period t - t_i (respectively t_{i+1} - t)
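A minimal sketch of both definitions, assuming the minimum time cost of a path is its length divided by a known maximum speed v_max, and that p_i and p_{i+1} are vertices of a small graph given as an adjacency dictionary (graph, speeds and names are hypothetical). Possible paths are enumerated by a depth-first search that prunes branches exceeding the time budget; the possible locations on one path form an interval of offsets.

```python
def possible_paths(graph, src, dst, v_max, dt):
    """All simple paths src -> dst whose minimum time cost (length / v_max) is at most dt.
    graph: {vertex: {neighbour: edge length}}."""
    paths = []

    def dfs(node, path, length):
        if length / v_max > dt:               # too long even at top speed: prune
            return
        if node == dst:
            paths.append((path, length))
            return
        for nxt, w in graph[node].items():
            if nxt not in path:               # simple paths only
                dfs(nxt, path + [nxt], length + w)

    dfs(src, [src], 0.0)
    return paths

def possible_locations(length, v_max, t_i, t_next, t):
    """Offsets x along one path (measured from p_i) with x <= v_max (t - t_i)
    and length - x <= v_max (t_next - t); None if the interval is empty."""
    lo = max(0.0, length - v_max * (t_next - t))
    hi = min(length, v_max * (t - t_i))
    return (lo, hi) if lo <= hi else None

# Hypothetical network: p_i = "A", p_(i+1) = "D"; lengths and v_max in consistent units
graph = {"A": {"B": 4.0, "C": 3.0, "E": 9.0}, "B": {"A": 4.0, "D": 4.0},
         "C": {"A": 3.0, "D": 3.0}, "D": {"B": 4.0, "C": 3.0, "E": 2.0},
         "E": {"A": 9.0, "D": 2.0}}
paths = possible_paths(graph, "A", "D", v_max=1.0, dt=8.0)
print([p for p, _ in paths])                        # [['A', 'B', 'D'], ['A', 'C', 'D']]
print(1 / len(paths))                               # uniform path pdf: 0.5
print(possible_locations(6.0, 1.0, 0.0, 8.0, 4.0))  # (2.0, 4.0) on path A-C-D
```

The detour through E (length 9) is discarded because it cannot be driven within the 8 time units between the two samples.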
73
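Combine the two probabilities…
[Figure: possible paths of the object a, and its possible locations when t = 4; the segment from m to p_1 is highlighted]
Probability of falling inside segment m p_1 (at t = 4): 0.5 · 0.5 = 0.25
Probability of an object a falling inside some segment S at a given time instant t, e.g., with a uniform pdf:
$$\Pr(a(t) \in S) = \sum_{p \in PP} \Pr(p) \cdot \Pr\big(S \cap PL^{a}_{p}(t)\big)$$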
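The formula can be sketched as follows, assuming (as in the uniform-pdf example) that the location on each possible path is uniformly distributed over its possible-location interval, and that the segment S is given as an offset interval on the paths that contain it. The input format is hypothetical; the example is a made-up configuration that yields 0.5 · 0.5 = 0.25.

```python
def prob_in_segment(entries):
    """Pr(a(t) in S) = sum over possible paths p of Pr(p) * Pr(location on p lies in S),
    assuming the location on each path is uniform over its possible-location
    interval PL_p(t). entries: (Pr(p), PL_p(t) as (lo, hi), S on p as (s_lo, s_hi)
    or None when S does not lie on p)."""
    total = 0.0
    for p_path, (lo, hi), seg in entries:
        if seg is None:
            continue
        if hi > lo:
            overlap = max(0.0, min(hi, seg[1]) - max(lo, seg[0]))
            frac = overlap / (hi - lo)
        else:                                  # degenerate interval: a single position
            frac = 1.0 if seg[0] <= lo <= seg[1] else 0.0
        total += p_path * frac
    return total

# Hypothetical: two equally likely paths; on the first one, S covers half of PL_p(t)
entries = [(0.5, (2.0, 4.0), (3.0, 5.0)), (0.5, (4.0, 4.0), None)]
print(prob_in_segment(entries))   # 0.25
```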
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – the collection of all the possible paths (and their probabilities)
  – the probabilistic location function
Example: observe the two possible sequences of going from position p_i to position p_{i+1}. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4)
75
Processing
› Data structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D for each edge
  – Trajectories list
    › Store samples ordered by time-stamp
    › Assign pointer to set of possible paths
    › Earliest-arriving and latest-departure times stored for vertices
    › On-disk, retrieve on demand
76
› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search leaf entries whose maximum time interval intersects t_q
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" along road-network segments from q (vertices A, B, C, D, E), annotated with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
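A minimal sketch of network expansion and filtering, assuming the road network is an adjacency dictionary, that an edge's network distance to q is that of its closer endpoint, and that each edge's movement R-tree is replaced by a plain list of (object, (earliest, latest)) entries (all hypothetical simplifications). Refinement, i.e., computing each candidate's qualification probability, is not shown.

```python
import heapq

def edge_key(u, v):
    return tuple(sorted((u, v)))              # undirected edge identifier

def expansion_tree(graph, q, r):
    """Edges whose network distance to vertex q (distance of the closer endpoint)
    is smaller than r, found with a Dijkstra search that stops at distance r."""
    dist, heap, edges = {q: 0.0}, [(0.0, q)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                          # stale queue entry
        for v, w in graph[u].items():
            edges.add(edge_key(u, v))         # u is closer than r, so the edge qualifies
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return edges

def filter_candidates(edges, movement_index, t_q):
    """Filtering: objects with an entry on an expansion-tree edge whose
    [earliest, latest] time interval intersects t_q. A plain dict of lists
    stands in for the per-edge 1D movement R-trees."""
    lo, hi = t_q
    return {obj for e in edges for obj, (t0, t1) in movement_index.get(e, [])
            if t0 <= hi and lo <= t1}

# Hypothetical network and movement entries
graph = {"q": {"A": 2.0, "B": 5.0}, "A": {"q": 2.0, "C": 4.0},
         "B": {"q": 5.0}, "C": {"A": 4.0}}
edges = expansion_tree(graph, "q", r=4.0)
index = {("A", "q"): [("o1", (0.0, 5.0))], ("A", "C"): [("o2", (10.0, 12.0))]}
print(sorted(edges))                                   # [('A', 'C'), ('A', 'q'), ('B', 'q')]
print(filter_candidates(edges, index, t_q=(3.0, 6.0)))  # {'o1'}
```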
77
Temporal-continuous query
– Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arriving time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
78
Uncertainty – the flip-side
› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
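For reference, the basic recursive Douglas-Peucker simplification, which the cited work speeds up, can be sketched as follows. This is the classic version with O(n²) worst case, not the Hershberger-Snoeyink variant; the tolerance eps plays the role of the acceptable error, and the sample track is hypothetical (time is ignored, only x, y are simplified).

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))                    # clamp the projection to the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, eps):
    """Keep a subset of the points such that no dropped point is farther than eps
    from the simplified polyline."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):           # farthest point from the chord
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right                      # the split point appears only once

track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]   # hypothetical x, y samples
print(douglas_peucker(track, eps=0.5))   # [(0, 0), (2, -0.1), (3, 5), (5, 7)]
```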
79
Uncertainty – the other flip-side
› What are the important properties not to be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just a "Z" axis… one cannot travel back in time
  – How long may the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far …
› Stochastic methods for uncertain spatial data: probabilistic results ✓, time dimension ✗
vs.
› Geometric models for uncertain spatio-temporal data: probabilistic results ✗, time dimension ✓
82
Merge the approaches
› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parameterized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443
83
A highway example…
[Figure: position (pos) of a car on a highway over time t, with a query region Q; the car's possible positions are bounded by its minimum and maximum speed (v_min, v_max)]
› What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution…: the car is in Q with probability 50% at one time instant and 30% at another
› Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
  – Violation of the speed constraints
› Consideration of the dependency: 0.5
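The gap between the two answers can be reproduced with a handful of equally likely possible worlds. The worlds below are hypothetical, chosen only to match the marginals 0.5 and 0.3 and a subset relation consistent with the answer 0.5 (every world inside Q at the second instant is also inside at the first).

```python
def prob_at_least_once(worlds):
    """Possible-worlds answer: fraction of equally likely worlds that are
    inside Q at one or more of the query time instants."""
    return sum(any(w) for w in worlds) / len(worlds)

def prob_independent(worlds):
    """What the independence assumption computes from the per-instant marginals."""
    miss = 1.0
    for k in range(len(worlds[0])):
        p_k = sum(w[k] for w in worlds) / len(worlds)
        miss *= 1.0 - p_k
    return 1.0 - miss

# Hypothetical worlds: (inside Q at instant 1, inside Q at instant 2).
worlds = [(True, True)] * 3 + [(True, False)] * 2 + [(False, False)] * 5
print(prob_at_least_once(worlds))   # 0.5
print(prob_independent(worlds))     # about 0.65
```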
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts
[Figure: transition probabilities in the interior: stay 0.2, move to either neighbour 0.4; at the border: stay 0.4, move inward 0.6]
92
A simple example
› Initial position: a single hole
› After first hit: 0.4 | 0.2 | 0.4
› After second hit: 0.16 | 0.16 | 0.36 | 0.16 | 0.16
› After 40th hit: ≈ 0.08 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.08
93
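How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$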
94
How can we model this?
› First hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› Second hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) · M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› 40th hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M^40 ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
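The matrix-vector view can be checked with a few lines of plain Python (no libraries assumed; names are illustrative). Starting in bucket 3 as above, it reproduces the distribution after the second hit; iterating further drives the vector towards the stationary distribution (0.08, 0.12, …, 0.12, 0.08), and the snapshot after 40 hits is already roughly at that level.

```python
def board_matrix(n=9):
    """Transition matrix of the board example (row = from bucket, column = to bucket)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4   # interior buckets
    m[0][0], m[0][1] = 0.4, 0.6                               # left border
    m[n - 1][n - 1], m[n - 1][n - 2] = 0.4, 0.6               # right border
    return m

def hit(x, m):
    """One hit of the board: row vector x times M."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m))]

M = board_matrix()
x = [0.0] * 9
x[2] = 1.0                                   # initial position: bucket 3
after = [x]
for _ in range(40):
    after.append(hit(after[-1], M))
print([round(p, 2) for p in after[2]])       # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
```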
95
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
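A minimal sketch of the estimation step, assuming the historical trajectories are already discretized into one state per tick. It uses plain maximum-likelihood counting (an assumption, not necessarily the estimator used by the authors) and stores only observed transitions, which keeps the matrix sparse. Time- or group-dependent matrices could be obtained by running it on the corresponding subsets of the history; the data below is hypothetical.

```python
from collections import Counter, defaultdict

def estimate_transitions(sequences):
    """Maximum-likelihood transition probabilities from historical state
    sequences (one state per tick). Only observed transitions are stored,
    so the result is a sparse matrix: {from_state: {to_state: probability}}."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: c / total for b, c in row.items()}
    return matrix

# Hypothetical discretized history (states = intersections or street pieces)
history = [["s1", "s3", "s2", "s1"], ["s2", "s3", "s3", "s2"]]
M = estimate_transitions(history)
print(M["s3"])   # {'s2': 0.6666666666666666, 's3': 0.3333333333333333}
```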
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s_1 or s_2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: transition graph over the states s_1, s_2, s_3 with the probabilities of M, unrolled over t = 0, 1, 2, 3]
› Note: We have an exponential number of possible paths the car might take
› State distributions, starting in s_1 at t = 0: (1, 0, 0) at t = 0, (0, 0, 1) at t = 1, (0, 0.8, 0.2) at t = 2, (0.48, 0.16, 0.36) at t = 3
› Result = 0.96: the car is in s_2 at t = 2 with probability 0.8; of the remaining 0.2 (in s_3), a further 0.2 · 0.8 = 0.16 reaches s_2 at t = 3, so 0.8 + 0.16 = 0.96
› Solution based on matrix multiplications: introduce a new state for the winner trajectories and two matrices
103
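ST - Window Queries
› The absorbing state s_4 collects the trajectories that have already hit the window; M^- is used for transitions to time instants outside T, M^+ for transitions to time instants inside T
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$
[Figure: states s_1, s_2, s_3 and the absorbing state s_4]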
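A minimal plain-Python sketch of this construction (function names are hypothetical): it builds M^- and M^+ from M by adding an absorbing state for the trajectories that have already hit the window, and multiplies the state vector with M^+ for the ticks inside T and with M^- otherwise. For the example it returns the same 0.96.

```python
def step(x, m):
    """Row vector x times matrix m."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]

def build_matrices(M, window):
    """M^-: M extended by an absorbing 'already hit' state.
    M^+: additionally redirects every transition into a window state to that state."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    m_minus = [row + [0.0] for row in M] + [absorbing]
    m_plus = [[0.0 if j in window else p for j, p in enumerate(row)]
              + [sum(row[j] for j in window)] for row in M] + [absorbing]
    return m_minus, m_plus

def window_query(M, start, window, T):
    """P(object is in a window state at some time in T); assumes 0 is not in T.
    Transitions into a time inside T use M^+, all others use M^-."""
    m_minus, m_plus = build_matrices(M, window)
    x = list(start) + [0.0]
    for t in range(1, max(T) + 1):
        x = step(x, m_plus if t in T else m_minus)
    return x[-1], x

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
p, x = window_query(M, [1.0, 0.0, 0.0], window={0, 1}, T={2, 3})
print(round(p, 2), [round(v, 2) for v in x])   # 0.96 [0.0, 0.0, 0.04, 0.96]
```

Only sparse matrix-vector products are needed, so the cost grows linearly with the length of the time horizon instead of with the exponential number of paths.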
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques
[Figure: location space over time with a single observation at t0 (left) and with two observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class S_i^- corresponding to worlds where o is located in state s_i and o has not intersected the window
  – One class S_i^+ corresponding to worlds where o is located in state s_i and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s_1, s_2, s_3 unrolled over t = 0, 1, 2, 3]
106
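Multiple Observations
[Figure: states s_1, s_2, s_3 unrolled over t = 0, 1, 2, 3; each state is split into a class S_i^- (window not yet intersected, ¬■) and a class S_i^+ (window intersected, ■)]
› State vectors over (S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+):
(1, 0, 0, 0, 0, 0) → (0, 0, 1, 0, 0, 0) → (0, 0, 0.2, 0, 0.8, 0) → (0, 0, 0.04, 0.48, 0.16, 0.32)
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$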
107
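› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s_3 (at t = 3)?
› At t = 3: (S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+) = (0, 0, 0.04, 0.48, 0.16, 0.32)
› Bayes' Theorem, where ■ denotes "the trajectory has intersected the query window":
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\, P(\blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$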
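The construction with 2|S| classes and the final Bayes step can be sketched in plain Python as follows (names are hypothetical): S_i^- is stored at index i and S_i^+ at index |S| + i, M^- keeps both halves separate, and M^+ moves every transition that enters a window state into the corresponding S_j^+ class. Conditioning on the observation in s_3 at t = 3 then only needs the entries for S_3^- and S_3^+.

```python
def step(x, m):
    """Row vector x times matrix m."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]

def class_matrices(M, window):
    """2|S| classes: index i is S_i^- (window not yet intersected),
    index n + i is S_i^+ (window intersected)."""
    n = len(M)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            m_minus[i][j] += p                           # block-diagonal (M 0; 0 M)
            m_minus[n + i][n + j] += p
            m_plus[i][n + j if j in window else j] += p  # entering the window: S_j^+
            m_plus[n + i][n + j] += p                    # an intersection is never undone
    return m_minus, m_plus

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
m_minus, m_plus = class_matrices(M, window={0, 1})
x = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]                      # object starts in s_1 at t = 0
for t in range(1, 4):
    x = step(x, m_plus if t in (2, 3) else m_minus)     # T = [2, 3]
p_hit_given_s3 = x[5] / (x[5] + x[2])                   # P(S_3^+) / (P(S_3^+) + P(S_3^-))
print([round(v, 2) for v in x], round(p_hit_given_s3, 2))
# [0.0, 0.0, 0.04, 0.48, 0.16, 0.32] 0.89
```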
108
Summary
› Pros
  – Allows answering time-parameterized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping from time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: leaf entries (P, oid) in the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how to draw samples efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
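A minimal sketch of Monte-Carlo sampling that conforms with a later observation, assuming a single Markov chain M, a known start state and one future observation. It adapts the transition probabilities with backward weights, the standard way of conditioning a Markov chain on a future event; the cited paper's exact construction may differ. The example re-estimates the conditional window probability 0.32 / 0.36 ≈ 0.89 of the Bayes example above by sampling; all names are hypothetical.

```python
import random

def adapted_matrices(M, obs_state, steps):
    """Transition matrices conditioned on 'in obs_state at time `steps`':
    M_t(i, j) = M(i, j) * beta[t+1][j] / beta[t][i], where beta[t][i] is the
    probability of reaching the observation from state i at time t."""
    n = len(M)
    beta = [[0.0] * n for _ in range(steps + 1)]
    beta[steps][obs_state] = 1.0
    for t in range(steps - 1, -1, -1):
        beta[t] = [sum(M[i][j] * beta[t + 1][j] for j in range(n)) for i in range(n)]
    return [[[M[i][j] * beta[t + 1][j] / beta[t][i] if beta[t][i] > 0 else 0.0
              for j in range(n)] for i in range(n)] for t in range(steps)]

def sample_path(mats, start, rng):
    """Draw one trajectory; every sample agrees with the observation."""
    path = [start]
    for m in mats:
        path.append(rng.choices(range(len(m)), weights=m[path[-1]])[0])
    return path

def monte_carlo(mats, start, query, n=20000, seed=7):
    """Approximate result probability = satisfying samples / drawn samples."""
    rng = random.Random(seed)
    return sum(query(sample_path(mats, start, rng)) for _ in range(n)) / n

def in_window(path):
    """Window query: in s_1 or s_2 (indices 0, 1) at some t in T = [2, 3]."""
    return any(s in (0, 1) for s in path[2:4])

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
mats = adapted_matrices(M, obs_state=2, steps=3)       # seen in s_3 at t = 3
print(monte_carlo(mats, 0, in_window))                 # close to 0.32 / 0.36
```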
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728
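Lahar's actual query language and semantics are richer; the following is only a minimal sketch, assuming a Markov-chain location model and treating the event query as "the subevents occur in this order" (a subsequence pattern). It tracks (location, number of matched subevents) pairs, the usual product construction of a Markov chain with a pattern automaton. The states, probabilities and names are hypothetical.

```python
def advance(matched, loc, pattern):
    """Automaton step: `matched` subevents seen so far, in order."""
    if matched < len(pattern) and loc == pattern[matched]:
        return matched + 1
    return matched

def event_probability(M, locations, start, pattern, ticks):
    """P(the subevents of `pattern` occur in order within `ticks` steps),
    computed exactly over (location, matched subevents) pairs."""
    dist = {(start, advance(0, locations[start], pattern)): 1.0}
    for _ in range(ticks):
        nxt = {}
        for (i, k), p in dist.items():
            for j, pij in enumerate(M[i]):
                if pij > 0:
                    key = (j, advance(k, locations[j], pattern))
                    nxt[key] = nxt.get(key, 0.0) + p * pij
        dist = nxt
    return sum((p for (_, k), p in dist.items() if k == len(pattern)), 0.0)

# Hypothetical indoor model (one state per RFID-covered area) for "Joe got coffee"
locations = ["office", "hall", "coffee room"]
M = [[0.5, 0.5, 0.0],      # office -> office / hall
     [0.3, 0.4, 0.3],      # hall -> office / hall / coffee room
     [0.0, 1.0, 0.0]]      # coffee room -> hall
pattern = ["office", "coffee room", "office"]
print(event_probability(M, locations, 0, pattern, 3))   # 0.0: too few ticks
print(event_probability(M, locations, 0, pattern, 4))   # about 0.045
```

With four ticks, only the sequence office, hall, coffee room, hall, office completes the pattern, which gives 0.5 · 0.3 · 1.0 · 0.3 = 0.045.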
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
59
Processing Spatio-Temporal Queries for Uncertain Trajectories
Need to jointly considerSyntax- The impact of uncertainty on the (structure of the) answer- The impact of constraints
Processing Algorithms- Impact of the motion+uncertainty coupling on filteringprunningrefinement
eg possible routes to be taken
6060
Example inside range (disk) around ldquoQrdquo between t1 and t2
1 What is the probability of a path taken
2 What is the earliestlatest arrival at a given vertex (consequently along edge)
Kai Zheng Goce Trajcevski Xiaofang Zhou Peter Scheuermann Probabilistic range queries for uncertain trajectories on road networks EDBT 2011
61
Continuous NN for trajectories
Q_nn(q) Retrieve the nearest neighbor of the moving object whose trajectory is Trq between [tbte]
A-nn(q) [(Tri1 [tbt1]) (Tri2[t1t2]) hellip (Trim[tm-1te])]
T = tb
T = te
X
Y
T = t1
Trq Tr1 Tr2Tr3
Given a collection of moving objects trajectories Tr1 Tr2 hellip TrN
Example assuming that Trq is the querying trajectory between [tbte] then- Tr1 (black) is the NN throughout [tbt1]- Tr2 (red) is the NN throughout [t1te]
Observation The answer to the query is time-paremeterized
Yufei Tao Dimitris Papadias Qiongmao Shen Continuous Nearest Neighbor Search VLDB 2002
62
Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant
UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]
T = te
X
YT = tb
T = t1
Trq Tr1 Tr2Tr3
T = tb1
T = t11
Observations
adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability
= what should the structure ofthe answer to UQ_nn(q) look like
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-basedProbabilistic Answer to a Continuous NN query) tree
Tri11 tb t11 D11
Tri12l t12(l-1) t12 D12k12
[TrQ tb te]
Tri1 tb t1 D1 Tri2 t1 t2 D2 Trim t(m-1) te Dm
Tri12 t11 t12 D12 Tri1k1 t1(k-1) t11 D1k1 Trik1 t(k-1) t21 Dm1 Trim1 t(m1-1) te Dmkm
Tri121 t11 t121 D121
Root parameters of the query-Querying trajectory Trq - Time-interval of interest [tbte]
Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)
Rmin gt Rmax
Rmin
Rmax
Q
Tr1
Tr2
Tr3
Tr4
Tr5
Rd1
1
14
Trq = 2D point Q The rest are possible-locations bounded by circles
Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN
The probability that the location of Tri is within distance Rd from Q is given by
The probability that a given object Trj is the NN of Q is given by
ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)
65
When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example
Rmin
Rmax
TrQ
Tr1
Tr2
Tr3
Tr4
Tr5
Z1
Z2
dist(Z1Z2) lt Rmax
We can no longer ldquoprunerdquo Tr4 from consideration
The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq
NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip
66
1
x
y
0
2r
1r2
pdf(Tr2)
pdf(Trq) pdf(Tr1)
Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for
DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq
67
1
x
y
0-40
+40
4r
34r2
pdf(Tr2 - Trq)
pdf(Tr1 - Trq)
Let Viq = Vi + (-Vq)
Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin at a function of t is
hyperbola since A gt 0
Now we have a collection of such distance functions (for each i)
69
The continuous aspect of the probabilistic NN queries
Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)
Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points
dist
TR1
TR2
LE12
tb t11 t12
Example
Lower envelope of two distance-trajectories
critical time-point
NOTE time-dimension on horizontal axis
70
The continuous aspect of the probabilistic NN queries
Q how to efficiently construct the lower envelope of the whole collection of distance-functions
A divide-and-conquer approach in the spirit of Merge-Sortdist
TR1
TR2
LE12
tb t11 t12
dist
TR3
TR4
LE34
tb tet31dist
TR1
TR2
LE1234
tb tet11 t12
TR3
TR4
t31
t1new t2new
Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)
Now how about the IPAC-NN tree
71
The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)
Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree
dist dist
lower envelope
TR1TR2
TR3
TR4
TR5
TR6
tb te
2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )
4r
TR7
t1 t2 t3
72
Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie
Example if pdf of selecting a possible path is uniform
Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie
73
Combine the two probabilitieshellip
Possible pathsPossible locations when t=4
a
m
Probability of falling inside segment mp1
0505 = 025 (at t = 4)
Probability of an object a falling inside some segment S at given time instant t
)(
))(Pr()Pr())(Pr(tPPp
p
a
tPLSpSta eg uniform pdf
74
ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory
representation we needndash The collection of all the possible paths (and probabilities)
ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi
to position pi+1
At the time-instant t = 4 the object can
be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )
75
Processing
rsaquo Data Structuresndash Edge hash-table
rsaquo Small in-memoryndash Movement R-tree
rsaquo 1D for each edgendash Trajectories List
rsaquo Store samples ordered by time-stamp
rsaquo Assign pointer to set of possible paths
rsaquo Earliest-arriving and Latest-departure stored for vertices
rsaquo On-disk retrieve on-demand
76
› Network expansion
– Expansion tree: a tree rooted at q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects t_q
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: objects A, B, C, D, E around query point q; the expansion tree "bubbling" along road-network segments, with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
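The network-expansion step can be sketched as a Dijkstra search that stops once the frontier reaches distance r. This is a minimal sketch, assuming q sits on a vertex, the network is undirected (each edge listed at both endpoints), and an edge's network distance to q is that of its nearer endpoint; the graph `roads` is hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical undirected road network: each edge is listed at both endpoints.
Graph = Dict[str, List[Tuple[str, float]]]


def network_expansion(graph: Graph, q: str, r: float) -> Set[Tuple[str, str]]:
    """Bounded Dijkstra from vertex q: collect every edge with an endpoint whose
    network distance to q is smaller than r (the edges of the expansion tree)."""
    dist: Dict[str, float] = {q: 0.0}
    heap: List[Tuple[float, str]] = [(0.0, q)]
    edges: Set[Tuple[str, str]] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        if d >= r:
            break                         # every remaining vertex is at least r away
        for v, w in graph.get(u, []):
            edges.add((u, v) if u <= v else (v, u))
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return edges


roads: Graph = {"q": [("a", 1.0), ("b", 5.0)], "a": [("q", 1.0), ("c", 2.0)],
                "b": [("q", 5.0)], "c": [("a", 2.0), ("d", 4.0)], "d": [("c", 4.0)]}
print(sorted(network_expansion(roads, "q", 2.5)))  # [('a', 'c'), ('a', 'q'), ('b', 'q')]
```

The returned edges are the ones whose movement R-trees would be fetched in the filtering step.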
77
Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible paths
– It is easy to prove that the actual QP is monotonic between consecutive critical time instants
– Hence, the QPs at the critical time instants define an envelope function for the actual QPs
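Because the QP is monotonic between consecutive critical instants, the values computed at those instants bound the QP at any time in between. The sketch below assumes the QPs at the (sorted) critical instants are already known; `qp_bounds` is a hypothetical helper name.

```python
import bisect
from typing import List, Tuple


def qp_bounds(critical_times: List[float], qps: List[float], t: float) -> Tuple[float, float]:
    """Bound QP(t) from QPs computed only at the (sorted) critical time instants,
    using that QP is monotonic between consecutive critical instants."""
    if not critical_times[0] <= t <= critical_times[-1]:
        raise ValueError("t lies outside the critical-time range")
    k = bisect.bisect_right(critical_times, t) - 1   # last critical instant <= t
    if critical_times[k] == t:                       # the exact value is known
        return qps[k], qps[k]
    a, b = qps[k], qps[k + 1]                        # enclosing critical instants
    return min(a, b), max(a, b)


print(qp_bounds([0.0, 2.0, 5.0], [0.1, 0.6, 0.3], 3.0))  # (0.3, 0.6)
```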
78
Uncertainty – the flip side
› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up the Douglas-Peucker Line-Simplification Algorithm", SSHD 1997
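As an illustration of "define an acceptable error, save on data size", here is the classic recursive Douglas-Peucker simplification of a sampled trajectory. Note that this is the plain textbook algorithm, not the Hershberger-Snoeyink speed-up cited above, and it only looks at the spatial coordinates (it ignores time).

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _dist_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def douglas_peucker(points: List[Point], eps: float) -> List[Point]:
    """Keep the endpoints; if the farthest intermediate point is more than eps away
    from the line through them, keep it and recurse on both halves."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _dist_to_line(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [a, b]
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right             # points[idx] appears only once


print(douglas_peucker([(0, 0), (1, 0), (2, 5), (3, 0), (4, 0)], 1.0))  # [(0, 0), (2, 5), (4, 0)]
```

The tolerance eps plays the role of the acceptable error: a larger eps drops more samples and saves more bandwidth.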
79
Uncertainty – the other flip side
› What are the important properties that should not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long can data at the sink remain before it becomes "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
82
Merge the approaches
› Idea
– Discretize time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443
83
A highway example…
[Figure: position (pos) of a car on a highway over time (t), with a query region Q]
What is the probability that the car is in some area at least once during some time interval?
84
A highway example…
[Figure: as before, with the car's possible positions bounded by its minimum and maximum speed (vmin, vmax)]
What is the probability that the car is in some area at least once during some time interval?
85
A highway example…
[Figure: possible positions bounded by vmin and vmax, with query region Q]
Assuming a uniform distribution…
86
A highway example…
[Figure: the car is inside Q with probability 50% at one time instant and 30% at another]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
87
A highway example…
[Figure: the car is inside Q with probability 50% at one time instant and 30% at another]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Violation of the speed constraints
88
A highway example…
[Figure: the car is inside Q with probability 50% at one time instant and 30% at another]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Consideration of dependency: 0.5
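To see how the independence assumption and the possible-worlds view can disagree, the sketch below compares both on a small joint distribution. The joint distribution is hypothetical: it is chosen so that the marginals match the slide (0.5 and 0.3) and every world that is in Q at the second instant is also in Q at the first, which is one kind of dependency a speed constraint could cause.

```python
from typing import Dict, Tuple

World = Tuple[bool, bool]  # (in Q at the first instant, in Q at the second instant)


def marginals(worlds: Dict[World, float]) -> Tuple[float, float]:
    p1 = sum(p for (a, _), p in worlds.items() if a)
    p2 = sum(p for (_, b), p in worlds.items() if b)
    return p1, p2


def at_least_once_independent(p1: float, p2: float) -> float:
    """The slide's formula: treats the two time instants as independent."""
    return 1 - (1 - p1) * (1 - p2)


def at_least_once_exact(worlds: Dict[World, float]) -> float:
    """Possible-worlds semantics: sum the worlds that are in Q at least once."""
    return sum(p for (a, b), p in worlds.items() if a or b)


# Hypothetical dependent worlds with marginals 0.5 and 0.3.
worlds = {(True, True): 0.3, (True, False): 0.2, (False, False): 0.5}
p1, p2 = marginals(worlds)
print(round(at_least_once_independent(p1, p2), 2))  # 0.65
print(round(at_least_once_exact(worlds), 2))        # 0.5
```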
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes exist for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley
91
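A simple example
› Whenever the wooden board is hit, the ball stays in its hole or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: from an inner hole, the ball moves left, stays, or moves right with probabilities 0.4, 0.2, 0.4; from a border hole, it stays with probability 0.4 and moves inward with probability 0.6]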
92
A simple example
› Initial position: bucket 3
› After first hit: 0.4, 0.2, 0.4 (buckets 2 to 4)
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (buckets 1 to 5)
› After 40th hit: ≈ 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08
93
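How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$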
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
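The vector-matrix products above can be reproduced in a few lines of plain Python. This is an illustrative sketch (helper names are hypothetical) that builds M as described on the previous slides, starts the ball in bucket 3, and applies one hit per multiplication. The long-run (stationary) distribution of this chain is (0.08, 0.12, ..., 0.12, 0.08); the code also prints the 40-hit vector so one can check how close 40 hits get to it.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def board_matrix(n: int = 9) -> Matrix:
    """Transition matrix of the board example (rows: from bucket, columns: to bucket)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):                      # inner holes: left / stay / right
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    m[0][0], m[0][1] = 0.4, 0.6                    # left border: stay / move inward
    m[n - 1][n - 1], m[n - 1][n - 2] = 0.4, 0.6    # right border
    return m


def step(x: Vector, m: Matrix) -> Vector:
    """One hit: row vector x times M."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def after_hits(x: Vector, m: Matrix, hits: int) -> Vector:
    for _ in range(hits):
        x = step(x, m)
    return x


M = board_matrix()
x0 = [0.0] * 9
x0[2] = 1.0                                        # ball starts in bucket 3
print([round(p, 2) for p in after_hits(x0, M, 1)])  # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in after_hits(x0, M, 2)])  # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
print([round(p, 3) for p in after_hits(x0, M, 40)])
```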
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
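One simple way to learn transition probabilities from historical data is maximum-likelihood counting over tick-aligned trajectories. The slides only say the probabilities are "learned from historical data", so the counting estimator and the toy trajectories below are assumptions for illustration. Only observed transitions are stored, which keeps the matrix sparse.

```python
from collections import defaultdict
from typing import Dict, List

Sparse = Dict[str, Dict[str, float]]


def estimate_transitions(trajectories: List[List[str]]) -> Sparse:
    """Count state -> state transitions between consecutive ticks and normalize
    each row. States never left in the data get no row at all (sparse)."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    model: Sparse = {}
    for a, row in counts.items():
        total = sum(row.values())
        model[a] = {b: c / total for b, c in row.items()}
    return model


# Hypothetical historical trajectories, one state per tick.
print(estimate_transitions([["A", "B", "B", "C"], ["A", "C"]]))
# {'A': {'B': 0.5, 'C': 0.5}, 'B': {'B': 0.5, 'C': 0.5}}
```

A separate model could be estimated per time of day or per object group, matching the last bullet above.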
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
97
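ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time during the interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$ (rows: from, columns: to; states s1, s2, s3)
[Figure: transition graph over s1, s2, s3 labeled with the probabilities of M]
Note: there is an exponential number of possible paths the car might take.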
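The naive possible-worlds baseline enumerates every state sequence, which is exactly the exponential cost the note warns about. The sketch below does that for the slide's chain (car starts in s1 at t = 0); the function name is hypothetical, and the point is only to have a reference value for the matrix-based solution on the following slides.

```python
from itertools import product
from typing import List, Set

# Transition matrix from the slide (rows: from, columns: to); s1, s2, s3 -> 0, 1, 2.
M: List[List[float]] = [[0.0, 0.0, 1.0],
                        [0.6, 0.0, 0.4],
                        [0.0, 0.8, 0.2]]


def window_prob_bruteforce(m: List[List[float]], start: int, window: Set[int],
                           t_from: int, t_to: int) -> float:
    """Enumerate every state sequence for t = 1..t_to, weight it by its probability,
    and sum the weights of the sequences that are in a window state at some
    t in [t_from, t_to]. Cost: |S| ** t_to sequences."""
    total = 0.0
    for path in product(range(len(m)), repeat=t_to):   # path[t - 1] = state at t
        p, prev = 1.0, start
        for s in path:
            p *= m[prev][s]
            prev = s
        if p > 0 and any(path[t - 1] in window for t in range(t_from, t_to + 1)):
            total += p
    return total


print(round(window_prob_bruteforce(M, 0, {0, 1}, 2, 3), 2))  # 0.96
```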
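ST - Window Queries
Initial distribution at t = 0: (1, 0, 0); $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3, connected by the transition probabilities of M]
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time during the interval T = [2, 3]?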
98
99
ST - Window Queries
[Figure: the same unrolled state graph over t = 0, 1, 2, 3; initial distribution (1, 0, 0)]
t = 1: (0, 0, 1)
100
ST - Window Queries
[Figure: the same unrolled state graph over t = 0, 1, 2, 3; initial distribution (1, 0, 0)]
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
101
ST - Window Queries
[Figure: the same unrolled state graph over t = 0, 1, 2, 3; initial distribution (1, 0, 0)]
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)
102
ST - Window Queries
[Figure: the same unrolled state graph over t = 0, 1, 2, 3; initial distribution (1, 0, 0)]
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2); the mass 0.8 in s2 has already hit the window, so only the 0.2 in s3 is propagated further
t = 3: (0, 0.16, 0.04); Result = 0.8 + 0.16 = 0.96
› A solution based on matrix multiplications introduces a new state for the "winner" trajectories and uses two matrices
103
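ST - Window Queries
States s1, s2, s3 and an additional absorbing state s4 for trajectories that have already hit the window:
$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$
M^- is used for transitions to time steps outside T; in M^+ (used for transitions to time steps inside T), every transition into s1 or s2 is redirected to s4.
$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$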
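The two-matrix construction can be sketched by deriving M^- and M^+ from M and a set of window states, then multiplying the initial distribution through time. This is an illustrative sketch with hypothetical helper names; it assumes the query interval starts at t >= 1 (the start state itself is not checked against the window).

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]

M: Matrix = [[0.0, 0.0, 1.0],
             [0.6, 0.0, 0.4],
             [0.0, 0.8, 0.2]]


def vec_mat(x: List[float], m: Matrix) -> List[float]:
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def window_matrices(m: Matrix, window: Set[int]) -> Tuple[Matrix, Matrix]:
    """M-: the chain plus an absorbing hit state (index n).
    M+: the same, but every transition into a window state goes to the hit state."""
    n = len(m)
    m_minus = [row[:] + [0.0] for row in m] + [[0.0] * n + [1.0]]
    m_plus = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            m_plus[i][n if j in window else j] += m[i][j]
    m_plus[n][n] = 1.0
    return m_minus, m_plus


def window_query(m: Matrix, start: int, window: Set[int], t_from: int, t_to: int) -> float:
    """P(object is in a window state at least once during [t_from, t_to]), t_from >= 1."""
    m_minus, m_plus = window_matrices(m, window)
    x = [0.0] * (len(m) + 1)
    x[start] = 1.0
    for t in range(1, t_to + 1):               # transition into time t
        x = vec_mat(x, m_plus if t >= t_from else m_minus)
    return x[-1]                                # mass collected in the hit state


print(round(window_query(M, 0, {0, 1}, 2, 3), 2))  # 0.96
```

Unlike the exhaustive enumeration, the cost grows only linearly with the length of the time horizon.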
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 (left) and with two observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class S_i^- corresponding to worlds where o is located in state s_i and o has not intersected the window
– One class S_i^+ corresponding to worlds where o is located in state s_i and o has intersected the window
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
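Multiple Observations
States ordered as (S1-, S2-, S3-, S1+, S2+, S3+); ∎ marks worlds that have intersected the window, ¬∎ those that have not.
t = 0: (1, 0, 0, 0, 0, 0)
t = 1: (0, 0, 1, 0, 0, 0)
t = 2: (0, 0, 0.2, 0, 0.8, 0)
t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)
$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \quad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$
with $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]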
107
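Bayes' Theorem
$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare) \cdot P(\blacksquare)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$
Using the distribution at t = 3, (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+):
$P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?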
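The 2|S| equivalence classes and the Bayes step can be sketched by building the 6x6 matrices from M and the window states, propagating to the observation time, and conditioning on the observed state. This is an illustrative sketch; the helper names are hypothetical, and it reproduces the slide's distribution at t = 3 and the posterior 0.32 / 0.36 ≈ 0.89.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]

# Transition matrix from the slides; states s1, s2, s3 -> indices 0, 1, 2.
M: Matrix = [[0.0, 0.0, 1.0],
             [0.6, 0.0, 0.4],
             [0.0, 0.8, 0.2]]


def vec_mat(x: List[float], m: Matrix) -> List[float]:
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]


def split_matrices(m: Matrix, window: Set[int]) -> Tuple[Matrix, Matrix]:
    """Indices 0..n-1 are the classes S_i^- (window not yet intersected),
    n..2n-1 are the classes S_i^+ (window already intersected)."""
    n = len(m)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = m[i][j]
            # Outside the query time: both copies evolve with M (block diagonal).
            m_minus[i][j] += p
            m_minus[n + i][n + j] += p
            # Inside the query time: entering a window state turns a "-" world into a "+" world.
            m_plus[i][n + j if j in window else j] += p
            m_plus[n + i][n + j] += p
    return m_minus, m_plus


def distribution(m: Matrix, start: int, window: Set[int],
                 t_from: int, t_to: int, t_obs: int) -> List[float]:
    """Distribution over (S_1^-, ..., S_n^-, S_1^+, ..., S_n^+) at time t_obs."""
    m_minus, m_plus = split_matrices(m, window)
    x = [0.0] * (2 * len(m))
    x[start] = 1.0
    for t in range(1, t_obs + 1):              # transition into time t
        x = vec_mat(x, m_plus if t_from <= t <= t_to else m_minus)
    return x


def p_hit_given_state(x: List[float], state: int) -> float:
    """Bayes: P(hit | o in state) = P(S^+) / (P(S^+) + P(S^-))."""
    n = len(x) // 2
    return x[n + state] / (x[n + state] + x[state])


x3 = distribution(M, 0, {0, 1}, 2, 3, 3)
print([round(v, 2) for v in x3])               # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]
print(round(p_hit_given_state(x3, 2), 2))      # 0.89
```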
108
Summary
› Pros
– Answers time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index entries of object o over location and time]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate the result probability as the ratio between the number of samples that satisfy the query and the total number of drawn samples
› But how can we draw samples efficiently such that they conform to the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
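The sampling idea can be sketched with plain rejection sampling: forward-sample possible worlds from the chain, discard those that contradict the observation, and report the fraction of the remaining samples that satisfy the query. This is deliberately not the adapted-transition-matrix technique of the cited paper. Rejection wastes every sample that misses the observation, which illustrates why drawing conforming samples efficiently matters. The chain and the window query are the ones from the earlier slides; the function names are hypothetical.

```python
import random
from typing import List, Set

M: List[List[float]] = [[0.0, 0.0, 1.0],
                        [0.6, 0.0, 0.4],
                        [0.0, 0.8, 0.2]]


def sample_path(m: List[List[float]], start: int, steps: int, rng: random.Random) -> List[int]:
    """Forward-sample one possible world: the states at t = 0..steps."""
    path = [start]
    for _ in range(steps):
        row = m[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def mc_window_given_obs(m: List[List[float]], start: int, window: Set[int],
                        t_from: int, t_to: int, t_obs: int, obs_state: int,
                        n_samples: int = 20000, seed: int = 0) -> float:
    """Estimate P(window hit | object observed in obs_state at t_obs) as the ratio
    hits / accepted samples; samples contradicting the observation are rejected."""
    rng = random.Random(seed)
    accepted = hits = 0
    for _ in range(n_samples):
        path = sample_path(m, start, max(t_to, t_obs), rng)
        if path[t_obs] != obs_state:
            continue                      # does not conform to the observation
        accepted += 1
        if any(path[t] in window for t in range(t_from, t_to + 1)):
            hits += 1
    return hits / accepted if accepted else float("nan")


print(round(mc_window_given_obs(M, 0, {0, 1}, 2, 3, 3, 2, seed=42), 2))
```

With the observation "in s3 at t = 3", the estimate should land near the exact value 0.32 / 0.36 from the Bayes slide.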
112
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of sub-events, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Event queries
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
60
Example: inside a range (disk) around "Q" between t1 and t2
1. What is the probability of a path being taken?
2. What is the earliest/latest arrival at a given vertex (and, consequently, along an edge)?
Kai Zheng, Goce Trajcevski, Xiaofang Zhou, Peter Scheuermann: Probabilistic range queries for uncertain trajectories on road networks. EDBT 2011
61
Continuous NN for trajectories
Given a collection of moving objects' trajectories Tr1, Tr2, …, TrN:
Q_nn(q): Retrieve the nearest neighbor of the moving object whose trajectory is Tr_q during [t_b, t_e]
A_nn(q) = [(Tr_i1, [t_b, t_1]), (Tr_i2, [t_1, t_2]), …, (Tr_im, [t_(m-1), t_e])]
[Figure: trajectories Tr_q, Tr1, Tr2, Tr3 in (X, Y, T) space between T = t_b and T = t_e, with the change of nearest neighbor at T = t_1]
Example: assuming that Tr_q is the querying trajectory during [t_b, t_e], then
- Tr1 (black) is the NN throughout [t_b, t_1]
- Tr2 (red) is the NN throughout [t_1, t_e]
Observation: the answer to the query is time-parameterized
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
62
Impact of Uncertainty
Given Tr1, Tr2, …, TrN, where each Tr_i has some uncertainty associated with its location at each time instant, consider the variant
UQ_nn(q): Retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Tr_q during [t_b, t_e]
[Figure: uncertain trajectories Tr_q, Tr1, Tr2, Tr3 in (X, Y, T) space, with time instants t_b, t_b1, t_11, t_1, t_e]
Observations: adding uncertainty to the previous example implies
- Tr1 (blue) has the exclusive non-zero NN probability up to t_b1
- Starting at t_b1, Tr3 (green) also has a non-zero NN probability
- Specifically, at t_11 all three trajectories have a non-zero NN probability
⇒ What should the structure of the answer to UQ_nn(q) look like?
63
Structure of the Answer to Probabilistic NN Queries for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree. Root: [Tr_Q, t_b, t_e]. Level 1: (Tr_i1, t_b, t_1, D_1), (Tr_i2, t_1, t_2, D_2), …, (Tr_im, t_(m-1), t_e, D_m). Each child is refined further, e.g., (Tr_i11, t_b, t_11, D_11), (Tr_i12, t_11, t_12, D_12), (Tr_i121, t_11, t_121, D_121), …]
Root: parameters of the query
- Querying trajectory Tr_q
- Time interval of interest [t_b, t_e]
Children: the trajectories that, if the parent is excluded, have the highest probability of being the NN to Tr_q within a sub-interval of the interval bounded by the parent
64
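Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: query point Q and uncertainty disks of Tr1 to Tr5, with radii R_min, R_max, and R_d around Q; the minimum distance of Tr4 from Q exceeds R_max]
Tr_q = 2D point Q; the rest are possible locations bounded by circles
Observation:
- Any object whose minimum distance from Q is > R_max (the smallest maximum distance from Q over all objects) has a "0" probability of being an NN to Q
- Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tr_i is within distance R_d from Q is given by
$P_i(R_d) = \Pr\big(\mathrm{dist}(Tr_i, Q) \le R_d\big) = \int_{U_i \cap C(Q, R_d)} pdf_i(x, y)\, dx\, dy$
where U_i is the uncertainty region of Tr_i and C(Q, R_d) is the disk of radius R_d centered at Q.
The probability that a given object Tr_j is the NN of Q is given by
$P_j^{NN} = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{k \ne j} \big(1 - P_k(R_d)\big)\, dR_d$
i.e., Tr_j is the NN if it is at distance R_d AND all the others are at distances > R_d, integrating over all possible R_d's (NOTE: the upper bound is R_max…)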
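A quick way to sanity-check NN probabilities is Monte Carlo sampling instead of evaluating the integral. The sketch below assumes uniform pdfs over disks (hypothetical `Disk` tuples), applies the R_max pruning rule first, and then counts how often each remaining object is closest to the crisp query point Q. It is an approximation for illustration, not the slide's integral formula.

```python
import math
import random
from typing import List, Tuple

Disk = Tuple[float, float, float]  # (cx, cy, radius): uniform pdf over the disk


def sample_in_disk(d: Disk, rng: random.Random) -> Tuple[float, float]:
    cx, cy, r = d
    rho = r * math.sqrt(rng.random())   # sqrt makes the density uniform over the area
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)


def nn_probabilities(q: Tuple[float, float], disks: List[Disk],
                     n_samples: int = 20000, seed: int = 0) -> List[float]:
    """Monte Carlo estimate of each object's probability of being the NN of crisp Q."""
    qx, qy = q
    dq = [math.hypot(cx - qx, cy - qy) for cx, cy, _ in disks]
    r_max = min(d + r for d, (_, _, r) in zip(dq, disks))        # smallest max-distance
    candidates = [i for i, (d, (_, _, r)) in enumerate(zip(dq, disks))
                  if d - r <= r_max]                              # min-distance <= R_max
    probs = [0.0] * len(disks)            # pruned objects keep probability 0
    rng = random.Random(seed)
    for _ in range(n_samples):
        best, best_d = candidates[0], float("inf")
        for i in candidates:
            x, y = sample_in_disk(disks[i], rng)
            dist = math.hypot(x - qx, y - qy)
            if dist < best_d:
                best, best_d = i, dist
        probs[best] += 1.0 / n_samples
    return probs


objects = [(2.0, 0.0, 1.0), (-2.0, 0.0, 1.0), (10.0, 0.0, 1.0)]
print([round(p, 2) for p in nn_probabilities((0.0, 0.0), objects)])
```

The third object's minimum distance (9) exceeds R_max (3), so it is pruned and keeps probability 0, while the two symmetric objects share the NN probability roughly equally.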
65
When Tr_q (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query object Tr_Q and objects Tr1 to Tr5 with radii R_min and R_max; a point Z1 in Tr_Q's region and a point Z2 in Tr4's region with dist(Z1, Z2) < R_max]
We can no longer "prune" Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Tr_j, being within distance R_d from Tr_q.
NOTE: in effect, a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs of the possible locations of Tr_q, Tr1, and Tr2 in the (x, y) plane, disks of radius r]
Example: assuming a uniform pdf of possible locations, with two of the points to be used when evaluating the actual probability of Tr1 being within distance R_d from Tr_q
Main observation: Let V_i and V_q denote the 2D random variables representing the possible locations of Tr_i and Tr_q. In effect, we are looking for $\Pr(\lVert V_i - V_q \rVert \le R_d)$.
Define V_iq = V_i - V_q (cross-correlation) as another random variable, V_iq = V_i + (-V_q), which, due to independence, implies that the pdf of V_iq is a convolution of the pdfs of V_i and -V_q
67
[Figure: pdfs of Tr1 - Tr_q and Tr2 - Tr_q in the (x, y) plane; each difference has a support of diameter 4r]
Let V_iq = V_i + (-V_q)
Property 1: If pdf(V_i) has a centroid C_i coinciding with E(V_i) AND pdf(V_q) has a centroid C_q coinciding with E(V_q), THEN their convolution pdf(V_iq) has a centroid C_iq = C_i + (-C_q) coinciding with E(V_iq)
Property 2: Assume that pdf(V_i) and pdf(V_q) have rotational symmetry around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(V_iq) is also rotationally symmetric around its centroid C_iq
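Sampling V_iq = V_i + (-V_q) directly gives an easy numerical check of both the within-distance probability and Property 1. The sketch assumes independent uniform disks (hypothetical `Disk` tuples) and estimates Pr(||V_iq|| <= R_d) together with the empirical centroid of V_iq, which should be close to C_i - C_q.

```python
import math
import random
from typing import Tuple

Disk = Tuple[float, float, float]  # (cx, cy, radius) of a uniform uncertainty disk


def sample_in_disk(d: Disk, rng: random.Random) -> Tuple[float, float]:
    cx, cy, r = d
    rho = r * math.sqrt(rng.random())     # sqrt makes the density uniform over the area
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)


def within_distance(di: Disk, dq: Disk, r_d: float, n: int = 20000,
                    seed: int = 0) -> Tuple[float, Tuple[float, float]]:
    """Sample V_iq = V_i + (-V_q); return the estimate of Pr(||V_iq|| <= r_d)
    and the empirical centroid of V_iq."""
    rng = random.Random(seed)
    inside, sx, sy = 0, 0.0, 0.0
    for _ in range(n):
        xi, yi = sample_in_disk(di, rng)
        xq, yq = sample_in_disk(dq, rng)
        dx, dy = xi - xq, yi - yq         # one sample of V_iq
        sx, sy = sx + dx, sy + dy
        if math.hypot(dx, dy) <= r_d:
            inside += 1
    return inside / n, (sx / n, sy / n)


p, centroid = within_distance((3.0, 0.0, 1.0), (0.0, 0.0, 1.0), 2.5)
print(round(p, 2), centroid)
```

Because both disks have radius 1, every sample of V_iq lies within distance 2 of C_i - C_q = (3, 0), i.e., the support of the difference is a disk of diameter 4r.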
68
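The continuous aspect of the probabilistic NN queries
Let Vx_iq = vx_i - vx_q and Vy_iq = vy_i - vy_q denote the corresponding components of the velocity of the object whose expected location moves along TR_iq.
Then the distance of the expected location along TR_iq from the origin, as a function of t, is
$d_{iq}(t) = \sqrt{A t^2 + B t + C}$, with $A = Vx_{iq}^2 + Vy_{iq}^2$, $B = 2(x_{iq} Vx_{iq} + y_{iq} Vy_{iq})$, $C = x_{iq}^2 + y_{iq}^2$, where $(x_{iq}, y_{iq})$ is the expected relative location at t = 0,
a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)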
69
The continuous aspect of the probabilistic NN queries
Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d_1q(t), d_2q(t), …, d_Nq(t)} (NOTE: excluding d_qq(t) ≡ 0)
Observation: Two distance functions d_iq(t) and d_jq(t) can intersect at most twice throughout the time interval of interest for the query [t_b, t_e] (NOTE: set d_iq(t) = d_jq(t); squaring both sides yields a quadratic equation in t). Call such pairwise intersections critical time points.
[Figure (example): lower envelope LE12 of the two distance trajectories TR1 and TR2, with critical time points t11 and t12 after t_b]
NOTE: time dimension on the horizontal axis
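The "at most twice" observation follows because equal distances mean equal squared distances, and those are quadratics in t. The sketch below computes the critical time points of two distance functions given as coefficient triples (A, B, C) of the squared distance; `quad_from_motion` assumes linear relative motion, and the two example objects are hypothetical.

```python
import math
from typing import List, Tuple

Quad = Tuple[float, float, float]  # (A, B, C): d(t) = sqrt(A t^2 + B t + C)


def quad_from_motion(x0: float, y0: float, vx: float, vy: float) -> Quad:
    """Relative position (x0 + vx t, y0 + vy t) -> coefficients of the squared distance."""
    return (vx * vx + vy * vy, 2.0 * (x0 * vx + y0 * vy), x0 * x0 + y0 * y0)


def critical_time_points(fi: Quad, fj: Quad, tb: float, te: float,
                         eps: float = 1e-12) -> List[float]:
    """Times in [tb, te] with d_i(t) = d_j(t). Both sides are non-negative, so we
    can compare squares: a quadratic in t with at most two roots."""
    a, b, c = fi[0] - fj[0], fi[1] - fj[1], fi[2] - fj[2]
    if abs(a) < eps:
        if abs(b) < eps:
            return []                     # no isolated crossing (parallel or identical)
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]


f1 = quad_from_motion(-5.0, 1.0, 1.0, 0.0)   # passes closest to Tr_q at t = 5
f2 = quad_from_motion(0.0, 3.0, 0.0, 0.0)    # stays at distance 3
print([round(t, 3) for t in critical_time_points(f1, f2, 0.0, 10.0)])  # [2.172, 7.828]
```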
70
The continuous aspect of the probabilistic NN queries
Q: How can we efficiently construct the lower envelope of the whole collection of distance functions?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: the lower envelope LE12 of TR1 and TR2 (critical points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical point t31) over [t_b, t_e] are merged into LE1234, which introduces new critical points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
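The merge-sort-style construction of the lower envelope can be sketched as follows. This is an illustrative sketch, not the authors' implementation. It compares squared distances, which has the same ordering as distances because both are non-negative. An envelope is a list of (t_start, t_end, index) pieces. The merge step walks the common breakpoints of the two envelopes and splits further at the critical time points of the two owning functions.

```python
import math
from typing import List, Tuple

Quad = Tuple[float, float, float]   # squared distance A t^2 + B t + C
Piece = Tuple[float, float, int]    # (t_start, t_end, index of the lowest function)


def _val(f: Quad, t: float) -> float:
    return f[0] * t * t + f[1] * t + f[2]


def _roots_between(f: Quad, g: Quad, lo: float, hi: float) -> List[float]:
    """Roots of f - g strictly inside (lo, hi): the candidate critical time points."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < 1e-12:
        cand = [] if abs(b) < 1e-12 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        cand = [(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)]
    return sorted(t for t in set(cand) if lo < t < hi)


def _merge(fs: List[Quad], e1: List[Piece], e2: List[Piece]) -> List[Piece]:
    """Merge two lower envelopes defined over the same time range."""
    cuts = sorted({p[0] for p in e1 + e2} | {p[1] for p in e1 + e2})
    out: List[Piece] = []
    i = j = 0
    for lo, hi in zip(cuts, cuts[1:]):
        while e1[i][1] <= lo:
            i += 1
        while e2[j][1] <= lo:
            j += 1
        f, g = e1[i][2], e2[j][2]               # owners of [lo, hi] in each envelope
        pts = [lo] + _roots_between(fs[f], fs[g], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            mid = (a + b) / 2.0
            best = f if _val(fs[f], mid) <= _val(fs[g], mid) else g
            if out and out[-1][2] == best:
                out[-1] = (out[-1][0], b, best)   # extend the previous piece
            else:
                out.append((a, b, best))
    return out


def lower_envelope(fs: List[Quad], tb: float, te: float) -> List[Piece]:
    """Divide and conquer: split the functions, build both envelopes, merge."""
    def solve(lo: int, hi: int) -> List[Piece]:
        if hi - lo == 1:
            return [(tb, te, lo)]
        mid = (lo + hi) // 2
        return _merge(fs, solve(lo, mid), solve(mid, hi))
    return solve(0, len(fs))


funcs = [(1.0, 0.0, 1.0), (1.0, -10.0, 26.0), (0.0, 0.0, 100.0)]
print(lower_envelope(funcs, 0.0, 10.0))  # [(0.0, 2.5, 0), (2.5, 10.0, 1)]
```

Removing the envelope's pieces and repeating the construction on the leftovers would yield the next ranked envelope, in line with the IPAC-NN discussion on the next slide.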
71
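The lower envelope provides a continuous pruning criterion: any trajectory whose distance function is more than 4r (r = radius of the uncertainty disk) above the lower envelope cannot have a non-zero probability of being the NN to Tr_q (e.g., Tr7 in the figure, and Tr6 initially), since the actual distance of the envelope's object exceeds its expected distance by at most 2r, while the actual distance of such a trajectory falls short of its expected distance by at most 2r.
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree
[Figure: distance functions TR1 to TR7 over [t_b, t_e] with the lower envelope, the band of width 4r above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree); time points t1, t2, t3]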
72
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
Continuous NN for trajectories
Given a collection of moving-object trajectories Tr_1, Tr_2, …, Tr_N:
Q_nn(q): retrieve the nearest neighbor of the moving object whose trajectory is Tr_q during [t_b, t_e].
A_nn(q) = [(Tr_i1, [t_b, t_1]), (Tr_i2, [t_1, t_2]), …, (Tr_im, [t_(m-1), t_e])]
[Figure: trajectories Tr_q, Tr_1, Tr_2, Tr_3 in (X, Y, T) space, with the time instants T = t_b, T = t_1 and T = t_e marked]
Example: assuming that Tr_q is the querying trajectory during [t_b, t_e], then
– Tr_1 (black) is the NN throughout [t_b, t_1]
– Tr_2 (red) is the NN throughout [t_1, t_e]
Observation: the answer to the query is time-parameterized.
Yufei Tao, Dimitris Papadias, Qiongmao Shen: Continuous Nearest Neighbor Search. VLDB 2002
Impact of Uncertainty
Given Tr_1, Tr_2, …, Tr_N, where each Tr_i has some uncertainty associated with its location at each time instant, consider the variant
UQ_nn(q): retrieve all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Tr_q during [t_b, t_e].
[Figure: the trajectories Tr_q, Tr_1, Tr_2, Tr_3 of the previous example with uncertainty added, in (X, Y, T) space; time instants T = t_b, t_b1, t_11, t_1 and t_e marked]
Observations: adding uncertainty to the previous example implies
– Tr_1 (blue) is the only trajectory with a non-zero NN-probability up to t_b1
– starting at t_b1, Tr_3 (green) also has a non-zero NN-probability
– specifically, at t_11 all three trajectories have a non-zero NN-probability
⇒ What should the structure of the answer to UQ_nn(q) look like?
Structure of the Answer to a Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree.
[Figure: IPAC-NN tree. Root [Tr_q, t_b, t_e]; level-1 nodes (Tr_i1, t_b, t_1, D_1), (Tr_i2, t_1, t_2, D_2), …, (Tr_im, t_(m-1), t_e, D_m); the children of each node, e.g. (Tr_i11, t_b, t_11, D_11), (Tr_i12, t_11, t_12, D_12), …, subdivide the parent's time interval, and so on recursively, e.g. (Tr_i121, t_11, t_121, D_121)]
Root: parameters of the query
– querying trajectory Tr_q
– time interval of interest [t_b, t_e]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Tr_q within a sub-interval of the interval bounded by the parent.
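Instantaneous (i.e., spatial) NN-query when the querying object is crisp (i.e., has no uncertainty)
[Figure: crisp query point Q; uncertainty disks of Tr_1 … Tr_5; radii R_min, R_max and R_d around Q; the minimum distance of Tr_4 exceeds R_max]
Tr_q = 2D point Q; the possible locations of the other objects are bounded by circles.
Observation:
– let R_max be the smallest maximum distance from Q over all objects; any object whose minimum distance from Q is > R_max has a "0" probability of being the NN of Q
– example: Tr_4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tr_i is within distance R_d from Q is given by
$$P_i(R_d) = \iint_{\lVert (x,y) - Q \rVert \le R_d} pdf_i(x, y)\,dx\,dy$$
The probability that a given object Tr_j is the NN of Q is given by
$$P_{NN}(Tr_j) = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{i \ne j} \bigl(1 - P_i(R_d)\bigr)\,dR_d$$
i.e., Tr_j is the NN if it is at distance R_d AND all the others are at distances > R_d, integrated over all possible R_d (NOTE: the upper bound is R_max…).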
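The NN probability can also be approximated by sampling possible worlds, which is handy for sanity-checking the integral. The following is a minimal sketch under assumptions of ours, not the tutorial's method: each object's possible locations are uniform in a disk (cx, cy, r), Q is crisp, and `nn_probabilities` and `sample_in_disk` are hypothetical helper names. It applies the R_max pruning rule first and then counts, for each remaining candidate, the fraction of sampled worlds in which it is closest to Q.

```python
import math
import random


def sample_in_disk(rng, cx, cy, r):
    """Draw a point uniformly from the disk with center (cx, cy) and radius r."""
    rho = r * math.sqrt(rng.random())  # sqrt makes the density uniform over the area
    phi = 2.0 * math.pi * rng.random()
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)


def nn_probabilities(q, disks, n_samples=10000, seed=0):
    """Estimate P(object j is the NN of the crisp point q) for each disk (cx, cy, r)."""
    qx, qy = q
    centre_dist = [math.hypot(cx - qx, cy - qy) for cx, cy, _ in disks]
    dmin = [max(0.0, d - r) for d, (_, _, r) in zip(centre_dist, disks)]
    dmax = [d + r for d, (_, _, r) in zip(centre_dist, disks)]
    r_max = min(dmax)
    # Pruning: an object whose minimum distance exceeds R_max is never the NN.
    candidates = [j for j in range(len(disks)) if dmin[j] <= r_max]
    rng = random.Random(seed)
    wins = [0] * len(disks)
    for _ in range(n_samples):
        # One sampled "world": draw a location for every candidate object.
        dists = []
        for j in candidates:
            x, y = sample_in_disk(rng, *disks[j])
            dists.append((math.hypot(x - qx, y - qy), j))
        wins[min(dists)[1]] += 1
    return [w / n_samples for w in wins]
```

For Q = (0, 0) and disks centered at (2, 0), (0, 3) and (10, 0), all with r = 1, the third object is pruned (its minimum distance 9 exceeds R_max = 3) and receives probability 0.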
When Tr_q (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable. Example:
[Figure: uncertain query region Tr_Q and the disks of Tr_1 … Tr_5, with radii R_min and R_max; two points Z_1 and Z_2 with dist(Z_1, Z_2) < R_max]
We can no longer "prune" Tr_4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Tr_j, being within distance R_d from Tr_q.
NOTE: in effect, this is a quadruple integration instead of the double one (cf. previous slide)…
[Figure: uniform pdfs of possible locations pdf(Tr_q), pdf(Tr_1) and pdf(Tr_2) over the (x, y) plane]
Example (assuming a uniform pdf of possible locations): the figure marks two of the points to be used when evaluating the actual probability of Tr_1 being within distance R_d from Tr_q.
Main observation: let V_i and V_q denote the 2D random variables representing the possible locations of Tr_i and Tr_q. In effect, we are looking for
$$P\bigl(\lVert V_i - V_q \rVert \le R_d\bigr)$$
Define V_iq = V_i − V_q (cross-correlation) as another random variable, V_iq = V_i + (−V_q); due to the independence of V_i and V_q, the pdf of V_iq is the convolution of the pdfs of V_i and −V_q.
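The reduction to the single random variable V_iq can be checked by sampling. This sketch assumes, as in the example, uniform disks for both Tr_i and Tr_q and independent locations; `prob_within` is a hypothetical name, and the result is a Monte-Carlo estimate rather than the closed-form convolution.

```python
import math
import random


def sample_in_disk(rng, cx, cy, r):
    """Draw a point uniformly from the disk with center (cx, cy) and radius r."""
    rho = r * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)


def prob_within(disk_i, disk_q, rd, n_samples=20000, seed=0):
    """Estimate P(||V_i - V_q|| <= rd) for independent, uniform-disk V_i and V_q."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        xi, yi = sample_in_disk(rng, *disk_i)
        xq, yq = sample_in_disk(rng, *disk_q)
        # (xi - xq, yi - yq) is one sample of V_iq = V_i + (-V_q).
        if math.hypot(xi - xq, yi - yq) <= rd:
            hits += 1
    return hits / n_samples
```

With two disks of radius r, every sample of V_iq lies within 2r of C_i − C_q, so the estimate is exactly 0 or 1 whenever R_d is far enough from that range.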
[Figure: the resulting pdfs pdf(Tr_1 − Tr_q) and pdf(Tr_2 − Tr_q) over the (x, y) plane; support width labeled 4r]
Let V_iq = V_i + (−V_q).
Property 1: IF pdf(V_i) has a centroid C_i coinciding with E(V_i) AND pdf(V_q) has a centroid C_q coinciding with E(V_q), THEN their convolution pdf(V_iq) has a centroid C_iq = C_i + (−C_q) coinciding with E(V_iq).
Property 2: assume that pdf(V_i) and pdf(V_q) are rotationally symmetric around their centroids, with respect to the vertical axis (the value of the pdfs); THEN pdf(V_iq) is also rotationally symmetric around its centroid C_iq.
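The continuous aspect of the probabilistic NN queries
Let V_x,iq = v_x,i − v_x,q and V_y,iq = v_y,i − v_y,q denote the corresponding components of the velocity of the object whose expected location is along TR_iq.
Then the distance of the expected location along TR_iq from the origin as a function of t is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \qquad A = V_{x,iq}^2 + V_{y,iq}^2,\quad B = 2\,(x_{iq} V_{x,iq} + y_{iq} V_{y,iq}),\quad C = x_{iq}^2 + y_{iq}^2$$
where (x_iq, y_iq) is the expected location along TR_iq at t = 0; its graph is a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).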
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions S_df = {d_1q(t), d_2q(t), …, d_Nq(t)} (NOTE: excluding d_qq(t) ≡ 0).
Observation: two distance functions d_iq(t) and d_jq(t) can intersect at most twice throughout the time interval of interest for the query [t_b, t_e] (NOTE: setting d_iq(t) = d_jq(t) and squaring both sides yields an equation of degree at most two in t). Call such pair-wise intersections critical time-points.
[Figure: example lower envelope LE_12 of the two distance trajectories TR_1 and TR_2, starting at t_b, with critical time-points t_11 and t_12; distance on the vertical axis, time on the horizontal axis]
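The "at most twice" observation is easy to turn into code once the squared distances are quadratics in t. The sketch below assumes constant relative velocities (our assumption for the illustration), so that d(t)² = A t² + B t + C, and uses hypothetical helper names; positions and velocities are relative to Tr_q.

```python
import math


def dist_coeffs(px, py, vx, vy):
    """(A, B, C) of d(t)^2 = A t^2 + B t + C for relative position (px, py) at t = 0
    moving with constant relative velocity (vx, vy)."""
    return (vx * vx + vy * vy, 2.0 * (px * vx + py * vy), px * px + py * py)


def critical_time_points(ci, cj, tb, te, eps=1e-12):
    """Times in [tb, te] where d_iq(t) = d_jq(t); distances are non-negative, so
    this is equivalent to equal squared distances (a degree <= 2 equation)."""
    a, b, c = ci[0] - cj[0], ci[1] - cj[1], ci[2] - cj[2]
    if abs(a) < eps:
        if abs(b) < eps:
            return []  # identical or never-crossing functions: no isolated points
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2.0 * a), (-b + s) / (2.0 * a)})
    return [t for t in roots if tb <= t <= te]
```

For a static object at distance 1 and an object starting at (3, 0) moving towards the query with velocity (−1, 0), the distance functions 1 and |3 − t| cross at t = 2 and t = 4.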
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: the lower envelopes LE_12 (of TR_1 and TR_2, critical time-points t_11 and t_12) and LE_34 (of TR_3 and TR_4, critical time-point t_31) over [t_b, t_e] are merged into LE_1234, which introduces the new critical time-points t_1new and t_2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories).
Now, how about the IPAC-NN tree?
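The merge-sort structure can be sketched as follows, assuming each distance function is given by the coefficients (A, B, C) of its square (a hypothetical representation, as in a constant-velocity model). An envelope is a list of pieces (t_start, t_end, index); merging two envelopes splits each elementary interval at the critical time-points of the two competing functions. The piece lookup is a linear scan, so this sketch only illustrates the recursion and does not claim the O(N log N) bound stated on the slide.

```python
import math


def sq_dist(c, t):
    """Squared distance A t^2 + B t + C; ordering by it equals ordering by distance."""
    a, b, k = c
    return a * t * t + b * t + k


def crossings(ci, cj, s, e, eps=1e-12):
    """Times strictly inside (s, e) where the two distance functions are equal."""
    a, b, k = ci[0] - cj[0], ci[1] - cj[1], ci[2] - cj[2]
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-k / b]
    else:
        disc = b * b - 4.0 * a * k
        if disc < 0:
            return []
        root = math.sqrt(disc)
        roots = [(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)]
    return sorted(t for t in set(roots) if s < t < e)


def owner(env, t):
    """Index of the function forming the envelope `env` at time t."""
    for t0, t1, idx in env:
        if t0 <= t <= t1:
            return idx
    raise ValueError("t is outside the envelope's time range")


def merge(funcs, env1, env2):
    """Merge two lower envelopes defined over the same interval [tb, te]."""
    cuts = sorted({t for t0, t1, _ in env1 + env2 for t in (t0, t1)})
    out = []
    for s, e in zip(cuts, cuts[1:]):
        mid = (s + e) / 2.0
        f, g = owner(env1, mid), owner(env2, mid)
        pts = [s] + crossings(funcs[f], funcs[g], s, e) + [e]
        for u, v in zip(pts, pts[1:]):
            m = (u + v) / 2.0
            best = f if sq_dist(funcs[f], m) <= sq_dist(funcs[g], m) else g
            if out and out[-1][2] == best:
                out[-1] = (out[-1][0], v, best)  # extend the previous piece
            else:
                out.append((u, v, best))
    return out


def lower_envelope(funcs, ids, tb, te):
    """Divide and conquer over the function indices, in the spirit of merge sort."""
    if len(ids) == 1:
        return [(tb, te, ids[0])]
    half = len(ids) // 2
    return merge(funcs,
                 lower_envelope(funcs, ids[:half], tb, te),
                 lower_envelope(funcs, ids[half:], tb, te))
```

Each returned piece names the trajectory that is lowest on that sub-interval, i.e., one candidate level-1 node of the IPAC-NN tree.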
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function is further than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN to Tr_q (e.g., TR_7 in the figure, and TR_6 initially).
Observation 3: IF the lower envelope is removed from the (distance functions × time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR_1 … TR_7 over [t_b, t_e] with time instants t_1, t_2, t_3; the lower envelope, the 4r band above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree)]
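The 4r rule can be written as a filter at a single time instant. The reasoning assumed here: if all objects and the query have uncertainty disks of radius r, each difference V_iq lies within 2r of its expected location, so an object whose expected distance exceeds the envelope value by more than 4r is always farther away than the envelope object. A hypothetical sketch, again with distance functions given as (A, B, C):

```python
import math


def nn_candidates(funcs, t, r):
    """Indices whose expected distance at time t is within 4r of the lower envelope.

    funcs: (A, B, C) per trajectory, with d(t) = sqrt(A t^2 + B t + C).
    r: radius of the (assumed identical) uncertainty disks."""
    d = [math.sqrt(max(0.0, a * t * t + b * t + c)) for a, b, c in funcs]
    lowest = min(d)  # value of the lower envelope at time t
    return [i for i, di in enumerate(d) if di <= lowest + 4.0 * r]
```

Objects outside the band can be dropped before any probability is computed; the remaining ones are those whose NN-probability may be non-zero at time t.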
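Given two trajectory samples (t_i, p_i) and (t_{i+1}, p_{i+1}) of a moving object a on a road network, the set of possible paths PP_i(a) between t_i and t_{i+1} consists of all the paths along the routes (sequences of edges) that connect p_i and p_{i+1} and whose minimum time costs (mincost) are not greater than t_{i+1} − t_i, i.e.,
$$PP_i(a) = \{\, P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mincost}(P) \le t_{i+1} - t_i \,\}$$
Example: if the pdf of selecting a possible path is uniform, $\Pr(P_j) = 1 / |PP_i(a)|$.
Given a path P_j ∈ PP_i(a), the possible locations of a given moving object a with respect to P_j at t ∈ [t_i, t_{i+1}] are the set of all the positions p ∈ P_j from which a can reach p_i (respectively p_{i+1}) within the time period t − t_i (respectively t_{i+1} − t), i.e.,
$$PL(a, P_j, t) = \{\, p \in P_j \mid \mathrm{mincost}(p_i, p) \le t - t_i,\ \mathrm{mincost}(p, p_{i+1}) \le t_{i+1} - t \,\}$$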
Combine the two probabilities…
[Figure: possible paths and possible locations when t = 4, with object a, point m and segment mp_1]
Probability of falling inside segment mp_1: 0.5 · 0.5 = 0.25 (at t = 4)
Probability of an object a falling inside some segment S at a given time instant t (e.g., with a uniform pdf):
$$\Pr(a(t) \in S) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr\bigl(a(t) \in S \cap PL(a, P, t) \mid P\bigr)$$
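The two probabilities can be combined in a few lines under strong simplifying assumptions that are ours, not the tutorial's: each possible path is a 1D interval [0, length] from p_i to p_{i+1} traversed at most at speed vmax (so covering a distance x takes at least x / vmax), the path pdf is uniform, the location pdf is uniform over the possible locations, and the query segment S lies on a single path. All names are hypothetical.

```python
def possible_locations(length, vmax, ti, tj, t):
    """Interval [lo, hi] of positions on a path (measured from p_i) that are
    reachable from p_i by time t and from which p_{i+1} is reachable by tj."""
    lo = max(0.0, length - vmax * (tj - t))
    hi = min(length, vmax * (t - ti))
    return (lo, hi) if lo <= hi else None


def prob_in_segment(paths, segment, ti, tj, t):
    """Pr(a(t) in S) = sum over possible paths P of Pr(P) * Pr(a(t) in S | P).

    paths: (path_id, length, vmax) triples; segment: (path_id, start, end)."""
    # Possible paths: those whose minimum time cost fits into [ti, tj].
    feasible = [p for p in paths if p[1] / p[2] <= tj - ti]
    seg_path, s0, s1 = segment
    total = 0.0
    for path_id, length, vmax in feasible:
        pl = possible_locations(length, vmax, ti, tj, t)
        if pl is None or path_id != seg_path:
            continue
        lo, hi = pl
        if hi > lo:  # uniform pdf over the possible locations on this path
            inside = max(0.0, min(hi, s1) - max(lo, s0)) / (hi - lo)
        else:        # a single possible location
            inside = 1.0 if s0 <= lo <= s1 else 0.0
        total += inside / len(feasible)  # uniform Pr(P) = 1 / |PP|
    return total
```

With two feasible paths of lengths 6 and 8 (vmax = 1, t_i = 0, t_{i+1} = 10), the possible locations on the first path at t = 4 are [0, 4]; a segment [0, 2] on it therefore gets 0.5 · 0.5 = 0.25, mirroring the slide's example.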
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– the collection of all the possible paths (and their probabilities)
– a probabilistic location function
Example: observe the two possible sequences of going from position p_i to position p_{i+1}. At the time instant t = 4, the object can be in certain segments either along the edge v_1v_5 or along the edge(s) v_1v_3 (+ v_3v_4).
Processing
› Data structures
– Edge hash table
  › small, in memory
– Movement R-tree
  › 1D, one for each edge
– Trajectories list
  › stores samples ordered by timestamp
  › assigns a pointer to the set of possible paths
  › earliest-arrival and latest-departure times stored for vertices
  › on disk, retrieved on demand
› Network expansion
– Expansion tree: a tree rooted at q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects t_q
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: road network with vertices A–E and query point q; the expansion tree "bubbling" along road-network segments; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query]
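The network-expansion step is essentially a Dijkstra search cut off at r. A minimal sketch, assuming an adjacency-list graph whose edge lengths are network distances (a hypothetical representation); for simplicity it keeps only tree edges whose far endpoint lies within r, whereas a full implementation would also keep the partially covered boundary edges.

```python
import heapq


def expansion_tree(graph, q, r):
    """Dijkstra from q, keeping only vertices whose network distance is below r.

    graph: {vertex: [(neighbour, edge_length), ...]}.
    Returns the tree edges (parent, child) and the network distances."""
    dist = {q: 0.0}
    parent = {}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # outdated heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return {(p, v) for v, p in parent.items()}, dist
```

The returned edges are the ones whose movement R-trees would be fetched in the filtering step.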
› Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
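Because the QP is monotonic between consecutive critical time instants, the QP at any time t is bracketed by the values at the two neighbouring critical instants. A small sketch of that bound (hypothetical function name; the QPs at the critical instants are assumed to be already computed and the instants sorted):

```python
import bisect


def qp_bounds(critical_times, qps, t):
    """Lower and upper bound on the QP at time t from the QPs at the critical instants."""
    if not critical_times[0] <= t <= critical_times[-1]:
        raise ValueError("t is outside the range of critical instants")
    k = bisect.bisect_left(critical_times, t)
    if critical_times[k] == t:
        return qps[k], qps[k]  # t is itself a critical instant
    # Monotonic between neighbours: the QP lies between their two values.
    left, right = qps[k - 1], qps[k]
    return min(left, right), max(left, right)
```

Evaluating only the critical instants thus yields an envelope for every time in between without further probability computations.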
Uncertainty – the flip side
› Data reduction
– Why transmit every single location point?
  › bandwidth
  › energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997
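For reference, the basic recursive Douglas-Peucker simplification looks as follows. This is the classic algorithm, not the accelerated variant of Hershberger and Snoeyink cited on the slide, and it treats a trajectory as a purely spatial polyline, ignoring time stamps.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (or to a if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / norm


def douglas_peucker(points, eps):
    """Keep a subset of points such that every dropped point lies within eps of the
    line through the endpoints of the simplified segment that replaces it."""
    if len(points) < 3:
        return list(points)
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= eps:
        return [points[0], points[-1]]
    # Split at the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], eps)
    right = douglas_peucker(points[index:], eps)
    return left[:-1] + right
```

The tolerance eps plays the role of the "acceptable error" on the slide: larger values drop more points and save more bandwidth.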
Uncertainty – the other flip side
› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just another "Z" axis… one cannot travel back in time
– How long may the data at the sink be "un-fresh"?
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
So far…
› Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
› Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
Merge the approaches
› Idea
– Discretize time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009, 435-443
A highway example…
What is the probability that the car is in some area at least once during some time interval?
[Figure, built up over several slides: position (pos) of a car on a highway over time (t) with a query window Q; the speed bounds v_min and v_max delimit the car's possible positions, assumed to be uniformly distributed; at two time instants the car is inside Q with probability 50% and 30%, respectively]
› Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
– but treating the two time instants independently admits worlds that violate the speed constraints
› Consideration of dependency: = 0.5, i.e., under the speed constraints every world in which the car is inside Q at the 30% instant is also inside Q at the 50% instant
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities (0.4 to each neighbour, 0.2 to stay)
› At the border of the wooden board these probabilities are different (0.4 to stay, 0.6 to move inward)
› This model is usually learned or given by experts
› Initial position: (0, 0, 1, 0, 0, 0, 0, 0, 0)
› After the first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After the second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After the 40th hit: (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
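The propagation by repeated multiplication with M can be reproduced in plain Python; the helper names are hypothetical and the matrix is the one above. The values reported for the 40th hit, (0.08, 0.12, …, 0.12, 0.08), are exactly the chain's stationary distribution π (π · M = π), so the checks below test that property rather than the rounded 40-step vector.

```python
def step(dist, matrix):
    """One hit of the board: row vector (distribution) times transition matrix."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def board_matrix(n=9, move=0.4, stay=0.2, border_stay=0.4, border_move=0.6):
    """Transition matrix M of the example; rows = from bucket, columns = to bucket."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = move, stay, move
    m[0][0], m[0][1] = border_stay, border_move
    m[n - 1][n - 2], m[n - 1][n - 1] = border_move, border_stay
    return m


M = board_matrix()
v = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # initial position: bucket 3
history = [v]
for _ in range(40):  # 40 hits of the board
    v = step(v, M)
    history.append(v)
```

`history[k]` holds the distribution after the k-th hit; since M is sparse, a real system would store only the non-zero entries.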
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2, s3 with the transition probabilities of M, unrolled over the ticks t = 0, 1, 2, 3]
Note: we have an exponential number of possible paths the car might take.
› Propagating the state distribution from t = 0 (car in s1):
– t = 0: (1, 0, 0)
– t = 1: (1, 0, 0) · M = (0, 0, 1)
– t = 2: (0, 0, 1) · M = (0, 0.8, 0.2)
– t = 3 (ignoring the window): (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36)
› At t = 2 the car is already in s1 or s2 with probability 0.8; propagating only the remaining mass in s3 gives (0, 0, 0.2) · M = (0, 0.16, 0.04) at t = 3, so Result = 0.8 + 0.16 = 0.96
› Solution based on matrix multiplications: introduces a new state for the "winner" trajectories and two matrices
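ST - Window Queries
Extended state space (s1, s2, s3, s4) with an additional absorbing state s4 for the "winner" trajectories:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0.6 & 0 & 0.4 & 0\\ 0 & 0.8 & 0.2 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0 & 0 & 0.4 & 0.6\\ 0 & 0 & 0.2 & 0.8\\ 0 & 0 & 0 & 1 \end{pmatrix}$$
M^- is used for transitions before the query interval and M^+ for transitions into it; M^+ redirects every transition into s1 or s2 (the window) to s4:
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$
[Figure: states s1, s2, s3, s4]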
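The M^-/M^+ construction carries over to any chain and window. A minimal sketch, assuming integer ticks, a set of window state indices and a known initial distribution; `window_query` and `matvec` are hypothetical names, and the extra state with index n plays the role of s4.

```python
def matvec(v, m):
    """Row vector times matrix."""
    n = len(m)
    return [sum(v[i] * m[i][j] for i in range(n)) for j in range(n)]


def window_query(M, window, t_start, t_end, init):
    """P(the object is in a state of `window` at least once during ticks
    t_start..t_end), using an extra absorbing "hit" state (index n)."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    # M^-: the original chain plus an (unreachable) absorbing hit state.
    m_minus = [row + [0.0] for row in M] + [absorbing]
    # M^+: every transition into a window state is redirected to the hit state.
    m_plus = []
    for row in M:
        new_row = [0.0 if j in window else p for j, p in enumerate(row)]
        new_row.append(sum(p for j, p in enumerate(row) if j in window))
        m_plus.append(new_row)
    m_plus.append(absorbing)
    v = list(init) + [0.0]
    if t_start == 0:  # the initial position itself may already be a hit
        v[n] = sum(v[j] for j in window)
        for j in window:
            v[j] = 0.0
    for t in range(1, t_end + 1):
        v = matvec(v, m_plus if t >= t_start else m_minus)
    return v[n], v
```

Calling it with the 3×3 matrix M, window {0, 1} (s1, s2), T = [2, 3] and initial distribution (1, 0, 0) reproduces the vector (0, 0, 0.04, 0.96) at t = 3 and the result 0.96.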
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, first with a single observation at t0, then with two observations at t0 and t1]
Multiple Observations
› We need to track where true hit worlds are located
– 2|S| classes of equivalent worlds
– One class S_i^- corresponding to worlds where o is located in state s_i and o has not intersected the window
– One class S_i^+ corresponding to worlds where o is located in state s_i and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1\\ 0.6 & 0 & 0.4\\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
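Multiple Observations
State order: (S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+), where the S_i^+ classes collect the worlds that have intersected the window.
$$M^- = \begin{pmatrix} M & 0\\ 0 & M \end{pmatrix} \qquad M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0.4 & 0.6 & 0 & 0\\
0 & 0 & 0.2 & 0 & 0.8 & 0\\
0 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & 0 & 0.6 & 0 & 0.4\\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$
$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0, 0, 0) \xrightarrow{M^+} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^+} (0, 0, 0.04, 0.48, 0.16, 0.32)$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]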
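› Now, what is the probability that the trajectory passes the query window, given that the object was seen in s3 (at t = 3)?
Bayes' theorem (hit = the trajectory passes the window, obs = the object is seen in s3 at t = 3):
$$P(\text{hit} \mid \text{obs}) = \frac{P(\text{obs} \mid \text{hit}) \cdot P(\text{hit})}{P(\text{obs})} = \frac{P(\text{hit} \wedge \text{obs})}{P(\text{hit} \wedge \text{obs}) + P(\neg\text{hit} \wedge \text{obs})}$$
With the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over (S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+) at t = 3, P(hit ∧ obs) = 0.32 (class S_3^+) and P(¬hit ∧ obs) = 0.04 (class S_3^-), so
$$P(\text{hit} \mid \text{obs}) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$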
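The 2|S|-class construction and the Bayes step can be combined in one routine. A sketch under assumptions of ours: the observation is exact and taken at the last tick t_end, the query interval starts at t_start ≥ 1, and the state order is (S_1^-, …, S_n^-, S_1^+, …, S_n^+); all names are hypothetical.

```python
def matvec(v, m):
    """Row vector times matrix."""
    n = len(m)
    return [sum(v[i] * m[i][j] for i in range(n)) for j in range(n)]


def split_matrices(M, window):
    """M^- and M^+ over the classes (S_1^-, ..., S_n^-, S_1^+, ..., S_n^+)."""
    n = len(M)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            m_minus[i][j] += p          # S_i^- -> S_j^-  (block diag(M, M))
            m_minus[n + i][n + j] += p  # S_i^+ -> S_j^+
            # During the query interval, entering a window state makes a world a hit.
            m_plus[i][n + j if j in window else j] += p
            m_plus[n + i][n + j] += p   # hit worlds stay hit worlds
    return m_minus, m_plus


def conditional_hit(M, window, t_start, t_end, init, obs_state):
    """P(window hit during [t_start, t_end] | object observed in obs_state at t_end)."""
    n = len(M)
    m_minus, m_plus = split_matrices(M, window)
    v = list(init) + [0.0] * n
    for t in range(1, t_end + 1):
        v = matvec(v, m_plus if t >= t_start else m_minus)
    hit_and_obs, miss_and_obs = v[n + obs_state], v[obs_state]
    return hit_and_obs / (hit_and_obs + miss_and_obs), v
```

For the running example (window {s1, s2}, T = [2, 3], start in s1, observation s3 at t = 3) this gives 0.32 / 0.36 ≈ 0.89.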
Summary
› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Matching time to ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: R-tree over the space-time domain (axes: time, location)]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform to the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
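For comparison, plain Monte-Carlo sampling of the example chain from the window-query slides can be sketched as below. It samples forward from a single known start state only; drawing samples that also conform to later observations requires the adapted transition matrices of the cited work, which this sketch does not implement. Names are hypothetical.

```python
import random


def sample_path(M, start, steps, rng):
    """Forward-sample one possible trajectory (sequence of states) of the chain."""
    path = [start]
    for _ in range(steps):
        row = M[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def mc_window_probability(M, start, window, t_start, t_end, n_samples=20000, seed=0):
    """Fraction of sampled trajectories that visit `window` during [t_start, t_end]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        path = sample_path(M, start, t_end, rng)
        if any(path[t] in window for t in range(t_start, t_end + 1)):
            hits += 1
    return hits / n_samples
```

For M with s1 (index 0) as the start state, the window {s1, s2} (indices {0, 1}) and T = [2, 3], the estimate should come close to the exact value 0.96.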
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
Summary
› Models for spatial uncertainty
› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
62
Impact of UncertaintyGivenTr1 Tr2 hellip TrN where each Tri has some location uncertainty associated with its location at each time-instant consider the variant
UQ_nn(q) Retrieve the all the objects whose trajectories have a non-zero probability of being a nearest neighbor of the moving object whose trajectory is Trq between [tbte]
T = te
X
YT = tb
T = t1
Trq Tr1 Tr2Tr3
T = tb1
T = t11
Observations
adding uncertainty to the previous example implies-Tr1 (blue) is the exclusive non-zero NN-probability up to tb1-Starting at tb1 Tr3 (green) also has a non-zero NN-probability-Specifically at t11 all three trajectories have a non-zero NN-probability
= what should the structure ofthe answer to UQ_nn(q) look like
63
Structure of the Answer to Probabilistic NN-query for Trajectories
We propose the IPAC-NN (Interval-basedProbabilistic Answer to a Continuous NN query) tree
Tri11 tb t11 D11
Tri12l t12(l-1) t12 D12k12
[TrQ tb te]
Tri1 tb t1 D1 Tri2 t1 t2 D2 Trim t(m-1) te Dm
Tri12 t11 t12 D12 Tri1k1 t1(k-1) t11 D1k1 Trik1 t(k-1) t21 Dm1 Trim1 t(m1-1) te Dmkm
Tri121 t11 t121 D121
Root parameters of the query-Querying trajectory Trq - Time-interval of interest [tbte]
Children-if parent excluded the trajectories that have the highest probability of being NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (ie spatial) NN-query when the querying object is crisp (ie no uncertainty)
Rmin gt Rmax
Rmin
Rmax
Q
Tr1
Tr2
Tr3
Tr4
Tr5
Rd1
1
14
Trq = 2D point Q The rest are possible-locations bounded by circles
Observation -Any object with min-distance from Q being gt Rmax has a ldquo0rdquo probability of being a NN to Q-Example R4 cannot have a non-zero probability of being Qrsquos NN
The probability that the location of Tri is within distance Rd from Q is given by
The probability that a given object Trj is the NN of Q is given by
ie Trj is the NN if itrsquos at distance Rd AND all the rest are at distances gt Rd and integrating over all possible Rdrsquos (NOTE the upper-bound is Rmaxhellip)
65
When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example
Rmin
Rmax
TrQ
Tr1
Tr2
Tr3
Tr4
Tr5
Z1
Z2
dist(Z1Z2) lt Rmax
We can no longer ldquoprunerdquo Tr4 from consideration
The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq
NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip
66
1
x
y
0
2r
1r2
pdf(Tr2)
pdf(Trq) pdf(Tr1)
Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for
DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq
67
1
x
y
0-40
+40
4r
34r2
pdf(Tr2 - Trq)
pdf(Tr1 - Trq)
Let Viq = Vi + (-Vq)
Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin at a function of t is
hyperbola since A gt 0
Now we have a collection of such distance functions (for each i)
69
The continuous aspect of the probabilistic NN queries
Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)
Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points
dist
TR1
TR2
LE12
tb t11 t12
Example
Lower envelope of two distance-trajectories
critical time-point
NOTE time-dimension on horizontal axis
70
The continuous aspect of the probabilistic NN queries
Q how to efficiently construct the lower envelope of the whole collection of distance-functions
A divide-and-conquer approach in the spirit of Merge-Sortdist
TR1
TR2
LE12
tb t11 t12
dist
TR3
TR4
LE34
tb tet31dist
TR1
TR2
LE1234
tb tet11 t12
TR3
TR4
t31
t1new t2new
Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)
Now how about the IPAC-NN tree
71
The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)
Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree
dist dist
lower envelope
TR1TR2
TR3
TR4
TR5
TR6
tb te
2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )
4r
TR7
t1 t2 t3
72
Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie
Example if pdf of selecting a possible path is uniform
Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie
73
Combine the two probabilitieshellip
Possible pathsPossible locations when t=4
a
m
Probability of falling inside segment mp1
0505 = 025 (at t = 4)
Probability of an object a falling inside some segment S at given time instant t
)(
))(Pr()Pr())(Pr(tPPp
p
a
tPLSpSta eg uniform pdf
74
ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory
representation we needndash The collection of all the possible paths (and probabilities)
ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi
to position pi+1
At the time-instant t = 4 the object can
be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )
75
Processing
› Data structures
– Edge hash-table
› small, in-memory
– Movement R-tree
› one 1D R-tree for each edge
– Trajectories list
› store samples ordered by time-stamp
› assign a pointer to the set of possible paths
› earliest-arrival and latest-departure times stored for vertices
› on-disk, retrieved on demand
76
› Network expansion
– Expansion tree: a tree rooted at q that contains the edges whose network distances to q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects t_q
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" from q along road-network segments A-E, annotated with earliest/latest times of arrival; one possible path shown is definitely not part of the answer to the range query]
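The expansion and filtering steps can be sketched with a Dijkstra search that stops at network distance r. This is a simplification under stated assumptions: q is a vertex of a hypothetical undirected graph, and a plain dict keyed by edge (holding (object, t_start, t_end) entries) stands in for the per-edge 1D movement R-trees; refinement is not shown.

```python
import heapq


def undirected(edge_list):
    """Build an adjacency dict from (u, v, length) triples."""
    adj = {}
    for u, v, w in edge_list:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def expansion_tree(adj, q, r):
    """Dijkstra from vertex q, stopped at network distance r.

    Returns the edges whose network distance to q is smaller than r,
    i.e., edges with at least one endpoint at distance < r.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                    # stale heap entry
        if d >= r:
            break                       # every remaining vertex is too far
        for v, w in adj.get(u, []):
            edges.add(frozenset((u, v)))
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return edges


def filter_candidates(edges, movement_index, tq):
    """Objects with an entry on an expanded edge whose time interval intersects tq."""
    lo, hi = tq
    return {obj
            for e in edges
            for obj, t_start, t_end in movement_index.get(e, [])
            if t_start <= hi and lo <= t_end}


adj = undirected([("q", "A", 1.0), ("A", "B", 2.0), ("q", "C", 4.0),
                  ("C", "D", 1.0), ("B", "E", 5.0)])
edges = expansion_tree(adj, "q", r=3.5)
movement_index = {frozenset(("A", "B")): [("o1", 0, 5)],
                  frozenset(("C", "D")): [("o2", 0, 10)],
                  frozenset(("B", "E")): [("o3", 20, 30)]}
print(filter_candidates(edges, movement_index, tq=(4, 6)))   # {'o1'}
```

Edge C-D is never reached (both endpoints are at distance at least 4), and o3's time interval does not intersect t_q, so only o1 proceeds to refinement.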
77
Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic between consecutive critical time instants
– The QPs at the critical time instants therefore define an envelope function for the actual QPs
78
Uncertainty – the flip side
› Data reduction
– Why transmit every single location point?
› Bandwidth
› Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
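To make "define an acceptable error, save on data size" concrete, here is the classic recursive Douglas-Peucker simplification. It is a plain textbook version, not the faster variant of Hershberger and Snoeyink cited above, and the track coordinates are hypothetical.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (to a itself if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)


def douglas_peucker(points, epsilon):
    """Keep the endpoints; recurse on the farthest point if it deviates > epsilon."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:
        return [a, b]                   # every interior point is within tolerance
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right            # drop the duplicated split point


track = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 0)]
print(douglas_peucker(track, epsilon=0.5))   # [(0, 0), (2, 0), (3, 3), (4, 0)]
```

The point (1, 0.1) deviates by only 0.1 from the simplified line and is dropped; epsilon plays the role of the acceptable error.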
79
Uncertainty – the other flip side
› What are the important properties that should not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just a "Z" axis… one cannot travel back in time
– How long may the data at the sink remain "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data
– Probabilistic results
– Time dimension
vs.
Geometric models for uncertain spatio-temporal data
– Probabilistic results
– Time dimension
82
Merge the approaches
› Idea
– Discretize the time
– For each time instant, a pdf (pmf) over the possible positions is given
› Examples
– Nearest-neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar, "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443
83
A highway example…
[Figure: position (pos) of a car on a highway over time (t), with a query region Q]
What is the probability that the car is in some area at least once during some time interval?
84
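A highway example…
[Figure: position (pos) over time (t) with query region Q; the car's possible positions are bounded by its minimum and maximum speeds vmin and vmax]
What is the probability that the car is in some area at least once during some time interval?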
85
A highway example…
[Figure: possible positions between the vmin and vmax bounds over time, with query region Q]
Assuming a uniform distribution…
86
A highway example…
[Figure: query region Q; the car is in Q with probability 50% at one time step and 30% at another]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
87
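A highway example…
[Figure: query region Q; the car is in Q with probability 50% at one time step and 30% at another]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Violation of the speed constraints (the independence assumption also counts combinations of positions the car cannot reach given vmin and vmax)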
88
A highway example…
[Figure: query region Q; the car is in Q with probability 50% at one time step and 30% at another]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Consideration of dependency: 0.5
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
A simple example
› Whenever the wooden board is hit, the ball either stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts
[Figure: in the interior the ball moves left with 0.4, stays with 0.2, and moves right with 0.4; at the border it stays with 0.4 and moves inward with 0.6]
92
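A simple example
› Initial position: the ball is in the third hole
› After the first hit: 0.4, 0.2, 0.4 in holes 2-4
› After the second hit: 0.16, 0.16, 0.36, 0.16, 0.16 in holes 1-5
› After the 40th hit: ≈ 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08 in holes 1-9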
93
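How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):
$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$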
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
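The propagation above is repeated vector-matrix multiplication. The sketch below rebuilds M in plain Python (no linear-algebra library assumed) and applies it hit by hit, starting from the third hole as in the slide.

```python
def board_matrix(n=9):
    """Transition matrix of the board example (rows: from hole, columns: to hole)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:                       # left border: stay 0.4, move right 0.6
            M[i][0], M[i][1] = 0.4, 0.6
        elif i == n - 1:                 # right border: move left 0.6, stay 0.4
            M[i][n - 2], M[i][n - 1] = 0.6, 0.4
        else:                            # interior: left 0.4, stay 0.2, right 0.4
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(x, M):
    """One hit of the board: row vector x multiplied by M."""
    n = len(M)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]


M = board_matrix()
x = [0.0] * 9
x[2] = 1.0                               # the ball starts in the third hole
history = [x]
for _ in range(40):
    history.append(step(history[-1], M))

for hit in (1, 2, 40):
    print(hit, [round(p, 2) for p in history[hit]])
```

The vector (0.08, 0.12, …, 0.12, 0.08) is the stationary distribution of this chain (multiplying it by M leaves it unchanged), which is why repeated hits approach it.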
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
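One common way to learn transition probabilities from historical data is the maximum-likelihood estimate: count observed transitions and normalize each row. The sketch below assumes trajectories are already discretized into state ids per tick (the sample data is hypothetical) and stores rows as sparse dicts; smoothing and handling of never-left states are left open.

```python
from collections import defaultdict


def estimate_transitions(trajectories):
    """Maximum-likelihood transition probabilities from observed state sequences.

    P(s -> s') = count(s -> s') / count(s -> any). Rows are stored sparsely
    as dicts, since most state pairs are never observed.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for s, s_next in zip(traj, traj[1:]):      # consecutive ticks
            counts[s][s_next] += 1
    matrix = {}
    for s, row in counts.items():
        total = sum(row.values())
        matrix[s] = {t: c / total for t, c in row.items()}
    return matrix


trajectories = [["s1", "s3", "s2", "s1"],
                ["s2", "s3", "s3", "s2"],
                ["s3", "s2", "s3"]]
M = estimate_transitions(trajectories)
print(M)
```

Separate matrices for different times of day or object groups could be obtained by running the same estimate on the corresponding subsets of the history.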
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: transition graph over the states s1, s2, s3]
Note: there is an exponential number of possible paths the car might take
101
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: transition graph over s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2)
t=3: (0.48, 0.16, 0.36)
102
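ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: transition graph over s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t=0: (1, 0, 0)
t=1: (0, 0, 1)
t=2: (0, 0.8, 0.2); the mass 0.8 in s2 has already hit the window, so only the 0.2 in s3 is propagated further
t=3: (0, 0.16, 0.04)
Result = 0.8 + 0.16 = 0.96
› A solution based on matrix multiplications introduces a new state for the "winner" trajectories and two matrices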
103
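ST - Window Queries
$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$, $M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$
States s1, s2, s3 and the new winner state s4:
t=0: (1, 0, 0, 0)
t=1 (× M^-): (0, 0, 1, 0)
t=2 (× M^+): (0, 0, 0.2, 0.8)
t=3 (× M^+): (0, 0, 0.04, 0.96)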
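The two-matrix construction can be written generically for any chain and window. In this sketch (names are illustrative), the window is a set of state indices (s1 = 0, s2 = 1, s3 = 2), the absorbing winner state is appended as the last index, M^+ is used for every step that lands inside T and M^- for all other steps.

```python
def vec_mat(x, A):
    """Row vector x multiplied by matrix A (lists of lists)."""
    return [sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(A[0]))]


def window_query(M, start, window, T):
    """P(object is in a window state at some tick in T = (t_from, t_to)).

    M_minus is M plus an absorbing winner state; M_plus additionally redirects
    every transition into a window state to the winner.
    """
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    m_minus = [list(row) + [0.0] for row in M] + [absorbing]
    m_plus = [[0.0 if j in window else p for j, p in enumerate(row)]
              + [sum(row[j] for j in window)] for row in M] + [absorbing]
    t_from, t_to = T
    if t_from <= 0 <= t_to and start in window:
        return 1.0                       # already inside the window at t = 0
    x = [0.0] * (n + 1)
    x[start] = 1.0
    for t in range(1, t_to + 1):
        x = vec_mat(x, m_plus if t_from <= t <= t_to else m_minus)
    return x[n]                          # probability mass of the winner state


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
print(window_query(M, start=0, window={0, 1}, T=(2, 3)))
```

The printed value is 0.96 up to floating-point rounding, matching the slide; the cost is linear in the length of T and needs no path enumeration.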
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques
[Figure: location space over time; left: one observation at t0; right: two observations at t0 and t1]
105
Multiple Observations
› We need to track where the true hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$
[Figure: transition graph over s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
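Multiple Observations
States in the order (S1-, S2-, S3-, S1+, S2+, S3+); "-" = has not intersected the window (¬■), "+" = has intersected the window (■)
t=0: (1, 0, 0, 0, 0, 0)
t=1 (× M^-): (0, 0, 1, 0, 0, 0)
t=2 (× M^+): (0, 0, 0.2, 0, 0.8, 0)
t=3 (× M^+): (0, 0, 0.04, 0.48, 0.16, 0.32)
$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$, $M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$
with $M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$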
107
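Bayes' Theorem
$P(\text{hit} \mid \text{obs}) = \frac{P(\text{obs} \mid \text{hit}) \cdot P(\text{hit})}{P(\text{obs})} = \frac{P(\text{hit} \wedge \text{obs})}{P(\text{obs})} = \frac{P(\text{obs} \wedge \text{hit})}{P(\text{obs} \wedge \text{hit}) + P(\text{obs} \wedge \neg\text{hit})}$
› Now, what is the probability that the trajectory passes the query window, given that the object was seen in s3?
With obs = "o is in s3 at t = 3" and the t = 3 distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+): $P(\text{hit} \mid \text{obs}) = 0.32 / (0.32 + 0.04) \approx 0.89$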
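The 2|S|-state construction and the Bayes step can be combined in a short sketch. Assumptions: indices 0..n-1 stand for Si- and n..2n-1 for Si+, the window is a set of state indices, and the observation time is at or after the end of T (after T the flags no longer change, so M^- is used), which covers the slide's example of an observation at t = 3.

```python
def augmented_matrices(M, window):
    """2|S|-state matrices: 0..n-1 are Si- (not yet hit), n..2n-1 are Si+ (hit)."""
    n = len(M)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            m_minus[i][j] = p                    # outside T: flag unchanged
            m_minus[n + i][n + j] = p
            # inside T: moving into a window state sets the "hit" flag
            m_plus[i][n + j if j in window else j] = p
            m_plus[n + i][n + j] = p
    return m_minus, m_plus


def propagate(M, start, window, T, t_end):
    """Distribution over (S1-, ..., Sn-, S1+, ..., Sn+) at tick t_end."""
    n = len(M)
    m_minus, m_plus = augmented_matrices(M, window)
    x = [0.0] * (2 * n)
    hit_at_0 = T[0] <= 0 <= T[1] and start in window
    x[n + start if hit_at_0 else start] = 1.0
    for t in range(1, t_end + 1):
        A = m_plus if T[0] <= t <= T[1] else m_minus
        x = [sum(x[i] * A[i][j] for i in range(2 * n)) for j in range(2 * n)]
    return x


def hit_given_observed(x, state, n):
    """Bayes: P(hit | o in state) = P(state+) / (P(state+) + P(state-))."""
    return x[n + state] / (x[n + state] + x[state])


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
x3 = propagate(M, start=0, window={0, 1}, T=(2, 3), t_end=3)
print([round(p, 2) for p in x3])                        # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]
print(round(hit_given_observed(x3, state=2, n=3), 2))   # 0.89 (o observed in s3)
```

Conditioning only needs the two entries of the observed state, so the cost stays that of the matrix propagation.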
108
Summary
› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: R-tree over the time × location space; leaf entries carry P and oid]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
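The basic Monte-Carlo estimator can be sketched on the window-query example, where the exact answer (0.96) is known. Caveat: this is plain forward sampling from a single initial observation; it does not implement the adaptation of transition matrices that makes samples conform with later observations.

```python
import random


def sample_trajectory(M, start, t_end, rng):
    """Draw one possible world (state sequence for ticks 0..t_end) from the chain."""
    states = [start]
    for _ in range(t_end):
        row = M[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states


def mc_window_query(M, start, window, T, n_samples=20000, seed=42):
    """Fraction of sampled worlds that are in a window state at some tick in T."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        traj = sample_trajectory(M, start, T[1], rng)
        if any(traj[t] in window for t in range(T[0], T[1] + 1)):
            hits += 1
    return hits / n_samples


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
print(mc_window_query(M, start=0, window={0, 1}, T=(2, 3)))
```

With 20,000 samples the estimate typically lands within about 0.003 of 0.96; the approach pays off for queries such as kNN where no closed matrix formulation exists.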
112
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Event queries
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE, 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
63
Structure of the Answer to a Probabilistic NN Query for Trajectories
We propose the IPAC-NN (Interval-based Probabilistic Answer to a Continuous NN query) tree
[Figure: IPAC-NN tree with root [TrQ, tb, te]; level-1 nodes (Tri1, tb, t1, D1), (Tri2, t1, t2, D2), …, (Trim, t(m-1), te, Dm); their children, e.g., (Tri11, tb, t11, D11), (Tri12, t11, t12, D12); and deeper nodes, e.g., (Tri121, t11, t121, D121)]
Root: parameters of the query
– querying trajectory Trq
– time interval of interest [tb, te]
Children: if the parent is excluded, the trajectories that have the highest probability of being the NN to Trq within a sub-interval of the interval bounded by the parent
64
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., no uncertainty)
[Figure: crisp query point Q and the uncertainty disks of Tr1-Tr5, with the distances Rmin, Rmax, and Rd from Q]
Trq = 2D point Q; the other objects are possible-location regions bounded by circles
Observation: any object whose minimum distance from Q is > Rmax (the smallest maximum distance from Q over all objects) has a "0" probability of being the NN of Q
– Example: Tr4 cannot have a non-zero probability of being Q's NN
The probability that the location of Tri is within distance Rd from Q is given by
$P_i(R_d) = \int_{\|x - Q\| \le R_d} \mathrm{pdf}_i(x)\, dx$
The probability that a given object Trj is the NN of Q is given by
$P_j^{NN} = \int_{0}^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{k \ne j} \big(1 - P_k(R_d)\big)\, dR_d$
i.e., Trj is the NN if it is at distance Rd AND all the others are at distances > Rd, integrated over all possible Rd's (NOTE: the upper bound is Rmax…)
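Instead of evaluating the integral numerically, the NN probabilities can be approximated by Monte-Carlo sampling. The sketch assumes uniform pdfs over disks given as (cx, cy, r) with hypothetical coordinates; it computes Rmax as above, drops objects whose minimum distance exceeds Rmax, and counts how often each remaining object is the closest sample.

```python
import math
import random


def sample_disk(cx, cy, r, rng):
    """Uniform point in a disk (radius r * sqrt(U) gives uniform area density)."""
    rho = r * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return cx + rho * math.cos(theta), cy + rho * math.sin(theta)


def nn_probabilities(q, objects, n_samples=20000, seed=7):
    """Estimate P(object is NN of crisp point q) for objects given as (cx, cy, r)."""
    qx, qy = q
    centre_dist = {k: math.hypot(cx - qx, cy - qy) for k, (cx, cy, _) in objects.items()}
    # Rmax: the smallest maximum distance from q over all objects
    r_max = min(centre_dist[k] + r for k, (_, _, r) in objects.items())
    # objects whose minimum distance exceeds Rmax have NN probability 0
    alive = {k: o for k, o in objects.items() if centre_dist[k] - o[2] <= r_max}
    rng = random.Random(seed)
    wins = dict.fromkeys(objects, 0)
    for _ in range(n_samples):
        best, best_d = None, math.inf
        for k, (cx, cy, r) in alive.items():
            x, y = sample_disk(cx, cy, r, rng)
            d = math.hypot(x - qx, y - qy)
            if d < best_d:
                best, best_d = k, d
        wins[best] += 1
    return r_max, {k: w / n_samples for k, w in wins.items()}


objects = {"Tr1": (2.0, 0.0, 1.0), "Tr2": (0.0, 3.0, 1.0), "Tr4": (8.0, 0.0, 1.0)}
r_max, probs = nn_probabilities((0.0, 0.0), objects)
print(r_max, probs)
```

Here Rmax = 3 (the farthest point of Tr1), so Tr4, whose nearest point is at distance 7, is pruned without sampling.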
65
When Trq (the instantaneous location) has an uncertainty associated with it, the previous observations are not directly applicable… Example:
[Figure: uncertain query TrQ and objects Tr1-Tr5 with Rmin and Rmax; a point Z1 in TrQ's region and a point Z2 in Tr4's region with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd from Trq
NOTE: in effect this is a quadruple integration instead of the double one (cf. previous slide)…
66
[Figure: uniform pdfs of Trq, Tr1, and Tr2 over disks of diameter 2r in the x-y plane]
Example: assuming a uniform pdf of possible locations; the figure highlights two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd from Trq
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\|V_i - V_q\| \le R_d)$
Define Viq = Vi - Vq (cross-correlation) as another random variable, Viq = Vi + (-Vq), which, due to independence, implies that the pdf of Viq is the convolution of the pdfs of Vi and -Vq
67
[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq), each supported on a disk of diameter 4r]
Let Viq = Vi + (-Vq)
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2: Assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids (with the value of the pdf on the vertical axis). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq
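The properties can be checked numerically by sampling Viq = Vi - Vq directly instead of computing the convolution analytically. The sketch assumes two uniform disks of radius r with hypothetical centroids; it estimates the centroid of Viq (Property 1 predicts Ci - Cq), the support radius (at most 2r, i.e., diameter 4r), and the quantity P(||Vi - Vq|| <= Rd) needed for the NN probabilities.

```python
import math
import random


def sample_disk(cx, cy, r, rng):
    """Uniform point in a disk of radius r centred at (cx, cy)."""
    rho = r * math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return cx + rho * math.cos(theta), cy + rho * math.sin(theta)


def difference_samples(ci, cq, r, n, seed=1):
    """Samples of V_iq = V_i - V_q for independent uniform disks around ci and cq."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        xi, yi = sample_disk(ci[0], ci[1], r, rng)
        xq, yq = sample_disk(cq[0], cq[1], r, rng)
        out.append((xi - xq, yi - yq))
    return out


def prob_within(samples, rd):
    """Estimate P(||V_i - V_q|| <= rd)."""
    return sum(math.hypot(x, y) <= rd for x, y in samples) / len(samples)


ci, cq, r = (5.0, 2.0), (1.0, 1.0), 1.0
samples = difference_samples(ci, cq, r, 50000)
mean_x = sum(x for x, _ in samples) / len(samples)
mean_y = sum(y for _, y in samples) / len(samples)
spread = max(math.hypot(x - 4.0, y - 1.0) for x, y in samples)
print(mean_x, mean_y, spread)    # mean near Ci - Cq = (4, 1); spread at most 2r
print(prob_within(samples, 4.0))
```

Because the difference has a single pdf, the quadruple integral reduces to integrating that pdf over a disk of radius Rd around the origin.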
68
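The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi - vxq and Vyiq = vyi - vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin as a function of t is
$d_{iq}(t) = \sqrt{A t^2 + B t + C}$, with $A = V_{xiq}^2 + V_{yiq}^2$, $B = 2(x_{iq} V_{xiq} + y_{iq} V_{yiq})$, $C = x_{iq}^2 + y_{iq}^2$, where $(x_{iq}, y_{iq})$ denotes the expected relative location at t = 0
a hyperbola, since A > 0
Now we have a collection of such distance functions (one for each i)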
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0)
Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query [tb, te] (NOTE: setting diq(t) = djq(t) and squaring both sides yields an equation of degree at most two in t). Call such pairwise intersections critical time points
[Figure: example lower envelope LE12 of two distance functions TR1 and TR2, with critical time points t11 and t12 after tb; time on the horizontal axis, distance on the vertical axis]
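Critical time points follow directly from the observation: since both distances are non-negative, diq(t) = djq(t) exactly when their squares agree, and the difference of squares is a polynomial of degree at most two. The sketch represents each distance function by hypothetical coefficients (A, B, C) derived from a relative start position and velocity.

```python
import math


def dist_coeffs(rel_pos, rel_vel):
    """(A, B, C) with d(t)^2 = A t^2 + B t + C for linear relative motion."""
    (x, y), (vx, vy) = rel_pos, rel_vel
    return vx * vx + vy * vy, 2 * (x * vx + y * vy), x * x + y * y


def critical_times(fi, fj, tb, te, eps=1e-12):
    """Times in [tb, te] with d_i(t) = d_j(t); there are at most two."""
    a, b, c = (fi[k] - fj[k] for k in range(3))
    if abs(a) < eps:
        if abs(b) < eps:
            return []            # identical (c == 0) or never equal (c != 0)
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]


tr1 = dist_coeffs((-4.0, 1.0), (1.0, 0.0))   # passes by, closest (distance 1) at t = 4
tr3 = dist_coeffs((0.0, 2.0), (0.0, 0.0))    # static at distance 2
print(critical_times(tr1, tr3, 0.0, 10.0))   # 4 - sqrt(3) and 4 + sqrt(3)
```

Between two consecutive critical time points the order of the two functions cannot change, which is what makes the envelope pieces well defined.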
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort
[Figure: the lower envelope LE12 of TR1 and TR2 (critical points t11, t12) and the lower envelope LE34 of TR3 and TR4 (critical point t31) over [tb, te] are merged into LE1234, which introduces the new critical points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, what about the IPAC-NN tree?
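The Merge-Sort-style construction can be sketched as follows, with distance functions again given as hypothetical (A, B, C) coefficients. An envelope is a list of (t_start, t_end, index) pieces; merging two envelopes splits time at all their breakpoints plus the crossings of the two active functions, then keeps the lower one in each sub-interval. For readability the piece lookup is a linear scan, so this sketch does not reach the O(N log N) bound; a two-pointer sweep over the sorted pieces would keep each merge linear.

```python
import math


def value(f, t):
    """d(t) = sqrt(A t^2 + B t + C) for coefficients f = (A, B, C)."""
    a, b, c = f
    return math.sqrt(max(a * t * t + b * t + c, 0.0))


def crossings(f, g, lo, hi, eps=1e-12):
    """Times strictly inside (lo, hi) where d_f = d_g (roots of d_f^2 - d_g^2)."""
    a, b, c = (f[k] - g[k] for k in range(3))
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        s = math.sqrt(disc) if disc >= 0 else None
        roots = [] if s is None else [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return sorted({t for t in roots if lo < t < hi})


def active(env, t):
    """Function index of the envelope piece that covers time t."""
    for start, end, idx in env:
        if start <= t <= end:
            return idx
    raise ValueError("t outside the envelope")


def merge(funcs, env1, env2):
    """Merge two lower envelopes, each a list of (t_start, t_end, index) pieces."""
    cuts = sorted({p for s, e, _ in env1 + env2 for p in (s, e)})
    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        f, g = active(env1, mid), active(env2, mid)
        pts = [lo] + crossings(funcs[f], funcs[g], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):          # no crossing inside (a, b)
            m = (a + b) / 2
            best = f if value(funcs[f], m) <= value(funcs[g], m) else g
            if pieces and pieces[-1][2] == best:
                pieces[-1] = (pieces[-1][0], b, best)   # extend previous piece
            else:
                pieces.append((a, b, best))
    return pieces


def lower_envelope(funcs, idxs, tb, te):
    """Divide and conquer: split the functions, recurse, merge the envelopes."""
    if len(idxs) == 1:
        return [(tb, te, idxs[0])]
    half = len(idxs) // 2
    return merge(funcs,
                 lower_envelope(funcs, idxs[:half], tb, te),
                 lower_envelope(funcs, idxs[half:], tb, te))


funcs = [(1, -8, 17), (0, 0, 4), (0, 0, 9), (1, -16, 65)]
env = lower_envelope(funcs, list(range(len(funcs))), 0.0, 10.0)
for start, end, idx in env:
    print(f"[{start:.3f}, {end:.3f}] -> TR{idx + 1}")
```

In this example TR2 (constant distance 2) is lowest except while TR1 or TR4 pass closer, giving the piece sequence TR2, TR1, TR2, TR4, TR2.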
71
The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)
Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree
dist dist
lower envelope
TR1TR2
TR3
TR4
TR5
TR6
tb te
2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )
4r
TR7
t1 t2 t3
72
Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie
Example if pdf of selecting a possible path is uniform
Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie
73
Combine the two probabilitieshellip
Possible pathsPossible locations when t=4
a
m
Probability of falling inside segment mp1
0505 = 025 (at t = 4)
Probability of an object a falling inside some segment S at given time instant t
)(
))(Pr()Pr())(Pr(tPPp
p
a
tPLSpSta eg uniform pdf
74
ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory
representation we needndash The collection of all the possible paths (and probabilities)
ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi
to position pi+1
At the time-instant t = 4 the object can
be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )
75
Processing
rsaquo Data Structuresndash Edge hash-table
rsaquo Small in-memoryndash Movement R-tree
rsaquo 1D for each edgendash Trajectories List
rsaquo Store samples ordered by time-stamp
rsaquo Assign pointer to set of possible paths
rsaquo Earliest-arriving and Latest-departure stored for vertices
rsaquo On-disk retrieve on-demand
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity search on uncertain spatio-temporal data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-nearest neighbor queries on uncertain moving object trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE, 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
Instantaneous (i.e., spatial) NN query when the querying object is crisp (i.e., has no uncertainty)
[Figure: query point Q and the uncertainty disks of Tr1 to Tr5, with radii Rmin, Rmax and Rd; annotation "Rmin > Rmax"]
Trq is the 2D point Q; the possible locations of the other objects are bounded by circles.
Observation: any object whose minimum distance from Q is greater than Rmax (the smallest of the objects' maximum distances from Q) has a "0" probability of being the NN of Q.
Example: Tr4 cannot have a non-zero probability of being Q's NN.
The probability that the location of Tri is within distance Rd of Q is given by
$$P_i(R_d) = \iint_{\lVert (x, y) - Q \rVert \le R_d} pdf_i(x, y)\, dx\, dy$$
The probability that a given object Trj is the NN of Q is given by
$$P_j^{NN} = \int_0^{R_{max}} \frac{dP_j(R_d)}{dR_d} \prod_{k \ne j} \big(1 - P_k(R_d)\big)\, dR_d$$
i.e., Trj is the NN if it is at distance Rd AND all the others are at distances > Rd, integrated over all possible Rd (NOTE: the upper bound is Rmax).
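Instead of evaluating the integral, the NN probabilities can be approximated by sampling. A minimal Monte Carlo sketch, assuming (this is not fixed by the slide) that each object's possible locations are uniformly distributed over a disk, and taking Rmax to be the smallest of the objects' maximum distances from Q. Objects whose minimum distance exceeds Rmax are pruned first, as in the observation above; names are hypothetical.

```python
import math
import random

def sample_in_disk(cx, cy, r, rng):
    """Uniform sample from a disk (the sqrt makes the density uniform in area)."""
    rho = r * math.sqrt(rng.random())
    phi = 2 * math.pi * rng.random()
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)

def nn_probabilities(q, disks, n=20000, seed=0):
    """Estimate P(object j is the NN of the crisp point q) for disks given as (cx, cy, r)."""
    qx, qy = q
    # Rmax: smallest maximum distance, so some object always lies within Rmax of q.
    r_max = min(math.hypot(cx - qx, cy - qy) + r for cx, cy, r in disks)
    # Pruning: objects whose minimum distance exceeds Rmax have NN probability 0.
    alive = [j for j, (cx, cy, r) in enumerate(disks)
             if math.hypot(cx - qx, cy - qy) - r <= r_max]
    rng = random.Random(seed)
    counts = [0] * len(disks)
    for _ in range(n):
        best, best_d = None, math.inf
        for j in alive:
            x, y = sample_in_disk(*disks[j], rng)
            d = math.hypot(x - qx, y - qy)
            if d < best_d:
                best, best_d = j, d
        counts[best] += 1
    return [c / n for c in counts]

# Two symmetric candidates and one far-away object that gets pruned.
probs = nn_probabilities((0.0, 0.0), [(2.0, 0.0, 1.0), (-2.0, 0.0, 1.0), (10.0, 0.0, 1.0)])
print(probs)
```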
When Trq (the instantaneous location of the query) has uncertainty associated with it, the previous observations are not directly applicable. Example:
[Figure: uncertain query TrQ and objects Tr1 to Tr5 with radii Rmin and Rmax; two points Z1 and Z2 with dist(Z1, Z2) < Rmax]
We can no longer "prune" Tr4 from consideration.
The main problem now becomes how to calculate the probability of an object, say Trj, being within distance Rd of Trq.
NOTE: in effect, this is a quadruple integral instead of the double one (cf. previous slide).
[Figure: uniform pdfs pdf(Trq), pdf(Tr1) and pdf(Tr2) over disks of diameter 2r in the x-y plane]
Example assuming a uniform pdf over the possible locations; two of the points that have to be considered when evaluating the actual probability of Tr1 being within distance Rd of Trq are marked.
Main observation: let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for
$$\Pr\big(\lVert V_i - V_q \rVert \le R_d\big)$$
Define Viq = Vi - Vq (cross-correlation) as another random variable, Viq = Vi + (-Vq), which, due to independence, implies that the pdf of Viq is the convolution of the pdfs of Vi and -Vq.
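The reduction to the single random variable Viq = Vi - Vq can be checked numerically. The sketch below assumes uniform pdfs over disks of radius r (as in the example), samples Viq directly, estimates Pr(||Viq|| <= Rd), and shows that the sample mean lies at Ci - Cq, as Property 1 states. The helper names are hypothetical.

```python
import math
import random

def sample_in_disk(cx, cy, r, rng):
    """Uniform sample from a disk of radius r centred at (cx, cy)."""
    rho = r * math.sqrt(rng.random())
    phi = 2 * math.pi * rng.random()
    return cx + rho * math.cos(phi), cy + rho * math.sin(phi)

def sample_difference(ci, cq, r, n, seed=0):
    """Draw n samples of Viq = Vi - Vq for two independent uniform disks."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        xi, yi = sample_in_disk(*ci, r, rng)
        xq, yq = sample_in_disk(*cq, r, rng)
        out.append((xi - xq, yi - yq))
    return out

def prob_within(samples, rd):
    """Estimate Pr(||Viq|| <= Rd): one 2D variable instead of two."""
    return sum(math.hypot(x, y) <= rd for x, y in samples) / len(samples)

samples = sample_difference(ci=(3.0, 1.0), cq=(1.0, 1.0), r=1.0, n=20000)
mean_x = sum(x for x, _ in samples) / len(samples)
mean_y = sum(y for _, y in samples) / len(samples)
print(f"centroid ~ ({mean_x:.2f}, {mean_y:.2f})")  # Property 1: close to Ci - Cq = (2, 0)
print(f"P(within 2.0) ~ {prob_within(samples, 2.0):.3f}")
```

The support of Viq is a disk of radius 2r around Ci - Cq, which is why every sample lies within |Ci - Cq| + 2r of the origin.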
[Figure: pdf(Tr1 - Trq) and pdf(Tr2 - Trq) in the x-y plane, each supported on a disk of diameter 4r]
Let Viq = Vi + (-Vq).
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq).
Property 2: assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
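The continuous aspect of the probabilistic NN queries
Let Vx_iq = vx_i - vx_q and Vy_iq = vy_i - vy_q denote the corresponding components of the velocity of the object whose expected location moves along TR_iq.
Then the distance of the expected location along TR_iq from the origin, as a function of t, is
$$d_{iq}(t) = \sqrt{A t^2 + B t + C}, \quad A = Vx_{iq}^2 + Vy_{iq}^2, \quad B = 2\,(x_{iq} Vx_{iq} + y_{iq} Vy_{iq}), \quad C = x_{iq}^2 + y_{iq}^2$$
where (x_iq, y_iq) is the expected relative location at t = 0; the graph of d_iq(t) is (a branch of) a hyperbola since A > 0.
Now we have a collection of such distance functions (one for each i).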
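A small sketch of the distance function above, assuming the expected relative location at t = 0 is (x, y) and the relative velocity is (vx, vy); A, B and C follow from expanding the squared distance. The function names are illustrative, not from the original work.

```python
import math

def distance_coeffs(x, y, vx, vy):
    """Coefficients of d(t)^2 = A t^2 + B t + C for the relative location (x, y) + t (vx, vy)."""
    a = vx * vx + vy * vy          # A > 0 whenever the relative velocity is non-zero
    b = 2 * (x * vx + y * vy)
    c = x * x + y * y
    return a, b, c

def distance_at(coeffs, t):
    """Evaluate d_iq(t) = sqrt(A t^2 + B t + C)."""
    a, b, c = coeffs
    return math.sqrt(max(a * t * t + b * t + c, 0.0))  # clamp tiny negative rounding errors

coeffs = distance_coeffs(x=-4.0, y=3.0, vx=2.0, vy=0.0)
print(coeffs)                    # (4.0, -16.0, 25.0)
print(distance_at(coeffs, 2.0))  # 3.0: at t = 2 the object is closest, at (0, 3)
```

The minimum of the hyperbola is at t = -B / (2A), here 16 / 8 = 2.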
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for the set of distance functions S_df = {d_1q(t), d_2q(t), ..., d_Nq(t)} (NOTE: excluding d_qq(t) ≡ 0).
Observation: two distance functions d_iq(t) and d_jq(t) can intersect at most twice within the query's time interval of interest [t_b, t_e] (NOTE: setting d_iq(t) = d_jq(t) and squaring both sides gives an equation of degree at most 2 in t). Call such pairwise intersections critical time points.
[Figure (example): lower envelope LE12 of the two distance trajectories TR1 and TR2, starting at tb, with critical time points t11 and t12; time on the horizontal axis]
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: With a divide-and-conquer approach in the spirit of merge sort.
[Figure: envelopes LE12 (of TR1, TR2) and LE34 (of TR3, TR4) over [tb, te] merged into LE1234, creating the new critical time points t1new and t2new]
Complexity of the lower-envelope construction: O(N log N) (N = number of trajectories).
Now, what about the IPAC-NN tree?
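A minimal sketch of the divide-and-conquer construction, not the authors' implementation. It assumes each distance function is given by the coefficients (A, B, C) of d(t)^2 = A t^2 + B t + C and compares squared distances, which preserves the order because distances are non-negative. The merge step walks two envelopes in parallel and splits every overlapping piece at its critical time points (at most two roots of a quadratic).

```python
import math
from typing import List, Optional, Tuple

Quad = Tuple[float, float, float]      # (A, B, C) of d(t)^2 = A t^2 + B t + C
Segment = Tuple[float, float, int]     # (t_start, t_end, index of the lowest function)

def sq_dist(f: Quad, t: float) -> float:
    a, b, c = f
    return a * t * t + b * t + c

def crossings(f: Quad, g: Quad, lo: float, hi: float) -> List[float]:
    """Critical time points in (lo, hi) where f and g intersect (at most two)."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    eps = 1e-12
    if abs(a) < eps:
        roots = set() if abs(b) < eps else {-c / b}
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = set()
        else:
            s = math.sqrt(disc)
            roots = {(-b - s) / (2 * a), (-b + s) / (2 * a)}
    return sorted(r for r in roots if lo < r < hi)

def merge(funcs: List[Quad], e1: List[Segment], e2: List[Segment]) -> List[Segment]:
    """Merge step: combine two lower envelopes defined over the same [tb, te]."""
    cuts = sorted({t for seg in e1 + e2 for t in seg[:2]})
    out: List[Segment] = []
    p1 = p2 = 0
    for lo, hi in zip(cuts, cuts[1:]):
        while e1[p1][1] <= lo:   # advance to the piece of e1 covering (lo, hi)
            p1 += 1
        while e2[p2][1] <= lo:   # same for e2
            p2 += 1
        i, j = e1[p1][2], e2[p2][2]
        pts = [lo] + crossings(funcs[i], funcs[j], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            mid = (a + b) / 2
            k = i if sq_dist(funcs[i], mid) <= sq_dist(funcs[j], mid) else j
            if out and out[-1][2] == k:
                out[-1] = (out[-1][0], b, k)   # same owner: extend the previous piece
            else:
                out.append((a, b, k))
    return out

def lower_envelope(funcs: List[Quad], tb: float, te: float,
                   lo: int = 0, hi: Optional[int] = None) -> List[Segment]:
    """Divide and conquer over funcs[lo:hi], in the spirit of merge sort."""
    if hi is None:
        hi = len(funcs)
    if hi - lo == 1:
        return [(tb, te, lo)]
    mid = (lo + hi) // 2
    return merge(funcs, lower_envelope(funcs, tb, te, lo, mid),
                 lower_envelope(funcs, tb, te, mid, hi))

funcs = [(1.0, -4.0, 5.0), (0.0, 0.0, 2.0), (0.0, 0.0, 10.0)]
print(lower_envelope(funcs, 0.0, 4.0))
# [(0.0, 1.0, 1), (1.0, 3.0, 0), (3.0, 4.0, 1)]
```

Each merge is linear in the sizes of the two input envelopes; the recursion mirrors merge sort.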
The lower envelope provides a continuous pruning criterion: any trajectory whose distance function lies more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN of Trq, since each actual distance deviates from the expected one by at most 2r (e.g., Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance functions × time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR1 to TR7 over [tb, te] with critical times t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree) and the 4r band]
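A minimal sketch of the 4r pruning criterion at a single time instant t. It assumes that the query and every object have their possible locations in a disk of radius r around the expected location, so each actual distance is within 2r of the expected one. Distance functions use the (A, B, C) form d(t)^2 = A t^2 + B t + C; the helper name is hypothetical.

```python
import math

def candidates_at(funcs, t, r):
    """Indices of objects that may still be the NN of Trq at time t (all others are pruned)."""
    dists = [math.sqrt(max(a * t * t + b * t + c, 0.0)) for a, b, c in funcs]
    envelope = min(dists)                      # value of the lower envelope at time t
    return [i for i, d in enumerate(dists) if d <= envelope + 4 * r]

funcs = [(0.0, 0.0, 4.0), (0.0, 0.0, 25.0), (1.0, 0.0, 1.0)]  # d = 2, d = 5, d = sqrt(t^2 + 1)
print(candidates_at(funcs, t=0.0, r=0.5))  # [0, 2]: 5 > 1 + 4 * 0.5, so object 1 is pruned
```

Evaluating the criterion only at sample times is an approximation; the slide's continuous version applies it along the whole envelope.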
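Given two trajectory samples (t_i, p_i) and (t_{i+1}, p_{i+1}) of a moving object a on a road network, the set of possible paths PP_i between t_i and t_{i+1} consists of all paths along the routes (sequences of edges) that connect p_i and p_{i+1} and whose minimum time costs are not greater than t_{i+1} - t_i, i.e.,
$$PP_i(a) = \{\, P \mid P \text{ connects } p_i \text{ and } p_{i+1},\ \mathrm{mincost}(P) \le t_{i+1} - t_i \,\}$$
Example: if the pdf of selecting a possible path is uniform, each P_j ∈ PP_i(a) is selected with probability 1/|PP_i(a)|.
Given a path P_j ∈ PP_i(a), the possible locations of the moving object a with respect to P_j at t ∈ [t_i, t_{i+1}] are the set of all positions p ∈ P_j that a can reach from p_i within t - t_i and from which it can reach p_{i+1} within t_{i+1} - t, i.e.,
$$PL_j(a, t) = \{\, p \in P_j \mid \mathrm{mincost}(p_i, p) \le t - t_i \ \wedge\ \mathrm{mincost}(p, p_{i+1}) \le t_{i+1} - t \,\}$$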
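Combine the two probabilities:
[Figure: possible paths of object a and its possible locations at t = 4, with a point m]
Probability of falling inside segment m p_1 at t = 4 (path probability × location probability): 0.5 · 0.5 = 0.25
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{P_j \in PP_i(a)} \Pr(P_j) \cdot \Pr\big(S \cap PL_j(a, t)\big)$$
where the second factor follows from the location pdf over the possible locations (e.g., a uniform pdf).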
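A small sketch of the combination formula, assuming (as in the example) a uniform pdf over the possible paths and a uniform location pdf along each path's possible locations, modelled here as a 1D interval of offsets along the path. The data layout, the path identifiers and the function name are hypothetical, and S is assumed to lie on a single path.

```python
def prob_in_segment(paths, segment):
    """Pr(a(t) in S) = sum over possible paths P_j of Pr(P_j) * Pr(S ∩ PL_j(a, t)).

    paths:   list of (path_id, pl_start, pl_end); [pl_start, pl_end] is the interval of
             possible locations PL_j(a, t) along that path, with a uniform location pdf.
    segment: (path_id, s_start, s_end), the segment S on one of the paths.
    """
    pr_path = 1.0 / len(paths)                  # uniform pdf over the possible paths
    seg_path, s0, s1 = segment
    total = 0.0
    for path_id, l0, l1 in paths:
        if path_id != seg_path or l1 <= l0:
            continue                             # S is not on this path (or PL has zero length)
        overlap = max(0.0, min(l1, s1) - max(l0, s0))
        total += pr_path * overlap / (l1 - l0)   # share of PL_j(a, t) that lies in S
    return total

# Two possible paths; S covers half of the possible locations on path "v1v5".
paths = [("v1v5", 2.0, 6.0), ("v1v3v4", 1.0, 5.0)]
print(prob_in_segment(paths, ("v1v5", 4.0, 6.0)))  # 0.5 * 0.5 = 0.25
```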
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
– The collection of all the possible paths (and their probabilities)
– A probabilistic location function
Example: observe the two possible sequences of going from position p_i to position p_{i+1}. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
Processing
› Data structures
– Edge hash table
  › Small, in memory
– Movement R-tree
  › 1D, one for each edge
– Trajectories list
  › Stores samples ordered by timestamp
  › Assigns a pointer to the set of possible paths
  › Earliest-arrival and latest-departure times stored for vertices
  › On disk, retrieved on demand
› Network expansion
– Expansion tree: a tree rooted at q that contains the edges whose network distances from q are smaller than r
› Filtering
– Fetch the movement R-trees of the edges in the expansion tree
– Search for leaf entries whose maximum time interval intersects t_q
– The objects pointed to by those leaf entries are candidates
› Refinement
– The candidate set is considerably smaller than the original trajectory dataset
– Calculate the qualification probability for each candidate
[Figure: query point q and objects A to E; the expansion tree "bubbling" along road-network segments, with earliest/latest times of arrival; one possible path is definitely not part of the answer to the range query]
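A minimal sketch of the network-expansion step, assuming the road network is an adjacency dict with edge lengths and that q sits on a vertex (on the slide, q may lie anywhere on the network). It runs a Dijkstra search bounded by r and returns the vertices inside the range plus the edges whose network distance from q (that of their nearer endpoint) is below r. The R-tree filtering and refinement steps are not shown; all names are hypothetical.

```python
import heapq

def expand(graph, q, r):
    """Dijkstra search from vertex q, bounded by r.

    graph: {vertex: [(neighbour, edge_length), ...]} (undirected edges listed both ways)
    Returns the network distance of every vertex closer than r, and the edges
    whose network distance from q (that of their nearer endpoint) is below r.
    """
    dist = {q: 0.0}
    settled = set()
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue                      # stale entry: u was already settled
        settled.add(u)
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                if nd < r:                # only vertices inside the range are expanded
                    heapq.heappush(heap, (nd, v))
    inside = {v: d for v, d in dist.items() if d < r}
    edges = {tuple(sorted((u, v))) for u in settled for v, _ in graph.get(u, [])}
    return inside, edges

graph = {"q": [("A", 2.0), ("C", 4.0)], "A": [("q", 2.0), ("B", 3.0)],
         "B": [("A", 3.0)], "C": [("q", 4.0), ("D", 5.0)], "D": [("C", 5.0)]}
inside, edges = expand(graph, "q", r=5.0)
print(inside)  # {'q': 0.0, 'A': 2.0, 'C': 4.0}
```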
Temporal-continuous query
– Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
– It is easy to prove that the actual QP is monotonic between consecutive critical time instants
– The QPs at the critical time instants define an envelope function for the actual QPs
Uncertainty – the flip side
› Data reduction
– Why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink. "Speeding up Douglas-Peucker Line-simplification Algorithm". SSHD 1997.
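For reference, a minimal sketch of the classic recursive Douglas-Peucker simplification (not the accelerated variant from the cited paper): keep the point farthest from the segment between the endpoints if its distance exceeds the acceptable error eps, and recurse on both halves. The track data is made up for illustration.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, eps):
    """Simplify a polyline so that no dropped point is farther than eps from it."""
    if len(points) < 3:
        return list(points)
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= eps:
        return [points[0], points[-1]]        # all intermediate points are within eps
    left = douglas_peucker(points[: index + 1], eps)
    right = douglas_peucker(points[index:], eps)
    return left[:-1] + right                  # avoid duplicating the split point

track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(douglas_peucker(track, eps=0.5))  # [(0, 0), (2, -0.1), (3, 5), (7, 9)]
```

The acceptable error eps is the "define acceptable error" knob on the slide: larger values save more data at the cost of geometric fidelity.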
Uncertainty – the other flip side
› Which important properties must not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just a "Z" axis: one cannot travel back in time
– How long may the data at the sink be "un-fresh"?
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
So far…
› Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
› Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
Merge the approaches
› Idea
– Discretize time
– For each point in time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the assumption of independence between points in time prohibits time-parameterized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.
J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443.
A highway example…
[Figure: position (pos) of a car over time (t) with a spatio-temporal query window Q; the possible positions are bounded by the minimum and maximum speeds vmin and vmax]
› What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution of the position, the car is inside Q with probability 50% at one time instant and 30% at another
› Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
– This violates the speed constraints
› Considering the dependency: 0.5
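The gap between 0.65 and 0.5 can be reproduced with a toy model. This sketch assumes (our assumption, not stated on the slides) that the car's speed v is uniform, normalised to [0, 1], and that the speeds placing the car inside Q at the two time instants are the nested intervals [0.5, 1] and [0.7, 1]. Because both events depend on the same v, the probability of being in Q at least once is 0.5, not 0.65.

```python
import random

def hit_first(v):   # car is inside Q at the first time instant (probability 0.5)
    return v >= 0.5

def hit_second(v):  # car is inside Q at the second time instant (probability 0.3)
    return v >= 0.7

def estimate_at_least_once(n=100000, seed=0):
    """Monte Carlo estimate of P(in Q at least once) when both events share the same speed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        v = rng.random()              # one speed per possible world
        hits += hit_first(v) or hit_second(v)
    return hits / n

independent = 1 - (1 - 0.5) * (1 - 0.3)
print(round(independent, 2))                # 0.65, assuming independence
print(round(estimate_at_least_once(), 2))   # about 0.5, with the dependency
```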
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
J. L. Doob. Stochastic Processes. Wiley, 1953.
A simple example
› Whenever the wooden board is hit, the ball stays in its hole or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: inner hole: stay 0.2, move to either neighbour 0.4; border hole: stay 0.4, move inward 0.6]
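A simple example
› Initial position: third hole
› After the first hit: 0.4 | 0.2 | 0.4 (holes 2 to 4)
› After the second hit: 0.16 | 0.16 | 0.36 | 0.16 | 0.16 (holes 1 to 5)
› After the 40th hit: approximately 0.08 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 | 0.08 (holes 1 to 9)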
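How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$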
How can we model this?
› First hit:
$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$
› Second hit:
$$(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$
› 40th hit:
$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
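The board example can be reproduced in a few lines of plain Python. This is a sketch with hypothetical helper names; row vectors are multiplied from the left, matching (0, 0, 1, 0, ...) · M above. The final check confirms that (0.08, 0.12, ..., 0.12, 0.08) is a stationary distribution (π · M = π), which the distribution after n hits approaches as n grows.

```python
def build_board(n=9):
    """Transition matrix of the board: border holes stay 0.4 / move inward 0.6,
    inner holes stay 0.2 / move left 0.4 / move right 0.4."""
    m = [[0.0] * n for _ in range(n)]
    m[0][0], m[0][1] = 0.4, 0.6
    m[n - 1][n - 1], m[n - 1][n - 2] = 0.4, 0.6
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    return m

def step(v, m):
    """Row vector times matrix: the distribution after one more hit."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

M = build_board()
v = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # the ball starts in the third hole
after = [v]
for _ in range(40):
    after.append(step(after[-1], M))

print([round(x, 2) for x in after[1]])   # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in after[2]])   # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in after[40]])

pi = [0.08] + [0.12] * 7 + [0.08]
assert all(abs(a - b) < 1e-12 for a, b in zip(step(pi, M), pi))  # pi is stationary
```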
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 s
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
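A minimal sketch of learning the transition probabilities from historical data. It assumes the history is available as state sequences sampled once per tick and uses the maximum-likelihood estimate (transition counts normalised per source state); the slides do not specify the estimator. A dict-of-dicts keeps only the non-zero entries of the very sparse matrix. Names and data are hypothetical.

```python
from collections import defaultdict

def learn_transitions(sequences):
    """Estimate P(next = j | current = i) by counting observed transitions i -> j."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    matrix = {}
    for cur, row in counts.items():
        total = sum(row.values())
        matrix[cur] = {nxt: c / total for nxt, c in row.items()}  # only non-zero entries
    return matrix

history = [["s1", "s3", "s2", "s1"], ["s2", "s3", "s3", "s2"], ["s3", "s2", "s3"]]
print(learn_transitions(history))
```

Separate matrices per time of day or per object group can be learned by partitioning the history before counting.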
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: transition graph over s1, s2, s3: s1 → s3 (1.0), s2 → s1 (0.6), s2 → s3 (0.4), s3 → s2 (0.8), s3 → s3 (0.2)]
Note: there is an exponential number of possible paths the car might take.
ST-Window Queries (step by step, starting in s1 at t = 0)
[Figure: the possible paths unfolded over t = 0, 1, 2, 3 across the states s1, s2, s3]
› t = 0: (1, 0, 0)
› t = 1: (1, 0, 0) · M = (0, 0, 1)
› t = 2: (0, 0, 1) · M = (0, 0.8, 0.2)
› t = 3: (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36)
› Worlds that have not entered s1 or s2 during T: at t = 2 only the mass 0.2 in s3 remains, and at t = 3 this becomes (0, 0, 0.04). Result = 1 - 0.04 = 0.96
› The solution based on matrix multiplications introduces a new state for the winner trajectories (those that have already hit the window) and two matrices
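ST-Window Queries
› s4 is the new absorbing state for trajectories that have already hit the window; M- is applied for transitions to time points outside T, and M+ for transitions to time points inside T, where every transition into s1 or s2 is redirected to s4:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$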
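The matrix-based evaluation can be sketched in plain Python (helper names are hypothetical). s4 is the absorbing "hit" state; M- is applied for steps that end outside T and M+ for steps that end inside T = [2, 3]. The sketch assumes t_from >= 1, i.e., the initial observation itself is not tested against the window.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

# States s1, s2, s3 and the absorbing state s4 ("trajectory has hit the window").
M_MINUS = [[0.0, 0.0, 1.0, 0.0],
           [0.6, 0.0, 0.4, 0.0],
           [0.0, 0.8, 0.2, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
# Inside T, every transition into s1 or s2 (the query region) is redirected to s4.
M_PLUS = [[0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.4, 0.6],
          [0.0, 0.0, 0.2, 0.8],
          [0.0, 0.0, 0.0, 1.0]]

def window_probability(v0, t_from, t_to):
    """P(object is in the query region at some tick in [t_from, t_to])."""
    v = v0
    for t in range(1, t_to + 1):
        v = vec_mat(v, M_PLUS if t >= t_from else M_MINUS)
    return v[3]

print(round(window_probability([1.0, 0.0, 0.0, 0.0], 2, 3), 2))  # 0.96
```

The cost is one sparse vector-matrix product per tick, independent of the exponential number of possible paths.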
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with a single observation at t0 (left) and two observations at t0 and t1 (right)]
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class Si- corresponding to worlds where o is located in state si and o has not intersected the window
– One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 over t = 0, 1, 2, 3]
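Multiple Observations
› Distribution over the classes (S1-, S2-, S3-, S1+, S2+, S3+), starting in s1 at t = 0 (■: window hit, ¬■: window not hit):
– t = 0: (1, 0, 0, 0, 0, 0)
– t = 1 (M-): (0, 0, 1, 0, 0, 0)
– t = 2 (M+): (0, 0, 0.2, 0, 0.8, 0)
– t = 3 (M+): (0, 0, 0.04, 0.48, 0.16, 0.32)
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
where M is the 3 × 3 transition matrix of the example.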
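Given that o is observed in s3 at t = 3 (event O), the probability that its trajectory has hit the window (event ■) follows from Bayes' theorem:
$$P(\blacksquare \mid O) = \frac{P(O \mid \blacksquare)\, P(\blacksquare)}{P(O)} = \frac{P(\blacksquare \wedge O)}{P(\blacksquare \wedge O) + P(\neg\blacksquare \wedge O)}$$
With the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) at t = 3, P(■ ∧ O) = P(S3+) = 0.32 and P(¬■ ∧ O) = P(S3-) = 0.04, so
$$P(\blacksquare \mid O) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$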
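A sketch of the six-class computation and the final Bayes step in plain Python (names are hypothetical). Classes are indexed as state + 3 · hit-flag, i.e., S1-, S2-, S3-, S1+, S2+, S3+. M+ sends every transition into s1 or s2 to the corresponding "+" class, and "+" classes stay "+". The observation "o is in s3 at t = 3" keeps only S3- and S3+.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
WINDOW = {0, 1}          # the query region: s1 and s2
n = len(M)

def build(inside_window):
    """6x6 matrix over (state, hit-flag) classes; index = state + n * hit."""
    big = [[0.0] * (2 * n) for _ in range(2 * n)]
    for hit in (0, 1):
        for i in range(n):
            for j in range(n):
                new_hit = 1 if hit or (inside_window and j in WINDOW) else 0
                big[i + n * hit][j + n * new_hit] += M[i][j]
    return big

def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

M_MINUS, M_PLUS = build(False), build(True)   # M_MINUS is block-diagonal (M, M)
v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]            # o observed in s1 at t = 0
for t in (1, 2, 3):
    v = vec_mat(v, M_PLUS if t in (2, 3) else M_MINUS)
print([round(x, 2) for x in v])               # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]

# Bayes: P(hit | o in s3 at t = 3) = P(S3+) / (P(S3+) + P(S3-))
p_hit_given_s3 = v[2 + n] / (v[2 + n] + v[2])
print(round(p_hit_given_s3, 2))               # 0.89
```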
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
65
When Trq (the instantaneous location) has an uncertainty associated with it the previousobservations are not directly applicablehellip Example
Rmin
Rmax
TrQ
Tr1
Tr2
Tr3
Tr4
Tr5
Z1
Z2
dist(Z1Z2) lt Rmax
We can no longer ldquoprunerdquo Tr4 from consideration
The main problem now becomes how to calculate the probability of an object say Trj being within_distance Rd from Trq
NOTE in effect a quadruple-integration instead of the double-one (cf previous slide)hellip
66
1
x
y
0
2r
1r2
pdf(Tr2)
pdf(Trq) pdf(Tr1)
Example assuming a uniform pdf of possible-locations but two of the points to be used when evaluating the actual probability of Tr1 being within_distance Rd from Trq
Main ObservationLet Vi and Vq denote the 2D random variables representing the possible locations of Tri and TrqIn effect we are looking for
DefineViq = Vi ndash Vq (cross-correlation) as another random variableViq = Vi + (-Vq)which due to the independence implies that the pdf ov Viq is a convolution of the pdfs of Vi and ndashVq
67
1
x
y
0-40
+40
4r
34r2
pdf(Tr2 - Trq)
pdf(Tr1 - Trq)
Let Viq = Vi + (-Vq)
Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin at a function of t is
hyperbola since A gt 0
Now we have a collection of such distance functions (for each i)
69
The continuous aspect of the probabilistic NN queries
Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)
Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points
dist
TR1
TR2
LE12
tb t11 t12
Example
Lower envelope of two distance-trajectories
critical time-point
NOTE time-dimension on horizontal axis
70
The continuous aspect of the probabilistic NN queries
Q how to efficiently construct the lower envelope of the whole collection of distance-functions
A divide-and-conquer approach in the spirit of Merge-Sortdist
TR1
TR2
LE12
tb t11 t12
dist
TR3
TR4
LE34
tb tet31dist
TR1
TR2
LE1234
tb tet11 t12
TR3
TR4
t31
t1new t2new
Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)
Now how about the IPAC-NN tree
71
The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)
Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree
dist dist
lower envelope
TR1TR2
TR3
TR4
TR5
TR6
tb te
2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )
4r
TR7
t1 t2 t3
72
Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie
Example if pdf of selecting a possible path is uniform
Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie
73
Combine the two probabilitieshellip
Possible pathsPossible locations when t=4
a
m
Probability of falling inside segment mp1
0505 = 025 (at t = 4)
Probability of an object a falling inside some segment S at given time instant t
)(
))(Pr()Pr())(Pr(tPPp
p
a
tPLSpSta eg uniform pdf
74
ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory
representation we needndash The collection of all the possible paths (and probabilities)
ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi
to position pi+1
At the time-instant t = 4 the object can
be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )
75
Processing
rsaquo Data Structuresndash Edge hash-table
rsaquo Small in-memoryndash Movement R-tree
rsaquo 1D for each edgendash Trajectories List
rsaquo Store samples ordered by time-stamp
rsaquo Assign pointer to set of possible paths
rsaquo Earliest-arriving and Latest-departure stored for vertices
rsaquo On-disk retrieve on-demand
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
A simple example
› Initial position: bucket 3
› After the first hit: 0.4 / 0.2 / 0.4 (buckets 2 to 4)
› After the second hit: 0.16 / 0.16 / 0.36 / 0.16 / 0.16 (buckets 1 to 5)
› After the 40th hit: ≈ 0.08 / 0.12 / 0.12 / 0.12 / 0.12 / 0.12 / 0.12 / 0.12 / 0.08
93
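How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$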
94
How can we model this?
› First hit:
$$(0,0,1,0,0,0,0,0,0)\,M = (0,\ 0.4,\ 0.2,\ 0.4,\ 0,\ 0,\ 0,\ 0,\ 0)$$
› Second hit:
$$(0,\ 0.4,\ 0.2,\ 0.4,\ 0,\ 0,\ 0,\ 0,\ 0)\,M = (0.16,\ 0.16,\ 0.36,\ 0.16,\ 0.16,\ 0,\ 0,\ 0,\ 0)$$
› 40th hit:
$$(0,0,1,0,0,0,0,0,0)\,M^{40} \approx (0.08,\ 0.12,\ 0.12,\ 0.12,\ 0.12,\ 0.12,\ 0.12,\ 0.12,\ 0.08)$$
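The hits of the board can be replayed by repeated vector-matrix multiplication. Below is a plain-Python sketch that assumes no linear-algebra library; `board_matrix` and `step` are illustrative names. It rebuilds M from the interior and border probabilities and propagates the ball from bucket 3.

```python
def board_matrix(n=9):
    """Transition matrix of the board: interior buckets stay with 0.2 and move to
    each neighbour with 0.4; border buckets stay with 0.4 and move inward with 0.6."""
    M = [[0.0] * n for _ in range(n)]
    M[0][0], M[0][1] = 0.4, 0.6
    M[n - 1][n - 1], M[n - 1][n - 2] = 0.4, 0.6
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

def step(dist, M):
    """One hit: new_dist[j] = sum_i dist[i] * M[i][j] (row vector times matrix)."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]

M = board_matrix()
start = [0.0] * 9
start[2] = 1.0  # the ball starts in bucket 3
history = [start]
for _ in range(1000):
    history.append(step(history[-1], M))
```

`history[1]` and `history[2]` reproduce the first two vectors above. The vector shown for the 40th hit coincides with the stationary distribution (0.08, 0.12, …, 0.12, 0.08). Detailed balance confirms this: 0.08 · 0.6 = 0.12 · 0.4. The iteration approaches that distribution as the number of hits grows, and `history[1000]` is numerically indistinguishable from it.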
95
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 s
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
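One simple way to learn transition probabilities from historical data is frequency counting, which gives the maximum-likelihood estimate. The slides do not specify which estimator is used. This sketch makes two assumptions:
- The historical trajectories have already been mapped to discrete states per tick.
- The state names and `estimate_transitions` are hypothetical.

Storing only the observed transitions keeps the matrix sparse.

```python
from collections import defaultdict

def estimate_transitions(trajectories):
    """Maximum-likelihood estimate of a sparse transition matrix: count the
    observed moves src -> dst and normalize each row. Only observed
    transitions are stored (dict of dicts = sparse matrix)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in trajectories:
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    M = {}
    for src, row in counts.items():
        total = sum(row.values())
        M[src] = {dst: c / total for dst, c in row.items()}
    return M

# Hypothetical historical state sequences, one state per tick.
history = [["A", "B", "A", "B"], ["A", "A"]]
M = estimate_transitions(history)
```

Estimating one such matrix per time period or per object group is one possible way to reflect that the transition matrix changes over time and between groups.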
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: transition graph over the states s1, s2, s3]
Note: there is an exponential number of possible paths the car might take.
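The window query can be answered by brute force under possible-worlds semantics, which shows why the number of paths is a problem:
1. Enumerate every state sequence.
2. Weight each sequence by its probability under M.
3. Add up the sequences that visit s1 or s2 at t = 2 or t = 3.

States are indexed 0, 1, 2 for s1, s2, s3, and the function name is illustrative. The loop visits |S|^horizon sequences, i.e., exponentially many in the horizon.

```python
from itertools import product

M = [[0.0, 0.0, 1.0],   # s1 -> s3
     [0.6, 0.0, 0.4],   # s2 -> s1, s3
     [0.0, 0.8, 0.2]]   # s3 -> s2, s3

def window_prob_bruteforce(M, start, window_states, T, horizon):
    """Enumerate all state sequences of length horizon + 1 (possible worlds),
    weight each by its Markov-chain probability, and sum the weights of the
    sequences that are in window_states at some time in T."""
    n = len(M)
    total = 0.0
    for path in product(range(n), repeat=horizon):
        states = (start,) + path
        p = 1.0
        for a, b in zip(states, states[1:]):
            p *= M[a][b]
            if p == 0.0:
                break  # impossible world, skip the remaining factors
        if p > 0.0 and any(states[t] in window_states for t in T):
            total += p
    return total

p = window_prob_bruteforce(M, start=0, window_states={0, 1}, T=[2, 3], horizon=3)
```

For this tiny chain the result is 0.96, up to floating-point rounding. Only 4 of the 27 enumerated sequences have non-zero probability.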
98
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]
t = 0: (1, 0, 0)
99
ST-Window Queries
[Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]
t = 0: (1, 0, 0)
t = 1: (1, 0, 0) · M = (0, 0, 1)
100
ST-Window Queries
[Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]
t = 1: (0, 0, 1)
t = 2: (0, 0, 1) · M = (0, 0.8, 0.2)
101
ST-Window Queries
[Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36)
102
ST-Window Queries
[Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2); the mass 0.8 in s2 has already hit the window, so only the 0.2 in s3 is propagated further
t = 3: (0, 0, 0.2) · M = (0, 0.16, 0.04)
Result = 0.8 + 0.16 = 0.96
› A solution based purely on matrix multiplications introduces a new state for the "winner" trajectories (those that have already hit the window) and two matrices
103
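ST-Window Queries
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
States: s1, s2, s3, and the new state s4 (window already hit).
$$(1,0,0,0) \xrightarrow{M^-} (0,0,1,0) \xrightarrow{M^+} (0,0,0.2,0.8) \xrightarrow{M^+} (0,0,0.04,0.96)$$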
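Here is a sketch of the absorbing-state construction. It assumes that a step uses M+ exactly when the time it arrives at lies in T, which reproduces the vectors on the slide. `augment` and `window_query` are illustrative names, not an API from the cited work.

```python
# Transition matrix from the slides (rows: from s1, s2, s3; columns: to s1, s2, s3).
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def vec_mat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]

def augment(M, window_states):
    """Append an absorbing 'hit' state with index n.
    M_minus keeps the original moves (for steps ending outside T);
    M_plus redirects every move into a window state to the hit state
    (for steps ending at a time in T)."""
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    M_minus = [row[:] + [0.0] for row in M] + [absorbing]
    M_plus = []
    for row in M:
        new_row = [0.0] * (n + 1)
        for j, p in enumerate(row):
            new_row[n if j in window_states else j] += p
        M_plus.append(new_row)
    M_plus.append(absorbing[:])
    return M_minus, M_plus

def window_query(M, start, window_states, T, horizon):
    """P(object is in window_states at some time in T) and the vector trace."""
    M_minus, M_plus = augment(M, window_states)
    n = len(M)
    v = [0.0] * (n + 1)
    v[n if (0 in T and start in window_states) else start] = 1.0
    trace = [v]
    for t in range(1, horizon + 1):
        v = vec_mat(v, M_plus if t in T else M_minus)
        trace.append(v)
    return v[n], trace

p_hit, trace = window_query(M, start=0, window_states={0, 1}, T={2, 3}, horizon=3)
```

`trace` contains (1,0,0,0), (0,0,1,0), (0,0,0.2,0.8) and (0,0,0.04,0.96), matching the slide up to floating-point rounding. Each step costs one vector-matrix product, independent of the number of paths.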
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0, and with two observations at t0 and t1]
105
Multiple Observations
› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3]
106
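Multiple Observations
[Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3, split into the classes S1−, S2−, S3− (window not hit, ¬■) and S1+, S2+, S3+ (window hit, ■)]
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
With the states ordered (S1−, S2−, S3−, S1+, S2+, S3+):
$$(1,0,0,0,0,0) \xrightarrow{M^-} (0,0,1,0,0,0) \xrightarrow{M^+} (0,0,0.2,0,0.8,0) \xrightarrow{M^+} (0,0,0.04,0.48,0.16,0.32)$$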
107
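Bayes' Theorem
› Now, what is the probability that the trajectory passes through the query window, given that the object was seen in s3 at t = 3? Let ■ denote "the trajectory intersects the window" and O denote "the object is observed in s3 at t = 3":
$$P(\blacksquare \mid O) = \frac{P(O \mid \blacksquare)\,P(\blacksquare)}{P(O)} = \frac{P(\blacksquare \wedge O)}{P(O)} = \frac{P(O \wedge \blacksquare)}{P(O \wedge \blacksquare) + P(O \wedge \neg\blacksquare)}$$
With the t = 3 distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1−, S2−, S3−, S1+, S2+, S3+):
$$P(\blacksquare \mid O) = \frac{P(S_3^+)}{P(S_3^+) + P(S_3^-)} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$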
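The same idea with the 2|S| classes and the Bayes step, as a sketch:
- Indices 0 to n − 1 stand for S1− … Sn−, and indices n to 2n − 1 stand for S1+ … Sn+.
- The observation is assumed to be the state at the last time step.
- The function names are illustrative.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def vec_mat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]

def split_matrices(M, window_states):
    """M_minus = diag(M, M). M_plus sends Si- to Sj+ when s_j is a window
    state (otherwise to Sj-) and keeps Si+ worlds in the '+' half."""
    n = len(M)
    M_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    M_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            M_minus[i][j] = p
            M_minus[n + i][n + j] = p
            M_plus[i][n + j if j in window_states else j] += p
            M_plus[n + i][n + j] += p
    return M_minus, M_plus

def propagate(M, start, window_states, T, horizon):
    """Distribution over the 2n classes after `horizon` steps."""
    n = len(M)
    M_minus, M_plus = split_matrices(M, window_states)
    v = [0.0] * (2 * n)
    v[n + start if (0 in T and start in window_states) else start] = 1.0
    for t in range(1, horizon + 1):
        v = vec_mat(v, M_plus if t in T else M_minus)
    return v

def hit_given_observation(v, observed_state, n):
    """Bayes: P(hit | o in s_k) = P(Sk+) / (P(Sk+) + P(Sk-))."""
    hit, miss = v[n + observed_state], v[observed_state]
    return hit / (hit + miss)

v3 = propagate(M, start=0, window_states={0, 1}, T={2, 3}, horizon=3)
posterior = hit_given_observation(v3, observed_state=2, n=3)  # observed in s3
```

`v3` matches the t = 3 vector on the slide, and `posterior` is 0.32 / 0.36 ≈ 0.89.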
108
Summary
› Pros
  – Allows answering time-parametrized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most x% of the possible trajectories of o may intersect Q)
[Figure: R-tree over time and location, with leaf entry Poid]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate the result probability by the ratio of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013).
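Here is a minimal Monte-Carlo sketch for the window query from the earlier slides. It draws trajectories from the Markov chain and counts the fraction that hit the window. This is plain forward sampling from a single start observation. It does not implement the adaptation of transition matrices needed to make samples conform with later observations. The sample count and seed are arbitrary, and the names are illustrative.

```python
import random

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def sample_trajectory(M, start, horizon, rng):
    """Draw one possible world: a state sequence of length horizon + 1."""
    states = [start]
    for _ in range(horizon):
        row = M[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states

def mc_window_prob(M, start, window_states, T, horizon, n_samples=20000, seed=0):
    """Fraction of sampled worlds that are in window_states at some time in T."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        traj = sample_trajectory(M, start, horizon, rng)
        if any(traj[t] in window_states for t in T):
            hits += 1
    return hits / n_samples

estimate = mc_window_prob(M, start=0, window_states={0, 1}, T=[2, 3], horizon=3)
```

The estimate should land close to the exact 0.96. Its standard error is roughly sqrt(0.96 · 0.04 / 20000) ≈ 0.0014.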
112
Event Queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of sub-events. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715–728.
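Below is a simplified sketch of evaluating a sequence-of-sub-events query over a Markov chain of locations. It is not Lahar's language or evaluation algorithm.
- The event "Joe got coffee" is treated as the pattern office → coffee room → office, matched as a subsequence of Joe's visited locations.
- The chain is paired with a counter of matched sub-events. This is the same product-state idea as the S−/S+ classes.
- The two-location chain and its probabilities are hypothetical.

```python
def event_query_prob(M, start, pattern, horizon):
    """Probability that the locations visited at t = 0..horizon contain
    `pattern` as a subsequence. Joint state = (location, number of pattern
    events matched so far); greedy matching is exact for subsequences."""
    k = len(pattern)

    def advance(matched, loc):
        return matched + 1 if matched < k and pattern[matched] == loc else matched

    dist = {(start, advance(0, start)): 1.0}
    for _ in range(horizon):
        nxt = {}
        for (loc, matched), p in dist.items():
            for dst, q in enumerate(M[loc]):
                if q > 0.0:
                    key = (dst, advance(matched, dst))
                    nxt[key] = nxt.get(key, 0.0) + p * q
        dist = nxt
    return sum(p for (_, matched), p in dist.items() if matched == k)

OFFICE, COFFEE = 0, 1
M = [[0.5, 0.5],   # office -> office / coffee room
     [1.0, 0.0]]   # coffee room -> office
p = event_query_prob(M, start=OFFICE, pattern=[OFFICE, COFFEE, OFFICE], horizon=2)
```

With two steps, only the world office → coffee room → office satisfies the event, so `p` is 0.5.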
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3): 205–216, 2013.
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity search on uncertain spatio-temporal data. SISAP 2013, pp. 43–49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013, pp. 170–181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-nearest neighbor queries on uncertain moving object trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE, 16(9): 1112–1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009, pp. 435–443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008, pp. 715–728.
66
[Figure: uniform pdfs pdf(Trq), pdf(Tr1), pdf(Tr2) over disks of radius r in the x–y plane]
Example, assuming a uniform pdf of possible locations: two of the points to be used when evaluating the actual probability of Tr1 being within distance Rd of Trq.
Main Observation
Let Vi and Vq denote the 2D random variables representing the possible locations of Tri and Trq. In effect, we are looking for $P(\lVert V_i - V_q \rVert \le R_d)$.
Define Viq = Vi − Vq (cross-correlation) as another random variable, Viq = Vi + (−Vq). Due to independence, the pdf of Viq is the convolution of the pdfs of Vi and −Vq.
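Here is a Monte-Carlo sketch of the main observation, assuming independent uniform disks of radius r around the expected locations. Every sample of Vi − Vq is a draw from the convolution of pdf(Vi) and pdf(−Vq). The fraction of samples with ‖Vi − Vq‖ ≤ Rd therefore estimates the desired probability. Function names, centers and sample sizes are illustrative.

```python
import math
import random

def sample_disk(center, r, rng):
    """Uniform sample from a disk (the sqrt makes the area density uniform)."""
    rho = r * math.sqrt(rng.random())
    phi = 2.0 * math.pi * rng.random()
    return center[0] + rho * math.cos(phi), center[1] + rho * math.sin(phi)

def prob_within(c_i, c_q, r, rd, n=20000, seed=1):
    """Estimate P(||V_i - V_q|| <= Rd) for independent uniform disks of radius r.
    Each difference V_i - V_q is a sample of the convolution pdf(V_i) * pdf(-V_q)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        xi, yi = sample_disk(c_i, r, rng)
        xq, yq = sample_disk(c_q, r, rng)
        if math.hypot(xi - xq, yi - yq) <= rd:
            hits += 1
    return hits / n
```

Vi − Vq always lies within 2r of the difference of the centers. The estimate is therefore exactly 0 or 1 when Rd is far enough below or above that distance. In between, it is a genuine fraction.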
67
[Figure: pdf(Tr1 − Trq) and pdf(Tr2 − Trq) in the x–y plane, each with a support of diameter 4r]
Let Viq = Vi + (−Vq).
Property 1: IF pdf(Vi) has a centroid Ci coinciding with E(Vi) AND pdf(Vq) has a centroid Cq coinciding with E(Vq), THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (−Cq) coinciding with E(Viq).
Property 2: Assume that pdf(Vi) and pdf(Vq) are rotationally symmetric around their centroids, with respect to the vertical axis (the value of the pdfs). THEN pdf(Viq) is also rotationally symmetric around its centroid Ciq.
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi – vxq and Vyiq = vyi – vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq.
Then the distance of the expected location along TRiq from the origin, as a function of t, is a hyperbola, since A > 0.
Now we have a collection of such distance functions (one for each i).
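Below is a worked derivation of that hyperbola, given as a reconstruction under assumptions. The expected relative location along TRiq is assumed to start at (x_iq, y_iq) at t = 0 and to move with the constant relative velocity (Vx_iq, Vy_iq). The symbols x_iq, y_iq, B and C are introduced here for illustration.

$$
d_{iq}(t) = \sqrt{\left(x_{iq} + V^{x}_{iq}\,t\right)^2 + \left(y_{iq} + V^{y}_{iq}\,t\right)^2} = \sqrt{A t^2 + B t + C},
$$
$$
A = \left(V^{x}_{iq}\right)^2 + \left(V^{y}_{iq}\right)^2, \qquad B = 2\left(x_{iq} V^{x}_{iq} + y_{iq} V^{y}_{iq}\right), \qquad C = x_{iq}^2 + y_{iq}^2 .
$$

Completing the square gives

$$
d_{iq}^2 - A\left(t + \frac{B}{2A}\right)^2 = C - \frac{B^2}{4A} \ \ge 0 ,
$$

which, for A > 0, is (the upper branch of) a hyperbola in the (t, d) plane. It degenerates into two lines only when the relative trajectory passes through the origin.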
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions Sdf = {d1q(t), d2q(t), …, dNq(t)} (NOTE: excluding dqq(t) ≡ 0).
Observation: two distance functions diq(t) and djq(t) can intersect at most twice throughout the time interval of interest for the query, [tb, te] (NOTE: set diq(t) = djq(t)). Call such pairwise intersections critical time-points.
[Figure (Example): lower envelope LE12 of the two distance trajectories TR1 and TR2, starting at tb, with critical time-points t11 and t12; time on the horizontal axis, distance (dist) on the vertical axis]
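The following derivation shows why the functions intersect at most twice. It assumes each distance function has the form d_kq(t) = sqrt(A_k t² + B_k t + C_k); this parametrization is stated here for the derivation. Both distances are non-negative, so they are equal exactly when their squares are equal:

$$
d_{iq}(t) = d_{jq}(t) \iff (A_i - A_j)\,t^2 + (B_i - B_j)\,t + (C_i - C_j) = 0 .
$$

Unless the two functions are identical, the left-hand side is a non-zero polynomial of degree at most two. It therefore has at most two real roots in [t_b, t_e], and these roots are the critical time-points.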
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: LE12 of TR1 and TR2 (critical time-points t11, t12) and LE34 of TR3 and TR4 (critical time-point t31) are merged into LE1234 over [tb, te], introducing the new critical time-points t1new and t2new]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories).
Now, how about the IPAC-NN tree?
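Here is a sketch of the divide-and-conquer construction. It assumes each distance function has the form d(t) = sqrt(A t² + B t + C) and is passed as the triple (A, B, C).
- Squared distances are compared, which is equivalent because distances are non-negative.
- An envelope is a list of (t_start, t_end, index) pieces.
- Merging two envelopes splits every elementary interval at the critical time-points of the two functions involved.

This simple merge sorts the breakpoints and does not try to match the O(N log N) bound exactly. The names are illustrative.

```python
import math

def sq_dist(f, t):
    A, B, C = f
    return A * t * t + B * t + C

def crossings(f, g, a, b, eps=1e-12):
    """Critical time-points of f and g strictly inside (a, b): roots of f - g."""
    A, B, C = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(A) < eps:
        roots = [] if abs(B) < eps else [-C / B]
    else:
        disc = B * B - 4 * A * C
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-B - s) / (2 * A), (-B + s) / (2 * A)]
    return sorted(r for r in roots if a < r < b)

def merge(e1, e2, funcs):
    """Merge two envelopes covering the same interval into their lower envelope."""
    cuts = sorted({t for seg in e1 + e2 for t in seg[:2]})
    out, p1, p2 = [], 0, 0
    for a, b in zip(cuts, cuts[1:]):
        while e1[p1][1] <= a:
            p1 += 1
        while e2[p2][1] <= a:
            p2 += 1
        i, j = e1[p1][2], e2[p2][2]
        pts = [a] + crossings(funcs[i], funcs[j], a, b) + [b]
        for s, e in zip(pts, pts[1:]):
            m = 0.5 * (s + e)  # the winner cannot change between crossings
            k = i if sq_dist(funcs[i], m) <= sq_dist(funcs[j], m) else j
            if out and out[-1][2] == k:
                out[-1] = (out[-1][0], e, k)  # extend the previous piece
            else:
                out.append((s, e, k))
    return out

def lower_envelope(funcs, tb, te, idx=None):
    """Divide and conquer in the spirit of merge sort."""
    if idx is None:
        idx = list(range(len(funcs)))
    if len(idx) == 1:
        return [(tb, te, idx[0])]
    mid = len(idx) // 2
    return merge(lower_envelope(funcs, tb, te, idx[:mid]),
                 lower_envelope(funcs, tb, te, idx[mid:]), funcs)

funcs = [(0.0, 0.0, 1.0), (1.0, -4.0, 4.0), (0.0, 0.0, 0.25)]  # hypothetical
envelope = lower_envelope(funcs, 0.0, 4.0)
```

In this toy input, function 1 (distance |t − 2|) dips below the constant distances 1 and 0.5 around t = 2. The envelope therefore switches pieces at the critical time-points 1.5 and 2.5.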
71
The lower envelope provides a continuous pruning criterion: any trajectory whose distance function lies more than 4r above the lower envelope (r = radius of the uncertainty disk) has zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance-functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR1 to TR7 over [tb, te] with the time-points t1, t2, t3; the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree), and a band of width 4r above the lower envelope]
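Here is a sketch of the 4r pruning test at a single time instant t. It assumes uncertainty disks of radius r and distance functions given as (A, B, C) with d(t) = sqrt(A t² + B t + C).

The reasoning behind 4r: the actual relative location of Tri with respect to Trq lies within 2r of its expected location. Tri can therefore only be closer than the object on the lower envelope if di(t) − 2r ≤ dmin(t) + 2r. The function name is illustrative.

```python
import math

def dist(f, t):
    A, B, C = f
    return math.sqrt(max(A * t * t + B * t + C, 0.0))  # clamp rounding noise

def nn_candidates(funcs, t, r):
    """Indices of trajectories that may still be the NN of Trq at time t:
    expected distance at most 4r above the lower envelope."""
    d = [dist(f, t) for f in funcs]
    best = min(d)
    return [i for i, di in enumerate(d) if di <= best + 4 * r]

funcs = [(0.0, 0.0, 1.0), (0.0, 0.0, 9.0), (0.0, 0.0, 36.0)]  # distances 1, 3, 6
candidates = nn_candidates(funcs, t=0.0, r=0.5)
```

With r = 0.5 the band has width 2. The object at expected distance 6 is pruned, while the one at distance 3 is kept.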
72
Given two trajectory samples (ti, pi) and (ti+1, pi+1) of a moving object a on a road network, the set of possible paths PPi between ti and ti+1 consists of all the paths along the routes (sequences of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 − ti.
Example: the pdf of selecting a possible path may be uniform.
Given a path Pj ∈ PPi(a), the Possible Locations of a moving object a with respect to Pj at t ∈ [ti, ti+1] are the set of all positions p ∈ Pj from which a can reach pi (respectively pi+1) within the time period t − ti (respectively ti+1 − t).
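Here is a sketch of the two definitions under a simplifying assumption that does not come from the slides. A single maximum speed vmax applies along every path, so positions can be measured as the distance x from pi along the path. Path names, minimum travel times, lengths and speeds are hypothetical. The uniform pdf over possible paths follows the slide's example.

```python
def possible_paths(candidates, t_i, t_next):
    """candidates: (path_name, minimum_travel_time) for routes connecting p_i and
    p_{i+1}. Keeps the routes whose minimum time cost is <= t_next - t_i and gives
    them a uniform probability."""
    feasible = [name for name, min_time in candidates if min_time <= t_next - t_i]
    if not feasible:
        return {}
    return {name: 1.0 / len(feasible) for name in feasible}

def possible_locations(length, v_max, t_i, t_next, t):
    """Possible locations on one path at time t, as an interval of distance x
    from p_i: p_i must be reachable within t - t_i, and p_{i+1} (at distance
    `length`) within t_next - t, at speed at most v_max."""
    lo = max(0.0, length - v_max * (t_next - t))
    hi = min(length, v_max * (t - t_i))
    return (lo, hi) if lo <= hi else None

paths = possible_paths([("v1-v5", 6.0), ("v1-v3-v4", 7.5), ("detour", 12.0)],
                       t_i=0.0, t_next=8.0)
pl_at_4 = possible_locations(length=6.0, v_max=1.0, t_i=0.0, t_next=8.0, t=4.0)
```

Here the detour is too slow to be a possible path. At t = 4 the object can be anywhere between 2 and 4 units along the 6-unit path.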
73
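Combine the two probabilities…
[Figure: possible paths of object a, and its possible locations when t = 4, with point m]
Probability of falling inside segment m–p1 at t = 4: 0.5 · 0.5 = 0.25
Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr\big(a(t) \in S\big) = \sum_{p \in PP(t)} \Pr(p)\cdot\Pr\big(a(t) \in S \mid p\big),$$
where Pr(a(t) ∈ S | p) follows from the pdf over the possible locations PL_p(t), e.g., a uniform pdf.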
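Here is a sketch of combining the two probabilities. It assumes, as in the example, a uniform pdf over each possible-locations interval. Both PL_p(t) and the part of S on path p are given as intervals of distance along p. The path names and numbers are hypothetical, chosen to mirror the 0.5 · 0.5 = 0.25 example.

```python
def prob_in_segment(paths, segment):
    """Pr(a(t) in S) = sum over possible paths p of Pr(p) * Pr(a(t) in S | p).
    paths: name -> (Pr(p), (lo, hi)) with (lo, hi) = PL_p(t) on that path.
    segment: name -> (s_lo, s_hi), the part of S lying on that path.
    The location is assumed uniform over PL_p(t); degenerate (single-point)
    intervals are skipped in this sketch."""
    total = 0.0
    for name, (p_path, (lo, hi)) in paths.items():
        if name not in segment or hi <= lo:
            continue
        s_lo, s_hi = segment[name]
        overlap = max(0.0, min(hi, s_hi) - max(lo, s_lo))
        total += p_path * overlap / (hi - lo)
    return total

paths = {"via v5": (0.5, (2.0, 4.0)), "via v3": (0.5, (1.0, 3.0))}
segment = {"via v5": (3.0, 4.0)}  # S covers half of PL on the first path only
p = prob_in_segment(paths, segment)
```

The result is 0.5 (path probability) × 0.5 (share of the possible locations inside S) = 0.25.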
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – the collection of all the possible paths (and their probabilities)
  – a probabilistic location function
Example: observe the two possible sequences of going from position pi to position pi+1. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
› Data structures
  – Edge hash table
    › Small, in memory
  – Movement R-tree
    › 1D, one for each edge
  – Trajectories list
    › Stores samples ordered by time stamp
    › Assigns a pointer to the set of possible paths
    › Earliest-arrival and latest-departure times stored for vertices
    › On disk, retrieved on demand
76
› Network expansion
  – Expansion tree: a tree rooted in q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search for leaf entries whose maximum time interval intersects tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: road network with vertices A to E around q; earliest/latest times of arrival; a possible path that is definitely not part of the answer to the range query; the expansion tree "bubbling" along road-network segments]
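Here is a sketch of the network-expansion and filtering steps, with these simplifying assumptions:
- The road network is an adjacency dict with edge lengths, and q is a vertex.
- The expansion tree is computed with a Dijkstra search bounded by r. An edge is kept if its nearer endpoint is within distance r of q.
- A plain dict of (t_start, t_end, object_id) entries per edge stands in for the per-edge 1D movement R-trees.

All names and data are hypothetical.

```python
import heapq

def expansion_tree(graph, q, r):
    """Bounded Dijkstra from vertex q. Every vertex settled with network
    distance < r contributes all of its incident edges."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            edges.add(tuple(sorted((u, v))))
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return edges

def filter_candidates(movement_index, edges, tq):
    """Objects with an entry on an expansion-tree edge whose time interval
    intersects the query interval tq = (t0, t1)."""
    t0, t1 = tq
    return {oid for e in edges for (s, f, oid) in movement_index.get(e, [])
            if s <= t1 and t0 <= f}

graph = {"A": [("B", 2.0), ("C", 5.0)], "B": [("A", 2.0), ("D", 4.0)],
         "C": [("A", 5.0), ("E", 1.0)], "D": [("B", 4.0)], "E": [("C", 1.0)]}
edges = expansion_tree(graph, "A", r=3.0)
movement_index = {("A", "B"): [(0.0, 5.0, "o1")], ("B", "D"): [(10.0, 12.0, "o2")],
                  ("C", "E"): [(0.0, 20.0, "o3")]}
candidates = filter_candidates(movement_index, edges, tq=(3.0, 8.0))
```

Only `o1` survives filtering. `o2` is on a reachable edge but outside tq, and `o3` is on an edge that is too far from q. The refinement step would then compute the qualification probability for `o1` only.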
77
› Temporal-continuous query
  – Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
  – It is easy to prove that the actual QP is monotonic in between consecutive critical time instants
  – The QPs at the critical time instants therefore define an envelope function for the actual QPs
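The QP is monotonic between consecutive critical time instants, so the QPs computed at those instants bound the QP at any time in between. Below is a small sketch of that envelope lookup. The function name, time instants and QP values are made up.

```python
from bisect import bisect_right

def qp_bounds(critical_times, qps, t):
    """Lower/upper bound on the qualification probability at time t, given the
    QPs at sorted critical time instants; relies on the QP being monotonic
    between consecutive critical instants."""
    if not critical_times[0] <= t <= critical_times[-1]:
        raise ValueError("t lies outside the critical time instants")
    k = bisect_right(critical_times, t) - 1
    if critical_times[k] == t:
        return qps[k], qps[k]  # exactly at a critical instant: QP is known
    a, b = qps[k], qps[k + 1]
    return min(a, b), max(a, b)

critical_times = [0.0, 2.0, 5.0]  # made-up earliest-arrival / latest-departure instants
qps = [0.1, 0.6, 0.3]             # made-up QPs at those instants
```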
78
Uncertainty – the flip side
› Data reduction: why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink. "Speeding up the Douglas-Peucker Line-Simplification Algorithm." SSHD 1997.
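For data reduction, the classic recursive Douglas-Peucker algorithm simplifies a polyline so that every dropped point stays within an acceptable error eps of the simplified line. The sketch below is the textbook version, measuring distance to the segment. It is not the faster variant from the cited paper. Names and sample points are illustrative.

```python
import math

def point_segment_dist(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    u = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))  # clamp the projection onto the segment
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))

def douglas_peucker(points, eps):
    """Keep the endpoints; find the interior point farthest from the segment
    between them; if it is farther than eps, keep it and recurse on both halves."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_dist(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [a, b]
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right  # the split point appears in both halves

track = [(0, 0), (1, 0), (2, 3), (3, 0), (4, 0)]
simplified = douglas_peucker(track, 1.0)
```

A larger eps transmits fewer points at the cost of a larger positional uncertainty. This is exactly the trade-off the slide points at.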
79
Uncertainty – the other flip side
› What are the important properties that should not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – Is it not just a "Z" axis…? One cannot travel back in time
  – How long may the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
Stochastic methods for uncertain spatial data
  – Probabilistic results: yes
  – Time dimension: no
vs.
Geometric models for uncertain spatio-temporal data
  – Probabilistic results: no
  – Time dimension: yes
82
Merge the approaches
› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar, "Querying imprecise data in moving object environments," IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, "Probabilistic Similarity Search for Uncertain Time Series," SSDBM 2009, pp. 435–443.
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
67
1
x
y
0-40
+40
4r
34r2
pdf(Tr2 - Trq)
pdf(Tr1 - Trq)
Let Viq = Vi + (-Vq)
Property 1If pdf(Vi) has a centroid Ci coinciding with E(Vi)AND pdf(Vq) has a centroid Cq coinciding with E(Vq)
THEN their convolution pdf(Viq) has a centroid Ciq = Ci + (-Cq) coinciding with E(Viq)
Property 2Assume that pdf(Vi) and pdf(Vq) have rotational symmetry around their centroids and with respect to the vertical axis (the value of the pdfrsquos)THEN pdf(Viq) is also rotationally-symmetric around its centroid Ciq
68
The continuous aspect of the probabilistic NN queries
Let Vxiq = vxi ndash vxq and Vyiq = vyi ndash vyq denote the corresponding components of the velocity of the object whose expected location is along TRiq
Then the distance of the expected location along TRiq from the origin at a function of t is
hyperbola since A gt 0
Now we have a collection of such distance functions (for each i)
69
The continuous aspect of the probabilistic NN queries
Consequence The problem of constructing the IPAC-NN tree can be reduced to the problemof finding the collection of ranked lower envelopes for a set of distance-functionsSdf = d1q(t) d2q(t)hellipdNq(t) (NOTE excluding dqq(t) 0)
Observation Two distance functions diq(t) and djq(t) throughout the time-interval of interest for the query [tbte] can intersect at most twice (NOTE set diq(t) = djq(t))Call such pair-wise intersections critical time-points
dist
TR1
TR2
LE12
tb t11 t12
Example
Lower envelope of two distance-trajectories
critical time-point
NOTE time-dimension on horizontal axis
70
The continuous aspect of the probabilistic NN queries
Q how to efficiently construct the lower envelope of the whole collection of distance-functions
A divide-and-conquer approach in the spirit of Merge-Sortdist
TR1
TR2
LE12
tb t11 t12
dist
TR3
TR4
LE34
tb tet31dist
TR1
TR2
LE1234
tb tet11 t12
TR3
TR4
t31
t1new t2new
Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)
Now how about the IPAC-NN tree
71
The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)
Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree
dist dist
lower envelope
TR1TR2
TR3
TR4
TR5
TR6
tb te
2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )
4r
TR7
t1 t2 t3
72
Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie
Example if pdf of selecting a possible path is uniform
Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie
73
Combine the two probabilitieshellip
Possible pathsPossible locations when t=4
a
m
Probability of falling inside segment mp1
0505 = 025 (at t = 4)
Probability of an object a falling inside some segment S at given time instant t
)(
))(Pr()Pr())(Pr(tPPp
p
a
tPLSpSta eg uniform pdf
74
ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory
representation we needndash The collection of all the possible paths (and probabilities)
ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi
to position pi+1
At the time-instant t = 4 the object can
be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )
75
Processing
› Data structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, one for each edge
  – Trajectories list
    › Store samples ordered by time-stamp
    › Assign a pointer to the set of possible paths
    › Earliest-arrival and latest-departure times stored for vertices
    › On-disk, retrieved on demand
76
› Network expansion
  – Expansion tree: a tree rooted at q that contains the edges whose network distance to q is smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search for leaf entries whose maximum time interval intersects t_q
  – The objects pointed to by those leaf entries are the candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" along road-network segments from q (vertices A to E), annotated with earliest/latest arrival times; one possible path is shown that is definitely not part of the answer to the range query]
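A simplified sketch of the expansion and filtering steps, assuming a directed graph stored as a dict of edge lengths (both directions would be stored for two-way streets) and replacing each per-edge movement R-tree by a plain list of (t_start, t_end, object_id) entries. All names are hypothetical, and refinement (computing the qualification probabilities) is not shown.

```python
import heapq


def expansion_edges(graph, q, r):
    """Edges of the expansion tree: edges whose network distance to q is < r.

    graph: dict vertex -> {neighbour: edge length}; q is assumed to be a vertex.
    Bounded Dijkstra: an edge (u, v) is kept when its start vertex u is closer than r to q.
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                          # stale heap entry
        for v, w in graph.get(u, {}).items():
            edges.add((u, v))                 # u is within r, so part of (u, v) is too
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return edges


def filter_candidates(edges, movement_index, tq):
    """Objects with a leaf entry on an expanded edge whose time interval intersects tq.

    movement_index: dict edge -> list of (t_start, t_end, object_id), a stand-in for
    the per-edge movement R-tree.
    """
    candidates = set()
    for e in edges:
        for t_s, t_e, obj in movement_index.get(e, []):
            if t_s <= tq[1] and tq[0] <= t_e:
                candidates.add(obj)
    return candidates
```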
77
Temporal-continuous query
  – Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
  – It is easy to prove that the actual QP is monotonic between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs
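If the QP is monotonic between consecutive critical time instants, its value at any t inside such a gap lies between the QPs computed at the two neighbouring instants. A minimal sketch of that bound; the function name and the example values are hypothetical.

```python
import bisect


def qp_bounds(crit_times, crit_qps, t):
    """Lower/upper bound on the QP at time t from the QPs at the critical instants.

    crit_times must be sorted; crit_qps[k] is the QP computed at crit_times[k].
    Relies on the QP being monotonic between consecutive critical instants.
    """
    k = bisect.bisect_right(crit_times, t)
    if k == 0 or (k == len(crit_times) and t > crit_times[-1]):
        raise ValueError("t lies outside the critical time instants")
    if crit_times[k - 1] == t:               # t is itself a critical instant
        return crit_qps[k - 1], crit_qps[k - 1]
    a, b = crit_qps[k - 1], crit_qps[k]      # monotonic in between: value lies between a and b
    return min(a, b), max(a, b)
```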
78
Uncertainty: the flip side
› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
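The classic recursive Douglas-Peucker simplification (not the accelerated variant of Hershberger and Snoeyink) is sketched below as one way to realise "define an acceptable error, save on data size": every dropped point lies within `eps` of the simplified segment that replaces it. Points are (x, y) tuples; applying it to (x, y, t) trajectories would need a distance that also accounts for time, which is not shown here.

```python
import math


def perp_dist(p, a, b):
    """Distance from point p to the line through a and b (or to a if a == b)."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points, eps):
    """Keep the endpoints; recurse on the farthest point if it is more than eps away."""
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perp_dist(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:                           # all interior points are close enough: drop them
        return [points[0], points[-1]]
    left = douglas_peucker(points[:idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right                  # points[idx] appears in both halves
```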
79
Uncertainty: the other flip side
› Which important properties must not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just another "Z" axis... one cannot travel back in time
  – How long may the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far...
› Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
› Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
82
Merge the approaches
› Idea
  – Discretize time
  – For each point in time, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest-neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443
83
A highway example...
[Figure: position (pos) of a car over time (t) with a query window Q]
What is the probability that the car is in some area at least once during some time interval?
84
A highway example...
[Figure: position (pos) over time (t) with query window Q and the speed bounds vmin, vmax]
What is the probability that the car is in some area at least once during some time interval?
85
A highway example...
[Figure: position (pos) over time (t) with the speed bounds vmin, vmax and query window Q]
Assuming a uniform distribution...
86
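A highway example...
[Figure: position (pos) over time (t) with query window Q; the car is inside Q with probability 50% and 30%, respectively, at two points in time]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65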
87
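A highway example...
[Figure: position (pos) over time (t) with query window Q; the car is inside Q with probability 50% and 30%, respectively, at two points in time]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Violation of the speed constraints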
88
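A highway example...
[Figure: position (pos) over time (t) with query window Q; the car is inside Q with probability 50% and 30%, respectively, at two points in time]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Consideration of dependency: 0.5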
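A toy model (ours, not the slides') that reproduces both numbers: the car starts at position 0 at t = 0 and drives at a constant but unknown speed v, uniformly distributed in [vmin, vmax], so every possible world is one value of v. The values vmin = 0, vmax = 10, window [0, 15] and time instants 3 and 5 are made up so that the per-instant probabilities are 50% and 30%. Evaluating both instants in the same world gives 0.5; multiplying as if they were independent gives 0.65. All function names are hypothetical.

```python
def velocity_interval(vmin, vmax, t, q_lo, q_hi):
    """Speeds v in [vmin, vmax] for which the position v * t lies in [q_lo, q_hi] (t > 0)."""
    lo, hi = max(vmin, q_lo / t), min(vmax, q_hi / t)
    return (lo, hi) if lo < hi else None


def prob_at(vmin, vmax, t, q_lo, q_hi):
    """P(car inside the window at time t) for v ~ Uniform[vmin, vmax]."""
    iv = velocity_interval(vmin, vmax, t, q_lo, q_hi)
    return 0.0 if iv is None else (iv[1] - iv[0]) / (vmax - vmin)


def independent_union(probs):
    """P(inside at least once) if the time instants were independent: 1 - prod(1 - p)."""
    miss = 1.0
    for p in probs:
        miss *= 1.0 - p
    return 1.0 - miss


def dependent_union(vmin, vmax, times, q_lo, q_hi):
    """P(inside at least once), evaluating all time instants in the same world (same v)."""
    ivs = [velocity_interval(vmin, vmax, t, q_lo, q_hi) for t in times]
    ivs = sorted(iv for iv in ivs if iv is not None)
    covered, cur = 0.0, None
    for lo, hi in ivs:                        # measure of the union of the speed intervals
        if cur is not None and lo <= cur[1]:
            cur = (cur[0], max(cur[1], hi))
        else:
            if cur is not None:
                covered += cur[1] - cur[0]
            cur = (lo, hi)
    if cur is not None:
        covered += cur[1] - cur[0]
    return covered / (vmax - vmin)
```

Here every world that is inside the window at t = 5 is also inside it at t = 3, so the union has probability 0.5.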
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes exist for different settings:
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
91
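A simple example
› Whenever the wooden board is hit, the ball either stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: in the interior, the ball moves left or right with probability 0.4 each and stays with 0.2; at the border it stays with 0.4 and moves inward with 0.6]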
92
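A simple example
[Figure: the board with its transition probabilities and the resulting bucket distributions]
› Initial position: bucket 3
› After the first hit: 0.4, 0.2, 0.4 (buckets 2 to 4)
› After the second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (buckets 1 to 5)
› After the 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08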
93
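How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$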
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
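A plain-Python sketch of the board example: `board_matrix` rebuilds M from the slide (0.4 left, 0.2 stay, 0.4 right; 0.4 stay and 0.6 inward at the borders), and each hit is one row-vector-times-matrix step. The helper names are ours. The vector (0.08, 0.12, ..., 0.12, 0.08) satisfies πM = π, i.e., it is the stationary distribution that the ball's distribution approaches as the number of hits grows. The checks below therefore test the first two hits and the long-run limit rather than asserting specific values for the 40th hit.

```python
def board_matrix(n=9):
    """Transition matrix of the board: rows = from bucket, columns = to bucket."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    M[0][0], M[0][1] = 0.4, 0.6                   # left border
    M[n - 1][n - 1], M[n - 1][n - 2] = 0.4, 0.6   # right border
    return M


def step(dist, M):
    """One hit: new[j] = sum_i dist[i] * M[i][j] (row vector times matrix)."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


def after_hits(start, M, hits):
    """Distribution over the buckets after the given number of hits."""
    dist = list(start)
    for _ in range(hits):
        dist = step(dist, M)
    return dist
```

`after_hits(start, M, 40)` gives the distribution after the 40th hit for comparison with the slide.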
95
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
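The slide does not say how the transition probabilities are learned; a common baseline, assumed here, is the maximum-likelihood estimate from transitions counted between consecutive ticks, stored sparsely so that only observed transitions appear. No smoothing is applied; separate matrices for different time periods or object groups would be estimated from separate subsets of the history. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Maximum-likelihood estimate P(j | i) = count(i -> j) / count(i -> any).

    sequences: historical state sequences, one state per tick.
    Returns a sparse dict-of-dicts containing only observed transitions.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}
```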
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time during the interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph of s1, s2, s3 labelled with the transition probabilities of M]
Note: there is an exponential number of possible paths the car might take.
ST-Window Queries (same question and transition matrix M as above)
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0)
98
99
ST-Window Queries (same question and transition matrix M as above)
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0); t = 1: (0, 0, 1)
100
ST-Window Queries (same question and transition matrix M as above)
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0); t = 1: (0, 0, 1); t = 2: (0, 0.8, 0.2)
101
ST-Window Queries (same question and transition matrix M as above)
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0); t = 1: (0, 0, 1); t = 2: (0, 0.8, 0.2); t = 3: (0.48, 0.16, 0.36)
102
ST-Window Queries (same question and transition matrix M as above)
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
t = 0: (1, 0, 0); t = 1: (0, 0, 1); t = 2: (0, 0.8, 0.2)
t = 3: (0, 0.16, 0.04), obtained by propagating only the 0.2 of worlds still in s3 at t = 2 (the 0.8 in s2 has already hit the window)
Result = 0.8 + 0.16 = 0.96
› A solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
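ST-Window Queries
› A new absorbing state s4 collects the winner trajectories; M^- is applied for transitions to time instants outside T, M^+ for transitions into T (moves that land in s1 or s2 are redirected to s4). Rows and columns: s1, s2, s3, s4.
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$(1, 0, 0, 0) \cdot M^- = (0, 0, 1, 0)$, $\;(0, 0, 1, 0) \cdot M^+ = (0, 0, 0.2, 0.8)$, $\;(0, 0, 0.2, 0.8) \cdot M^+ = (0, 0, 0.04, 0.96)$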
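A sketch of the two-matrix construction for a general M, assuming 0-indexed states (s1 = 0, s2 = 1, s3 = 2), a start distribution at t = 0, and a window T = [t_start, t_end] with t_start >= 1. M^- is M extended by the absorbing winner state; M^+ additionally redirects every transition into a window state to that absorbing state. The function names are hypothetical. For the example it reproduces (0, 0, 0.04, 0.96), i.e., 0.96.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def window_query(M, start, window_states, T):
    """P(object is in one of window_states at least once during T = (t_start, t_end))."""
    n = len(M)
    # M^-: original transitions plus an absorbing 'winner' state
    M_minus = [list(row) + [0.0] for row in M] + [[0.0] * n + [1.0]]
    # M^+: transitions that land in a window state go to the winner state instead
    M_plus = []
    for row in M:
        new = [0.0 if j in window_states else p for j, p in enumerate(row)]
        new.append(sum(p for j, p in enumerate(row) if j in window_states))
        M_plus.append(new)
    M_plus.append([0.0] * n + [1.0])

    v = list(start) + [0.0]                   # distribution at t = 0 (assumes t_start >= 1)
    t_start, t_end = T
    for t in range(1, t_end + 1):
        v = vec_mat(v, M_plus if t >= t_start else M_minus)
    return v[-1], v
```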
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time with one observation at t0 (left) and with observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class S_i^- corresponding to worlds where o is located in state s_i and o has not intersected the window
  – One class S_i^+ corresponding to worlds where o is located in state s_i and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
106
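Multiple Observations
Classes in the order (S1-, S2-, S3-, S1+, S2+, S3+); ∎ = the world has intersected the window, ¬∎ = it has not.
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
t = 0: (1, 0, 0, 0, 0, 0); t = 1: (0, 0, 1, 0, 0, 0); t = 2: (0, 0, 0.2, 0, 0.8, 0); t = 3: (0, 0, 0.04, 0.48, 0.16, 0.32)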
107
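Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given that the object was observed in s3 (at t = 3)? Let ∎ denote "the trajectory has intersected the window" and O the observation.
$$P(\blacksquare \mid O) = \frac{P(O \mid \blacksquare) \cdot P(\blacksquare)}{P(O)} = \frac{P(\blacksquare \wedge O)}{P(O \wedge \blacksquare) + P(O \wedge \neg\blacksquare)}$$
With the t = 3 distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over (S1-, S2-, S3-, S1+, S2+, S3+): $P(\blacksquare \mid O) = \frac{0.32}{0.32 + 0.04} \approx 0.89$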
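The same idea with the 2|S| classes instead of a single absorbing state, followed by the Bayes step above. Assumptions: 0-indexed states (s1 = 0, s2 = 1, s3 = 2), the start is a not-yet-hit world, the window begins at t >= 1, and the observation refers to the last tick. The helper names are hypothetical.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def split_world_matrices(M, window_states):
    """M^- and M^+ over the 2|S| classes ordered (S1-, ..., Sn-, S1+, ..., Sn+)."""
    n = len(M)
    M_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    M_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            M_minus[i][j] = p                 # M^- = diag(M, M): classes never change
            M_minus[n + i][n + j] = p
            # M^+: a '-' world that lands in a window state becomes a '+' world
            M_plus[i][(n + j) if j in window_states else j] += p
            M_plus[n + i][n + j] = p          # '+' worlds stay '+'
    return M_minus, M_plus


def propagate(start, M, window_states, T):
    """Distribution over the 2|S| classes at t = T[1]; the start is a '-' world."""
    n = len(M)
    M_minus, M_plus = split_world_matrices(M, window_states)
    v = list(start) + [0.0] * n
    for t in range(1, T[1] + 1):
        v = vec_mat(v, M_plus if t >= T[0] else M_minus)
    return v


def posterior_hit(v, observed_state):
    """Bayes: P(hit | observed in s) = P(S_s+) / (P(S_s-) + P(S_s+))."""
    n = len(v) // 2
    hit, miss = v[n + observed_state], v[observed_state]
    return hit / (hit + miss)
```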
108
Summary
› Pros
  – Allows answering time-parametrized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: index entries P_oid in the (time, location) space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
kNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
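A minimal forward-sampling sketch, applied to the window query of the previous slides (M, start in s1, window {s1, s2}, T = [2, 3]) because its exact answer, 0.96, is known. It only extrapolates from a single observation; drawing samples that also agree with later observations is the harder problem that the cited work addresses by adapting the transition matrices, and it is not attempted here. States are 0-indexed and the function names are hypothetical.

```python
import random


def sample_trajectory(M, start_state, steps, rng):
    """Draw one possible world: a state sequence of length steps + 1 from the Markov chain."""
    states = [start_state]
    for _ in range(steps):
        row = M[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states


def mc_window_query(M, start_state, window_states, T, n_samples=20000, seed=0):
    """Monte Carlo estimate: fraction of sampled worlds that satisfy the window query."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        traj = sample_trajectory(M, start_state, T[1], rng)
        if any(traj[t] in window_states for t in range(T[0], T[1] + 1)):
            hits += 1
    return hits / n_samples
```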
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008, 715-728
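A toy sketch, not Lahar's syntax or evaluation algorithm: locations are labelled with single characters (O = office, C = coffee room), each possible world of a Markov chain is turned into a string, and Python's `re` checks the pattern "office, then coffee room, then office again" (`O+C+O+`). Summing the probabilities of matching worlds gives the event probability. Enumerating all |S|^steps worlds is only feasible for tiny examples; the function name and the 2-state chain are hypothetical.

```python
import itertools
import re


def event_probability(M, labels, start, steps, pattern):
    """Exact P(the location sequence fully matches `pattern`) by enumerating possible worlds.

    labels: one character per state; a world is the string of its state labels.
    """
    regex = re.compile(pattern)
    total = 0.0
    for rest in itertools.product(range(len(M)), repeat=steps):
        seq = (start,) + rest
        p = 1.0
        for a, b in zip(seq, seq[1:]):        # Markov chain: correlated consecutive states
            p *= M[a][b]
        if p > 0 and regex.fullmatch("".join(labels[s] for s in seq)):
            total += p
    return total
```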
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] Chuanfei Xu, Yu Gu, Lei Chen, Jianzhong Qiao, Ge Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.
› [AKKR09] Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443.
› [RLBS08] Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
68
The continuous aspect of the probabilistic NN queries
Let $V_{xiq} = v_{xi} - v_{xq}$ and $V_{yiq} = v_{yi} - v_{yq}$ denote the corresponding components of the (relative) velocity of the object whose expected location moves along $TR_{iq}$.
Then the distance of the expected location along $TR_{iq}$ from the origin, as a function of t, has the form $d_{iq}(t) = \sqrt{A t^2 + B t + C}$ with $A = V_{xiq}^2 + V_{yiq}^2$,
which is a hyperbola since A > 0.
Now we have a collection of such distance functions, one for each i.
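One way to see where the quadratic under the square root comes from, assuming (our notation, not the slide's) that $(X_{iq}, Y_{iq})$ is the expected location along $TR_{iq}$ relative to the origin at $t = 0$ and that the relative velocity is constant over the query interval:

```latex
\begin{aligned}
d_{iq}(t)^2 &= (X_{iq} + V_{xiq}\,t)^2 + (Y_{iq} + V_{yiq}\,t)^2 \\
            &= \underbrace{(V_{xiq}^2 + V_{yiq}^2)}_{A}\,t^2
             + \underbrace{2\,(X_{iq}V_{xiq} + Y_{iq}V_{yiq})}_{B}\,t
             + \underbrace{X_{iq}^2 + Y_{iq}^2}_{C}, \\
d_{iq}(t)^2 - A\Big(t + \frac{B}{2A}\Big)^2 &= C - \frac{B^2}{4A} \;\ge\; 0 .
\end{aligned}
```

By the Cauchy-Schwarz inequality the right-hand side of the last line is non-negative. When it is positive, the graph of $d_{iq}(t)$ is the upper branch of a hyperbola in the $(t, d)$ plane; it degenerates to $\sqrt{A}\,|t + B/(2A)|$ when the line of relative motion passes through the origin.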
69
The continuous aspect of the probabilistic NN queries
Consequence: the problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions $S_{df} = \{d_{1q}(t), d_{2q}(t), \ldots, d_{Nq}(t)\}$ (NOTE: excluding $d_{qq}(t) \equiv 0$).
Observation: two distance functions $d_{iq}(t)$ and $d_{jq}(t)$ can intersect at most twice throughout the time interval of interest for the query, $[t_b, t_e]$ (NOTE: set $d_{iq}(t) = d_{jq}(t)$; squaring both sides yields an equation of degree at most two in t). Call such pairwise intersections critical time-points.
Example: lower envelope LE12 of two distance functions
[Figure: distance (vertical axis) over time (horizontal axis) for TR1 and TR2, starting at tb; their lower envelope LE12 switches at the critical time-points t11 and t12]
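A minimal sketch of computing critical time-points, assuming each distance function is given by hypothetical coefficients (A, B, C) with d(t) = sqrt(A t^2 + B t + C). Because distances are non-negative, equating them is the same as equating their squares, which leaves a polynomial of degree at most two and hence at most two roots.

```python
import math


def critical_time_points(fi, fj, tb, te, eps=1e-12):
    """Times in [tb, te] where d_iq(t) = d_jq(t), for fi = (A, B, C) and fj = (A, B, C)."""
    a, b, c = fi[0] - fj[0], fi[1] - fj[1], fi[2] - fj[2]
    if abs(a) < eps:                          # equal A: the equation becomes linear
        roots = set() if abs(b) < eps else {-c / b}
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = set()                     # the two functions never intersect
        else:
            s = math.sqrt(disc)
            roots = {(-b - s) / (2 * a), (-b + s) / (2 * a)}
    return sorted(t for t in roots if tb <= t <= te)
```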
70
The continuous aspect of the probabilistic NN queries
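Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: With a divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: the lower envelopes LE12 (of TR1 and TR2, critical time-points t11, t12) and LE34 (of TR3 and TR4, critical time-point t31) on [tb, te] are merged into LE1234, which introduces the new critical time-points t1new and t2new]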
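A sketch of the divide-and-conquer construction under the same hypothetical (A, B, C) representation, with an envelope stored as a list of (t_start, t_end, function index) pieces over [tb, te]. Merging two envelopes cuts [tb, te] at both sets of breakpoints, adds the at most two critical time-points of the two functions that are lowest in each piece, and keeps the lower one. It follows the Merge-Sort structure but, for brevity, does not implement the bookkeeping needed to guarantee the O(N log N) bound; all names are hypothetical.

```python
import math


def dist(f, t):
    """d(t) = sqrt(A t^2 + B t + C) for hypothetical coefficients f = (A, B, C)."""
    A, B, C = f
    return math.sqrt(max(A * t * t + B * t + C, 0.0))


def crossings(f, g, lo, hi, eps=1e-12):
    """Critical time-points of f and g strictly inside (lo, hi): at most two."""
    a, b, c = f[0] - g[0], f[1] - g[1], f[2] - g[2]
    if abs(a) < eps:
        roots = set() if abs(b) < eps else {-c / b}
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            roots = set()
        else:
            s = math.sqrt(disc)
            roots = {(-b - s) / (2 * a), (-b + s) / (2 * a)}
    return sorted(r for r in roots if lo < r < hi)


def owner(env, t):
    """Index of the function forming envelope `env` at time t."""
    for start, end, idx in env:
        if start <= t <= end:
            return idx
    raise ValueError("t outside the envelope")


def merge(env1, env2, funcs):
    """Merge two lower envelopes defined over the same [tb, te]."""
    cuts = sorted({t for piece in env1 + env2 for t in piece[:2]})
    out = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        i, j = owner(env1, mid), owner(env2, mid)
        pts = [lo] + crossings(funcs[i], funcs[j], lo, hi) + [hi]
        for a, b in zip(pts, pts[1:]):
            m = (a + b) / 2
            k = i if dist(funcs[i], m) <= dist(funcs[j], m) else j
            if out and out[-1][2] == k and out[-1][1] == a:
                out[-1] = (out[-1][0], b, k)  # same function continues: extend the piece
            else:
                out.append((a, b, k))
    return out


def lower_envelope(funcs, tb, te, ids=None):
    """Split the functions, build both envelopes recursively, then merge them."""
    ids = list(range(len(funcs))) if ids is None else ids
    if len(ids) == 1:
        return [(tb, te, ids[0])]
    half = len(ids) // 2
    left = lower_envelope(funcs, tb, te, ids[:half])
    right = lower_envelope(funcs, tb, te, ids[half:])
    return merge(left, right, funcs)
```

In the check below, function 2 never reaches the envelope, while functions 0 and 1 swap at their critical time-point t = 2.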
69
The continuous aspect of the probabilistic NN queries
Consequence: The problem of constructing the IPAC-NN tree can be reduced to the problem of finding the collection of ranked lower envelopes for a set of distance functions $S_{df} = \{d_{1q}(t), d_{2q}(t), \ldots, d_{Nq}(t)\}$ (NOTE: excluding $d_{qq}(t) \equiv 0$).
Observation: Throughout the time interval of interest for the query, $[t_b, t_e]$, two distance functions $d_{iq}(t)$ and $d_{jq}(t)$ can intersect at most twice (NOTE: set $d_{iq}(t) = d_{jq}(t)$). Call such pairwise intersections critical time points.
[Figure: Example. Lower envelope LE12 of two distance trajectories TR1 and TR2, starting at tb, with the critical time points t11 and t12 (NOTE: distance on the vertical axis, time on the horizontal axis).]
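The sketch below makes the "at most twice" observation concrete. It assumes, for illustration only, that the query and each object move linearly in the plane during $[t_b, t_e]$. Under that assumption each squared distance $d_{iq}(t)^2$ is a quadratic in $t$. Distances are non-negative, so $d_{iq}(t) = d_{jq}(t)$ exactly where the squared distances coincide, and the difference of two such quadratics has at most two real roots. The function names are hypothetical.

```python
import math


def squared_distance_coeffs(p0, v, q0, w):
    """Coefficients (a, b, c) with |p(t) - q(t)|^2 = a*t^2 + b*t + c,
    for linear motion p(t) = p0 + v*t (object) and q(t) = q0 + w*t (query)."""
    dx, dy = p0[0] - q0[0], p0[1] - q0[1]  # relative position at t = 0
    ux, uy = v[0] - w[0], v[1] - w[1]      # relative velocity
    return (ux * ux + uy * uy, 2 * (dx * ux + dy * uy), dx * dx + dy * dy)


def critical_time_points(fi, fj, tb, te, eps=1e-12):
    """Times in [tb, te] where two distance functions intersect.

    fi, fj are squared-distance coefficients; since distances are
    non-negative, d_i(t) = d_j(t) iff d_i(t)^2 = d_j(t)^2.
    """
    a, b, c = (fi[k] - fj[k] for k in range(3))
    if abs(a) < eps and abs(b) < eps:
        return []                          # never equal, or identical everywhere
    if abs(a) < eps:
        roots = [-c / b]                   # difference is linear: one root
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return []                      # the functions never meet
        s = math.sqrt(disc)
        roots = sorted({(-b - s) / (2 * a), (-b + s) / (2 * a)})
    return [t for t in roots if tb <= t <= te]


# Hypothetical example: static query at the origin, object 1 drives past, object 2 is parked.
f1 = squared_distance_coeffs((-5, 1), (1, 0), (0, 0), (0, 0))  # d^2 = t^2 - 10t + 26
f2 = squared_distance_coeffs((0, 3), (0, 0), (0, 0), (0, 0))   # d^2 = 9
print(critical_time_points(f1, f2, 0, 10))  # two crossings, approx. 2.17 and 7.83
```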
70
The continuous aspect of the probabilistic NN queries
Q: How can the lower envelope of the whole collection of distance functions be constructed efficiently?
A: A divide-and-conquer approach in the spirit of Merge-Sort.
[Figure: the lower envelopes LE12 (of TR1 and TR2, critical time points t11, t12) and LE34 (of TR3 and TR4, critical time point t31) over [tb, te] are merged into LE1234, which introduces the new critical time points t1new and t2new.]
Complexity of the lower envelope construction: O(N log N) (N = number of trajectories)
Now, how about the IPAC-NN tree?
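Below is a minimal sketch of the Merge-Sort-style construction. It keeps the illustrative assumption of quadratic squared-distance functions (a, b, c) arising from linear motion. Minimizing the squared distance selects the same trajectory as minimizing the distance, so the envelope is computed on squared values.

An envelope here is a list of pieces (start, end, index). The merge step:
- walks the sorted breakpoints of both envelopes in a single linear pass;
- splits each elementary interval at the (at most two) crossings of the two active functions.

All names are hypothetical.

```python
import heapq
import math


def value(f, t):
    a, b, c = f
    return a * t * t + b * t + c


def crossings(f, g, lo, hi, eps=1e-12):
    """Roots of f - g strictly inside (lo, hi)."""
    a, b, c = (f[k] - g[k] for k in range(3))
    if abs(a) < eps:
        roots = [] if abs(b) < eps else [-c / b]
    else:
        disc = b * b - 4 * a * c
        roots = [] if disc < 0 else [(-b - math.sqrt(disc)) / (2 * a),
                                     (-b + math.sqrt(disc)) / (2 * a)]
    return sorted(t for t in roots if lo < t < hi)


def merge(env1, env2, funcs):
    """Merge two envelopes over the same time span (the merge step of Merge-Sort)."""
    ends1 = [env1[0][0]] + [e for _, e, _ in env1]
    ends2 = [env2[0][0]] + [e for _, e, _ in env2]
    cuts = []
    for t in heapq.merge(ends1, ends2):       # linear merge of two sorted lists
        if not cuts or t > cuts[-1]:
            cuts.append(t)
    out, p1, p2 = [], 0, 0
    for lo, hi in zip(cuts, cuts[1:]):
        while env1[p1][1] <= lo:              # advance to the piece active on (lo, hi)
            p1 += 1
        while env2[p2][1] <= lo:
            p2 += 1
        i, j = env1[p1][2], env2[p2][2]
        pts = [lo] + crossings(funcs[i], funcs[j], lo, hi) + [hi]
        for s, e in zip(pts, pts[1:]):
            m = (s + e) / 2
            k = i if value(funcs[i], m) <= value(funcs[j], m) else j
            if out and out[-1][2] == k:
                out[-1] = (out[-1][0], e, k)  # extend the previous piece
            else:
                out.append((s, e, k))
    return out


def lower_envelope(funcs, idxs, tb, te):
    """Divide and conquer: split the index set, recurse on both halves, merge."""
    if len(idxs) == 1:
        return [(tb, te, idxs[0])]
    half = len(idxs) // 2
    return merge(lower_envelope(funcs, idxs[:half], tb, te),
                 lower_envelope(funcs, idxs[half:], tb, te), funcs)


funcs = [(0, 0, 9), (1, -10, 26), (0, 0, 16)]  # hypothetical squared distances
print(lower_envelope(funcs, [0, 1, 2], 0.0, 10.0))
```

Two distance functions cross at most twice, so an envelope of n functions has O(n) pieces. Each merge is therefore linear, and the recursion yields the O(N log N) bound.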
71
The lower envelope provides a continuous pruning criterion, i.e., any trajectory whose distance function lies more than 4r above the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions TR1 to TR7 over [tb, te] with critical times t1, t2, t3; the lower envelope, the 4r band above it, and the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree).]
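The pruning rule and Observation 3 can be approximated on a grid of sampled time instants. This is a simplification, since the slides work with exact envelopes. The hedged sketch below:
- takes hypothetical distance functions as Python callables;
- at each sample, keeps only the trajectories within 4r of the lower envelope;
- records the nearest and second-nearest trajectory, i.e., the first and second lower envelopes at that instant.

```python
def prune_and_rank(dist, times, r):
    """dist: {trajectory_id: callable t -> d(t)}; r: uncertainty-disk radius.

    Returns, per sampled time instant, the set of trajectories that survive
    the 4r pruning rule and the (nearest, second-nearest) pair, i.e. the
    lower envelope and the 2nd lower envelope at that instant.
    """
    survivors, levels = [], []
    for t in times:
        ranked = sorted((f(t), k) for k, f in dist.items())
        envelope = ranked[0][0]                       # lower envelope value at t
        survivors.append({k for d, k in ranked if d <= envelope + 4 * r})
        levels.append((ranked[0][1], ranked[1][1]))   # removing level 1 exposes level 2
    return survivors, levels


# Hypothetical distance functions of three trajectories to the query.
dist = {1: lambda t: 1 + t, 2: lambda t: 4 - t, 3: lambda t: 10.0}
survivors, levels = prune_and_rank(dist, [0, 1, 2, 3], r=0.5)
print(survivors)  # trajectory 3 never survives; 2 is pruned at t = 0
print(levels)
```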
72
Given two trajectory samples $(t_i, p_i)$ and $(t_{i+1}, p_{i+1})$ of a moving object a on a road network, the set of possible paths $PP_i$ between $t_i$ and $t_{i+1}$ consists of all the paths along the routes (sequences of edges) that connect $p_i$ and $p_{i+1}$ and whose minimum time costs are not greater than $t_{i+1} - t_i$.
Example: if the pdf of selecting a possible path is uniform, each path in $PP_i(a)$ is selected with probability $1/|PP_i(a)|$.
Given a path $P_j \in PP_i(a)$, the Possible Locations of a moving object a with respect to $P_j$ at $t \in [t_i, t_{i+1}]$ is the set of all positions $p \in P_j$ from which a can reach $p_i$ (respectively $p_{i+1}$) within the time period $t - t_i$ (respectively $t_{i+1} - t$).
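Here is a minimal sketch of both definitions. It relies on assumptions that the slides do not fix:
- the road network is a hypothetical weighted graph of edge lengths;
- the object never exceeds a maximum speed vmax, so a path's minimum time cost is its length divided by vmax;
- $p_i$ and $p_{i+1}$ are vertices;
- positions along a path are measured as the distance from $p_i$.

Under these assumptions, the possible locations on a path of length L at time t form the interval $[L - v_{max}(t_{i+1} - t),\ v_{max}(t - t_i)]$, clipped to $[0, L]$.

```python
def possible_paths(graph, src, dst, max_time, vmax):
    """All simple paths src -> dst whose minimum travel time (length / vmax)
    does not exceed max_time = t_{i+1} - t_i. graph: {u: {v: edge_length}}."""
    found = []

    def dfs(node, path, length):
        if length / vmax > max_time:
            return                               # too long already: prune
        if node == dst:
            found.append((list(path), length))
            return
        for nxt, w in graph[node].items():
            if nxt not in path:                  # simple paths only
                path.append(nxt)
                dfs(nxt, path, length + w)
                path.pop()

    dfs(src, [src], 0.0)
    return found


def possible_location_interval(path_len, t, ti, ti1, vmax):
    """Possible locations on one path at time t, as distances from p_i:
    reachable from p_i within t - ti, and able to reach p_{i+1} within ti1 - t."""
    lo = max(0.0, path_len - vmax * (ti1 - t))
    hi = min(path_len, vmax * (t - ti))
    return (lo, hi) if lo <= hi else None


graph = {"A": {"B": 2, "C": 3, "E": 5}, "B": {"A": 2, "D": 4}, "C": {"A": 3, "D": 2},
         "D": {"B": 4, "C": 2, "E": 5}, "E": {"A": 5, "D": 5}}
for path, length in possible_paths(graph, "A", "D", max_time=6, vmax=1.0):
    print(path, length, possible_location_interval(length, 4, 0, 6, 1.0))
# ['A', 'B', 'D'] 6.0 (4.0, 4.0)   <- must drive at full speed
# ['A', 'C', 'D'] 5.0 (3.0, 4.0)
```

The detour via E is excluded because its minimum travel time (10) exceeds $t_{i+1} - t_i = 6$.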
73
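Combine the two probabilities…
[Figure: possible paths of object a, and its possible locations when t = 4; m marks a point on one of the paths.]
Probability of falling inside segment mp1 at t = 4: 0.5 · 0.5 = 0.25
Probability of an object a falling inside some segment S at a given time instant t (e.g., with a uniform pdf):
$$\Pr(a(t) \in S) = \sum_{p \in PP(t)} \Pr(p) \cdot \Pr\big(a(t) \in S \cap PL_p(t) \mid p\big)$$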
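The formula can be sketched under the same hypothetical maximum-speed model: positions are measured as distance from $p_i$, and the location pdf is uniform over $PL_p(t)$. The path data, segment encoding and function name are illustrative assumptions, not the authors' implementation. The example mirrors the 0.5 · 0.5 = 0.25 pattern of the slide with made-up numbers.

```python
def prob_in_segment(paths, segment, t, ti, ti1, vmax):
    """Pr(a(t) in S) = sum_p Pr(p) * Pr(a(t) in S ∩ PL_p(t) | p).

    paths:   {path_id: (length, probability of the path)}
    segment: {path_id: (s_lo, s_hi)}, the part of S lying on that path,
             measured as distance from p_i along the path.
    Assumes a uniform pdf over the possible locations PL_p(t).
    """
    total = 0.0
    for pid, (length, p_path) in paths.items():
        if pid not in segment:
            continue                                  # S does not touch this path
        lo = max(0.0, length - vmax * (ti1 - t))      # PL_p(t) = [lo, hi]
        hi = min(length, vmax * (t - ti))
        if lo > hi:
            continue                                  # no possible location
        s_lo, s_hi = segment[pid]
        if hi > lo:
            overlap = max(0.0, min(hi, s_hi) - max(lo, s_lo))
            p_loc = overlap / (hi - lo)               # uniform pdf over PL_p(t)
        else:
            p_loc = 1.0 if s_lo <= lo <= s_hi else 0.0  # PL_p(t) is a single point
        total += p_path * p_loc
    return total


# Two equally likely hypothetical paths (uniform path pdf: 0.5 each), t_i = 0, t_{i+1} = 6.
paths = {"p1": (6.0, 0.5), "p2": (5.0, 0.5)}
print(prob_in_segment(paths, {"p2": (3.5, 4.0)}, t=4, ti=0, ti1=6, vmax=1.0))  # 0.25
```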
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need:
  – the collection of all the possible paths (and their probabilities)
  – the Probabilistic Location Function
Example: observe the two possible sequences of going from position $p_i$ to position $p_{i+1}$. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
› Data structures
  – Edge hash-table
    › Small, in-memory
  – Movement R-tree
    › 1D, one for each edge
  – Trajectories list
    › Stores samples ordered by time-stamp
    › Assigns a pointer to the set of possible paths
    › Earliest-arrival and latest-departure times stored for vertices
    › On disk, retrieved on demand
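The slide names the data structures without fixing their interfaces. In this simplified, hypothetical sketch:
- a Python dict plays the role of the edge hash-table;
- a list per edge, sorted by entry time, stands in for that edge's 1D movement R-tree;
- each entry stores the maximal time interval in which an object may be on the edge (assumed here to come from the earliest-arrival and latest-departure times).

A real R-tree would avoid the linear scan over the prefix that this version performs.

```python
import bisect
from collections import defaultdict


class EdgeIndex:
    """Hypothetical stand-in for the edge hash-table plus one 1D movement
    index per edge: entries (t_enter, t_leave, obj_id) sorted by t_enter."""

    def __init__(self):
        self._starts = defaultdict(list)   # edge -> sorted t_enter values
        self._entries = defaultdict(list)  # edge -> entries in the same order

    def insert(self, edge, t_enter, t_leave, obj_id):
        pos = bisect.bisect_right(self._starts[edge], t_enter)
        self._starts[edge].insert(pos, t_enter)
        self._entries[edge].insert(pos, (t_enter, t_leave, obj_id))

    def query(self, edge, tq_lo, tq_hi):
        """Ids of objects whose interval on `edge` intersects [tq_lo, tq_hi]."""
        # Only entries that start no later than tq_hi can intersect the query.
        stop = bisect.bisect_right(self._starts.get(edge, []), tq_hi)
        return {obj for _, t_leave, obj in self._entries.get(edge, [])[:stop]
                if t_leave >= tq_lo}


idx = EdgeIndex()
idx.insert(("v1", "v5"), 2.0, 5.0, "a")
idx.insert(("v1", "v5"), 6.0, 8.0, "b")
idx.insert(("v1", "v3"), 1.0, 3.0, "c")
print(sorted(idx.query(("v1", "v5"), 4.0, 7.0)))  # ['a', 'b']
```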
76
› Network expansion
  – Expansion tree: a tree rooted at q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search for leaf entries whose maximum time interval intersects tq
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: road network with vertices A, B, C, D, E and query point q; the expansion tree "bubbling" along road-network segments, earliest/latest times of arrival, and a possible path that is definitely not part of the answer to the range query.]
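This sketch covers the expansion and filtering steps. Expansion is written as a bounded Dijkstra search. It rests on simplifying assumptions: q is a vertex, and an edge belongs to the tree only if its far endpoint is within network distance r, so edges that are only partially within r are ignored. Filtering scans a plain dict of per-edge time intervals instead of real movement R-trees. Refinement is not shown. All names and data are hypothetical.

```python
import heapq


def expansion_tree(graph, q, r):
    """Network expansion: Dijkstra from q, keeping only vertices whose network
    distance to q is smaller than r. graph: {u: {v: edge_length}}.
    Returns ({vertex: distance}, [tree edges (parent, child)])."""
    dist, tree, done = {q: 0.0}, [], set()
    heap = [(0.0, q, None)]
    while heap:
        d, u, parent = heapq.heappop(heap)
        if u in done:
            continue                           # stale heap entry
        done.add(u)
        if parent is not None:
            tree.append((parent, u))
        for v, w in graph[u].items():
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v, u))
    return dist, tree


def filter_candidates(tree, movement, tq):
    """Filtering: objects whose maximal time interval on a tree edge intersects tq.
    movement: {(u, v): [(t_enter, t_leave, obj_id), ...]}; edges are undirected."""
    lo, hi = tq
    candidates = set()
    for u, v in tree:
        for key in ((u, v), (v, u)):
            for t_enter, t_leave, obj in movement.get(key, []):
                if t_enter <= hi and t_leave >= lo:
                    candidates.add(obj)
    return candidates


graph = {"q": {"A": 1, "B": 2}, "A": {"q": 1, "C": 2}, "B": {"q": 2, "D": 5},
         "C": {"A": 2}, "D": {"B": 5}}
dist, tree = expansion_tree(graph, "q", r=3.5)
movement = {("q", "A"): [(0, 2, "o1")], ("C", "A"): [(5, 9, "o2")],
            ("B", "D"): [(0, 9, "o3")], ("q", "B"): [(10, 12, "o4")]}
print(sorted(filter_candidates(tree, movement, (1, 6))))  # ['o1', 'o2']
```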
77
Temporal-continuous query
  – Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
  – It is easy to prove that the actual QP is monotonic between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs
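Under the hypothetical maximum-speed model, the critical time instants of one possible path can be computed directly. Take a vertex at distance d from $p_i$ on a path of length L:
- it is reached no earlier than $t_i + d/v_{max}$;
- it must be left no later than $t_{i+1} - (L - d)/v_{max}$.

The sketch below (hypothetical names) collects these instants. The QPs would then be evaluated only at these instants.

```python
def critical_time_instants(vertex_dists, path_len, ti, ti1, vmax):
    """Earliest-arrival and latest-departure times at the vertices of one
    possible path (hypothetical max-speed model).

    vertex_dists: distance from p_i to each vertex along the path.
    """
    instants = set()
    for d in vertex_dists:
        instants.add(ti + d / vmax)                 # earliest arrival
        instants.add(ti1 - (path_len - d) / vmax)   # latest departure
    return sorted(t for t in instants if ti <= t <= ti1)


# Hypothetical path of length 5 with inner vertices 2 and 3 units from p_i.
print(critical_time_instants([2.0, 3.0], 5.0, ti=0.0, ti1=6.0, vmax=1.0))  # [2.0, 3.0, 4.0]
```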
78
Uncertainty – the flip side
› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
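The classic Douglas-Peucker algorithm is the baseline that the cited paper speeds up. It illustrates "define an acceptable error, save on data size". Below is a minimal recursive sketch that treats locations as planar (x, y) points and ignores time stamps. This is a straightforward version, not the accelerated variant from the paper.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, eps):
    """Simplify a polyline so that no removed point lies farther than eps
    from the segment that replaces it."""
    if len(points) < 3:
        return list(points)
    dmax, idx = max((point_segment_distance(points[k], points[0], points[-1]), k)
                    for k in range(1, len(points) - 1))
    if dmax <= eps:
        return [points[0], points[-1]]             # all inner points within tolerance
    left = douglas_peucker(points[:idx + 1], eps)  # recurse on both halves,
    right = douglas_peucker(points[idx:], eps)     # both keep the split point
    return left[:-1] + right                       # drop the duplicated split point


track = [(0, 0), (1, 0), (2, 5), (3, 0), (4, 0)]
print(douglas_peucker(track, 1.0))  # [(0, 0), (2, 5), (4, 0)]
```

Plain Douglas-Peucker bounds the spatial deviation by eps. It does not by itself preserve topological properties of the simplified data.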
79
Uncertainty – the other flip side
› What are the important properties that should not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – Is it not just a "Z" axis… one cannot travel back in time
  – How long may the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
[Figure: comparison of stochastic methods for uncertain spatial data vs. geometric models for uncertain spatio-temporal data with respect to two criteria: probabilistic results and the time dimension.]
82
Merge the approaches
› Idea
  – Discretize the time
  – For each time, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009, 435-443
83
A highway example…
[Figure: position (pos) of a car over time (t), with a query region Q.]
What is the probability that the car is in some area at least once during some time?
84
A highway example…
[Figure: the same position-time plot with query region Q and the speed bounds vmin and vmax.]
What is the probability that the car is in some area at least once during some time?
85
A highway example…
[Figure: the same position-time plot with the speed bounds vmin and vmax and query region Q.]
Assuming a uniform distribution…
86
A highway example…
[Figure: position (pos) over time (t) with query region Q; the figure marks probabilities of 50% and 30%.]
Independence assumption: $1 - (1 - 0.5)(1 - 0.3) = 0.65$
87
A highway example…
[Figure: position (pos) over time (t) with query region Q; the figure marks probabilities of 50% and 30%.]
Independence assumption: $1 - (1 - 0.5)(1 - 0.3) = 0.65$
Violation of the speed constraints
88
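A highway example…
[Figure: position (pos) over time (t) with query region Q; the figure marks probabilities of 50% and 30%.]
Independence assumption: $1 - (1 - 0.5)(1 - 0.3) = 0.65$
Taking the dependency into account: 0.5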
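Inclusion-exclusion makes the gap between 0.65 and 0.5 explicit. Write A and B for the two events marked in the figure, with P(A) = 0.5 and P(B) = 0.3; these event names are introduced here for illustration only. Independence forces $P(A \cap B) = 0.15$. The dependent answer of 0.5 holds exactly when $P(A \cap B) = P(B)$, i.e., when (up to probability zero) every speed-consistent trajectory that realizes B also realizes A:

```latex
\begin{aligned}
P_{\mathrm{indep}}(A \cup B) &= 1 - (1 - 0.5)(1 - 0.3) = 1 - 0.35 = 0.65 \\
P(A \cup B) &= P(A) + P(B) - P(A \cap B) = 0.5 + 0.3 - P(A \cap B) \\
P(A \cup B) = 0.5 &\iff P(A \cap B) = 0.3 = P(B)
\end{aligned}
```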
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
J. L. Doob (1953): Stochastic Processes. Wiley
91
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: an interior hole with probabilities 0.4 (left), 0.2 (stay), 0.4 (right); a border hole with 0.6 (inward) and 0.4 (stay).]
92
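A simple example
› Initial position
› After first hit
› After second hit
› After 40th hit
[Figure: ball distribution over the holes after the first hit (0.4, 0.2, 0.4), after the second hit (0.16, 0.16, 0.36, 0.16, 0.16) and after the 40th hit (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08).]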
93
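How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$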
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
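Below is a minimal sketch of the board example with plain Python lists. It reproduces the first and second hit and iterates to the 40th hit. As the number of hits grows, $x \cdot M^n$ approaches the stationary distribution $\pi$ with $\pi M = \pi$. For this chain, detailed balance ($\pi_0 \cdot 0.6 = \pi_1 \cdot 0.4$, and so on) gives $\pi = (0.08, 0.12, \ldots, 0.12, 0.08)$, the values listed for the 40th hit. The code prints the 40th iterate so it can be compared with $\pi$. Function names are hypothetical.

```python
def step(x, M):
    """One hit: row vector x times transition matrix M."""
    n = len(M)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]


def board_matrix(n=9):
    """Transition matrix of the board example (rows: from bucket, cols: to bucket)."""
    M = [[0.0] * n for _ in range(n)]
    M[0][0], M[0][1] = 0.4, 0.6                   # left border
    M[n - 1][n - 2], M[n - 1][n - 1] = 0.6, 0.4   # right border
    for i in range(1, n - 1):
        M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


M = board_matrix()
x0 = [0, 0, 1, 0, 0, 0, 0, 0, 0]   # the ball starts in the third bucket
x1 = step(x0, M)                   # (0, 0.4, 0.2, 0.4, 0, ...)
x2 = step(x1, M)                   # (0.16, 0.16, 0.36, 0.16, 0.16, 0, ...)
x40 = x0
for _ in range(40):
    x40 = step(x40, M)

# Stationary distribution from detailed balance.
pi = [0.08] + [0.12] * 7 + [0.08]
print([round(v, 3) for v in x40])
print(max(abs(a - b) for a, b in zip(step(pi, M), pi)))  # ~0, i.e. pi * M = pi
```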
95
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
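The slides do not say which estimator is used. A natural baseline, assumed here, is the maximum-likelihood estimate from transition counts in historical state sequences (one state per tick). The result is stored sparsely because most state pairs never occur. The traces and names below are hypothetical. A time-dependent or group-specific model could apply the same estimate per time slot or per object group.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Maximum-likelihood estimate M[i][j] = #(i -> j) / #(i -> any state),
    from historical state sequences. Stored sparsely as {i: {j: p}}."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    M = {}
    for a, row in counts.items():
        total = sum(row.values())
        M[a] = {b: c / total for b, c in row.items()}
    return M


history = [["s1", "s3", "s2", "s1"], ["s1", "s3", "s3", "s2"]]  # hypothetical traces
print(estimate_transitions(history))
```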
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2, s3 with the transition probabilities of M.]
Note: there is an exponential number of possible paths the car might take.
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
Distribution at t = 0: $(1, 0, 0)$
[Figure: the state graph over s1, s2, s3 unrolled over t = 0, 1, 2, 3.]
98
99
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
Distributions at t = 0, 1: $(1, 0, 0)$, $(0, 0, 1)$
[Figure: the state graph over s1, s2, s3 unrolled over t = 0, 1, 2, 3.]
100
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
Distributions at t = 0, 1, 2: $(1, 0, 0)$, $(0, 0, 1)$, $(0, 0.8, 0.2)$
[Figure: the state graph over s1, s2, s3 unrolled over t = 0, 1, 2, 3.]
101
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
Distributions at t = 0, 1, 2, 3: $(1, 0, 0)$, $(0, 0, 1)$, $(0, 0.8, 0.2)$, $(0.48, 0.16, 0.36)$
[Figure: the state graph over s1, s2, s3 unrolled over t = 0, 1, 2, 3.]
102
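ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 in the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
Distributions at t = 0, 1, 2: $(1, 0, 0)$, $(0, 0, 1)$, $(0, 0.8, 0.2)$. The mass 0.8 in s2 has already hit the window, so only the remaining mass 0.2 in s3 is propagated to t = 3: $(0, 0.16, 0.04)$.
Result = 0.8 + 0.16 = 0.96
[Figure: the state graph over s1, s2, s3 unrolled over t = 0, 1, 2, 3.]
› A solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices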
103
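ST - Window Queries
States s1, s2, s3, plus the new absorbing state s4 for the winner trajectories:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$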
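The two-matrix construction can be sketched with plain lists:
- M- copies M and adds the absorbing winner state;
- M+ redirects every transition that enters a window state into that absorbing state.

This minimal sketch reproduces the 0.96 of the example. It uses hypothetical names and 0-based state indices, so s1 and s2 are {0, 1}.

```python
def vec_mat(x, M):
    """Row vector x times matrix M."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]


def window_matrices(M, window):
    """M- (steps outside T) and M+ (steps into T) over the original states
    plus one absorbing 'winner' state appended at the end."""
    n = len(M)
    m_minus = [list(row) + [0.0] for row in M] + [[0.0] * n + [1.0]]
    m_plus = []
    for i in range(n):
        row = [0.0] * (n + 1)
        for j in range(n):
            if j in window:
                row[n] += M[i][j]      # moving into the window: absorbed as a hit
            else:
                row[j] = M[i][j]
        m_plus.append(row)
    m_plus.append([0.0] * n + [1.0])   # winners stay winners
    return m_minus, m_plus


def window_query(M, x0, window, t_start, t_end):
    """P(object is in a window state at some t in [t_start, t_end]); assumes t_start >= 1."""
    m_minus, m_plus = window_matrices(M, window)
    x = list(x0) + [0.0]
    for t in range(1, t_end + 1):
        x = vec_mat(x, m_plus if t >= t_start else m_minus)
    return x[-1]


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]      # states s1, s2, s3
print(window_query(M, [1, 0, 0], {0, 1}, 2, 3))   # approx. 0.96
```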
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time; left: a single observation at t0; right: observations at t0 and t1.]
105
Multiple Observations
› We need to track where the true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
  – One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 over t = 0, 1, 2, 3.]
106
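Multiple Observations
Distributions over the classes $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$ at t = 0, 1, 2, 3, where the "-" classes have not hit the window ($\neg\blacksquare$) and the "+" classes have ($\blacksquare$):
$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0, 0, 0) \xrightarrow{M^+} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^+} (0, 0, 0.04, 0.48, 0.16, 0.32)$$
with
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$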
107
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Bayes' Theorem:
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare) \cdot P(\blacksquare)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$$
With the distribution $(0, 0, 0.04, 0.48, 0.16, 0.32)$ over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$:
$$P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
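The same idea with 2|S| classes, as a minimal sketch using hypothetical names and 0-based state indices:
- M- keeps every world in its class;
- M+ moves a not-yet-hit world into the matching "+" class when it enters a window state;
- Bayes' theorem turns the final distribution into the conditional probability for the observation in s3.

```python
def vec_mat(x, M):
    """Row vector x times matrix M."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]


def split_matrices(M, window):
    """Matrices over 2|S| classes: index i is S_i^- (window not hit yet),
    index n + i is S_i^+ (window already hit)."""
    n = len(M)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            m_minus[i][j] = p                    # outside T: classes are kept
            m_minus[n + i][n + j] = p
            target = n + j if j in window else j
            m_plus[i][target] += p               # entering the window turns - into +
            m_plus[n + i][n + j] = p             # once hit, always hit
    return m_minus, m_plus


def posterior_hit(M, x0, window, t_start, t_end, t_obs, obs_state):
    """P(window hit | object observed in obs_state at t_obs), via Bayes' theorem.
    Assumes t_start >= 1."""
    n = len(M)
    m_minus, m_plus = split_matrices(M, window)
    x = list(x0) + [0.0] * n
    for t in range(1, t_obs + 1):
        x = vec_mat(x, m_plus if t_start <= t <= t_end else m_minus)
    hit, miss = x[n + obs_state], x[obs_state]
    return x, hit / (hit + miss)


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
x, p = posterior_hit(M, [1, 0, 0], {0, 1}, t_start=2, t_end=3, t_obs=3, obs_state=2)
print([round(v, 2) for v in x])  # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]
print(round(p, 2))               # 0.89
```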
108
Summary
› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index over the time/location space; label P_oid.]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform to the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
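Here is a hedged sketch of Monte-Carlo sampling that respects the observations. One standard way to adapt the transition matrices works in two steps:
- a backward pass computes, for each time and state, the probability of still reaching the next observation;
- each transition is then reweighted by that probability (a Doob h-transform).

This is not necessarily the exact algorithm of the cited paper. Samples drawn this way always agree with both observations, so none are wasted on inconsistent paths. The toy chain and observations reuse the three-state window-query example, and the function names are hypothetical.

```python
import random


def backward_weights(M, t_obs, obs_state):
    """beta[t][i] = P(object is in obs_state at t_obs | it is in state i at t)."""
    n = len(M)
    beta = [None] * (t_obs + 1)
    beta[t_obs] = [1.0 if i == obs_state else 0.0 for i in range(n)]
    for t in range(t_obs - 1, -1, -1):
        beta[t] = [sum(M[i][j] * beta[t + 1][j] for j in range(n)) for i in range(n)]
    return beta


def sample_conditioned(M, start, beta, rng):
    """Draw one path from `start` at t = 0 that agrees with the observation,
    using adapted transitions M_t(i, j) proportional to M(i, j) * beta[t+1][j]."""
    n, path, s = len(M), [start], start
    for t in range(len(beta) - 1):
        weights = [M[s][j] * beta[t + 1][j] for j in range(n)]
        s = rng.choices(range(n), weights=weights)[0]
        path.append(s)
    return path


def mc_window_probability(M, start, t_obs, obs_state, window, t_start, t_end,
                          n_samples=20000, seed=1):
    """Fraction of sampled, observation-consistent paths that hit the window."""
    rng = random.Random(seed)
    beta = backward_weights(M, t_obs, obs_state)
    hits = 0
    for _ in range(n_samples):
        path = sample_conditioned(M, start, beta, rng)
        if any(path[t] in window for t in range(t_start, t_end + 1)):
            hits += 1
    return hits / n_samples


M = [[0, 0, 1], [0.6, 0, 0.4], [0, 0.8, 0.2]]
# Observed in s1 at t = 0 and in s3 at t = 3; window {s1, s2} during T = [2, 3].
print(mc_window_probability(M, 0, 3, 2, {0, 1}, 2, 3))  # close to 0.32 / 0.36
```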
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
113
Summary
› Models for spatial uncertainty
› Concepts of efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity search on uncertain spatio-temporal data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-nearest neighbor queries on uncertain moving object trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
70
The continuous aspect of the probabilistic NN queries
Q how to efficiently construct the lower envelope of the whole collection of distance-functions
A divide-and-conquer approach in the spirit of Merge-Sortdist
TR1
TR2
LE12
tb t11 t12
dist
TR3
TR4
LE34
tb tet31dist
TR1
TR2
LE1234
tb tet11 t12
TR3
TR4
t31
t1new t2new
Complexity of the lower envelope constructionO(N logN) (N = number of trajectories)
Now how about the IPAC-NN tree
71
The lower envelope provides a continuous-pruning criteria ie any trajectory whosedistance function is further than 4r (r = radius of the uncertainty disk) can not have a non-zero probability of being NN to Trq (eg Tr7 in the Figure and Tr6 initially)
Observation 3 IF the lower envelope is removed from the (distance-functions time) spaceTHEN the lower envelope of the leftovers corresponds to the ldquograndchildrenrdquo of the root of the IPAC-NN tree
dist dist
lower envelope
TR1TR2
TR3
TR4
TR5
TR6
tb te
2nd-lower-envelope(nodes in the Level2of the IPAC-NN tree )
4r
TR7
t1 t2 t3
72
Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie
Example if pdf of selecting a possible path is uniform
Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie
73
Combine the two probabilitieshellip
Possible pathsPossible locations when t=4
a
m
Probability of falling inside segment mp1
0505 = 025 (at t = 4)
Probability of an object a falling inside some segment S at given time instant t
)(
))(Pr()Pr())(Pr(tPPp
p
a
tPLSpSta eg uniform pdf
74
ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory
representation we needndash The collection of all the possible paths (and probabilities)
ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi
to position pi+1
At the time-instant t = 4 the object can
be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )
75
Processing
rsaquo Data Structuresndash Edge hash-table
rsaquo Small in-memoryndash Movement R-tree
rsaquo 1D for each edgendash Trajectories List
rsaquo Store samples ordered by time-stamp
rsaquo Assign pointer to set of possible paths
rsaquo Earliest-arriving and Latest-departure stored for vertices
rsaquo On-disk retrieve on-demand
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob: Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, H.-P. Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, H.-P. Kriegel: Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, G. Yu: Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, A. Züfle: Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, S. Prabhakar: Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, M. Renz: Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, D. Suciu: Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
The lower envelope provides a continuous pruning criterion: any trajectory whose distance function is further than 4r from the lower envelope (r = radius of the uncertainty disk) cannot have a non-zero probability of being the NN of Trq (e.g., Tr7 in the figure, and Tr6 initially).
Observation 3: IF the lower envelope is removed from the (distance functions, time) space, THEN the lower envelope of the leftovers corresponds to the "grandchildren" of the root of the IPAC-NN tree.
[Figure: distance functions (dist) of trajectories Tr1 to Tr7 over [tb, te] with time instants t1, t2, t3, showing the lower envelope, the 2nd lower envelope (nodes in Level 2 of the IPAC-NN tree), and the 4r band]
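A small sketch of the 4r pruning rule on distance functions that have been sampled at shared time instants follows. Discretizing the continuous distance functions is an assumption of this sketch; the IPAC-NN approach works on the continuous envelope. At each sample, the lower envelope is the minimum distance over all trajectories, and a trajectory survives only if it lies within 4r of it. The function names and data are hypothetical.

```python
def nn_candidates(distances, r):
    """distances: dict trajectory id -> distances to the query at common time samples.

    Returns one set per time sample containing the trajectories that survive pruning:
    a trajectory is pruned at a sample if its distance exceeds the lower envelope
    (the minimum over all trajectories at that sample) by more than 4r.
    """
    ids = list(distances)
    n_samples = len(distances[ids[0]])
    survivors = []
    for i in range(n_samples):
        envelope = min(distances[tid][i] for tid in ids)
        survivors.append({tid for tid in ids if distances[tid][i] <= envelope + 4 * r})
    return survivors


def never_candidates(distances, r):
    """Trajectories pruned at every sample, i.e. discardable for the whole interval."""
    survivors = nn_candidates(distances, r)
    return {tid for tid in distances if all(tid not in s for s in survivors)}


# Hypothetical distance samples at three time instants, uncertainty radius r = 0.5.
dist = {
    "Tr1": [1.0, 2.0, 3.0],
    "Tr2": [2.0, 1.0, 2.0],
    "Tr6": [6.0, 3.0, 3.0],   # pruned only initially
    "Tr7": [10.0, 10.0, 10.0],  # pruned throughout
}
print(never_candidates(dist, 0.5))  # {'Tr7'}
```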
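Given two trajectory samples $(t_i, p_i)$ and $(t_{i+1}, p_{i+1})$ of a moving object a on a road network, the set of possible paths $PP_i(a)$ between $t_i$ and $t_{i+1}$ consists of all paths along the routes (sequences of edges) that connect $p_i$ and $p_{i+1}$ and whose minimum time cost is not greater than $t_{i+1} - t_i$, i.e.,
$$PP_i(a) = \{ P \mid P \text{ connects } p_i \text{ and } p_{i+1} \wedge \mathrm{mincost}(P) \le t_{i+1} - t_i \}$$
Example: if the pdf of selecting a possible path is uniform, then $\Pr(P) = 1 / |PP_i(a)|$ for every $P \in PP_i(a)$.
Given a path $P_j \in PP_i(a)$, the possible locations of a moving object a with respect to $P_j$ at $t \in [t_i, t_{i+1}]$ are all positions $p \in P_j$ from which a can reach $p_i$ (respectively $p_{i+1}$) within the time period $t - t_i$ (respectively $t_{i+1} - t$), i.e.,
$$PL_{P_j}(t) = \{ p \in P_j \mid \mathrm{mincost}(p_i, p) \le t - t_i \wedge \mathrm{mincost}(p, p_{i+1}) \le t_{i+1} - t \}$$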
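Combine the two probabilities…
[Figure: possible paths and possible locations of object a when t = 4, with a point m on one of the paths]
› Probability of falling inside segment [m, p1]: 0.5 · 0.5 = 0.25 (at t = 4)
› Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr(a(t) \in S) = \sum_{P \in PP(a)} \Pr(P) \cdot \Pr(PL_P(t) \in S)$$
where the probabilities follow some pdf, e.g., a uniform pdf.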
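The definitions of possible paths, possible locations, and the combined probability can be sketched for the simplest case. The sketch makes several assumptions, none taken from the slides. Each path is a line of known length. The minimum time cost of a distance d is d / v_max. Pr(P) is uniform over the possible paths, and the position inside PL_P(t) is uniform. The numbers are hypothetical and were chosen so the result mirrors the 0.5 · 0.5 = 0.25 example.

```python
def possible_locations(length, t_i, t_next, t, v_max):
    """PL_P(t) as an interval (lo, hi) of distances from p_i along a path.

    Assumes the minimum time cost of covering a distance d is d / v_max.
    """
    lo = max(0.0, length - v_max * (t_next - t))  # p_{i+1} must still be reachable by t_next
    hi = min(length, v_max * (t - t_i))           # position must be reachable from p_i by t
    return (lo, hi) if lo <= hi else None


def prob_in_segment(paths, t_i, t_next, t, v_max):
    """Pr(a(t) in S) = sum over P in PP_i(a) of Pr(P) * Pr(PL_P(t) in S).

    paths: list of (length, s_range), where s_range = (s_lo, s_hi) is the part of S
    on that path, or None if the path does not cross S.
    """
    possible = [(L, s) for L, s in paths if L / v_max <= t_next - t_i]  # PP_i(a)
    total = 0.0
    for length, s_range in possible:
        pl = possible_locations(length, t_i, t_next, t, v_max)
        if pl is None or s_range is None:
            continue
        lo, hi = pl
        if hi == lo:  # degenerate interval: the position is known exactly
            frac = 1.0 if s_range[0] <= lo <= s_range[1] else 0.0
        else:
            frac = max(0.0, min(hi, s_range[1]) - max(lo, s_range[0])) / (hi - lo)
        total += frac / len(possible)  # uniform Pr(P)
    return total


# Hypothetical network: two possible paths and one too long to be possible.
paths = [(10.0, (8.0, 10.0)), (12.0, None), (30.0, (0.0, 5.0))]
print(prob_in_segment(paths, t_i=0.0, t_next=5.0, t=4.0, v_max=4.0))  # 0.25
```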
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – the collection of all the possible paths (and their probabilities)
  – the probabilistic location function
Example: observe the two possible sequences of going from position p_i to position p_{i+1}. At the time instant t = 4, the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
Processing
› Data structures
  – Edge hash table
    › Small, in-memory
  – Movement R-tree
    › 1D, one for each edge
  – Trajectories list
    › Stores samples ordered by timestamp
    › Assigns a pointer to the set of possible paths
    › Earliest-arrival and latest-departure times stored for vertices
    › On disk, retrieved on demand
› Network expansion
  – Expansion tree: a tree rooted at q that contains the edges whose network distances to q are smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search for leaf entries whose maximum time interval intersects t_q
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" from q along road-network segments through vertices A to E, annotated with earliest/latest arrival times, and a possible path that is definitely not part of the answer to the range query]
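The expansion and filtering steps can be sketched as follows. Network expansion is written as a Dijkstra search that stops once the frontier reaches distance r. An edge is kept if its start vertex is closer than r to q. For simplicity, the sketch assumes q sits on a vertex. For filtering, a plain per-edge list of (object, enter time, leave time) entries stands in for the per-edge 1D movement R-trees, which is a simplification. The graph, object IDs, and function names are hypothetical.

```python
import heapq


def expansion_edges(graph, q, r):
    """Edges (u, v) whose start vertex u has network distance < r from q.

    graph: dict vertex -> list of (neighbor, edge_length).
    """
    dist = {q: 0.0}
    heap = [(0.0, q)]
    edges = set()
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if d >= r:
            break  # all remaining vertices are at least r away
        for v, w in graph.get(u, []):
            edges.add((u, v))
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return edges


def filter_candidates(movement_index, edges, tq):
    """Objects with an entry on an expansion edge whose time interval intersects tq."""
    t_lo, t_hi = tq
    return {
        obj
        for e in edges
        for obj, t_enter, t_leave in movement_index.get(e, [])
        if t_enter <= t_hi and t_leave >= t_lo
    }


# Hypothetical undirected road network (each edge listed in both directions).
graph = {
    "q": [("A", 1.0)],
    "A": [("q", 1.0), ("B", 2.0)],
    "B": [("A", 2.0), ("C", 5.0)],
    "C": [("B", 5.0)],
}
movement = {("A", "B"): [("o1", 0, 5), ("o2", 10, 12)], ("B", "C"): [("o3", 0, 5)]}
edges = expansion_edges(graph, "q", r=3.0)
print(filter_candidates(movement, edges, tq=(4, 6)))  # {'o1'}
```

Only the candidates returned here would go on to the refinement step, where their qualification probabilities are computed.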
Temporal-continuous query
  – Only calculate the QPs (qualification probabilities) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
  – It is easy to prove that the actual QP is monotonic between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs
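Because the QP is monotonic between consecutive critical instants, its value at any intermediate time lies between the two neighbouring computed QPs. The following sketch shows how this gives bounds from the critical values alone. The critical instants and QP values are hypothetical inputs, and `qp_bounds` is an illustrative name.

```python
import bisect


def qp_bounds(critical_times, qps, t):
    """Lower/upper bound on the qualification probability (QP) at time t.

    critical_times: sorted critical instants along the possible path;
    qps: the QPs computed at those instants. Monotonicity between consecutive
    instants means the QP at t lies between the two neighbouring values.
    """
    if not critical_times[0] <= t <= critical_times[-1]:
        raise ValueError("t lies outside the critical-time range")
    k = bisect.bisect_right(critical_times, t) - 1
    if critical_times[k] == t:  # t is itself a critical instant: QP known exactly
        return qps[k], qps[k]
    a, b = qps[k], qps[k + 1]
    return min(a, b), max(a, b)


times = [0.0, 2.0, 5.0]  # hypothetical critical instants
qps = [0.1, 0.6, 0.3]    # hypothetical QPs computed at those instants
print(qp_bounds(times, qps, 3.0))  # (0.3, 0.6)
```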
Uncertainty – the flip side
› Data reduction: why transmit every single location point?
  › Bandwidth
  › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
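A plain recursive Douglas-Peucker line simplification illustrates the "define an acceptable error, save on data size" idea. This is the classic algorithm, not the faster variant from the cited Hershberger and Snoeyink paper. It treats a trajectory as 2D points and ignores timestamps. The sample track is hypothetical.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (or to a if a == b)."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points, epsilon):
    """Classic recursive Douglas-Peucker: keep the farthest point if it deviates
    more than epsilon from the line through the endpoints, then recurse."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, d_max = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > d_max:
            index, d_max = i, d
    if d_max <= epsilon:
        return [first, last]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


track = [(0, 0), (1, 0), (2, 3), (3, 0), (4, 0)]
print(douglas_peucker(track, 1.0))  # [(0, 0), (2, 3), (4, 0)]
```

Because time is ignored here, the simplified track may distort temporal properties of the trajectory, which is one of the open questions raised about reduction.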
Uncertainty – the other flip side
› Which important properties must not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – Is it just another "Z" axis? … one cannot travel back in time
  – How long may the data at the sink be "un-fresh"?
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
So far…
› Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
› Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
Merge the approaches
› Idea
  – Discretize time
  – For each time instant, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE 16(9): 1112-1127, 2004
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443
A highway example…
[Figure: position (pos) of a car on a highway over time (t), with a query window Q and the speed bounds vmin and vmax]
› What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution, the car is in Q with probability 0.5 at one time instant and with probability 0.3 at another
› Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
  – This violates the speed constraints
› Considering the dependency: 0.5
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes exist for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953): Stochastic Processes. Wiley
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts
[Figure: board with 9 holes; in the interior the ball moves left with 0.4, stays with 0.2, moves right with 0.4; at the border it stays with 0.4 and moves inward with 0.6]
› Initial position: hole 3
› After the first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After the second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After the 40th hit: ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
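How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$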
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
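The vector-matrix products above can be checked with a few lines of pure Python. The sketch builds the 9-bucket matrix M and propagates the start distribution one hit at a time. The first two results reproduce the slide's vectors. Over many hits the distribution approaches the stationary vector (0.08, 0.12, ..., 0.12, 0.08), which satisfies detailed balance (0.08 · 0.6 = 0.12 · 0.4). The function names are illustrative.

```python
def board_matrix(n=9):
    """Transition matrix of the n-bucket board (rows: from, columns: to)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[0][0], M[0][1] = 0.4, 0.6
        elif i == n - 1:
            M[i][i], M[i][i - 1] = 0.4, 0.6
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M


def step(dist, M):
    """One hit: row vector times matrix, dist' = dist * M."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


def propagate(dist, M, hits):
    for _ in range(hits):
        dist = step(dist, M)
    return dist


M = board_matrix()
start = [0.0, 0.0, 1.0] + [0.0] * 6  # ball starts in bucket 3
after_1 = propagate(start, M, 1)
after_2 = propagate(start, M, 2)
after_40 = propagate(start, M, 40)
print([round(p, 2) for p in after_1])  # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in after_2])  # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
print([round(p, 2) for p in after_40])
```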
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 seconds
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
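One common way to learn transition probabilities from historical data is the maximum-likelihood estimate: count the observed transitions s -> s' and normalize each row. The slides do not say which estimator is used, so the following is only a sketch under that assumption. It applies no smoothing and stores the matrix sparsely as nested dicts. Per-time-period or per-group matrices could be obtained by fitting on the corresponding subset of trajectories. The state names and history are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(trajectories):
    """Maximum-likelihood estimate P(s -> s') = count(s -> s') / count(s -> any).

    trajectories: iterable of state sequences sampled at one tick per step.
    Returns a sparse nested dict; unseen transitions are simply absent.
    """
    counts = defaultdict(Counter)
    for states in trajectories:
        for s, s_next in zip(states, states[1:]):
            counts[s][s_next] += 1
    return {
        s: {s_next: c / sum(row.values()) for s_next, c in row.items()}
        for s, row in counts.items()
    }


history = [["A", "B", "C"], ["A", "B", "A"], ["A", "C"]]  # hypothetical ticks
model = estimate_transitions(history)
print(model["B"])  # {'C': 0.5, 'A': 0.5}
```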
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2, s3 labelled with the transition probabilities of M, unrolled over t = 0, 1, 2, 3]
› Note: there is an exponential number of possible paths the car might take
› Starting in s1 at t = 0, forward propagation gives:
  – t = 0: (1, 0, 0)
  – t = 1: (1, 0, 0) · M = (0, 0, 1)
  – t = 2: (0, 0, 1) · M = (0, 0.8, 0.2)
  – t = 3: (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36)
› At t = 2 the car is in s2 (inside the window) with probability 0.8. From the remaining 0.2 in s3, it reaches s2 at t = 3 with probability 0.16 and stays in s3 with probability 0.04: (0, 0.16, 0.04). Result = 0.8 + 0.16 = 0.96
› A solution based on matrix multiplications introduces a new (absorbing) state s4 for the winner trajectories and two matrices:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0) \xrightarrow{M^+} (0, 0, 0.2, 0.8) \xrightarrow{M^+} (0, 0, 0.04, 0.96)$$
› Result = 0.96 (the probability mass in s4)
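The M^- / M^+ construction can be run directly in a pure-Python sketch. M^- is applied for steps that land before T, M^+ for steps that land inside T, and the mass collected in the absorbing state s4 is the answer. The function names are illustrative. The sketch assumes T starts at t >= 1, so the start state itself is never checked against the window.

```python
def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


# States s1, s2, s3 and an absorbing state s4 that collects the "hit" worlds.
M_MINUS = [  # steps landing outside T: the original chain, s4 absorbing
    [0.0, 0.0, 1.0, 0.0],
    [0.6, 0.0, 0.4, 0.0],
    [0.0, 0.8, 0.2, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
M_PLUS = [  # steps landing inside T: mass entering s1 or s2 is redirected to s4
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.4, 0.6],
    [0.0, 0.0, 0.2, 0.8],
    [0.0, 0.0, 0.0, 1.0],
]


def window_probability(start, t_begin, t_end):
    """P(object is in s1 or s2 at some t in [t_begin, t_end]); assumes t_begin >= 1."""
    v = list(start)
    for t in range(1, t_end + 1):
        v = vec_mat(v, M_PLUS if t >= t_begin else M_MINUS)
    return v[3]


print(round(window_probability([1.0, 0.0, 0.0, 0.0], 2, 3), 2))  # 0.96
```

Each step costs one (sparse) matrix-vector product, instead of enumerating the exponentially many paths.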
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, first with a single observation at t0, then with observations at t0 and t1]
Multiple Observations
› We need to track where the true hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class S_i^- corresponding to worlds where o is located in state s_i and o has not intersected the window
  – One class S_i^+ corresponding to worlds where o is located in state s_i and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3]
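Multiple Observations
› States ordered as S1-, S2-, S3- (window not yet intersected, ¬■) and S1+, S2+, S3+ (window intersected, ■)
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}$$
$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0, 0, 0) \xrightarrow{M^+} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^+} (0, 0, 0.04, 0.48, 0.16, 0.32)$$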
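› Now, what is the probability that the trajectory passes the query window, given the fact that the object was observed in s3 (at t = 3)?
Bayes' theorem, with ■ = "the trajectory intersects the window" and obs = "the object is observed in s3":
$$P(\blacksquare \mid \text{obs}) = \frac{P(\text{obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\text{obs})} = \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs})} = \frac{P(\text{obs} \wedge \blacksquare)}{P(\text{obs} \wedge \blacksquare) + P(\text{obs} \wedge \neg\blacksquare)}$$
With the distribution at t = 3 over (S1-, S2-, S3-, S1+, S2+, S3+) = (0, 0, 0.04, 0.48, 0.16, 0.32):
$$P(\blacksquare \mid \text{obs}) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$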
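The equivalent-worlds construction and the Bayes step can be reproduced in a pure-Python sketch. The 2|S| x 2|S| matrices M^- and M^+ are built from M and the window states s1 and s2. The sketch propagates over T = [2, 3] and then conditions on the observation "in s3 at t = 3". Names such as `expanded` are illustrative.

```python
M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
WINDOW = {0, 1}  # query region: states s1 and s2
N = len(M)


def expanded(inside_window):
    """2N x 2N matrix over S1-..SN- (indices 0..N-1) and S1+..SN+ (indices N..2N-1)."""
    big = [[0.0] * (2 * N) for _ in range(2 * N)]
    for i in range(N):
        for j in range(N):
            p = M[i][j]
            big[N + i][N + j] = p  # '+' worlds stay '+'
            hit = inside_window and j in WINDOW
            big[i][N + j if hit else j] = p  # '-' worlds turn '+' on entering the window within T
    return big


def vec_mat(v, A):
    """Row vector times matrix."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


M_MINUS, M_PLUS = expanded(False), expanded(True)
v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # t = 0: in s1, window not yet intersected
for t in range(1, 4):               # query interval T = [2, 3]
    v = vec_mat(v, M_PLUS if t >= 2 else M_MINUS)

p_hit = sum(v[N:])                              # P(window intersected), no conditioning
p_hit_given_s3 = v[N + 2] / (v[2] + v[N + 2])   # Bayes: condition on "in s3 at t = 3"
print([round(x, 2) for x in v])  # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]
print(round(p_hit_given_s3, 2))  # 0.89
```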
Summary
› Pros
  – Allows answering time-parametrized queries according to possible worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: R-tree over the time/location space, with leaf entries pointing to objects]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
72
Given two trajectory samples (ti pi) and (ti+1 pi+1) of a moving object a on road network the set of possible paths (PPi) between ti and ti+1 consists of all the paths along the routes(sequence of edges) that connect pi and pi+1 and whose minimum time costs are not greater than ti+1 - ti ie
Example if pdf of selecting a possible path is uniform
Given a path Pj PPi(a) the Possible Locations of a given moving object a with respect to Pj at t [ti ti+1] is the set of all the positions p Pj from which a can reach pi (respectively pi+1) within time period t - ti (respectively ti+1 - t)ie
73
Combine the two probabilitieshellip
Possible pathsPossible locations when t=4
a
m
Probability of falling inside segment mp1
0505 = 025 (at t = 4)
Probability of an object a falling inside some segment S at given time instant t
)(
))(Pr()Pr())(Pr(tPPp
p
a
tPLSpSta eg uniform pdf
74
ConstructionRepresentationrsaquo Given location-in-time samples to obtain an uncertain trajectory
representation we needndash The collection of all the possible paths (and probabilities)
ndash Probabilistic Location FunctionExample observe the two possiblesequences of going from position pi
to position pi+1
At the time-instant t = 4 the object can
be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4 )
75
Processing
rsaquo Data Structuresndash Edge hash-table
rsaquo Small in-memoryndash Movement R-tree
rsaquo 1D for each edgendash Trajectories List
rsaquo Store samples ordered by time-stamp
rsaquo Assign pointer to set of possible paths
rsaquo Earliest-arriving and Latest-departure stored for vertices
rsaquo On-disk retrieve on-demand
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
73
Combine the two probabilities…
[Figure: possible paths of object a and its possible locations at t = 4, with segment m]
Probability of falling inside segment m along path p1: 0.5 · 0.5 = 0.25 (at t = 4)
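Probability of an object a falling inside some segment S at a given time instant t:
$$\Pr\big(a(t) \in S\big) = \sum_{p \in P_a(t)} \Pr(p)\,\Pr\big(L_p(t) \in S\big)$$
where P_a(t) is the set of possible paths of a at time t and L_p(t) is the location along path p (e.g., with a uniform pdf).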
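A minimal sketch of the formula above follows. It assumes that each possible path p carries a probability Pr(p). It also assumes that, at the query time, the location along p is uniform over an interval of offsets (the "e.g., uniform pdf" case). Under these assumptions, Pr(a(t) ∈ S) is a probability-weighted overlap. `PossiblePath`, `prob_in_segment`, and the numbers are hypothetical; they were chosen to reproduce the 0.5 · 0.5 = 0.25 example.

```python
from dataclasses import dataclass, field


@dataclass
class PossiblePath:
    prob: float  # Pr(p): probability that the object follows this path
    lo: float    # at the query time the location along the path is
    hi: float    # assumed uniform on the offset interval [lo, hi]
    # hypothetical: segment id -> (start, end) offsets of that segment on this path
    segments: dict = field(default_factory=dict)


def prob_in_segment(paths, seg):
    """Pr(a(t) in S) = sum over possible paths p of Pr(p) * Pr(L_p(t) in S)."""
    total = 0.0
    for p in paths:
        if seg not in p.segments:
            continue  # this path does not traverse S
        start, end = p.segments[seg]
        overlap = max(0.0, min(end, p.hi) - max(start, p.lo))
        total += p.prob * overlap / (p.hi - p.lo)  # uniform pdf on [lo, hi]
    return total


# Two equally likely paths; at t = 4 the location on p1 is uniform on [0, 4],
# and segment m covers offsets [2, 4] of p1 only.
paths = [
    PossiblePath(prob=0.5, lo=0.0, hi=4.0, segments={"m": (2.0, 4.0)}),
    PossiblePath(prob=0.5, lo=0.0, hi=3.0, segments={"n": (0.0, 1.0)}),
]
result = prob_in_segment(paths, "m")
print(result)  # 0.25
```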
74
Construction/Representation
› Given location-in-time samples, to obtain an uncertain trajectory representation we need
  – the collection of all possible paths (and their probabilities)
  – a probabilistic location function
Example: observe the two possible sequences of going from position p_i to position p_i+1. At time instant t = 4 the object can be in certain segments either along the edge v1v5 or along the edge(s) v1v3 (+ v3v4).
75
Processing
› Data structures
  – Edge hash table
    › small, in memory
  – Movement R-tree
    › 1D, one for each edge
  – Trajectories list
    › stores samples ordered by timestamp
    › assigns a pointer to the set of possible paths
    › earliest-arrival and latest-departure times are stored for vertices
    › on disk, retrieved on demand
76
› Network expansion
  – Expansion tree: a tree rooted at q that contains the edges whose network distance to q is smaller than r
› Filtering
  – Fetch the movement R-trees of the edges in the expansion tree
  – Search for leaf entries whose maximum time interval intersects t_q
  – The objects pointed to by those leaf entries are candidates
› Refinement
  – The candidate set is considerably smaller than the original trajectory dataset
  – Calculate the qualification probability for each candidate
[Figure: expansion tree "bubbling" from q along road-network segments (vertices A, B, C, D, E), annotated with earliest/latest arrival times; one possible path is definitely not part of the answer to the range query]
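The following sketches the expansion and filtering steps. It assumes an undirected road graph stored as adjacency lists with edge lengths. Dijkstra's algorithm, cut off at network distance r, plays the role of the expansion. An edge counts as "in range" when its nearer endpoint is within r. For illustration, plain per-edge lists of (object id, enter time, leave time) stand in for the 1D movement R-trees. The graph, the index contents, and all function names are hypothetical, and refinement (computing qualification probabilities) is not shown.

```python
import heapq


def network_distances(graph, q, r):
    """Dijkstra from q that never records a vertex at network distance >= r."""
    dist = {q: 0.0}
    heap = [(0.0, q)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < r and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def expansion_edges(graph, q, r):
    """Edges with at least one endpoint within network distance r of q."""
    dist = network_distances(graph, q, r)
    return {tuple(sorted((u, v))) for u in dist for v, _ in graph[u]}


def filter_candidates(movement_index, edges, tq):
    """Objects whose (enter, leave) interval on a candidate edge intersects tq."""
    t_lo, t_hi = tq
    return {
        oid
        for e in edges
        for oid, enter, leave in movement_index.get(e, [])
        if enter <= t_hi and leave >= t_lo
    }


graph = {
    "q": [("A", 2), ("B", 5)],
    "A": [("q", 2), ("C", 3)],
    "B": [("q", 5), ("D", 4)],
    "C": [("A", 3), ("E", 6)],
    "D": [("B", 4)],
    "E": [("C", 6)],
}
movement_index = {
    ("A", "q"): [("o1", 0, 3), ("o2", 8, 9)],
    ("C", "E"): [("o3", 2, 6)],
    ("B", "D"): [("o4", 7, 12)],
}
edges = expansion_edges(graph, "q", 6)
candidates = filter_candidates(movement_index, edges, (2, 5))
print(sorted(candidates))  # ['o1', 'o3']
```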
77
Temporal-continuous query
  – Only calculate the qualification probabilities (QPs) for a few critical time instants: the earliest arrival time and the latest departure time at each vertex along the possible path
  – It is easy to prove that the actual QP is monotonic between consecutive critical time instants
  – The QPs at the critical time instants define an envelope function for the actual QPs
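Because the QP is monotonic between consecutive critical instants, its value at any time t lies between the QPs at the two neighbouring critical instants. The sketch below turns that envelope into bounds. The critical instants and QP values are hypothetical.

```python
import bisect


def qp_bounds(critical_times, critical_qps, t):
    """Lower/upper bound on the qualification probability at time t.

    Assumes critical_times is sorted and that the QP is monotonic between
    consecutive critical instants, so it lies between the neighbouring values.
    """
    i = bisect.bisect_left(critical_times, t)
    if i < len(critical_times) and critical_times[i] == t:
        qp = critical_qps[i]
        return qp, qp  # exact value at a critical instant
    if i == 0 or i == len(critical_times):
        raise ValueError("t lies outside the critical instants")
    a, b = critical_qps[i - 1], critical_qps[i]
    return min(a, b), max(a, b)


times = [0, 4, 7, 10]  # hypothetical earliest-arrival / latest-departure instants
qps = [0.0, 0.6, 0.3, 0.3]
print(qp_bounds(times, qps, 5))  # (0.3, 0.6)
print(qp_bounds(times, qps, 4))  # (0.6, 0.6)
```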
78
Uncertainty – the flip side
› Data reduction
  – Why transmit every single location point?
    › Bandwidth
    › Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink. "Speeding up the Douglas-Peucker Line-simplification Algorithm". SSHD 1997
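Below is the classic recursive Douglas-Peucker simplification. It is not the faster Hershberger-Snoeyink variant cited above, and it serves as a sketch of "define an acceptable error, save on data size". A point is dropped only when every discarded point lies within `eps` of the simplified segment. The track is hypothetical.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points, eps):
    """Keep the endpoints; recurse on the farthest point if it exceeds eps."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    idx, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], a, b)
        if d > dmax:
            idx, dmax = i, d
    if dmax <= eps:
        return [a, b]  # all intermediate points are within the error bound
    left = douglas_peucker(points[: idx + 1], eps)
    right = douglas_peucker(points[idx:], eps)
    return left[:-1] + right  # points[idx] appears once


track = [(0, 0), (1, 0.01), (2, 0), (2, 1), (2, 2)]
print(douglas_peucker(track, 0.1))  # [(0, 0), (2, 0), (2, 2)]
```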
79
Uncertainty – the other flip side
› What are the important properties that should not be distorted by the reduction?
  – Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
  – It is not just another "Z" axis… one cannot travel back in time
  – How long may the data at the sink be allowed to become "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
› Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
› Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
82
Merge the approaches
› Idea
  – Discretize time
  – For each time instant, a pdf (pmf) over the possible positions is given
› Examples
  – Nearest neighbour queries on uncertain moving objects
  – Similarity search on uncertain time series
› Problem: the independence assumption prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443
83
A highway example…
[Figure: position (pos) of a car over time (t), with a query window Q]
What is the probability that the car is in some area at least once during some time interval?
84
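A highway example…
[Figure: pos over t with query window Q; the reachable positions are bounded by the minimum and maximum speeds vmin and vmax]
What is the probability that the car is in some area at least once during some time interval?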
85
A highway example…
[Figure: pos over t; the region between the vmin and vmax bounds, with query window Q]
Assuming a uniform distribution…
86
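A highway example…
[Figure: pos over t with query window Q; the car is inside Q with probability 50% at one time instant and 30% at another]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65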
87
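A highway example…
[Figure: pos over t with query window Q; inside-Q probabilities 50% and 30% at the two time instants]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Problem: combining the time instants independently admits worlds that violate the speed constraints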
88
A highway example…
[Figure: pos over t with query window Q; inside-Q probabilities 50% and 30% at the two time instants]
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Taking the dependency into account: 0.5
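The gap between 0.65 and 0.5 follows from inclusion-exclusion, Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B). The two answers differ only in how Pr(A ∩ B) is obtained. The independent case multiplies the marginals. The dependent case below assumes, for illustration only, that every world inside Q at the second instant is also inside Q at the first. That assumption is one way the speed constraints could produce the slide's 0.5.

```python
def union_prob(p_a, p_b, p_both):
    """Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B) (inclusion-exclusion)."""
    return p_a + p_b - p_both


p_t1, p_t2 = 0.5, 0.3  # probability of being inside Q at each of the two instants

# Independence assumption: Pr(A and B) = Pr(A) * Pr(B)
independent = union_prob(p_t1, p_t2, p_t1 * p_t2)

# Assumed dependency: the worlds inside Q at t2 are a subset of those inside Q at t1,
# so Pr(A and B) = Pr(B)
dependent = union_prob(p_t1, p_t2, p_t2)

print(round(independent, 2), round(dependent, 2))  # 0.65 0.5
```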
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› They are a sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes exist for different settings
  – Discrete time + discrete space (e.g., Markov chain)
  – Discrete time + continuous space (e.g., Harris chain)
  – Continuous time + discrete space (e.g., Markov process)
  – Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley
91
A simple example
› Whenever the wooden board is hit, the ball stays in its hole or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board these probabilities are different
› This model is usually learned or given by experts
[Figure: transition probabilities 0.4 / 0.2 / 0.4 (left / stay / right) for an interior hole; 0.4 / 0.6 (stay / inward) at the border]
92
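A simple example
› Initial position: the ball is in the third hole
› After the first hit: 0.4 / 0.2 / 0.4 over holes 2 to 4
› After the second hit: 0.16 / 0.16 / 0.36 / 0.16 / 0.16 over holes 1 to 5
› After the 40th hit: 0.08 / 0.12 / 0.12 / 0.12 / 0.12 / 0.12 / 0.12 / 0.12 / 0.08 over all nine holes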
93
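How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$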
94
How can we model this?
› First hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› Second hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M² = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› 40th hit: (0, 0, 1, 0, 0, 0, 0, 0, 0) · M⁴⁰ ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
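The propagation above can be reproduced with plain row-vector/matrix products. The sketch uses plain Python to avoid dependencies. `board_matrix` is a hypothetical helper that builds M from the interior (0.4 / 0.2 / 0.4) and border (0.4 / 0.6) probabilities.

```python
def step(dist, M):
    """One hit of the board: row vector times transition matrix (rows = from, cols = to)."""
    n = len(M)
    return [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]


def propagate(dist, M, k):
    """Distribution over buckets after k hits."""
    for _ in range(k):
        dist = step(dist, M)
    return dist


def board_matrix(n=9, move=0.4, stay=0.2, border_stay=0.4, border_move=0.6):
    """Tridiagonal transition matrix of the wooden-board example."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[0][0], M[0][1] = border_stay, border_move
        elif i == n - 1:
            M[i][i], M[i][i - 1] = border_stay, border_move
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = move, stay, move
    return M


M = board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]  # the ball starts in the third bucket
first = propagate(start, M, 1)
second = propagate(start, M, 2)
after40 = propagate(start, M, 40)
print([round(x, 2) for x in first])   # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in second])  # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
```

The vector (0.08, 0.12, …, 0.12, 0.08) is left unchanged by `step`, i.e., it is the stationary distribution of this chain. `after40` approaches it as the number of hits grows.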
95
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
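The slide leaves open how the transition probabilities are learned. A simple option is to count the observed state-to-state moves per tick in historical trajectories and normalise each row; that is the maximum-likelihood estimate for a Markov chain. The sketch below shows it under that assumption. A dict of dicts keeps the matrix sparse. The trajectories are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(trajectories):
    """Count state-to-state moves per tick and normalise each row (sparse dict of dicts)."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    return {
        a: {b: c / sum(row.values()) for b, c in row.items()}
        for a, row in counts.items()
    }


history = [  # hypothetical state sequences, one state per tick
    ["s1", "s3", "s2", "s1"],
    ["s1", "s3", "s3", "s2"],
    ["s2", "s3", "s2"],
]
M_hat = estimate_transitions(history)
print(M_hat["s3"])  # {'s2': 0.75, 's3': 0.25}
```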
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
97
ST-Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph over s1, s2, s3 with transitions s1→s3 (1.0), s2→s1 (0.6), s2→s3 (0.4), s3→s2 (0.8), s3→s3 (0.2)]
Note: there is an exponential number of possible paths the car might take
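To make the "exponential number of possible paths" concrete, the sketch below answers the query by brute-force enumeration of possible worlds. It visits all |S|^horizon state sequences, sums the probabilities of those that hit the window, and starts from s1 at t = 0. This is a naive baseline for comparison, not the method of the slides.

```python
from itertools import product

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
WINDOW_STATES = {0, 1}  # s1, s2 (0-based)
WINDOW_TIMES = {2, 3}   # T = [2, 3]


def window_prob_bruteforce(M, start, horizon):
    """Sum Pr(path) over all state sequences that are in the window at some t in T."""
    n = len(M)
    total, possible = 0.0, 0
    for path in product(range(n), repeat=horizon):  # n ** horizon sequences
        prob, prev = 1.0, start
        for s in path:  # path[t - 1] is the state at time t
            prob *= M[prev][s]
            prev = s
        if prob == 0.0:
            continue  # impossible world
        possible += 1
        if any(path[t - 1] in WINDOW_STATES for t in WINDOW_TIMES):
            total += prob
    return total, possible


p, n_paths = window_prob_bruteforce(M, start=0, horizon=3)
print(round(p, 2), n_paths)  # 0.96 4
```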
101
ST-Window Queries
› Given the states and transition probabilities above, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
  – t = 0: the car starts in s1, i.e., (1, 0, 0)
  – t = 1: (1, 0, 0) · M = (0, 0, 1)
  – t = 2: (0, 0, 1) · M = (0, 0.8, 0.2); with probability 0.8 the car is in s2, i.e., inside the window
  – t = 3: plain propagation gives (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36), but this would count the worlds that already hit the window at t = 2 a second time; propagating only the remaining mass (0, 0, 0.2) gives (0, 0.16, 0.04)
  – Result = 0.8 + 0.16 = 0.96
› A solution based on matrix multiplications introduces a new state for the "winner" trajectories and two matrices
103
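ST-Window Queries
› Add an absorbing state s4 that collects the "winner" trajectories, i.e., those that have already hit the window:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad
M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
› M⁻ is applied for steps into time instants outside T, and M⁺ (which redirects transitions into s1 or s2 to s4) for steps into T:
  – t = 1: (1, 0, 0, 0) · M⁻ = (0, 0, 1, 0)
  – t = 2: (0, 0, 1, 0) · M⁺ = (0, 0, 0.2, 0.8)
  – t = 3: (0, 0, 0.2, 0.8) · M⁺ = (0, 0, 0.04, 0.96), so the result 0.96 is the mass in s4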
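The M⁻/M⁺ scheme replaces path enumeration with one vector-matrix product per tick. The minimal sketch below applies M⁺ at the time instants in T and M⁻ elsewhere, with the matrices copied from the slide. `vec_mat` and `window_prob` are hypothetical helper names.

```python
def vec_mat(v, A):
    """Row vector v times matrix A."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


# States s1, s2, s3 and the absorbing "already hit" state s4 (0-based indices 0..3)
M_minus = [[0.0, 0.0, 1.0, 0.0],
           [0.6, 0.0, 0.4, 0.0],
           [0.0, 0.8, 0.2, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
M_plus = [[0.0, 0.0, 1.0, 0.0],   # moves into s1/s2 are redirected to s4
          [0.0, 0.0, 0.4, 0.6],
          [0.0, 0.0, 0.2, 0.8],
          [0.0, 0.0, 0.0, 1.0]]


def window_prob(start_vec, window_times, horizon):
    """Use M_plus for steps into a window time instant, M_minus otherwise."""
    v = start_vec
    for t in range(1, horizon + 1):
        v = vec_mat(v, M_plus if t in window_times else M_minus)
    return v


final = window_prob([1.0, 0.0, 0.0, 0.0], window_times={2, 3}, horizon=3)
print([round(x, 2) for x in final])  # [0.0, 0.0, 0.04, 0.96]
```

The cost is linear in the number of ticks, and with sparse matrices each product touches only the non-zero transitions.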
104
Multiple Observations
› So far we had only one observation, from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques
[Figure: location space over time; left: a single observation at t0; right: observations at t0 and t1]
105
Multiple Observations
› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class S_i⁻ corresponding to worlds where o is located in state s_i and o has not intersected the window
  – One class S_i⁺ corresponding to worlds where o is located in state s_i and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: the chain M over states s1, s2, s3, unrolled over t = 0, …, 3]
106
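Multiple Observations
› States in the order (S1⁻, S2⁻, S3⁻, S1⁺, S2⁺, S3⁺), where ⁻ marks worlds that have not yet intersected the window and ⁺ marks worlds that have:
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad
M^+ = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix}, \qquad
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
› Propagation:
  – t = 0: (1, 0, 0, 0, 0, 0)
  – t = 1: · M⁻ gives (0, 0, 1, 0, 0, 0)
  – t = 2: · M⁺ gives (0, 0, 0.2, 0, 0.8, 0)
  – t = 3: · M⁺ gives (0, 0, 0.04, 0.48, 0.16, 0.32)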
107
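Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given the fact that the object was seen in s3 (at t = 3)?
Let H be the event that the trajectory intersects the window and O the observation:
$$P(H \mid O) = \frac{P(O \mid H)\,P(H)}{P(O)} = \frac{P(H \wedge O)}{P(O)} = \frac{P(O \wedge H)}{P(O \wedge H) + P(O \wedge \neg H)}$$
With the distribution at t = 3 over (S1⁻, S2⁻, S3⁻, S1⁺, S2⁺, S3⁺) = (0, 0, 0.04, 0.48, 0.16, 0.32), P(O ∧ H) = 0.32 (mass in S3⁺) and P(O ∧ ¬H) = 0.04 (mass in S3⁻):
$$P(H \mid O) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$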
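The sketch below combines the 2|S|-state propagation with the Bayes step. It builds M⁻ as block-diag(M, M), since no world changes class outside T. It builds M⁺ so that a "⁻" world moving into a window state becomes a "⁺" world. After propagating, it conditions on the second observation (s3 at t = 3). The construction follows the class definitions above, and the helper names are hypothetical.

```python
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
WINDOW = {0, 1}  # s1, s2 (0-based)
N = len(M)


def vec_mat(v, A):
    """Row vector v times matrix A."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]


# State order: S1-, S2-, S3- (window not yet hit), then S1+, S2+, S3+ (window hit).
# M_minus = block-diag(M, M): outside T no world changes its class.
M_minus = [[M[i % N][j % N] if i // N == j // N else 0.0 for j in range(2 * N)]
           for i in range(2 * N)]

# M_plus: inside T, a "-" world that moves into a window state becomes a "+" world;
# "+" worlds stay "+".
M_plus = [[0.0] * (2 * N) for _ in range(2 * N)]
for i in range(N):
    for j in range(N):
        M_plus[i][j + N if j in WINDOW else j] += M[i][j]
        M_plus[i + N][j + N] += M[i][j]

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # first observation: s1 at t = 0
for t in (1, 2, 3):
    v = vec_mat(v, M_plus if t in (2, 3) else M_minus)
v3 = v

# Second observation: the object is seen in s3 at t = 3 (index 2 within each class)
p_obs_and_hit, p_obs_and_not_hit = v3[N + 2], v3[2]
posterior = p_obs_and_hit / (p_obs_and_hit + p_obs_and_not_hit)
print([round(x, 2) for x in v3])  # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]
print(round(posterior, 2))        # 0.89
```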
108
Summary
› Pros
  – Allows answering time-parametrized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: uncertain object P_oid in time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently high number of samples
  – Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform to the observations?
› Solution: adaptation of transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
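As a rough illustration of Monte-Carlo estimation on the ST-window example, the sketch draws trajectories from the Markov chain and reports the fraction that hit the window. To respect a later observation, it uses naive rejection sampling, which simply discards samples that disagree with the observation. This is a swapped-in baseline, not the transition-matrix adaptation cited above, and it wastes every rejected sample. The seed and sample sizes are arbitrary choices.

```python
import random

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
WINDOW_STATES, WINDOW_TIMES = {0, 1}, {2, 3}  # s1 or s2 during T = [2, 3]


def sample_path(rng, start, horizon):
    """Draw one possible trajectory (states at t = 1..horizon) from the chain."""
    path, s = [], start
    for _ in range(horizon):
        s = rng.choices(range(len(M)), weights=M[s])[0]
        path.append(s)
    return path


def estimate(n_samples, observed=None, seed=0):
    """Fraction of accepted samples that hit the window.

    observed: optional (t, state); samples that disagree with it are rejected
    (naive rejection sampling).
    """
    rng = random.Random(seed)
    hits = accepted = 0
    for _ in range(n_samples):
        path = sample_path(rng, start=0, horizon=3)
        if observed is not None and path[observed[0] - 1] != observed[1]:
            continue
        accepted += 1
        hits += any(path[t - 1] in WINDOW_STATES for t in WINDOW_TIMES)
    return hits / accepted


p_hit = estimate(20000)                                  # close to the exact 0.96
p_hit_given_obs = estimate(50000, observed=(3, 2))       # close to 0.32 / 0.36
print(round(p_hit, 3), round(p_hit_given_obs, 3))
```

In this toy example about 64% of the conditioned samples are thrown away, since the object is in s3 at t = 3 with probability 0.36. Adapting the transition matrices so that every sample conforms to the observations avoids that waste.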
75
Processing
rsaquo Data Structuresndash Edge hash-table
rsaquo Small in-memoryndash Movement R-tree
rsaquo 1D for each edgendash Trajectories List
rsaquo Store samples ordered by time-stamp
rsaquo Assign pointer to set of possible paths
rsaquo Earliest-arriving and Latest-departure stored for vertices
rsaquo On-disk retrieve on-demand
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty – the flip side
› Data reduction: why transmit every single location point?
– Bandwidth
– Energy
› Adapt spatial/map approaches: define an acceptable error and save on data size
John Hershberger, Jack Snoeyink: "Speeding up Douglas-Peucker Line-simplification Algorithm", SSHD 1997
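For reference, the classic recursive Douglas-Peucker simplification can be sketched as follows. This is the textbook algorithm, not the faster variant of Hershberger and Snoeyink cited above; `epsilon` plays the role of the acceptable error, and the sample track is made up.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (to a itself if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * px - dx * py + bx * ay - by * ax) / math.hypot(dx, dy)

def douglas_peucker(points, epsilon):
    """Keep an interior point only if dropping it would cause an error above epsilon."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the segment joining the two endpoints.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves; keep it once

track = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7)]
simplified = douglas_peucker(track, epsilon=1.0)
```

With `epsilon=1.0` the six samples shrink to four, which is exactly the "define an acceptable error, save on data size" trade-off.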
79
Uncertainty – the other flip side
› Which important properties must not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– Is it not just another "Z" axis?… One cannot travel back in time
– How long may the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far…
› Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
› Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
82
Merge the approaches
› Idea
– Discretize time
– For each time instant, a pdf (pmf) over the possible positions is given
› Examples
– Nearest-neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: assuming independence between the positions at different time instants prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments." IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.
J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. "Probabilistic Similarity Search for Uncertain Time Series." SSDBM 2009, pp. 435–443.
83
A highway example…
Figure: position (pos) of a car on a highway over time (t), with a query window Q and the speed bounds vmin and vmax.
› What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution over the possible positions, Q covers 50% of them at one time instant and 30% at another
› Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
– This violates the speed constraints
› Considering the dependency between the two time instants: 0.5
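The gap between the two numbers can be reproduced with explicit possible worlds. In this Python sketch the world probabilities are hypothetical, chosen only to match the marginals above (0.5 and 0.3) under the assumption that the speed constraints force every world that hits Q at the second instant to also hit it at the first.

```python
def p_hit_independent(marginals):
    """At-least-once probability if the time instants were independent: 1 - prod(1 - p)."""
    miss = 1.0
    for p in marginals:
        miss *= 1.0 - p
    return 1.0 - miss

def p_hit_worlds(worlds):
    """Exact at-least-once probability over explicit possible worlds.

    worlds: list of (probability, hit_at_t1, hit_at_t2) tuples.
    """
    return sum(prob for prob, *hits in worlds if any(hits))

# Hypothetical worlds: P(hit at t1) = 0.5, P(hit at t2) = 0.3, and every t2-hit world
# also hits at t1 (the assumed effect of the speed constraints).
worlds = [(0.3, True, True), (0.2, True, False), (0.5, False, False)]
marginals = [sum(p for p, h1, _ in worlds if h1), sum(p for p, _, h2 in worlds if h2)]
independent = p_hit_independent(marginals)  # about 0.65
exact = p_hit_worlds(worlds)                # 0.5
```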
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes exist for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
J. L. Doob. "Stochastic Processes." Wiley, 1953.
91
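A simple example
› Whenever the wooden board is hit, the ball stays in its hole or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
Figure: in the interior, the ball moves to each neighbouring hole with probability 0.4 and stays with probability 0.2; at the border, it stays with probability 0.4 and moves inward with probability 0.6.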
92
A simple example
› Initial position: bucket 3
› After first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After 40th hit: ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
93
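How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket; columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$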
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M^{2} = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
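The propagation above is a repeated row-vector/matrix product. In this pure-Python sketch, bucket indices are 0-based, so bucket 3 is index 2. The code checks the first two hits against the slide and checks that the slide's 40-hit vector is a fixed point of M (the stationary distribution). It computes the vector after 40 hits but does not assert its exact values.

```python
def board_matrix(n=9):
    """Board example: interior holes move left/right with 0.4 each and stay with 0.2;
    border holes stay with 0.4 and move inward with 0.6 (rows = from, columns = to)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = 0.4, 0.2, 0.4
    m[0][0], m[0][1] = 0.4, 0.6
    m[n - 1][n - 2], m[n - 1][n - 1] = 0.6, 0.4
    return m

def step(dist, m):
    """One hit: multiply the row vector dist by the transition matrix m."""
    n = len(m)
    return [sum(dist[i] * m[i][j] for i in range(n)) for j in range(n)]

M = board_matrix()
x0 = [0.0] * 9
x0[2] = 1.0  # the ball starts in bucket 3 (0-based index 2)
history = [x0]
for _ in range(40):
    history.append(step(history[-1], M))

stationary = [0.08] + [0.12] * 7 + [0.08]  # the vector shown for the 40th hit
```

Because only neighbouring buckets are connected, M is very sparse; a real implementation would store it in a sparse format.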
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to one tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
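One simple way to learn transition probabilities from historical data is maximum-likelihood counting of the observed state-to-state moves per tick, stored sparsely. This sketch rests on that assumption (the slides do not name an estimator), and the trajectories are hypothetical. A time- or group-dependent matrix could be obtained by calling the function separately per time period or object group.

```python
from collections import Counter, defaultdict

def estimate_transitions(trajectories):
    """Maximum-likelihood transition probabilities from state sequences (one state per tick).

    Returns a sparse matrix {from_state: {to_state: probability}}; states that are
    never left in the data get no row.
    """
    counts = defaultdict(Counter)
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}

history = [["s1", "s3", "s2", "s1"],
           ["s2", "s3", "s3", "s2"],
           ["s3", "s2", "s3"]]
M_hat = estimate_transitions(history)
```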
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. "Querying uncertain spatio-temporal data." In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
Figure: transition graph over the states s1, s2, s3, unrolled over t = 0, 1, 2, 3.
› Note: there is an exponential number of possible paths the car might take
› Starting in s1 at t = 0, i.e., from (1, 0, 0), the state distribution is (0, 0, 1) at t = 1, (0, 0.8, 0.2) at t = 2, and (0.48, 0.16, 0.36) at t = 3
› Worlds that are already in the window at t = 2 (mass 0.8) need not be tracked further; propagating only the remaining mass (0, 0, 0.2) gives (0, 0.16, 0.04) at t = 3, of which only 0.04 never enters the window
› Result = 1 - 0.04 = 0.96
› A solution based on matrix multiplications introduces a new absorbing state for the "winner" trajectories (those that have hit the window) and two matrices
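To make the exponential cost concrete, the query can be answered by enumerating every path (possible world) and summing the probabilities of those that visit s1 or s2 at t = 2 or t = 3. This brute-force Python sketch uses 0-based state indices (s1 = 0). It is only feasible for tiny horizons because it inspects |S|^horizon paths.

```python
from itertools import product

# Transition matrix of the example (rows = from, columns = to); s1, s2, s3 are indices 0, 1, 2.
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def window_prob_bruteforce(M, start, window_states, window_times, horizon):
    """Sum the probabilities of all paths that are in window_states at some time in window_times."""
    total = 0.0
    for path in product(range(len(M)), repeat=horizon):  # states at t = 1..horizon
        p, prev = 1.0, start
        hit = (0 in window_times) and (start in window_states)
        for t, s in enumerate(path, start=1):
            p *= M[prev][s]
            prev = s
            hit = hit or (t in window_times and s in window_states)
        if hit:
            total += p
    return total

result = window_prob_bruteforce(M, start=0, window_states={0, 1}, window_times={2, 3}, horizon=3)
```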
103
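ST - Window Queries
States s1, s2, s3 plus the new state s4 (window already hit); M⁻ is applied for transitions into time instants outside T, and M⁺ for transitions into time instants inside T:
$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$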
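The two matrices can be derived mechanically from M: M⁻ keeps M and adds the absorbing state s4, while M⁺ also redirects every transition into a window state to s4. In this Python sketch (0-based indices, helper names hypothetical), M⁺ is applied at time instants inside T and M⁻ elsewhere. A start state that already lies in the window at t = 0 is treated as a hit.

```python
def vec_mat(x, m):
    """Row vector times matrix."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]

def augment(M, window_states):
    """Build M_minus and M_plus with one extra absorbing state (index n) for 'window already hit'."""
    n = len(M)
    minus = [row[:] + [0.0] for row in M] + [[0.0] * n + [1.0]]
    plus = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            # Entering a window state during T makes the world a hit: redirect to state n.
            plus[i][n if j in window_states else j] += M[i][j]
    plus[n][n] = 1.0
    return minus, plus

def window_query(M, start, window_states, window_times, horizon):
    """Return the state vectors for t = 0..horizon; the last entry of the final vector is the answer."""
    minus, plus = augment(M, window_states)
    x = [0.0] * (len(M) + 1)
    hit_at_start = 0 in window_times and start in window_states
    x[len(M) if hit_at_start else start] = 1.0
    trace = [x]
    for t in range(1, horizon + 1):
        x = vec_mat(x, plus if t in window_times else minus)
        trace.append(x)
    return trace

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
M_minus, M_plus = augment(M, window_states={0, 1})
trace = window_query(M, start=0, window_states={0, 1}, window_times={2, 3}, horizon=3)
```

The cost is one (sparse) vector-matrix product per time step, independent of the number of paths.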
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
Figure: location space over time with a single observation at t0 (left) and with two observations at t0 and t1 (right).
105
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to the worlds where o is located in state $s_i$ and has not intersected the window
– One class $S_i^+$ corresponding to the worlds where o is located in state $s_i$ and has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
Figure: states s1, s2, s3 unrolled over t = 0, 1, 2, 3.
106
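Multiple Observations
Class vectors in the order $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$, where the $S^-$ classes have not hit the window ($\neg\blacksquare$) and the $S^+$ classes have ($\blacksquare$):
$$t=0: (1, 0, 0, 0, 0, 0) \quad t=1: (0, 0, 1, 0, 0, 0) \quad t=2: (0, 0, 0.2, 0, 0.8, 0) \quad t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)$$
$$M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$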
107
Bayes' Theorem
› Now, what is the probability that the trajectory passes through the query window, given that the object was observed in s3 at t = 3?
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\,P(\blacksquare)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)} = \frac{0.32}{0.32 + 0.04} \approx 0.89$$
using the t = 3 vector $(0, 0, 0.04, 0.48, 0.16, 0.32)$ over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$.
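The forward computation over 2|S| classes, followed by conditioning on the second observation, fits in a short Python sketch. The first |S| entries of each vector are the S⁻ classes and the last |S| are the S⁺ classes (0-based); the function and variable names are hypothetical.

```python
def vec_mat(x, m):
    """Row vector times matrix."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]

def class_matrices(M, window_states):
    """Transition matrices over 2|S| classes: indices 0..n-1 are S^- (window not hit yet),
    indices n..2n-1 are S^+ (window hit)."""
    n = len(M)
    minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            minus[i][j] += p               # outside T: S^- stays S^-
            minus[n + i][n + j] += p       # ... and S^+ stays S^+
            plus[i][(n + j) if j in window_states else j] += p  # entering the window: S^- -> S^+
            plus[n + i][n + j] += p        # once hit, always hit
    return minus, plus

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
n = len(M)
M_minus, M_plus = class_matrices(M, window_states={0, 1})
x = [1.0] + [0.0] * (2 * n - 1)  # first observation: s1 at t = 0, window not hit
for t in (1, 2, 3):
    x = vec_mat(x, M_plus if t in {2, 3} else M_minus)
# Second observation: s3 (index 2) at t = 3; Bayes: P(hit | s3) = P(S3+) / (P(S3+) + P(S3-)).
p_hit_given_s3 = x[n + 2] / (x[n + 2] + x[2])
```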
108
Summary
› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most x% of the possible trajectories of o may intersect Q)
Figure: R-tree over the time × location space.
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. "Indexing uncertain spatio-temporal data." In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte Carlo sampling
– Draw a sufficiently large number of samples
– Approximate the result probability by the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. "Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories." PVLDB 7(3), 2013, pp. 205–216.
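A Monte Carlo sketch for the running window-query example. Plain forward sampling from M ignores any later observation. To draw samples that conform with an observation at the end of the horizon, the code conditions the chain on that endpoint: it reweights each transition by the probability of still reaching the observed state (backward probabilities). That reweighting is one standard construction, offered here as an assumption and not as the exact method of [NZERMCK13a]. Window queries do not need sampling; the example is used only because its exact answers are known from the previous slides: 0.96 without the observation, and 0.32/0.36 ≈ 0.89 given the observation in s3 at t = 3.

```python
import random

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def adapted_matrices(M, obs_state, horizon):
    """Per-step transition matrices of the chain conditioned on X_horizon == obs_state.

    beta[t][i] = P(X_horizon = obs_state | X_t = i); the step t -> t + 1 uses rows
    proportional to M[i][j] * beta[t + 1][j].
    """
    n = len(M)
    beta = [[0.0] * n for _ in range(horizon + 1)]
    beta[horizon][obs_state] = 1.0
    for t in range(horizon - 1, -1, -1):
        beta[t] = [sum(M[i][j] * beta[t + 1][j] for j in range(n)) for i in range(n)]
    steps = []
    for t in range(horizon):
        rows = []
        for i in range(n):
            w = [M[i][j] * beta[t + 1][j] for j in range(n)]
            z = sum(w)
            # States that cannot reach the observation get a zero row; they are never
            # visited as long as the start state can reach the observation.
            rows.append([v / z for v in w] if z > 0 else [0.0] * n)
        steps.append(rows)
    return steps

def mc_window_prob(step_matrices, start, window_states, window_times, n_samples, seed=0):
    """Fraction of sampled paths that are in window_states at some time in window_times."""
    rng = random.Random(seed)
    states = range(len(step_matrices[0]))
    hits = 0
    for _ in range(n_samples):
        s, hit = start, False
        for t, m in enumerate(step_matrices, start=1):
            s = rng.choices(states, weights=m[s])[0]
            hit = hit or (t in window_times and s in window_states)
        hits += int(hit)
    return hits / n_samples

unconditioned = mc_window_prob([M] * 3, 0, {0, 1}, {2, 3}, n_samples=20000)
conditioned = mc_window_prob(adapted_matrices(M, obs_state=2, horizon=3), 0, {0, 1}, {2, 3},
                             n_samples=20000)
```

With 20000 samples, the standard error of each estimate is roughly 0.002.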
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents; e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
C. Ré, J. Letchner, M. Balazinska, and D. Suciu. "Event queries on correlated probabilistic streams." SIGMOD Conference 2008, pp. 715–728.
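As a rough illustration of the idea, and not of Lahar's syntax or evaluation algorithm, the probability that a Markov-chain location stream contains the subevents office, coffee room, office in that order can be computed exactly. The approach tracks (location, pattern progress) pairs, much like running a small automaton for the regular expression alongside the chain. The three-location model and its probabilities are hypothetical.

```python
def advance(q, loc, pattern):
    """Automaton for 'pattern occurs as a subsequence': step forward when loc matches the next subevent."""
    if q < len(pattern) and loc == pattern[q]:
        return q + 1
    return q

def event_probability(M, start, pattern, horizon):
    """Exact probability that the locations at t = 0..horizon contain pattern in order."""
    joint = {(start, advance(0, start, pattern)): 1.0}  # (location, progress) -> probability
    for _ in range(horizon):
        nxt = {}
        for (s, q), p in joint.items():
            for j, pj in enumerate(M[s]):
                if pj > 0:
                    key = (j, advance(q, j, pattern))
                    nxt[key] = nxt.get(key, 0.0) + p * pj
        joint = nxt
    return sum(p for (_, q), p in joint.items() if q == len(pattern))

OFFICE, HALL, COFFEE = 0, 1, 2
M = [[0.5, 0.5, 0.0],   # office -> office / hall
     [0.3, 0.4, 0.3],   # hall -> office / hall / coffee room
     [0.0, 0.5, 0.5]]   # coffee room -> hall / coffee room
got_coffee = [OFFICE, COFFEE, OFFICE]
p4 = event_probability(M, OFFICE, got_coffee, horizon=4)  # only office, hall, coffee, hall, office
p8 = event_probability(M, OFFICE, got_coffee, horizon=8)
```

Because the automaton state is carried along with the location, correlations between consecutive readings are respected rather than assumed away.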
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. "Stochastic Processes." Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. "Querying uncertain spatio-temporal data." In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. "Indexing uncertain spatio-temporal data." In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. "Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories." PVLDB 7(3), 2013, pp. 205–216.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. "Similarity Search on Uncertain Spatio-temporal Data." SISAP 2013, pp. 43–49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. "Interval reverse nearest neighbor queries on uncertain data with Markov correlations." ICDE 2013, pp. 170–181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. "Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories." DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments." IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. "Probabilistic Similarity Search for Uncertain Time Series." SSDBM 2009, pp. 435–443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. "Event queries on correlated probabilistic streams." SIGMOD Conference 2008, pp. 715–728.
76
rsaquo Network expansionndash Expansion tree a tree rooted in q and contains the edges
whose network distances with q are smaller than r
rsaquo Filteringndash Fetch the movement R-trees of the edges in expansion treendash Search leaf entries whose maximum time interval intersect
with tq
ndash The objects pointed by those leaf entries are candidates
rsaquo Refinementndash The candidate set is considerably smaller than original
trajectory datasetndash Calculate the qualification probability for each candidate
AB
C
D E
q
earliestlatest times of arrival possible path but definitely not part of the
answer to the range query
expansion tree ldquobubblingrdquoalong road-network segments
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
77
Temporal-continuous queryndash Only calculate the QPs for a few critical time instants the
earliest arriving time and latest departure time at each vertex along the possible path
ndash Easy to prove that the actual QP is monotonic in-between consecutive critical time instants
ndash The QPs critical time instants define an envelop function for the actual QPs
78
Uncertainty ndash the flip-side
rsaquo Data reductionndash Why transmitting every single location point
rsaquo Bandwidthrsaquo Energy
rsaquo Adapt spatialmap approaches Define acceptable error Save on data-size
John Hershberger Jack Snoeyink ldquoSpeeding up Douglas-Peucker Line-simplification Algorithmrdquo SSHD 1997
79
Uncertainty ndash the other-flip-sidersaquo What are the important properties not to be distorted with reduction
ndash Topological properties (eg two non-intersecting regions should not intersect after reduction)
rsaquo What is the impactrole of time in the picturendash Is it not ldquoZrdquo axishellip cannot travel back in timendash How long should the sink data to be ldquoun-freshrdquo
80
Outline
rsaquo Introduction
rsaquo Uncertain Spatial Data
rsaquo Uncertain Spatio-Temporal Data (Geometric Approach)
rsaquo Uncertain Spatio-Temporal Data(Probabilistic Approach)
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at least once during the time interval T = [2, 3]?
$$
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
$$
[Figure: state graph over s1, s2 and s3, with edges labelled by the transition probabilities of M, unrolled over the ticks t = 0, 1, 2, 3]
Note: the number of possible paths the car might take grows exponentially with the number of ticks.
› The car is observed in s1 at t = 0, and its distribution over (s1, s2, s3) is propagated tick by tick:
– t = 0: (1, 0, 0)
– t = 1: (1, 0, 0) · M = (0, 0, 1)
– t = 2: (0, 0, 1) · M = (0, 0.8, 0.2); the mass 0.8 in s2 already lies inside the query window
– t = 3: propagating the whole vector gives (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36), but this would count the worlds that already hit the window at t = 2 a second time; propagating only the mass that has not hit yet gives (0, 0, 0.2) · M = (0, 0.16, 0.04)
› Result = 0.8 + 0.16 = 0.96
› A solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
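To see why the naive approach does not scale, the query can be answered by enumerating every path explicitly. This brute-force sketch (function and parameter names are illustrative; states are 0-indexed, so s1 and s2 are 0 and 1) sums the probabilities of all paths that are in s1 or s2 at t = 2 or t = 3. It reproduces 0.96 but visits |S|^horizon paths.

```python
from itertools import product

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
WINDOW = {0, 1}  # s1 and s2 (0-based indices)
T = range(2, 4)  # query interval T = [2, 3]


def window_prob_bruteforce(m, start, window, interval, horizon):
    """Sum the probabilities of all paths that are in the window at least
    once during the interval. The number of paths grows as |S| ** horizon."""
    total = 0.0
    for path in product(range(len(m)), repeat=horizon):
        states = (start,) + path  # states at t = 0, 1, ..., horizon
        p = 1.0
        for a, b in zip(states, states[1:]):
            p *= m[a][b]
        if p > 0 and any(states[t] in window for t in interval):
            total += p
    return total


print(round(window_prob_bruteforce(M, 0, WINDOW, T, 3), 4))  # 0.96
```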
103
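ST - Window Queries
› A fourth, absorbing state s4 collects all trajectories that have already hit the window. M^- (the original transitions) is applied at ticks outside T, and M^+ (transitions into s1 or s2 are redirected to s4) at ticks inside T:
$$
M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},
\qquad
M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}
$$
$$
(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)
$$
› The probability mass in s4 after t = 3, 0.96, is the query result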
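The two-matrix technique translates directly into code. This sketch uses hypothetical helper names rather than the authors' implementation: it builds M^- and M^+ from M and the set of window states by appending the absorbing state, applies M^+ at ticks inside T and M^- elsewhere, and reads the answer from the absorbing state. Each tick costs one vector-matrix product, so the effort grows linearly with the number of ticks instead of exponentially.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def absorbing_matrices(m, window):
    """Return (M_minus, M_plus) over the states of m plus one absorbing hit state.

    M_minus keeps the original transitions (used at ticks outside the interval).
    M_plus redirects every transition into a window state to the hit state.
    """
    n = len(m)
    hit_row = [0.0] * n + [1.0]  # once a trajectory has hit, it stays a hit
    minus = [list(row) + [0.0] for row in m] + [hit_row]
    plus = []
    for row in m:
        new_row = [0.0] * (n + 1)
        for j, p in enumerate(row):
            new_row[n if j in window else j] += p
        plus.append(new_row)
    plus.append(list(hit_row))
    return minus, plus


def window_query(m, start, window, interval, horizon):
    """Distribution (last entry = hit state) for a chain starting in `start` at t = 0."""
    minus, plus = absorbing_matrices(m, window)
    v = [0.0] * (len(m) + 1)
    v[len(m) if (0 in interval and start in window) else start] = 1.0
    for t in range(1, horizon + 1):
        v = vec_mat(v, plus if t in interval else minus)
    return v


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
v = window_query(M, start=0, window={0, 1}, interval=range(2, 4), horizon=3)
print([round(x, 2) for x in v])  # [0.0, 0.0, 0.04, 0.96]
```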
104
Multiple Observations
› So far we had only one observation, from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with one observation at t0 (left) and two observations at t0 and t1 (right)]
105
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window
$$
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
$$
[Figure: states s1, s2, s3 unrolled over the ticks t = 0, 1, 2, 3]
106
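Multiple Observations
› The state vector ranges over the classes ($S_1^-$, $S_2^-$, $S_3^-$, $S_1^+$, $S_2^+$, $S_3^+$): the "−" classes hold the worlds that have not hit the window (¬∎), the "+" classes those that have (∎). M^- is applied at ticks outside T and M^+ at ticks inside T:
$$
M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix},
\qquad
M^{+} = \begin{pmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.6 & 0 & 0 \\
0 & 0 & 0.2 & 0 & 0.8 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0.6 & 0 & 0.4 \\
0 & 0 & 0 & 0 & 0.8 & 0.2
\end{pmatrix},
\qquad
M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}
$$
$$
(1,0,0,0,0,0) \xrightarrow{M^{-}} (0,0,1,0,0,0) \xrightarrow{M^{+}} (0,0,0.2,0,0.8,0) \xrightarrow{M^{+}} (0,0,0.04,0.48,0.16,0.32)
$$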
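The construction with 2|S| classes can be sketched as follows (hypothetical helper names; indices 0..n−1 stand for the classes S_i^- and n..2n−1 for S_i^+). `split_matrices` builds M^- = diag(M, M) and M^+ from M and the window states; propagating the observation at t = 0 reproduces the vector (0, 0, 0.04, 0.48, 0.16, 0.32) at t = 3.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def split_matrices(m, window):
    """Return (M_minus, M_plus) over 2n classes: i = S_i^- (no hit), n + i = S_i^+ (hit)."""
    n = len(m)
    minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j, p in enumerate(m[i]):
            minus[i][j] = p              # no hit stays no hit outside the interval
            minus[n + i][n + j] = p      # a hit stays a hit
            plus[i][n + j if j in window else j] = p  # entering the window makes a hit
            plus[n + i][n + j] = p
    return minus, plus


M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]
minus, plus = split_matrices(M, window={0, 1})
v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # observed in s1 at t = 0, no hit yet
for t in (1, 2, 3):
    v = vec_mat(v, plus if t in (2, 3) else minus)
print([round(x, 2) for x in v])  # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]
```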
107
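Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window (∎), given the fact that the object was seen in s3 at t = 3 (event O)?
$$
P(\blacksquare \mid O) = \frac{P(O \mid \blacksquare)\,P(\blacksquare)}{P(O)} = \frac{P(\blacksquare \wedge O)}{P(O)} = \frac{P(\blacksquare \wedge O)}{P(O \wedge \blacksquare) + P(O \wedge \neg\blacksquare)}
$$
› In the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over ($S_1^-$, $S_2^-$, $S_3^-$, $S_1^+$, $S_2^+$, $S_3^+$) at t = 3, only $S_3^+$ (0.32) and $S_3^-$ (0.04) are consistent with the observation:
$$
P(\blacksquare \mid O) = \frac{0.32}{0.32 + 0.04} \approx 0.89
$$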
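Once the distribution over the 2|S| classes is known at the time of the second observation, conditioning on that observation amounts to renormalising. The sketch below (a hypothetical helper) takes the t = 3 vector from the slide, keeps the two classes of the observed state s3 (0-based index 2), and returns P(S3+) / (P(S3+) + P(S3-)).

```python
def hit_prob_given_observation(v, n, observed):
    """P(trajectory hit the window | object observed in state `observed`).

    v is the distribution over (S_1^-, ..., S_n^-, S_1^+, ..., S_n^+) at the
    observation tick. Only the two classes of the observed state are consistent
    with the observation, so Bayes' theorem reduces to a ratio of two entries.
    """
    hit, no_hit = v[n + observed], v[observed]
    if hit + no_hit == 0:
        raise ValueError("the observation is impossible under the model")
    return hit / (hit + no_hit)


v_t3 = [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]  # distribution at t = 3 from the slide
print(round(hit_prob_given_observation(v_t3, n=3, observed=2), 2))  # 0.89
```

The denominator 0.36 is the probability of the observation itself, P(O).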
108
Summary
› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› An index structure based on the R-tree indexes the spatio-temporal (ST) space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most x% of the possible trajectories of o may intersect Q)
[Figure: index over the time and location dimensions]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte Carlo sampling
– Draw a sufficiently high number of samples
– Approximate the result probability by the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform to the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
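For illustration, the sampling idea can be applied to the window query from the earlier slides, whose exact answer is 0.96. This is a minimal sketch under simplifying assumptions: it samples forward from a single observation using the unmodified matrix M and does not implement the adaptation of transition matrices that makes samples conform to later observations. The seed, sample count and helper names are arbitrary choices.

```python
import random

M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]


def sample_path(m, start, horizon, rng):
    """Draw one possible world: a state sequence for t = 0..horizon."""
    path = [start]
    for _ in range(horizon):
        row = m[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def mc_window_prob(m, start, window, interval, horizon, n_samples, rng):
    """Fraction of sampled worlds that are in `window` at some tick of `interval`."""
    hits = 0
    for _ in range(n_samples):
        path = sample_path(m, start, horizon, rng)
        if any(path[t] in window for t in interval):
            hits += 1
    return hits / n_samples


est = mc_window_prob(M, 0, {0, 1}, range(2, 4), 3, 20000, random.Random(42))
print(round(est, 3))  # close to the exact answer 0.96
```

With 20,000 samples the standard error of the estimate is roughly 0.0014, so the approximation is close to, but not exactly, the result of the matrix-based solution.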
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728
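To make the regular-expression flavour concrete, the following toy sketch computes the probability that a location sequence generated by a Markov chain contains office, then coffee room, then office again, with other locations such as a hallway allowed in between. It is not Lahar's query language or evaluation algorithm; the three locations, their transition probabilities and the subsequence semantics are hypothetical. It runs a forward pass over pairs (location, number of pattern elements matched so far), i.e. the product of the chain with a small automaton for the pattern.

```python
OFFICE, HALL, COFFEE = 0, 1, 2
PATTERN = [OFFICE, COFFEE, OFFICE]  # "Joe got coffee"


def pattern_prob(m, start, pattern, steps):
    """P(the locations at t = 0..steps contain `pattern` as a subsequence)."""
    k = len(pattern)
    matched = 1 if pattern[0] == start else 0
    dist = {(start, matched): 1.0}
    for _ in range(steps):
        nxt = {}
        for (loc, done), p in dist.items():
            for new_loc, q in enumerate(m[loc]):
                if q == 0.0:
                    continue
                # Greedy matching: advance when the next pattern element is seen.
                d = done + 1 if done < k and pattern[done] == new_loc else done
                nxt[(new_loc, d)] = nxt.get((new_loc, d), 0.0) + p * q
        dist = nxt
    return sum(p for (_, done), p in dist.items() if done == k)


# Hypothetical indoor model (rows: from office, hall, coffee room).
M = [[0.6, 0.4, 0.0],
     [0.3, 0.4, 0.3],
     [0.0, 0.5, 0.5]]
print(round(pattern_prob(M, OFFICE, PATTERN, steps=20), 3))
```

Because the location at each tick depends on the previous one, this forward pass respects the correlations between readings, which an evaluation that treats each tick independently would ignore.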
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob (1953). Stochastic Processes. Wiley.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity search on uncertain spatio-temporal data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-nearest neighbor queries on uncertain moving object trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD 2008: 715-728.
78
Uncertainty – the flip-side
› Data reduction
– Why transmit every single location point?
› Bandwidth
› Energy
› Adapt spatial/map approaches: define an acceptable error, save on data size
John Hershberger, Jack Snoeyink. "Speeding Up the Douglas-Peucker Line-Simplification Algorithm". Proc. 5th International Symposium on Spatial Data Handling (SDH), 1992.
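The line-simplification idea behind "define an acceptable error, save on data size" can be illustrated with the classic Douglas-Peucker algorithm. This sketch is the plain recursive version, not the faster variant of Hershberger and Snoeyink; it measures point-to-segment distance, and the tolerance `eps` is in the same units as the coordinates. The sample track is hypothetical.

```python
import math


def point_segment_dist(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, eps):
    """Keep the endpoints; split at the farthest point if it deviates by more than eps."""
    if len(points) < 3:
        return list(points)
    dists = [point_segment_dist(p, points[0], points[-1]) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i - 1] <= eps:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: i + 1], eps)
    return left[:-1] + douglas_peucker(points[i:], eps)


track = [(0, 0), (1, 0.1), (2, -0.1), (3, 0), (3, 1), (3, 2), (3, 3)]
print(douglas_peucker(track, eps=0.5))  # [(0, 0), (3, 0), (3, 3)]
```

Every discarded point lies within `eps` of the simplified polyline, which is exactly the "acceptable error" traded for a smaller transmission.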
79
Uncertainty – the other flip-side
› What are the important properties that must not be distorted by the reduction?
– Topological properties (e.g., two non-intersecting regions should not intersect after reduction)
› What is the impact/role of time in the picture?
– It is not just another "Z" axis… one cannot travel back in time
– How long may the data at the sink be "un-fresh"?
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far …
› Stochastic methods for uncertain spatial data: probabilistic results, but no time dimension
vs.
› Geometric models for uncertain spatio-temporal data: time dimension, but no probabilistic results
82
Merge the approaches
› Idea
– Discretize the time
– For each time, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption between time points prevents correct answers to time-parametrized queries
R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112-1127.
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435-443.
83
A highway example…
› What is the probability that the car is in some area Q at least once during some time interval?
[Figure: position (pos) over time (t) with the query region Q; the possible positions of the car are bounded by its minimum and maximum speeds vmin and vmax]
› Assuming a uniform distribution over the possible positions, the car is in Q with probability 50% at one time and with probability 30% at another
› Independence assumption: 1 − (1 − 0.5)(1 − 0.3) = 0.65
– This violates the speed constraints: the car's positions at different times are not independent
› Consideration of the dependency: 0.5
– Since 0.5 equals the larger of the two single-time probabilities, every possible world in which the car is in Q at the 30% time is also one in which it is in Q at the 50% time
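The gap between the two answers can be checked with a few lines of arithmetic. Under independence, the probability of being in Q at least once is 1 − ∏(1 − p_i); without any assumption on the dependency, it can only be said to lie between the largest single p_i and min(1, Σ p_i) (the Fréchet bounds). The sketch (illustrative function names) shows that the dependency-aware value 0.5 sits exactly at the lower bound, while 0.65 is merely one value inside the range.

```python
def union_independent(ps):
    """P(at least one event) if the events were independent: 1 - prod(1 - p)."""
    q = 1.0
    for p in ps:
        q *= 1.0 - p
    return 1.0 - q


def union_bounds(ps):
    """Bounds on P(at least one event) that hold for ANY dependency:
    max(p) <= P <= min(1, sum(p))."""
    return max(ps), min(1.0, sum(ps))


ps = [0.5, 0.3]  # P(car in Q) at the two time points of the example
print(round(union_independent(ps), 2))                # 0.65
print(tuple(round(b, 2) for b in union_bounds(ps)))  # (0.5, 0.8)
```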
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› They provide a sound mathematical model which can be used to describe the uncertain location of an object over time
› Many stochastic processes exist for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
80
Outline
› Introduction
› Uncertain Spatial Data
› Uncertain Spatio-Temporal Data (Geometric Approach)
› Uncertain Spatio-Temporal Data (Probabilistic Approach)
81
So far …
Stochastic methods for uncertain spatial data
– Probabilistic results: yes
– Time dimension: no
vs.
Geometric models for uncertain spatio-temporal data
– Probabilistic results: no
– Time dimension: yes
82
Merge the approaches
› Idea
– Discretize time
– For each time step, a pdf (pmf) over the possible positions is given
› Examples
– Nearest neighbour queries on uncertain moving objects
– Similarity search on uncertain time series
› Problem: the independence assumption between time steps prohibits time-parametrized queries
R. Cheng, D. Kalashnikov, S. Prabhakar: "Querying imprecise data in moving object environments", IEEE TKDE 16(9): 1112-1127, 2004
Johannes Aßfalg, Hans-Peter Kriegel, Peer Kröger, Matthias Renz: "Probabilistic Similarity Search for Uncertain Time Series", SSDBM 2009: 435-443
83
A highway example…
(Figure: position (pos) of a car over time (t), with a query window Q)
What is the probability that the car is inside some area at least once during some time interval (the query window Q)?
84
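A highway example…
(Figure: as before, with the positions reachable under the minimum and maximum speeds vmin and vmax)
What is the probability that the car is inside some area at least once during some time interval (the query window Q)?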
85
A highway example…
(Figure: positions reachable between vmin and vmax, and the query window Q)
Assuming a uniform distribution…
86
A highway example…
(Figure: query window Q; the car is inside Q with probability 50% at the first time step and 30% at the second)
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
87
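A highway example…
(Figure: query window Q; the car is inside Q with probability 50% at the first time step and 30% at the second)
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Violation of the speed constraints: the independence assumption combines positions that no trajectory respecting vmin and vmax could connect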
88
A highway example…
(Figure: query window Q; the car is inside Q with probability 50% at the first time step and 30% at the second)
Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
Considering the dependency: 0.5
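The two values on the highway slides can be reproduced directly. This is a minimal sketch. Under independence, the probability of at least one hit is one minus the product of the miss probabilities. The dependent value 0.5 is exactly what results when the hit events are nested, that is, when every trajectory that is in Q at the second time step was already in Q at the first. The sketch assumes the speed constraints enforce this nesting here. It is our reading of the example, not a general rule.

```python
from math import prod

def p_hit_independent(p_in_window):
    """P(at least one hit) if the per-time-step events are treated as independent."""
    return 1.0 - prod(1.0 - p for p in p_in_window)

def p_hit_nested(p_in_window):
    """P(at least one hit) if the hit events are nested (the most likely one
    contains all the others), as assumed for the speed-constrained example."""
    return max(p_in_window)

probs = [0.5, 0.3]  # P(car in Q) at the two time steps, taken from the slide
print(p_hit_independent(probs))
print(p_hit_nested(probs))
```

The first value is 0.65 up to floating-point rounding, and the second is 0.5. The gap between them is the probability mass that independence assigns to trajectories violating the speed constraints.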
89
An adequate model
90
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› They provide a sound mathematical model that can describe the uncertain location of an object over time
› There are many stochastic processes for different settings
– Discrete time + discrete space (e.g. Markov chain)
– Discrete time + continuous space (e.g. Harris chain)
– Continuous time + discrete space (e.g. Markov process)
– Continuous time + continuous space (e.g. Wiener process)
J. L. Doob: Stochastic Processes. Wiley, 1953
91
A simple example
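› Whenever the wooden board is hit, the ball either stays in its hole or drops into one of the neighbouring holes with certain probabilities
› At the border of the board, these probabilities are different
› This model is usually learned from data or given by experts
(Figure: an interior hole keeps the ball with probability 0.2 and passes it to each neighbour with 0.4; a border hole keeps it with 0.4 and passes it inwards with 0.6)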
92
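A simple example
› Initial position: the ball is in hole 3
› After the first hit: 0.4, 0.2, 0.4 in holes 2, 3, 4
› After the second hit: 0.16, 0.16, 0.36, 0.16, 0.16 in holes 1 to 5
› After the 40th hit: roughly 0.08 in each of the two border holes and 0.12 in each of the seven inner holes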
93
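How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (row: from bucket, column: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$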
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M^{40}$, which is already close to the limiting distribution $(0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
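This minimal NumPy sketch reproduces the board example. It builds M, propagates the start vector (ball in hole 3) by row-vector-times-matrix products, and also computes the stationary distribution as the left eigenvector of M for eigenvalue 1. The helper name `board_matrix` is our own. Under these assumptions, the vector after the 40th hit is close to, but not exactly equal to, the stationary distribution (0.08, 0.12, …, 0.12, 0.08). The slide's 40-hit values are therefore best read as that limit.

```python
import numpy as np

def board_matrix(n: int = 9) -> np.ndarray:
    """Transition matrix of the wooden-board example: interior holes keep the ball
    with 0.2 and pass it to each neighbour with 0.4; border holes keep it with 0.4
    and pass it inwards with 0.6."""
    M = np.zeros((n, n))
    for i in range(1, n - 1):
        M[i, i - 1], M[i, i], M[i, i + 1] = 0.4, 0.2, 0.4
    M[0, 0], M[0, 1] = 0.4, 0.6
    M[n - 1, n - 1], M[n - 1, n - 2] = 0.4, 0.6
    return M

M = board_matrix()
x0 = np.zeros(9)
x0[2] = 1.0  # the ball starts in hole 3 (index 2)

x1 = x0 @ M                               # after the first hit
x2 = x0 @ np.linalg.matrix_power(M, 2)    # after the second hit
x40 = x0 @ np.linalg.matrix_power(M, 40)  # after the 40th hit
print(np.round(x1, 2))
print(np.round(x2, 2))
print(np.round(x40, 3))

# Limiting (stationary) distribution: left eigenvector of M for eigenvalue 1,
# normalized to sum to 1 (this also fixes the sign returned by eig).
vals, vecs = np.linalg.eig(M.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()
print(np.round(pi, 2))
```

Each step costs one vector-matrix product, so the whole distribution over all holes is tracked without enumerating the ball's individual paths.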
95
Fusion of Model and Reality
› Discretization of time and space
– E.g. treat intersections as states and add additional states on long streets
– The time interval corresponding to one tick is, e.g., 20 seconds
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (the resulting matrix is very sparse)
– The transition matrix can change over time and for different object groups
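A minimal sketch of the parameter-estimation step. It assumes that historical trajectories have already been mapped to discrete states, one state per tick. The sketch counts the observed transitions and normalizes each row, which is the maximum-likelihood estimate for a homogeneous Markov chain. The state names and the dict-of-dicts representation are illustrative assumptions. Only observed transitions are stored, which keeps the matrix sparse.

```python
from collections import defaultdict

def estimate_transitions(trajectories):
    """Maximum-likelihood transition probabilities from observed state sequences:
    count i -> j moves and normalize each row. Returns a sparse dict of dicts
    that contains only the transitions actually observed."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: c / total for b, c in row.items()}
    return matrix

# Hypothetical historical trajectories over road-network states (one state per tick).
history = [["A", "B", "C"], ["A", "B", "B"], ["B", "C", "C"]]
print(estimate_transitions(history))
```

A time-dependent or group-specific matrix, as the slide mentions, could be obtained by running the same estimate separately on the trajectories of each time slot or object group.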
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012
97
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
(Figure: transition graph over the states s1, s2, s3 with the probabilities of M)
› Note: there is an exponential number of possible paths the car might take
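This brute-force sketch makes the exponential number of paths concrete. It enumerates every state sequence and sums the probabilities of those that visit s1 or s2 at t = 2 or t = 3. It assumes the car starts in s1 at t = 0, matching the start vector (1, 0, 0) used on the following slides, and it uses 0-based state indices. The cost grows as |S|^t, which the matrix-based method avoids.

```python
from itertools import product

# Transition matrix of the example (rows: from s1, s2, s3; columns: to s1, s2, s3).
M = [[0.0, 0.0, 1.0],
     [0.6, 0.0, 0.4],
     [0.0, 0.8, 0.2]]

def window_prob_bruteforce(M, start, window_states, window_times, horizon):
    """Sum the probability of every path of length `horizon` that is in one of
    `window_states` at some time in `window_times`. Enumerates |S|**horizon paths."""
    n = len(M)
    total = 0.0
    for path in product(range(n), repeat=horizon):
        states = (start,) + path  # states at t = 0, 1, ..., horizon
        p = 1.0
        for a, b in zip(states, states[1:]):
            p *= M[a][b]
            if p == 0.0:
                break  # impossible path, stop multiplying
        if p > 0.0 and any(states[t] in window_states for t in window_times):
            total += p
    return total

# Car starts in s1 (index 0); window = {s1, s2} during T = [2, 3].
print(window_prob_bruteforce(M, start=0, window_states={0, 1},
                             window_times=[2, 3], horizon=3))
```

Only three paths have non-zero probability and hit the window: s1 → s3 → s2 → s1 (0.48), s1 → s3 → s2 → s3 (0.32) and s1 → s3 → s3 → s2 (0.16). Their sum is 0.96.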
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
(Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3)
t = 0: (1, 0, 0)
98
99
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
(Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3)
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
100
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
(Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3)
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
101
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
(Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3)
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2)
t = 3: (0.48, 0.16, 0.36)
102
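ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
(Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3)
t = 0: (1, 0, 0)
t = 1: (0, 0, 1)
t = 2: (0, 0.8, 0.2); the mass 0.8 in s2 lies inside the window and counts as a hit
t = 3: (0, 0.16, 0.04), obtained by propagating only the not-yet-hit mass 0.2 from s3
Result = 0.8 + 0.16 = 0.96
(Simply adding the window probabilities at t = 2 and t = 3, i.e. 0.8 + 0.48 + 0.16, would double-count trajectories that are inside the window at both times.)
› A solution based on matrix multiplications introduces a new state for the "winner" trajectories (those that have hit the window) and two matrices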
103
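ST - Window Queries
States s1, s2, s3 plus a new absorbing state s4 for the trajectories that have hit the window:
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$M^-$ is used for the step into t = 1 (outside T) and $M^+$ for the steps into t = 2 and t = 3 (inside T); $M^+$ redirects every transition into s1 or s2 to s4:
$$(1, 0, 0, 0)\,M^- = (0, 0, 1, 0), \quad (0, 0, 1, 0)\,M^+ = (0, 0, 0.2, 0.8), \quad (0, 0, 0.2, 0.8)\,M^+ = (0, 0, 0.04, 0.96)$$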
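This minimal NumPy sketch implements the two-matrix construction, generalized under our own assumptions to an arbitrary window set and time interval. The function names and signatures are hypothetical. M⁻ is M extended by the absorbing hit state, and M⁺ additionally redirects every transition into a window state to that hit state. For the example, it reproduces the matrices and the result 0.96 shown on the slide.

```python
import numpy as np

def window_matrices(M, window):
    """Return (M_minus, M_plus) over the states plus one absorbing 'hit' state.
    M_plus sends every transition into a window state to the hit state instead."""
    n = M.shape[0]
    M_minus = np.zeros((n + 1, n + 1))
    M_minus[:n, :n] = M
    M_minus[n, n] = 1.0  # the hit state is absorbing
    M_plus = M_minus.copy()
    for j in window:
        M_plus[:n, n] += M_plus[:n, j]
        M_plus[:n, j] = 0.0
    return M_minus, M_plus

def window_query(M, start, window, t_start, t_end):
    """P(object is in a window state at least once during [t_start, t_end])."""
    M_minus, M_plus = window_matrices(M, window)
    n = M.shape[0]
    x = np.append(np.asarray(start, dtype=float), 0.0)
    if t_start == 0:  # mass that starts inside the window is already a hit
        for j in window:
            x[n] += x[j]
            x[j] = 0.0
    for t in range(1, t_end + 1):
        # The step into time t uses M_plus if t lies in T, otherwise M_minus.
        x = x @ (M_plus if t >= t_start else M_minus)
    return float(x[n])

M = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.4],
              [0.0, 0.8, 0.2]])
print(window_query(M, start=[1, 0, 0], window={0, 1}, t_start=2, t_end=3))
```

The cost is one (sparse) vector-matrix product per time step, independent of the number of paths.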
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› On its own, this is of limited interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
(Figure: location space over time, first with a single observation at t0, then with two observations at t0 and t1)
105
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
(Figure: states s1, s2, s3 unrolled over the time steps t = 0, 1, 2, 3)
106
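Multiple Observations
State order: (S1-, S2-, S3-, S1+, S2+, S3+). The S- classes hold the worlds that have not yet hit the window (¬∎), the S+ classes those that have (∎). Window {s1, s2} during T = [2, 3]:
$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
$$M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
$$t=0: (1, 0, 0, 0, 0, 0) \quad t=1: (0, 0, 1, 0, 0, 0) \quad t=2: (0, 0, 0.2, 0, 0.8, 0) \quad t=3: (0, 0, 0.04, 0.48, 0.16, 0.32)$$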
107
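Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window (∎), given that the object was observed in s3 at t = 3?
$$P(\blacksquare \mid s_3) = \frac{P(s_3 \mid \blacksquare)\,P(\blacksquare)}{P(s_3)} = \frac{P(\blacksquare \wedge s_3)}{P(s_3)} = \frac{P(s_3 \wedge \blacksquare)}{P(s_3 \wedge \blacksquare) + P(s_3 \wedge \neg\blacksquare)}$$
At t = 3 the distribution over (S1-, S2-, S3-, S1+, S2+, S3+) is (0, 0, 0.04, 0.48, 0.16, 0.32). Hence P(s3 ∧ ∎) = 0.32 (class S3+) and P(s3 ∧ ¬∎) = 0.04 (class S3-), so
$$P(\blacksquare \mid s_3) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$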
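This minimal NumPy sketch covers the 2|S|-state construction and the Bayes step. It assumes the same example: start in s1 at t = 0, window {s1, s2} during T = [2, 3], and a second observation in s3 at t = 3. The helper name `split_matrices` is hypothetical. The sketch reproduces the t = 3 vector (0, 0, 0.04, 0.48, 0.16, 0.32) and the conditional probability 0.32 / 0.36.

```python
import numpy as np

def split_matrices(M, window):
    """Build the 2|S|-state matrices over (S_1^-, ..., S_n^-, S_1^+, ..., S_n^+).
    M_minus keeps both copies separate; M_plus moves mass that enters a window
    state from the '-' (not yet hit) copy to the '+' (hit) copy."""
    n = M.shape[0]
    Z = np.zeros((n, n))
    M_minus = np.block([[M, Z], [Z, M]])
    M_plus = M_minus.copy()
    for j in window:
        M_plus[:n, n + j] += M_plus[:n, j]  # S_i^- -> S_j^+ instead of S_j^-
        M_plus[:n, j] = 0.0
    return M_minus, M_plus

M = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.4],
              [0.0, 0.8, 0.2]])
M_minus, M_plus = split_matrices(M, window={0, 1})  # window {s1, s2}, T = [2, 3]

x = np.array([1.0, 0, 0, 0, 0, 0])  # observed in s1 at t = 0, not yet hit
for t in range(1, 4):
    x = x @ (M_plus if t >= 2 else M_minus)
print(np.round(x, 2))  # distribution over the six classes at t = 3

# Second observation: the object is in s3 at t = 3 (index 2 within each copy).
p_hit_and_s3 = x[3 + 2]
p_nohit_and_s3 = x[2]
print(round(p_hit_and_s3 / (p_hit_and_s3 + p_nohit_and_s3), 2))  # 0.32 / 0.36
```

Conditioning on the observation amounts to reading off the two classes that agree with it and renormalizing. No path enumeration is needed.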
108
Summary
› Pros
– Allows answering time-parametrized queries according to possible worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping continuous time to discrete ticks might not be the perfect modelling
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the spatio-temporal (ST) space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g. at most a fraction x of the possible trajectories of o may intersect Q)
(Figure: index over the time × location space)
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz and A. Züfle: Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012
111
KNN queries + Sampling on UST Data
› Not all queries can be answered as elegantly as window queries
› Popular in uncertain databases: Monte Carlo sampling
– Draw a sufficiently large number of samples
– Approximate the result probability by the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they are consistent with the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel: Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216 (2013)
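One standard way to adapt the transition matrices is to reweight each transition by the probability of still reaching the next observation, using backward probabilities β. This sketch shows that approach under stated assumptions; it is not necessarily the exact construction of the cited paper. Sampling from the adapted matrices yields only trajectories consistent with both observations, so the Monte Carlo ratio directly estimates the conditional probability. The example reuses the three-state chain from the window-query slides, starting in s1 at t = 0 and observed in s3 at t = 3. All function names are hypothetical. The estimate should be close to the exact value 8/9 ≈ 0.89 from the Bayes computation.

```python
import numpy as np

M = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.4],
              [0.0, 0.8, 0.2]])

def backward_probs(M, end_state, steps):
    """beta[t][i] = P(in end_state at time `steps` | in state i at time t)."""
    beta = [None] * (steps + 1)
    beta[steps] = np.eye(M.shape[0])[end_state]
    for t in range(steps - 1, -1, -1):
        beta[t] = M @ beta[t + 1]
    return beta

def sample_conditioned(M, beta, start, rng):
    """Draw one path from `start` at t = 0 that always satisfies the final
    observation, using the adapted transitions M[i, j] * beta[t+1][j] / beta[t][i]."""
    path = [start]
    for t in range(len(beta) - 1):
        w = M[path[-1]] * beta[t + 1]  # unnormalized row; its sum equals beta[t][i]
        path.append(int(rng.choice(len(w), p=w / w.sum())))
    return path

def mc_window_prob(M, start, end_state, steps, window, times, n_samples=10000, seed=0):
    """Monte Carlo estimate of P(in `window` at some t in `times` | both observations)."""
    rng = np.random.default_rng(seed)
    beta = backward_probs(M, end_state, steps)
    hits = sum(
        any(path[t] in window for t in times)
        for path in (sample_conditioned(M, beta, start, rng) for _ in range(n_samples))
    )
    return hits / n_samples

# Start in s1 at t = 0, observed in s3 at t = 3, window {s1, s2} during T = [2, 3].
print(mc_window_prob(M, start=0, end_state=2, steps=3, window={0, 1}, times=[2, 3]))
```

Because no sample is ever rejected, every draw contributes to the estimate. The same sampler can then be used for queries such as kNN that have no closed-form matrix solution.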
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
81
So far hellip
Stochastic methods for uncertain spatial data
Probabilisitc results
Time dimension
Geometric models for uncertain
spatio-temporal data
Probabilisitc results
Time dimension
vs
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
82
Merge the approaches
rsaquo Ideandash Discretize the timendash For each time a pdf (pmf) over the possible position is
given
rsaquo Examplesndash Nearest Neighbour Queries on uncertain moving objectsndash Similarity Search on uncertain time series
rsaquo Problem Independency assumption prohibits time-parametrized queries
R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
83
A highway examplehellip
t
pos
Q
What is the probability that the car is in some area at least once during some time
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
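› Now, what is the probability that the trajectory passes through the query window, given that the object was observed in s3 at t = 3?
Bayes' Theorem (∎: the trajectory hits the window; Obs: o is observed in s3 at t = 3):
$$P(\blacksquare \mid \mathit{Obs}) = \frac{P(\mathit{Obs} \mid \blacksquare) \cdot P(\blacksquare)}{P(\mathit{Obs})} = \frac{P(\blacksquare \wedge \mathit{Obs})}{P(\mathit{Obs})} = \frac{P(\blacksquare \wedge \mathit{Obs})}{P(\mathit{Obs} \wedge \blacksquare) + P(\mathit{Obs} \wedge \neg\blacksquare)}$$
› From the distribution (0, 0, 0.04, 0.48, 0.16, 0.32) over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$ at t = 3, P(∎ ∧ Obs) = 0.32 (class $S_3^+$) and P(¬∎ ∧ Obs) = 0.04 (class $S_3^-$), so
$$P(\blacksquare \mid \mathit{Obs}) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$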
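Conditioning on the observation only requires reading two entries of the t = 3 distribution. The short sketch below shows this; the function name is our own.

```python
from fractions import Fraction as F

# Distribution at t = 3 over (S1-, S2-, S3-, S1+, S2+, S3+), as computed on the slides.
dist_t3 = [F(0), F(0), F(1, 25), F(12, 25), F(4, 25), F(8, 25)]


def hit_probability_given_observation(dist, observed_state):
    """P(hit | o observed in observed_state) via Bayes' theorem over the 2|S| classes."""
    n = len(dist) // 2
    p_hit_and_obs = dist[n + observed_state]  # class S_i^+
    p_nohit_and_obs = dist[observed_state]    # class S_i^-
    return p_hit_and_obs / (p_hit_and_obs + p_nohit_and_obs)


p = hit_probability_given_observation(dist_t3, 2)  # observed in s3 (index 2)
print(p, round(float(p), 2))
```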
Summary
› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to discrete ticks might not always be the ideal model
Selected Works
Indexing UST Data
› With the above techniques, every object in the database has to be processed
› Index structure based on an R-tree indexing the spatio-temporal space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: R-tree over the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
KNN queries + Sampling on UST Data
› Not all queries can be answered as elegantly as window queries
› Popular in uncertain databases: Monte Carlo sampling
– Draw a sufficiently large number of samples
– Approximate the result probability by the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform to the observations?
› Solution: adaptation of the transition matrices
Johannes Niedermayer, Andreas Züfle, Tobias Emrich, Matthias Renz, Nikos Mamoulis, Lei Chen, Hans-Peter Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216 (2013)
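The sketch below illustrates the basic Monte Carlo idea on the earlier window-query example by forward sampling from the Markov chain. It deliberately does not implement the transition-matrix adaptation of the PVLDB paper. Samples are drawn only from the prior model, so they are not conditioned on any later observation. The seed and sample size are arbitrary choices.

```python
import random

# Transition matrix M of the window-query example; states 0, 1, 2 stand for s1, s2, s3.
M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]


def sample_path(start, ticks, rng):
    """Draw one possible trajectory by forward sampling from the Markov chain."""
    path = [start]
    for _ in range(ticks):
        path.append(rng.choices(range(len(M)), weights=M[path[-1]])[0])
    return path


def monte_carlo_window(n_samples, window_states, window_times, seed=42):
    """Fraction of sampled trajectories that visit a window state at some time in window_times."""
    rng = random.Random(seed)
    horizon = max(window_times)
    hits = sum(
        any(path[t] in window_states for t in window_times)
        for path in (sample_path(0, horizon, rng) for _ in range(n_samples))
    )
    return hits / n_samples


estimate = monte_carlo_window(20_000, {0, 1}, {2, 3})
print(f"estimated P(window hit) = {estimate:.3f} (exact value: 0.96)")
```

With 20,000 samples the standard error is about $\sqrt{0.96 \cdot 0.04 / 20000} \approx 0.0014$, so the estimate should be close to the exact 0.96. Conditioning on a second observation by discarding non-conforming samples would waste most of them. That is the efficiency problem raised above.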
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents. E.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
Christopher Ré, Julie Letchner, Magdalena Balazinska, Dan Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728
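The sketch below hints at why such queries relate to regular expressions. It evaluates the sequential pattern office, coffee room, office on a Markov chain by tracking the joint distribution of the current location and the progress of a small left-to-right automaton. The indoor chain, its probabilities, and the matching semantics are hypothetical; the subevents must occur in order, with arbitrary locations in between. This is not Lahar's query language or evaluation algorithm.

```python
from fractions import Fraction as F

# Hypothetical indoor Markov chain: P[from][to] (transitions not listed have probability 0).
P = {
    "office": {"office": F(1, 2), "hallway": F(1, 2)},
    "hallway": {"office": F(1, 3), "hallway": F(1, 3), "coffee": F(1, 3)},
    "coffee": {"hallway": F(1, 2), "coffee": F(1, 2)},
}
PATTERN = ["office", "coffee", "office"]  # "Joe got coffee"


def advance(progress, loc):
    """Tiny automaton: move to the next subevent when it is observed; stay once complete."""
    if progress < len(PATTERN) and loc == PATTERN[progress]:
        return progress + 1
    return progress


def match_probability(start, ticks):
    """Probability that the pattern has been fully matched after the given number of ticks."""
    dist = {(start, advance(0, start)): F(1)}  # joint distribution of (location, progress)
    for _ in range(ticks):
        nxt = {}
        for (loc, prog), p in dist.items():
            for to, q in P[loc].items():
                key = (to, advance(prog, to))
                nxt[key] = nxt.get(key, F(0)) + p * q
        dist = nxt
    return sum(p for (loc, prog), p in dist.items() if prog == len(PATTERN))


for ticks in range(3, 7):
    print(ticks, match_probability("office", ticks))
```

Coupling the chain with the automaton keeps the state space small. It is |locations| × (pattern length + 1), rather than the number of possible location sequences.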
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43–49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments". IEEE TKDE 16(9): 1112–1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. "Probabilistic Similarity Search for Uncertain Time Series". SSDBM 2009: 435–443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728.
A highway example…
[Figure: position (pos) of a car on a highway over time (t) with a query window Q; the speed bounds vmin and vmax delimit the possible positions]
What is the probability that the car is in some area at least once during some time interval?
› Assuming a uniform distribution, the car is inside Q with probability 50% and 30% at two time instants, respectively
› Independence assumption: 1 – (1 – 0.5)(1 – 0.3) = 0.65
– Violation of the speed constraints
› Consideration of dependency: 0.5
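The two numbers above can be reproduced with a small possible-worlds calculation. The dependency-aware answer 0.5 equals the probability of being in Q at the first instant. So no world can reach Q at the second instant without having been in Q at the first. Together with the marginals 0.5 and 0.3, this fixes the joint distribution used in the sketch below, which uses exact fractions.

```python
from fractions import Fraction as F

p1, p2 = F(1, 2), F(3, 10)  # P(car in Q) at the two time instants (50% and 30%)

# Independence assumption: P(in Q at least once) = 1 - (1 - p1)(1 - p2).
p_independent = 1 - (1 - p1) * (1 - p2)

# Dependent model: joint distribution over (in Q at t1, in Q at t2).
# The answer 0.5 = p1 forces P(not at t1, at t2) = 0; the rest follows from the marginals.
joint = {
    (True, True): p2,
    (True, False): p1 - p2,
    (False, True): F(0),
    (False, False): 1 - p1,
}
p_dependent = sum(p for (at_t1, at_t2), p in joint.items() if at_t1 or at_t2)

print(float(p_independent), float(p_dependent))
```

The independence assumption adds probability mass for worlds that jump into Q at the second instant after being outside it at the first. Under the speed constraints these worlds are impossible.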
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
A simple example
› Whenever the wooden board is hit, the ball either stays in its hole or drops into one of the neighbouring holes with certain probabilities (0.2 to stay, 0.4 to each neighbour)
› At the border of the wooden board these probabilities are different (0.4 to stay, 0.6 to the single neighbour)
› This model is usually learned or given by experts
› Initial position: third hole
› After first hit: 0.4, 0.2, 0.4 in holes 2, 3, 4
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16 in holes 1–5
› After 40th hit: approximately 0.08 in each border hole and 0.12 in each of the seven inner holes
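How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$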
How can we model this?
› First hit:
$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$
› Second hit:
$$(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$
› 40th hit:
$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
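The board model and the distributions above can be checked with a short sketch. It builds M from the rules read off the matrix: inner buckets stay with 0.2 and move to each neighbour with 0.4, while border buckets stay with 0.4 and move inwards with 0.6. It then propagates the ball's distribution. The helper names are our own.

```python
from fractions import Fraction as F

N = 9  # number of buckets on the board


def board_matrix(n=N):
    """Transition matrix of the board example (rows: from bucket, columns: to bucket)."""
    m = [[F(0)] * n for _ in range(n)]
    for i in range(n):
        if i == 0:  # left border: stay 0.4, move inwards 0.6
            m[i][0], m[i][1] = F("0.4"), F("0.6")
        elif i == n - 1:  # right border: move inwards 0.6, stay 0.4
            m[i][n - 2], m[i][n - 1] = F("0.6"), F("0.4")
        else:  # inner bucket: left 0.4, stay 0.2, right 0.4
            m[i][i - 1], m[i][i], m[i][i + 1] = F("0.4"), F("0.2"), F("0.4")
    return m


def step(dist, m):
    """One hit: row vector times matrix."""
    return [sum(dist[i] * m[i][j] for i in range(len(dist))) for j in range(len(dist))]


def after_hits(dist, m, hits):
    for _ in range(hits):
        dist = step(dist, m)
    return dist


M = board_matrix()
start = [F(0)] * N
start[2] = F(1)  # the ball starts in the third bucket
for hits in (1, 2, 40):
    print(hits, [round(float(x), 3) for x in after_hits(start, M, hits)])
```

The vector (0.08, 0.12, …, 0.12, 0.08) is a fixed point of M, since it satisfies πM = π. That is why the distribution after many hits settles close to it.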
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 s
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and differ between object groups
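Estimating transition probabilities from historical data can be as simple as counting observed transitions per state. The slides do not specify which estimator is actually used, so treat this as an assumption. The sketch below is a plain maximum-likelihood estimator over hypothetical trajectories. Storing only observed transitions keeps the model sparse.

```python
from collections import Counter, defaultdict


def estimate_transitions(trajectories):
    """Count-based (maximum-likelihood) estimate of transition probabilities.
    Only observed transitions are stored, so the result is a sparse dict-of-dicts."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            counts[src][dst] += 1
    model = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        model[src] = {dst: c / total for dst, c in dsts.items()}
    return model


# Hypothetical historical trajectories, one state per tick.
history = [
    ["s1", "s3", "s2", "s1", "s3"],
    ["s2", "s3", "s3", "s2", "s3"],
]
model = estimate_transitions(history)
for src in sorted(model):
    print(src, model[src])
```

In practice, states that are rarely visited would need smoothing or more data. Models for different times of day or object groups could be estimated separately from the corresponding subsets of the history.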
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
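ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at least once during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state graph of s1, s2, s3 with the transition probabilities of M as edge labels]
Note: the number of possible paths the car might take grows exponentially with the number of time steps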
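A naive evaluation enumerates every possible path explicitly. The brute-force sketch below, our own illustration, makes the cost visible. It visits |S|^ticks paths, 27 here, which is the blow-up that a matrix-based evaluation avoids.

```python
from fractions import Fraction as F
from itertools import product

# Transition matrix M; states 0, 1, 2 stand for s1, s2, s3.
M = [
    [F(0), F(0), F(1)],
    [F("0.6"), F(0), F("0.4")],
    [F(0), F("0.8"), F("0.2")],
]


def brute_force_window(start, ticks, window_states, window_times):
    """Enumerate all |S|**ticks paths and sum the probabilities of those
    that visit a window state at some time in window_times."""
    n = len(M)
    p_hit, n_paths = F(0), 0
    for tail in product(range(n), repeat=ticks):
        path = (start,) + tail
        p = F(1)
        for a, b in zip(path, path[1:]):
            p *= M[a][b]
        n_paths += 1
        if p and any(path[t] in window_states for t in window_times):
            p_hit += p
    return p_hit, n_paths


p_hit, n_paths = brute_force_window(0, 3, {0, 1}, {2, 3})
print(p_hit, n_paths)
```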
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
84
A highway examplehellip
t
pos
Q
vmin vmax
What is the probability that the car is in some area at least once during some time
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
85
A highway examplehellip
t
pos
vmin vmax
Q
Assuming uniform distributionhellip
86
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
› So far we had only one observation, from which we extrapolated.
› This alone is of limited interest, since cars do not move randomly.
› With two observations, we have to introduce more artificial states and adapt the techniques.
[Figure: location space over time, first with a single observation at t0, then with observations at t0 and t1]
Multiple Observations
› We need to track where the true-hit worlds are located:
– 2|S| classes of equivalent worlds
– One class Si- corresponding to the worlds where o is located in state si and o has not yet intersected the window
– One class Si+ corresponding to the worlds where o is located in state si and o has already intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: transition graph over s1, s2, s3, unrolled over t = 0, 1, 2, 3]
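Multiple Observations
[Figure: the classes S1-, S2-, S3- (¬∎, window not yet intersected) and S1+, S2+, S3+ (∎, window intersected), unrolled over t = 0, 1, 2, 3]
› $M^{-}$ applies M within each half; $M^{+}$ additionally moves every transition into s1 or s2 from the Si- half to the Si+ half. Class order: (S1-, S2-, S3-, S1+, S2+, S3+).
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$
$$M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0, 0, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^{+}} (0, 0, 0.04, 0.48, 0.16, 0.32)$$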
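Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given that the object was seen in s3 at t = 3?
Let ∎ denote "o has intersected the window" and O the observation "o is in s3 at t = 3":
$$P(\blacksquare \mid O) = \frac{P(O \mid \blacksquare)\, P(\blacksquare)}{P(O)} = \frac{P(\blacksquare \wedge O)}{P(O)} = \frac{P(O \wedge \blacksquare)}{P(O \wedge \blacksquare) + P(O \wedge \neg\blacksquare)}$$
With the state vector at t = 3 over (S1-, S2-, S3-, S1+, S2+, S3+) = (0, 0, 0.04, 0.48, 0.16, 0.32), P(O ∧ ∎) is the mass of S3+ and P(O ∧ ¬∎) is the mass of S3-:
$$P(\blacksquare \mid O) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$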
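The six-class construction and the Bayes step can be checked numerically. The sketch below assumes states s1..s3 are indices 0..2, stores the classes in the order S1-, S2-, S3-, S1+, S2+, S3+, and builds $M^{-}$ and $M^{+}$ from M and the window states with a hypothetical helper `build_classes`. It illustrates the idea and is not the authors' implementation.

```python
def step(v, M):
    """Propagate a row vector by one tick: v' = v · M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def build_classes(M, window_states):
    """2|S| x 2|S| matrices over (S1-..Sn-, S1+..Sn+):
    index i is S(i+1)- (window not yet hit), index n + i is S(i+1)+ (hit)."""
    n = len(M)
    M_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    M_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            # Outside T: the hit flag is carried along unchanged.
            M_minus[i][j] += p
            M_minus[n + i][n + j] += p
            # Inside T: entering a window state sets the hit flag.
            M_plus[i][n + j if j in window_states else j] += p
            M_plus[n + i][n + j] += p
    return M_minus, M_plus

M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
n = len(M)
M_minus, M_plus = build_classes(M, window_states={0, 1})

v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # t = 0: o is in s1, window not yet hit
for t in (1, 2, 3):
    v = step(v, M_plus if t in (2, 3) else M_minus)
print([round(x, 2) for x in v])  # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]

# Second observation: o is seen in s3 (index 2) at t = 3.
p_hit_and_obs = v[n + 2]    # mass of S3+
p_nohit_and_obs = v[2]      # mass of S3-
p_hit_given_obs = p_hit_and_obs / (p_hit_and_obs + p_nohit_and_obs)
print(round(p_hit_given_obs, 2))  # 0.89
```

Conditioning on the observation only needs the two classes that agree with it, which is why splitting each state into a "-" and a "+" class is enough.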
Summary
› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to discrete ticks might not be the perfect model
Selected Works
Indexing UST Data
› With the techniques above, every object in the database has to be processed.
› An index structure based on the R-tree indexes the spatio-temporal (ST) space.
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q).
[Figure: index over the time × location space; labels P and oid]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries.
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform to the observations?
› Solution: adaptation of the transition matrices
J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013.
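The Monte-Carlo estimator itself can be sketched on the running three-state example. The sketch below forward-samples possible worlds from M and reports the fraction that satisfies the window query. To respect the second observation (o seen in s3 at t = 3) it uses naive rejection sampling, i.e., it discards worlds that contradict the observation. This is explicitly *not* the transition-matrix adaptation referred to above. It only shows the estimator and why rejection becomes wasteful when the observations are unlikely. All names are illustrative.

```python
import random

# Running example: states s1, s2, s3 are indices 0, 1, 2.
M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]

def sample_world(M, start, horizon, rng):
    """Draw one possible world: the sequence of states at t = 0..horizon."""
    path = [start]
    for _ in range(horizon):
        path.append(rng.choices(range(len(M)), weights=M[path[-1]])[0])
    return path

def hits_window(path):
    """Window query: o is in s1 or s2 at some t in T = [2, 3]."""
    return any(path[t] in (0, 1) for t in (2, 3))

def ends_in_s3(path):
    """Second observation: o is seen in s3 at t = 3."""
    return path[3] == 2

def estimate(query, condition=None, n_samples=50_000, seed=7):
    """Fraction of sampled worlds satisfying `query` among those that satisfy
    `condition` (all worlds if condition is None); rejection sampling otherwise."""
    rng = random.Random(seed)
    kept = satisfied = 0
    for _ in range(n_samples):
        world = sample_world(M, 0, 3, rng)
        if condition is None or condition(world):
            kept += 1
            satisfied += query(world)
    return satisfied / kept

p_window = estimate(hits_window)                          # close to 0.96
p_window_given_obs = estimate(hits_window, ends_in_s3)    # close to 0.89
print(p_window, p_window_given_obs)
```

Here only about 36% of the samples survive the rejection step. With several observations, or with observations that are unlikely under the model, almost all samples would be thrown away. This motivates drawing samples that conform to the observations by construction.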
Event queries
› Application: RFID sensors track individuals in an indoor environment.
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office.
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions.
C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is available
– to find possible answers (without confidence values)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
A highway example…
[Figure: position (pos) of an object over time (t) with a query window Q; annotated probabilities 50% and 30%]
› Independence assumption: 1 - (1 - 0.5)(1 - 0.3) = 0.65
– This admits worlds that violate the speed constraints
› Consideration of dependency: 0.5
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time.
› They are a sound mathematical model that can be used to describe the uncertain location of an object over time.
› There are many stochastic processes for different settings:
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
J. L. Doob. Stochastic Processes. Wiley, 1953.
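A simple example
› Whenever the wooden board is hit, the ball either stays in its hole or drops into one of the neighbouring holes, with certain probabilities.
› At the border of the board, these probabilities are different.
› The model is usually learned from data or specified by experts.
[Figure: board with nine holes; from an inner hole the ball moves left, stays, or moves right with probabilities 0.4, 0.2, 0.4; from a border hole it stays with probability 0.4 and moves inward with probability 0.6]
› Initial position: hole 3
› After the first hit: 0.4, 0.2, 0.4 in holes 2, 3, 4
› After the second hit: 0.16, 0.16, 0.36, 0.16, 0.16 in holes 1 to 5
› After the 40th hit: close to 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08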
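How can we model this?
› A Markov chain is a "memoryless" stochastic process (the next state depends only on the current state).
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$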
How can we model this?
› First hit:
$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$$
› Second hit:
$$(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{2} = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$$
› 40th hit: the distribution is already close to the stationary distribution of M,
$$\lim_{k \to \infty} (0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{k} = (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$$
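The vector-matrix products above can be reproduced in a few lines. The sketch below builds the board matrix from the probabilities on the slide, using a hypothetical helper `board_matrix` whose parameter names are illustrative, and applies it hit by hit. It prints the vector after the 40th hit without claiming exact values, and then keeps iterating until the distribution settles at the stationary distribution.

```python
def step(v, M):
    """One hit of the board: v' = v · M (row vector times matrix)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def board_matrix(n=9, stay=0.2, move=0.4, border_stay=0.4, border_move=0.6):
    """Transition matrix of the board example: inner holes move left/right with
    probability `move` and stay with `stay`; the two border holes stay with
    `border_stay` and move inward with `border_move`."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[i][0], M[i][1] = border_stay, border_move
        elif i == n - 1:
            M[i][n - 1], M[i][n - 2] = border_stay, border_move
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = move, stay, move
    return M

M = board_matrix()
v0 = [0.0] * 9
v0[2] = 1.0  # the ball starts in hole 3

v1 = step(v0, M)
v2 = step(v1, M)
print([round(x, 2) for x in v1])  # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in v2])  # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]

v = v0
for _ in range(40):
    v = step(v, M)
print([round(x, 3) for x in v])  # distribution after the 40th hit

for _ in range(2000):  # keep hitting until the distribution settles
    v = step(v, M)
print([round(x, 2) for x in v])  # [0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08]
```

The limit can also be checked by hand. Detailed balance gives π1 · 0.6 = π2 · 0.4 at each border and equal weights in the interior, so the border holes get 2/3 of the weight of an inner hole: 7c + (4/3)c = 1, hence c = 0.12 and the border value is 0.08.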
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to one tick is, e.g., 20 seconds
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
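The slide leaves open how the transition probabilities are learned from historical data. One simple choice, assumed here and not taken from the source, is maximum-likelihood estimation by counting the observed transitions per tick. Only non-zero entries are stored, so the matrix stays sparse. The state names and trajectories below are made up for illustration. Time-dependent or group-specific matrices could be obtained by running the same estimator on the corresponding subsets of the history.

```python
from collections import defaultdict

def estimate_transitions(trajectories):
    """Maximum-likelihood transition probabilities from observed state
    sequences (one state per tick). Only observed transitions are stored,
    giving a sparse dict-of-dicts: probs[a][b] = P(next = b | current = a)."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: c / total for b, c in row.items()}
    return probs

# Hypothetical historical trajectories over intersection states "A".."D".
history = [
    ["A", "B", "C", "C", "D"],
    ["A", "B", "B", "C", "D"],
    ["B", "C", "D", "D"],
]
P = estimate_transitions(history)
print(P["B"])  # {'C': 0.75, 'B': 0.25}
```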
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: transition graph over the states s1, s2, s3]
Note: the number of possible paths the car might take grows exponentially with the number of time steps.
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
87
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065
Violation of the speed constraints
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
88
A highway examplehellip
t
pos
50
30
Q
Independence assumption 1 ndash (1-05)(1-03) =
065Consideration of dependency
= 05
89
An adequate model
90
Stochastic Processesrsaquo Stochastic Processes are used to represent the
evolution of some random value or system over time
rsaquo A sound mathematical model which can be used to describe the uncertain location of an object over time
rsaquo Many Stochastic Processes for different settingsndash Discrete Time + Discrete Space (eg Markov Chain)ndash Discrete Time + Continuous Space (eg Harris Chain)ndash Continuous Time + Discrete Space (eg Markov Process)ndash Continuous Time + Continuous Space (eg Wiener Process)
Doob J L (1953) Stochastic Processes Wiley
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time, with a single observation at t0 and with two observations at t0 and t1]
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and o has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 unrolled over the ticks t = 0, 1, 2, 3]
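Multiple Observations
› Classes ordered as $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$: the $S_i^-$ classes ($\neg\blacksquare$) have not intersected the window, the $S_i^+$ classes ($\blacksquare$) have
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$
$$M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^-} (0, 0, 1, 0, 0, 0) \xrightarrow{M^+} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^+} (0, 0, 0.04, 0.48, 0.16, 0.32)$$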
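The same idea generalizes to the 2|S| classes. Here is a Python sketch with a hypothetical `split_matrices` helper that builds $M^- = \mathrm{diag}(M, M)$ and $M^+$ from M and the window states. It then reproduces the propagation from t = 0 to t = 3 on this slide. Classes are ordered $S_1^-, \dots, S_n^-, S_1^+, \dots, S_n^+$ and states are 0-indexed.

```python
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]
WINDOW = {0, 1}  # s1 and s2 (0-indexed)

def split_matrices(M, window):
    """Return (M_minus, M_plus) over the 2n classes S_1^-..S_n^-, S_1^+..S_n^+."""
    n = len(M)
    minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            # Outside T the hit flag is kept unchanged: M_minus = diag(M, M).
            minus[i][j] = p
            minus[n + i][n + j] = p
            # Inside T, entering a window state sets the flag; a set flag stays set.
            plus[i][n + j if j in window else j] = p
            plus[n + i][n + j] = p
    return minus, plus

def vec_mat(v, A):
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]

minus, plus = split_matrices(M, WINDOW)
v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # o observed in s1 at t = 0, no hit yet
for t in (1, 2, 3):
    v = vec_mat(v, plus if t in (2, 3) else minus)
print([round(x, 2) for x in v])  # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]
```

Summing the $S_i^+$ entries (0.48 + 0.16 + 0.32) gives the same 0.96 as the absorbing-state variant. The extra classes preserve where the hit worlds are, which a later observation can then refer to.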
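Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given that the object was seen in s3 (at t = 3)?
› Let $\blacksquare$ denote "the trajectory intersects the window" and $o$ the observation:
$$P(\blacksquare \mid o) = \frac{P(o \mid \blacksquare)\,P(\blacksquare)}{P(o)} = \frac{P(\blacksquare \wedge o)}{P(o)} = \frac{P(\blacksquare \wedge o)}{P(\blacksquare \wedge o) + P(\neg\blacksquare \wedge o)}$$
› From the distribution at t = 3, $(0, 0, 0.04, 0.48, 0.16, 0.32)$ over $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$: $P(\blacksquare \wedge o) = P(S_3^+) = 0.32$ and $P(\neg\blacksquare \wedge o) = P(S_3^-) = 0.04$, so
$$P(\blacksquare \mid o) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$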
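Once the 2|S|-class distribution is known, the Bayes step is a small computation. Below is a Python sketch with a hypothetical `posterior_hit` helper. It takes the distribution at the observation tick and a likelihood vector P(observation | s_i); a certain sighting in s3 is the indicator (0, 0, 1). Allowing other likelihood vectors is one way, assumed here, to model the uncertain observations mentioned in the summary. The sketch also assumes that the observation tick is at or after the end of T.

```python
def posterior_hit(dist, likelihood):
    """P(window hit | observation) from a distribution over the 2n classes.

    dist: probabilities ordered S_1^-..S_n^-, S_1^+..S_n^+ at the observation tick.
    likelihood: likelihood[i] = P(observation | object in state s_{i+1}).
    """
    n = len(likelihood)
    hit_and_obs = sum(dist[n + i] * likelihood[i] for i in range(n))
    miss_and_obs = sum(dist[i] * likelihood[i] for i in range(n))
    evidence = hit_and_obs + miss_and_obs  # P(observation)
    if evidence == 0:
        raise ValueError("observation has zero probability under the model")
    return hit_and_obs / evidence

dist_t3 = [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]  # distribution at t = 3 from the slide
print(round(posterior_hit(dist_t3, [0.0, 0.0, 1.0]), 2))  # 0.89
```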
Summary
› Pros
– Allows answering time-parameterized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
Selected Works
Indexing UST Data
› With the above techniques, every object in the database has to be processed
› Index structure based on an R-tree indexing the ST space
› The leaves contain the “intelligence” and enable probabilistic pruning (at most x of the possible trajectories of o may intersect Q)
[Figure: P_oid in the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte Carlo sampling
– Draw a sufficiently high number of samples
– Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of the transition matrices
J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013.
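The slide does not spell out how the transition matrices are adapted. One standard way, assumed here and not necessarily the construction of the cited paper, is to condition the chain on the final observation using backward probabilities β_t(i) = P(observation | state i at tick t). Samples are then drawn from M'_t[i][j] = M[i][j] β_{t+1}(j) / β_t(i).

The Python sketch below (illustrative names) draws trajectories that start in s1 at t = 0 and are guaranteed to end in s3 at t = 3. It estimates the window probability as the fraction of samples in s1 or s2 during T = [2, 3]. The estimate should approach the exact value 0.32 / 0.36 ≈ 0.89 from the Bayes' theorem slide.

```python
import random

M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]

def adapted_matrices(M, obs_time, obs_state):
    """Per-tick transition matrices conditioned on 'in obs_state at obs_time'.

    beta[t][i] = P(in obs_state at obs_time | in state i at tick t), computed backwards;
    the adapted matrix for the step t -> t + 1 is M[i][j] * beta[t+1][j] / beta[t][i].
    """
    n = len(M)
    beta = [[0.0] * n for _ in range(obs_time + 1)]
    beta[obs_time][obs_state] = 1.0
    for t in range(obs_time - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(M[i][j] * beta[t + 1][j] for j in range(n))
    adapted = []
    for t in range(obs_time):
        A = [[0.0] * n for _ in range(n)]
        for i in range(n):
            if beta[t][i] > 0:  # rows of states that cannot reach the observation stay empty
                for j in range(n):
                    A[i][j] = M[i][j] * beta[t + 1][j] / beta[t][i]
        adapted.append(A)
    return adapted

def sample_path(start, matrices, rng):
    """Draw one trajectory: a list of states for ticks 0..len(matrices)."""
    path = [start]
    for A in matrices:
        path.append(rng.choices(range(len(A)), weights=A[path[-1]])[0])
    return path

rng = random.Random(7)
mats = adapted_matrices(M, obs_time=3, obs_state=2)  # seen in s3 at t = 3
N = 20000
hits = sum(
    any(path[t] in (0, 1) for t in (2, 3))  # in s1 or s2 during T = [2, 3]
    for path in (sample_path(0, mats, rng) for _ in range(N))  # seen in s1 at t = 0
)
estimate = hits / N
print(estimate)  # a Monte Carlo estimate of 0.32 / 0.36, about 0.889
```

Because every sample already conforms with both observations, no samples are wasted on rejected trajectories.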
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents; e.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
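Lahar's actual language and semantics are richer than what follows. As a simplified illustration only, the Python sketch below treats “Joe got coffee” as the pattern office, coffee room, office occurring as a subsequence of Joe's locations over the ticks. The locations and transition probabilities are hypothetical. The sketch tracks a joint distribution over (location, number of matched subevents), much like the $S_i^-$/$S_i^+$ classes above track (location, hit flag).

```python
from collections import defaultdict

OFFICE, HALL, COFFEE = 0, 1, 2
# Hypothetical transition matrix for an indoor RFID setting (rows: from, columns: to).
M = [
    [0.5, 0.5, 0.0],    # office
    [0.25, 0.25, 0.5],  # hall
    [0.0, 1.0, 0.0],    # coffee room
]
PATTERN = [OFFICE, COFFEE, OFFICE]  # "Joe got coffee" as a sequence of subevents

def advance(q, location):
    """Automaton state q = number of subevents matched so far (absorbing once complete)."""
    if q < len(PATTERN) and location == PATTERN[q]:
        return q + 1
    return q

def event_probability(M, start_dist, ticks):
    """P(the pattern occurs as a subsequence of the locations at ticks 0..ticks)."""
    joint = defaultdict(float)  # (location, automaton state) -> probability
    for loc, p in enumerate(start_dist):
        if p > 0:
            joint[(loc, advance(0, loc))] += p
    for _ in range(ticks):
        nxt = defaultdict(float)
        for (loc, q), p in joint.items():
            for loc2, m in enumerate(M[loc]):
                if m > 0:
                    nxt[(loc2, advance(q, loc2))] += p * m
        joint = nxt
    return sum(p for (loc, q), p in joint.items() if q == len(PATTERN))

print(event_probability(M, [1.0, 0.0, 0.0], 4))  # 0.0625
```

With this matrix, the only matching path within four ticks is office, hall, coffee room, hall, office, with probability 0.5 · 0.5 · 1 · 0.25 = 0.0625.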
Summary
› Models for spatial uncertainty
› Concepts of efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
Thanks for listening
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205-216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43-49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170-181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112-1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic Similarity Search for Uncertain Time Series. SSDBM 2009: 435-443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
An adequate model
Stochastic Processes
› Stochastic processes are used to represent the evolution of some random value or system over time
› A sound mathematical model that can be used to describe the uncertain location of an object over time
› Many stochastic processes for different settings
– Discrete time + discrete space (e.g., Markov chain)
– Discrete time + continuous space (e.g., Harris chain)
– Continuous time + discrete space (e.g., Markov process)
– Continuous time + continuous space (e.g., Wiener process)
Doob, J. L. (1953). Stochastic Processes. Wiley.
A simple example
› Whenever the wooden board is hit, the ball stays or drops into one of the neighbouring holes with certain probabilities
› At the border of the wooden board, these probabilities are different
› This model is usually learned or given by experts
[Figure: in an interior hole the ball stays with probability 0.2 and moves to either neighbour with 0.4; in a border hole it stays with 0.4 and moves inward with 0.6]
A simple example
› Initial position: (0, 0, 1, 0, 0, 0, 0, 0, 0)
› After first hit: (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)
› After second hit: (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)
› After 40th hit: ≈ (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)
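How can we model this
› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)
› For our example, we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$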
How can we model this
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M^2 = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0)\,M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
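The rows above can be reproduced with a short Python sketch. Names are illustrative, and buckets are 0-indexed in the code, so the slides' third bucket is index 2. The sketch also derives the stationary distribution from detailed balance, π_i M[i][i+1] = π_{i+1} M[i+1][i]. This shortcut is valid for this kind of neighbour-only (birth-death) chain. The bucket distribution approaches the stationary distribution as the number of hits grows.

```python
def board_matrix(n=9):
    """Transition matrix of the board example: M[i][j] = P(to bucket j | from bucket i)."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == 0:
            M[0][0], M[0][1] = 0.4, 0.6          # left border
        elif i == n - 1:
            M[i][i], M[i][i - 1] = 0.4, 0.6      # right border
        else:
            M[i][i - 1], M[i][i], M[i][i + 1] = 0.4, 0.2, 0.4
    return M

def vec_mat(v, M):
    """Row vector times matrix: the bucket distribution after one more hit."""
    n = len(M)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]

def after_hits(M, v, hits):
    for _ in range(hits):
        v = vec_mat(v, M)
    return v

def stationary(M):
    """Detailed balance for a birth-death chain: pi[i+1] = pi[i] * M[i][i+1] / M[i+1][i]."""
    pi = [1.0]
    for i in range(len(M) - 1):
        pi.append(pi[-1] * M[i][i + 1] / M[i + 1][i])
    total = sum(pi)
    return [p / total for p in pi]

M = board_matrix()
start = [0, 0, 1, 0, 0, 0, 0, 0, 0]
print([round(x, 2) for x in after_hits(M, start, 1)])  # [0.0, 0.4, 0.2, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in after_hits(M, start, 2)])  # [0.16, 0.16, 0.36, 0.16, 0.16, 0.0, 0.0, 0.0, 0.0]
print([round(x, 2) for x in stationary(M)])            # [0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08]
print([round(x, 2) for x in after_hits(M, start, 40)])
```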
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
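The slide does not say which estimator is used. One simple option, assumed here, is the maximum-likelihood estimate: count the observed transitions between discretized states and normalize per source state. The Python sketch stores the result as a sparse dict of dicts, since most state pairs are never observed. It also shows one propagation tick on that sparse representation. The intersection IDs "A", "B", "C" and the trajectories are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_transitions(trajectories):
    """Maximum-likelihood estimate P(a -> b) = count(a -> b) / count(a -> any state).

    trajectories: lists of discretized states, one state per tick.
    Returns a sparse dict of dicts; transitions never observed are simply absent.
    """
    counts = defaultdict(Counter)
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}

def sparse_step(dist, M):
    """Propagate a sparse distribution {state: probability} by one tick."""
    out = defaultdict(float)
    for a, p in dist.items():
        if a not in M:
            raise KeyError(f"no observed transitions out of state {a!r}")
        for b, q in M[a].items():
            out[b] += p * q
    return dict(out)

# Hypothetical historical trajectories over intersections "A", "B", "C".
history = [["A", "B", "C"], ["A", "B", "B"], ["A", "C", "C"]]
M = estimate_transitions(history)
print(M["A"])                      # A -> B with probability 2/3, A -> C with 1/3
print(sparse_step({"A": 1.0}, M))
```

With little historical data, such counts are noisy, and unseen transitions get probability 0. A real deployment would likely need smoothing or per-group matrices, as the slide's last point suggests.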
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: transition graph over the states s1, s2, s3]
Note: We have an exponential number of possible paths the car might take.
91
A simple example
rsaquo Whenever the wooden board is hit the ball stays or drops into one of the neighbour holes with certain probabilities
rsaquo At the border of the wood board these probabilities are different
rsaquo This model is usually learned or given by experts
04 0402
06 04
92
04 0402
016 016036 016016
012008 012 012 012 012 012 012 008
A simple example
rsaquo Initial Position
rsaquo After first hit
rsaquo After second hit
rsaquo After 40th hit
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
› Pros
– Allows answering time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is purely based on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping time to ticks might not be the perfect modelling
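The "sparse matrix multiplications" point can be illustrated with a minimal sketch, assuming a hypothetical road of 10,000 discretized states where a car either stays or advances one state per tick with probability 0.5 each. Transitions are stored as a dict of dicts, so a tick only touches the non-zero entries reachable from the current distribution instead of a dense |S| × |S| matrix; `sparse_step` is a hypothetical helper name.

```python
from collections import defaultdict

N_STATES = 10_000

# Hypothetical road: each state has at most two non-zero outgoing transitions.
TRANSITIONS = {s: {s: 0.5, s + 1: 0.5} for s in range(N_STATES - 1)}
TRANSITIONS[N_STATES - 1] = {N_STATES - 1: 1.0}  # end of the road is absorbing

def sparse_step(dist, transitions):
    """One tick of dist * M, with dist = {state: prob} and M = {state: {next: prob}}.

    The work is proportional to the non-zeros touched, not to N_STATES ** 2.
    """
    nxt = defaultdict(float)
    for s, p in dist.items():
        for s2, q in transitions[s].items():
            nxt[s2] += p * q
    return dict(nxt)

dist = {0: 1.0}  # car observed in state 0 at t = 0
for _ in range(10):
    dist = sparse_step(dist, TRANSITIONS)
print(len(dist), round(sum(dist.values()), 6))  # 11 1.0
```

After ten ticks only eleven states carry probability mass, so the cost stays small regardless of the size of the road network.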
109
Selected Works
110
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the ST-space
› The leaves contain the “intelligence” and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)
(Figure: index over time and location)
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
111
KNN queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently high number of samples
– Approximate result probability = ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform with the observations?
› Solution: adaptation of transition matrices
J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216, 2013.
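A minimal sketch of sampling trajectories that conform with two observations (s1 at t = 0 and s3 at t = 3, the setting of the Bayes example), assuming the 3-state chain M from the window-query slides. It uses one standard way to adapt transition matrices: backward probabilities β_t(i) = P(observation at t_end | state i at t) reweight each row, M'_t(i, j) = M(i, j) · β_{t+1}(j) / β_t(i). The cited paper's exact construction may differ, and all function names are hypothetical. The estimate should approach the exact value 0.32 / 0.36 ≈ 0.89.

```python
import random

# 3-state chain from the window-query slides (rows: from, columns: to).
M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
WINDOW_STATES, WINDOW_TICKS = {0, 1}, (2, 3)  # s1 or s2 during T = [2, 3]

def adapted_matrices(m, t_end, end_state):
    """Per-tick matrices of the chain conditioned on being in end_state at t_end.

    beta[t][i] = P(end_state at t_end | state i at t), computed backwards;
    adapted row: m'_t[i][j] = m[i][j] * beta[t + 1][j] / beta[t][i].
    """
    n = len(m)
    beta = [[0.0] * n for _ in range(t_end + 1)]
    beta[t_end][end_state] = 1.0
    for t in range(t_end - 1, -1, -1):
        beta[t] = [sum(m[i][j] * beta[t + 1][j] for j in range(n)) for i in range(n)]
    adapted = []
    for t in range(t_end):
        rows = []
        for i in range(n):
            if beta[t][i] > 0.0:
                rows.append([m[i][j] * beta[t + 1][j] / beta[t][i] for j in range(n)])
            else:
                rows.append([0.0] * n)  # state i cannot reach the observation
        adapted.append(rows)
    return adapted

def sample_path(adapted, start, rng):
    """Draw one trajectory from the adapted (time-dependent) chain."""
    path = [start]
    for rows in adapted:
        weights = rows[path[-1]]
        path.append(rng.choices(range(len(weights)), weights=weights)[0])
    return path

def monte_carlo_window_probability(n_samples=20_000, seed=0):
    """Fraction of sampled trajectories that intersect the query window."""
    rng = random.Random(seed)
    adapted = adapted_matrices(M, t_end=3, end_state=2)  # observed in s3 at t = 3
    hits = 0
    for _ in range(n_samples):
        path = sample_path(adapted, start=0, rng=rng)  # observed in s1 at t = 0
        if any(path[t] in WINDOW_STATES for t in WINDOW_TICKS):
            hits += 1
    return hits / n_samples

print(monte_carlo_window_probability())  # approximately 0.89 (exact: 0.32 / 0.36)
```

Every sampled path ends in the observed state by construction, so no samples are wasted on trajectories that contradict the observations.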
112
Event queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., “Joe got coffee” can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728.
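Lahar itself is not reproduced here. As a hedged illustration of why such sequence queries can stay tractable under a Markov model, the sketch computes the probability that a location sequence contains “office, coffee room, office” as a subsequence by running the chain jointly with a small pattern matcher (a product construction). The three-location indoor model, its probabilities, and the names `advance` and `event_probability` are all hypothetical; this is not the algorithm of Ré et al.

```python
OFFICE, HALLWAY, COFFEE = 0, 1, 2

# Hypothetical indoor model (rows: from, columns: to); the numbers are made up.
M = [
    [0.7, 0.3, 0.0],  # office -> office / hallway
    [0.3, 0.4, 0.3],  # hallway -> office / hallway / coffee room
    [0.0, 0.5, 0.5],  # coffee room -> hallway / coffee room
]
PATTERN = [OFFICE, COFFEE, OFFICE]  # "Joe got coffee"

def advance(k, loc, pattern=PATTERN):
    """Greedy subsequence matcher: step to the next subevent if it happens now."""
    return k + 1 if k < len(pattern) and pattern[k] == loc else k

def event_probability(m, start, ticks, pattern=PATTERN):
    """P(the locations at ticks 0..ticks contain `pattern` as a subsequence)."""
    # Distribution over (location, matcher progress): the product of chain and matcher.
    dist = {(start, advance(0, start, pattern)): 1.0}
    for _ in range(ticks):
        nxt = {}
        for (loc, k), p in dist.items():
            for loc2, q in enumerate(m[loc]):
                if q > 0.0:
                    key = (loc2, advance(k, loc2, pattern))
                    nxt[key] = nxt.get(key, 0.0) + p * q
        dist = nxt
    return sum(p for (_, k), p in dist.items() if k == len(pattern))

for ticks in (3, 4, 8):
    print(ticks, round(event_probability(M, OFFICE, ticks), 4))
```

The state space grows only by the pattern length, not by the number of possible paths.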
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
– Cleaning
– Approximation
– Paradigm of equivalent worlds
› Geometric models can be used
– when no probabilistic model is present
– to find possible answers (no confidence)
› Probabilistic management of UST data
– based on stochastic processes
– allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43–49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. “Querying imprecise data in moving object environments.” IEEE TKDE, vol. 16, no. 9, 2004, pp. 1112–1127.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. „Probabilistic Similarity Search for Uncertain Time Series.“ SSDBM 2009: 435–443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728.
92
A simple example
› Initial position: bucket 3
› After first hit: 0.4, 0.2, 0.4 (buckets 2–4)
› After second hit: 0.16, 0.16, 0.36, 0.16, 0.16 (buckets 1–5)
› After 40th hit: 0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08 (buckets 1–9)
93
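How can we model this?
› A Markov chain is a “memoryless” stochastic process (the next state depends only on the current state)
› For our example we build the following transition matrix M (rows: from bucket, columns: to bucket):
$$M = \begin{pmatrix}
0.4 & 0.6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.4 & 0.2 & 0.4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.6 & 0.4
\end{pmatrix}$$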
94
How can we model this?
› First hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M = (0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0)$
› Second hit: $(0, 0.4, 0.2, 0.4, 0, 0, 0, 0, 0) \cdot M = (0.16, 0.16, 0.36, 0.16, 0.16, 0, 0, 0, 0)$
› 40th hit: $(0, 0, 1, 0, 0, 0, 0, 0, 0) \cdot M^{40} \approx (0.08, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.12, 0.08)$
(M is the 9 × 9 bucket transition matrix from slide 93)
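The bucket example can be reproduced with a short plain-Python sketch. The parameter names of the hypothetical `bucket_matrix` helper are assumptions; its defaults rebuild the matrix M from slide 93, and the start vector puts the ball in bucket 3, as in the first-hit product. The values the slide gives for the 40th hit coincide with the chain's stationary distribution π, which the sketch checks via πM = π.

```python
def vec_mat(v, m):
    """Row vector v times matrix m (both plain Python lists)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def bucket_matrix(n=9, stay=0.2, move=0.4, edge_stay=0.4, edge_move=0.6):
    """Transition matrix of the bucket example (rows: from bucket, columns: to bucket)."""
    m = [[0.0] * n for _ in range(n)]
    m[0][0], m[0][1] = edge_stay, edge_move
    m[n - 1][n - 2], m[n - 1][n - 1] = edge_move, edge_stay
    for i in range(1, n - 1):
        m[i][i - 1], m[i][i], m[i][i + 1] = move, stay, move
    return m

M = bucket_matrix()
v = [0.0] * 9
v[2] = 1.0  # initial position: bucket 3
after = {}
for hit in range(1, 41):
    v = vec_mat(v, M)
    after[hit] = v
for hit in (1, 2, 40):
    print(hit, [round(x, 2) for x in after[hit]])

# The 40th-hit values on the slide equal the stationary distribution pi (pi * M = pi).
pi = [0.08] + [0.12] * 7 + [0.08]
residual = max(abs(a - b) for a, b in zip(vec_mat(pi, M), pi))
print("stationary residual:", residual)
```

The first two printed rows reproduce the first-hit and second-hit vectors above; further hits drift toward π.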
95
Fusion of Model and Reality
› Discretization of time and space
– E.g., treat intersections as states and add additional states on long streets
– The time interval corresponding to a tick is, e.g., 20 sec
› Estimation of model parameters
– Transition probabilities from one state to another are learned from historical data (very sparse matrix)
– The transition matrix can change over time and for different object groups
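One simple way to learn transition probabilities from historical data is maximum-likelihood counting: count the moves between consecutive ticks and normalize each row. The sketch below assumes the trajectories are already discretized into states and ticks; the history and the name `estimate_transitions` are hypothetical, and the slides do not say which estimator the authors use.

```python
from collections import Counter, defaultdict

def estimate_transitions(trajectories):
    """Maximum-likelihood transition probabilities from discretized trajectories.

    Counts the observed moves between consecutive ticks and normalizes each row.
    Only observed moves get an entry, so the result is a sparse {from: {to: p}}.
    """
    counts = defaultdict(Counter)
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    estimate = {}
    for a, row in counts.items():
        total = sum(row.values())
        estimate[a] = {b: c / total for b, c in row.items()}
    return estimate

# Hypothetical history: state sequences already mapped to ticks (e.g. 20 s each).
HISTORY = [
    ["s1", "s3", "s2", "s1", "s3"],
    ["s2", "s3", "s3", "s2", "s1"],
]
M_est = estimate_transitions(HISTORY)
for state in sorted(M_est):
    print(state, M_est[state])
```

To let the matrix change over time or per object group, one could, as an assumption about usage, call `estimate_transitions` separately on the trajectories of each time slot or group.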
96
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
97
ST - Window Queries
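› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
(Transition graph: s1 → s3 with 1.0; s2 → s1 with 0.6 and s2 → s3 with 0.4; s3 → s2 with 0.8 and s3 → s3 with 0.2)
Note: there is an exponential number of possible paths the car might take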
98–102
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
(Diagram: states s1, s2, s3 over t = 0, 1, 2, 3 with the transition probabilities of M)
– t = 0: (1, 0, 0)
– t = 1: (1, 0, 0) · M = (0, 0, 1)
– t = 2: (0, 0, 1) · M = (0, 0.8, 0.2)
– t = 3, plain propagation of all worlds: (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36); this mixes worlds that already hit the window at t = 2 with worlds that did not
– t = 3, propagating only the 0.2 in s3 that has not hit the window yet: (0, 0.16, 0.04)
Result = 0.8 + 0.16 = 0.96
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
93
How can we model this
rsaquo A Markov Chain is a ldquomemorylessrdquo Stochastic Process (the next state depends only on the current state)
rsaquo For our example we build the following transition Matrix M 04 06 0 0 0 0 0 0 0
04 02 04 0 0 0 0 0 0
0 04 02 04 0 0 0 0 0
0 0 04 02 04 0 0 0 0
0 0 0 04 02 04 0 0 0
0 0 0 0 04 02 04 0 0
0 0 0 0 0 04 02 04 0
0 0 0 0 0 0 04 02 04
0 0 0 0 0 0 0 06 04
from
bu
cket
to bucket
= M
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
94
How can we model this
rsaquo First hit
rsaquo Second hit
rsaquo 40th hit
0 0 1 0 0 0 0 0 0( ) 002
04
02 0 0 0 0 0
)
( ) M40 = (
08 012 012 012 012 012 012 012 008 )
04
06 0 0 0 0 0 0 004 02 04 0 0 0 0 0 00 04 02 04 0 0 0 0 00 0 04 02 04 0 0 0 00 0 0 04 02 04 0 0 00 0 0 0 04 02 04 0 00 0 0 0 0 04 02 04 00 0 0 0 0 0 04 02 040 0 0 0 0 0 0 06 04
= (
0 0 1 0 0 0 0 0 0
( ) M = (
016 016 036 016 016 0 0 0 0 )0
04
02
04 0 0 0 0 0
95
Fusion of Model and Reality
rsaquo Discretization of time and spacendash Eg treat intersections as
states and add additional stateson long streets
ndash The time interval correspondingto a tick is eg 20 sec
rsaquo Estimation of model parametersndash Transition probabilities from one state to another are
learned from historical data (very sparse matrix)ndash Transition matrix can change over time and for different
object groups
96
Querying the Model
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
97
ST - Window Queries
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
119872=( 0 0 106 0 040 08 02)
s1
s2
s3
10
06
06
02
04
Note We have an exponential number ofpossible paths the car might take
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
98
99
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity Search on Uncertain Spatio-temporal Data. SISAP 2013: 43–49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. "Querying imprecise data in moving object environments." IEEE TKDE 16(9): 1112–1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. "Probabilistic Similarity Search for Uncertain Time Series." SSDBM 2009: 435–443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728.
Fusion of Model and Reality
› Discretization of time and space
  – E.g., treat intersections as states and add additional states on long streets
  – The time interval corresponding to a tick is, e.g., 20 s
› Estimation of model parameters
  – Transition probabilities from one state to another are learned from historical data (very sparse matrix)
  – The transition matrix can change over time and for different object groups
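As an illustration of the parameter-estimation step, here is a minimal Python sketch (our own, not the authors' code) that learns transition probabilities by counting observed tick-to-tick transitions and normalizing each row, i.e. a maximum-likelihood estimate. The function name, the dict-of-dicts sparse representation, and the sample history are hypothetical.

```python
from collections import defaultdict

def estimate_transition_matrix(trajectories):
    """Estimate sparse transition probabilities from historical state sequences.

    trajectories: iterable of sequences of state IDs, one state per tick.
    Returns {src: {dst: probability}}; only observed transitions are stored,
    so the result stays sparse. States never left in the history get no row.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            counts[src][dst] += 1
    matrix = {}
    for src, row in counts.items():
        total = sum(row.values())
        matrix[src] = {dst: c / total for dst, c in row.items()}
    return matrix

# Hypothetical history of three discretized trajectories.
history = [
    ["s1", "s3", "s2", "s1"],
    ["s2", "s3", "s3", "s2"],
    ["s3", "s2", "s3"],
]
M_hat = estimate_transition_matrix(history)
print(M_hat["s3"])  # {'s2': 0.75, 's3': 0.25}
```

To reflect that the matrix can change over time and across object groups, one could call the same function separately on the history of each group or time slot; smoothing for rarely observed states is left out of this sketch.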
Querying the Model
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington DC, 2012.
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 at some time in the interval T = [2, 3]?
$$M = \begin{pmatrix} 0 & 0 & 1.0 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state-transition diagram of M over the states s1, s2, s3]
› Note: the number of possible paths the car might take is exponential in the number of time steps
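To make the exponential growth concrete, the following brute-force baseline (a hypothetical helper, with s1, s2, s3 mapped to indices 0, 1, 2) enumerates every continuation of the observed start state and sums the probabilities of the paths that are in s1 or s2 at t = 2 or t = 3. It is only practical for tiny examples, since it visits |S|^horizon paths.

```python
from itertools import product

# Transition matrix M from the slide; s1, s2, s3 are mapped to indices 0, 1, 2.
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]

def window_probability_bruteforce(M, start, window_states, window_times, horizon):
    """Sum the probabilities of all paths that are in window_states at some t in window_times.

    Enumerates all |S|**horizon continuations of the start state, which is
    exactly the exponential blow-up the slide warns about.
    """
    total, n_paths = 0.0, 0
    for continuation in product(range(len(M)), repeat=horizon):
        n_paths += 1
        path = (start,) + continuation  # states at t = 0..horizon
        prob = 1.0
        for a, b in zip(path, path[1:]):
            prob *= M[a][b]
        if any(path[t] in window_states for t in window_times):
            total += prob
    return total, n_paths

p, n_paths = window_probability_bruteforce(M, start=0, window_states={0, 1},
                                           window_times=[2, 3], horizon=3)
print(round(p, 2), n_paths)  # 0.96 27
```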
ST - Window Queries
› Step-by-step propagation over the states s1, s2, s3 (t=0 to t=3):
  – t=0: the car is observed in s1: (1, 0, 0)
  – t=1: (1, 0, 0) · M = (0, 0, 1)
  – t=2: (0, 0, 1) · M = (0, 0.8, 0.2); the mass 0.8 in s2 already lies inside the window
  – t=3, naive propagation of the full vector: (0, 0.8, 0.2) · M = (0.48, 0.16, 0.36); adding this to t=2 would count worlds that already hit the window at t=2 a second time
  – t=3, propagating only the worlds that have not hit the window yet: (0, 0, 0.2) · M = (0, 0.16, 0.04)
› Result = 0.8 + 0.16 = 0.96
› A solution based purely on matrix multiplications introduces a new state for the trajectories that have already hit the window, and uses two matrices
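ST - Window Queries
› States s1, s2, s3 plus a new absorbing state s4 for trajectories that have already hit the window; M− is used for steps into ticks outside T, M+ for steps into ticks inside T
$$M^{-} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$(1, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0.8) \xrightarrow{M^{+}} (0, 0, 0.04, 0.96)$$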
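The two-matrix construction can be sketched in a few lines of plain Python. The helper names are our own, and we assume, as on the slides, that the query window is a set of states combined with a set of ticks: a transition that enters a window state at a tick inside T goes to the extra absorbing state instead, so every possible world is counted at most once.

```python
def vec_mat(v, M):
    """Row vector times matrix: result[j] = sum_i v[i] * M[i][j]."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def window_probability(M, start, window_states, window_times, horizon):
    """Probability of being in window_states at some t in window_times (t <= horizon).

    Adds one absorbing "hit" state (index n, s4 on the slide). M_minus keeps the
    original transitions; M_plus redirects every transition that enters a window
    state into the hit state.
    """
    n = len(M)
    absorbing = [0.0] * n + [1.0]
    m_minus = [list(row) + [0.0] for row in M] + [absorbing]
    m_plus = [
        [0.0 if j in window_states else p for j, p in enumerate(row)]
        + [sum(p for j, p in enumerate(row) if j in window_states)]
        for row in M
    ] + [absorbing]
    v = [0.0] * (n + 1)
    if 0 in window_times and start in window_states:
        v[n] = 1.0  # already inside the window at t = 0
    else:
        v[start] = 1.0
    for t in range(1, horizon + 1):
        v = vec_mat(v, m_plus if t in window_times else m_minus)
    return v[n], v

M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]
p, v = window_probability(M, start=0, window_states={0, 1}, window_times={2, 3}, horizon=3)
print(round(p, 2), [round(x, 2) for x in v])  # 0.96 [0.0, 0.0, 0.04, 0.96]
```

The cost is linear in the number of ticks and, with a sparse matrix representation, in the number of non-zero transitions, instead of exponential in the number of paths.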
Multiple Observations
› So far we had only one observation from which we could extrapolate
› This is of limited interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time; left, a single observation at t0; right, two observations at t0 and t1]
Multiple Observations
› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class Si− corresponding to worlds where o is located in state si and o has not intersected the window
  – One class Si+ corresponding to worlds where o is located in state si and o has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1.0 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: states s1, s2, s3 over the time steps t=0 to t=3]
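Multiple Observations
› Classes in the order S1−, S2−, S3− (window not yet intersected, ¬∎) and S1+, S2+, S3+ (window intersected, ∎), over the time steps t=0 to t=3
$$M = \begin{pmatrix} 0 & 0 & 1.0 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^{-} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^{+} = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
$$(1, 0, 0, 0, 0, 0) \xrightarrow{M^{-}} (0, 0, 1, 0, 0, 0) \xrightarrow{M^{+}} (0, 0, 0.2, 0, 0.8, 0) \xrightarrow{M^{+}} (0, 0, 0.04, 0.48, 0.16, 0.32)$$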
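A sketch of how M− and M+ over the 2|S| classes can be derived mechanically from M and the set of window states. The function names are hypothetical; the class order S1−..Sn−, S1+..Sn+ follows the slide, and propagating the start vector reproduces the class distribution at t = 3.

```python
def vec_mat(v, M):
    """Row vector times matrix: result[j] = sum_i v[i] * M[i][j]."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def class_matrices(M, window_states):
    """Build M_minus and M_plus over the 2|S| classes S1-..Sn-, S1+..Sn+."""
    n = len(M)
    zeros = [0.0] * n
    # Outside the query time: both halves evolve independently with M.
    m_minus = [list(row) + zeros for row in M] + [zeros + list(row) for row in M]
    m_plus = []
    for row in M:  # from Si-: entering a window state moves the world to the "+" half
        stay = [0.0 if j in window_states else p for j, p in enumerate(row)]
        hit = [p if j in window_states else 0.0 for j, p in enumerate(row)]
        m_plus.append(stay + hit)
    for row in M:  # from Si+: the world has already hit the window and stays "+"
        m_plus.append(zeros + list(row))
    return m_minus, m_plus

M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]
m_minus, m_plus = class_matrices(M, window_states={0, 1})
v = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # t = 0: o observed in s1, window not yet hit
for t in range(1, 4):
    v = vec_mat(v, m_plus if t in (2, 3) else m_minus)
print([round(x, 2) for x in v])  # [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]
```

Summing the "+" half, 0.48 + 0.16 + 0.32 = 0.96, gives back the single-observation answer; keeping the halves separate is what allows a later observation to be taken into account.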
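Bayes' Theorem
› Now, what is the probability that the trajectory passes the query window, given that the object was observed in s3 at t=3?
Let ∎ denote "o has intersected the window" and O denote "o is observed in s3 at t=3":
$$P(\blacksquare \mid O) = \frac{P(O \mid \blacksquare)\, P(\blacksquare)}{P(O)} = \frac{P(\blacksquare \wedge O)}{P(O)} = \frac{P(O \wedge \blacksquare)}{P(O \wedge \blacksquare) + P(O \wedge \neg\blacksquare)}$$
With the class distribution at t=3 over (S1−, S2−, S3−, S1+, S2+, S3+) = (0, 0, 0.04, 0.48, 0.16, 0.32), we have P(O ∧ ∎) = P(S3+) = 0.32 and P(O ∧ ¬∎) = P(S3−) = 0.04:
$$P(\blacksquare \mid O) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$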
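The conditioning step is a single ratio. The sketch below (hypothetical function name) generalizes it slightly: the observation is given as a likelihood per state, so an exact sighting in s3 is the indicator vector (0, 0, 1) and an uncertain observation is any other nonnegative vector. This assumes that the observation depends only on the object's state at the observation time.

```python
def hit_posterior(class_vector, likelihood):
    """P(o intersected the window | observation), via Bayes' theorem.

    class_vector: probabilities of S1-..Sn-, S1+..Sn+ at the observation time.
    likelihood:   P(observation | o is in s_i) for each state s_i. An exact
                  sighting in state s_k is the indicator vector of k; other
                  vectors model uncertain observations (an assumption of this
                  sketch: the observation depends only on the current state).
    """
    n = len(likelihood)
    minus, plus = class_vector[:n], class_vector[n:]
    joint_hit = sum(w * p for w, p in zip(likelihood, plus))    # P(obs and hit)
    joint_miss = sum(w * p for w, p in zip(likelihood, minus))  # P(obs and not hit)
    evidence = joint_hit + joint_miss                           # P(obs)
    if evidence == 0.0:
        raise ValueError("observation is impossible under the model")
    return joint_hit / evidence

v3 = [0.0, 0.0, 0.04, 0.48, 0.16, 0.32]  # class distribution at t = 3 from the slide
print(round(hit_posterior(v3, [0.0, 0.0, 1.0]), 2))  # 0.89
```

For example, a hypothetical uncertain observation with likelihood (0, 0.5, 1.0) gives 0.40 / 0.44 ≈ 0.91.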
Summary
› Pros
  – Allows answering time-parameterized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping continuous time to discrete ticks might not be the ideal model
Selected Works
Indexing UST Data
› With the above techniques, each object in the database has to be processed
› Index structure based on an R-tree indexing the spatio-temporal space
› The leaves contain the "intelligence" and enable probabilistic pruning (at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: P_oid in the time/location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
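The slides do not spell out the leaf structure, so the following is only a generic filter-and-refine sketch of threshold-based probabilistic pruning, not the UST index itself. It assumes hypothetical leaf entries that each store a spatio-temporal region of one object plus an upper bound on the probability that the object visits that region; summing these bounds over the regions that overlap the query (a union bound) bounds the hit probability from above, so objects below the threshold tau can be discarded before the expensive exact computation.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class LeafEntry:
    """Hypothetical leaf entry describing one spatio-temporal region of object oid.

    Assumptions of this sketch: the entries of an object together cover every
    (state, tick) the object can reach, and max_prob is an upper bound on the
    probability that the object visits this entry's region.
    """
    oid: int
    t_start: int
    t_end: int
    states: frozenset
    max_prob: float

def overlaps(e, window_states, t_lo, t_hi):
    return e.t_start <= t_hi and t_lo <= e.t_end and bool(e.states & window_states)

def threshold_window_query(entries, window_states, t_lo, t_hi, tau, exact_prob):
    """Objects whose window-hit probability is >= tau, as {oid: probability}."""
    upper = defaultdict(float)
    for e in entries:
        if overlaps(e, window_states, t_lo, t_hi):
            upper[e.oid] += e.max_prob  # union bound over the object's overlapping regions
    result = {}
    for oid, bound in upper.items():
        if bound < tau:
            continue  # pruned: the exact (expensive) computation is skipped
        p = exact_prob(oid)  # refinement, e.g. the matrix-based window computation
        if p >= tau:
            result[oid] = p
    return result
```

Objects without any overlapping entry have hit probability 0 and are never touched; in a real R-tree the overlap test would also be applied to inner nodes so that whole subtrees are skipped.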
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently large number of samples
  – Approximate the result probability by the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they are consistent with the observations?
› Solution: adaptation of the transition matrices
J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories. PVLDB 7(3): 205–216, 2013.
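One standard way to make samples agree with a later observation is to adapt the transition matrices with a backward pass (Bayesian conditioning of the Markov chain); we do not claim this is exactly the algorithm of Niedermayer et al. The sketch uses the running example: the object is seen in s1 at t = 0 and in s3 at t = 3, and we estimate the probability of it being in s1 or s2 at t = 2 or t = 3. Function names and the sample size are our own choices. Because every sampled path is consistent with both observations, no samples are wasted by rejection.

```python
import random

def adapted_matrices(M, obs_time, obs_state):
    """Condition a Markov chain on "o is in obs_state at obs_time" (backward pass).

    beta_t(i) = P(reach obs_state at obs_time | in state i at time t).
    The adapted step t -> t+1 uses M_t[i][j] = M[i][j] * beta_{t+1}(j) / beta_t(i).
    Returns (list of adapted matrices for t = 0..obs_time-1, beta_0).
    """
    n = len(M)
    beta = [1.0 if j == obs_state else 0.0 for j in range(n)]
    mats = []
    for _ in range(obs_time):
        prev = [sum(M[i][j] * beta[j] for j in range(n)) for i in range(n)]
        mats.append([[M[i][j] * beta[j] / prev[i] if prev[i] > 0.0 else 0.0
                      for j in range(n)] for i in range(n)])
        beta = prev
    mats.reverse()  # mats[t] now drives the step t -> t+1
    return mats, beta

def estimate_hit_probability(M, start, obs_time, obs_state,
                             window_states, window_times, n_samples, seed=0):
    """Monte-Carlo estimate of P(window hit | start at t=0, obs_state at obs_time)."""
    mats, beta0 = adapted_matrices(M, obs_time, obs_state)
    if beta0[start] == 0.0:
        raise ValueError("the two observations are inconsistent with M")
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        path = [start]
        for mat in mats:  # every sampled path ends in obs_state at obs_time
            path.append(rng.choices(range(len(mat)), weights=mat[path[-1]])[0])
        if any(path[t] in window_states for t in window_times):
            hits += 1
    return hits / n_samples

M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]
est = estimate_hit_probability(M, start=0, obs_time=3, obs_state=2,
                               window_states={0, 1}, window_times=[2, 3],
                               n_samples=50_000)
print(est)  # approximately 0.89 (exact value 0.32 / 0.36)
```

The same samples can be used to evaluate any predicate, e.g. a nearest-neighbor query over several objects, which is where no closed-form matrix computation is available. Here the estimate can be checked against the Bayes computation on the equivalence classes (0.32 / 0.36 ≈ 0.89).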
Event Queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728.
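To illustrate why a regular-expression-like event language fits Markovian streams, the sketch below computes the probability of the "Joe got coffee" pattern (office, then coffee room, then office again, with arbitrary states in between) by a forward pass over pairs of chain state and matched pattern prefix, i.e. the product of the Markov chain with a small automaton. The indoor states and transition probabilities are invented, and this is neither Lahar's syntax nor its evaluation algorithm; a brute-force enumeration with Python's `re` module serves as a cross-check.

```python
import re
from itertools import product

# Hypothetical indoor model (states and probabilities are invented for illustration).
STATES = ["office", "hall", "coffee"]
M = [
    [0.5, 0.5, 0.0],  # office -> office, hall
    [0.4, 0.3, 0.3],  # hall   -> office, hall, coffee
    [0.0, 0.8, 0.2],  # coffee -> hall, coffee
]

def event_probability(M, start, pattern, horizon):
    """P(states at t = 0..horizon contain `pattern` as an ordered subsequence).

    Forward pass over pairs (chain state, length of the matched pattern prefix);
    the prefix length acts as a small deterministic automaton, so the cost grows
    linearly with the horizon instead of exponentially.
    """
    k = len(pattern)

    def advance(matched, state):
        return matched + 1 if matched < k and pattern[matched] == state else matched

    dist = {(start, advance(0, start)): 1.0}
    for _ in range(horizon):
        nxt = {}
        for (i, matched), p in dist.items():
            for j, m_ij in enumerate(M[i]):
                if m_ij > 0.0:
                    key = (j, advance(matched, j))
                    nxt[key] = nxt.get(key, 0.0) + p * m_ij
        dist = nxt
    return sum(p for (_, matched), p in dist.items() if matched == k)

def event_probability_bruteforce(M, start, pattern, horizon):
    """Cross-check: enumerate all paths and match them against a regular expression."""
    regex = re.compile(".*".join(str(s) for s in pattern))
    total = 0.0
    for cont in product(range(len(M)), repeat=horizon):
        path = (start,) + cont
        prob = 1.0
        for a, b in zip(path, path[1:]):
            prob *= M[a][b]
        if regex.search("".join(str(s) for s in path)):
            total += prob
    return total

got_coffee = [STATES.index(s) for s in ("office", "coffee", "office")]
p = event_probability(M, STATES.index("office"), got_coffee, horizon=6)
q = event_probability_bruteforce(M, STATES.index("office"), got_coffee, horizon=6)
print(abs(p - q) < 1e-12)  # True
```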
101
ST - Window Queries
› Given the following states and transition probabilities, what is the probability that the car is in s1 or s2 during the time interval T = [2, 3]?
› The car is in s1 at t = 0, i.e., the initial distribution is (1, 0, 0); the transition matrix (row = current state, column = next state) is
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state-transition graph over s1, s2, s3 for t = 0, 1, 2, 3]
› Without the query, the state distributions for t = 0, 1, 2, 3 are (1, 0, 0), (0, 0, 1), (0, 0.8, 0.2), (0.48, 0.16, 0.36)
› For the query, mass that reaches s1 or s2 at a tick in T is a hit and is not propagated further: at t = 2, 0.8 of the mass is in s2 and only (0, 0, 0.2) remains; at t = 3 this becomes (0, 0.16, 0.04), of which 0.16 is in s2. Result = 0.8 + 0.16 = 0.96
› Solution based on matrix multiplications: introduce a new state for the trajectories that have already hit the window ("winner" trajectories) and two matrices
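The propagation and the window evaluation on this slide can be reproduced in a few lines of plain Python. This is a minimal sketch, assuming states s1, s2, s3 are indexed 0, 1, 2 and that M is row-stochastic (row = current state); `step` and `window_probability` are hypothetical helper names. Mass that reaches a window state at a tick inside T is counted as a hit and removed before the next step, so each possible world is counted at most once.

```python
# Transition matrix from the slide: row i = current state s_{i+1},
# column j = next state s_{j+1}.
M = [
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.4],
    [0.0, 0.8, 0.2],
]


def step(dist, matrix):
    """One Markov step: multiply the row vector `dist` by `matrix`."""
    return [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
            for j in range(len(matrix[0]))]


def window_probability(start, matrix, window_states, t_start, t_end):
    """P(object is in one of `window_states` at some tick t_start <= t <= t_end)."""
    dist = list(start)
    hit = 0.0
    for t in range(t_end + 1):
        if t > 0:
            dist = step(dist, matrix)
        if t >= t_start:
            # Mass inside the window is a true hit; remove it so that
            # every possible world is counted at most once.
            for s in window_states:
                hit += dist[s]
                dist[s] = 0.0
    return hit


# Unconstrained distributions for t = 0..3, starting in s1.
history = [[1.0, 0.0, 0.0]]
for _ in range(3):
    history.append(step(history[-1], M))
print(history[-1])  # approximately [0.48, 0.16, 0.36]

# Window query: s1 or s2 during T = [2, 3].
print(window_probability([1.0, 0.0, 0.0], M, {0, 1}, 2, 3))  # approximately 0.96
```

Removing the hit mass in place is what the absorbing-state construction on the next slide expresses as a matrix product.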
103
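ST - Window Queries
› States s1, s2, s3 plus a new absorbing state s4 for the trajectories that have hit the window
$$M^- = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0.6 & 0 & 0.4 & 0 \\ 0 & 0.8 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
› $M^-$ is used for transitions into ticks outside T; $M^+$, which redirects transitions into s1 or s2 to s4, is used for transitions into ticks inside T
› State vectors for t = 0, 1, 2, 3: (1, 0, 0, 0), (0, 0, 1, 0), (0, 0, 0.2, 0.8), (0, 0, 0.04, 0.96); the mass in s4 at t = 3 is the result, 0.96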
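Both matrices can also be derived mechanically from M. A minimal sketch, assuming the window covers s1 and s2 (indices 0 and 1) and that the object does not start inside the window; `make_window_matrices` is a hypothetical name. M^- keeps the original transitions and appends the absorbing state s4; M^+ additionally redirects every transition into a window state to s4.

```python
def make_window_matrices(M, window_states):
    """Build (M_minus, M_plus) with one extra absorbing 'hit' state."""
    n = len(M)
    hit = n  # index of the additional absorbing state (s4 on the slide)
    m_minus = [row[:] + [0.0] for row in M] + [[0.0] * n + [1.0]]
    m_plus = []
    for row in M:
        new_row = [0.0] * (n + 1)
        for j, p in enumerate(row):
            # Transitions into a window state are redirected to the hit state.
            new_row[hit if j in window_states else j] += p
        m_plus.append(new_row)
    m_plus.append([0.0] * n + [1.0])
    return m_minus, m_plus


def step(dist, matrix):
    """One Markov step: row vector times matrix."""
    return [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
            for j in range(len(matrix[0]))]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
m_minus, m_plus = make_window_matrices(M, {0, 1})

window = range(2, 4)  # T = [2, 3]
dist = [1.0, 0.0, 0.0, 0.0]  # in s1 at t = 0
for t in range(1, 4):
    # M+ for steps that arrive at a tick inside T, M- otherwise.
    dist = step(dist, m_plus if t in window else m_minus)
print(dist)  # approximately [0, 0, 0.04, 0.96]
```

Because only matrix-vector products are involved, the same loop works with a sparse matrix representation when the state space is large.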
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
› We need to track where the true-hit worlds are located
– 2|S| classes of equivalent worlds
– One class $S_i^-$ corresponding to worlds where o is located in state $s_i$ and has not intersected the window
– One class $S_i^+$ corresponding to worlds where o is located in state $s_i$ and has intersected the window
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$
[Figure: state-transition graph over s1, s2, s3 for t = 0, 1, 2, 3]
106
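Multiple Observations
› States ordered as $S_1^-, S_2^-, S_3^-$ (window not intersected) and $S_1^+, S_2^+, S_3^+$ (window intersected)
$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}$$
$$M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}$$
› State vectors for t = 0, 1, 2, 3: (1, 0, 0, 0, 0, 0), (0, 0, 1, 0, 0, 0), (0, 0, 0.2, 0, 0.8, 0), (0, 0.16, 0.04, 0.48, 0, 0.32)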
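The 2|S|-state matrices can be built the same way. A minimal sketch, assuming the state order S1-, S2-, S3-, S1+, S2+, S3+ and, because the slide does not state the window's time interval, assuming the window (s1 or s2) is active only at t = 2. Under that assumption, applying M+ for the step into t = 2 and M- otherwise reproduces the state vectors on the slide. `split_matrices` is a hypothetical name.

```python
def split_matrices(M, window_states):
    """Return (M_minus, M_plus) over the 2|S| states S_i^- and S_i^+."""
    n = len(M)
    size = 2 * n
    m_minus = [[0.0] * size for _ in range(size)]
    m_plus = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            # Outside the window both copies evolve independently (block diagonal).
            m_minus[i][j] = p
            m_minus[n + i][n + j] = p
            # Inside the window a not-yet-hit world that enters a window
            # state becomes a hit world; hit worlds stay hit worlds.
            m_plus[i][n + j if j in window_states else j] += p
            m_plus[n + i][n + j] += p
    return m_minus, m_plus


def step(dist, matrix):
    """One Markov step: row vector times matrix."""
    return [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
            for j in range(len(matrix[0]))]


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
m_minus, m_plus = split_matrices(M, {0, 1})

dist = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # S1- at t = 0
for t in range(1, 4):
    dist = step(dist, m_plus if t == 2 else m_minus)  # assumed window: t = 2 only
print(dist)  # approximately [0, 0.16, 0.04, 0.48, 0, 0.32]
```

Unlike the single absorbing state, this keeps the location of hit worlds, which is exactly what the second observation needs.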
107
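› Now, what is the probability that the trajectory passes through the query window, given that the object was observed in s3 at t = 3?
› Bayes' Theorem, with W = "o has intersected the window" (classes $S_i^+$) and O = "o is observed in s3":
$$P(W \mid O) = \frac{P(O \mid W) \cdot P(W)}{P(O)} = \frac{P(W \wedge O)}{P(O)} = \frac{P(O \wedge W)}{P(O \wedge W) + P(O \wedge \neg W)}$$
› From the state vector at t = 3, (0, 0.16, 0.04, 0.48, 0, 0.32) over $S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+$: $P(O \wedge W) = P(S_3^+) = 0.32$ and $P(O \wedge \neg W) = P(S_3^-) = 0.04$, so
$$P(W \mid O) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$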
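Conditioning on the observation only needs the two entries of the final vector that belong to the observed state. A minimal sketch, assuming the six-state order S1-, S2-, S3-, S1+, S2+, S3+ and the t = 3 vector from this slide; `prob_window_given_observation` is a hypothetical helper.

```python
def prob_window_given_observation(dist, observed_state, n_states):
    """P(window intersected | object observed in `observed_state`).

    `dist` lists S_1^- .. S_n^- followed by S_1^+ .. S_n^+.
    """
    p_hit = dist[n_states + observed_state]  # P(O and W): class S_i^+
    p_miss = dist[observed_state]            # P(O and not W): class S_i^-
    total = p_hit + p_miss                   # P(O)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return p_hit / total


final = [0.0, 0.16, 0.04, 0.48, 0.0, 0.32]  # t = 3, from the slide
print(round(prob_window_given_observation(final, 2, 3), 2))  # 0.89
```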
108
Summary
› Pros
– Answers time-parametrized queries according to possible-worlds semantics
– Considers location dependencies over time
– Scales up very well, since it is based purely on sparse matrix multiplications
– Natively extendable to uncertain observations
– Seems to work adequately on real-world data (more validation needed)
› Cons
– Discrete time and space
– Mapping continuous time to discrete ticks might not be the perfect model
109
Selected Works
110
Indexing UST Data
› With the above techniques, every object in the database has to be processed
› Index structure based on an R-tree indexing the spatio-temporal (ST) space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect Q)
[Figure: index entry P_oid in the time × location space]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
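To illustrate probabilistic pruning in general terms, here is a simplified, hypothetical sketch; it is not the index structure of the cited paper. It assumes each object's possible trajectories are covered by a few spatio-temporal boxes (one time and one location dimension), each annotated with an upper bound on the probability that the object passes through it. For a threshold query with threshold tau, summing the bounds of the boxes that intersect Q gives an upper bound on the hit probability (union bound), so objects whose bound is below tau can be pruned without evaluating their Markov chain. The entries are scanned as a flat list here; in an R-tree, only leaves reached through nodes whose bounding boxes intersect Q would be visited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned spatio-temporal box [t_lo, t_hi] x [x_lo, x_hi]."""
    t_lo: float
    t_hi: float
    x_lo: float
    x_hi: float

    def intersects(self, other: "Box") -> bool:
        return (self.t_lo <= other.t_hi and other.t_lo <= self.t_hi
                and self.x_lo <= other.x_hi and other.x_lo <= self.x_hi)


@dataclass(frozen=True)
class LeafEntry:
    """Hypothetical leaf entry: one region of object `oid`'s possible
    trajectories and an upper bound `p_max` on the probability that the
    object passes through this region."""
    oid: str
    box: Box
    p_max: float


def candidates(entries, query, tau):
    """Objects whose probability of intersecting `query` may reach `tau`.

    Assumes the boxes of each object cover all of its possible positions, so
    summing `p_max` over the boxes that intersect the query is an upper bound
    on the hit probability (union bound); all other objects are pruned.
    """
    bound = {}
    for e in entries:
        if e.box.intersects(query):
            bound[e.oid] = bound.get(e.oid, 0.0) + e.p_max
    return sorted(oid for oid, b in bound.items() if min(b, 1.0) >= tau)


entries = [
    LeafEntry("o1", Box(0, 2, 0, 5), 0.3),
    LeafEntry("o1", Box(2, 4, 5, 9), 0.7),
    LeafEntry("o2", Box(0, 4, 20, 30), 1.0),
    LeafEntry("o3", Box(1, 3, 3, 6), 0.1),
]
query = Box(1, 3, 4, 6)
print(candidates(entries, query, 0.5))  # ['o1']: o2 misses Q, o3's bound 0.1 < 0.5
```

Surviving candidates still need exact evaluation (for example with the matrix method above); the bound only filters.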
111
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
– Draw a sufficiently large number of samples
– Approximate the result probability by the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently such that they conform to the observations?
› Solution: adaptation of the transition matrices
J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3): 205-216, 2013.
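One standard way to adapt the transition matrices so that every sample conforms to a later observation is to reweight each transition by the probability of still reaching that observation (a backward pass, as for a Markov-chain bridge). This minimal sketch uses that construction; it is not necessarily the exact method of the cited paper. It assumes the running example (in s1 at t = 0, observed in s3 at t = 3, window s1 or s2 at t = 2), and all function names are hypothetical. Because every sample already satisfies the observation, no samples are rejected.

```python
import random


def adapted_matrices(M, observed_state, horizon):
    """Per-step transition matrices conditioned on being in `observed_state`
    at tick `horizon` (backward reweighting, as for a Markov-chain bridge)."""
    n = len(M)
    # beta[t][i] = P(in observed_state at `horizon` | in state i at tick t)
    beta = [[0.0] * n for _ in range(horizon + 1)]
    beta[horizon][observed_state] = 1.0
    for t in range(horizon - 1, -1, -1):
        beta[t] = [sum(M[i][j] * beta[t + 1][j] for j in range(n)) for i in range(n)]
    adapted = []
    for t in range(horizon):
        rows = []
        for i in range(n):
            if beta[t][i] == 0.0:
                rows.append([0.0] * n)  # state i cannot lead to the observation
            else:
                rows.append([M[i][j] * beta[t + 1][j] / beta[t][i] for j in range(n)])
        adapted.append(rows)
    return adapted


def sample_trajectory(start, adapted, rng):
    """Draw one trajectory (state indices for ticks 0..horizon)."""
    path = [start]
    for matrix in adapted:
        weights = matrix[path[-1]]
        path.append(rng.choices(range(len(weights)), weights=weights)[0])
    return path


def monte_carlo_probability(start, adapted, predicate, n_samples, seed=0):
    """Fraction of sampled trajectories that satisfy `predicate`."""
    rng = random.Random(seed)
    hits = sum(predicate(sample_trajectory(start, adapted, rng)) for _ in range(n_samples))
    return hits / n_samples


def in_window(path):
    """Query predicate: in s1 or s2 at t = 2 (the window assumed above)."""
    return path[2] in (0, 1)


M = [[0.0, 0.0, 1.0], [0.6, 0.0, 0.4], [0.0, 0.8, 0.2]]
adapted = adapted_matrices(M, observed_state=2, horizon=3)  # observed in s3 at t = 3
print(monte_carlo_probability(0, adapted, in_window, 20000))  # estimate of 0.32 / 0.36
```

The estimate should approach the exact value 0.32 / 0.36 from the Bayes slide; for queries without such a closed form (e.g., nearest-neighbor predicates over several objects), only the predicate changes.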
112
Event Queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents, e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715-728.
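The probability that a Markov-chain location stream matches a sequence such as office, coffee room, office can be computed by tracking, for every location, how many subevents have been matched so far, i.e., by running the chain and a small sequence automaton in lockstep. A minimal sketch, assuming a hypothetical three-location chain (0 = office, 1 = hallway, 2 = coffee room) with made-up transition probabilities and gap-tolerant matching (other locations may occur between subevents); it only illustrates the regular-expression flavor of such queries and is not Lahar's actual semantics or implementation.

```python
def sequence_probability(M, start_dist, pattern, horizon):
    """P(the locations at ticks 0..horizon contain `pattern` in order),
    allowing other locations between consecutive subevents."""
    n, k = len(M), len(pattern)

    def advance(matched, loc):
        # Greedy left-to-right matching is sufficient for in-order subsequences.
        return matched + 1 if matched < k and pattern[matched] == loc else matched

    # prob[(location, number of subevents matched so far)] at the current tick
    prob = {}
    for loc, p in enumerate(start_dist):
        if p > 0.0:
            key = (loc, advance(0, loc))
            prob[key] = prob.get(key, 0.0) + p
    for _ in range(horizon):
        nxt = {}
        for (loc, matched), p in prob.items():
            for j in range(n):
                if M[loc][j] > 0.0:
                    key = (j, advance(matched, j))
                    nxt[key] = nxt.get(key, 0.0) + p * M[loc][j]
        prob = nxt
    return sum(p for (_, matched), p in prob.items() if matched == k)


# Hypothetical indoor chain: 0 = office, 1 = hallway, 2 = coffee room.
M_indoor = [
    [0.5, 0.3, 0.2],
    [0.4, 0.2, 0.4],
    [0.3, 0.7, 0.0],
]
joe_got_coffee = [0, 2, 0]  # office, coffee room, office
print(sequence_probability(M_indoor, [1.0, 0.0, 0.0], joe_got_coffee, 2))  # approximately 0.06
```

Within two ticks only the path office, coffee room, office matches (0.2 × 0.3); with a longer horizon, detours through the hallway also contribute.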
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
100
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
101
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 048016036)
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
102
ST - Window Queries
(100)119872=( 0 0 1
06 0 040 08 02)
t=0 t=1 t=2 t=3
rsaquo Given the following state states and transition probabilities what is the probability that the car is in s1 or s2 in the time interval T = [23]
s1
s2
s3
s1
s2
s3
10
06
06
02
04
(001)
( 00802)
( 00016004 )Result = 096
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
rsaquo Solution based on matrix multiplications introduces a new state for the winner trajectories and two matrices
103
ST - Window Queries
119872minus=(0 0 1 0
06 0 04 00 08 02 00 0 0 1
)
(1000)(
0010)(
00
0208
)(00
004096
)119872
+iquest=(0 0 1 00 0 04 060 0 02 0 80 0 0 1
)iquest
s1
s2
s3
s4
104
Multiple Observations
rsaquo So far we had only one observationfrom which we could extrapolate
rsaquo This is not really of interest sincecars do not move randomly
rsaquo With two observations we have tointroduce more artificial states andadapt the techniques
loca
tion
spac
e
time spacet0
loca
tion
spac
e
time spacet0 t1
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers
114
Thanks for listening
115
Related Work
› [Doob53] J. L. Doob. Stochastic Processes. Wiley, 1953.
› [EKMRZ12a] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Querying uncertain spatio-temporal data. In Proceedings of the 28th International Conference on Data Engineering (ICDE), Washington, DC, 2012.
› [EKMRZ12b] T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, USA, 2012.
› [NZERMCK13a] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3): 205–216, 2013.
› Project page: http://www.dbs.ifi.lmu.de/cms/Publications/UncertainSpatioTemporal
116
Related Work
› [NZERMCK13b] J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Similarity search on uncertain spatio-temporal data. SISAP 2013: 43–49.
› [XGCQY13] C. Xu, Y. Gu, L. Chen, J. Qiao, and G. Yu. Interval reverse nearest neighbor queries on uncertain data with Markov correlations. ICDE 2013: 170–181.
› [EKMNRZ14] T. Emrich, H.-P. Kriegel, N. Mamoulis, J. Niedermayer, M. Renz, and A. Züfle. Reverse-nearest neighbor queries on uncertain moving object trajectories. DASFAA 2014.
› [CKP04] R. Cheng, D. Kalashnikov, and S. Prabhakar. Querying imprecise data in moving object environments. IEEE TKDE 16(9): 1112–1127, 2004.
› [AKKR09] J. Aßfalg, H.-P. Kriegel, P. Kröger, and M. Renz. Probabilistic similarity search for uncertain time series. SSDBM 2009: 435–443.
› [RLBS08] C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008: 715–728.
104
Multiple Observations
› So far, we had only one observation from which we could extrapolate
› This is not really of interest, since cars do not move randomly
› With two observations, we have to introduce more artificial states and adapt the techniques
[Figure: location space over time; left: a single observation at t0; right: two observations at t0 and t1]
105
Multiple Observations
› We need to track where the true-hit worlds are located
  – 2|S| classes of equivalent worlds
  – One class S_i^- corresponding to worlds where o is located in state s_i and o has not intersected the window
  – One class S_i^+ corresponding to worlds where o is located in state s_i and o has intersected the window
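Transition matrix (row i holds the probabilities of moving from s_i to s_1, s_2, s_3):

$$M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$

[Figure: states s1, s2, s3 over time steps t = 0 to t = 3]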
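The 6×6 matrices over the classes S_i^- and S_i^+ can be derived mechanically from M. The sketch below is our own helper (the name `split_matrices` is hypothetical). Classes are ordered (S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+). When no window is active, both blocks evolve independently. When a window is active, a transition into a window state moves probability mass from the minus block into the plus block, and mass in the plus block stays there. The window used below (states s1 and s2) is an assumption chosen to match the example.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]

# Transition matrix of the example, states 0-based: M[i][j] = P(s_{i+1} -> s_{j+1})
M: Matrix = [[0.0, 0.0, 1.0],
             [0.6, 0.0, 0.4],
             [0.0, 0.8, 0.2]]


def split_matrices(M: Matrix, window: Set[int]) -> Tuple[Matrix, Matrix]:
    """Return (M_minus, M_plus) over 2n classes (S_1^-..S_n^-, S_1^+..S_n^+).

    M_minus is used for steps into a time the query window does not cover;
    M_plus for steps into a time where the 0-based states in `window` are covered.
    """
    n = len(M)
    m_minus = [[0.0] * (2 * n) for _ in range(2 * n)]
    m_plus = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = M[i][j]
            # No active window: block-diagonal diag(M, M)
            m_minus[i][j] = p
            m_minus[n + i][n + j] = p
            # Active window: entering a window state turns a miss into a hit
            m_plus[i][n + j if j in window else j] += p
            # Hits stay hits
            m_plus[n + i][n + j] = p
    return m_minus, m_plus


m_minus, m_plus = split_matrices(M, window={0, 1})
for row in m_plus:
    print(row)
```

Only M and the window are stored. The larger matrices are sparse and cheap to build on demand.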
106
Multiple Observations
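[Figure: probability mass over states s1, s2, s3 for t = 0 to t = 3, split into the classes S_i^- (¬■, window not intersected) and S_i^+ (■, window intersected)]

Class vectors $v_t$ have entries ordered $(S_1^-, S_2^-, S_3^-, S_1^+, S_2^+, S_3^+)$ and are computed as $v_{t+1} = v_t M^{\pm}$:

$$v_0 = (1, 0, 0, 0, 0, 0), \quad v_1 = (0, 0, 1, 0, 0, 0), \quad v_2 = (0, 0, 0.2, 0, 0.8, 0), \quad v_3 = (0, 0.16, 0.04, 0.48, 0, 0.32)$$

$M^-$ is used for the steps into $t = 1$ and $t = 3$. $M^+$ is used for the step into $t = 2$, where the query window covers $s_1$ and $s_2$:

$$M^- = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}, \qquad M^+ = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0.8 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0.6 & 0 & 0.4 \\ 0 & 0 & 0 & 0 & 0.8 & 0.2 \end{pmatrix}, \qquad M = \begin{pmatrix} 0 & 0 & 1 \\ 0.6 & 0 & 0.4 \\ 0 & 0.8 & 0.2 \end{pmatrix}$$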
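The class vectors of this example follow from three row-vector times matrix products. The minimal pure-Python sketch below reproduces $v_0$ to $v_3$, with $M^+$ applied only for the step into t = 2. It assumes no sparse-matrix library. The matrices are written out explicitly with the values shown for this example.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# Class order: (S1-, S2-, S3-, S1+, S2+, S3+)
M_MINUS: Matrix = [
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.6, 0.0, 0.4, 0.0, 0.0, 0.0],
    [0.0, 0.8, 0.2, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.6, 0.0, 0.4],
    [0.0, 0.0, 0.0, 0.0, 0.8, 0.2],
]
M_PLUS: Matrix = [
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.4, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.6, 0.0, 0.4],
    [0.0, 0.0, 0.0, 0.0, 0.8, 0.2],
]


def step(v: Vector, m: Matrix) -> Vector:
    """Row vector times matrix: new[j] = sum_i v[i] * m[i][j]."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


v: Vector = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # t = 0: in s1, window not yet hit
history = [v]
for m in (M_MINUS, M_PLUS, M_MINUS):          # steps into t = 1, 2, 3
    v = step(v, m)
    history.append(v)
for t, vec in enumerate(history):
    print(t, [round(x, 2) for x in vec])
```

The sum of the plus entries of $v_3$ (0.48 + 0.32 = 0.8) is the unconditioned probability that the window was intersected.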
107
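› Now, what is the probability that the trajectory passes the query window, given that the object was observed in s_3 (at t = 3)?

Bayes' Theorem (obs: the object is observed in s_3 at t = 3; ■: the trajectory intersects the query window):

$$P(\blacksquare \mid \text{obs}) = \frac{P(\text{obs} \mid \blacksquare)\, P(\blacksquare)}{P(\text{obs})} = \frac{P(\blacksquare \wedge \text{obs})}{P(\text{obs})} = \frac{P(\text{obs} \wedge \blacksquare)}{P(\text{obs} \wedge \blacksquare) + P(\text{obs} \wedge \neg\blacksquare)}$$

With $v_3 = (0, 0.16, 0.04, 0.48, 0, 0.32)$, $P(\text{obs} \wedge \blacksquare)$ is the $S_3^+$ entry and $P(\text{obs} \wedge \neg\blacksquare)$ is the $S_3^-$ entry:

$$P(\blacksquare \mid \text{obs}) = \frac{0.32}{0.32 + 0.04} \approx 0.89$$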
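The conditioning step reads two entries of $v_3$. The sketch below generalises it slightly, assuming the observation is given as a likelihood vector over states: 1 for s3 and 0 elsewhere in the exact case above. A noisy sensor would use values other than 0 and 1. This is one way to read the "natively extendable to uncertain observations" remark. It additionally assumes the observation depends only on the current state. The function name and the noisy likelihood values are hypothetical.

```python
from typing import List


def posterior_hit(v: List[float], likelihood: List[float]) -> float:
    """P(window intersected | observation), given a class vector
    v = (S_1^-, ..., S_n^-, S_1^+, ..., S_n^+) and likelihood[i] = P(obs | s_i)."""
    n = len(likelihood)
    p_obs_and_hit = sum(likelihood[i] * v[n + i] for i in range(n))
    p_obs_and_miss = sum(likelihood[i] * v[i] for i in range(n))
    return p_obs_and_hit / (p_obs_and_hit + p_obs_and_miss)


v3 = [0.0, 0.16, 0.04, 0.48, 0.0, 0.32]
exact = posterior_hit(v3, [0.0, 0.0, 1.0])   # object seen in s3 with certainty
noisy = posterior_hit(v3, [0.0, 0.1, 0.9])   # hypothetical imprecise sensor
print(round(exact, 2), round(noisy, 3))
```

With the exact observation, this reduces to 0.32 / (0.32 + 0.04) from the slide.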
108
Summary
› Pros
  – Allows answering time-parameterized queries according to possible-worlds semantics
  – Considers location dependencies over time
  – Scales up very well, since it is purely based on sparse matrix multiplications
  – Natively extendable to uncertain observations
  – Seems to work adequately on real-world data (more validation needed)
› Cons
  – Discrete time and space
  – Mapping continuous time to discrete ticks might not be the perfect model
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
105
Multiple Observations
rsaquo We need to track where true hit worlds are locatedndash 2|S| classes of equivalent worldsndash One class Si
- corresponding to worlds where o is located in state si and o has not intersected the window
ndash One class Si+ corresponding to worlds where o is located in
state si and o has not intersected the window
119872=( 0 0 106 0 040 08 02)
t=0 t=1 t=2 t=3
s1
s2
s3
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
106
Multiple Observations
(100000)
t=0 t=1 t=2 t=3
s1
s2
s3
(001000)(
00
020
080
)(0
0 160 04048
0032
)S1
-
S2-
S3-
S1+
S2+
S3+
not∎
∎
119872minus=(119872 00119872 )
119872
+iquest=(000010 000000000004 060000000002 000800000000 00001000000006 0004000000 000802
)iquest
119872=( 0 0 106 0 040 08 02)
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
107
iquest119875 (∎and)119875 ()
119875 (∎|iquest= 119875 (iquest∎)lowast119875 (∎)119875 ()
iquest119875 (∎and)
119875 (and∎ )+119875 (andnot∎)
Bayesrsquo Theorem
(0
0 160 04048
0032
) iquest032
032+004=089
rsaquo Now what is the probability that the trajectory passes the query window given the fact that the object was seen in s3
S1-
S2-
S3-
S1+
S2+
S3+
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
108
Summary
rsaquo Prosndash Allows to answer time-parametrized queries according to
possible worlds semanticsndash Considers location dependencies over timendash Scales up very well since it is purely based on sparse
matrix multiplicationsndash Natively extendable for uncertain observationsndash Seems to work adequately on real-world data (more
validation needed)
rsaquo Consndash Discrete time and spacendash Matching from time to tics might not be the perfect
modelling
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
109
Selected Works
110
Indexing UST Data
rsaquo With the above techniques each object in the database has to be processed
rsaquo Index Structure based on R-Tree indexing the ST-Space
rsaquo The leafs contain the ldquointelligencerdquo and enable probabilistic pruning (at max x of the possible trajectories of o may intersect Q)
Poid
time
loca
tion
T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM CIKM Maui Hawaii USA 2012
111
KNN queries + Sampling on UST Data
rsaquo Not all queries can be solved as elegant as window queries
rsaquo Popular in uncertain databases Monte-Carlo-Samplingndash Draw a sufficiently high number of
samplesndash Approximate result probability = ratio
of samples that satisfy the query and total number of drawn samples
rsaquo But how to draw samples efficiently such that they are conform with the observations
rsaquo Solution Adaption of transition matricesJohannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
112
rsaquo Application RFID sensors track individuals in an indoor environment
rsaquo Event query is a sequence of subeventsEg Joe got coffeerdquo can be expressed as a sequence of three events (1) Joe is in his office (2) Joe is in a coffee room (3) Joe is back in his office
rsaquo Query language to formulate probabilistic event queries(Lahar) is related to regular expressions
Event queries
Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
113
Summary
rsaquo Models for spatial uncertainty
rsaquo Concepts of effectiently managing uncertain datandash Cleaningndash Approximationndash Paradigm of equivalent worlds
rsaquo Geometric Models can be used ndash when no probabilistic model is presentndash to find possible answers (no confidence)
rsaquo Probabilistic management of UST datandash based on Stochastic Processesndash allows for probabilistic answers
114
Thanks for listening
115
Related Workrsaquo [Doob53] Doob J L (1953) Stochastic Processes Wiley
rsaquo [EKMRZ12a] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Querying uncertain spatio-temporal data In Proceedings of the 28th International Conference on Data Engineering (ICDE) Washington DC 2012
rsaquo [EKMRZ12b] T Emrich H-P Kriegel N Mamoulis M Renz and A Zuumlfle Indexing uncertain spatio-temporal data In Proceedings of the 21th ACM International Conference on Information and Knowledge Management (CIKM) Maui Hawaii USA 2012
rsaquo [NZERMCK13a] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Probabilistic Nearest Neighbor Queries on Uncertain Moving Object Trajectories PVLDB 7(3) 205-216 (2013)
rsaquo Project Page httpwwwdbsifilmudecmsPublicationsUncertainSpatioTemporal
116
Related Workrsaquo [NZERMCK13b] Johannes Niedermayer Andreas Zuumlfle Tobias Emrich Matthias
Renz Nikos Mamoulis Lei Chen Hans-Peter Kriegel Similarity Search on Uncertain Spatio-temporal Data SISAP 2013 43-49
rsaquo [XGCQY13] Chuanfei Xu Yu Gu Lei Chen Jianzhong Qiao Ge Yu Interval reverse nearest neighbor queries on uncertain data with Markov correlations ICDE 2013 170-181
rsaquo [EKMNRZ14] T Emrich H-P Kriegel N Mamoulis J Niedermayer M Renz and A Zuumlfle Reverse-Nearest Neighbor Queries on Uncertain Moving Object Trajectories DASFAA 2014
rsaquo [CKP04] R Cheng D Kalashnikov and S Prabhakar ldquoQuerying imprecise data in moving object environmentsrdquo in IEEE TKDE vol 16 no 9 2004 pp 1112ndash1127
rsaquo [AKKR09] Johannes Aszligfalg Hans-Peter Kriegel Peer Kroumlger Matthias Renz bdquoProbabilistic Similarity Search for Uncertain Time Seriesldquo SSDBM 2009 435-443
rsaquo [RLBS08] Christopher Reacute Julie Letchner Magdalena Balazinska Dan Suciu Event queries on correlated probabilistic streams SIGMOD Conference 2008 715-728
Indexing UST Data
› With the techniques above, every object in the database has to be processed for each query
› Index structure based on an R-tree over the space-time (ST) space
› The leaves contain the "intelligence" and enable probabilistic pruning (e.g., at most a fraction x of the possible trajectories of o may intersect the query Q)
[Figure: index over the ST space; axes: time, location]
T. Emrich, H.-P. Kriegel, N. Mamoulis, M. Renz, and A. Züfle. Indexing uncertain spatio-temporal data. In Proceedings of the 21st ACM CIKM, Maui, Hawaii, USA, 2012.
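A minimal sketch of the kind of pruning bound the slide alludes to, under our own assumptions (this is not the CIKM 2012 index structure itself): each hypothetical `LeafEntry` partitions an object's possible trajectories into groups and stores every group's space-time bounding box (`Box`, with one spatial dimension for brevity) together with the group's total probability. A trajectory can only intersect Q if its group's box does, so summing the masses of the intersecting groups gives an upper bound on P(o intersects Q); objects whose bound falls below the hypothetical threshold `tau` can be pruned without inspecting their trajectories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Space-time box [t_lo, t_hi] x [x_lo, x_hi] (one spatial dimension for brevity)."""
    t_lo: float
    t_hi: float
    x_lo: float
    x_hi: float

    def intersects(self, other: "Box") -> bool:
        return (self.t_lo <= other.t_hi and other.t_lo <= self.t_hi
                and self.x_lo <= other.x_hi and other.x_lo <= self.x_hi)


@dataclass
class LeafEntry:
    """Hypothetical leaf entry: the object's possible trajectories are partitioned into
    groups, each stored as (bounding box of the group, total probability of the group)."""
    obj_id: str
    groups: list

    def mbr(self) -> Box:
        boxes = [b for b, _ in self.groups]
        return Box(min(b.t_lo for b in boxes), max(b.t_hi for b in boxes),
                   min(b.x_lo for b in boxes), max(b.x_hi for b in boxes))

    def upper_bound(self, q: Box) -> float:
        # A trajectory can intersect q only if its group's box does, so the summed
        # mass of those groups is an upper bound on P(o intersects q).
        return sum(p for b, p in self.groups if b.intersects(q))


def prune(entries, q: Box, tau: float):
    """Split entries into (pruned ids, candidate ids) for a threshold window query."""
    pruned, candidates = [], []
    for e in entries:
        if not e.mbr().intersects(q) or e.upper_bound(q) < tau:
            pruned.append(e.obj_id)
        else:
            candidates.append(e.obj_id)
    return pruned, candidates


a = LeafEntry("a", [(Box(0, 5, 0, 2), 0.9), (Box(0, 5, 8, 10), 0.1)])
b = LeafEntry("b", [(Box(0, 5, 20, 30), 1.0)])
q = Box(2, 3, 9, 12)
print(prune([a, b], q, tau=0.5))   # (['a', 'b'], [])
print(prune([a, b], q, tau=0.05))  # (['b'], ['a'])
```

For object `a` only the group with mass 0.1 touches Q, so `a` is pruned at `tau=0.5` but stays a candidate at `tau=0.05`; `b` is pruned because its overall bounding box misses Q. Surviving candidates would still need exact (or sampled) probability computation.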
KNN Queries + Sampling on UST Data
› Not all queries can be solved as elegantly as window queries
› Popular in uncertain databases: Monte-Carlo sampling
  – Draw a sufficiently large number of samples
  – Approximate the result probability as the ratio of the number of samples that satisfy the query to the total number of drawn samples
› But how can samples be drawn efficiently so that they conform to the observations?
› Solution: adaptation of the transition matrices
J. Niedermayer, A. Züfle, T. Emrich, M. Renz, N. Mamoulis, L. Chen, and H.-P. Kriegel. Probabilistic nearest neighbor queries on uncertain moving object trajectories. PVLDB 7(3):205–216, 2013.
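The sampling bullets can be sketched under simplifying assumptions: discrete locations on a line, one homogeneous transition matrix `M`, and objects observed exactly at t = 0 and t = T. The hypothetical `adapt_transitions` uses the standard backward pass for conditioning a Markov chain on a later observation: with β_t(i) = P(X_T = s_obs | X_t = i), the adapted transition probability is M(i, j) · β_{t+1}(j) / β_t(i), so every sampled trajectory agrees with the observation. This is one way to realize "adaptation of transition matrices", not necessarily the exact procedure of the PVLDB paper. The Monte-Carlo step then estimates P(A is the nearest neighbor of q at time t) as the fraction of sampled worlds in which A is strictly closer to q than B. All names and the toy model are assumptions for illustration.

```python
import random


def adapt_transitions(M, T, s_obs):
    """Condition a homogeneous Markov chain on X_T == s_obs.

    Returns T matrices; matrix t holds P(X_{t+1} = j | X_t = i, X_T = s_obs).
    """
    n = len(M)
    beta = [1.0 if j == s_obs else 0.0 for j in range(n)]  # beta_T
    adapted = []
    for _ in range(T):
        # beta_t(i) = sum_j M[i][j] * beta_{t+1}(j) = P(X_T = s_obs | X_t = i)
        prev = [sum(M[i][j] * beta[j] for j in range(n)) for i in range(n)]
        rows = []
        for i in range(n):
            if prev[i] > 0:
                rows.append([M[i][j] * beta[j] / prev[i] for j in range(n)])
            else:
                rows.append([0.0] * n)  # i cannot reach s_obs in time: never sampled
        adapted.append(rows)
        beta = prev
    adapted.reverse()  # matrices were built from step T-1 back to step 0
    return adapted


def sample_path(adapted, s0, rng):
    """Draw one trajectory that starts in s0 and respects the conditioning."""
    path = [s0]
    for rows in adapted:
        weights = rows[path[-1]]
        path.append(rng.choices(range(len(weights)), weights=weights)[0])
    return path


def prob_nearest(paths_a, paths_b, q, t):
    """Monte-Carlo estimate: share of sampled worlds where A is strictly closer to q at time t."""
    hits = sum(abs(pa[t] - q) < abs(pb[t] - q) for pa, pb in zip(paths_a, paths_b))
    return hits / len(paths_a)


# Hypothetical toy model: 5 locations on a line, lazy random walk.
M = [
    [0.75, 0.25, 0.0, 0.0, 0.0],
    [0.25, 0.5, 0.25, 0.0, 0.0],
    [0.0, 0.25, 0.5, 0.25, 0.0],
    [0.0, 0.0, 0.25, 0.5, 0.25],
    [0.0, 0.0, 0.0, 0.25, 0.75],
]
rng = random.Random(7)
# Both objects are observed at location 4 at t = 4; A starts at 0, B at 4.
adapted = adapt_transitions(M, T=4, s_obs=4)
paths_a = [sample_path(adapted, 0, rng) for _ in range(2000)]
paths_b = [sample_path(adapted, 4, rng) for _ in range(2000)]
print(prob_nearest(paths_a, paths_b, q=2, t=2))
```

In this toy setup A is forced along 0, 1, 2, 3, 4, while B is at location 2 at t = 2 with probability 1/126 given its observations, so the printed estimate should be close to 125/126 ≈ 0.992. Sampling from the adapted matrices avoids rejection sampling, where most unconditioned samples would be discarded for contradicting the observations.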
Event Queries
› Application: RFID sensors track individuals in an indoor environment
› An event query is a sequence of subevents; e.g., "Joe got coffee" can be expressed as a sequence of three events: (1) Joe is in his office, (2) Joe is in a coffee room, (3) Joe is back in his office
› The query language for formulating probabilistic event queries (Lahar) is related to regular expressions
C. Ré, J. Letchner, M. Balazinska, and D. Suciu. Event queries on correlated probabilistic streams. SIGMOD Conference 2008, pp. 715–728.
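One way to evaluate such a regular-expression-like event query over a Markovian stream is to run the pattern's automaton alongside the stream's Markov chain. The sketch below assumes three hypothetical locations (office, hallway, coffee room), a known start distribution `init`, and a transition matrix `M`; it encodes "Joe got coffee" as a DFA (office, later coffee room, later office again, with arbitrary readings in between) and propagates the joint distribution over (location, DFA state) one timestamp at a time. It only illustrates the idea; it is not Lahar's query language or evaluation algorithm.

```python
from collections import defaultdict

OFFICE, HALL, COFFEE = 0, 1, 2  # hypothetical location labels


def dfa_step(q, room):
    """DFA for 'office, later coffee room, later office again' (gaps allowed).
    States: 0 = waiting for office, 1 = waiting for coffee room,
    2 = waiting to be back in office, 3 = matched (absorbing)."""
    if q == 0:
        return 1 if room == OFFICE else 0
    if q == 1:
        return 2 if room == COFFEE else 1
    if q == 2:
        return 3 if room == OFFICE else 2
    return 3


def match_probability(init, M, n_steps):
    """P(a Markovian location sequence of length n_steps matches the DFA)."""
    dist = defaultdict(float)  # (room, dfa_state) -> probability
    for room, p in enumerate(init):
        if p > 0:
            dist[(room, dfa_step(0, room))] += p
    for _ in range(n_steps - 1):
        nxt = defaultdict(float)
        for (room, q), p in dist.items():
            for room2, p_move in enumerate(M[room]):
                if p_move > 0:
                    nxt[(room2, dfa_step(q, room2))] += p * p_move
        dist = nxt
    return sum(p for (_, q), p in dist.items() if q == 3)


M = [
    [0.6, 0.4, 0.0],  # from office
    [0.3, 0.4, 0.3],  # from hallway
    [0.0, 0.5, 0.5],  # from coffee room
]
init = [1.0, 0.0, 0.0]  # Joe starts in his office
for n in (4, 5, 8):
    print(n, match_probability(init, M, n))
```

Because the office and the coffee room are connected only via the hallway in this toy matrix, no sequence of 4 readings can match; with 5 readings the only match is office, hallway, coffee room, hallway, office, with probability 0.4 · 0.3 · 0.5 · 0.3 = 0.018. Since the accepting state is absorbing, the match probability can only grow with longer streams.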
Summary
› Models for spatial uncertainty
› Concepts for efficiently managing uncertain data
  – Cleaning
  – Approximation
  – Paradigm of equivalent worlds
› Geometric models can be used
  – when no probabilistic model is present
  – to find possible answers (no confidence)
› Probabilistic management of UST data
  – based on stochastic processes
  – allows for probabilistic answers