Continuous Query Processing on Spatio-Temporal Data Streams
Rimma V. Nehme
Department of Computer Sciences, Worcester Polytechnic Institute
Thesis Advisor: Elke A. Rundensteiner
Thesis Reader: Michael A. Gennert
June 27, 2005
Outline
- Motivation
- Part I: SCUBA
  - Motivation
  - Moving Clusters
  - SCUBA Algorithm
  - Analysis of SCUBA
  - Evaluation
  - Conclusions
  - Future Work
- Part II: Performance vs. Accuracy
  - Discrete vs. Continuous Model
  - Accuracy Model
  - Evaluation
  - Conclusions
Are we there yet?
Motivation
- Monitor the traffic in the red areas
- Continuously return the area covered by the herd during the migration
- Send a notification to all cell phone users within 2 miles that we are having a 50%-off lunch sale
Challenges
- Scalability: large number of objects, large number of queries
- Limited resources: memory, CPU
- Real-time response requirement
- Reduce the number of computations
The challenge is to provide fast query response in update-intensive environments (moving objects, dynamic range queries, dynamic kNN queries).
Novel idea: exploit the fact that objects and queries move in groups (i.e., clusters) to optimize the execution.
Big Picture
- Traditional execution: SR [SR01], DQ [LPM02], CNN [TPS], TPR [SJL00]
- Shared execution: SINA [MXA04], SEA-CNN [XMA05], Q-Index [PXK+02]
- Shared cluster-based execution: my work (SCUBA)
Use clustering as a means to improve execution for densely moving objects and queries.
Proposed Solution: Moving Clusters
Main idea: abstract individual entities into clusters based on common attributes:
-Direction
-Speed
-Spatial Position
The execution of continuous moving queries on moving objects is then abstracted as a join between moving clusters and a join within moving clusters.
Example query: continuously retrieve the police car closest to me.
Scalable Cluster-Based Algorithm for Evaluating Continuous Spatio-Temporal Queries on Moving Objects (SCUBA)
Architecture Overview: SCUBA-Enabled Motion Operator Execution
The SCUBA motion operator consumes a moving-objects data stream and a moving-queries data stream (e.g., range queries), groups them into moving clusters, and, each time the time interval expires, performs a grid-based join between and within clusters, producing a results data stream.
I present the system in the context of continuous spatio-temporal range queries.
[Figure: the motion operator inside the CAPE engine, with Stream Generator, Query Plan Generator, Raindrop Workhorse, Execution Engine, Execution Scheduler, Statistics Gatherer, and Stream Receiver; end users submit queries over the Internet; the legend distinguishes control flow from data flow.]
Moving Cluster Representation in SCUBA
A moving cluster is represented by:
- Centroid
- Actual cluster size
- Maximum cluster size (ΘD)
- Velocity vector
- Cluster members: moving objects and moving queries
Inside the cluster, each member (e.g., a moving object) is represented by its position relative to the centroid.
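As a minimal sketch of this representation, assuming Python 3.9+ and members stored as polar offsets (r, θ) from the centroid (the (r, θ) form follows the load-shedding slides; the class and field names are hypothetical, not SCUBA's actual structures), a moving cluster could look like this:

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Member:
    """A cluster member stored relative to the cluster centroid (hypothetical layout)."""
    member_id: str
    is_query: bool
    r: float       # distance from the centroid
    theta: float   # direction from the centroid, in radians


@dataclass
class MovingCluster:
    cluster_id: int
    centroid: tuple[float, float]
    velocity: tuple[float, float]   # velocity vector (dx, dy) per time unit
    max_radius: float               # maximum cluster size, Theta_D
    radius: float = 0.0             # actual cluster size
    members: dict[str, Member] = field(default_factory=dict)

    def add(self, member_id: str, x: float, y: float, is_query: bool = False) -> None:
        """Store a member as a polar offset from the centroid and grow the actual size if needed."""
        dx, dy = x - self.centroid[0], y - self.centroid[1]
        r = math.hypot(dx, dy)
        if r > self.max_radius:
            raise ValueError("member lies outside the maximum cluster size")
        self.members[member_id] = Member(member_id, is_query, r, math.atan2(dy, dx))
        self.radius = max(self.radius, r)

    def absolute_position(self, member_id: str) -> tuple[float, float]:
        """Recover a member's absolute location from the centroid and its (r, theta) offset."""
        m = self.members[member_id]
        cx, cy = self.centroid
        return (cx + m.r * math.cos(m.theta), cy + m.r * math.sin(m.theta))
```

Because members are stored relative to the centroid, moving the cluster only requires updating `centroid`; the members' absolute positions follow without touching each member.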
SCUBA Execution
SCUBA produces results every Δ time units. SCUBA has three phases:
- Phase I: Cluster Pre-Join Maintenance: forming new clusters, dissolving “empty” clusters, expanding existing clusters
- Phase II: Cluster-Based Joining
- Phase III: Cluster Post-Join Maintenance: dissolving “expiring” clusters, relocating “non-expiring” clusters in the grid based on their velocity vectors
Incoming objects and queries are clustered in memory until the Δ timeout expires; then the clusters are joined, the results are sent, and the cluster positions are updated.
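The timing of the three phases can be sketched as a driver loop. This is an illustration under assumptions: updates arrive in timestamp order as hypothetical (timestamp, id, location) tuples, and the phase callbacks `pre_join`, `join`, and `post_join` are placeholders, not SCUBA's actual interfaces.

```python
from collections.abc import Callable, Iterable

# Hypothetical update format: (timestamp, entity_id, (x, y))
Update = tuple[float, str, tuple[float, float]]


def run_scuba(updates: Iterable[Update], delta: float,
              pre_join: Callable[[Update], None],
              join: Callable[[float], list],
              post_join: Callable[[float], None]) -> list[tuple[float, list]]:
    """Cluster updates as they arrive; every delta time units, join and then maintain clusters."""
    outputs = []
    next_eval = delta
    for update in updates:                 # assumed to arrive in timestamp order
        t = update[0]
        # The delta interval expired before this update: run Phases II and III first.
        while t >= next_eval:
            outputs.append((next_eval, join(next_eval)))   # Phase II, then send results
            post_join(next_eval)                           # Phase III
            next_eval += delta
        pre_join(update)                   # Phase I: incremental in-memory clustering
    # Simplification: evaluations are triggered only by arriving updates, not by a timer.
    return outputs
```

A real engine would also fire the evaluation on a timer, so that results are produced even when no update arrives after an interval expires.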
Phase I: Cluster Pre-Join Maintenance
Clustering is done incrementally (upon the arrival of updates). Location update format:
(ID, Loc_t, t, Speed, CNLoc, ...)
Clustering uses two thresholds plus the destination:
- ΘD: distance threshold
- ΘS: speed threshold
- Destination connection node (CNLoc)
The clustering algorithm is based on the Leader-Follower clustering algorithm (J. A. Hartigan. Clustering Algorithms. John Wiley and Sons, 1975).
Clustering a new object, step by step:
(1) A new moving object arrives.
(2) Hash the object into the grid.
(3) Add the object to its parent cluster and update the cluster attributes (centroid position, radius, average speed, member count).
(4) If the cluster has expanded, check for overlap with neighboring cells (making new entries if necessary).
(5) If the object left its existing cluster to form a new cluster and the old cluster is now “empty”, dissolve the old cluster.
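The clustering steps above can be sketched with a plain leader-follower pass. The matching rule is an assumption, not taken from the slides: an object follows the first cluster whose centroid is within ΘD, whose average speed is within ΘS, and whose destination connection node matches. Candidate clusters are scanned linearly instead of through the grid lookup of step (2), and all names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    cx: float
    cy: float
    avg_speed: float
    dest: str          # destination connection node (CNLoc)
    count: int = 1
    radius: float = 0.0


def leader_follower(updates, theta_d, theta_s):
    """Assign each (x, y, speed, dest) update to the first compatible cluster or start a new one."""
    clusters, assignment = [], []
    for x, y, speed, dest in updates:
        for idx, c in enumerate(clusters):
            if (c.dest == dest
                    and math.hypot(x - c.cx, y - c.cy) <= theta_d
                    and abs(speed - c.avg_speed) <= theta_s):
                # Follower: fold the object into the cluster's running averages.
                c.count += 1
                c.cx += (x - c.cx) / c.count
                c.cy += (y - c.cy) / c.count
                c.avg_speed += (speed - c.avg_speed) / c.count
                # Approximate: other members' distances are not recomputed after the shift.
                c.radius = max(c.radius, math.hypot(x - c.cx, y - c.cy))
                assignment.append(idx)
                break
        else:
            # Leader: no compatible cluster, so the object starts its own.
            clusters.append(Cluster(x, y, speed, dest))
            assignment.append(len(clusters) - 1)
    return clusters, assignment
```

The single pass is what makes leader-follower suitable for incremental clustering: each update is handled once, on arrival, with no iterations over the whole data set.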
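Phase II: Cluster-Based Joining
Location updates arrive and are clustered incrementally (Phase I). When Δ expires, the cluster-based join runs (Phase II): Join-Between detects overlapping clusters, pairs of clusters that do not overlap are ignored, and Join-Within produces the query results.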
Phase II: Cluster-Based Joining (cont.)
- Join-Between: between two clusters, checking whether they overlap (non-overlapping pairs are ignored)
- Join-Within:
  - for each cluster (joining the objects and queries inside it)
  - for two overlapping clusters (a cross-join between the objects and queries from the two clusters)
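A minimal sketch of the cluster-based join, assuming circular range queries given as (id, x, y, range), objects as (id, x, y), and absolute rather than centroid-relative coordinates for brevity. Clusters are hypothetical (cx, cy, objects, queries) tuples. A cluster's extent is taken to include each query's range, so that the Join-Between overlap test cannot discard a true result.

```python
import math
from itertools import combinations


def extent(cluster):
    """Radius around the centroid covering all objects and every query's circular range."""
    cx, cy, objects, queries = cluster
    reach = [math.hypot(x - cx, y - cy) for _, x, y in objects]
    reach += [math.hypot(x - cx, y - cy) + rng for _, x, y, rng in queries]
    return max(reach, default=0.0)


def join_objects_queries(objects, queries):
    """Nested-loop join: an object satisfies a query if it lies inside the query's range."""
    return {(oid, qid)
            for oid, ox, oy in objects
            for qid, qx, qy, rng in queries
            if math.hypot(ox - qx, oy - qy) <= rng}


def cluster_join(clusters):
    results = set()
    # Join-Within for each single cluster.
    for _, _, objects, queries in clusters:
        results |= join_objects_queries(objects, queries)
    # Join-Between: only overlapping cluster pairs go on to a cross Join-Within.
    for a, b in combinations(clusters, 2):
        if math.hypot(a[0] - b[0], a[1] - b[1]) > extent(a) + extent(b):
            continue  # non-overlapping clusters are ignored
        results |= join_objects_queries(a[2], b[3])
        results |= join_objects_queries(b[2], a[3])
    return results
```

The saving comes from the `continue` branch: whole groups of objects and queries are skipped with a single distance test between centroids.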
Phase III: Cluster Post-Join Maintenance
- Clear the grid
- Dissolve “expiring” clusters (e.g., a cluster that reaches its destination connection node)
- Relocate “non-expiring” clusters based on their velocity vectors and insert them back into the grid with updated positions
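Phase III can be sketched as follows, assuming a uniform grid of square cells, clusters stored as hypothetical dicts with a centroid, a velocity vector, and a radius, and a caller-supplied `is_expiring` predicate, since the expiry rule is not spelled out precisely here. Each surviving cluster is registered in every cell its circle intersects.

```python
import math
from collections import defaultdict


def cells_for_circle(cx, cy, radius, cell_size):
    """Grid cells (i, j) whose square intersects the circle around (cx, cy)."""
    cells = []
    for i in range(math.floor((cx - radius) / cell_size), math.floor((cx + radius) / cell_size) + 1):
        for j in range(math.floor((cy - radius) / cell_size), math.floor((cy + radius) / cell_size) + 1):
            # Closest point of cell (i, j) to the centroid.
            nx = min(max(cx, i * cell_size), (i + 1) * cell_size)
            ny = min(max(cy, j * cell_size), (j + 1) * cell_size)
            if math.hypot(nx - cx, ny - cy) <= radius:
                cells.append((i, j))
    return cells


def post_join_maintenance(clusters, dt, cell_size, is_expiring):
    """clusters: dict id -> {'centroid': (x, y), 'velocity': (vx, vy), 'radius': r}."""
    grid = defaultdict(set)                          # start from a cleared grid
    for cid in list(clusters):
        c = clusters[cid]
        if is_expiring(c):
            del clusters[cid]                        # dissolve the expiring cluster
            continue
        (x, y), (vx, vy) = c['centroid'], c['velocity']
        c['centroid'] = (x + vx * dt, y + vy * dt)   # relocate along the velocity vector
        for cell in cells_for_circle(*c['centroid'], c['radius'], cell_size):
            grid[cell].add(cid)                      # re-insert into the grid
    return grid
```

Rebuilding the grid from scratch keeps the sketch simple; an incremental variant would only touch the cells a cluster leaves or enters.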
Data Structures
- Objects Table
- Queries Table
- ClusterHome Table
- ClusterStorage Table
- ClusterGrid
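Moving Cluster-Based Load Shedding
Focus: discarding data inside moving clusters.
Case 1: No Load Shedding (all relative positions of cluster members are preserved)
[Figure: a cluster of maximum size ΘD moving along its velocity vector, with members O1(r1, θ1), O2(r2, θ2), O3(r3, θ3), Q4(r4, θ4), Q5(r5, θ5) stored by their positions relative to the centroid.]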
Moving Cluster-Based Load Shedding (cont.)
Case 2: Full Shedding (all relative positions of cluster members are discarded)
- The cluster is the sole representation of its members' movements
- Assume all objects satisfy all queries inside the cluster
- No Join-Within is needed
[Figure: a cluster of size ΘD with its velocity vector; cluster members (O1, O2, O3, Q4, Q5) kept without positions.]
Moving Cluster-Based Load Shedding (cont.)
Case 3: Partial Shedding (some (furthest) relative positions of cluster members are maintained)
- Introduce a new structure, the nucleus, to abstract the discarded members
- Assume all objects satisfy all queries inside the nucleus
- No Join-Within is needed for cluster nucleus members
[Figure: a cluster of size ΘD with its velocity vector and a nucleus of radius ΘN (the nucleus threshold), here ΘN = 0.45 · ΘD; members outside the nucleus, such as O2(r2, θ2), keep their relative positions.]
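A sketch of Join-Within under moving cluster-based load shedding, assuming members stored as (r, θ) offsets from the centroid and circular range queries. Setting `theta_n = 0` gives no shedding and `theta_n >= ΘD` gives full shedding. How a nucleus member should be joined with a member outside the nucleus is not specified on the slides; here it is approximated at the centroid, which is a hypothetical choice.

```python
import math


def join_within_with_shedding(objects, queries, theta_n):
    """
    Join-Within for one cluster under load shedding (sketch).
    objects: [(id, r, theta)]; queries: [(id, r, theta, query_range)], both relative to
    the centroid. Members with r <= theta_n are in the nucleus and have lost their positions.
    """
    def to_xy(r, theta, in_nucleus):
        # Hypothetical choice: a nucleus member is approximated at the centroid.
        return (0.0, 0.0) if in_nucleus else (r * math.cos(theta), r * math.sin(theta))

    results = set()
    for oid, o_r, o_theta in objects:
        o_nuc = o_r <= theta_n
        for qid, q_r, q_theta, q_range in queries:
            q_nuc = q_r <= theta_n
            if o_nuc and q_nuc:
                results.add((oid, qid))  # assumed to satisfy each other; no join needed
                continue
            ox, oy = to_xy(o_r, o_theta, o_nuc)
            qx, qy = to_xy(q_r, q_theta, q_nuc)
            if math.hypot(ox - qx, oy - qy) <= q_range:
                results.add((oid, qid))
    return results
```

Growing `theta_n` trades accuracy for work: more pairs are assumed rather than computed, which can introduce false positives such as ("o1", "q1") in the middle case of the test.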
Experimental Settings
We use the Network-based Generator of Moving Objects to generate a set of moving objects and moving queries in Worcester County (TIGER/Line files).
Unless mentioned otherwise, the following parameters are used:
- 10,000 moving objects and 10,000 moving queries; each moving object or query reports its new information (if changed) every time unit
- The percentage of objects and queries that report a change of information is 100%
- Speed of objects and queries is set to medium
- ΘD = 100 (spatial units), ΘS = 10 (spatial units/time unit), ΘN = 0 (no load shedding)
- Grid: 100x100
Experimental Results: Varying Grid Cell Sizes
- Performance of regular grid-based execution improves with finer granularity of grid cells (but memory requirements increase as well)
[Figure: (a) join time (in secs) and (b) memory consumption (in MB) vs. grid cell count (50x50, 75x75, 100x100, 125x125, 150x150), REGULAR vs. SCUBA]
Experimental Results (cont.)
Varying Skew Factor:
- The higher the skew factor, the denser the objects and queries (i.e., the more clusterable they are)
- EXPERIMENTS TO FINISH
Incremental vs. Non-incremental Clustering:
- Join time slightly improves with non-incremental clustering, but the clustering wait time outweighs the advantage of the faster join
[Figure: non-incremental clustering time and join time (in secs) for incremental clustering and for non-incremental clustering with iter = 1, 3, 5, 10]
Experimental Results (cont.) Moving Cluster-Based Load Shedding:
- Varying ΘN relative to ΘD
- Accuracy is measured in terms of false positives (FP) and false negatives (FN); we measure the average number of FP and FN per object and query
Experimental Results (cont.) Cluster Maintenance:
- Cluster maintenance time is small relative to the join time
- EXPERIMENTS TO FINISH
Contributions
I proposed:
- SCUBA, a novel cluster-based algorithm for continuously evaluating a set of concurrent continuous spatio-temporal queries. SCUBA is a generic model that is applicable to any location-aware server.
- Shared cluster-based execution as the means of achieving scalability in SCUBA: objects and queries with similar attributes are grouped into clusters, and the execution of a set of concurrent continuous queries is abstracted as a join between and within moving clusters.
- Moving cluster-based load shedding, with two alternatives (full and partial shedding of cluster members), to reduce resource usage while maintaining accurate answers.
Experimental results show that SCUBA outperforms a regular grid-based indexing scheme when executing on densely moving objects.
Future Work
- Non-circular clusters
- Extend to other types of spatio-temporal queries (CKNN, aggregate)
- Hierarchical clustering (merging and breaking down clusters)
- Use real sensor data
Part II: Additional Work
Accuracy vs. Performance Tradeoff in Location-Aware Services
Part II: Accuracy vs. Performance Tradeoff
Motion can be described as:
(a) a list of discrete positions over time
(b) a continuous function of time
Related Works: Discrete & Continuous
- Discrete: mSTOMM [SDK02], MobiEyes [GL04], SINA [MXA04], SEA-CNN [XMA05], Q-Index [PXK+02]
- Continuous: DOMINO [WCL02], A Framework for Representing Moving Objects [BBH04], MON-Tree [AG04], CHOROCHRONOS/TB-tree [PJT00], Continuous Nearest Neighbor Search [TPS02], Dynamic Queries [LPM02]
Discrete:
- Faster, simpler computations (join)
- Smaller memory requirements
- Poor approximation of actual movement
- Poor accuracy, especially with infrequent updates or when objects move fast
- Knows nothing about the object between updates
- Load shedding has a dramatic effect on accuracy
Continuous:
- Slower, more complex computations (join)
- Larger memory requirements
- Better approximation of actual movement
- Higher accuracy
- Can answer questions about durations of events
- Can load shed while still giving relatively good answers
I investigate when each model is more appropriate for a location-aware server.
Linear Continuous Model
Use linear segments to approximate the movement between updates. Common justifications:
- Simplicity
- Arbitrarily complex movements can be approximated by piecewise-linear movements
- Movement is constrained to a road network, and roads tend to be linear
Other functions describing motion can be plugged into the system.
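Under the linear model, the position between two updates is interpolated, and the time interval during which an object is inside a rectangular range query can be computed directly. The sketch below uses hypothetical names and a static query rectangle for simplicity; it also shows why the discrete model can miss an answer that the continuous model finds.

```python
def position_at(p0, t0, p1, t1, t):
    """Linearly interpolate the location between two updates (p0 at t0, p1 at t1)."""
    f = (t - t0) / (t1 - t0)
    return (p0[0] + f * (p1[0] - p0[0]), p0[1] + f * (p1[1] - p0[1]))


def point_in_rect(p, rect):
    """Discrete-model check: is one reported location inside rect = (xmin, ymin, xmax, ymax)?"""
    return rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3]


def time_inside_rect(p0, t0, p1, t1, rect):
    """Continuous-model check: sub-interval of [t0, t1] spent inside rect, or None."""
    if t1 <= t0:
        raise ValueError("updates must be in increasing time order")
    lo, hi = t0, t1
    for axis, (a, b) in enumerate(((rect[0], rect[2]), (rect[1], rect[3]))):
        v = (p1[axis] - p0[axis]) / (t1 - t0)    # velocity along this axis
        if v == 0:
            if not a <= p0[axis] <= b:
                return None                      # never inside along this axis
            continue
        ta = t0 + (a - p0[axis]) / v
        tb = t0 + (b - p0[axis]) / v
        lo, hi = max(lo, min(ta, tb)), min(hi, max(ta, tb))
    return (lo, hi) if lo <= hi else None
```

With updates at (0, 0) at t = 0 and (10, 0) at t = 10, neither reported location lies in the rectangle x ∈ [2, 4], y ∈ [-1, 1], so the discrete check misses the object, while the continuous model reports that it was inside during t ∈ [2, 4].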
Accuracy vs. Performance Tradeoff
The continuous model is MORE ACCURATE, but it is MORE EXPENSIVE.
Accuracy model: a comparison between discrete and continuous results.
- Assumptions: the continuous model is the more accurate one (taken as 100% accurate); discrete results are compared against it
- Idea: construct continuous segments out of the discrete answers and compare them to the continuous results
Accuracy Model
Step 1: Calculate the average result segment length
Step 2: Multiply the average result segment length by the number of discrete results
Step 3: Calculate the accuracy
According to our model, the discrete model is ~30% as accurate as the continuous model.
Accuracy Examples
- Scenario 1: an object location update is received every time the object entered, stayed in, and left the query: accuracy ≈ 100%
- Scenario 2: an object location update is received only once while the object was inside the query: accuracy ≈ 50%
- Scenario 3: no location update is received at any point in time while the object was inside the query: accuracy ≈ 0%
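One way to read the three steps of the accuracy model as code is sketched below. The assumptions are ours, not the slides': continuous result segments are counted per evaluation interval in which the object is inside the query, and each discrete result is credited with one average segment. For an object that spans two evaluation intervals and is reported in two, one, or none of them, this gives 100%, 50%, and 0%, in line with the scenarios above.

```python
def discrete_accuracy(continuous_segments, n_discrete_results):
    """
    continuous_segments: lengths of the continuous result segments (assumed: one per
    evaluation interval the object spends inside the query).
    n_discrete_results: number of results the discrete model reported for the same pair.
    """
    if not continuous_segments:
        raise ValueError("no continuous results to compare against")
    total = sum(continuous_segments)
    avg = total / len(continuous_segments)   # Step 1: average result segment length
    covered = avg * n_discrete_results       # Step 2: scale by the number of discrete results
    return min(1.0, covered / total)         # Step 3: accuracy relative to the continuous model
```

Capping the ratio at 1.0 reflects the assumption that the continuous model is 100% accurate, so the discrete model can at best match it.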
Experimental Results
We compare the performance of the two models while:
- varying the speed of objects and queries
- varying the update probability of objects and queries
We use the Network-based Generator of Moving Objects to generate a set of moving objects and moving queries in Worcester County (TIGER/Line files): 5,000 moving objects and 5,000 moving queries. Each moving object or query reports its new information (if changed) every time unit.
Results are computed every 2 time units. Unless mentioned otherwise, the percentage of objects and queries that report a change of information is 100%.
Accuracy and Performance (Varying Speed)
[Figure: (a) accuracy (%) and (b) join time (in msecs) of the discrete model (DM) and the continuous model (CM) for speed settings Speed_250, Speed_150, Speed_100, Speed_50, Speed_20, and Speed_1, ranging between Very Slow and Very Fast.]
Accuracy vs. Scalability (Varying Update Probability)
[Figure: (a) join time (in msecs) per evaluation interval (1 to 25) for Continuous 100%, 90%, 75%, 50%, 25% and Discrete 100%; (b) average join time (in msecs) and average accuracy (%) for each of these settings; (c) average join time and average accuracy for update probabilities 100%, 90%, 75%, 50%, and 25%.]
Update probability = frequency of updates from objects and queries (100% = every timestamp, 50% = every other timestamp).
Conclusions
The continuous model is preferred when:
1. objects move fast;
2. not all location updates are received (e.g., load shedding occurs);
3. location updates arrive out of sync due to network delay (in this case, we assume the system would load shed this data, as it is outside the current window of execution).
The discrete model is preferred when:
1. objects move slowly, or
2. location updates are very frequent.
With only 75% of location updates, the continuous model can give higher accuracy with better performance than the discrete model receiving all updates.
Next step: dynamically switch between location-modeling techniques based on the attributes of the arriving data and on performance and accuracy requirements.
References
[SDK02] D. Stojanović and S. Djordjević-Kajan. Location-based Web services for tracking and visual route analysis of mobile objects. In Proceedings of the YU INFO Conference, Kopaonik, 2002 (CD-ROM, in Serbian).
[GL04] B. Gedik and L. Liu. MobiEyes: Distributed Processing of Continuously Moving Queries on Moving Objects in a Mobile System. In EDBT, 2004.
[MXA04] M. Mokbel, X. Xiong, and W. Aref. SINA: Scalable Incremental Processing of Continuous Queries in Spatio-temporal Databases. In SIGMOD, 2004.
[PXK+02] S. Prabhakar, Y. Xia, D. Kalashnikov, W. Aref, and S. Hambrusch. Query Indexing and Velocity Constrained Indexing: Scalable Techniques for Continuous Queries on Moving Objects. IEEE Transactions on Computers, 51(10):1124-1140, 2002.
[XMA05] X. Xiong, M. Mokbel, and W. Aref. SEA-CNN: Scalable Processing of Continuous K-Nearest Neighbor Queries in Spatio-temporal Databases. In ICDE, 2005.
[WCL02] O. Wolfson, H. Cao, H. Lin, G. Trajcevski, F. Zhang, and N. Rishe. Management of Dynamic Location Information in DOMINO. In EDBT, pages 769-771, 2002.
[BBH04] L. Becker, H. Blunck, K. Hinrichs, and J. Vahrenhold. A Framework for Representing Moving Objects. In Proceedings of the 14th International Conference on Database and Expert Systems Applications (DEXA 2004), Berlin, pages 854-863, 2004.
[AG04] V. T. Almeida and R. H. Güting. Indexing the trajectories of moving objects in networks. Technical Report 309, FernUniversität Hagen, Fachbereich Informatik, 2004.
[PJT00] D. Pfoser, C. S. Jensen, and Y. Theodoridis. Novel approaches to the indexing of moving object trajectories. In Proceedings of the 26th International Conference on Very Large Data Bases, pages 395-406, 2000.
[TPS02] Y. Tao, D. Papadias, and Q. Shen. Continuous Nearest Neighbor Search. In VLDB, 2002.
[LPM02] I. Lazaridis, K. Porkaew, and S. Mehrotra. Dynamic Queries over Mobile Objects. In EDBT, 2002.
[SR01] Z. Song and N. Roussopoulos. K-Nearest Neighbor Search for Moving Query Point. In SSTD, 2001.
[TPS] Y. Tao, D. Papadias, and Q. Shen. Continuous Nearest Neighbor Search. In VLDB, 2002.
[SJL00] S. Saltenis, C. S. Jensen, S. T. Leutenegger, and M. A. Lopez. Indexing the Positions of Continuously Moving Objects. In SIGMOD, 2000.
Acknowledgments: Elke A. Rundensteiner, DSRG, Michael Gennert, George Heineman, Thomas Brinkhoff
Thank You
The End