SCUBA: Scalable Cluster-Based Algorithm for Evaluating Spatio-Temporal Queries On Moving Objects
Rimma V. Nehme
Department of Computer Sciences, Purdue University,
W. Lafayette, IN 47906
Elke A. Rundensteiner
Department of Computer Sciences, Worcester Polytechnic Institute,
Worcester, MA 01609
SCUBA: Scalable Cluster-Based Algorithm for Evaluating Continuous Spatio-Temporal Queries on Moving Objects
EDBT 2006 SCUBA2
Outline
- Motivation
- Related Work
- Our Approach: SCUBA
- Experimental Study
- Conclusion
Challenges for Continuous Query Processing on Spatio-Temporal Data Streams
- Scalability
  - Large number of objects
  - Large number of queries
- Limited resources
  - Memory
  - CPU
- Real-time response requirement
The challenge is to provide fast query responses in update-intensive environments.
[Figure legend: moving objects, dynamic range query, dynamic kNN query]
Novel Idea: Exploit the fact that objects naturally move in groups (i.e., clusters) to optimize query evaluation.
Motivation
- Monitor the traffic in the red areas.
- Continuously return the area covered by the herd during the migration.
Big Picture
- Traditional Execution: SR [SR01], DQ [LPM02], CNN [TPS], TPR [SJL00]
- Shared Execution: SINA [MXA04], SEA-CNN [XMA05], Q-Index [PXK+02], PSoup [CF03], NiagaraCQ [CDT00]
- Shared Cluster-Based Execution: our work (SCUBA)
We use clustering as a means to improve the execution of spatio-temporal queries on moving objects.
Novel Idea
[Figure: three query plans for Q1 ("Return all cars in region R1") and Q2 ("Return all police cars in region R2")]
(A) Individual query plans for spatio-temporal queries: a separate scan of the moving objects per query.
(B) Shared query plan for spatio-temporal queries: scans of the moving objects and the spatio-temporal queries feed one spatial join.
(C) Shared cluster-based query plan for spatio-temporal queries: scans of the moving objects and queries feed moving clusters, followed by Join-Between and Join-Within.
Our Idea: Moving Clusters
Main Idea: Abstracting individual entities into a cluster based on common attributes
- Direction
- Speed
- Spatial Position
With cluster abstractions, we want to minimize the number of unnecessary individual object/query joins, thus optimizing query evaluation.
Example query: "Continuously retrieve the police car closest to me."
Advantage of Moving Clusters
When clusters don’t overlap, we avoid many joins of individual objects within those clusters
[Figure: two non-overlapping moving clusters, m1 and m2. Legend: moving object, spatio-temporal range query]
There is no need to join objects/queries in m1 with queries/objects in m2.
We present SCUBA in the context of continuous spatio-temporal range queries.
If two cluster abstractions do not overlap, we can discard them as negative candidates and avoid the individual joins.
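The overlap check can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes circular clusters described by a centroid and a radius, and the names `Cluster` and `clusters_overlap` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical circular cluster abstraction: centroid plus radius."""
    cx: float
    cy: float
    radius: float


def clusters_overlap(a: Cluster, b: Cluster) -> bool:
    """Two circles overlap when the centroid distance is at most the sum of the radii."""
    return math.hypot(a.cx - b.cx, a.cy - b.cy) <= a.radius + b.radius


# Example clusters (made-up coordinates, for illustration only).
m1 = Cluster(0.0, 0.0, 50.0)
m2 = Cluster(200.0, 0.0, 60.0)   # far from m1: no member-level join needed
m3 = Cluster(80.0, 0.0, 40.0)    # close to m1: members must be joined
```

Only pairs for which `clusters_overlap` returns true need their members joined individually; every other pair is skipped with a single distance computation.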
Advantage of Moving Clusters
- Objects/queries continuously move.
- Grid cells are static: objects and queries placed in a grid must be continuously removed from old cells and inserted into new ones as they move.
- Instead, we want "flexible cells", i.e., moving clusters.
Architecture Overview
- SCUBA-enabled motion operator execution
- Answers produced periodically (every ∆ time units)
[Figure: the SCUBA motion operator inside the CAPE engine. Inputs: moving objects data stream and moving queries data stream, each from a stream generator. Inside the operator: moving clusters and, when the time interval expires, a grid-based join between/within clusters. Output: results data stream. CAPE engine components shown: Query Plan Generator, Stream Receiver, Execution Engine, Execution Scheduler, Statistics Gatherer. Legend: control flow, data flow, moving object, range query]
Network Constrained Movement
Movement is constrained within a road network:
- Roads = edges
- Intersections = connection nodes
[Figure: road network of New York City with a connection node (CNLoc) marked]
SCUBA supports both constrained and unconstrained movement.
Moving Cluster Representation in SCUBA
[Figure: a moving cluster labeled with its centroid, actual cluster size, maximum cluster size (ΘD), direction vector, and speed. Cluster members: moving objects and moving queries]
Cluster Member Representation Inside Cluster:
[Figure labels: centroid; cluster member (moving object)]
Moving clusters expire after some time.
SCUBA Execution
SCUBA produces results periodically (every ∆ time units) and has three phases:
- Phase I: Cluster Pre-Join Maintenance
  - Formation of new clusters
  - Dissolving "empty" clusters
  - Expanding existing clusters
- Phase II: Cluster-Based Joining
  - Joining clusters
  - Joining objects and queries inside clusters
- Phase III: Cluster Post-Join Maintenance
  - Dissolving "expiring" clusters
  - Relocating "non-expiring" clusters based on velocity vector
[Figure: execution timeline. Objects and queries are clustered in memory as they arrive; on timeout, SCUBA runs Cluster Pre-Join Maintenance, Cluster-Based Joining, and Cluster Post-Join Maintenance, updates the cluster positions, and sends the results]
Phase I: Cluster Pre-Join Maintenance
- Clustering is done incrementally (upon the arrival of updates).
- Location update format: (ID, Loc_t, t, Speed, CNLoc, ...)
- Clustering uses two thresholds plus the destination:
  - ΘD: distance threshold
  - ΘS: speed threshold
  - Destination: connection node (CNLoc)
- The clustering algorithm is based on the Leader-Follower clustering algorithm (J. A. Hartigan, Clustering Algorithms, John Wiley and Sons, 1975).
Clustering New Object Example
(1) A new moving object arrives.
(2) Hash the object into the grid.
(3) Add the object to a cluster and update the cluster attributes (centroid position, radius, average speed, member count).
(4) If the cluster has expanded, check for overlap with neighboring cells (make new entries if necessary).
(5) If the object left an existing cluster for a new cluster and the old cluster is "empty", dissolve the old cluster.
[Figure labels: clusters M1, M2, M3; parent cluster]
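The incremental assignment step can be sketched roughly as below. This is a simplified leader-follower style sketch under several assumptions, not the SCUBA code: it scans all clusters linearly instead of hashing into the grid, it ignores the timestamp field, and the names `Update`, `Cluster`, `accepts`, and `assign` are hypothetical. The threshold values are the ones listed in the experimental settings.

```python
import math
from dataclasses import dataclass, field

THETA_D = 100.0  # distance threshold (spatial units)
THETA_S = 10.0   # speed threshold (spatial units per time unit)


@dataclass
class Update:
    """Simplified location update: id, position, speed, destination node."""
    oid: int
    x: float
    y: float
    speed: float
    cn_loc: str


@dataclass
class Cluster:
    cx: float
    cy: float
    avg_speed: float
    cn_loc: str
    radius: float = 0.0
    members: list = field(default_factory=list)

    def accepts(self, u: Update) -> bool:
        """Same destination, similar speed, and close enough to the centroid."""
        return (u.cn_loc == self.cn_loc
                and abs(u.speed - self.avg_speed) <= THETA_S
                and math.hypot(u.x - self.cx, u.y - self.cy) <= THETA_D)

    def add(self, u: Update) -> None:
        """Add a member and refresh centroid, average speed, and radius."""
        n = len(self.members)
        self.cx = (self.cx * n + u.x) / (n + 1)
        self.cy = (self.cy * n + u.y) / (n + 1)
        self.avg_speed = (self.avg_speed * n + u.speed) / (n + 1)
        self.members.append(u)
        self.radius = max(math.hypot(m.x - self.cx, m.y - self.cy)
                          for m in self.members)


def assign(u: Update, clusters: list) -> Cluster:
    """Join the first cluster that accepts the update, else start a new one."""
    for c in clusters:
        if c.accepts(u):
            c.add(u)
            return c
    c = Cluster(u.x, u.y, u.speed, u.cn_loc)
    c.add(u)
    clusters.append(c)
    return c


clusters = []
a = assign(Update(1, 0.0, 0.0, 30.0, "n7"), clusters)
b = assign(Update(2, 60.0, 0.0, 35.0, "n7"), clusters)  # joins the first cluster
c = assign(Update(3, 40.0, 0.0, 60.0, "n7"), clusters)  # too fast: new cluster
d = assign(Update(4, 40.0, 0.0, 33.0, "n9"), clusters)  # other destination: new cluster
```

Because each update is placed as it arrives, no separate clustering pass is needed when the evaluation interval expires.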
Phase II: Cluster-Based Joining
[Figure: timeline. Phase I: location updates arrive and are clustered incrementally. Phase II: when ∆ expires, the cluster-based join runs in two steps: 1. Join-Between, 2. Join-Within]
Phase II: Cluster-Based Joining
- Join-Between: between two clusters
- Join-Within:
  - For each cluster (joining objects and queries inside)
  - For two overlapping clusters (cross-join between objects and queries from the two clusters)
[Figure: Join-Between identifies overlapping cluster pairs; non-overlapping pairs are ignored; Join-Within then produces the query results]
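A rough sketch of the two join steps is given below. It is an illustration under stated assumptions rather than the SCUBA implementation: clusters are circles that fully cover their members, range queries are axis-aligned rectangles, all cluster pairs are tested directly instead of through the grid, and the names `Cluster`, `overlap`, `join_within`, and `cluster_join` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Cluster:
    cx: float
    cy: float
    radius: float
    objects: dict = field(default_factory=dict)  # object id -> (x, y)
    queries: dict = field(default_factory=dict)  # query id -> (xmin, ymin, xmax, ymax)


def overlap(a: Cluster, b: Cluster) -> bool:
    """Join-Between test: do the two circular clusters intersect?"""
    return math.hypot(a.cx - b.cx, a.cy - b.cy) <= a.radius + b.radius


def join_within(objects: dict, queries: dict, results: set) -> None:
    """Member-level join: report every object inside every range query."""
    for oid, (x, y) in objects.items():
        for qid, (x0, y0, x1, y1) in queries.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                results.add((qid, oid))


def cluster_join(clusters: list) -> set:
    results = set()
    # Join objects and queries inside each cluster.
    for c in clusters:
        join_within(c.objects, c.queries, results)
    # Cross-join members only for cluster pairs that overlap.
    for a, b in combinations(clusters, 2):
        if overlap(a, b):
            join_within(a.objects, b.queries, results)
            join_within(b.objects, a.queries, results)
    return results


m1 = Cluster(0.0, 0.0, 50.0,
             objects={"o1": (10.0, 10.0)},
             queries={"q1": (0.0, 0.0, 20.0, 20.0)})
m2 = Cluster(40.0, 0.0, 40.0,
             objects={"o2": (45.0, 5.0)},
             queries={"q2": (5.0, 5.0, 15.0, 15.0)})
m3 = Cluster(500.0, 500.0, 40.0,
             objects={"o3": (500.0, 500.0)},
             queries={"q3": (490.0, 490.0, 510.0, 510.0)})
results = cluster_join([m1, m2, m3])
```

In this example only the pair (m1, m2) overlaps, so m3's members are never compared against those of the other two clusters.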
Phase III: Cluster Post-Join Maintenance
- Dissolve "expiring" clusters.
- Relocate "non-expiring" clusters, based on their velocity vectors, back into the grid.
[Figure labels: connection node; dissolved cluster; new cluster position updated; clear the grid; insert into the grid]
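The two maintenance steps can be sketched as follows. This is a simplified illustration with assumptions of its own: the expiry rule (a cluster is dissolved if its hypothetical `expires_at` time falls within the next interval), the registration of each cluster under a single grid cell chosen by its centroid, and the names `Cluster` and `post_join_maintenance` are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    cx: float
    cy: float
    vx: float          # velocity vector, distance per time unit
    vy: float
    expires_at: float  # hypothetical expiry time of the cluster


def post_join_maintenance(clusters: list, now: float, delta: float,
                          cell_size: float) -> dict:
    """Rebuild the grid: drop expiring clusters, move the rest along their velocity."""
    grid = {}  # the grid is cleared and refilled: cell -> list of clusters
    for c in clusters:
        if c.expires_at <= now + delta:
            continue  # dissolve: the cluster expires before the next evaluation
        # Relocate the centroid to its expected position after delta time units.
        c.cx += c.vx * delta
        c.cy += c.vy * delta
        cell = (int(c.cx // cell_size), int(c.cy // cell_size))
        grid.setdefault(cell, []).append(c)
    return grid


clusters = [
    Cluster(10.0, 10.0, 5.0, 0.0, expires_at=100.0),  # survives and moves right
    Cluster(50.0, 50.0, 0.0, -5.0, expires_at=3.0),   # expires, so it is dissolved
]
grid = post_join_maintenance(clusters, now=0.0, delta=4.0, cell_size=25.0)
```

Moving a whole cluster costs one position update, whereas a static grid would require one update per member.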
Moving Cluster-Based Load Shedding
- Load shedding is the process of dropping excess load from the system when the demand on resources exceeds the system capacity [TCZ03].
- Load shedding reduces resource requirements by dropping data, thereby sacrificing the accuracy of the query answers.
- The main goal is to minimize the degradation in accuracy.
- Focus: discarding data inside moving clusters.
[Figure: a moving cluster of maximum size ΘD with its velocity vector and its members, objects O1, O2, O3 and queries Q4, Q5, each labeled with a coordinate pair]
Experimental Settings
- Implemented inside the Java-based CAPE streaming system [RDZ05].
- Used the Network-based Generator of Moving Objects [BR02] to generate a set of moving objects and moving queries in Worcester County (TIGER/Line files).
- Unless mentioned otherwise, the following parameters are used:
  - 10,000 moving objects and 10,000 moving queries
  - Clustering thresholds: ΘD = 100 (spatial units), ΘS = 10 (spatial units/time units), ΘN = 0 (no load shedding)
  - Grid: 100x100
Experimental Results: From Dissimilar to Similar Motion
- A higher skew factor means denser objects and queries (i.e., more clusterable).
- SCUBA is compared against regular grid-based execution (termed REGULAR).
Experimental Results: Incremental vs. Non-Incremental Clustering
[Figure: bar chart of time (in secs), split into offline (non-incremental) clustering time and join time, for incremental clustering and for non-incremental clustering with iter = 1, 3, 5, and 10]
- Join time slightly improves with non-incremental clustering.
- But the clustering wait time outweighs the advantage of the faster join.
Experimental Results: Varying Grid Cell Size
[Figure: (a) join time (in secs) and (b) memory consumption (in MB) versus grid cell count (50x50, 75x75, 100x100, 125x125, 150x150), for REGULAR and SCUBA]
- The performance of regular grid-based execution improves with finer granularity of grid cells, but memory requirements increase as well.
Experimental Results: Cluster Maintenance
- Cluster maintenance time is small relative to the join time.
Conclusions
1. Designed SCUBA, a novel cluster-based algorithm for continuously evaluating spatio-temporal queries.
2. Scalability in SCUBA is achieved through shared cluster-based execution.
3. Implemented SCUBA in the CAPE streaming database.
4. Experimental results show that SCUBA outperforms a regular grid-based indexing scheme when executing on densely moving objects.
5. Clustering significantly improves performance when processing densely moving objects.
6. The overhead of maintaining clusters is very small.
Future Work
- Non-circular clusters
- Extend to other types of spatio-temporal queries
  - CKNN
  - Aggregate
- Hierarchical clustering (merge and break down clusters)
Thank you.
Mass Pike in Boston Satellite Image, Google Maps 2006
Additional Slides…
References
[BR02] T. Brinkhoff. A Framework for Generating Network-Based Moving Objects. GeoInformatica, Vol. 6, No. 2, Kluwer, 2002, 153-180.
[SDK02] D. Stojanović and S. Djordjević-Kajan. Location-based Web services for tracking and visual route analysis of mobile objects. In Proceedings of Yu INFO Conference, Kopaonik, 2002, CD ROM (Serbian).
[GL04] B. Gedik and L. Liu. MobiEyes: Distributed Processing of Continuously Moving Queries on Moving Objects in a Mobile System. In EDBT, 2004.
[MXA04] M. Mokbel, X. Xiong, and W. Aref. SINA: Scalable Incremental Processing of Continuous Queries in Spatio-temporal Databases. In SIGMOD, 2004.
[PXK+02] S. Prabhakar, Y. Xia, D. Kalashnikov, W. Aref, and S. Hambrusch. Query Indexing and Velocity Constrained Indexing: Scalable Techniques for Continuous Queries on Moving Objects. IEEE Transactions on Computers, 51(10): 1124-1140, 2002.
[XMA05] X. Xiong, M. Mokbel, and W. Aref. SEA-CNN: Scalable Processing of Continuous K-Nearest Neighbor Queries in Spatio-temporal Databases. In ICDE, 2005.
[WCL02] Ouri Wolfson, Hu Cao, Hai Lin, Goce Trajcevski, Fengli Zhang, and Naphtali Rishe. Management of Dynamic Location Information in DOMINO. In EDBT, 2002, 769-771.
[BBH04] L. Becker, H. Blunck, K. Hinrichs, and J. Vahrenhold. A Framework for Representing Moving Objects. In Proceedings of the 14th International Conference on Database and Expert Systems Applications (DEXA 2004), Berlin, 2004, 854-863.
[AG04] V. T. Almeida and R. H. Güting. Indexing the trajectories of moving objects in networks. Technical Report 309, Fernuniversität Hagen, Fachbereich Informatik, 2004.
[PJT00] D. Pfoser, C. S. Jensen, and Y. Theodoridis. Novel approaches to the indexing of moving object trajectories. In Proceedings of the 26th International Conference on Very Large Databases, pages 395-406, 2000.
[TPS02] Yufei Tao, Dimitris Papadias, and Qiongmao Shen. Continuous Nearest Neighbor Search. In VLDB, 2002.
[LPM02] Iosif Lazaridis, Kriengkrai Porkaew, and Sharad Mehrotra. Dynamic Queries over Mobile Objects. In EDBT, 2002.
[SR01] Zhexuan Song and Nick Roussopoulos. K-Nearest Neighbor Search for Moving Query Point. In SSTD, 2001.
[TPS] Yufei Tao, Dimitris Papadias, and Qiongmao Shen. Continuous Nearest Neighbor Search. In VLDB, 2002.
[SJL00] Simonas Saltenis, Christian S. Jensen, Scott T. Leutenegger, and Mario A. Lopez. Indexing the Positions of Continuously Moving Objects. In SIGMOD, 2000.
[RDZ05] Elke A. Rundensteiner, Luping Ding, Yali Zhu, Timothy Sutherland, and Bradford Pielech. CAPE: A Constraint-Aware Adaptive Stream Processing Engine. Invited book chapter in Stream Data Management (Advances in Database Systems Series), chapter 5, Springer Verlag, 2005, pp. 83-111.
Data Structures
- Objects Table
- Queries Table
- ClusterHome Table
- ClusterStorage Table
- ClusterGrid