Indexing Multidimensional Data: A Mapping Based Approach
Rui Zhang
Outline
Background
Multidimensional data and queries
Mapping based multidimensional indexing and query processing
  General strategy
  Window queries
  K nearest neighbor (KNN) queries
Summary and future work
Multidimensional Data
Spatial data (low-dimensionality)
  Geographic information: Melbourne (37, 145). Which city is at (30, 140)?
  Computer-aided design: width and height (40, 50). Is there any part that has a width of 40 and a height of 50?
Records with multiple attributes (medium-dimensionality)
  Employee (ID, age, score, salary, …). Is there any employee whose age is under 25, performance score is greater than 80, and salary is between 3000 and 5000?
Multimedia data (high-dimensionality)
  Multimedia features: color, shape, texture (e.g. color histograms of images). Give me the most similar image to a given query image.
Multidimensional Queries
Point query
Return the objects located at Q(x1, x2, …, xd).
E.g. Q=(3.4, 6.6).
Window query
Return all the objects enclosed or intersected by the hyper-rectangle W{[L1, U1], [L2, U2], …, [Ld, Ud]}.
E.g. W={[0,4],[2,5]}
K-Nearest Neighbor Query (KNN Query)
Return k objects whose distances to Q are no larger than any other object’s distance to Q.
E.g. 3NN of Q=(4,1)
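The three query types above can be pinned down with a brute-force baseline. The sketch below is only a reference implementation by linear scan, not an indexing method from the talk; it uses the ten buildings (name, x, y) that the slides tabulate for the running example, and the function names are illustrative.

```python
import math

# Buildings from the slides' running example: name -> (x, y).
BUILDINGS = {
    "A": (0.7, 1.2), "B": (5.8, 1.2), "C": (2.7, 2.3), "D": (5.5, 2.4),
    "E": (6.6, 2.5), "F": (1.7, 3.8), "G": (2.8, 4.7), "H": (0.6, 5.8),
    "I": (1.6, 6.7), "J": (3.4, 6.6),
}

def point_query(data, q):
    """Objects located exactly at q."""
    return [name for name, p in data.items() if p == q]

def window_query(data, window):
    """Objects inside the hyper-rectangle given as [(L1, U1), ..., (Ld, Ud)]."""
    return [name for name, p in data.items()
            if all(lo <= v <= hi for v, (lo, hi) in zip(p, window))]

def knn_query(data, q, k):
    """The k objects closest to q (Euclidean distance)."""
    return sorted(data, key=lambda name: math.dist(data[name], q))[:k]

print(point_query(BUILDINGS, (3.4, 6.6)))         # ['J']
print(window_query(BUILDINGS, [(0, 4), (2, 5)]))  # ['C', 'F', 'G']
print(knn_query(BUILDINGS, (4, 1), 3))            # ['B', 'C', 'D']
```

Every query here touches all objects; the mapping based indexes aim to return the same answers while reading only a small part of the data.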
Mapping Based Multidimensional Indexing
Story
  The CBD: [0,4] × [2,5]
  Blocks in the CBD are: [8,15], [32,33] and [36,37]
General strategy: three steps
  Data mapping and indexing
  Query mapping and data retrieval
  Filtering out false positives
Name x y Block Height
A 0.7 1.2 2 100
B 5.8 1.2 19 50
C 2.7 2.3 12 80
D 5.5 2.4 25 90
E 6.6 2.5 28 40
F 1.7 3.8 11 120
G 2.8 4.7 36 100
H 0.6 5.8 34 50
I 1.6 6.7 41 60
J 3.4 6.6 45 40
Name x y Block Height
A 0.7 1.2 2 100
F 1.7 3.8 11 120
C 2.7 2.3 12 80
B 5.8 1.2 19 50
D 5.5 2.4 25 90
E 6.6 2.5 28 40
H 0.6 5.8 34 50
G 2.8 4.7 36 100
I 1.6 6.7 41 60
J 3.4 6.6 45 40
The second table is the first table sorted by Block, the one-dimensional key.
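The three steps of the general strategy can be sketched on the building table. The Block numbers in the table are consistent with a Z-order (bit interleaving) numbering of unit cells on an 8 × 8 grid, with the y bit placed above the x bit; that reading is inferred from the numbers, not stated in the slides. The function names, the use of a sorted list in place of a disk-based index, and the half-open treatment of the window's upper edges are assumptions of this sketch.

```python
import bisect
import math

BUILDINGS = {
    "A": (0.7, 1.2), "B": (5.8, 1.2), "C": (2.7, 2.3), "D": (5.5, 2.4),
    "E": (6.6, 2.5), "F": (1.7, 3.8), "G": (2.8, 4.7), "H": (0.6, 5.8),
    "I": (1.6, 6.7), "J": (3.4, 6.6),
}

def z_value(cx, cy, bits=3):
    """Interleave the bits of the cell coordinates, y bit above x bit."""
    z = 0
    for i in range(bits):
        z |= ((cx >> i) & 1) << (2 * i)
        z |= ((cy >> i) & 1) << (2 * i + 1)
    return z

def build_index(data):
    # Step 1: map every object to its block number and sort by it.
    return sorted((z_value(int(x), int(y)), name) for name, (x, y) in data.items())

def window_to_intervals(window):
    # Step 2a: map the window to runs of consecutive block numbers.
    # Upper edges are treated as exclusive, which matches the slide's ranges.
    (x_lo, x_hi), (y_lo, y_hi) = window
    zs = sorted(z_value(cx, cy)
                for cx in range(int(x_lo), math.ceil(x_hi))
                for cy in range(int(y_lo), math.ceil(y_hi)))
    intervals = []
    for z in zs:
        if intervals and z == intervals[-1][1] + 1:
            intervals[-1][1] = z
        else:
            intervals.append([z, z])
    return [tuple(iv) for iv in intervals]

def window_query(index, data, window):
    keys = [z for z, _ in index]
    result = []
    for lo, hi in window_to_intervals(window):
        # Step 2b: retrieve candidates with one range scan per interval.
        start, end = bisect.bisect_left(keys, lo), bisect.bisect_right(keys, hi)
        for _, name in index[start:end]:
            # Step 3: filter out false positives using the real coordinates.
            if all(a <= v < b for v, (a, b) in zip(data[name], window)):
                result.append(name)
    return result

index = build_index(BUILDINGS)
cbd = [(0, 4), (2, 5)]
print(window_to_intervals(cbd))            # [(8, 15), (32, 33), (36, 37)]
print(window_query(index, BUILDINGS, cbd)) # ['F', 'C', 'G']
```

In this small example every candidate is a true answer, but the filtering step is still needed whenever the window does not align with cell boundaries.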
Another mapping example
Story continued
  The CBD: [0,4] × [2,5]
  Streets intersected by the CBD are: [11,14], [21,22] and 41
The Pyramid-tree [SIGMOD’98]
  Data space divided into 2d pyramids (d is the number of dimensions)
  Streets are parallel to the base of the pyramid
  Data mapping: objects mapped to the street numbers
  Query mapping: query window mapped to all the intersected streets
Name x y Street Height
A 0.7 1.2 14 100
B 5.8 1.2 23 50
C 2.7 2.3 22 80
D 5.5 2.4 22 90
E 6.6 2.5 33 40
F 1.7 3.8 13 120
G 2.8 4.7 12 100
H 0.6 5.8 14 50
I 1.6 6.7 43 60
J 3.4 6.6 43 40
Name x y Street Height
G 2.8 4.7 12 100
F 1.7 3.8 13 120
A 0.7 1.2 14 100
H 0.6 5.8 14 50
C 2.7 2.3 22 80
D 5.5 2.4 22 90
B 5.8 1.2 23 50
E 6.6 2.5 33 40
I 1.6 6.7 43 60
J 3.4 6.6 43 40
The second table is the first table sorted by Street, the one-dimensional key.
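The data mapping of the Pyramid-tree can be sketched with the pyramid value defined in the Pyramid-Technique paper [SIGMOD’98]: a point in the unit cube is assigned to the pyramid of the dimension in which it is farthest from the center, and its key is the pyramid number plus its height inside that pyramid. This continuous key is not the discretized street numbering drawn in the slide, which cannot be recovered from the text; the function name is illustrative, and the points are assumed to be normalized to [0,1] in every dimension.

```python
def pyramid_value(point):
    """Map a point of [0,1]^d to a one-dimensional key: pyramid number + height."""
    d = len(point)
    # Dimension in which the point is farthest from the center 0.5.
    j_max = max(range(d), key=lambda j: abs(0.5 - point[j]))
    height = abs(0.5 - point[j_max])
    # Pyramids 0..d-1 lie below the center in dimension j_max, d..2d-1 above it.
    pyramid = j_max if point[j_max] < 0.5 else j_max + d
    return pyramid + height

# Points in the same pyramid share the integer part of the key;
# the fractional part says how far the "street" is from the center.
for p in [(0.1, 0.6), (0.9, 0.6), (0.6, 0.9), (0.45, 0.2)]:
    print(p, round(pyramid_value(p), 3))
```

Sorting the objects by this key gives the one-dimensional order that is indexed; a window query is then mapped to one key range per intersected pyramid.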
Deficiency of the Pyramid-tree
Sensitivity to the location of the query window
A set of d functions t1, t2, …, td, where each ti is:
  a bijection from [0,1] to [0,1]
  monotonic
  such that ti(ci) = 0.5
Apply ti to the query, so that the answers of the transformed query over the transformed data are the answers of the original query over the original data.
Magic of mapping (figure: examples with ci = 0.25 and ci = 0.707)
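One simple function with the three required properties is a power function t(x) = x^r with r = -1 / log2(c), since then t(c) = c^r = 0.5. The slides do not say which function the P+-tree uses, so this choice and the names below are assumptions of the sketch. For c = 0.25 it gives t(x) = x^0.5, and for c = 0.707 roughly t(x) = x^2.

```python
import math

def make_transform(c):
    """Return t with t(0) = 0, t(1) = 1, t(c) = 0.5, increasing on [0,1] (0 < c < 1)."""
    r = -1.0 / math.log2(c)
    return lambda x: x ** r

def window_1d(values, lo, hi):
    """One-dimensional window query by linear scan."""
    return [v for v in values if lo <= v <= hi]

t = make_transform(0.25)
print(t(0.25))  # 0.5

# Because t is monotonic, transforming both the data and the query bounds
# selects the same objects as the original query on the original data.
values = [0.05, 0.2, 0.3, 0.6, 0.9]
original = window_1d(values, 0.1, 0.5)
transformed = window_1d([t(v) for v in values], t(0.1), t(0.5))
print(original)                              # [0.2, 0.3]
print(transformed == [t(v) for v in original])  # True
```

In d dimensions one such function is applied per dimension, each with its own ci.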
The P+-tree [ICDE’04]
Two measures
Space division
Mapping the data
Performance
Mapping for KNN Queries
Story continued
  New factory at Q = (4, 1)
  Find the 3 nearest buildings to Q
Termination condition
  K candidates have been found
  All of them are in the current search circle
Name x y Street Height
A 0.7 1.2 14 100
B 5.8 1.2 32 50
C 2.7 2.3 12 80
D 5.5 2.4 31 90
E 6.6 2.5 32 40
F 1.7 3.8 13 120
G 2.8 4.7 24 100
H 0.6 5.8 23 50
I 1.6 6.7 22 60
J 3.4 6.6 24 40
The table below is the table above sorted by Street (the figure labels the streets 11 to 14, 21 to 24, and 31 to 32).
Name x y Street Height
C 2.7 2.3 12 80
F 1.7 3.8 13 120
A 0.7 1.2 14 100
I 1.6 6.7 22 60
H 0.6 5.8 23 50
G 2.8 4.7 24 100
J 3.4 6.6 24 40
D 5.5 2.4 31 90
B 5.8 1.2 32 50
E 6.6 2.5 32 40
Candidate list as the search proceeds (ranks 1 to 3, distance to Q in parentheses):
  A (3.31)
  A (3.31), F (3.62)
  B (1.81), A (3.31), F (3.62)
  B (1.81), E (3.00), A (3.31)
  B (1.81), C (1.84), E (3.00)
  B (1.81), C (1.84), D (2.05)
Distances computed: ||AQ|| = 3.31, ||FQ|| = 3.62, ||BQ|| = 1.81, ||EQ|| = 3.00, ||CQ|| = 1.84, ||DQ|| = 2.05
Search radius: R = 0.35, 0.70, 1.05, 1.40, 1.75, 2.10
The iDistance [TODS’05a]
  Data partitioned into a number of clusters
  Streets are concentric circles
  Data mapping: objects mapped to street numbers
  Query mapping: search circle mapped to the streets it intersects
Performance
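The iDistance mapping and the KNN search with a growing search circle can be sketched as follows. Each object is keyed by i · c + dist(object, Oi), where Oi is the reference point of its partition i. The reference points, the constant c, and the use of a sorted list in place of a B+-tree are assumptions of this sketch (the slides do not give the cluster centers), and for brevity each round rescans its key ranges where a real implementation would extend the previous scan.

```python
import bisect
import math

BUILDINGS = {
    "A": (0.7, 1.2), "B": (5.8, 1.2), "C": (2.7, 2.3), "D": (5.5, 2.4),
    "E": (6.6, 2.5), "F": (1.7, 3.8), "G": (2.8, 4.7), "H": (0.6, 5.8),
    "I": (1.6, 6.7), "J": (3.4, 6.6),
}
REFS = [(1.5, 2.5), (2.0, 6.0), (6.0, 2.0)]  # hypothetical reference points
C = 100.0  # must exceed any distance to a reference point

def build_index(data, refs, c=C):
    """Data mapping: key = partition number * c + distance to its reference point."""
    entries = []
    for name, p in data.items():
        i = min(range(len(refs)), key=lambda j: math.dist(p, refs[j]))
        entries.append((i * c + math.dist(p, refs[i]), name))
    return sorted(entries)

def knn(index, data, refs, q, k, step=0.35, c=C):
    if k > len(data):
        raise ValueError("k exceeds the number of objects")
    keys = [key for key, _ in index]
    r = 0.0
    while True:
        r += step  # enlarge the search circle
        found = {}
        for i, ref in enumerate(refs):
            # Query mapping: by the triangle inequality, objects within r of q
            # lie at distance dq - r .. dq + r from the reference point.
            dq = math.dist(q, ref)
            lo = i * c + max(0.0, dq - r)
            hi = i * c + min(dq + r, c - 1e-9)
            start, end = bisect.bisect_left(keys, lo), bisect.bisect_right(keys, hi)
            for _, name in index[start:end]:
                found[name] = math.dist(data[name], q)
        best = sorted(found.items(), key=lambda item: item[1])[:k]
        # Termination: k candidates, all inside the current search circle.
        if len(best) == k and best[-1][1] <= r:
            return best, r

best, radius = knn(build_index(BUILDINGS, REFS), BUILDINGS, REFS, (4, 1), 3)
print([(name, round(dist, 2)) for name, dist in best])
# [('B', 1.81), ('C', 1.84), ('D', 2.05)]
print(round(radius, 2))  # 2.1
```

With the step of 0.35 used in the slides, the search stops at R = 2.10, the first radius that covers the third nearest building.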
Summary
P+-tree for window queries [ICDE’04]
iDistance for kNN queries [TODS’05a]
Both rely on a function for mapping data and queries; the efficiency lies in the design of the mapping function.
Generalized Multidimensional Data Mapping and Query Processing [TODS’05b]
Recent work and Trend
Queries on moving objects, continuous queries
  Predictive range and kNN queries [InfSys’10]
  Continuous retrieval of 3D objects [ICDE’08b, VLDBJ’10b]
  Continuous intersection join [ICDE’08a, VLDBJ’12]
  Continuous kNN join [GeoInformatica’10]
  (Continuous) moving kNN queries [VLDB’08b, VLDBJ’10a, InfSys’13]
  Other types of incremental queries [TKDE’10]
Temporal queries
  Version index with compression [VLDB’08a]
  Memory hierarchy friendly index, HV-tree [VLDB’10]
References
[SIGMOD’98] Stefan Berchtold, Christian Böhm, Hans-Peter Kriegel. The Pyramid-Technique: Towards Breaking the Curse of Dimensionality. ACM SIGMOD International Conference on Management of Data (SIGMOD), 1998.
[ICDE’04] Rui Zhang, Beng Chin Ooi, Kian-Lee Tan. Making the Pyramid Technique Robust to Query Types and Workloads. IEEE International Conference on Data Engineering (ICDE), 2004.
[TODS’05a] H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang. iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search. ACM Transactions on Database Systems (TODS), 30(2), 2005.
[TODS’05b] Rui Zhang, Panos Kalnis, Beng Chin Ooi, Kian-Lee Tan. Generalized Multi-dimensional Data Mapping and Query Processing. ACM Transactions on Database Systems (TODS), 30(3), 2005.
[VLDB’00] Frank Ramsak, Volker Markl, Robert Fenk, Martin Zirkel, Klaus Elhardt, Rudolf Bayer. Integrating the UB-Tree into a Database System Kernel. International Conference on Very Large Data Bases (VLDB), 2000.
[PODS’00] Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Stéphane Bressan. Indexing the Edges - A Simple and Yet Efficient Approach to High-Dimensional Indexing. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), 2000.
[ICDE’08a] Rui Zhang, Dan Lin, Kotagiri Ramamohanarao, Elisa Bertino. Continuous Intersection Joins over Moving Objects. Proceedings of the 24th International Conference on Data Engineering (ICDE), pp. 863-872, April 7-12, 2008.
[ICDE’08b] Mohammed Eunus Ali, Rui Zhang, Egemen Tanin, Lars Kulik. A Motion-Aware Approach to Continuous Retrieval of 3D Objects. Proceedings of the 24th International Conference on Data Engineering (ICDE), pp. 843-852, April 7-12, 2008.
[VLDB’08a] David Lomet, Mingsheng Hong, Rimma Nehme, Rui Zhang. Transaction Time Indexing with Version Compression. Proceedings of the VLDB Endowment (PVLDB), 1(1): 870-881, 2008.
[VLDB’08b] Sarana Nutanong, Rui Zhang, Egemen Tanin, Lars Kulik. The V*-Diagram: A Query Dependent Approach to Moving KNN Queries. Proceedings of the VLDB Endowment (PVLDB), 1(1): 1095-1106, 2008.
[GeoInformatica’10] Cui Yu, Rui Zhang, Yaochun Huang, Hui Xiong. High-dimensional kNN joins with incremental updates. GeoInformatica, 14(1): 55-82, 2010.
[VLDB’10] Rui Zhang, Martin Stradling. The HV-tree: a Memory Hierarchy Aware Version Index. Proceedings of the VLDB Endowment (PVLDB), 3(1): 397-408, 2010.
[VLDBJ’10a] Sarana Nutanong, Rui Zhang, Egemen Tanin, Lars Kulik. Analysis and Evaluation of V*-kNN: An Efficient Algorithm for Moving kNN Queries. VLDB Journal, 19(3): 307-332, 2010.
[VLDBJ’10b] Mohammed Eunus Ali, Egemen Tanin, Rui Zhang, Lars Kulik. A Motion-Aware Approach for Efficient Evaluation of Continuous Queries on 3D Object Databases. VLDB Journal, 19(5): 603-632, 2010.
[TKDE’10] Sarana Nutanong, Egemen Tanin, Rui Zhang. Incremental Evaluation of Visible Nearest Neighbor Queries. IEEE Transactions on Knowledge & Data Engineering (TKDE), 22(5): 665-681, 2010.
[InfSys’10] Rui Zhang, H. V. Jagadish, Bing Tian Dai, Kotagiri Ramamohanarao. Optimized Algorithms for Predictive Range and KNN Queries on Moving Objects. Information Systems, 35(8): 911-932, 2010.
[VLDBJ’12] Rui Zhang, Jianzhong Qi, Dan Lin, Wei Wang, Raymond Chi-Wing Wong. A Highly Optimized Algorithm for Continuous Intersection Join Queries over Moving Objects. VLDB Journal, 21(4): 561-586, 2012.
[InfSys’13] Tanzima Hashem, Lars Kulik, Rui Zhang. Countering Overlapping Rectangle Privacy Attack for Moving kNN Queries. Information Systems, 38(3): 430-453, 2013.