Indexing Multidimensional Data : A Mapping Based Approach Rui Zhang


Page 1: Indexing Multidimensional Data : A Mapping Based Approach

Indexing Multidimensional Data: A Mapping Based Approach

Rui Zhang

Page 2: Indexing Multidimensional Data : A Mapping Based Approach

Outline

Background
  Multidimensional data and queries

Mapping based multidimensional indexing and query processing
  General strategy
  Window queries
  K nearest neighbor (KNN) queries

Summary and future work

Page 3: Indexing Multidimensional Data : A Mapping Based Approach

Multidimensional Data

Spatial data (low-dimensionality)
  Geographic information: Melbourne (37, 145). Which city is at (30, 140)?
  Computer-aided design: width and height (40, 50). Is there any part that has a width of 40 and a height of 50?

Records with multiple attributes (medium-dimensionality)
  Employee (ID, age, score, salary, …)
  Is there any employee whose age is under 25, performance score is greater than 80, and salary is between 3000 and 5000?

Multimedia data (high-dimensionality)
  Color histograms of images: give me the image most similar to a given query image.
  Multimedia features: color, shape, texture

Page 4: Indexing Multidimensional Data : A Mapping Based Approach

Multidimensional Queries

Point query
  Return the objects located at Q(x1, x2, …, xd).
  E.g. Q = (3.4, 6.6).

Window query
  Return all the objects enclosed or intersected by the hyper-rectangle W{[L1, U1], [L2, U2], …, [Ld, Ud]}.
  E.g. W = {[0,4], [2,5]}.

K-Nearest Neighbor Query (KNN Query)
  Return k objects whose distances to Q are no larger than any other object’s distance to Q.
  E.g. 3NN of Q = (4, 1).
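As a baseline for the three query types, here is a minimal brute-force sketch in Python. It scans every object, which is exactly the cost an index tries to avoid. The coordinates are the ten example buildings used in the deck's tables; the function names are illustrative, not from the slides.

```python
import math

# Example buildings (name -> (x, y)), copied from the deck's example table.
BUILDINGS = {
    "A": (0.7, 1.2), "B": (5.8, 1.2), "C": (2.7, 2.3), "D": (5.5, 2.4),
    "E": (6.6, 2.5), "F": (1.7, 3.8), "G": (2.8, 4.7), "H": (0.6, 5.8),
    "I": (1.6, 6.7), "J": (3.4, 6.6),
}

def point_query(q):
    """Objects located exactly at q."""
    return [n for n, p in BUILDINGS.items() if p == q]

def window_query(window):
    """Objects inside the hyper-rectangle [(L1, U1), ..., (Ld, Ud)]."""
    return [n for n, p in BUILDINGS.items()
            if all(lo <= v <= hi for v, (lo, hi) in zip(p, window))]

def knn_query(q, k):
    """The k objects with the smallest Euclidean distance to q."""
    return sorted(BUILDINGS, key=lambda n: math.dist(BUILDINGS[n], q))[:k]

print(point_query((3.4, 6.6)))          # ['J']
print(window_query([(0, 4), (2, 5)]))   # ['C', 'F', 'G']
print(knn_query((4, 1), 3))             # ['B', 'C', 'D']
```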

Page 5: Indexing Multidimensional Data : A Mapping Based Approach

Mapping Based Multidimensional Indexing

Story
  The CBD: [0,4] × [2,5]
  Blocks in the CBD are: [8,15], [32,33] and [36,37]

General strategy: three steps
  1. Data mapping and indexing
  2. Query mapping and data retrieval
  3. Filtering out false positives

Buildings, original order:

Name  x    y    Block  Height
A     0.7  1.2  2      100
B     5.8  1.2  19     50
C     2.7  2.3  12     80
D     5.5  2.4  25     90
E     6.6  2.5  28     40
F     1.7  3.8  11     120
G     2.8  4.7  36     100
H     0.6  5.8  34     50
I     1.6  6.7  41     60
J     3.4  6.6  45     40

Buildings, sorted by block number:

Name  x    y    Block  Height
A     0.7  1.2  2      100
F     1.7  3.8  11     120
C     2.7  2.3  12     80
B     5.8  1.2  19     50
D     5.5  2.4  25     90
E     6.6  2.5  28     40
H     0.6  5.8  34     50
G     2.8  4.7  36     100
I     1.6  6.7  41     60
J     3.4  6.6  45     40
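The three steps can be sketched in Python. The slide does not name the block numbering scheme. The block numbers in the table are, however, consistent with a Z-order (Morton) interleaving of the integer cell coordinates, with the y bit placed above the x bit, so this sketch assumes that scheme. It also treats the CBD window as covering cells x = 0..3 and y = 2..4, which is what the listed block intervals imply. A sorted list with binary search stands in for a B+-tree.

```python
import math
from bisect import bisect_left, bisect_right

BUILDINGS = {
    "A": (0.7, 1.2), "B": (5.8, 1.2), "C": (2.7, 2.3), "D": (5.5, 2.4),
    "E": (6.6, 2.5), "F": (1.7, 3.8), "G": (2.8, 4.7), "H": (0.6, 5.8),
    "I": (1.6, 6.7), "J": (3.4, 6.6),
}

def block(x, y, bits=3):
    """Assumed mapping: interleave the bits of cell (int(x), int(y)), y above x."""
    cx, cy = int(x), int(y)
    code = 0
    for i in range(bits):
        code |= ((cx >> i) & 1) << (2 * i)
        code |= ((cy >> i) & 1) << (2 * i + 1)
    return code

def window_blocks(window):
    """Query mapping: block numbers of all covered cells, merged into intervals."""
    (x_lo, x_hi), (y_lo, y_hi) = window
    codes = sorted(block(cx, cy)
                   for cx in range(int(x_lo), math.ceil(x_hi))
                   for cy in range(int(y_lo), math.ceil(y_hi)))
    intervals = []
    for c in codes:
        if intervals and c == intervals[-1][1] + 1:
            intervals[-1][1] = c          # extend the current run
        else:
            intervals.append([c, c])      # start a new run
    return [tuple(iv) for iv in intervals]

# Step 1: data mapping and indexing (sorted one-dimensional keys).
INDEX = sorted((block(x, y), name) for name, (x, y) in BUILDINGS.items())
KEYS = [k for k, _ in INDEX]

def window_query(window):
    hits = []
    for lo, hi in window_blocks(window):                  # step 2: retrieval
        for _, name in INDEX[bisect_left(KEYS, lo):bisect_right(KEYS, hi)]:
            p = BUILDINGS[name]
            if all(a <= v <= b for v, (a, b) in zip(p, window)):
                hits.append(name)                         # step 3: filtering
    return hits

print(window_blocks([(0, 4), (2, 5)]))  # [(8, 15), (32, 33), (36, 37)]
print(window_query([(0, 4), (2, 5)]))   # ['F', 'C', 'G']
```

In this example the window is aligned with cell boundaries, so the filtering step removes nothing. It matters when the window only partly covers a cell.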

Page 6: Indexing Multidimensional Data : A Mapping Based Approach

Another mapping example

Story continued
  The CBD: [0,4] × [2,5]
  Streets intersected by the CBD are: [11,14], [21,22] and 41

The Pyramid-tree [SIGMOD’98]
  Data space divided into 2d pyramids (d is the number of dimensions, so 4 pyramids in this 2-D example)
  Streets are parallel to the base of the pyramid
  Data mapping: objects mapped to the street numbers
  Query mapping: query window mapped to all the intersected streets

Buildings, original order:

Name  x    y    Street  Height
A     0.7  1.2  14      100
B     5.8  1.2  23      50
C     2.7  2.3  22      80
D     5.5  2.4  22      90
E     6.6  2.5  33      40
F     1.7  3.8  13      120
G     2.8  4.7  12      100
H     0.6  5.8  14      50
I     1.6  6.7  43      60
J     3.4  6.6  43      40

Buildings, sorted by street number:

Name  x    y    Street  Height
G     2.8  4.7  12      100
F     1.7  3.8  13      120
A     0.7  1.2  14      100
H     0.6  5.8  14      50
C     2.7  2.3  22      80
D     5.5  2.4  22      90
B     5.8  1.2  23      50
E     6.6  2.5  33      40
I     1.6  6.7  43      60
J     3.4  6.6  43      40
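The street numbering can be sketched as follows. The slide figure is not recoverable, so the layout here is inferred from the table and should be read as an assumption: data space [0,8] × [0,8] with centre (4,4); pyramid 1 = low x, 2 = low y, 3 = high x, 4 = high y; and street = 10 × pyramid + the height above the pyramid tip, rounded up. The Pyramid Technique itself uses a continuous height, so the integer streets are a discretised illustration.

```python
import math
from bisect import bisect_left, bisect_right

CENTER, DIMS = 4.0, 2   # assumed centre of the data space [0, 8] x [0, 8]
BUILDINGS = {
    "A": (0.7, 1.2), "B": (5.8, 1.2), "C": (2.7, 2.3), "D": (5.5, 2.4),
    "E": (6.6, 2.5), "F": (1.7, 3.8), "G": (2.8, 4.7), "H": (0.6, 5.8),
    "I": (1.6, 6.7), "J": (3.4, 6.6),
}

def street(point):
    """Data mapping: the dimension farthest from the centre picks the pyramid."""
    devs = [abs(v - CENTER) for v in point]
    j = max(range(DIMS), key=lambda k: devs[k])
    pyramid = j + 1 if point[j] < CENTER else j + 1 + DIMS
    return 10 * pyramid + math.ceil(devs[j])

def min_dev(interval):
    """Smallest distance from the centre to any value in [lo, hi]."""
    lo, hi = interval
    return 0.0 if lo <= CENTER <= hi else min(abs(lo - CENTER), abs(hi - CENTER))

def window_streets(window):
    """Query mapping: one street interval per pyramid the window intersects."""
    ranges = []
    for p in range(1, 2 * DIMS + 1):
        j = (p - 1) % DIMS
        lo, hi = window[j]
        low_side = p <= DIMS
        h_max = CENTER - lo if low_side else hi - CENTER
        h_near = CENTER - hi if low_side else lo - CENTER
        others = [min_dev(window[k]) for k in range(DIMS) if k != j]
        h_min = max(0.0, h_near, *others)
        if h_max > 0 and h_max >= h_min:
            ranges.append((10 * p + max(1, math.ceil(h_min)),
                           10 * p + math.ceil(h_max)))
    return ranges

INDEX = sorted((street(p), name) for name, p in BUILDINGS.items())
KEYS = [k for k, _ in INDEX]

def window_query(window):
    hits = []
    for lo, hi in window_streets(window):
        for _, name in INDEX[bisect_left(KEYS, lo):bisect_right(KEYS, hi)]:
            p = BUILDINGS[name]
            if all(a <= v <= b for v, (a, b) in zip(p, window)):
                hits.append(name)   # A, H and D are fetched but filtered out
    return hits

print(window_streets([(0, 4), (2, 5)]))  # [(11, 14), (21, 22), (41, 41)]
print(window_query([(0, 4), (2, 5)]))    # ['G', 'F', 'C']
```

Unlike the block example, this mapping fetches false positives (A, H and D share streets with the answers), which shows why the filtering step is needed.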

Page 7: Indexing Multidimensional Data : A Mapping Based Approach

Deficiency of the Pyramid-tree
  Sensitivity to the location of the query window

Magic of mapping
  A set of d functions t1, t2, …, td, where each ti is:
    a bijection from [0,1] to [0,1]
    monotonic
    such that ti(ci) = 0.5
  Apply ti to the query, so that the answers of the transformed queries over the transformed data are the answers of the original query over the original data.
  [Figure: two examples, ci = 0.25 and ci = 0.707]
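One family of functions that meets the three conditions is the power function t(x) = x^r with r = -1 / log2(c), which sends c to 0.5. For c = 0.25 this is the square root, and for c = 0.707 it is approximately squaring. The slide does not say which function is used, so the choice below is an assumption for illustration. The sketch checks, on the deck's example buildings scaled to [0,1], that a window query over transformed data with a transformed window returns the original answers.

```python
import math

def make_t(c):
    """Return t(x) = x ** r with r = -1 / log2(c), so t(c) = 0.5 for 0 < c < 1."""
    r = -1.0 / math.log2(c)
    return lambda x: x ** r

# Example buildings scaled from [0, 8] to [0, 1].
RAW = {
    "A": (0.7, 1.2), "B": (5.8, 1.2), "C": (2.7, 2.3), "D": (5.5, 2.4),
    "E": (6.6, 2.5), "F": (1.7, 3.8), "G": (2.8, 4.7), "H": (0.6, 5.8),
    "I": (1.6, 6.7), "J": (3.4, 6.6),
}
POINTS = {n: (x / 8, y / 8) for n, (x, y) in RAW.items()}

# Assumption: use the two c_i values from the slide, one per dimension.
CENTERS = (0.25, 0.707)
T = [make_t(c) for c in CENTERS]

def transform(p):
    return tuple(t(v) for t, v in zip(T, p))

def inside(p, window):
    return all(lo <= v <= hi for v, (lo, hi) in zip(p, window))

window = [(0.0, 0.5), (0.25, 0.625)]          # the CBD window, scaled
t_window = [(t(lo), t(hi)) for t, (lo, hi) in zip(T, window)]

original = [n for n, p in POINTS.items() if inside(p, window)]
mapped = [n for n, p in POINTS.items() if inside(transform(p), t_window)]
print(original, mapped)   # both lists hold the same names
```

Because each t is monotonic, a coordinate lies between the window bounds exactly when its image lies between the transformed bounds, which is why the two answers agree.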

Page 8: Indexing Multidimensional Data : A Mapping Based Approach

The P+-tree [ICDE’04]

Two measures

Space division

Mapping the data

Performance

Page 9: Indexing Multidimensional Data : A Mapping Based Approach

Mapping for KNN Queries

Story continued
  New factory at Q = (4, 1)
  Find the 3 nearest buildings to Q

Termination condition
  K candidates
  All in the current search circle

[Figure: clusters 1, 2 and 3 with streets 11 to 14, 21 to 24 and 31 to 32; search circle around Q growing through R = 0.35, 0.70, 1.05, 1.40, 1.75, 2.10]

Buildings, original order:

Name  x    y    Street  Height
A     0.7  1.2  14      100
B     5.8  1.2  32      50
C     2.7  2.3  12      80
D     5.5  2.4  31      90
E     6.6  2.5  32      40
F     1.7  3.8  13      120
G     2.8  4.7  24      100
H     0.6  5.8  23      50
I     1.6  6.7  22      60
J     3.4  6.6  24      40

Buildings, sorted by street number:

Name  x    y    Street  Height
C     2.7  2.3  12      80
F     1.7  3.8  13      120
A     0.7  1.2  14      100
I     1.6  6.7  22      60
H     0.6  5.8  23      50
G     2.8  4.7  24      100
J     3.4  6.6  24      40
D     5.5  2.4  31      90
B     5.8  1.2  32      50
E     6.6  2.5  32      40

Distances to Q: ||AQ|| = 3.31, ||FQ|| = 3.62, ||BQ|| = 1.81, ||EQ|| = 3.00, ||CQ|| = 1.84, ||DQ|| = 2.05

Candidate list snapshots (candidate and distance to Q), in the order the candidates A, F, B, E, C, D are found:

Rank 1   Rank 2   Rank 3
A 3.31
A 3.31   F 3.62
B 1.81   A 3.31   F 3.62
B 1.81   E 3.00   A 3.31
B 1.81   C 1.84   E 3.00
B 1.81   C 1.84   D 2.05

Page 10: Indexing Multidimensional Data : A Mapping Based Approach

The iDistance [TODS’05a]
  Data partitioned into a number of clusters
  Streets are concentric circles
  Data mapping: objects mapped to street numbers
  Query mapping: search circle mapped to the streets it intersects
  Performance
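A minimal Python sketch of this idea, using the continuous iDistance key (partition number × constant + distance to the partition's reference point) rather than the integer streets of the example. The cluster membership follows the street table (cluster 1 = A, C, F; cluster 2 = G, H, I, J; cluster 3 = B, D, E). The reference points are not recoverable from the slides, so cluster centroids are used as an assumption, and the radius step of 0.35 is taken from the example. A sorted list stands in for the B+-tree, and each round simply rescans its key ranges, which a real implementation would avoid.

```python
import math
from bisect import bisect_left, bisect_right

POINTS = {
    "A": (0.7, 1.2), "B": (5.8, 1.2), "C": (2.7, 2.3), "D": (5.5, 2.4),
    "E": (6.6, 2.5), "F": (1.7, 3.8), "G": (2.8, 4.7), "H": (0.6, 5.8),
    "I": (1.6, 6.7), "J": (3.4, 6.6),
}
CLUSTERS = {1: "ACF", 2: "GHIJ", 3: "BDE"}   # membership read off the street table
C = 100.0                                    # key stride, larger than any distance

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Assumption: each reference point is the centroid of its cluster.
REFS = {i: (sum(POINTS[n][0] for n in names) / len(names),
            sum(POINTS[n][1] for n in names) / len(names))
        for i, names in CLUSTERS.items()}
DIST_MAX = {i: max(dist(POINTS[n], REFS[i]) for n in names)
            for i, names in CLUSTERS.items()}

# Data mapping: one-dimensional key per object, kept sorted.
INDEX = sorted((i * C + dist(POINTS[n], REFS[i]), n)
               for i, names in CLUSTERS.items() for n in names)
KEYS = [k for k, _ in INDEX]

def knn(q, k, step=0.35):
    r = 0.0
    while True:
        r += step                                   # enlarge the search circle
        found = {}
        for i, ref in REFS.items():
            d = dist(q, ref)
            if d - r > DIST_MAX[i]:
                continue                            # circle misses this cluster
            lo = i * C + max(0.0, d - r)            # query mapping: key range
            hi = i * C + min(DIST_MAX[i], d + r)
            for _, name in INDEX[bisect_left(KEYS, lo):bisect_right(KEYS, hi)]:
                found[name] = dist(q, POINTS[name])
        best = sorted(found.items(), key=lambda kv: kv[1])[:k]
        # Termination: k candidates, all inside the current search circle.
        if len(best) == k and best[-1][1] <= r:
            return best, r

best, radius = knn((4.0, 1.0), 3)
print([(n, round(d, 2)) for n, d in best], round(radius, 2))
# [('B', 1.81), ('C', 1.84), ('D', 2.05)] 2.1
```

The intermediate candidate lists depend on the reference points, so they need not match the snapshots in the slides. The final answer and the terminating radius do not depend on that choice.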

Page 11: Indexing Multidimensional Data : A Mapping Based Approach

Summary
  P+-tree for window queries [ICDE’04]
  iDistance for kNN queries [TODS’05a]
  A function for mapping data and queries. Efficiency lies in the design of the mapping function.
  Generalized Multidimensional Data Mapping and Query Processing [TODS’05b]

Page 12: Indexing Multidimensional Data : A Mapping Based Approach

Recent Work and Trends

Queries on moving objects, continuous queries
  Predictive range and kNN queries [InfSys’10]
  Continuous retrieval of 3D objects [ICDE’08b, VLDBJ’10b]
  Continuous intersection join [ICDE’08a, VLDBJ’12]
  Continuous kNN join [GeoInformatica’10]
  (Continuous) moving kNN queries [VLDB’08b, VLDBJ’10a, InfSys’13]
  Other types of incremental queries [TKDE’10]

Temporal queries
  Version index with compression [VLDB’08a]
  Memory hierarchy friendly index, HV-tree [VLDB’10]

Page 13: Indexing Multidimensional Data : A Mapping Based Approach

References

[SIGMOD’98] Stefan Berchtold, Christian Böhm, Hans-Peter Kriegel. The Pyramid-Technique: Towards Breaking the Curse of Dimensionality. ACM SIGMOD International Conference on Management of Data (SIGMOD), 1998.

[ICDE’04] Rui Zhang, Beng Chin Ooi, Kian-Lee Tan. Making the Pyramid Technique Robust to Query Types and Workloads. IEEE International Conference on Data Engineering (ICDE), 2004.

[TODS’05a] H. V. Jagadish, Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Rui Zhang. iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search. ACM Transactions on Database Systems (TODS), 30(2), 2005.

[TODS’05b] Rui Zhang, Panos Kalnis, Beng Chin Ooi, Kian-Lee Tan. Generalized Multi-dimensional Data Mapping and Query Processing. ACM Transactions on Database Systems (TODS), 30(3), 2005.

[VLDB’00] Frank Ramsak, Volker Markl, Robert Fenk, Martin Zirkel, Klaus Elhardt, Rudolf Bayer. Integrating the UB-Tree into a Database System Kernel. International Conference on Very Large Data Bases (VLDB), 2000.

[PODS’00] Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Stéphane Bressan. Indexing the Edges - A Simple and Yet Efficient Approach to High-Dimensional Indexing. ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), 2000.

[ICDE’08a] Rui Zhang, Dan Lin, Kotagiri Ramamohanarao, Elisa Bertino. Continuous Intersection Joins over Moving Objects. Proceedings of the 24th International Conference on Data Engineering (ICDE), pp. 863-872, April 7-12, 2008.

[ICDE’08b] Mohammed Eunus Ali, Rui Zhang, Egemen Tanin, Lars Kulik. A Motion-Aware Approach to Continuous Retrieval of 3D Objects. Proceedings of the 24th International Conference on Data Engineering (ICDE), pp. 843-852, April 7-12, 2008.

[VLDB’08a] David Lomet, Mingsheng Hong, Rimma Nehme, Rui Zhang. Transaction Time Indexing with Version Compression. Proceedings of the VLDB Endowment (PVLDB), 1(1): 870-881, 2008.

[VLDB’08b] Sarana Nutanong, Rui Zhang, Egemen Tanin, Lars Kulik. The V*-Diagram: A Query Dependent Approach to Moving KNN Queries. Proceedings of the VLDB Endowment (PVLDB), 1(1): 1095-1106, 2008.

[GeoInformatica’10] Cui Yu, Rui Zhang, Yaochun Huang, Hui Xiong. High-dimensional kNN joins with incremental updates. GeoInformatica, 14(1): 55-82, 2010.

[VLDB’10] Rui Zhang, Martin Stradling. The HV-tree: a Memory Hierarchy Aware Version Index. Proceedings of the VLDB Endowment (PVLDB), 3(1): 397-408, 2010.

[VLDBJ’10a] Sarana Nutanong, Rui Zhang, Egemen Tanin, Lars Kulik. Analysis and Evaluation of V*-kNN: An Efficient Algorithm for Moving kNN Queries. VLDB Journal, 19(3): 307-332, 2010.

[VLDBJ’10b] Mohammed Eunus Ali, Egemen Tanin, Rui Zhang, Lars Kulik. A Motion-Aware Approach for Efficient Evaluation of Continuous Queries on 3D Object Databases. VLDB Journal, 19(5): 603-632, 2010.

[TKDE’10] Sarana Nutanong, Egemen Tanin, Rui Zhang. Incremental Evaluation of Visible Nearest Neighbor Queries. IEEE Transactions on Knowledge & Data Engineering (TKDE), 22(5): 665-681, 2010.

[InfSys’10] Rui Zhang, H. V. Jagadish, Bing Tian Dai, Kotagiri Ramamohanarao. Optimized Algorithms for Predictive Range and KNN Queries on Moving Objects. Information Systems, 35(8): 911-932, 2010.

[VLDBJ’12] Rui Zhang, Jianzhong Qi, Dan Lin, Wei Wang, Raymond Chi-Wing Wong. A Highly Optimized Algorithm for Continuous Intersection Join Queries over Moving Objects. VLDB Journal, 21(4): 561-586, 2012.

[InfSys’13] Tanzima Hashem, Lars Kulik, Rui Zhang. Countering Overlapping Rectangle Privacy Attack for Moving kNN Queries. Information Systems, 38(3): 430-453, 2013.