Multidimensional Indexing

  • View
    2.676

  • Download
    0

Embed Size (px)

DESCRIPTION

 

Text of Multidimensional Indexing

  • 1. Multidimensional Indexes Chapter 5
  • 2. Outline Applications Hash-like structures Tree-like structures Bit-map indexes
  • 3. Search Key Search key is (F1, F2, ..Fk) Separated by special markers. Example If F1=abcd and F2=123, the search key is abcd#123
  • 4. Applications: GIS Geographic Information Systems Objects are in two dimensional space The objects may be points or shapes Objects may be houses, roads, bridges, pipelines and many other physical objects. Query types Partial match queries Specify the values for one or more dimensions and look for all points matching those values in those dimensions. Range queries Set of shapes within the range Nearest neighbor queries Closest point to a given point Where-am-I queries When you click a mouse, the system determines which of the displayed elements you were clicking.
  • 5. Data Cubes Data exists in high dimensional space A chain store may record each sale made, including The day and time The store in which the sale was made The item purchased The color of the item The size of the item Attributes are seen as dimensions multidimensional space, data cube. Typical query Give a class of pink shirts for each store and each month of 1998.
  • 6. Multidimensional queries in SQL Represent points as a relation Points(x,y) Query Find the nearest point to (10, 20) Compute the distance between (10,20) to every other point.
  • 7. Rectangles Rectangles(id, x11, y11, xur, yur) If the query is Find the rectangles enclosing the point (10,20) SELECT id FROM Rectangles WHERE x11 =20.0;
  • 8. Data Cube Fact table Sales(day,store, item. color, size) Query Summarize the sale of pink shirts by day and store SELECT day, store, COUNT(*) AS totalSales FROM Sales WHERE item=shirt AND color=pink GROUP BY day, store;
  • 9. Executing range queries in Conventional Indexes Motivation: Find records where DEPT = Toy AND SAL > 50k Strategy 1: Use Toy index and check salary Strategy II: Use Toy index and SAL index and intersect. Complexity Toy index: number of disk accesses = number of disk blocks SAL index: number of disk accesses= number of records So conventional indexes of little help.
  • 10. Executing nearest-neighbor queries using conventional indexes There might be no point in the selected range Repeat the search by increasing the range The closest point within the range might not be the closest point overall. There is one point closer from outside range.
  • 11. Other Limitations of Conventional Indexes We can only keep the file sorted on only attribute. If the query is on multiple dimensions We will end-up having one disk access for each record It becomes too expensive
  • 12. Overview of Multidimensional Index Structures Hash-table-like approaches Tree-like approaches In both cases, we give-up some properties of single dimensional indexes Hash We have to access multiple buckets Tree Tree may not be balanced There may not exist a correspondence between tree nodes and disk blocks Information in the disk block is much smaller.
  • 13. Hash like structures: GRID Files Each dimension, grid lines partition the space into stripes, Points that fall on a grid line will be considered to belong to the stripe for which that grid line is lower boundary The number of grid lines in each dimension may vary. Space between grid lines in the same dimension may also vary
  • 14. Grid Index Key 2 X1 X2 Xn V1 V2Key 1 Vn To records with key1=V3, key2=X2 14
  • 15. CLAIM Can quickly find records with key 1 = Vi Key 2 = Xj key 1 = Vi key 2 = Xj 15
  • 16. CLAIM Can quickly find records with key 1 = Vi Key 2 = Xj key 1 = Vi key 2 = Xj And also ranges. E.g., key 1 Vi key 2 < Xj 16
  • 17. But there is a catch with Grid Indexes! How is Grid Index stored on disk? V1 V2 V3LikeArray... X1 X2 X3 X4 X1 X2 X3 X4 X1 X2 X3 X4 17
  • 18. But there is a catch with Grid Indexes! How is Grid Index stored on disk? V1 V2 V3LikeArray... X1 X2 X3 X4 X1 X2 X3 X4 X1 X2 X3 X4Problem: Need regularity so we can compute position of entry 18
  • 19. Solution: Use Indirection X1 X2 X3 Buckets -- V1 -- -- V2 -- -- -- V3 *Grid only -- V4 -- -- contains pointers to buckets -- -- -- -- Buckets -- -- 19
  • 20. With indirection: Grid can be regular without wasting space We do have price of indirection 20
  • 21. Can also index grid on value rangesSalary Grid 0-20K 1 20K-50K 2 50K- 3 8 1 2 3Linear Scale Toy Sales Personnel 21
  • 22. Grid files Good for multiple-key search+ Space, management overhead - (nothing is free) Need partitioning ranges that evenly - split keys 22
  • 23. Partitioned Hash Functions
  • 24. Partitioned hash functionIdea: 010110 1110010Key1 Key2 h1 h2 24
  • 25. EX:h1(