Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Realizationof DBS
9. Multi-dimensional Access Paths
Theo Härderwww.haerder.de
Goals- Design principles for access paths to the records of a table,
which support several search criteria in a symmetric way- Indexing of point objects and expanded objects
© 2011 AG DBIS
Realization of Database Systems – SS 2011
Main reference:Theo Härder, Erhard Rahm: Datenbanksysteme – Konzepte und Techniken der Implementierung, Springer, 2001, Chapter 9.
Volker Gaede, Oliver Günther: Multidimensional Access Methods, ACM Computing Surveys 30:2, June 1998, pp. 170-231.
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Access Paths – Forest of Descendents
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
© 2011 AG DBIS
Applications ofGeo-DBS
9-2
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Genealogy of Access Paths
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
© 2011 AG DBIS
Applications ofGeo-DBS
9-3
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Multi-dimensional Access Paths
Classification of queries
Basic problems• Indexing new data types – which ones?
Organization of spatial data
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
• Organization of spatial data• Preservation of topology (clustering)• Object representation
Organization of data records• Quadtrees• Multi-key hashing
Organization of embedding data space k d
© 2011 AG DBIS
Applications ofGeo-DBS
9-4
• k-d trees • Grid file
Access paths for expanded objects• R-tree• R+-tree
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Classification of Query Types
Definitions• A file is a collection of N records of type R = (A1, ..., An), where each record represents a
point object by an ordered n-tuple t = (a1, a2, ..., an) of values. The attributes A1, ..., Ak(k n) be keys.
• A query q specifies some conditions which must be satisfied by the key values of the records in the result set.
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
• Intersection queries: qualifying objects overlap with the query window• Containment queries, enclosure queries: qualifying objects are entirely contained in
the query window or completely contain the query window
Classification of intersection queries1. Exact match query: specifies a value for each key:
Q = (A1 = a1) (A2 = a2) ... (Ak = ak)2. Partial match query: specifies s < k key values
Q = (Ai1 = ai1) (Ai2 = ai2) ... (Ais = ais)
© 2011 AG DBIS
Applications ofGeo-DBS
9-5
Q ( i1 i1) ( i2 i2) ( is is)with 1 < s < k und 1 < i1 < i2 < ... < is < k
3. Range query: specifies a range ri = [li < ai < ui] for each key Ai
Q = (A1 = r1) ... (Ak = rk) (A1 > l1) (A1 < u1) ... (Ak > lk) (Ak < uk)
4. Partial range query: specifies a range for s < k Q = (Ai1 = ri1) ... (Ais = ris)with 1 < s < k und 1 < i1 < ... < is < k und rij = [lij < aij < uij], 1 < j < s
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Classification of Query Types (2) General range query
• exact range [li = ai = ui]• infinite range [- < ai < ] All 4 types of intersection queries can be expressed as a general range query
Classification of containment queries• Point query: Given a point in data space D find all objects which do contain it
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
• Point query: Given a point in data space D, find all objects which do contain it. • Region query: Given a query window, find all objects intersecting with it
(enclose it, are contained in it).
Examples: search for rectangles
RECTANGLES (x1, y1, x2, y2)
(x1, y1)
(x2, y2)
a) Find all rectangles which contain point (2 5)
© 2011 AG DBIS
Applications ofGeo-DBS
9-6
a) Find all rectangles which contain point (2,5)SELECT x1, y1, x2, y2 FROM RECTANGLESWHERE x1 <= 2 AND x2 >= 2 AND y1 <= 5 AND y2 >= 5
b) Find rectangles with point (1,3) as corner point left belowSELECT x1, y1, x2, y2 FROM RECTANGLESWHERE x1 = 1 AND y1 = 3
These all are harmless queries to be answered without pain even with SQL
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
3 Future Research Topics: Semantics, … Goals of DBMS development
• last 20 years: performance, availability, functionality• next 20 years: semantics (but we will always obtain only a bit more!)
From equality to approximation!Concept of similarity: many forms, interpretations, applications
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
p y y , p , pp• Information Retrieval, Geo Databases• Classification, Cluster Analysis (Data Mining)• Pattern Recognition, Case-based Reasoning, …
Search for “politician” as a term in a query
politicia
represent
congress
politicaperson
politicallyrested cit
© 2011 AG DBIS
Applications ofGeo-DBS
9-7Douglas R. Hofstadter: “The Location of Meaning” in : Godel, Escher, Bach: An Eternal Golden Braid, Basic Books; 20 Anv edition, 1999
Often implicit assumption: Standardized (meta-)schema for all data sources is existing andcomplete, correct, consistent with true and trustworthy content
10
Domain of uncertainty for similarity problemssufficientinsufficient don’t know
ntative
sman
alny inte-tizen
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Classification of Query Types (3)
Best-match query, nearest-neighbor query• Desired object does not exist Query for objects as similar as possible
XX
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
existing objectsX
X
desired object
distance
• "best" is determined via different kinds of distance functions
Examples• Object only satisfies 8 of 10 requested properties
© 2011 AG DBIS
Applications ofGeo-DBS
9-8
• Object is described by synonyms
Determination of the nearest neighborD = distance function, B = collection of points in the k-dimensional spaceWanted: nearest neighbor of p (in B)
nearest neighbor is q, if
︶D ︵ D q r B ︶ r ︵ p q,p) (r,
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Fundamental Problems1. Preservation of topological structure
X XXXXXX
X XX XX XX XX XX XX X
DBuckets B
X X
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
2. Strongly varying density of objects
XX
XXX
XXX
XX
D
Strong variation of the spatial allocation over time
no equi distant division of D
XX X
XXXX X
X XX X
XXXX X
X
© 2011 AG DBIS
Applications ofGeo-DBS
9-9
Xno equi-distant division of D
but: uniform bucket size (on disk)
3. Object representation- point objects- expanded objects
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Fundamental Problems (2)
4. Dynamic reorganization
X XXX
D Dafter severalinsertions and
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
X
X XX
XXX
XXX
XXX
XXXB B
5. Balanced access structure
- Arbitrary allocations and insertion/deletion sequences
deletions
© 2011 AG DBIS
Applications ofGeo-DBS
9-10
y / q
- Symmetric access via all dimensions!
- Guarantee of uniform and logarithmic access
2 or 3 disk accesses
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Use of One-Dimensional Access Paths
So farIndexing (Inversion) of a single dimension, e.g. B*-tree
Decomposition principle of the key space for B*-trees (2-dim.)
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
X XX
X X X
X X
X
X
XX
Key2
Key1
K2m
K21 X XX
X X X
X
X
X
XX
Key2
Key1
K2m
K21
X
© 2011 AG DBIS
Applications ofGeo-DBS
9-11
K11 K1n
B*-tree (Key1)
Partitioning of space according to values of Key1
K11 K1n
B*-tree (Key2)
Partitioning of space according to values of Key2
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Multi-Attribute Search
Access to K2j) (Key2 K1i) (Key1
OR
AND...
• Pointer list for K1i: from B*-tree (Key1)• Pointer list for K2j: from B*-tree (Key2)
Key2
K2j
AND
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
merge + access to result tuples
Key1K1i large pointer lists and intermediate results
Simulation of multi-dimensional access using a B*-tree?
Key1| Key2
Idea:concatenated keys andconcatenated values:
Search Operations:
© 2011 AG DBIS
Applications ofGeo-DBS
9-12
Key1| Key2K11 | K21K11 | K22
…K11 | K2mK12 | K21K12 | K22
…⁞K12 | K2mK13 | K21
… ⁞K1n | K2m
Search Operations:- (Key1 = K1i) AND (Key2 = K2j) ?- Key2 = K2j ?- Key1 = K1i ?- OR predicates ?
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Quadtrees (Point Quadtree, Quadrant Tree) Storage structure
• for 2-dimensional multi-attribute access• Decomposition principle of D: recursive partitioning by quadrants
Realization as Generalization of the binary tree• Each node contains a record, out-degree of a node: max. 4
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
, g• Root divides 2-dimensional space in 4 quadrants • Recursive division of each quadrant by the root of a subtree• i-th subtree of a node contains points in the i-th quadrant
Goal: consideration of neighborhood relationships• Example: geographical data with coordinates x and y
N
II = NW I = NE
© 2011 AG DBIS
Applications ofGeo-DBS
9-13
E
S
W
IV = SEIII = SWI + III closedII + IV open
Generalization:• k-dimensional keys => out-degree of each node: 2k
• k=3: octtree, k=4: hextree
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Point Quadtree Node format
x yinfo
NE NW SW SE
Spatial divisionF
SB
DATR
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
NE NW SW SE KL
Properties• Tree structure dependent on
insertion sequence (unbalanced)• Expensive deletions (re-insertion of
the subtree)• No mapping onto pages KL
PS
KA
MALU
HD
© 2011 AG DBIS
Applications ofGeo-DBS
9-14
o app g o to pages
F SB PS KA
DA TR MA LU
HD
Query evaluation?
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Region Quadtree
=1
2 3
4
Idea: recursive decomposition of surface areas into grids (having certain properties per grid)e.g., color, height, …, pixel value
Goal:space-saving representationand simple manipulation set-theoretic operations!
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
=
. . .1 2 4 8
12
3
67
8
4
set-theoretic operations!
Same principle for 3-dim. data:decompose space into cubes
© 2011 AG DBIS
Applications ofGeo-DBS
9-15
Some common uses of quadtrees• Image representation • Spatial indexing• Efficient collision detection in two dimensions • Storing sparse data, such as a formatting information for a spreadsheet or for some
matrix calculations • Solution of multidimensional fields (computational fluid dynamics, electromagnetism)
Quadtrees are the two-dimensional analog of octtrees.
7
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Multi-Key Hashing (Partitioned Hashing) Decomposition principle of D
Partitioning by hash functions in each dimension: is realized by division of bucket address in k fragments (k = number of keys)
For each key i (i=1, 2, …, k)l h h f h d f h b k dd
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
a special hash function hi determines part of the bucket address • number of buckets be 2B (address = sequence of B bits)
• each hash function hi delivers bi bits
•
Idea: separate contributions of k keys to address
k
1i ib B
Key A1 A2 A3Value a a a
A1 | A2 | A3: h(a11 | a22 | a33) only supports point queries!
© 2011 AG DBIS
Applications ofGeo-DBS
9-16
• record t = (a11, a22, ..., aki, ...) is stored in bucket with
addr. = h1(a11) | h2(a22) | ... | hk(aki)
Value a11 a22 a33
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Multi-Key Hashing: Example
y
x
x x011
010
001b2=3
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree Application example
bucket with address ‘10010’ contains all records with h1(a1) = ‘10’ and h2(a2) = ‘010’
xx x000
b1=2
00 01 10 11
© 2011 AG DBIS
Applications ofGeo-DBS
9-17
Eno: INT (5), b1 = 4; SSno: INT (9), b2 = 3; DName: CHAR (10), b3 = 2; B=9 (512 buckets)
h1(Eno) = Eno mod 16 h1(58651) = 11 -> 1011h2(SSno) = SSno mod 8 h2(130326734) = 6 -> 110h3(DName) = L(DName) mod 4 h3(XYZ55) = 1 -> 01
B-addr. = 101111001
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Multi-Key Hashing (2)
Number of accessesExact-match queries: access to 1 BucketPartial-match queries:Eno = 58651 NB = 29/24 = 32Eno = 73443 AND SSno = 2332 NB = 29/24+3 = 4
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
Eno 73443 AND SSno 2332 NB 2 /2 4
Pros• No index, little space requirements and update overhead• Exact-match queries: entire address is known access to 1 bucket• Partial-match query : pruning of search space
(Ai = ai): number of buckets to be searched is reduced by
ib2
ii bBbBB 22/2N
© 2011 AG DBIS
Applications ofGeo-DBS
9-18
Cons / problems• Topological structure of data is not preserved• No support of range- and best-match queries• Optimal allocation of bi to Ai dependent on query frequencies
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Evaluation of Approaches
Comparison Quadtree Multi-KeyHashing k-d Tree Grid File R-Tree
Dynamicreorganization
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
reorganization
Preservation of topological structure (cluster)
Determination of neighborhood: range queries, best match
© 2011 AG DBIS
Applications ofGeo-DBS
9-19
Separation of accessstructure and data
Exact query(balanced structure)
Object representation
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Multi-Dimensional Binary Search Trees Extension of the binary tree
• Consideration of k keys: k-d tree• All data records are organized using the
tree structure: node-oriented (homogeneous)• Maintenance operations as in the case of the
binary search tree but
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
binary search tree, but
• Key comparison applies for one of the k keys at each level
Discriminator determines key at each level • Cyclic variation of discriminator d: for all nodes of tree level i holds: d = (i mod k) + 1• Left (right) successor to a node contains all records having smaller (larger) values for the
discriminator attribute
(P)K (R)K :HISON(P) R(P)K(Q)K :LOSON(P)Q
dd
dd
© 2011 AG DBIS
Applications ofGeo-DBS
9-20
Tree structure is not balanced – various proposals for extensions• To a record, m-1 succeeding levels of the tree are completely mapped into a page
- page must be able to store up to 2m - 1 records
- each record at level 1, m+1, 2m+1, ... forms root of a subtree stored in a page
Further variants of the k-d tree• Leaf-oriented (heterogeneous, storage of records in buckets)
dd
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Example: 3-d Tree (Node-Oriented)
Age Salary Location
352829254029
17K40K15K45K12K16K
KLFDAKLSBB
discr.
35 17K KL 1
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
29304225
16K17K100K14K
BFFB
P i f t ?
Age Salary Loc.
28 40K F 2
29 15K DA 3 25 45K KL 3
40 12K SB 2
42 100K F 3
© 2011 AG DBIS
Applications ofGeo-DBS
9-21
- Processing of query types?
- Tree structure dependent on insertion sequence (unbalanced tree)
- Pruning of search space for partial-match- and range queries
- Deletion is very difficult
29 16K B 1 30 17K F 1
25 14K B 2
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Organization of Embedding Data Space –Divide and Conquer
Decomposition principle of D• D is dynamically partitioned in cells, objects of a cell are stored as records in buckets • In case of bucket overflow: local cell refinement
Divide and Conquer• Piecewise preservation of topology (clustering)• Tree as access structure for the buckets only works as directory
E l H t k d t k 2
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
Example: Heterogeneous k-d tree – k=2
XXX X
60
20
50 70
X
XX
A2=Y
A1=X
100
0 100
D
b = 3X
x = 50 1
X
y = 20 2y = 60 2
x = 70 1
buckets
© 2011 AG DBIS
Applications ofGeo-DBS
9-22
50 70 A1 X0 100
Properties of k-d tree• Clustering by buckets is prerequisite for practical use• Built-in balancing mechanism does not exist• How are modification operations (deletion!) done?• How are the various query types supported?
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Organization of Embedding Data Space –Dimension Refinement
Principle• Data space D is dynamically partitioned by an orthogonal grid such that
k-dimensional cells (grid blocks) emerge• The objects contained in the cells are stored in buckets • Therefore, a cell is uniquely allocated to a bucket
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
• Class-building property: principle of dimension refinement which refines a segment in the selected dimension by a complete cut through D
ExampleA3
w2
w1
u1
data space D = A1 x A2 x A3cell partition P = U x V x Wsegments of partition U = (u1, u2, … ul)
V = (v1, v2, …, vm)W = (w1, w2, …, wn)
© 2011 AG DBIS
Applications ofGeo-DBS
9-23
A2u1
u2
u3
A1
v1 v2 v3
v2 v3 v4
Three-dimensional data space D with cell partition P; visualization of a split event in interval v2
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Dimension Refinement
A2
w2
w1
u1
u2
A3
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
Problems of dimension refinement• How many new cells emerge each time?
u2
u3
A1
v1 v2 v3
v2 v3 v4
© 2011 AG DBIS
Applications ofGeo-DBS
9-24
• What is the consequence for bucket allocation?
• Which mapping methods can be chosen?
• Are there restrictions for the determination of dimension refinement?
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Grid-File: Idea
DB k t
Goals• Preservation of topology• Efficient support of all query types• Reasonable storage occupancy
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
X
XX X
X
X
GD
Buckets b=3
© 2011 AG DBIS
Applications ofGeo-DBS
9-25
X
X
X
X
XX
X
X
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Grid-File: Idea (2)D
GD
BucketsX
X
X
X
X
X X
X
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
X
X X X
X X
X
XXX
X
X
X
© 2011 AG DBIS
Applications ofGeo-DBS
9-26
Properties• 1:1 relationship between cell Ci and element of GD• Element of GD = Ptr. to bucket B• n:1 relationship between Ci and B
X
XX
XX X
X
X
Ci
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Grid-File2: Idea (3)
Decomposition principle of D: dimension refinement
X
Dn
GD
Buckets
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
X
X
X X
X
XC
0
0 mS1
S2
X X
XX
XX X
B
© 2011 AG DBIS
Applications ofGeo-DBS
9-272. Nievergelt, J. et al.: The Grid File: An Adaptable, Symmetric Multikey File Structure, ACM Trans. Database System, 1984, pp. 38-71
Components• k scales (scaling vectors) define cells (grid) in the k-dimensional data space D• Cell- or grid directory GD: dynamic k-dim. matrix for the mapping of D onto the
set of buckets• Bucket (B): storage of objects of one cell (C) or several cells
(bucket can cover a range of cells: bucket region))
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Grid File - Example Stepwise development of a GF9 D
X X
X
X
A B
GD
1
situation aX
XX
X
A
B
Buckets
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
0a t z
A B1
1 2S2
S1
b=3
9
0a t
D
X X
X
X
z
A C
GD
1
1 2S2
S1
X XX
B
3g
situation bX
XX
X
A
B
Buckets
X
X
X
C
© 2011 AG DBIS
Applications ofGeo-DBS
9-28
S1
Buckets9
0a t
D
X X
X
X
zA C
GD
1
1 2S2
S1
X XB
3
A C2 D
X
X
7
g
situation c
X
XX
X
A
B
X
X
X
C
DX
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Central Data Structure: Grid Directory Requirements
• Principle of two disk accessesindependent of value distributions, operation frequencies and number of stored records
• Split- and merge operations only on two buckets at a time• Storage occupancy
A f b k t t bit il ll
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
- Average occupancy of buckets not arbitrarily small - Skewed distributions only enlarge G
b=2D
© 2011 AG DBIS
Applications ofGeo-DBS
9-29
Design of a directory structure• Dynamic k-dim. matrix GD (on disk)• k one-dim. vectors Si (in memory)
xx
xx xx
xx
xxxxxxxx
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Central Data Structure: Grid Directory (2)
Operations on GD• Direct access to a GD entry• Relative access (NEXTABOVE, NEXTBELOW)• Merge of 2 neighbored entries of a dimension (renaming of affected entries)• Splitting of an entry of a dimension (with renaming)
B h d i t f ll t b k t
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree ce
llsbu
cket
po
ol
Box-shaped assignment of cells to buckets
© 2011 AG DBIS
Applications ofGeo-DBS
9-30
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Grid File – Search Queries
Exact match querySELECT *
FROM EMPWHERE Loc=‘KL‘AND Dno = ‘K55‘
[ . . .]B
S1: AA, DA, FR, MK, ZZ S2: K00, K17, K39, K52, K89, K99
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
[ KL, K55]
[ . . .]
GD pageGD
Range query• Determination of scale values
in each dimension
© 2011 AG DBIS
Applications ofGeo-DBS
9-31
• Computation of qualified GD entries• Access to the GD page(s) and
fetch of the referenced buckets
range query
D
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Bucket Regions (400 Insertions, b = 20)
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
© 2011 AG DBIS
Applications ofGeo-DBS
9-32Uniform distribution Non-uniform distribution
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Grid File – Performance Considerations
Growth of GD is super-linear• In case of uniform object distributions:
O(N1+(k-1)/(k*b))
• In case of skewed object distributions up to:
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
Refinement in dimension k• Number of added GD entries: n = (m1*m2*...*mk-1) • Required disk accesses (E = #GD entries):
Storage occupancy for buckets:
Problem cases for the mapping:
© 2011 AG DBIS
Applications ofGeo-DBS
9-33
Problem cases for the mapping:• Low selectivity and unfavorable distribution
S1 : a, d, d, d, f, s, ...S2 : 0, 5, 5, 5, 7, 9, ...
• How many GD entries can be delivered by an exact query ?
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Access Paths for Expanded Spatial Objects Expanded objects own
• General properties such as name, composition, . . .• Location and geometry (curve, polygon, . . .)
Indexing of spatial objects• Exact representation?• Object approximation by box-shaped envelopment – effective!
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
Object approximation by box shaped envelopment effective! however, false drops (mismatches) are possible
Problems• Besides object density, the object expansion has
to be regarded when mapping and refining the object• Objects can contain others or can mutually overlap
Classification of solution approaches• Mutually overlapping regions (R-tree)• Clipping (R+-tree)• Transformation approach
© 2011 AG DBIS
Applications ofGeo-DBS
9-34
Transformation approach
functionally maps expanded objects onto higher-dimensional points
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
R-Tree3
Goal: Efficient management of spatial objects (points, polygons, cuboids, ...)
Applications• Cartography: storage of geographic maps, efficient evaluation of
“geometrical” queriesCAD i l ti f f l d lid ( t l i f
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
• CAD: manipulation of faces, volumes and solids (e.g. rectangles in case of VLSI design )
• Computer Vision and Robotics
Key operations• Point queries: Find all objects which contain a given point • Region queries: Find all objects which overlap with a given search window
(... enclose, ... are completely contained in)
© 2011 AG DBIS
Applications ofGeo-DBS
9-35
Approach: Storage and search of axis-parallel rectangles• Objects are represented by data rectangles and must be described by
Cartesian coordinates • Representation in the R-tree happens by minimally enclosing
(k-dimensional) rectangles/regions• Search queries also refer to rectangles/regions
3. Guttman, A.: R-Trees: A Dynamic Index Structure for Spatial Searching, in: Proc. ACM SIGMOD Conf., 1984, pp. 47-57
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
R-Tree (2)
R-tree is height-balanced multi-way tree• Each node corresponds to a page• Maximally M, minimally m (>= M/2) entries per node
leaf node entryleaf node entry
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
l1 l2 . . . lk TID
l1 l2 . . . lk PID
smallest circumlocutory rectangle (data rectangle) for TID
intervals describe smallest circumlocutory data region for all objects contained in PID
ea ode e t y
intermediate node entry
ea ode e t y
© 2011 AG DBIS
Applications ofGeo-DBS
9-36
lj = closed interval regarding dimension j
TID: reference to object PID: reference to son
Properties• Strong overlap of circumlocutory rectangles/regions possible on all tree levels• Search for rectangles/regions has to traverse several subtrees, if necessary• Maintenance operations similar to such in B-trees
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Mapping of the R-Tree
Division of data space D D
E
F G
H
K
J
I
B
A
d
datarectangle
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
N M L
C
data region
R-tree A BA C
H I JD E F L M NG K
Example for a
© 2011 AG DBIS
Applications ofGeo-DBS
9-37
Example for a “bad” search window
DE
FG
N M L
H
K
J
I
B
C
A
W
search window W
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
R-Tree: Example Face objects to be stored 8
7
6
5
4
F11 F10
F8
F7
F9 F6
F15
F12
F13
F16
F17
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
3
2
1
0 1 2 3 4 5 6 7 8 9 10
F5
F4
F3F14
F1F2
0-80-4
6-100-8
0-82-8
0-30-4
2-80-4
7-100-5
6-103-8
2-62-8
2-86-8
0-33-8
Related R-tree
© 2011 AG DBIS
Applications ofGeo-DBS
9-38
0-30-2F1
0-31-4F14
2-60-3F2
4-80-4F3
8-101-5F5
7-100-2F4
7-92-5F17
6-73-8F8
7-105-8F6
4-64-8F9
2-62-6F15
3-54-6F16
0-34-7F12
0-36-8F11
0-33-6F13
6-86-8F7
2-66-8F-10
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Search Optimization by the R+-Tree
Coverage and overlap for a level of the R-tree• Coverage is the entire area to cover all related rectangles • Overlap is the entire area which is contained in two or more nodes • Minimal coverage reduces the amount of “dead space” (empty area) which
is covered by the nodes of the R-tree.• Minimal overlap reduces the set of search paths to the leaves
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
Minimal overlap reduces the set of search paths to the leaves (even more critical for the access time than minimal coverage)Efficient search requires minimal coverage and overlap
IdeaPartitions are allowed which “cut up” data rectangles (Clipping)
avoidance of overlaps in intermediate nodes
ConsequenceData rectangle is decomposed in a collection of disjoint sub-rectangles, if necessary and is stored in various nodes at the leaf level
© 2011 AG DBIS
Applications ofGeo-DBS
9-39
necessary, and is stored in various nodes at the leaf level
Partitioning option of a long line object in the R+-tree
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
R+-Tree4
Division of data space D
DE
F G
H
K
J
I
B
A P higher flexibility when data rectangles can be partitioned
4. Sellis, T. et al.: The R*-Tree; A Dynamic Index for Multi-Dimensional Objects,in: Proc. 14th Int. Conf. on Very Large Data Bases, Brighton, 1987, pp. 507-518
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
N M L
C
R+-treeA BA C
I J KD E F G HG
P
L M N
ecta g es ca be pa t t o ed
© 2011 AG DBIS
Applications ofGeo-DBS
9-40
Properties• Overlap of data regions is avoided • Coverage problem is essentially eased• Clustering of objects combined in a region may be prevented by clipping • More complex algorithms for certain queries (containment)• More difficult maintenance, no performance advantage as compared to R-tree
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Applications of Geo-DBS What are Geo-objects?
Representation of a certain area of earth surface
• Geometric component (location, expansion)• Thematic component
(qualitative or quantitative description)
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
(q q p )
Modeling• Topological base elements: point, line, face• Line-shaped formations: polygon draws• Plane objects: grid model, vector model• Vector model is favored for Geo-DBS Good scalability, lower space requirement• Object class for surface modeling: simple polygons with holes
Queries
© 2011 AG DBIS
Applications ofGeo-DBS
9-41Examples of location-related and range-related selection
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Applications of Geo-DBS Use of object approximations
• Two-level search and test: in index and on object representation• Different conservative approximations
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
• Empirical result: minimally enclosing pentagon gives best trade-off between approximation quality and space consumption
Thematic component• Simple data types contained in an attribute vector (tuple)• Example: surface usage, amount of rainfall
rectangle convex envelope pentagon ellipse
© 2011 AG DBIS
Applications ofGeo-DBS
9-42
Characteristics of objects• Number of objects: up to 109 data records• Data volume: up to 1 TeraByte• Variability of objects and object sets
- Object expansion: 1 : 106
- Object form- Storage space requirement of polygons: 0.5 KB – 1 GB- Distribution of objects in the plane (object density): 1 : 104
There are no upper boundaries for these properties
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Queries to Spatial Objects –Extensibility Aspects of the DB Language
No space relationship and no spatial operators in the relational model• High complexity even for simple examples• Example: representation of arbitrarily mounted rectangles in the plane
RECTANGLE (RE-No, X1, Y1, X2, Y2, X3, Y3, X4, Y4)
Q Fi d ll t l th t t l l t l ith i t (PA PB PC PD)
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
Query: Find all rectangles that truly enclose rectangle with points (PA, PB, PC, PD)
SELECT RE-No FROM RECTANGLEWHERE XA > X1 AND YA < G(P1, P2, XA)
AND YA > G(P4, P1, XA) ANDXB < Y2 AND YB < G(P1, P2, XB)
AND XB < G(P2, P3, XB) ANDXC < X3 AND YC < G(P2, P3, XC)
AND YC > G(P3, P4, XC) ANDYD > Y4 AND YD > G(P3, P4, XD)
AND YD > G(P1, P4, XD);
© 2011 AG DBIS
Applications ofGeo-DBS
9-43
with abbreviations: e.g. G(P1, P2, XA) for((Y1 - Y2) * XA) / (X1 - X2) + (Y2 * X1 - Y1 * X2) / (X1 - X2)
Object-relational data models – key property• Extensibility with user-defined functions and operators• Queries using operators such as overlap, intersect, distance, containment («) ...
SELECT RE-No FROM RECTANGLE XWHERE [PA, PB, PC, PD] « X;
Realizationof DBS
Quad trees
Basic problems
Classification –access pathsand queries
Summary
Important properties of multi-dimensional access paths• Preservation of topological structure of data space • Adaptation to the object density• Dynamic reorganization• Balanced access structure• Support of different query types (intersection queries best match queries)
Grid file
Multi-keyhashing
k-d trees
R- and R+-tree
• Support of different query types (intersection queries, best match queries)
Representation of spatial objects• Abstraction to point-shaped representation is typical • “Expanded” representation only in coarse approximation:
Processing problems
Pros of newer concepts• Organization of embedding data space – Grid file• Representation of expanded objects – R- and R+-tree
However: There exist very many proposals and concepts which are
© 2011 AG DBIS
Applications ofGeo-DBS
9-44
However: There exist very many proposals and concepts which are extremely specialized and not practically proven
Necessity of DBS integration of “conventional”(one-dimensional) and multi-dimensional access paths • Access options via spatial and time-related dimensions • Support of “Data Warehouse” appls, geographical information systems, . . .
Extension of object-relational DBS by multi-dimensional access paths (primarily Grid file and R-tree)