View
58
Download
2
Category
Preview:
DESCRIPTION
Database Group. A Cost Model for Interval Intersection Queries on RI-Trees. SSDBM 2002 , Edinburgh. Hans-Peter Kriegel, Martin Pfeifle , Marco Pötke, Thomas Seidl. Institute for Computer Science University of Munich, Germany. Outline of the Talk. 1. Introduction 2. RI-Tree 3. Cost Model - PowerPoint PPT Presentation
Citation preview
Hans-Peter Kriegel, Martin Pfeifle, Marco Pötke, Thomas Seidl
A Cost Model for Interval Intersection Queries on RI-Trees
Institute for Computer ScienceUniversity of Munich, Germany
Database Group
SSDBM 2002, Edinburgh
07/25/02 Martin Pfeifle
Outline of the Talk
1. Introduction
2. RI-Tree
3. Cost Model
4. Evaluation
5. Conclusions and Future Work
07/25/02 Martin Pfeifle
Extended Objects in Databases
t
1D Objects:• Temporal data• Approximate values• Interval constraints• …
2D Objects:• Geographic data• VLSI design• Bitemporal data• …
3D Objects:• CAD documents• Digital mockup• Haptic rendering• …
t
Interval query
Box query
Window query
07/25/02 Martin Pfeifle
Integration of Access Methods
Declarative EmbeddingObject-relational DML and DDL
Extensible Indexing Framework
query processing
index_open()index_fetch()index_close()
maintenance
index_create()index_drop()index_insert()index_delete()index_update()
07/25/02 Martin Pfeifle
Integration of Access Methods
Extensible Indexing Framework
Declarative EmbeddingObject-relational DML and DDL
Physical ImplementationBlock-Manager, Caches, Locking, Logging, …
User-defined Index Structure
Extensible Indexing FrameworkObject-relational interface for indexmaintenance and querying functions.
Relational ImplementationMapping to built-in indexes (B+-trees);SQL-based query processing
07/25/02 Martin Pfeifle
Integration of Access Methods
User-defined Index Structure
Extensible Indexing FrameworkObject-relational interface for indexmaintenance and querying functions.
Relational ImplementationMapping to built-in indexes (B+-trees);SQL-based query processing
Physical ImplementationBlock-Manager, Caches, Locking, Logging, …
Declarative EmbeddingObject-relational DML and DDL
Extensible Optimization Framework
optimization
stats_collect()stats_delete()predicate_sel()index_io_cost()
07/25/02 Martin Pfeifle
Integration of Access Methods
User-defined Index Structure
Extensible Indexing FrameworkObject-relational interface for indexmaintenance and querying functions.
Relational ImplementationMapping to built-in indexes (B+-trees);SQL-based query processing
Physical ImplementationBlock-Manager, Caches, Locking, Logging, …
User-defined Cost Model
Extensible Optimization FrameworkObject-relational interface for selectivity estimation and cost prediction functions.
Relational ImplementationMapping to built-in statistics facilities;SQL-based evaluation of cost model
Declarative EmbeddingObject-relational DML and DDL
07/25/02 Martin Pfeifle
Outline of the Talk
1. Introduction2. RI-Tree
3. Cost Model
4. Evaluation
5. Conclusions and Future Work
07/25/02 Martin Pfeifle
3a 15a12c
5c 15a
Relational Interval Tree (RI-Tree)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
15
8
1 3 5 7 13119
2 6 10 14
4 12
alicebob
chrisdave
7b1b
13d 13d
Foundation: Interval Tree [Edelsbrunner 1980]
• primary structure: binary search tree on possible endpoints• secondary structure: sorted lists of stored endpoints
each interval is registered at exactly one node
[Kriegel, Pötke, Seidl VLDB 2000]
07/25/02 Martin Pfeifle
15
8
1 3 5 7 13119
2 6 10 14
4 12
RI-Tree: Virtual Primary Structure
151 3 5 7 13119
2 6 10 14
4 12
8
root = 2h–1
1 2h – 1
• no materialization of the binary tree• storage cost O(1): parameter root• fixed data space: root = 2h–1 covers [1..2h – 1]
first step: virtualize the primary structure
07/25/02 Martin Pfeifle
RI-Tree: Relational Secondary Structure
8
151 3 5 7 13119
2 6 10 14
4 123a 15a12c
5c 15a7b1b
13d 13d
second step: manage secondary structure by two B+-trees
• storage of n intervals: O(n/b) disk blocks of size b• insert and delete: O(logbn) disk block accesses in the indexes
node lower
488
13
135
13
id
bacd
node upper
488
13
7121513
id
bcad
lowerIndex(node,lower,id)
upperIndex(node,upper,id)
07/25/02 Martin Pfeifle
upper = 2522 = lower
RI-Tree: Interval Intersection Query
16 = root
24 = fork
25
2826
20
22
1
23
4
h = 5
23
07/25/02 Martin Pfeifle
upper = 2522 = lower
RI-Tree: Interval Intersection Query
16 = root
20
1
23
4
h = 5
select id from upperIndex i, leftNodes leftwhere i.node = left.node and i.upper >= .lower
07/25/02 Martin Pfeifle
upper = 2522 = lower
RI-Tree: Interval Intersection Query
24 = fork
2522
1
23
4
h = 5
select id from upperIndex i, leftNodes leftwhere i.node = left.node and i.upper >= .lowerunion allselect id from upperIndex iwhere i.node between .lower and .upper
23
07/25/02 Martin Pfeifle
upper = 2522 = lower
RI-Tree: Interval Intersection Query
2826
1
23
4
h = 5
select id from upperIndex i, leftNodes leftwhere i.node = left.node and i.upper >= .lowerunion allselect id from upperIndex iwhere i.node between .lower and .upperunion allselect id from lowerIndex i, rightNodes rightwhere i.node = right.node and i.lower <= .upper
07/25/02 Martin Pfeifle
upper = 2522 = lower
RI-Tree: Interval Intersection Query
16 = root
24 = fork
25
2826
20
22
1
23
4
h = 5
I/O complexity:
O(h·logbn + r/b)
select id from upperIndex i, leftNodes leftwhere i.node = left.node and i.upper >= .lowerunion allselect id from upperIndex iwhere i.node between .lower and .upperunion allselect id from lowerIndex i, rightNodes rightwhere i.node = right.node and i.lower <= .upper
23
07/25/02 Martin Pfeifle
Outline of the Talk
1. Introduction2. RI-Tree
3. Cost Model
4. Evaluation
5. Conclusions and Future Work
07/25/02 Martin Pfeifle
joinI/O(T,) = min ,( )( )
1 gg Gaps
Gapsleft
right
Gapsleft()
Gapsright()
root
root
I/O Cost Model for Interval Intersections
1234
h = 5T
upperIndex(node, upper, id)
lowerIndex(node, lower, id)
B
root
outputI/O(T,) = (T,)·BO( h·logbn + r/b )
07/25/02 Martin Pfeifle
Selectivity Estimation
• Histogram-based:(equi-width histogram)
( , )( , )
,Toverlap bucket
n nii
ii
i
1 1
1
01
– replication of intervals intersection multiple buckets– statistics management requires user-defined code
• Quantile-based:(equi-count histogram)
where ,10n
TrTrTrT rightinnerleft ,
),(),(),(),(
r T
overlap l l q q overlap l T l q q
q qn
left
i i upper i i
i ii
( , )min , , , , ( ), , ,
;
1 1
11
+ better adaption to the data distribution+ exploits built-in statistics of the ORDBMS
r T
overlap q qq q
ninner
i i
i ii
( , ), ,
;
1
11
r Tright ( , ) analogously to rleft
07/25/02 Martin Pfeifle
Outline of the Talk
1. Introduction2. RI-Tree
3. Cost Model
4. Evaluation
5. Conclusions and Future Work
07/25/02 Martin Pfeifle
Experimental EvaluationDatasets
0,0
0,6
1,2
1,8
0 200 401 602 802bucket (low er bound)
[x 1.000]
num
ber o
f int
erva
ls
[x 1
.000
]
database = UNIdb size = 100,000 intervalsresolution = 100 buckets
0
2
4
6
8
10
62 232 402 572 741bucket (low er bound)
[x 1.000.000]
num
ber o
f int
erva
ls [x
1.0
00]
database = REALdb size = 115,000 intervalsresolution = 100 buckets
UNI REAL
07/25/02 Martin Pfeifle
Experimental Evaluation Computation of Statistics
0
10
20
30
40
100.000 200.000 300.000 400.000database s ize [#intervals ]
real
tim
e [s
ec]
histograms
quantiles
database = UNIresolution = 100 buckets/quantiles
07/25/02 Martin Pfeifle
Experimental EvaluationSelectivity Estimation
0
2
4
6
8
0 25 50 75 100query s ize [%]
rela
tive
erro
r [%
]
histograms
quantiles
database = UNIdb size = 100,000 intervalsresolution = 100 buckets/quantiles
0
1
2
3
4
0 25 50 75 100query s ize [%]
rela
tive
erro
r [%
]
quantiles
histograms
database = REALdb size = 115,000 intervalsresolution = 100 buckets/quantiles
UNI REAL
07/25/02 Martin Pfeifle
Experimental EvaluationSelectivity Estimation
0
5
10
15
0 100 200 300 400resolution [buckets /quantiles ]
rela
tive
erro
r [%
]
quantiles
histograms
database = REALquery size = 10%
07/25/02 Martin Pfeifle
Experimental EvaluationCost Estimation
0
5
10
15
0 25 50 75 100query s ize [%]
rela
tive
erro
r [%
]
database = UNIdb size = 100,000 intervalsresolution = 100 quantiles
0
5
10
15
20
25
0 25 50 75 100query s ize [%]
rela
tive
erro
r [%
]
database = REALdb size = 115,000 intervalsresolution = 100 quantiles
UNI REAL
07/25/02 Martin Pfeifle
Experimental EvaluationCost Estimation
0
1
2
3
4
5
0 10 20 30 40 50 60 70 80 90 100
actual se lectivity [%]
phys
ical
read
s
[x 1
00 b
lock
s]
est. output I/O est. join I/O est. total I/O real join I/O real total I/O
database = 1D-UNIdb size = 100,000 intervalsresolution = 100 quantiles
0
1
2
3
4
5
0 10 20 30 40 50 60 70 80 90 100
actual se lectivity [%]
phys
ical
read
s
[x 1
00 b
lock
s]
est. output I/O est. join I/O est. total I/O real join I/O real total I/O
database = REALdb size = 115,000 intervalsresolution = 100 quantiles
UNI REAL
07/25/02 Martin Pfeifle
Outline of the Talk
1. Introduction2. RI-Tree
3. Cost Model
4. Evaluation
5. Conclusions and Future Work
07/25/02 Martin Pfeifle
Conclusions and Future Work
• Relational access methods:– employ an ORDBMS as virtual machine– extensible indexing and optimizing framework
• Indexing extended objects:– Relational Interval Tree
• Development of cost models:– estimation of selectivity and I/O cost
Conclusions:
Future Work: • Cost models:– general interval relationships– interval sequences
07/25/02 Martin Pfeifle
Any questions?
? ?
?
??
?
?
Recommended