Answering Top-k Queries Using Views
Gautam Das (Univ. of Texas),
Dimitrios Gunopulos (Univ. of California Riverside), Nick Koudas (Univ. of Toronto),
Dimitris Tsirogiannis (Univ. of Toronto)
VLDB '06
Introduction
Preferences are expressed as scoring functions on the attributes of a relation, e.g.

f_Q = 3X_1 + 2X_2 + 5X_3

Relation R:
tid  X1  X2  X3
1    82   1  59
2    53  19  83
3    29  99  15
4    80  45   8
5    28  32  39

Scores under f_Q:
tid  Score
2    612
1    543
4    370
3    360
5    343

Top-k: the k tuples with the highest score
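The scores on this slide can be reproduced with a few lines of Python. This is only an illustrative sketch: the names `score` and `top_k` are made up for the example and are not part of the talk.

```python
import heapq

# Relation R from the slide: tid -> (X1, X2, X3)
R = {
    1: (82, 1, 59),
    2: (53, 19, 83),
    3: (29, 99, 15),
    4: (80, 45, 8),
    5: (28, 32, 39),
}

def score(weights, attributes):
    """Linear additive scoring function: sum of weight * attribute."""
    return sum(w * x for w, x in zip(weights, attributes))

def top_k(relation, weights, k):
    """Return the k (tid, score) pairs with the highest score."""
    scored = ((score(weights, attrs), tid) for tid, attrs in relation.items())
    return [(tid, s) for s, tid in heapq.nlargest(k, scored)]

F_Q = (3, 2, 5)  # f_Q = 3*X1 + 2*X2 + 5*X3
ranking = top_k(R, F_Q, 5)
print(ranking)  # [(2, 612), (1, 543), (4, 370), (3, 360), (5, 343)]
```

This brute-force version scores every tuple of R, which is exactly the work a top-k algorithm tries to avoid.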
Related Work
TA [Fagin et al. '96]
  Deterministic stopping condition
  Always returns the correct top-k set
PREFER [Hristidis et al. '01]
  Stores multiple copies of the base relation R
  Utilizes only one of them to answer a query
We complement existing approaches
Motivation
Query answering using views
  Space-performance tradeoff
  Improved efficiency
Can we exploit the same tradeoffs for top-k query answering?
Problem Statement
Query Q: f_Q = 3X_1 + 2X_2 + 5X_3

Base relation R:
tid  X1  X2  X3
1    82   1  59
2    53  19  83
3    29  99  15
4    80  45   8
5    28  32  39

View V1, f_{V1} = 2X_1 + 5X_2:
tid  Score
3    553
4    385
5    216
2    201
1    169

View V2, f_{V2} = X_2 + 4X_3:
tid  Score
2    351
1    237
5    188
3    159
4     77

Ranking views: materialized results of previously asked top-k queries.
Problem: Can we answer new ad-hoc top-k queries efficiently using ranking views?
Outline
  LPTA Algorithm
  View Selection Problem
    Cost Estimation Framework
    View Selection Algorithms
  Experimental Evaluation
  Conclusions
LPTA - Setting
Linear additive scoring functions, e.g. f_Q = 3X_1 + 2X_2 + 5X_3
Set of views:
  Materialized result of a previously executed top-k query
  Arbitrary subset of attributes
  Sorted access on pairs (tid, score_Q(tid))
Random access on the base table R
LPTA - Example
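[Figure: views V1 and V2 drawn as lists of (tid, score) pairs sorted by score, (tid_1^1, s_1^1) ... (tid_5^1, s_5^1) for V1 and (tid_1^2, s_1^2) ... (tid_5^2, s_5^2) for V2. After the first sorted access, tid_1^1 and tid_1^2 have been read. The data space R(X1, X2) is the unit square with corners O = (0,0), P = (1,0), R = (1,1), T = (0,1); the plot marks V1, V2, Q, the stopping condition and the current top-1 tuple.]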
LPTA
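Linear Programming adaptation of TA

Query Q: f_Q = 3X_1 + 10X_2
Views: f_{V1} = 2X_1 + 5X_2 and f_{V2} = X_1 + 2X_2, each stored as (tid, Score) pairs

At iteration d, sorted access returns the pair (tid_d^1, s_d^1) from V1 and the pair (tid_d^2, s_d^2) from V2. The highest score an unseen tuple of R(X_1, X_2) can still reach is given by the linear program

unseenmax = max(f_Q)
subject to
  0 ≤ X_1, X_2 ≤ 1
  2X_1 + 5X_2 ≤ s_d^1
  X_1 + 2X_2 ≤ s_d^2

Stopping condition: unseenmax ≤ topkmin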
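The loop described on this slide can be sketched in Python for the two-dimensional case. This is a rough illustration under stated assumptions, not the authors' implementation: the relation `R` below is hypothetical data in the unit square, the names `lp_max_2d` and `lpta` are made up, view weights are assumed non-negative, and the linear program is solved by enumerating vertices of the feasible polygon (adequate only for two variables) instead of calling an LP solver.

```python
from itertools import combinations

EPS = 1e-9

def lp_max_2d(objective, constraints):
    """Maximise objective . (x1, x2) subject to a*x1 + b*x2 <= c for each (a, b, c).

    Assumes a bounded, non-empty feasible region, so an optimum lies at a
    vertex, i.e. at the intersection of two constraint boundaries.
    """
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < EPS:
            continue  # parallel boundaries do not meet in a single point
        x1 = (c1 * b2 - c2 * b1) / det
        x2 = (a1 * c2 - a2 * c1) / det
        if all(a * x1 + b * x2 <= c + EPS for a, b, c in constraints):
            value = objective[0] * x1 + objective[1] * x2
            if best is None or value > best:
                best = value
    return best

def lpta(relation, query, views, k):
    """Return (top-k list of (tid, score), number of lock-step iterations)."""
    # Each view is a list of (tid, score) pairs sorted by descending score.
    lists = [
        sorted(((tid, a * x1 + b * x2) for tid, (x1, x2) in relation.items()),
               key=lambda pair: -pair[1])
        for a, b in views
    ]
    box = [(1, 0, 1), (0, 1, 1), (-1, 0, 0), (0, -1, 0)]  # 0 <= X1, X2 <= 1
    seen = {}
    topk = []
    for d in range(len(relation)):
        last_scores = []
        for view_list in lists:
            tid, s = view_list[d]            # sorted access on the view
            last_scores.append(s)
            if tid not in seen:
                x1, x2 = relation[tid]       # random access on R
                seen[tid] = query[0] * x1 + query[1] * x2
        topk = sorted(seen.items(), key=lambda pair: -pair[1])[:k]
        cons = box + [(a, b, s) for (a, b), s in zip(views, last_scores)]
        unseen_max = lp_max_2d(query, cons)
        if len(topk) == k and unseen_max <= topk[-1][1] + EPS:
            return topk, d + 1               # unseenmax <= topkmin
    return topk, len(relation)

# Hypothetical data in the unit square; query and views as on the slide.
R = {1: (0.9, 0.1), 2: (0.2, 0.8), 3: (0.5, 0.5), 4: (0.1, 0.3), 5: (0.7, 0.9)}
Q = (3, 10)          # f_Q  = 3*X1 + 10*X2
VIEWS = [(2, 5), (1, 2)]  # f_V1 = 2*X1 + 5*X2, f_V2 = X1 + 2*X2

top1, depth = lpta(R, Q, VIEWS, k=1)
print(top1, depth)
```

On this toy input the loop stops after two lock-step iterations with tid 5 as the top-1 tuple: after the first iteration the bound on unseen tuples still exceeds the best seen score, and after the second it drops below it.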
LPTA - Example (cont'd)
[Figure: the same views V1 and V2 as sorted lists of (tid, score) pairs. After the second sorted access, tid_1^1, tid_1^2, tid_2^1 and tid_2^2 have been read. The unit square R(X1, X2) with corners O = (0,0), P = (1,0), R = (1,1), T = (0,1) again marks V1, V2, Q, the stopping condition and the current top-1 tuple.]
View Selection Problem
Given a collection of views V = {V_1, ..., V_r} and a query Q, determine the most efficient subset U ⊆ V to execute Q on.
Conceptual discussion
  Two dimensions
  Higher dimensions
View Selection - 2d
[Figure: the unit square with corners O = (0,0), P = (1,0), R = (1,1), T = (0,1), showing the query Q, the views V1 and V2, the min top-k tuple, and the labels X, Y, A, B, A1, B1 and M.]
View Selection - Higher d
Theorem: If V = {V_1, ..., V_r} is a set of views for an m-dimensional dataset and Q a query, the optimal execution of LPTA requires a subset of views U ⊆ V such that |U| ≤ m.
Question: How do we select the optimal subset of views?
Cost Estimation Framework
What is the cost of running LPTA when a specific set of views is used to answer a query?
Cost = number of sequential accesses
Can we find that cost without actually running LPTA?
[Figure: query Q with views V1 and V2, points A and B, and the min top-k tuple marked; in this example, cost = 6 sequential accesses.]
Simulation of LPTA on Histograms
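HQ: approximates the score distribution of the query Q
Histograms: b buckets, n/b tuples per bucket
1. Use HQ to estimate the score of the k-th highest tuple (topkmin).
2. Simulate LPTA on the view histograms HV1 and HV2, bucket by bucket in lock step, to estimate the cost.
[Figure: histograms HQ, HV1 and HV2, with topkmin marked on HQ and the estimated cost as the result.]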
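Step 1 above can be illustrated with a small Python sketch. It is one plausible reading of the slide, not the estimator from the talk: it assumes an equi-depth histogram given by its bucket boundaries in descending score order, assumes scores are spread uniformly inside a bucket, and interpolates linearly. The function name `estimate_topk_min` and the histogram values are hypothetical. Step 2 (the lock-step simulation) is not covered.

```python
def estimate_topk_min(boundaries, n, k):
    """Estimate the k-th highest score from an equi-depth histogram.

    boundaries: b + 1 score boundaries in descending order, so bucket i
                covers the scores between boundaries[i] and boundaries[i + 1]
    n:          total number of tuples (each bucket holds n / b of them)
    k:          rank of the tuple whose score is estimated (1 = highest)
    """
    b = len(boundaries) - 1
    if b < 1 or not 1 <= k <= n:
        raise ValueError("need at least one bucket and 1 <= k <= n")
    per_bucket = n / b
    # Index of the bucket that contains the k-th highest tuple.
    idx = min(int((k - 1) // per_bucket), b - 1)
    hi, lo = boundaries[idx], boundaries[idx + 1]
    # Fraction of that bucket consumed when reaching rank k
    # (assumption: uniform spread of scores inside the bucket).
    consumed = (k - idx * per_bucket) / per_bucket
    return hi - consumed * (hi - lo)

# Hypothetical histogram H_Q: 3 buckets over 300 tuples.
H_Q = [100.0, 80.0, 50.0, 0.0]
print(estimate_topk_min(H_Q, n=300, k=150))
```

With these made-up numbers the 150th tuple falls halfway into the second bucket, so the estimate lands midway between 80 and 50.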
View Selection Algorithms
Exhaustive (E): Check all possible subsets of size p ≤ m, i.e. \binom{r}{p} subsets for each size p.
Greedy (SV): Keep expanding the set of views to use until the estimated cost stops decreasing.
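The two strategies can be contrasted with a short Python sketch. It assumes a cost estimator is available as a plain function; here `toy_cost` is a hypothetical lookup table standing in for the cost estimation framework, and capping the greedy search at m views is an assumption taken from the |U| ≤ m bound, not something the slide states.

```python
from itertools import combinations

def exhaustive_select(views, estimate_cost, m):
    """Exhaustive (E): try every subset of size 1..m and keep the cheapest."""
    candidates = (list(subset)
                  for p in range(1, m + 1)
                  for subset in combinations(views, p))
    best = min(candidates, key=estimate_cost)
    return best, estimate_cost(best)

def greedy_select(views, estimate_cost, m):
    """Greedy (SV): add the most helpful view until the cost stops dropping."""
    selected, best_cost = [], float("inf")
    remaining = list(views)
    while remaining and len(selected) < m:
        cost, view = min(((estimate_cost(selected + [v]), v) for v in remaining),
                         key=lambda pair: pair[0])
        if cost >= best_cost:
            break  # no remaining view lowers the estimated cost
        selected.append(view)
        remaining.remove(view)
        best_cost = cost
    return selected, best_cost

# Hypothetical estimated costs (sequential accesses) per subset of views.
TOY_COST = {
    frozenset(["V1"]): 10, frozenset(["V2"]): 8, frozenset(["V3"]): 12,
    frozenset(["V1", "V2"]): 5, frozenset(["V1", "V3"]): 4,
    frozenset(["V2", "V3"]): 7, frozenset(["V1", "V2", "V3"]): 6,
}

def toy_cost(subset):
    return TOY_COST[frozenset(subset)]

print(exhaustive_select(["V1", "V2", "V3"], toy_cost, m=2))  # (['V1', 'V3'], 4)
print(greedy_select(["V1", "V2", "V3"], toy_cost, m=2))      # (['V2', 'V1'], 5)
```

The toy costs are chosen so that the greedy choice (start with the cheapest single view, V2) misses the best pair, which illustrates why the exhaustive search remains the reference point despite its \binom{r}{p} growth.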
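Select Views Spherical (SVS)
Requires the solution of a single linear program.
[Figure: the quadrant with corners (0,0), (1,0) and (0,1), the query Q, the selected views and a point T. Each view contributes a constraint f_{Vj} ≤ s with the same bound s, and the objective is max(f_Q).]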
Select Views By Angle (SVA)
Sort the views by increasing angle with respect to Q.
[Figure: the quadrant with corners (0,0), (1,0) and (0,1), the query Q and the views V1, V2, V3, V4 at angles φ_1 < φ_2 < φ_3 < φ_4 from Q; the selected views are highlighted.]
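The ordering step of SVA amounts to comparing angles between weight vectors. The sketch below is a minimal illustration, assuming each view and the query are represented by their coefficient vectors; the views V3 and V4 and the helper names are hypothetical. How many of the sorted views are then used is not shown.

```python
import math

def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def sort_views_by_angle(query, views):
    """Return view names ordered by increasing angle to the query vector."""
    return sorted(views, key=lambda name: angle(query, views[name]))

Q = (3, 10)  # f_Q = 3*X1 + 10*X2
VIEWS = {
    "V1": (2, 5),  # f_V1 = 2*X1 + 5*X2
    "V2": (1, 2),  # f_V2 = X1 + 2*X2
    "V3": (1, 0),  # hypothetical view on X1 only
    "V4": (0, 1),  # hypothetical view on X2 only
}
print(sort_views_by_angle(Q, VIEWS))  # ['V1', 'V2', 'V4', 'V3']
```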
General Queries and Views
Views that materialize their top-k tuples
  Truncate the view histograms.
Accommodating range conditions
  Select the views that cover the range conditions.
  Truncate each attribute's histogram.
Experiments
Datasets (Uniform, Zipf, Real)
Experiments:
  Performance comparison of LPTA, PREFER and TA
  Accuracy of the cost estimation framework
  Performance of LPTA using each of the view selection algorithms
  Scalability of the LPTA algorithm
Performance comparison of LPTA, PREFER and TA
[Plots: uniform dataset, 3d; real dataset, 2d]
Cost Estimation Accuracy
[Plots, 2d: buckets = 0.5% of n; buckets = 1% of n]
Performance of LPTA using View Selection Algorithms
[Plots: 2d and 3d; 500K tuples, top-100]
Scalability Experiments on LPTA
[Plots: 2d, uniform dataset; 500K tuples, top-100]
Conclusions
Using views for top-k query answering
LPTA: linear programming adaptation of TA
View selection problem, cost estimation framework, view selection algorithms
Experimental evaluation
(Thank You!)
Questions?