ICICLES: SELF-TUNING SAMPLES FOR APPROXIMATE QUERY ANSWERING
BY VENKATESH GANTI, MONG LI LEE, AND RAGHU RAMAKRISHNAN
CSE6339 – DATA EXPLORATION
Raghavendra Madala
ICICLES: Self-tuning Samples for Approximate Query
In this presentation…
Introduction
Icicles
Icicle Maintenance
Icicle-Based Estimators
Quality Guarantee
Performance Evaluation
Conclusion
Introduction
Analysis of data in data warehouses is useful in decision support
• OLAP: provides interactive response times to aggregate queries
• AQUA: approximate query answering systems provide very fast alternatives to OLAP systems
Approaches
• Sampling-based
• Histogram-based
• Probabilistic-based
• Wavelet-based
• Clustering-based
Join synopsis
Is a uniform random sample
• All tuples are assumed to be equally important
• OLAP queries follow a predictable, repetitive pattern
• Uniform sampling wastes precious main memory on tuples that queries rarely touch
• The join of random samples of base relations may not be a random sample of the join of the base relations. This is the basis for join synopses by Gibbons
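The last point can be checked by brute force on a toy case. The sketch below is illustrative only: the relations R and S, their sizes, and the assumption that every tuple shares one join key are hypothetical choices made so that every outcome can be enumerated.

```python
from itertools import combinations, product

# Hypothetical toy relations: every tuple carries the same join key,
# so the full join is simply the cross product (9 pairs).
R = ["r1", "r2", "r3"]
S = ["s1", "s2", "s3"]
full_join = sorted(product(R, S))

# Every result obtainable by sampling 2 tuples from each relation and joining them.
joins_of_samples = {
    frozenset(product(r_sample, s_sample))
    for r_sample in combinations(R, 2)
    for s_sample in combinations(S, 2)
}

# Every result that a uniform sample of 4 tuples drawn from the join itself can produce.
samples_of_join = {frozenset(c) for c in combinations(full_join, 4)}

print(len(joins_of_samples), "of", len(samples_of_join), "4-tuple samples are reachable")
```

Only 9 of the 126 possible 4-tuple samples of the join can arise from joining the two base samples, so the join of samples is not a uniform random sample of the join.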
Why Icicles?
• To capture the data locality of aggregate queries on foreign key joins
• An icicle is expected to contain more tuples from regions that are accessed more frequently
• The sample relation's space is better utilized when it holds more tuples from actual query result sets
• A dynamic algorithm changes the sample to suit the queries being executed in the workload
Icicles
An icicle is a uniform random sample of a multiset of tuples L (an extension of R), which is the union of a relation R and all sets of tuples that were required to answer queries in the workload
Icicle Maintenance
The intuition is to incrementally maintain a sample, called an icicle.
We maintain an icicle such that the probability of a tuple being selected is proportional to the frequency with which it is required to answer queries (exactly).
Icicle Maintenance Algorithm
Efficient incremental maintenance is possible for the following reasons:
• A uniform random sample of L (the extension of relation R) ensures that a tuple's probability of selection into the icicle is proportional to its frequency
• Incremental maintenance of the icicle requires only the segment of R that satisfies each new query
• The reservoir sampling algorithm is applied to the stream of tuples being appended to L
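A minimal sketch of this maintenance loop, assuming the textbook reservoir sampling scheme (Algorithm R) rather than the exact procedure from the paper. The names Reservoir, offer, and update_icicle are hypothetical, and tuples are modeled as plain Python values.

```python
import random

class Reservoir:
    """Fixed-size uniform sample over a growing stream (here: the multiset L)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.sample = []
        self.seen = 0  # number of tuples appended to L so far
        self._rng = random.Random(seed)

    def offer(self, tup):
        """Stream one tuple of L past the reservoir."""
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(tup)
            return
        # Keep the new tuple with probability capacity / seen,
        # evicting a uniformly chosen resident tuple.
        j = self._rng.randrange(self.seen)
        if j < self.capacity:
            self.sample[j] = tup

def update_icicle(icicle, relation, predicate):
    """Append the segment of R that satisfies a new query to L."""
    for tup in relation:
        if predicate(tup):
            icicle.offer(tup)

# Usage: L starts as R itself, then grows by each query's result set.
relation = list(range(100))
icicle = Reservoir(capacity=10, seed=0)
for tup in relation:
    icicle.offer(tup)
update_icicle(icicle, relation, lambda t: t < 20)
```

Only the tuples that satisfy the new query are scanned, which is why each update touches just that segment of R.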
Icicle Maintenance Example
Icicle-Based Estimators
• An icicle is a non-uniform sample of the original data
• Frequencies must be maintained for all tuples
• Different estimation mechanisms are used for Average, Count, and Sum
Estimators for Aggregate queries
• Average is the average of the distinct tuples in the sample that satisfy the query
• Count is the sum of the expected contributions of all tuples in the icicle that satisfy the query
• Sum is the product of Average and Count
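The three estimators can be sketched as follows. The slide does not spell out what an "expected contribution" is, so this sketch assumes each sampled occurrence of a tuple t contributes |L| / (n · f(t)), where n is the icicle size and f(t) is the frequency of t in L; this weighting makes Count unbiased for a uniform sample of L. The function and parameter names are hypothetical.

```python
def icicle_estimates(sample, freq, size_of_L, predicate, value):
    """Return (average, count, sum) estimates, or None if no sampled tuple qualifies.

    sample    -- icicle contents; a tuple may appear more than once
    freq      -- dict mapping each tuple to its frequency in L
    size_of_L -- total number of tuples in L, duplicates included
    predicate -- function deciding whether a tuple satisfies the query
    value     -- function extracting the aggregated attribute
    """
    hits = [t for t in sample if predicate(t)]
    if not hits:
        return None
    # Average: mean over the distinct qualifying tuples.
    distinct = set(hits)
    average = sum(value(t) for t in distinct) / len(distinct)
    # Count: assumed contribution |L| / (n * f(t)) per sampled occurrence.
    scale = size_of_L / len(sample)
    count = sum(scale / freq[t] for t in hits)
    # Sum: product of the two estimates.
    return average, count, average * count

# Toy data: (name, value) tuples with their frequencies in L, so |L| = 8.
freq = {("a", 10): 4, ("b", 20): 2, ("c", 30): 1, ("d", 40): 1}
sample = [("a", 10), ("a", 10), ("b", 20), ("c", 30)]
avg, cnt, total = icicle_estimates(
    sample, freq, sum(freq.values()), lambda t: True, lambda t: t[1]
)
```

On this toy input the estimates are 20.0 for Average, 4.0 for Count, and 80.0 for Sum.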
Maintaining Frequency Relation
• Add a Frequency attribute to the relation R
• The frequency of each tuple is initially set to 1
• The frequency is incremented each time a tuple is used to answer a query
• Frequencies of relevant tuples are updated only when the icicle is updated with a new query
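The bookkeeping above can be sketched with a dictionary standing in for the Frequency attribute. This is an assumption-laden illustration: FrequencyRelation, record_query, and size_of_L are hypothetical names, and a real system would store the counter as a column of R.

```python
class FrequencyRelation:
    """Tracks how often each tuple of R has been appended to L."""

    def __init__(self, relation):
        # Every tuple starts with frequency 1 because L initially equals R.
        self.freq = {tup: 1 for tup in relation}

    def record_query(self, relation, predicate):
        """Increment the frequency of each tuple used to answer the query."""
        for tup in relation:
            if predicate(tup):
                self.freq[tup] += 1

    def size_of_L(self):
        """Total number of tuples in L, duplicates included."""
        return sum(self.freq.values())

relation = list(range(5))
frequencies = FrequencyRelation(relation)
frequencies.record_query(relation, lambda t: t < 2)
```

After this single query, tuples 0 and 1 have frequency 2, the rest keep frequency 1, and L holds 7 tuples.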
Quality Guarantees
• When the queries in the workload exhibit data locality, the icicle contains more tuples from the frequently accessed subsets of the relation
• The accuracy of an estimate improves as the number of tuples used to compute it increases
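As a rough illustration of the second point, consider the textbook standard error of a sample mean. This is a general statistical fact under the assumption of m independently and uniformly sampled tuples, not the specific guarantee stated in the paper; the symbols m and σ are introduced here for illustration.

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{m}}
```

Here m is the number of sampled tuples that satisfy the query and σ is the standard deviation of the aggregated attribute. Quadrupling the number of qualifying tuples in the sample halves the standard error, which is why concentrating the sample on frequently queried regions helps.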
Performance Evaluation
Plot definitions:
• Static sample: a uniform random sample of the relation
• Icicle: the icicle evolves with the workload
• Icicle-complete: the tuned icicle evaluated again on the same workload
Performance Evaluation
Qworkload: template for generating workloads
SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LI, C, O, S, N, R
WHERE C_Custkey = O_Custkey AND O_Orderkey = LI_Orderkey AND LI_Suppkey = S_Suppkey AND
      C_Nationkey = N_Nationkey AND N_Regionkey = R_Regionkey AND
      R_Name = [region] AND O_Orderdate >= Date[startdate] AND O_Orderdate <= 12-31-1998
Template for obtaining approximate answers
SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
FROM LICOS-icicle, N, R
WHERE C_Nationkey = N_Nationkey AND N_Regionkey = R_Regionkey AND
      R_Name = [region] AND O_Orderdate >= Date[startdate] AND O_Orderdate <= 12-31-1998
Conclusion
• Icicles are a class of samples that are sensitive to workload characteristics
• Icicles adapt quickly to a changing workload
• Icicles are useful when the workload focuses on relatively small subsets of the relation
• An icicle is a trade-off between accuracy and cost
References
• V. Ganti, M. Lee, and R. Ramakrishnan. ICICLES: Self-tuning Samples for Approximate Query Answering. VLDB Conference 2000.
Thank you!