Approximate Query Processing using Wavelets
Kaushik Chakrabarti (Univ. of Illinois), Minos Garofalakis (Bell Labs), Rajeev Rastogi (Bell Labs), Kyuseok Shim (KAIST and AITrc)
Presented at the 26th VLDB Conference, Cairo, Egypt
Presented by Supriya Sudheendra

• Slide 2: Outline

• Slide 3: Introduction
o Approximate query processing is a viable solution given:
  - huge amounts of data
  - high query complexity
  - stringent response-time requirements
o Decision Support Systems (DSS)
  - support business and organizational decision-making activities
  - help decision makers compile useful information from raw data, solve problems, and make decisions

• Slide 4: Introduction
o DSS users pose very complex queries to the DBMS
  - these require complex operations over gigabytes or terabytes of disk-resident data
  - they take a very long time to execute and produce exact answers
  - in a number of scenarios, users prefer a fast, approximate answer

• Slide 5: Prior Work
o Previous approximate query processing techniques
  - focused on specific forms of aggregate queries
  - focused on the data reduction mechanism, i.e., how to obtain synopses of the data
o Sampling-based techniques
  - a join operator applied to two uniform random samples yields a non-uniform sample with very few tuples
  - for non-aggregate queries, sampling produces a small subset of the exact answer, which may even be empty when joins are involved
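The join problem on Slide 5 can be seen in a small simulation. This sketch is not from the paper. It assumes two hypothetical relations R and S that share one join-key column (100 key values, each appearing 10 times in both), draws an independent Bernoulli (coin-flip) sample of each at rate p = 0.1, and compares the join of the samples with the full join. In expectation, the join of the samples keeps only about p² of the join result (roughly 100 of 10,000 tuples) instead of the fraction p that a uniform sample of the join would keep. Because a sampled R tuple drags along all of its sampled S partners, the surviving tuples are also correlated rather than a uniform sample of the join.

```python
import random
from collections import Counter


def join_size(r_keys, s_keys):
    """Number of tuples in the equi-join of R and S on their single key column."""
    s_counts = Counter(s_keys)
    return sum(s_counts[k] for k in r_keys)


def bernoulli_sample(keys, p, rng):
    """Uniform (Bernoulli) sample: keep each tuple independently with probability p."""
    return [k for k in keys if rng.random() < p]


# Hypothetical relations: key values 0..99, each occurring 10 times in R and in S.
R = [k for k in range(100) for _ in range(10)]
S = [k for k in range(100) for _ in range(10)]
p = 0.1
rng = random.Random(42)  # fixed seed so the run is repeatable

full = join_size(R, S)  # 100 keys * 10 * 10 = 10,000 result tuples
sampled = join_size(bernoulli_sample(R, p, rng), bernoulli_sample(S, p, rng))

print(f"full join: {full} tuples")
print(f"a uniform p-sample of the join would keep about {p * full:.0f}")
print(f"join of the two p-samples kept {sampled} (expected about {p * p * full:.0f})")
```

Changing the seed changes the exact count, but it typically stays near p² × 10,000 rather than p × 10,000.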
• Slide 6: Prior Work
o Histogram-based techniques
  - problematic for high-dimensional data: large storage overhead and high construction cost
o Wavelet-based techniques
  - wavelets are a mathematical tool for the hierarchical decomposition of functions
  - apply wavelet decomposition to the input data collection to obtain a compact data synopsis
  - avoid the high construction cost and storage overhead of histograms

• Slide 7: Contribution of the Paper
o Demonstrates the viability and effectiveness of wavelets as a generic tool for high-dimensional DSS
o A new, I/O-efficient wavelet decomposition algorithm for relational tables
o A novel query processing algebra for wavelet-coefficient data synopses
o Extensive experiments

• Slide 8: Background
o Wavelets are a mathematical tool for hierarchically decomposing functions
o A wavelet decomposition gives a coarse overall approximation together with detail coefficients that influence the function at various scales
o Haar wavelets are conceptually simple and fast to compute
o Wavelets are used in a variety of applications, such as image editing and querying

• Slide 9: One-Dimensional Haar Wavelets
o How to compute the transform of a given data array:
  - average the values together pairwise to get a lower-resolution representation of the data
  - detail coefficients: the differences of the (second of the) averaged values from the computed pairwise average
o Why detail coefficients? Averaging alone loses information; the detail coefficients store what is lost, so the original data array can be reconstructed

• Slide 10: One-Dimensional Haar Wavelets
o Wavelet transform: the overall average followed by the detail coefficients in increasing order of resolution
  - each entry is called a wavelet coefficient
o $W_A = [4, -2, 0, -1]$
o For vectors containing similar values, most detail coefficients are small and can be eliminated while introducing only small errors

• Slide 11: One-Dimensional Haar Wavelets
o The overall average is more important than any detail coefficient, since it contributes to every entry of the reconstructed array
o To normalize the final entries of $W_A$, each wavelet coefficient is divided by $\sqrt{2^l}$, where $l$ is its level of resolution: $W_A = [4, -2, 0, -1/\sqrt{2}]$

• Slide 12: Multi-Dimensional Haar Wavelets
o Haar wavelets can be extended to multi-dimensional arrays in two ways:
  - Standard decomposition: fix an ordering of the data dimensions (1, 2, ..., d) and, for each dimension k in turn, apply the complete 1-D wavelet transform to each 1-D row of array cells along dimension k
  - Nonstandard decomposition: alternate between dimensions during successive steps of pairwise averaging and differencing, performing one step for each 1-D row of array cells along dimension k; the process is repeated recursively on the quadrant containing the averages across all dimensions

• Slide 13: Nonstandard Decomposition
o Pairwise averaging and differencing for one positioning of the 2 × 2 box with root $[2i_1, 2i_2]$
o Distribution of the results in the wavelet transform array
o The process is recursed on the lower-left quadrant of $W_A$

• Slide 14: Example Decomposition of a 4 × 4 Array

• Slide 15: Multi-Dimensional Haar Coefficients: Semantics and Representation
o The d-dimensional Haar basis function corresponding to a coefficient W is defined by:
  - a d-dimensional rectangular support region
  - quadrant sign information

• Slide 16: Support Regions for the 16 Nonstandard 2-D Haar Basis Functions
o Blank areas: regions of A whose reconstruction is independent of the coefficient
o $W_A[0,0]$: the overall average
o $W_A[3,3]$ contributes only to the upper-right quadrant of A

• Slide 17: Haar Coefficients: Semantics and Representation
o W.R: the d-dimensional support hyper-rectangle of W, which encloses all cells in A to which W contributes
o The hyper-rectangle is represented by its low and high boundaries across each dimension j, $1 \le j \le d$
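The one-dimensional procedure from Slides 9 through 11 can be sketched in Python. This is an illustrative implementation, not the paper's code, and the names `haar_decompose`, `haar_reconstruct` and `normalize` are hypothetical. It follows the convention on Slide 9: each detail coefficient is the pairwise average minus the second value of the pair, i.e. (a − b)/2. It orders the output as on Slide 10: the overall average first, then details from coarsest to finest resolution. The transcript does not include the slide's data array. The input [2, 2, 5, 7] is used here because it is the array that this convention maps to the slide's $W_A = [4, -2, 0, -1]$.

```python
import math
from fractions import Fraction


def haar_decompose(values):
    """Un-normalized 1-D Haar transform: [overall average, details from coarsest to finest]."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("array length must be a power of two")
    current = [Fraction(v) for v in values]  # exact arithmetic, no rounding
    levels = []  # detail coefficients per level, finest level first
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        # Detail = pairwise average minus the second value of the pair, i.e. (a - b) / 2.
        levels.append([(a + b) / 2 - b for a, b in pairs])
        current = [(a + b) / 2 for a, b in pairs]  # lower-resolution representation
    return current + [d for level in reversed(levels) for d in level]


def haar_reconstruct(coeffs):
    """Invert haar_decompose: each (average, detail) pair expands to (avg + d, avg - d)."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        current = [v for avg, d in zip(current, details) for v in (avg + d, avg - d)]
        pos += len(details)
    return current


def normalize(coeffs):
    """Divide each coefficient by sqrt(2**l), where l is its resolution level.

    Index 0 (overall average) and index 1 (coarsest detail) are at level 0;
    indices 2-3 are at level 1, indices 4-7 at level 2, and so on.
    """
    return [float(c) / math.sqrt(2 ** max(i.bit_length() - 1, 0))
            for i, c in enumerate(coeffs)]


W = haar_decompose([2, 2, 5, 7])
print([str(c) for c in W])                    # ['4', '-2', '0', '-1']
print(normalize(W))                           # last entry is -1/sqrt(2), about -0.7071
print([str(v) for v in haar_reconstruct(W)])  # ['2', '2', '5', '7']
print(haar_reconstruct([4, -2, 0, 0]))        # [2, 2, 6, 6]
```

The last line drops the small finest-level detail −1: the reconstruction is off by 1 in two cells, which is the "small error" trade-off from Slide 10. Fractions keep the un-normalized coefficients exact, so round trips can be checked with `==`.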
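The nonstandard decomposition (Slide 13) and the support regions (Slides 16 and 17) can be illustrated for two dimensions. This is a sketch under stated assumptions, not the paper's algorithm. For every 2 × 2 box with root $[2i_1, 2i_2]$, it applies the 1-D convention (first value minus second, halved) along dimension 1 and then dimension 2. It writes the average to position $[i_1, i_2]$ and the three detail coefficients to the other three quadrants. It then recurses on the quadrant of averages, which is the lower-left quadrant when dimension 1 is drawn horizontally and dimension 2 vertically. The sign conventions and coefficient placement are assumptions. The hypothetical helper `support_region` returns the support hyper-rectangle W.R of the coefficient stored at $W_A[j_1, j_2]$ as one (lo, hi) pair per dimension.

```python
from fractions import Fraction


def nonstandard_haar_2d(A):
    """Nonstandard 2-D Haar decomposition of an n x n array A (n a power of two).

    A[x1][x2] is indexed by (dimension 1, dimension 2); returns W_A with the same shape.
    """
    n = len(A)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in A):
        raise ValueError("A must be n x n with n a power of two")
    W = [[Fraction(v) for v in row] for row in A]
    size = n
    while size > 1:
        half = size // 2
        avgs = [row[:size] for row in W[:size]]  # copy of the current averages
        for i1 in range(half):
            for i2 in range(half):
                a, b = avgs[2 * i1][2 * i2], avgs[2 * i1 + 1][2 * i2]
                c, d = avgs[2 * i1][2 * i2 + 1], avgs[2 * i1 + 1][2 * i2 + 1]
                W[i1][i2] = (a + b + c + d) / 4                # average of the box
                W[half + i1][i2] = (a - b + c - d) / 4         # detail along dimension 1
                W[i1][half + i2] = (a + b - c - d) / 4         # detail along dimension 2
                W[half + i1][half + i2] = (a - b - c + d) / 4  # detail along both dimensions
        size = half  # recurse on the quadrant holding the averages
    return W


def support_region(j1, j2, n):
    """Support hyper-rectangle W.R of coefficient W_A[j1][j2]: ((lo1, hi1), (lo2, hi2))."""
    if j1 == 0 and j2 == 0:
        return ((0, n - 1), (0, n - 1))  # overall average: the whole array
    h = 1 << (max(j1, j2).bit_length() - 1)  # half-size of the level that wrote this entry
    width = n // h                           # side of that level's 2 x 2 box, in cells of A
    i1, i2 = j1 % h, j2 % h                  # which box at that level
    return ((i1 * width, (i1 + 1) * width - 1), (i2 * width, (i2 + 1) * width - 1))


A = [[4 * x1 + x2 for x2 in range(4)] for x1 in range(4)]  # hypothetical 4 x 4 data
W = nonstandard_haar_2d(A)
print(W[0][0])                  # 15/2, the overall average of A
print(support_region(3, 3, 4))  # ((2, 3), (2, 3)), the upper-right quadrant
print(support_region(1, 0, 4))  # ((0, 3), (0, 3)), a coarsest-level detail covers all of A
```

Because $W_A[3,3]$ is written only from the box covering cells [2..3] × [2..3], changing any cell outside that region leaves it unchanged, matching the blank areas described on Slide 16.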