CURE for Cubes: Cubing Using a ROLAP Engine
Konstantinos Morfonios
Yannis Ioannidis
University of Athens
VLDB 2006
Introduction
Execution Plan
External Partitioning
Storage Format
Experimental Evaluation
Conclusions
Introduction
SELECT region, SUM(revenue)
FROM SALES
WHERE month = 'September'
GROUP BY region
Gray On Data-warehousing:
CUBE
Introduction
SELECT A, B, C, SUM(M)
FROM R
GROUP BY A, B, C

SELECT A, B, SUM(M)
FROM R
GROUP BY A, B

SELECT SUM(M)
FROM R
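The queries above are three of the group-bys that make up a cube over A, B, C. As a minimal sketch (not the construction method of the talk), the naive definition of the cube can be written as one aggregation per subset of the grouping attributes. The toy relation `R` and the helper name `cube` are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Return {grouping attributes: {group key: SUM(measure)}} for every subset of dims."""
    result = {}
    for k in range(len(dims), -1, -1):
        for group_by in combinations(dims, k):
            sums = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group_by)
                sums[key] += row[measure]
            result[group_by] = dict(sums)
    return result


# Hypothetical toy relation R(A, B, C, M); the values are made up.
R = [
    {"A": "a1", "B": "b1", "C": "c1", "M": 10},
    {"A": "a1", "B": "b2", "C": "c1", "M": 20},
    {"A": "a2", "B": "b1", "C": "c2", "M": 5},
]

full_cube = cube(R, ("A", "B", "C"), "M")
print(len(full_cube))       # one entry per group-by, 2**3 in total
print(full_cube[("A",)])    # SELECT A, SUM(M) FROM R GROUP BY A
print(full_cube[()])        # SELECT SUM(M) FROM R
```

Every group-by rescans the whole relation here, which is the kind of cost that dedicated cubing algorithms try to avoid.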
Introduction
Problems
• Construction algorithm
• Storage scheme
Focusing on ROLAP techniques (MVs)
• Stressed to limits?
• Complete solution?
• Unclear (not finished with efficient storage)
• Unclear (not focused on hierarchies)
Introduction
Challenges of hierarchies → CURE
• Number of nodes: $\prod_{i=1}^{D} (L_i + 1)$, often $\gg 2^D$ → Efficient execution plan
• Small domains in the higher levels of dimension hierarchies → New partitioning algorithm
• Number of tuples increases → Novel storage scheme
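The growth in the number of nodes can be checked on the three-dimension example used later in these slides (A0→A1→A2, B0→B1, C0). The sketch below assumes that L_i denotes the number of hierarchy levels of dimension i, and it uses the label `ALL` (a hypothetical name) for the node with no grouping attributes.

```python
from itertools import product
from math import prod

# Hierarchy levels per dimension, most detailed level first.
hierarchies = {"A": ["A0", "A1", "A2"], "B": ["B0", "B1"], "C": ["C0"]}


def lattice_nodes(hierarchies):
    """Enumerate the nodes: each dimension contributes one of its levels or is left out."""
    choices = [levels + [None] for levels in hierarchies.values()]
    nodes = []
    for combo in product(*choices):
        name = "".join(level for level in combo if level is not None)
        nodes.append(name or "ALL")  # "ALL" is a hypothetical label for the empty grouping
    return nodes


nodes = lattice_nodes(hierarchies)
by_formula = prod(len(levels) + 1 for levels in hierarchies.values())
flat_count = 2 ** len(hierarchies)

print(len(nodes), by_formula, flat_count)
```

Under this reading the example has (3 + 1)(2 + 1)(1 + 1) = 24 nodes, against 2^3 = 8 for a flat cube over the same three dimensions.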
Execution Plan
Extend BUC (Bottom-Up-Cube) [BR99]
• Efficient pipelining
• Cheap identification of some kinds of redundancy
• Inherent support for iceberg cubes and holistic functions
Existing “BUC-based” methods: BU-BST [WLFY02] and QC-Tables [LPH02]
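To make the BUC recursion concrete, here is a simplified in-memory sketch of the general idea for flat dimensions: aggregate the current partition, then partition on each remaining dimension and recurse. It is not CURE and not the exact algorithm of [BR99]. The function name `buc`, the toy relation, and the use of the tuple count as the iceberg support measure are assumptions made for the illustration.

```python
def buc(rows, dims, measure, min_support=1, prefix=(), out=None):
    """Simplified BUC-style recursion: one SUM per group, from coarse groups to fine ones."""
    if out is None:
        out = {}
    # Aggregate the current partition; prefix holds the (dimension, value) pairs fixed so far.
    out[prefix] = sum(row[measure] for row in rows)
    for i, dim in enumerate(dims):
        # Partition the current rows on the next dimension.
        partitions = {}
        for row in rows:
            partitions.setdefault(row[dim], []).append(row)
        for value, part in partitions.items():
            # Iceberg pruning: if a partition has too few tuples, no finer group inside it can qualify.
            if len(part) >= min_support:
                buc(part, dims[i + 1:], measure, min_support, prefix + ((dim, value),), out)
    return out


# Hypothetical toy relation R(A, B, M); the values are made up.
R = [
    {"A": "a1", "B": "b1", "M": 10},
    {"A": "a1", "B": "b2", "M": 20},
    {"A": "a2", "B": "b1", "M": 5},
]

full = buc(R, ("A", "B"), "M")
iceberg = buc(R, ("A", "B"), "M", min_support=2)
print(len(full), iceberg)
```

With `min_support=2` the recursion stops at every single-tuple partition, so only the groups backed by at least two tuples survive.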
Execution Plan
Dimensions: A0, A1, A2, B0, B1, C0
A0B0 A0B1 A0C0 B0C0 B1C0
A0 B0 B1 C0
A0B0C0 A0B1C0
A1B0 A1B1 A1C0
A1
A1B0C0 A1B1C0
A2B0 A2B1 A2C0
A2
A2B0C0 A2B1C0
Height: 3
Execution Plan
Dimensions: A0→A1→A2, B0→B1, C0
A2B1
B1
A2 C0
A1 A2C0 B0 B1C0
A0 A1B1 A1C0 A2B0 B0C0 A2B1C0
A0B1 A0C0 A1B0 A1B1C0 A2B0C0
A0B0 A0B1C0 A1B0C0
A0B0C0
Height: 6
Execution Plan
Important properties of BUC-based cubing:
• Recursive calls at higher levels tend to be cheaper
• The benefit of pruning the recursion early at some node N increases with the number of ancestors of N in the execution plan
Advantage of taller execution plans
[Figure: two example plans over A, B, C, one with nodes ABC, AB, AC, BC, A, B, C and one with nodes ABC, AB, AC, A]
Execution Plan
CURE’s Plan: the plan over A0→A1→A2, B0→B1, C0 shown above
External Partitioning
[Figure: the fact table R shown against Memory, next to CURE’s execution plan]
For sound partitioning: |Biggest partition| ≤ |M|
• In flat datasets this holds in general
• In hierarchical datasets…
External Partitioning
Example: |R| = 500 GB, |M| = 1 GB, A0 (50,000) → A1 (500) → A2 (5), so |R|/|M| = 500
[Figure: CURE’s execution plan split into two groups of nodes]
Nodes without A0 or A1: A2B1, B1, A2, C0, A2C0, B0, B1C0, A2B0, B0C0, A2B1C0, A2B0C0
Nodes with A0 or A1: A1, A0, A1B1, A1C0, A0B1, A0C0, A1B0, A1B1C0, A0B0, A0B1C0, A1B0C0, A0B0C0
A2B0C0 is |A0|/|A2| times smaller than R: |A2B0C0| ≈ 50 MB
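The numbers in this example can be reproduced with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check, not CURE's partitioning algorithm. It assumes that tuples are spread uniformly over the values of each level, which the slides do not state, and it uses 1 GB = 1000 MB. All variable and function names are hypothetical.

```python
R_MB = 500_000   # |R| = 500 GB, with 1 GB = 1000 MB
M_MB = 1_000     # |M| = 1 GB
cardinality = {"A0": 50_000, "A1": 500, "A2": 5}


def partition_size_mb(level):
    """Size of one partition of R on the given level, assuming a uniform distribution."""
    return R_MB / cardinality[level]


# Soundness test from the slides: the biggest partition must fit in memory.
fits_in_memory = {level: partition_size_mb(level) <= M_MB for level in cardinality}

# A2B0C0 is |A0|/|A2| times smaller than R.
a2b0c0_mb = R_MB / (cardinality["A0"] / cardinality["A2"])

for level in cardinality:
    print(level, partition_size_mb(level), fits_in_memory[level])
print("A2B0C0:", a2b0c0_mb, "MB")
```

Under the uniformity assumption, partitions on A2 are about 100 GB each and break the soundness condition, partitions on A1 are about 1 GB each, and the estimate for A2B0C0 matches the 50 MB on the slide.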
Storage Format
Example with flat cube only for simplicity
[Figure: flat cube lattice with nodes ABC, AB, AC, BC, A, B, C, shown alongside the hierarchical plan]
Storage Format
CUBE with DR CUBE’ without DR
Classify tuples according to AR into:
• Normal Tuples (NTs)
• Trivial Tuples (TTs)
• Common Aggregate Tuples (CATs)
Storage Format
Purpose of the previous example:
• Explanation of different types of redundancy
• Not construction algorithm
Constructing an uncompressed cube and then compressing it would be inefficient
Instead, CURE classifies tuples during construction itself (details in the paper)
Experimental Evaluation
Hierarchical datasets: APB-1
• Product: Code (6,500) → Class (435) → Group (215) → Family (54) → Line (11) → Division (3)
• Customer: Store (640) → Retailer (71)
• Time: Month (17) → Quarter (6) → Year (2)
• Channel: Base (9)
Flat datasets: CovType, Sep85L, Synthetic
Experimental Evaluation
[Figure: Time (min), axis from 0 to 300, vs. Number of Tuples in the Fact Table, from 1.E+06 to 1.E+09, for CURE and CURE+. Annotation: Less than 3 hours]
Experimental Evaluation
[Figure: Storage Space (GB), axis from 0 to 10, vs. Number of Tuples in the Fact Table, from 1.E+06 to 1.E+09, for CURE and CURE+. Annotation: ≈ 6.8 GB]
Experimental Evaluation
[Figure: Time (sec), axis from 0 to 300, for BUC, BU-BST, FCURE, FCURE+, CURE, and CURE+ on APB 0.4]
Experimental Evaluation
[Figure: Storage Space (MB), axis from 0 to 450, for BUC, BU-BST, FCURE, FCURE+, CURE, and CURE+ on APB 0.4]
Conclusions
Main contribution: CURE
• Efficient execution plan
• New partitioning algorithm
• Novel storage scheme
Main advantages of CURE
• Efficient construction of complete cubes over large datasets with arbitrary hierarchies
• Cube compression
• Optimization opportunities for queries and updates
• Easy implementation
Current and Future Work
• Study of indexing for queries and updates
• Comparison with the most prominent MOLAP and Tree-based techniques