Transbase® Hypercube: A leading-edge ROLAP Engine supporting multidimensional Indexing and Hierarchy Clustering
Roland Pieringer, Transaction Software GmbH

  • Slide 1
  • Transbase Hypercube: A leading-edge ROLAP Engine supporting multidimensional Indexing and Hierarchy Clustering
    Roland Pieringer
    Transaction Software GmbH, Thomas-Dehler-Str. 18, 81737 München, Germany
    www.transaction.de
  • Slide 2
  • Feb 2003 - 2 - BTW 2003 Transbase Hypercube 2003 Transaction Software GmbH www.transaction.de
    Motivation
    • Many applications have multidimensional (MD) data
    • Multidimensional indexes support retrieval of MD data
    • Application field: data warehouses
      • Hierarchically organized dimensions (e.g., year → month → day)
      • Large data volumes
      • Relatively static
      • Mainly retrieval query profile
    • MD indexes usually support numeric MD data
      • Encoding for hierarchical data necessary
      • Multidimensional Hierarchical Clustering (MHC)
  • Slide 3
  • Theoretical comparison of range query performance
    [Figure: range query performance compared for the ideal case, a multidimensional index, multiple B-Trees or bitmap indexes, and a compound primary B-Tree]
  • Slide 4
  • UB-Tree: basic concepts
    • Combination of B+-Tree and Z-curve
    • Z-curve is used to map multidimensional points to one-dimensional values (Z-values)
    • Z-values are used as keys in B*-Tree
    • Z-curve preserves spatial proximity
    • Symmetric clustering
    [Figure: UB-Tree with index part and data part]
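
    The mapping from multidimensional points to Z-values can be sketched as bit interleaving. This is a minimal illustration, not Transbase code: the function name `z_value` and the fixed number of bits per dimension are assumptions made for the example.

```python
def z_value(coords, bits):
    """Interleave the bits of all coordinates into one Z-value (Morton code).

    coords: tuple of non-negative integers, one per dimension
    bits:   number of bits used per dimension (assumed equal for all)
    """
    z = 0
    for bit in range(bits - 1, -1, -1):        # most significant bit first
        for c in coords:                       # one bit from each dimension in turn
            z = (z << 1) | ((c >> bit) & 1)
    return z


# Order a 4 x 4 grid along the Z-curve (2 bits per dimension).
points = [(x, y) for y in range(4) for x in range(4)]
z_order = sorted(points, key=lambda p: z_value(p, 2))
```

    Points that are close in both dimensions tend to receive nearby Z-values, so storing tuples in Z-order in a one-dimensional tree clusters them on all dimensions at once.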
  • Slide 5
  • Visualized range queries
    [Figure: range queries over a geography hierarchy (Germany; Sachsen, Bayern; Freiberg, Leipzig, Dresden, Burgh, München, Passau) and a time axis (Feb 2003, Mar 2003, Jun 2003)]
  • Slide 6
  • MHC: Non-clustered hierarchy
  • Slide 7
  • MHC: Clustered hierarchy
  • Slide 8
  • Basic technology of MHC
    • MHC: Multidimensional Hierarchical Clustering
    • MHC necessary because
      • Hierarchical organization of dimensions in warehouses
      • No intervals for hierarchical restrictions
      • Naive restrictions lead to many point queries instead of one interval on the UB-Tree
    • Artificial encoding of hierarchies
      • Mapping of hierarchy restrictions to range restrictions
      • Mapping is used for physical clustering of the fact table
      • Modification of query algorithms necessary
      • Fast computation and space efficient
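
    How an artificial hierarchy encoding turns a hierarchical restriction into a single range can be sketched as follows. The sketch assumes that each hierarchy level is stored as a fixed-width bit field and that the fields are concatenated from the top level down; the widths in `LEVEL_BITS` and both function names are hypothetical, and the slides do not specify the actual Transbase encoding.

```python
LEVEL_BITS = [2, 3, 4]   # hypothetical field widths: country, region, city


def compound_surrogate(path):
    """Concatenate the per-level ordinals of a full hierarchy path."""
    cs = 0
    for ordinal, width in zip(path, LEVEL_BITS):
        assert 0 <= ordinal < (1 << width), "ordinal does not fit its field"
        cs = (cs << width) | ordinal
    return cs


def prefix_interval(prefix):
    """Return the closed surrogate interval covering every path below prefix."""
    low = 0
    for ordinal, width in zip(prefix, LEVEL_BITS):
        low = (low << width) | ordinal
    rest = sum(LEVEL_BITS[len(prefix):])       # bits of the unrestricted levels
    low <<= rest
    high = low | ((1 << rest) - 1)
    return low, high


# Restricting the top level to ordinal 1 yields one interval, not many points.
low, high = prefix_interval((1,))
assert low <= compound_surrogate((1, 2, 3)) <= high
```

    Under this assumption, every member below a hierarchy node falls into one contiguous surrogate range, which is the property that lets a restriction such as "country = X" become a single interval on the index.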
  • Slide 9
  • Implementation of MHC
    • Implementation into Transbase DBMS kernel
      • Computation and maintenance of MHC encoding
      • Integration into DDL and DML
      • Integration into optimizer
      • Integration into archiving tools
    • Transparency to users
      • Physical optimization
      • No extension of the DML
  • Slide 10
  • Supported schemata
    • Support of star schema and snowflake schema
    • Star schemata
      • Conventional complete de-normalization of the dimension tables
      • Foreign key relationships between fact table and dimension tables
    • Supported snowflake schemata
      • Inner dimension tables de-normalized with hierarchy attributes
      • Feature attributes can be normalized
      • Fully supported by optimizer
      • More efficient than star schemata (knowledge about hierarchical dependency)
  • Slide 11
  • Transbase DDL extension
    Dimension table:
      CREATE TABLE dim_segment (
        country_id      INTEGER NOT NULL,
        country_txt     CHAR(*),
        region_id       INTEGER NOT NULL,
        region_txt      CHAR(*),
        micromarket_id  INTEGER NOT NULL,
        micromarket_txt CHAR(*),
        outlet_id       INTEGER NOT NULL,
        outlet_txt      CHAR(*),
        SURROGATE cs_segment COMPOUND (
          country_id     SIBLINGS 16,
          region_id      SIBLINGS 19,
          micromarket_id SIBLINGS 6,
          outlet_id      SIBLINGS 2202),
        PRIMARY KEY (outlet_id)
      )
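
    The SIBLINGS values above presumably bound the number of children per hierarchy level. Assuming each level is stored as a fixed-width binary field just wide enough for that many siblings (an assumption; the slides do not give the storage layout), the size of the compound surrogate can be estimated as below. The function name `bits_for_siblings` is made up for this example.

```python
# SIBLINGS values taken from the dim_segment definition above.
SIBLINGS = {
    "country_id": 16,
    "region_id": 19,
    "micromarket_id": 6,
    "outlet_id": 2202,
}


def bits_for_siblings(n):
    """Smallest number of bits that can distinguish n sibling ordinals (0 .. n-1)."""
    return max(1, (n - 1).bit_length())


# Assumed layout: one fixed-width field per level, concatenated top-down.
widths = {level: bits_for_siblings(n) for level, n in SIBLINGS.items()}
total_bits = sum(widths.values())
```

    With these numbers the four fields need 4, 5, 3, and 12 bits, so the whole hierarchy path would fit into 24 bits under the stated assumption.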
  • Slide 12
  • Transbase DDL extension (cont.)
    Fact table:
      CREATE TABLE fact (
        dseg     INTEGER REFERENCES dim_segment(outlet_id) ON UPDATE CASCADE,
        dprod    INTEGER REFERENCES dim_product(item_id) ON UPDATE CASCADE,
        dtime    INTEGER REFERENCES dim_time(day_id) ON UPDATE CASCADE,
        turnover NUMERIC(10,2),
        SURROGATE cs_seg  FOR dseg,
        SURROGATE cs_prod FOR dprod,
        SURROGATE cs_time FOR dtime,
        PRIMARY HCKEY (cs_seg, cs_prod, cs_time)
      )
  • Slide 13
  • DML
    • No change of DML statements (SELECT, INSERT, UPDATE, DELETE)
    • Conventional star (snowflake) joins (SQL-92 compliant):
      SELECT country, department, category, group,
             year, quarter, month,
             SUM(price), SUM(turnover)
      FROM customer c, product p, date d, fact f
      WHERE f.custkey = c.customer
        AND f.prodkey = p.item_key
        AND f.datekey = d.day
        AND c.country = 'GERMANY'
        AND c.department = 'SOUTH'
        AND p.category = 'TV'
        AND d.month = '10/2002'
        AND d.year = '2002'
      GROUP BY country, department, category, group,
               year, quarter, month
  • Slide 14
  • Conventional query processing
    Standard method (non-clustering indexes):
    • Index evaluation of dimension restrictions
    • Fact table tuple materialization
    • Residual join with dimension tables
    • Grouping and aggregating
    • Sorting
  • Slide 15
  • MHC query processing: Overview
    • Abstract execution plan (AEP): better understanding, implementation in operator trees
    • Three phases:
      • Interval generation (semi join)
      • Fact table access
      • Grouping and residual join
    • Optimization: hierarchical pre-grouping
      • Minimize residual join operations by grouping before joining
  • Slide 16
  • AEP - overview
    [Figure: operator tree with two parts. Interval Generation: Create Range over dimension tables D_i, D_j, ... Main Execution Phase: Fact Table Access on Fact, Residual Join with D_k, D_i, ..., Group Select, Having, Order By]
  • Slide 17
  • Interval generation
    • Mapping of hierarchical restrictions into a number of intervals
    • Usage of special hierarchy indexes:
      • DXh index: (h_t, h_t-1, ..., h_1, cs)
      • Efficient interval computation
    • Optimization for feature restrictions:
      • Merging many small intervals into fewer, larger intervals
      • Usage of hierarchical dependency for feature attributes, if supported by the schema (snowflake schemata)
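
    Merging many small intervals into fewer, larger ones can be sketched as a sort-and-sweep over closed integer intervals. The function `merge_intervals` and its `max_gap` parameter are hypothetical; bridging gaps is an assumption about how such an optimization could trade extra scanned tuples for fewer index accesses, and any bridged gap would have to be removed again by post-filtering.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge closed integer intervals that overlap, touch, or lie within max_gap.

    intervals: iterable of (low, high) pairs on compound surrogate values
    max_gap:   largest number of uncovered values that may be bridged
    """
    merged = []
    for low, high in sorted(intervals):
        if merged and low <= merged[-1][1] + 1 + max_gap:
            # Extend the previous interval instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], high)
        else:
            merged.append([low, high])
    return [tuple(iv) for iv in merged]


exact = merge_intervals([(10, 12), (0, 3), (4, 7)])
coarse = merge_intervals([(10, 12), (0, 3), (4, 7)], max_gap=2)
```

    With `max_gap=0` the result covers exactly the original values; a positive `max_gap` yields a superset of them.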
  • Slide 18
  • Fact table access
    • Combination of intervals of all clustering dimensions forms multidimensional query boxes QB_i
    • Fact table access with implicit tuple materialization
    • Sequential processing of query boxes
    • Fast retrieval of result tuples
    • Post-filtering can be necessary depending on the UB-Tree dimensions and restrictions
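
    Forming query boxes from the per-dimension intervals amounts to a Cartesian product. The sketch below illustrates only that combination step with made-up interval values; `query_boxes` and `in_box` are hypothetical helpers, and the real engine evaluates the boxes on the UB-Tree rather than testing points one by one.

```python
from itertools import product


def query_boxes(intervals_per_dimension):
    """Combine one interval from every dimension into a multidimensional box."""
    return list(product(*intervals_per_dimension))


def in_box(point, box):
    """Check whether a point of surrogate values lies inside a query box."""
    return all(low <= value <= high for value, (low, high) in zip(point, box))


# Made-up intervals for three clustering dimensions (segment, product, time).
boxes = query_boxes([
    [(0, 3), (8, 11)],     # two intervals on the segment surrogate
    [(16, 31)],            # one interval on the product surrogate
    [(5, 5), (7, 7)],      # two point intervals on the time surrogate
])
```

    Two, one, and two intervals give 2 x 1 x 2 = 4 query boxes, which is why merging intervals beforehand directly reduces the number of boxes to process.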
  • Slide 19
  • Standard AEP
    [Figure: operator tree with Fact Table Access on Fact, Predicate Evaluation, Residual Join with D_k, D_i, ..., Group Select, Having, Order By]
  • Slide 20
  • Optimization: Hierarchical pre-grouping
    • Basic concept
      • Hierarchy encoding stored in fact table (compound surrogates)
      • Groups of hierarchical GROUP BY attributes built from compound surrogates
      • Grouping not exact for non-prefix path grouping
      • Drastic reduction of fact table result tuples
    • Example (for hierarchy year → month → day):
      • Number of fact table result tuples: 100,000
      • Pre-grouping (on month): about 3,000 (aggregated) tuples
      • Residual join with 3,000 instead of 100,000 tuples: reduction by a factor of about 30
    • Post-grouping may be necessary if the pre-grouping is too fine
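
    Pre-grouping on a prefix of the compound surrogate can be sketched as follows. The sketch assumes the time surrogate is laid out as (year, month, day) bit fields with the day in the lowest `DAY_BITS` bits; that layout, the constant, the function names, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict

DAY_BITS = 5   # assumed width of the day field (days 1..31 fit in 5 bits)


def pre_group_by_month(fact_rows):
    """Aggregate turnover per (year, month) prefix of the time surrogate."""
    totals = defaultdict(float)
    for cs_time, turnover in fact_rows:
        totals[cs_time >> DAY_BITS] += turnover    # drop the day bits
    return dict(totals)


def residual_join(groups, month_labels):
    """Join the few aggregated groups, not the raw tuples, with the dimension."""
    return {month_labels[prefix]: total for prefix, total in groups.items()}


# Hypothetical fact tuples: (time surrogate, turnover) for month prefixes 10 and 11.
facts = [
    ((10 << DAY_BITS) | 1, 5.0),
    ((10 << DAY_BITS) | 2, 7.0),
    ((11 << DAY_BITS) | 1, 1.0),
]
groups = pre_group_by_month(facts)
report = residual_join(groups, {10: "10/2002", 11: "11/2002"})
```

    The join touches one row per group instead of one row per fact tuple; with the figures from the slide, 100,000 / 3,000 is roughly 33, consistent with the stated factor of about 30.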
  • Slide 21
  • Hierarchical pre-grouping (cont.)
    [Figure: operator tree with Fact Table Access on Fact, Predicate Evaluation, Residual Join, Pre-Group, Residual Join, Post-Group, Having, Order By; dimension tables D_e1, ..., D_ei and D_l1, ..., D_ln]
  • Slide 22
  • Performance comparison
    • Data:
      • Real-world data warehouse of an electronics retailer in Greece
      • 5 dimensions, 49 measures on fact table
      • 3 years of transactions, i.e., 8.5 million fact table tuples (2.8 GB)
    • Environment
      • Dual-processor Pentium II (400 MHz), 768 MB RAM, Windows 2000
    • Queries
      • 22 query classes with 1,320 real-world user queries
    • Comparisons
      • MHC versus no multidimensional clustering
      • Conventional grouping versus hierarchical pre-grouping
  • Slide 23
  • Perf. comp: MHC vs. no clustering
    Time of fact tuple access in seconds
    FT Sel. %: [0.0-0.1], [0.1-1.0], [1.0-5.0]; columns STAR and AEP for each selectivity class
    MIN 0065227411
    MAX 3062909121947
    MEDIAN 11182847723
    STD-DEV 5176334614
  • Slide 24
  • Perf. comp: no pre-grouping vs. pre-grouping
    FT Sel. %: All, [0.0 - 0.25], [0.25 - 1.0], [1.0 - 10.0]
    MIN
