
Transbase® Hypercube: A leading-edge ROLAP Engine supporting multidimensional Indexing and Hierarchy Clustering

Roland Pieringer

Transaction Software GmbH

Thomas-Dehler-Str. 18

81737 München, Germany

www.transaction.de

Transbase® Hypercube, © 2003 Transaction Software GmbH, www.transaction.de

BTW 2003, Feb 2003

Motivation

  • Many applications have multidimensional (MD) data
  • Multidimensional indexes support retrieval of MD data
  • Application field: data warehouses
      - Hierarchically organized dimensions (e.g., year – month – day)
      - Large data volumes
      - Relatively static
      - Mainly retrieval query profile
  • MD indexes usually support numeric MD data
      - Encoding for hierarchical data necessary
  • Multidimensional Hierarchical Clustering (MHC)


Theoretical comparison of range query performance

[Figure: range query performance curves for the ideal case, a multidimensional index, multiple B-Trees / bitmap indexes, and a compound primary B-Tree]


UB-Tree: basic concepts

  • Combination of B+-Tree and Z-curve
  • Z-curve is used to map multidimensional points to one-dimensional values (Z-values)
  • Z-values are used as keys in B*-Tree
  • Z-curve preserves spatial proximity
  • Symmetric clustering

[Figure: UB-Tree with an index part and a data part; key values shown: 8, 17, 28, 39, 51]
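The mapping from multidimensional points to Z-values can be illustrated by plain bit interleaving. The following is a minimal sketch, not the Transbase implementation; the function name `z_value` and the fixed bit width per coordinate are assumptions made for illustration.

```python
def z_value(coords, bits):
    """Interleave the bits of each coordinate into one Z-value (Morton code).

    coords: tuple of non-negative integers, one per dimension
    bits:   number of bits used per coordinate (assumed equal for all)
    """
    z = 0
    for bit in range(bits - 1, -1, -1):   # most significant bit first
        for c in coords:                  # one bit from every dimension per round
            z = (z << 1) | ((c >> bit) & 1)
    return z


# Sorting a 4 x 4 grid by Z-value keeps neighbouring cells close together:
# the first four cells in Z-order form one quadrant of the grid.
cells = sorted(((x, y) for x in range(4) for y in range(4)),
               key=lambda p: z_value(p, 2))
first_quadrant = cells[:4]   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Because every dimension contributes one bit per round, no dimension is favoured in the resulting order, which is the sense in which the clustering is symmetric.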


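Visualized range queries

[Figure: range queries drawn over a region hierarchy (labels: Germany; Sachsen, Bayern; Freiberg, Leipzig, Dresden, Burgh, München, Passau) and a time axis (labels: Feb 2003, Mar 2003, Jun 2003)]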


MHC: Non-clustered hierarchy

[Figure: product hierarchy below ALL, with item IDs]

  Level          Members shown
  Sector         Brown Goods, White Goods
  Category       Video, Audio, ...
  ProductGroup   Camcorder, VCR, ...
  Item           TR-780, TRV-30, GR-AX 200, GV-500, SLV-E800
  ID             2, 11, 5, 8, 21


MHC: Clustered hierarchy

[Figure: the same product hierarchy below ALL, with a binary code per level and the resulting compound surrogate per item]

  Level          Members shown                                  Codes shown
  Sector         Brown Goods, White Goods                       0, 1
  Category       Video, Audio, ...                              00, 01
  ProductGroup   Camcorder, VCR, ...                            0, 1
  Item           TR-780, TRV-30, GR-AX 200, GV-500, SLV-E800    0000, 0001, 0000, 0001, 0010

  Item         ID   Surrogate (binary)   Surrogate (decimal)
  TR-780        2   00100000             32
  TRV-30       11   00100001             33
  GR-AX 200     5   00110000             48
  GV-500        8   00110001             49
  SLV-E800     21   00110010             50
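The surrogates in this example can be reproduced by concatenating the per-level codes along a hierarchy path. The sketch below assumes bit widths of 1, 2, 1 and 4 for Sector, Category, ProductGroup and Item, as read off the example values; the names `WIDTHS`, `encode` and `prefix_range` are hypothetical and not part of Transbase.

```python
# Assumed bit widths per level (Sector, Category, ProductGroup, Item),
# inferred from the 8-bit example surrogates on the slide.
WIDTHS = (1, 2, 1, 4)


def encode(path, widths=WIDTHS):
    """Concatenate the per-level codes of a hierarchy path into one integer."""
    cs = 0
    for code, width in zip(path, widths):
        if not 0 <= code < (1 << width):
            raise ValueError("code does not fit into its level width")
        cs = (cs << width) | code
    return cs


def prefix_range(prefix, widths=WIDTHS):
    """Closed surrogate interval covered by all paths starting with `prefix`."""
    free_bits = sum(widths[len(prefix):])          # bits of the unrestricted levels
    lo = encode(prefix, widths[:len(prefix)]) << free_bits
    return lo, lo | ((1 << free_bits) - 1)


first_item = encode((0, 1, 0, 0))     # 0b00100000 = 32
last_item = encode((0, 1, 1, 2))      # 0b00110010 = 50
category = prefix_range((0, 1))       # (32, 63): every item below that category
group = prefix_range((0, 1, 1))       # (48, 63): every item below that product group
```

A restriction on any upper hierarchy level therefore becomes a single interval on the surrogate, which the arbitrary item IDs 2, 11, 5, 8, 21 cannot offer.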


Basic technology of MHC

  • MHC: Multidimensional Hierarchical Clustering
  • MHC necessary because:
      - Hierarchical organization of dimensions in warehouses
      - No intervals for hierarchical restrictions
      - Naive restrictions lead to many point queries instead of one interval on the UB-Tree
  • Artificial encoding of hierarchies:
      - Mapping of hierarchy restrictions to range restrictions
      - Mapping is used for physical clustering of the fact table
      - Modification of query algorithms necessary
      - Fast computation and space efficient


Implementation of MHC

  • Implementation into Transbase® DBMS kernel
      - Computation and maintenance of MHC encoding
      - Integration into DDL and DML
      - Integration into optimizer
      - Integration into archiving tools
  • Transparency to users
      - Physical optimization
      - No extension of the DML


Supported schemata

  • Support of star schema and snowflake schema
  • Star schemata
      - Conventional complete de-normalization of the dimension tables
      - Foreign key relationships between fact table and dimension tables
  • Supported snowflake schemata
      - Inner dimension tables de-normalized with hierarchy attributes
      - Feature attributes can be normalized
      - Fully supported by optimizer
      - More efficient than star schemata (knowledge about hierarchical dependency)


Transbase® DDL extension

Dimension table:

    CREATE TABLE dim_segment (
        country_id      INTEGER NOT NULL,
        country_txt     CHAR(*),
        region_id       INTEGER NOT NULL,
        region_txt      CHAR(*),
        micromarket_id  INTEGER NOT NULL,
        micromarket_txt CHAR(*),
        outlet_id       INTEGER NOT NULL,
        outlet_txt      CHAR(*),
        SURROGATE cs_segment COMPOUND (
            country_id     SIBLINGS 16,
            region_id      SIBLINGS 19,
            micromarket_id SIBLINGS 6,
            outlet_id      SIBLINGS 2202),
        PRIMARY KEY (outlet_id)
    )
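One plausible reading of `SIBLINGS n` is an upper bound on the number of children per parent at that level, so that the level needs the smallest bit width able to hold n distinct codes. This interpretation is an assumption and is not stated on the slide; the helper `bits_for` is hypothetical. Under that assumption, the size of the compound surrogate for dim_segment can be estimated as follows.

```python
# SIBLINGS values copied from the CREATE TABLE statement above.
SIBLINGS = {
    "country_id": 16,
    "region_id": 19,
    "micromarket_id": 6,
    "outlet_id": 2202,
}


def bits_for(siblings):
    """Smallest number of bits that can distinguish `siblings` codes (assumed rule)."""
    return max(1, (siblings - 1).bit_length())


widths = {level: bits_for(n) for level, n in SIBLINGS.items()}
total_bits = sum(widths.values())
# widths     -> country_id 4, region_id 5, micromarket_id 3, outlet_id 12
# total_bits -> 24
```

If this reading holds, the whole four-level path fits into 24 bits, which is in line with the stated goal of a space-efficient encoding.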


Transbase® DDL extension (cont.)

Fact table:

    CREATE TABLE fact (
        dseg  INTEGER REFERENCES dim_segment(outlet_id) ON UPDATE CASCADE,
        dprod INTEGER REFERENCES dim_product(item_id) ON UPDATE CASCADE,
        dtime INTEGER REFERENCES dim_time(day_id) ON UPDATE CASCADE,
        turnover NUMERIC(10,2)
        …
        SURROGATE cs_seg FOR dseg,
        SURROGATE cs_prod FOR dprod,
        SURROGATE cs_time FOR dtime,
        PRIMARY HCKEY (cs_seg, cs_prod, cs_time)
    )


DML

  • No change of DML statements (SELECT, INSERT, UPDATE, DELETE)
  • Conventional star (snowflake) joins (SQL-92 compliant):

    SELECT country, department, category, group,
           year, quarter, month,
           SUM(price), SUM(turnover)
    FROM customer c, product p, date d, fact f
    WHERE f.custkey    = c.customer
      AND f.prodkey    = p.item_key
      AND f.datekey    = d.day
      AND c.country    = 'GERMANY'
      AND c.department = 'SOUTH'
      AND p.category   = 'TV'
      AND d.month      = '10/2002'
      AND d.year       = '2002'
    GROUP BY country, department, category, group, year, quarter, month


Conventional query processing

  • Standard method (non-clustering indexes):
      - Index evaluation of dimension restrictions
      - Fact table tuple materialization
      - Residual join with dimension tables
      - Grouping and aggregating
      - Sorting


MHC query processing: Overview

  • Abstract execution plan (AEP): better understanding, implementation in operator trees
  • Three phases:
      - Interval generation (semi-join)
      - Fact table access
      - Grouping and residual join
  • Optimizing: hierarchical pre-grouping
      - Minimize residual join operations by grouping before joining


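AEP - overview

[Figure: abstract execution plan in two phases. Interval Generation: Create Range operators over the dimension tables Di, Dj. Main Execution Phase: Fact Table Access on Fact, Residual Join with dimension tables Di ... Dk, Group Select, Having, Order By]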


Interval generation

  • Mapping of hierarchical restrictions into a number of intervals
  • Usage of special hierarchy indexes:
      - DXh index: (h_t, h_(t-1), ..., h_1, cs)
      - Efficient interval computation
  • Optimization for feature restrictions:
      - Merging many small intervals into fewer, larger intervals
      - Usage of hierarchical dependency for feature attributes, if supported by the schema (snowflake schemata)
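The merging step can be sketched as a standard interval merge over closed surrogate ranges. This is only an illustration under stated assumptions: the function name `merge_intervals` and the `max_gap` parameter are hypothetical, and bridging a gap presumes that tuples falling into the gap are filtered out again later.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge closed integer intervals that overlap or are directly adjacent.

    max_gap > 0 additionally bridges gaps of up to `max_gap` surrogate values
    (assumption: non-qualifying tuples in the gap are post-filtered).
    """
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1 + max_gap:
            # Extend the previous interval instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


# Two sibling product groups (32..47 and 48..63) collapse into one interval;
# the unrelated range 128..143 stays separate.
exact = merge_intervals([(48, 63), (32, 47), (128, 143)])   # [(32, 63), (128, 143)]
bridged = merge_intervals([(0, 3), (10, 12)], max_gap=6)    # [(0, 12)]
```

Fewer intervals mean fewer query boxes in the subsequent fact table access, at the price of possible post-filtering when gaps are bridged.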


Fact table access

  • Combination of intervals of all clustering dimensions forms multidimensional query boxes QB_i
  • Fact table access with implicit tuple materialization
      - Sequential processing of query boxes
      - Fast retrieval of result tuples
      - Post-filtering can be necessary depending on the UB-Tree dimensions and restrictions
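Forming the query boxes amounts to taking the Cartesian product of the per-dimension interval lists. The sketch below illustrates this with made-up interval values; `query_boxes` and `in_box` are hypothetical helper names, and the actual UB-Tree range query algorithm is not shown.

```python
from itertools import product


def query_boxes(intervals_per_dimension):
    """Combine per-dimension intervals into multidimensional query boxes."""
    return [tuple(box) for box in product(*intervals_per_dimension)]


def in_box(point, box):
    """True if every coordinate of `point` lies inside the box's interval."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(point, box))


# Illustrative surrogate intervals for the segment, product and time dimensions.
seg = [(0, 15)]
prod = [(32, 47), (128, 143)]
time = [(300, 330)]

boxes = query_boxes([seg, prod, time])
# boxes[0] == ((0, 15), (32, 47), (300, 330))
# boxes[1] == ((0, 15), (128, 143), (300, 330))
```

The number of boxes is the product of the interval counts per dimension, which is why merging intervals beforehand pays off.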


Standard AEP

[Figure: standard abstract execution plan; operators shown: Fact Table Access on Fact, Predicate Evaluation, Residual Join with dimension tables Di ... Dk, Group Select, Having, Order By]


Optimization: Hierarchical pre-grouping

  • Basic concept
      - Hierarchy encoding stored in fact table (compound surrogates)
      - Groups of hierarchical GROUP BY attributes built from compound surrogates
      - Grouping not exact for non-prefix path grouping
      - Drastic reduction of fact table result tuples
  • Example (for hierarchy year – month – day):
      - Number of fact table result tuples: 100,000
      - Pre-grouping (on month): approx. 3,000 (aggregated) tuples
      - Residual join with 3,000 instead of 100,000 tuples
      - Reduction by a factor of about 30
  • Post-grouping may be necessary if the pre-grouping is too fine
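The idea of grouping on a surrogate prefix before joining can be sketched as follows. The layout of the time surrogate (day in the lowest 5 bits, month prefix above it) and the names `DAY_BITS`, `pre_group` and `residual_join` are assumptions for illustration only.

```python
from collections import defaultdict

DAY_BITS = 5  # assumed width of the day level inside the time surrogate


def pre_group(fact_rows, drop_bits=DAY_BITS):
    """Sum the measure per surrogate prefix (here: per month) without any join."""
    sums = defaultdict(float)
    for cs_time, turnover in fact_rows:
        sums[cs_time >> drop_bits] += turnover   # cut off the day bits
    return dict(sums)


def residual_join(groups, month_names):
    """Join the few pre-grouped rows with the dimension data."""
    return {month_names[prefix]: total for prefix, total in groups.items()}


# Three fact tuples: two days in month prefix 10, one day in month prefix 11.
fact_rows = [((10 << 5) | 1, 5.0), ((10 << 5) | 2, 7.5), ((11 << 5) | 1, 1.0)]
month_names = {10: "10/2002", 11: "11/2002"}   # stand-in for the dimension table

groups = pre_group(fact_rows)                  # {10: 12.5, 11: 1.0}
result = residual_join(groups, month_names)    # {"10/2002": 12.5, "11/2002": 1.0}
```

The residual join then touches one row per month prefix instead of one row per fact tuple, which is the source of the reduction in the example above.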


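Hierarchical pre-grouping (cont.)

[Figure: abstract execution plan with hierarchical pre-grouping; operators shown: Fact Table Access on Fact, Predicate Evaluation, Pre-Group, Residual Join, Post-Group, Residual Join, Having, Order By; dimension tables De1 ... Dei and Dl1 ... Dln feed the two Residual Join operators]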


Performance comparison

  • Data:
      - Real-world data warehouse of an electronics retailer in Greece
      - 5 dimensions, 49 measures on fact table
      - 3 years of transactions, i.e., 8.5 million fact table tuples (2.8 GB)
  • Environment:
      - Dual-processor Pentium II (400 MHz), 768 MB RAM, Windows 2000
  • Queries:
      - 22 query classes with 1,320 real-world user queries
  • Comparisons:
      - MHC versus no multidimensional clustering
      - Conventional grouping versus hierarchical pre-grouping


Performance comparison: MHC vs. no clustering

Time of fact tuple access in seconds, by fact table selectivity (FT Sel. %):

  FT Sel. %    [0.0-0.1]      [0.1-1.0]      [1.0-5.0]
               STAR   AEP     STAR   AEP     STAR   AEP
  MIN             0     0       65     2      274    11
  MAX            30     6      290     9     1219    47
  MEDIAN          1     1      182     8      477    23
  STD-DEV         5     1       76     3      346    14


Performance comparison: no pre-grouping vs. pre-grouping

Comparison of grouping cardinality: no pre-grouping / hierarchical pre-grouping

  FT Sel. %             All   [0.0-0.25]   [0.25-1.0]   [1.0-10.0]
  MIN                   3.6          3.6         21.3         46.0
  1st quartile        245.8        135.1        911.3        816.2
  MEDIAN            1,139.5        531.6      2,270.4      5,938.9
  3rd quartile      4,708.0      1,905.6      9,747.5     25,409.6
  MAX             593,280.0     19,340.0     78,384.0    593,280.0


Performance comparison: no pre-grouping vs. pre-grouping (cont.)

Speedup of the time of hierarchical pre-grouping

  FT Sel. %         All   [0.0-0.25]   [0.25-1.0]   [1.0-10.0]
  MIN               0.3          0.3          0.8          0.6
  1st quartile      3.0          2.4          3.9          4.6
  MEDIAN            4.4          3.6          5.8          6.6
  3rd quartile      6.5          5.2          7.2          7.8
  MAX              25.5         14.3         25.5         12.6


Summary

  • MHC: Multidimensional hierarchical clustering
      - Encoding for hierarchy paths, in order to support clustering multidimensional indexes
      - Support of star and snowflake schemata
  • Full implementation into Transbase®
      - Integration into the query processor (maintenance of compound surrogates)
      - Integration into the optimizer (interval generation, fact table access, hierarchical pre-grouping)
  • Significant speedup of performance:
      - Clustering vs. non-clustering organization: factor 2-20
      - Conventional grouping vs. hierarchical pre-grouping: factor 4-7


Questions?

Everything clear?

Otherwise contact:

Roland Pieringer

Tel: 089/62709-0

Transaction Software GmbH

[email protected]

www.transaction.de