The Generalized Pre-Grouping Transformation: Aggregate Query Optimization in the Presence of Dependencies
Aris Tsois, Timos Sellis
Knowledge and Database Systems Laboratory
National Technical University of Athens, Hellas
29th International Conference on Very Large Databases, September 9-12, 2003
Updated version: DBLAB presentation 14/10/2003




A. Tsois, NTUA, 2003

Motivation
• DW & OLAP require fast answers to heavy aggregate queries
• Recent approach: Multidimensional Hierarchical Clustering & Hierarchical Indexing (MHC/HI) [IDEAS’99]
  • Indexed access to multidimensionally clustered data (UB-tree)
  • Results in a reduced number of I/O operations
• MHC/HI is extremely effective, but more can be done to make query processing efficient
  • One such technique is Hierarchical Pre-Grouping


The Hierarchical Pre-Grouping
• Karayannidis et al. [VLDB’02], Pieringer et al. [ICDE’03]
• Goal: reduce the cost of join operations in star-join queries on MHC/HI databases
  • Joining large parts of the fact table with large dimension tables is very expensive
• Exploits the existence of hierarchical surrogates in the fact table
• Pushes the aggregation operations down as early as possible
  • Even if this introduces a new aggregation operation
• Delays (pulls up) the joins
  • The delayed joins are expected to have a reduced input size
• Removes redundant joins


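Example schema
[Figure: star schema. The fact table SALES_FACT (product_key, store_key, date_key, prod_hsk, store_hsk, date_hsk, sales) is connected to the dimension tables PRODUCT (product, brand, category, hsk), DATE (day, month, year, hsk) and STORE (store, area, region, hsk). The levels of each dimension hierarchy are numbered in the figure.]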


The hierarchical surrogates
[Figure: the example schema annotated with sample hierarchical surrogate (hsk) values. The codes of the hierarchy levels are concatenated: 07 and 13 give 0713; 95, 27 and 03 give 952703. The value 952703 is shown again next to the tables.]
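The concatenation shown on this slide can be sketched in a few lines of Python. This is only an illustration: the fixed two-digit width, the coarsest-level-first order and the helper names `make_hsk` and `hsk_prefix` are assumptions, not the encoding actually used by MHC/HI.

```python
def make_hsk(level_codes, width=2):
    """Concatenate fixed-width level codes, coarsest level first (assumed encoding)."""
    return "".join(str(code).zfill(width) for code in level_codes)


def hsk_prefix(hsk, depth, width=2):
    """Keep only the first `depth` levels of an h-surrogate."""
    return hsk[: depth * width]


two_level = make_hsk([7, 13])        # "0713", the two-level value on the slide
three_level = make_hsk([95, 27, 3])  # "952703", the three-level value on the slide

# Two members that share their two upper levels share the same prefix,
# so grouping on the prefix groups by the upper hierarchy level.
sibling = make_hsk([95, 27, 4])
same_parent = hsk_prefix(three_level, 2) == hsk_prefix(sibling, 2)
```

Under this assumed encoding, a notation such as date_hsk:month can be read as "the prefix of date_hsk up to the month level".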


Example query
[Figure: the example schema with the query marked on it. Selected attributes: SUM(sales), brand, area. Grouping attributes: brand, area, month. The fact table is joined with the dimension tables.]

SELECT SUM(sales), brand, area
FROM SALES_FACT, STORE, PRODUCT, DATE
WHERE <join conditions>
GROUP BY brand, area, month
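To make the example query concrete, the sketch below runs it with Python's `sqlite3` module on a tiny in-memory database. The rows, the key-equality join conditions standing in for `<join conditions>`, and the reduced column lists are all made up for illustration; `"DATE"` is quoted only to keep the table name unambiguous.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE PRODUCT (product_key INTEGER PRIMARY KEY, brand TEXT, category TEXT);
    CREATE TABLE STORE   (store_key INTEGER PRIMARY KEY, area TEXT, region TEXT);
    CREATE TABLE "DATE"  (date_key INTEGER PRIMARY KEY, month TEXT, year TEXT);
    CREATE TABLE SALES_FACT (product_key INTEGER, store_key INTEGER,
                             date_key INTEGER, sales INTEGER);
    -- Hypothetical sample rows
    INSERT INTO PRODUCT VALUES (1,'BrandA','c1'), (2,'BrandA','c1'), (3,'BrandB','c1');
    INSERT INTO STORE   VALUES (1,'North','r1'), (2,'North','r1'), (3,'South','r1');
    INSERT INTO "DATE"  VALUES (1,'2003-01','2003'), (2,'2003-01','2003'), (3,'2003-02','2003');
    INSERT INTO SALES_FACT VALUES (1,1,1,10), (2,2,2,5), (1,1,3,7),
                                  (3,3,1,4), (3,3,2,6), (2,1,1,1);
""")

# The slide's query, with the join conditions assumed to be key equalities.
rows = con.execute("""
    SELECT SUM(f.sales), p.brand, s.area
    FROM SALES_FACT f, STORE s, PRODUCT p, "DATE" d
    WHERE f.store_key = s.store_key
      AND f.product_key = p.product_key
      AND f.date_key = d.date_key
    GROUP BY p.brand, s.area, d.month
    ORDER BY p.brand, s.area, d.month
""").fetchall()
```

Note that month is a grouping attribute but not a selected attribute, so the result can contain several rows with the same brand and area.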


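Simple execution plan
[Figure: plan tree. SALES_FACT is joined with DATE, PRODUCT and STORE (three joins). A final Group By & Aggregate on brand, area, month computes s=SUM(sales) and outputs brand, area, s. The figure also carries the annotation date_hsk:month.]
Can we avoid some of the joins?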


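Optimized plan (#1)
[Figure: plan tree. SALES_FACT is joined with PRODUCT and STORE only (two joins). The Group By & Aggregate now groups on brand, area, date_hsk:month, computes s=SUM(sales) and outputs brand, area, s. The figure also carries the annotation store_hsk:area.]
Can we delay a join until after the aggregation?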
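The effect of Optimized plan (#1) can be checked on a small example. The sketch below is reduced to the DATE dimension only and assumes a six-character date_hsk whose first four characters identify the month (a made-up encoding); it also assumes every fact row has a matching DATE row and that a month belongs to exactly one year.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE "DATE" (date_key INTEGER PRIMARY KEY, hsk TEXT, month TEXT);
    CREATE TABLE SALES_FACT (date_key INTEGER, date_hsk TEXT, sales INTEGER);
    -- Hypothetical rows; hsk = year code + month code + day code
    INSERT INTO "DATE" VALUES (1,'030105','2003-01'), (2,'030120','2003-01'),
                              (3,'030201','2003-02');
    INSERT INTO SALES_FACT VALUES (1,'030105',10), (2,'030120',5),
                                  (3,'030201',7), (1,'030105',1);
""")

# Simple plan: join with DATE just to obtain the grouping attribute month.
with_join = con.execute("""
    SELECT SUM(f.sales)
    FROM SALES_FACT f, "DATE" d
    WHERE f.date_key = d.date_key
    GROUP BY d.month
    ORDER BY d.month
""").fetchall()

# Plan #1: group on date_hsk:month (the month-level prefix) and skip the join.
without_join = con.execute("""
    SELECT SUM(sales)
    FROM SALES_FACT
    GROUP BY substr(date_hsk, 1, 4)
    ORDER BY substr(date_hsk, 1, 4)
""").fetchall()
```

The join can be dropped here because month is only grouped on, never returned, so any attribute that induces the same groups will do.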


Optimized plan (#2)
[Figure: plan tree. SALES_FACT is joined with PRODUCT. A Group By & Aggregate on brand, store_hsk:area, date_hsk:month computes s=SUM(sales). Its result is then joined with STORE, which is first reduced by a Group By to (hsk:area, area), and the plan outputs brand, area, s.]
Can we make the join work on a smaller intermediate result?
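A small check of the delayed join in Optimized plan (#2), reduced to the STORE dimension. As before, the six-character hsk whose first four characters identify the area is an assumed encoding, and the rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE STORE (store_key INTEGER PRIMARY KEY, hsk TEXT, area TEXT);
    CREATE TABLE SALES_FACT (store_key INTEGER, store_hsk TEXT, sales INTEGER);
    -- Hypothetical rows; hsk = region code + area code + store code
    INSERT INTO STORE VALUES (1,'010101','North'), (2,'010102','North'),
                             (3,'010201','South');
    INSERT INTO SALES_FACT VALUES (1,'010101',10), (2,'010102',5), (3,'010201',4),
                                  (3,'010201',6), (1,'010101',1);
""")

# Join first, then aggregate.
early_join = con.execute("""
    SELECT s.area, SUM(f.sales)
    FROM SALES_FACT f, STORE s
    WHERE f.store_key = s.store_key
    GROUP BY s.area
    ORDER BY s.area
""").fetchall()

# Plan #2: aggregate on store_hsk:area first, then join with the
# distinct (hsk:area, area) pairs of STORE to recover the area name.
late_join = con.execute("""
    SELECT a.area, g.s
    FROM (SELECT substr(store_hsk, 1, 4) AS hsk_area, SUM(sales) AS s
          FROM SALES_FACT
          GROUP BY substr(store_hsk, 1, 4)) g,
         (SELECT DISTINCT substr(hsk, 1, 4) AS hsk_area, area FROM STORE) a
    WHERE g.hsk_area = a.hsk_area
    ORDER BY a.area
""").fetchall()
```

Unlike month, area is returned by the query, so the join with STORE cannot be removed; it can only be moved above the aggregation, where it sees one row per group instead of one row per fact.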


Optimized plan (#3)
[Figure: plan tree. A first Group By & Aggregate on SALES_FACT by prod_hsk, store_hsk:area, date_hsk:month computes x=SUM(sales). The result is joined with PRODUCT. A second Group By & Aggregate by brand, store_hsk:area, date_hsk:month computes s=SUM(x). Finally, the join with STORE, reduced by a Group By to (hsk:area, area), outputs brand, area, s.]
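The two-stage grouping of Optimized plan (#3) can be sketched for the PRODUCT dimension alone. The rows are invented, and joining the pre-grouped result with PRODUCT on prod_hsk = hsk is an assumption made for this sketch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE PRODUCT (product_key INTEGER PRIMARY KEY, hsk TEXT, brand TEXT);
    CREATE TABLE SALES_FACT (product_key INTEGER, prod_hsk TEXT, sales INTEGER);
    -- Hypothetical rows
    INSERT INTO PRODUCT VALUES (1,'010101','BrandA'), (2,'010102','BrandA'),
                               (3,'010201','BrandB');
    INSERT INTO SALES_FACT VALUES (1,'010101',10), (1,'010101',7), (2,'010102',5),
                                  (3,'010201',4), (3,'010201',6);
""")

# One grouping after the join.
single_stage = con.execute("""
    SELECT p.brand, SUM(f.sales)
    FROM SALES_FACT f, PRODUCT p
    WHERE f.product_key = p.product_key
    GROUP BY p.brand
    ORDER BY p.brand
""").fetchall()

# Plan #3: x = SUM(sales) per prod_hsk before the join, s = SUM(x) per brand after it.
two_stage = con.execute("""
    SELECT p.brand, SUM(g.x)
    FROM (SELECT prod_hsk, SUM(sales) AS x
          FROM SALES_FACT
          GROUP BY prod_hsk) g,
         PRODUCT p
    WHERE g.prod_hsk = p.hsk
    GROUP BY p.brand
    ORDER BY p.brand
""").fetchall()

# Rows entering the join in each plan.
fact_rows = con.execute("SELECT COUNT(*) FROM SALES_FACT").fetchone()[0]
pregrouped_rows = con.execute(
    "SELECT COUNT(*) FROM (SELECT prod_hsk FROM SALES_FACT GROUP BY prod_hsk)"
).fetchone()[0]
```

On this toy data the join input shrinks from five rows to three; the benefit on real data depends on how many fact rows share a product.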


Hierarchical Pre-Grouping
• Classification of the effects:
  • Remove a join with a dimension table (DATE)
  • Postpone a join until after the grouping operation (STORE)
  • Introduce an additional grouping operation before all joins, creating a two-stage grouping process (PRODUCT)
• Experimental results show an important impact:
  • Reduces response time by more than 50% to 75% (Karayannidis et al. [VLDB’02], Pieringer et al. [ICDE’03])


Motivating questions
• Can Pre-Grouping be applied to other database schemata without h-surrogates?
• What are the precise conditions required to apply the transformations done by Pre-Grouping?
• Is Pre-Grouping a combination of known optimization techniques or does it introduce some novelty?


Main results
• Define the Generalized Pre-Grouping as an algebraic transformation: E1=E2
  • Using Select (σ), Cross-product (×) and Generalized Projection (Л) operators
• Decompose Pre-Grouping into a sequence of simpler transformations
• Analyze the relationship between Pre-Grouping and other known transformations
• Clarify which transformations use semantic information (integrity constraints, I.C.)
• Establish the importance of the Surrogate-Join transformation
• Identify sufficient conditions for applying the various transformations
• Use relations with bag semantics and NULL values


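Generalized Pre-Grouping (1)
Case #1: Remove redundant join
[Figure: plan before and after the transformation. Ru has attributes Hu1, SHu1, Hu2, SHu2, Hu3, SHu3, Su, …; Rd3 has attributes Kd3, SKd3, Sd3, …. Before: Ru is joined with Rd3 and a Group By & Aggregate on Su, Sd3 computes A=F(Ag). After: the join with Rd3 is gone and SHu3 takes the place of Sd3.]
Conditions used (the operator symbols of I1 and I2 are missing from the extracted text):
  I1: Sd3 … SKd3
  I2: {Kd3, SKd3} … {Hu3, SHu3}
  I3: Kd3 is a key of Rd3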


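Generalized Pre-Grouping (2)
Case #2: Delay a join
[Figure: plan before and after the transformation. Ru has attributes Hu1, SHu1, Hu2, SHu2, Hu3, SHu3, Su, …; Rd2 has attributes Kd2, SKd2, Sd2, …. Before: Ru is joined with Rd2 and a Group By & Aggregate on Su, Sd2 computes A=F(Ag). After: the Group By & Aggregate runs first on Su, SHu2; its result (Su, SHu2, A) is then joined with Rd2, which is reduced by a Group By & Aggregate to (Sd2, SKd2).]
Conditions used (the operator symbols of I4 and I6 are missing from the extracted text):
  I4: {Kd2, SKd2} … {Hu2, SHu2}
  I5: Kd2 is a key of Rd2
  I6: Sd2 … SKd2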


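Generalized Pre-Grouping (3)
Case #3: Split aggregation into two stages
[Figure: plan before and after the transformation. Ru has attributes Hu1, SHu1, Hu2, SHu2, Hu3, SHu3, Su, …; Rd1 has attributes Kd1, SKd1, Sd1, …. Before: Ru is joined with Rd1 and a Group By & Aggregate on Su, Sd1 computes A=F(Ag). After: a first Group By & Aggregate on Su, SHu1 computes x=F(Ag); the result is joined with Rd1, reduced by a Group By & Aggregate to (Sd1, SKd1); a second Group By & Aggregate on Su, Sd1 computes A=F(AgO(x)).]
Conditions used (the operator symbol of I8 is missing from the extracted text):
  I7: Ag(z) = AgO(Ag(z))
  I8: {Kd1, SKd1} … {Hu1, SHu1}
  I9: Kd1 is a key of Rd1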
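Condition I7 asks that the aggregate can be computed from partial aggregates. The Python sketch below illustrates this with a few standard aggregates; the table `DECOMPOSITIONS` and the two helper functions are illustrative names, not part of the presentation, and the list of aggregates is not claimed to be the one used by the authors.

```python
# (inner aggregate Ag, outer aggregate AgO) pairs for which aggregating
# the partial results gives the same answer as aggregating everything at once.
DECOMPOSITIONS = {
    "SUM": (sum, sum),
    "MIN": (min, min),
    "MAX": (max, max),
    "COUNT": (len, sum),  # total count = sum of the partial counts
}


def single_stage(partitions, name):
    """Apply Ag once to all values."""
    inner, _ = DECOMPOSITIONS[name]
    return inner([v for part in partitions for v in part])


def two_stage(partitions, name):
    """Apply Ag per partition, then AgO to the partial results."""
    inner, outer = DECOMPOSITIONS[name]
    return outer([inner(part) for part in partitions])


# Hypothetical sales values, partitioned as an early Group By would partition them.
partitions = [[10, 7], [5], [4, 6]]
results = {name: (single_stage(partitions, name), two_stage(partitions, name))
           for name in DECOMPOSITIONS}
```

An average does not fit this pattern directly, since the average of partial averages generally differs from the overall average; it is usually handled by carrying a sum and a count through the first stage.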


Generalized Pre-Grouping
• The combination of all three cases defines the Generalized Pre-Grouping
• The decomposition establishes:
  • A set of sufficient conditions for applying the Generalized Pre-Grouping transformation
  • The relationship to other known transformations
  • The usage of semantic information
• The Generalized Pre-Grouping uses Surrogate-Join to modify the join conditions


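Surrogate-Join transformation
[Figure: R1(H, SH, A, O) and R2(K, SK, B, P) are joined to output A, B. In the transformed plan R1 is projected on SH, A; R2 is reduced by a Group By & Aggregate on SK to (B, SK); the join is on the surrogates SH and SK.]

Л_{A,B}(σ_{H=K}(R1 × R2)) = Л_{A,B}(σ_{SH=SK}(Л_{SH,A}(R1) × Л_{SK,B}(R2)))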


Surrogate-Join example
[Figure: Page_Server_Hits (PageID, PageHits, ServerID, ServerHits) is joined with Page_Hour_Hits (PageID, ServerID, Hour, HourPageHits) to output PageID, Hour, HourPageHits/ServerHits. In the transformed plan Page_Server_Hits is first reduced by a Group By & Aggregate on ServerID to (ServerID, ServerHits).]

SELECT h.PageID, h.Hour, h.HourPageHits/s.ServerHits
FROM (SELECT DISTINCT ServerID, ServerHits
      FROM Page_Server_Hits) s, Page_Hour_Hits h
WHERE s.ServerID=h.ServerID

SELECT h.PageID, h.Hour, h.HourPageHits/s.ServerHits
FROM Page_Server_Hits s, Page_Hour_Hits h
WHERE s.PageID=h.PageID
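The two queries of this example can be compared on a small database. The sketch below uses invented rows and assumes the dependencies the transformation relies on: PageID is a key of Page_Server_Hits, a page lives on one server, ServerID determines ServerHits, and every (PageID, ServerID) pair in Page_Hour_Hits also appears in Page_Server_Hits. ServerHits is declared REAL only so that the division is not truncated.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Page_Server_Hits (PageID INTEGER PRIMARY KEY, PageHits INTEGER,
                                   ServerID INTEGER, ServerHits REAL);
    CREATE TABLE Page_Hour_Hits (PageID INTEGER, ServerID INTEGER,
                                 Hour INTEGER, HourPageHits INTEGER);
    -- Hypothetical rows that satisfy the assumed dependencies
    INSERT INTO Page_Server_Hits VALUES (1,100,10,1000), (2,50,10,1000), (3,30,20,200);
    INSERT INTO Page_Hour_Hits VALUES (1,10,0,40), (1,10,1,60), (2,10,0,50), (3,20,5,30);
""")

# Join on the key PageID.
join_on_key = con.execute("""
    SELECT h.PageID, h.Hour, h.HourPageHits / s.ServerHits
    FROM Page_Server_Hits s, Page_Hour_Hits h
    WHERE s.PageID = h.PageID
    ORDER BY h.PageID, h.Hour
""").fetchall()

# Join on the surrogate ServerID, after reducing Page_Server_Hits
# to one row per server.
join_on_surrogate = con.execute("""
    SELECT h.PageID, h.Hour, h.HourPageHits / s.ServerHits
    FROM (SELECT DISTINCT ServerID, ServerHits FROM Page_Server_Hits) s,
         Page_Hour_Hits h
    WHERE s.ServerID = h.ServerID
    ORDER BY h.PageID, h.Hour
""").fetchall()
```

If one of the assumed dependencies is violated, for example two different ServerHits values for the same ServerID, the second query returns extra rows and the two results no longer agree.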


Bad news: Surrogate-Join can be described as a conjunctive query transformation
[Figure, built up over three slides: R1(H, SH, A, O) is joined with R2(K, SK, B, P) to output A, B. R2 is then used as two copies, R21 and R22, with the join conditions H=K & SH=SK and SH=SK; R21 contributes only K, SK. In the last step R1 is joined with R22 on SH=SK.]


Conclusions
• The Pre-Grouping transformation is a mixture of known and new transformations
• The Generalized Pre-Grouping can be applied in the absence of h-surrogates using only SQL integrity constraints
• The Surrogate-Join transformation is an important ingredient of Pre-Grouping. It exploits functional and inclusion dependencies
• Semantic Query Optimization techniques are particularly effective in the DW & OLAP areas


Contact
Aris Tsois
Knowledge and Database Systems Laboratory
National Technical University of Athens, Hellas
e-mail: [email protected]
http://www.dblab.ece.ntua.gr/~atsois/
Long version (TR-2003-4) available at: http://www.dblab.ece.ntua.gr/publications/TR-2003-4.pdf