Transcript
Page 1: The Generalized Pre-Grouping Transformation: Aggregate Query Optimization in the Presence of Dependencies Aris Tsois, Timos Sellis Knowledge and Database

The Generalized Pre-Grouping Transformation: Aggregate Query Optimization in the Presence of Dependencies

Aris Tsois, Timos Sellis

Knowledge and Database Systems Laboratory

National Technical University of Athens, Hellas

29th International Conference on Very Large Databases, September 9-12, 2003

Updated version: DBLAB presentation 14/10/2003

Page 2:

A. Tsois, NTUA, 2003

Motivation

DW & OLAP require fast answers to heavy aggregate queries

Recent approach: Multidimensional Hierarchical Clustering & Hierarchical Indexing (MHC/HI) [IDEAS’99]

  Indexed access to multidimensionally clustered data (UB-tree)

  Results in a reduced number of I/O operations

MHC/HI is extremely effective, but even more can be done to make query processing efficient

One such technique is Hierarchical Pre-Grouping

Page 3:

The Hierarchical Pre-Grouping

Karayannidis et al. [VLDB’02], Pieringer et al. [ICDE’03]

Goal: reduce the cost of join operations in star-join queries on MHC/HI databases

  The join of large parts of the fact table with large dimension tables is very expensive

Exploits the existence of hierarchical surrogates in the fact table

Pushes down the aggregation operations as early as possible

  Even if this introduces a new aggregation operation

Delays (pulls up) the joins

  The delayed joins are expected to have a reduced input size

Removes redundant joins

Page 4:

Example schema

[Figure: star schema. The fact table SALES_FACT (store_key, date_key, product_key, sales, prod_hsk, store_hsk, date_hsk) references three dimension tables: PRODUCT (category, brand, product, hsk), DATE (year, month, day, hsk) and STORE (region, area, store, hsk).]

Page 5:

The hierarchical surrogates

[Figure: the example schema with sample hierarchical surrogate (hsk) values. Per-level codes are concatenated into one surrogate, e.g. 07 and 13 give 0713, and 95, 27 and 03 give 952703. The value 952703 appears both in a dimension table and in the fact table.]

Page 6:

Example query

[Figure: the example schema annotated with the query. Selected attributes: SUM(sales), brand, area. Grouping attributes: brand, area, month. SALES_FACT is joined with STORE, PRODUCT and DATE.]

SELECT SUM(sales), brand, area
FROM SALES_FACT, STORE, PRODUCT, DATE
WHERE <join conditions>
GROUP BY brand, area, month
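The example query can be tried on a toy instance of the schema. The following is a minimal sketch in Python using the standard sqlite3 module; the sample rows, the reduced column lists and the concrete join conditions (equality on the *_key columns) are assumptions made for illustration, since the slide only writes <join conditions>.

```python
import sqlite3

# Toy instance of the star schema (sample data and reduced columns are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE STORE (store_key INTEGER PRIMARY KEY, area TEXT);
CREATE TABLE DATE (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE SALES_FACT (product_key INTEGER, store_key INTEGER,
                         date_key INTEGER, sales INTEGER);
INSERT INTO PRODUCT VALUES (1, 'Acme'), (2, 'Acme'), (3, 'Zeta');
INSERT INTO STORE VALUES (1, 'North'), (2, 'South');
INSERT INTO DATE VALUES (1, '2003-01'), (2, '2003-01'), (3, '2003-02');
INSERT INTO SALES_FACT VALUES
  (1, 1, 1, 10), (2, 1, 2, 5), (3, 1, 1, 7), (1, 2, 3, 4), (1, 1, 3, 2);
""")

# The query from the slide, with assumed key-equality join conditions.
rows = sorted(conn.execute("""
    SELECT SUM(f.sales), p.brand, s.area
    FROM SALES_FACT f, STORE s, PRODUCT p, DATE d
    WHERE f.store_key = s.store_key
      AND f.product_key = p.product_key
      AND f.date_key = d.date_key
    GROUP BY p.brand, s.area, d.month
""").fetchall())

for row in rows:
    print(row)
```

Because month is a grouping attribute but not a selected one, the same (brand, area) pair can appear in several result rows, one per month.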

Page 7:

Simple execution plan

[Figure: plan tree. SALES_FACT is joined with STORE, PRODUCT and DATE (three joins). A Group By & Aggregate operator on top groups by brand, area, month, computes s=SUM(sales) and outputs brand, area, s. The annotation date_hsk:month marks the month-level part of date_hsk.]

Can we avoid some of the joins?

Page 8:

Optimized plan (#1)

[Figure: plan tree. SALES_FACT is joined with PRODUCT and STORE only (two joins); the join with DATE is gone. The Group By & Aggregate operator groups by brand, area, date_hsk:month and computes s=SUM(sales). The annotation store_hsk:area marks the area-level part of store_hsk.]

Can we delay a join until after the aggregation?
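The removal of the DATE join can be illustrated on a reduced schema. This is a sketch under stated assumptions, not the encoding used by MHC/HI: it assumes the surrogate is a fixed-width string whose first four characters identify the month, and it keeps only the DATE dimension so that the effect is easy to see. The substr call stands in for the date_hsk:month notation of the plan.

```python
import sqlite3

# Assumed encoding: date_hsk is 'YYMMDD', so its first 4 characters identify the month.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DATE (date_key INTEGER PRIMARY KEY, hsk TEXT, month TEXT);
CREATE TABLE SALES_FACT (date_key INTEGER, date_hsk TEXT, sales INTEGER);
INSERT INTO DATE VALUES
  (1, '030115', '2003-01'), (2, '030120', '2003-01'), (3, '030203', '2003-02');
INSERT INTO SALES_FACT VALUES
  (1, '030115', 10), (2, '030120', 5), (3, '030203', 4), (1, '030115', 7);
""")

# Plan with the join: month is fetched from the dimension table.
with_join = sorted(r[0] for r in conn.execute("""
    SELECT SUM(f.sales)
    FROM SALES_FACT f, DATE d
    WHERE f.date_key = d.date_key
    GROUP BY d.month"""))

# Plan without the join: group directly on the month-level prefix of the surrogate.
without_join = sorted(r[0] for r in conn.execute("""
    SELECT SUM(sales)
    FROM SALES_FACT
    GROUP BY substr(date_hsk, 1, 4)"""))

print(with_join, without_join)
```

The two plans agree here only because the prefix and the month determine each other in the sample data and month is not part of the output; those are the kinds of dependencies the later slides state as conditions.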

Page 9:

Optimized plan (#2)

[Figure: plan tree. SALES_FACT is joined with PRODUCT, and a Group By & Aggregate operator groups the result by brand, store_hsk:area, date_hsk:month and computes s=SUM(sales). Only then is the result joined with STORE, which is first reduced by its own Group By & Aggregate operator to the pairs (hsk:area, area). The output is brand, area, s.]

Can we make the join work on a smaller intermediate result?

Page 10:

Optimized plan (#3)

[Figure: plan tree. A first Group By & Aggregate operator groups SALES_FACT by prod_hsk, store_hsk:area, date_hsk:month and computes x=SUM(sales). Its result is joined with PRODUCT, and a second Group By & Aggregate operator groups by brand, store_hsk:area, date_hsk:month and computes s=SUM(x). Finally the result is joined with STORE, reduced by Group By & Aggregate to the pairs (hsk:area, area). The output is brand, area, s.]
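The three plan changes can be checked together on a small instance. The sketch below makes several assumptions for brevity: the hsk columns double as join keys of the dimension tables, surrogates are fixed-width strings, the area level is the first two characters of store_hsk and the month level is the first four characters of date_hsk. All sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PRODUCT (hsk TEXT PRIMARY KEY, brand TEXT);
CREATE TABLE STORE (hsk TEXT PRIMARY KEY, area TEXT);
CREATE TABLE DATE (hsk TEXT PRIMARY KEY, month TEXT);
CREATE TABLE SALES_FACT (prod_hsk TEXT, store_hsk TEXT, date_hsk TEXT, sales INTEGER);
INSERT INTO PRODUCT VALUES ('p1', 'Acme'), ('p2', 'Acme'), ('p3', 'Zeta');
INSERT INTO STORE VALUES ('0101', 'North'), ('0102', 'North'), ('0201', 'South');
INSERT INTO DATE VALUES
  ('030115', '2003-01'), ('030120', '2003-01'), ('030203', '2003-02');
INSERT INTO SALES_FACT VALUES
  ('p1', '0101', '030115', 10), ('p2', '0102', '030120', 5),
  ('p3', '0101', '030115', 7),  ('p1', '0201', '030203', 4),
  ('p1', '0101', '030203', 2),  ('p1', '0101', '030115', 3);
""")

# Simple plan: join everything, then group.
simple = sorted(conn.execute("""
    SELECT p.brand, s.area, SUM(f.sales)
    FROM SALES_FACT f, PRODUCT p, STORE s, DATE d
    WHERE f.prod_hsk = p.hsk AND f.store_hsk = s.hsk AND f.date_hsk = d.hsk
    GROUP BY p.brand, s.area, d.month""").fetchall())

# Plan #3: pre-group the fact table, join PRODUCT, group again, join STORE last.
optimized = sorted(conn.execute("""
    SELECT g.brand, a.area, g.s
    FROM (SELECT p.brand AS brand, pre.area_hsk AS area_hsk, SUM(pre.x) AS s
          FROM (SELECT prod_hsk,
                       substr(store_hsk, 1, 2) AS area_hsk,
                       substr(date_hsk, 1, 4) AS month_hsk,
                       SUM(sales) AS x
                FROM SALES_FACT
                GROUP BY prod_hsk, substr(store_hsk, 1, 2), substr(date_hsk, 1, 4)) pre,
               PRODUCT p
          WHERE pre.prod_hsk = p.hsk
          GROUP BY p.brand, pre.area_hsk, pre.month_hsk) g,
         (SELECT DISTINCT substr(hsk, 1, 2) AS area_hsk, area FROM STORE) a
    WHERE g.area_hsk = a.area_hsk""").fetchall())

print(simple == optimized)
```

The DATE table never appears in the second query, STORE is joined only after the final grouping, and PRODUCT is joined against the pre-grouped fact rows rather than the raw ones.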

Page 11:

Hierarchical Pre-Grouping

Classification of the effects:

  Remove a join with a dimension table (DATE)

  Postpone a join until after the grouping operation (STORE)

  Introduce an additional grouping operation before all joins, thus creating a two-stage grouping process (PRODUCT)

Experimental results show an important impact:

  Reduces response time by more than 50% - 75% (Karayannidis et al. [VLDB’02], Pieringer et al. [ICDE’03])

Page 12:

Motivating questions

Can Pre-Grouping be applied to other database schemata without h-surrogates?

What are the precise conditions required to apply the transformations done by Pre-Grouping?

Is Pre-Grouping a combination of known optimization techniques or does it introduce some novelty?

Page 13:

Main results

Define the Generalized Pre-Grouping as an algebraic transformation: E1=E2

  Using Select (σ), Cross-product (×) and Generalized Projection (Л) operators

Decompose Pre-Grouping into a sequence of simpler transformations

  Analyze the relationship between Pre-Grouping and other known transformations

  Clarify which transformations use semantic information (integrity constraints, I.C.)

  Establish the importance of the Surrogate-Join transformation

Identify sufficient conditions for applying the various transformations

Use relations with bag semantics and NULL values

Page 14:

Generalized Pre-Grouping (1)

Case #1: Remove redundant join

[Figure: a plan that joins Ru (attributes Hu1, SHu1, Hu2, SHu2, Hu3, SHu3, Su, ...) with Rd3 (attributes Kd3, SKd3, Sd3, ...) and then applies Group By & Aggregate with A=F(Ag), shown next to an equivalent plan without the join to Rd3 that uses SHu3 in place of Sd3. The conditions are labeled I1 (relating Sd3 and SKd3), I2 (relating {Kd3,SKd3} and {Hu3,SHu3}) and I3 (Kd3 key of Rd3).]

Page 15:

Generalized Pre-Grouping (2)

Case #2: Delay a join

[Figure: a plan that joins Ru (attributes Hu1, SHu1, Hu2, SHu2, Hu3, SHu3, Su, ...) with Rd2 (attributes Kd2, SKd2, Sd2, ...) and then applies Group By & Aggregate on Su, Sd2 with A=F(Ag), shown next to an equivalent plan that first applies Group By & Aggregate on Ru using Su, SHu2 and only then joins with Rd2, itself reduced by Group By & Aggregate to (Sd2, SKd2). The output is Su, Sd2, A. The conditions are labeled I4 (relating {Kd2,SKd2} and {Hu2,SHu2}), I5 (Kd2 key of Rd2) and I6 (relating Sd2 and SKd2).]

Page 16:

Generalized Pre-Grouping (3)

Case #3: Split aggregation into two stages

[Figure: a plan that joins Ru (attributes Hu1, SHu1, Hu2, SHu2, Hu3, SHu3, Su, ...) with Rd1 (attributes Kd1, SKd1, Sd1, ...) and then applies Group By & Aggregate on Su, Sd1 with A=F(Ag), shown next to an equivalent two-stage plan. In the two-stage plan a first Group By & Aggregate on Ru groups by Su, SHu1 and computes x=F(Ag); the result is joined with Rd1, reduced by Group By & Aggregate to (Sd1, SKd1); a final Group By & Aggregate on Su, Sd1 computes A=F(AgO(x)).]

I7: Ag(z)=AgO(Ag(z))

I8: relates {Kd1,SKd1} and {Hu1,SHu1}

I9: Kd1 key of Rd1
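Condition I7 is read here as a decomposability requirement: aggregating everything at once must give the same value as aggregating each partition and then combining the partial results with an outer aggregate. The following plain-Python sketch illustrates that reading; the helper two_stage and the sample rows are hypothetical and not taken from the slides.

```python
from collections import defaultdict


def two_stage(rows, inner, outer):
    """Aggregate each partition with `inner`, then combine the partials with `outer`."""
    parts = defaultdict(list)
    for part, value in rows:
        parts[part].append(value)
    return outer(inner(values) for values in parts.values())


# (partition key, measure) pairs; the partition key plays the role of the pre-grouping attributes.
rows = [("p1", 10), ("p1", 3), ("p2", 5), ("p3", 7)]
values = [v for _, v in rows]

total_sum = two_stage(rows, sum, sum)    # SUM is combined by SUM
total_count = two_stage(rows, len, sum)  # COUNT is combined by SUM, not by COUNT
total_max = two_stage(rows, max, max)    # MAX is combined by MAX

print(total_sum == sum(values), total_count == len(values), total_max == max(values))
```

An aggregate such as AVG has no single outer function of this kind; it would have to be rewritten in terms of SUM and COUNT before a two-stage plan applies.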

Page 17:

Generalized Pre-Grouping

The combination of all three cases defines the Generalized Pre-Grouping

The decomposition proves:

  A set of sufficient conditions for applying the Generalized Pre-Grouping transformation

  The relationship to other known transformations

  The usage of semantic information

The Generalized Pre-Grouping uses Surrogate-Join to modify the join conditions

Page 18:

Surrogate-Join transformation

[Figure: two plans producing A, B from R1 (H, SH, A, O) and R2 (K, SK, B, P), where K is the key of R2. The first joins R1 and R2 directly. The second joins the SH, A part of R1 with the result of a Group By & Aggregate operator on SK that reduces R2 to (SK, B).]

Л_{A,B}(σ_{H=K}(R1 × R2)) = Л_{A,B}(σ_{SH=SK}(Л_{A,SH}(R1) × Л_{B,SK}(R2)))

Page 19:

Surrogate-Join example

[Figure: Page_Server_Hits (PageID, PageHits, ServerID, ServerHits) is joined with Page_Hour_Hits (PageID, ServerID, Hour, HourPageHits) to produce PageID, Hour, HourPageHits/ServerHits. In the transformed plan, Page_Server_Hits is first reduced by Group By & Aggregate on ServerID to (ServerID, ServerHits).]

Query after the Surrogate-Join transformation:

SELECT h.PageID, h.Hour, h.HourPageHits/s.ServerHits
FROM (SELECT DISTINCT ServerID, ServerHits
      FROM Page_Server_Hits) s, Page_Hour_Hits h
WHERE s.ServerID=h.ServerID

Original query:

SELECT h.PageID, h.Hour, h.HourPageHits/s.ServerHits
FROM Page_Server_Hits s, Page_Hour_Hits h
WHERE s.PageID=h.PageID
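The two queries can be compared on sample data. The sketch below uses sqlite3 and invented rows, and it assumes the dependencies that make the rewrite plausible: PageID is the key of Page_Server_Hits, ServerID determines ServerHits, and every (PageID, ServerID) pair in Page_Hour_Hits also occurs in Page_Server_Hits. HourPageHits is declared REAL so that the division is not truncated to an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Page_Server_Hits (PageID INTEGER PRIMARY KEY, PageHits INTEGER,
                               ServerID TEXT, ServerHits INTEGER);
CREATE TABLE Page_Hour_Hits (PageID INTEGER, ServerID TEXT, Hour INTEGER,
                             HourPageHits REAL);
-- ServerID -> ServerHits holds: every page on server 'A' carries ServerHits = 300.
INSERT INTO Page_Server_Hits VALUES
  (1, 100, 'A', 300), (2, 200, 'A', 300), (3, 50, 'B', 50);
INSERT INTO Page_Hour_Hits VALUES
  (1, 'A', 9, 30), (1, 'A', 10, 70), (2, 'A', 9, 200), (3, 'B', 9, 50);
""")

# Join on the key PageID.
key_join = sorted(conn.execute("""
    SELECT h.PageID, h.Hour, h.HourPageHits / s.ServerHits
    FROM Page_Server_Hits s, Page_Hour_Hits h
    WHERE s.PageID = h.PageID""").fetchall())

# Join on the surrogate ServerID against the duplicate-free (ServerID, ServerHits) pairs.
surrogate_join = sorted(conn.execute("""
    SELECT h.PageID, h.Hour, h.HourPageHits / s.ServerHits
    FROM (SELECT DISTINCT ServerID, ServerHits FROM Page_Server_Hits) s,
         Page_Hour_Hits h
    WHERE s.ServerID = h.ServerID""").fetchall())

print(key_join == surrogate_join)
```

If the sample data violated one of the assumed dependencies, for example two different ServerHits values for server 'A', the surrogate join would return extra rows and the two results would differ.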

Page 20:

Bad news: Surrogate-Join can be described as a conjunctive query transformation

[Figure: R1 (H, SH, A, O) is joined with R2 (K, SK, B, P) to produce A, B. R2 is also shown as two relations, R21 and R22, joined on SK.]

Page 21:

Bad news: Surrogate-Join can be described as a conjunctive query transformation

[Figure: the plan from the previous slide with R2 replaced by R21 (K, SK) and R22. Join conditions shown: H=K & SH=SK between R1 and R21, and SH=SK towards R22. The output is A, B.]

Page 22:

Bad news: Surrogate-Join can be described as a conjunctive query transformation

[Figure: final plan. R1 (H, SH, A, O) is joined with R22 on SH=SK to produce A, B; R21 (K, SK) is shown alongside.]

Page 23:

Conclusions

The Pre-Grouping transformation is a mixture of known and new transformations

The Generalized Pre-Grouping can be applied in the absence of h-surrogates using only SQL integrity constraints

The Surrogate-Join transformation is an important ingredient of Pre-Grouping. It exploits functional and inclusion dependencies

Semantic Query Optimization techniques are particularly effective in the DW & OLAP areas

Page 24:

Contact

Aris Tsois

Knowledge and Database Systems Laboratory

National Technical University of Athens, Hellas

e-mail: [email protected]

http://www.dblab.ece.ntua.gr/~atsois/

Long version (TR-2003-4) available at: http://www.dblab.ece.ntua.gr/publications/TR-2003-4.pdf
