Promotion Analysis in Multi-Dimensional Space (VLDB 2009)
Tianyi Wu (1), Dong Xin (2), Qiaozhu Mei (2), Jiawei Han (1)
(1) University of Illinois, Urbana-Champaign, Urbana, IL, USA
(2) Microsoft Research, Redmond, WA, USA
Presenter: Chun Kit Chui (Kit)
Supervisor: Dr. Ben Kao


Content

Introduction
    Promotion analysis problem
Problem definition
    Promotiveness measure
The basic query execution framework
Subspace pruning
Object pruning
Promotion cube
Experimental evaluations
Conclusion


Introduction

Promotion has been playing a key role in marketing…

Book sales database

Retailer  Category    Readership           Year  #Sales
A         Sci & Tech  College students     2009  20
A         Sci & Tech  University students  2009  5
A         Comedy      University students  2009  9
B         Sci & Tech  College students     2009  20
B         Sci & Tech  University students  2009  7
B         Fiction     University students  2009  5
B         Comedy      Kindergarten         2009  20
B         Comedy      College students     2009  10
C         Sci & Tech  College students     2009  12
…         …           …                    …     …
A         Sci & Tech  College students     2010  22
A         Sci & Tech  University students  2010  4
A         Comedy      College students     2010  1
B         Sci & Tech  College students     2010  13
B         Sci & Tech  University students  2010  30
B         Fiction     University students  2010  5
B         Comedy      Kindergarten         2010  20
B         Comedy      College students     2010  10
C         Sci & Tech  College students     2010  16
C         Comedy      Kindergarten         2010  52

Manager of retailer A: "What is the rank of our book sales among other retailers?"

Answer: "We ranked 3rd among all book retailers!"

Global aggregate result

Retailer  #Sales
A         61
B         180
C         80

E.g., to compute the aggregate value of the cell for Retailer A (61), we take all tuples with Retailer = "A" and sum up their sales.
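The global aggregation described above can be sketched in a few lines of Python. This is only an illustration: the tuple layout and the helper names `global_totals` and `rank_of` are assumptions, and only the rows visible on the slide are included. Rows hidden behind the "…" line are missing, so the total for B comes out lower than the 180 shown on the slide, while the totals for A and C match.

```python
from collections import defaultdict

# Visible rows of the book sales table: (retailer, category, readership, year, sales).
# Rows elided by "…" on the slide are NOT included here.
ROWS = [
    ("A", "Sci & Tech", "College students", 2009, 20),
    ("A", "Sci & Tech", "University students", 2009, 5),
    ("A", "Comedy", "University students", 2009, 9),
    ("B", "Sci & Tech", "College students", 2009, 20),
    ("B", "Sci & Tech", "University students", 2009, 7),
    ("B", "Fiction", "University students", 2009, 5),
    ("B", "Comedy", "Kindergarten", 2009, 20),
    ("B", "Comedy", "College students", 2009, 10),
    ("C", "Sci & Tech", "College students", 2009, 12),
    ("A", "Sci & Tech", "College students", 2010, 22),
    ("A", "Sci & Tech", "University students", 2010, 4),
    ("A", "Comedy", "College students", 2010, 1),
    ("B", "Sci & Tech", "College students", 2010, 13),
    ("B", "Sci & Tech", "University students", 2010, 30),
    ("B", "Fiction", "University students", 2010, 5),
    ("B", "Comedy", "Kindergarten", 2010, 20),
    ("B", "Comedy", "College students", 2010, 10),
    ("C", "Sci & Tech", "College students", 2010, 16),
    ("C", "Comedy", "Kindergarten", 2010, 52),
]


def global_totals(rows):
    """Sum #Sales per retailer over the full space (no subspace constraint)."""
    totals = defaultdict(int)
    for retailer, _category, _readership, _year, sales in rows:
        totals[retailer] += sales
    return dict(totals)


def rank_of(target, totals):
    """Rank = 1 + number of objects with a strictly larger aggregate."""
    return 1 + sum(1 for value in totals.values() if value > totals[target])


totals = global_totals(ROWS)
print(totals, "rank of A:", rank_of("A", totals))
```

Treating ties as sharing the better rank is a choice made for this sketch; the slides do not say how ties are handled.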

Goal: discover the most interesting subspaces where our brand (Retailer A) is highly ranked among its competitors.


Aggregate result in the subspace {Category = Sci & Tech, Readership = College students}

Retailer  Category    Readership        #Sales
A         Sci & Tech  College students  42
B         Sci & Tech  College students  33
C         Sci & Tech  College students  28

"We are the top-1 bookseller in the {Readership = College students, Category = Sci & Tech} segment!"
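The subspace aggregate can be reproduced with a filter before the sum. The sketch below is illustrative only: it uses a subset of the slide's rows, and the function name `subspace_totals` is an assumption rather than anything defined in the talk.

```python
from collections import defaultdict

# A subset of the slide's rows: (retailer, category, readership, year, sales).
ROWS = [
    ("A", "Sci & Tech", "College students", 2009, 20),
    ("A", "Sci & Tech", "University students", 2009, 5),
    ("B", "Sci & Tech", "College students", 2009, 20),
    ("B", "Comedy", "Kindergarten", 2009, 20),
    ("C", "Sci & Tech", "College students", 2009, 12),
    ("A", "Sci & Tech", "College students", 2010, 22),
    ("B", "Sci & Tech", "College students", 2010, 13),
    ("B", "Sci & Tech", "University students", 2010, 30),
    ("C", "Sci & Tech", "College students", 2010, 16),
    ("C", "Comedy", "Kindergarten", 2010, 52),
]


def subspace_totals(rows, category, readership):
    """Sum #Sales per retailer, keeping only tuples inside the given subspace."""
    totals = defaultdict(int)
    for retailer, cat, reader, _year, sales in rows:
        if cat == category and reader == readership:
            totals[retailer] += sales
    return dict(totals)


segment = subspace_totals(ROWS, "Sci & Tech", "College students")
ranking = sorted(segment, key=segment.get, reverse=True)
print(segment, ranking)
```

Year is left unconstrained, so the 2009 and 2010 tuples of each retailer are summed together, as in the slide's table.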

Promotion query: full space vs. subspaces

Full space (global rank): may not be interesting. Compares the target with ALL objects in ALL aspects. Low cost: a single SQL query.

Subspaces (local rank): a globally low-ranked object may become prominent in some subspaces. Compares the target with the objects in a certain area. High cost: a naïve approach is to compute the rank for ALL possible subspaces and return the interesting ones.

Terminology (labels on the book sales database): Retailer is the object dimension; Category, Readership and Year are the subspace dimensions; #Sales is the score dimension. Retailer A is the target object.

Person promotion

An NBA manager would like to promote Michael Jordan as a superstar.
Global rank: 3rd all-time leading scorer.
Further analysis (local rank in some subspaces):
    Top scorer in the guard position.
    Top scorer on the Chicago Bulls team.
    11 individual years' scoring champion.

Player          Position       Team           Year  Game            …  Score
Michael Jordan  Guard          Chicago Bulls  1998  vs N.Y. Knicks  …  33
Michael Jordan  Guard          Chicago Bulls  1998  vs Utah Jazz    …  15
Scottie Pippen  Small Forward  Chicago Bulls  1998  vs Utah Jazz    …  18
…               …              …              …     …               …  …

Here Player is the object dimension, Position, Team, Year, Game, … are the subspace dimensions, and Score is the score dimension. Michael Jordan is the target object.

The promotion query problem
Given: an object (e.g., a product or a person).
Goal: discover the most interesting subspaces where the object is highly ranked.

Problem Definition

Object  Location  Year  Score
T1      NY        2008  0.5
T1      WA        2008  0.8
T2      WA        2007  1.0
T2      WA        2008  1.0
T3      NY        2007  0.3
T3      WA        2007  0.6
T3      WA        2008  0.7

Object is the object dimension, Location and Year are the subspace dimensions, and Score is the score dimension.

Query
Target object: T1
Aggregation measure: SUM
Goal: discover the most interesting subspaces where T1 is highly ranked.

All possible subspaces:

{*}
{NY} {WA} {2007} {2008}
{NY,2007} {NY,2008} {WA,2007} {WA,2008}
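As a rough illustration of where these nine subspaces come from, the sketch below enumerates every combination of a Location value (or "*") with a Year value (or "*"). The tuple encoding of a subspace and the name `all_subspaces` are assumptions made for this example.

```python
from itertools import product

# The example table: (object, location, year, score).
ROWS = [
    ("T1", "NY", 2008, 0.5),
    ("T1", "WA", 2008, 0.8),
    ("T2", "WA", 2007, 1.0),
    ("T2", "WA", 2008, 1.0),
    ("T3", "NY", 2007, 0.3),
    ("T3", "WA", 2007, 0.6),
    ("T3", "WA", 2008, 0.7),
]


def all_subspaces(rows):
    """Every (location, year) pair, where "*" leaves a dimension unconstrained."""
    locations = sorted({row[1] for row in rows})
    years = sorted({row[2] for row in rows})
    return list(product(["*"] + locations, ["*"] + years))


subspaces = all_subspaces(ROWS)
print(len(subspaces), subspaces)
```

With d subspace dimensions the number of combinations grows as the product of (number of values + 1) over all dimensions, which is why computing a rank in every subspace is the high-cost option.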

Note that the target object T1 appears only in Year = 2008. Therefore the subspace {2007}, and with it {NY,2007} and {WA,2007}, can be pruned.

Target subspaces of T1:

{*}
{NY} {WA} {2008}
{NY,2008} {WA,2008}
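One way to obtain the target subspaces directly, without enumerating and then pruning, is to generalize each tuple of the target object. The sketch below assumes the same tuple encoding as the table and a hypothetical helper name `target_subspaces`; it is an illustration, not necessarily how the authors implement it.

```python
from itertools import product

# The example table: (object, location, year, score).
ROWS = [
    ("T1", "NY", 2008, 0.5),
    ("T1", "WA", 2008, 0.8),
    ("T2", "WA", 2007, 1.0),
    ("T2", "WA", 2008, 1.0),
    ("T3", "NY", 2007, 0.3),
    ("T3", "WA", 2007, 0.6),
    ("T3", "WA", 2008, 0.7),
]


def target_subspaces(rows, target):
    """Subspaces that contain at least one tuple of the target object."""
    cells = set()
    for obj, location, year, _score in rows:
        if obj == target:
            # A tuple lies in its own cell and in every generalization of it.
            cells.update(product(("*", location), ("*", year)))
    return cells


t1_cells = target_subspaces(ROWS, "T1")
print(sorted(t1_cells, key=str))
```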

Subspace {*}: SUM(T1) = 1.3, Rank(T1) = 3rd / 3

Object  SUM(Score)
T1      0.5 + 0.8 = 1.3
T2      1.0 + 1.0 = 2.0
T3      0.3 + 0.6 + 0.7 = 1.6

We project all tuples of T1 into the cell {*} and sum up their scores.

Subspace {2008}: SUM(T1) = 1.3, Rank(T1) = 1st / 3

Object  Year  SUM(Score)
T1      2008  0.5 + 0.8 = 1.3
T2      2008  1.0
T3      2008  0.7

We project all tuples of T1 with Year = "2008" into this cell and sum up their scores.

Subspace {NY,2008}: SUM(T1) = 0.5, Rank(T1) = 1st / 1

Object  Location  Year  SUM(Score)
T1      NY        2008  0.5
T2      NY        2008  (no tuples)
T3      NY        2008  (no tuples)

We project all tuples of T1 with Location = "NY" and Year = "2008" into this cell and sum up their scores.
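The three cells worked through so far can be checked with a small sketch. The function name `rank_in_subspace`, the "*" wildcard, and the tie rule (rank = 1 + number of objects with a strictly larger SUM) are assumptions made for illustration.

```python
from collections import defaultdict

# The example table: (object, location, year, score).
ROWS = [
    ("T1", "NY", 2008, 0.5),
    ("T1", "WA", 2008, 0.8),
    ("T2", "WA", 2007, 1.0),
    ("T2", "WA", 2008, 1.0),
    ("T3", "NY", 2007, 0.3),
    ("T3", "WA", 2007, 0.6),
    ("T3", "WA", 2008, 0.7),
]


def rank_in_subspace(rows, target, location="*", year="*"):
    """Return (SUM of target, rank of target, number of objects) in a subspace."""
    totals = defaultdict(float)
    for obj, loc, yr, score in rows:
        # "*" matches any value; otherwise the tuple must match exactly.
        if location in ("*", loc) and year in ("*", yr):
            totals[obj] += score
    if target not in totals:
        return None  # the target has no tuple in this subspace
    mine = totals[target]
    higher = sum(1 for value in totals.values() if value > mine)
    return round(mine, 2), 1 + higher, len(totals)


print(rank_in_subspace(ROWS, "T1"))                       # subspace {*}
print(rank_in_subspace(ROWS, "T1", year=2008))            # subspace {2008}
print(rank_in_subspace(ROWS, "T1", "NY", 2008))           # subspace {NY,2008}
```

Objects with no tuple in a subspace are simply absent from the ranking, which is why T1 is 1st out of 1 in {NY,2008}.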

T1 ranks 1st in both {2008} and {NY,2008}. Which one is more interesting?

Promotiveness of a subspace S: a class of measures that quantify how well a subspace S can promote the target object T.
    Rank of the target object, Rank(S, T): higher rank -> more promotive.
    Significance of the subspace, Sig(S): more significant subspace (e.g., more objects) -> more promotive.

P(S, T) = f(Rank(S, T)) * g(Sig(S))

Examples:
    P(S, T) = Rank^-1(S, T)
    P(S, T) = Rank^-1(S, T) * ObjectCount(S)
    P(S, T) = Rank^-1(S, T) * I(ObjectCount(S) > MinSig)

Here Rank^-1(S, T) denotes the reciprocal 1 / Rank(S, T), and I(.) is an indicator that equals 1 when its condition holds and 0 otherwise.
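The three example measures can be compared on the two subspaces where T1 ranks 1st. In the sketch below the function names and the threshold value `min_sig=1` are assumptions chosen for illustration; the ranks and object counts are taken from the worked example ({2008}: 1st of 3 objects, {NY,2008}: 1st of 1 object).

```python
def p_inverse_rank(rank, object_count):
    """P = Rank^-1: ignores how significant the subspace is."""
    return 1.0 / rank


def p_rank_times_count(rank, object_count):
    """P = Rank^-1 * ObjectCount: larger subspaces score higher."""
    return object_count / rank


def p_rank_with_threshold(rank, object_count, min_sig=1):
    """P = Rank^-1 * I(ObjectCount > MinSig): small subspaces score zero."""
    return (1.0 / rank) if object_count > min_sig else 0.0


# (rank of T1, number of objects) per subspace, from the worked example.
cells = {"{2008}": (1, 3), "{NY,2008}": (1, 1)}

for name, (rank, count) in cells.items():
    print(name,
          p_inverse_rank(rank, count),
          p_rank_times_count(rank, count),
          p_rank_with_threshold(rank, count))
```

Under the plain inverse-rank measure the two subspaces tie, while either significance-aware measure prefers {2008}, where T1 beats two competitors instead of none.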

The promotion query problem
Input:
    a target object T
    a top-R parameter
Output: the top-R subspaces with the largest P(S, T) scores.

In what follows, assume the simple ranking model P(S, T) = Rank^-1(S, T).
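Under the simple model, a naive answer to the promotion query scores every target subspace and keeps the R best. The sketch below is that brute-force baseline only, with hypothetical helper names (`inverse_rank`, `top_r_subspaces`); it does not include any of the pruning techniques the talk goes on to present.

```python
import heapq
from collections import defaultdict
from itertools import product

# The example table: (object, location, year, score).
ROWS = [
    ("T1", "NY", 2008, 0.5),
    ("T1", "WA", 2008, 0.8),
    ("T2", "WA", 2007, 1.0),
    ("T2", "WA", 2008, 1.0),
    ("T3", "NY", 2007, 0.3),
    ("T3", "WA", 2007, 0.6),
    ("T3", "WA", 2008, 0.7),
]


def inverse_rank(rows, target, cell):
    """P(S, T) = 1 / Rank(S, T) with SUM as the aggregation measure."""
    location, year = cell
    totals = defaultdict(float)
    for obj, loc, yr, score in rows:
        if location in ("*", loc) and year in ("*", yr):
            totals[obj] += score
    mine = totals[target]
    rank = 1 + sum(1 for value in totals.values() if value > mine)
    return 1.0 / rank


def top_r_subspaces(rows, target, r):
    """Score every target subspace and return the r highest-scoring ones."""
    cells = set()
    for obj, loc, yr, _score in rows:
        if obj == target:
            cells.update(product(("*", loc), ("*", yr)))
    scored = [(inverse_rank(rows, target, cell), cell) for cell in cells]
    return heapq.nlargest(r, scored, key=lambda pair: pair[0])


top3 = top_r_subspaces(ROWS, "T1", 3)
print(top3)
```

For T1 the three subspaces with P = 1 are {NY}, {2008} and {NY,2008}; the order among them is arbitrary because they tie.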

Query processing methods

Query execution framework

Basic framework: use a recursive process that partitions and aggregates the data to compute the target object's rank in each subspace. Each recursive call alternates two steps, partition and aggregation.

Start from the coarsest subspace {*} of the example table: SUM(T1) = 1.3, Rank(T1) = 3rd / 3.

Partition the data on the first dimension (i.e., Location). Generate the candidate subspaces {NY} and {WA} by substituting each value of that dimension.

Object  Location  Year  Score
T1      NY        2008  0.5
T3      NY        2007  0.3
T2      WA        2007  1.0
T2      WA        2008  1.0
T1      WA        2008  0.8
T3      WA        2007  0.6
T3      WA        2008  0.7
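The partition step amounts to grouping tuples by the value they take in one dimension. A minimal sketch, assuming tuples are stored as `(object, location, year, score)` and using a hypothetical helper `partition`:

```python
from collections import defaultdict

# The example table: (object, location, year, score).
ROWS = [
    ("T1", "NY", 2008, 0.5),
    ("T1", "WA", 2008, 0.8),
    ("T2", "WA", 2007, 1.0),
    ("T2", "WA", 2008, 1.0),
    ("T3", "NY", 2007, 0.3),
    ("T3", "WA", 2007, 0.6),
    ("T3", "WA", 2008, 0.7),
]


def partition(rows, dim_index):
    """Group tuples by their value in the column at dim_index."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[dim_index]].append(row)
    return dict(parts)


by_location = partition(ROWS, 1)  # column 1 is Location
for value, part in by_location.items():
    print(value, part)
```

Each group is the data of one child subspace, so the next aggregation only has to look at that group.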

Recursively operate on the child subspace {NY} and perform aggregation: SUM(T1) = 0.5, Rank(T1) = 1st / 2. T1 ranks 1st among two objects (i.e., T1 and T3).

Partition the data of {NY} on the next dimension (i.e., Year). Generate the candidate subspaces {NY,2007} and {NY,2008} by substituting each value of that dimension, then recursively operate on each child subspace:

{NY,2007}: the target object T1 does not appear in this subspace, so it is pruned.
{NY,2008}: SUM(T1) = 0.5, Rank(T1) = 1st / 1.

Return to the first dimension and process {WA}: SUM(T1) = 0.8, Rank(T1) = 3rd / 3.

Partition the data of {WA} on Year, generating the candidate subspaces {WA,2007} and {WA,2008}:

Object  Location  Year  Score
T2      WA        2007  1.0
T3      WA        2007  0.6
T1      WA        2008  0.8
T2      WA        2008  1.0
T3      WA        2008  0.7

{WA,2007}: pruned, since T1 does not appear in it.
{WA,2008}: SUM(T1) = 0.8, Rank(T1) = 2nd / 3.

Page 40: Promotion Analysis in Multi-Dimensional Space VLDB 2009 Tianyi Wu Tianyi Wu 1 Dong Xin 2 Qiaozhu Mei 2 Jiawei Han 1Dong Xin Jiawei Han 1 University of

Query execution framework Basic framework

To use a recursive process to partition and aggregate the data to compute the target object’s rank in each subspace.

Partition Aggregation

Start

{*}

{NY} {WA}

{NY,2008} {WA,2008}

SUM (T1) = 1.3Rank (T1) = 3rd / 3

SUM (T1) = 0.5Rank (T1) = 1st / 2

SUM (T1) = 0.8Rank (T1) = 3rd / 3

SUM (T1) = 0.5Rank (T1) = 1st / 1

SUM (T1) = 0.8Rank (T1) = 2nd / 3

{NY,2007}

Pruned!!!

{WA,2007}

Pruned!!!

Object Location Year Score

T1 NY 2008 0.5

T3 NY 2007 0.3

T2 WA 2007 1.0

T3 WA 2007 0.6

T1 WA 2008 0.8

T2 WA 2008 1.0

T3 WA 2008 0.7

{2007} {2008}

Page 41: Promotion Analysis in Multi-Dimensional Space VLDB 2009 Tianyi Wu Tianyi Wu 1 Dong Xin 2 Qiaozhu Mei 2 Jiawei Han 1Dong Xin Jiawei Han 1 University of

Query execution framework Basic framework

To use a recursive process to partition and aggregate the data to compute the target object’s rank in each subspace.

Partition Aggregation

Start

{*}

{NY} {WA}

{NY,2008} {WA,2008}

SUM (T1) = 1.3Rank (T1) = 3rd / 3

SUM (T1) = 0.5Rank (T1) = 1st / 2

SUM (T1) = 0.8Rank (T1) = 3rd / 3

SUM (T1) = 0.5Rank (T1) = 1st / 1

SUM (T1) = 0.8Rank (T1) = 2nd / 3

{NY,2007}

Pruned!!!

{WA,2007}

Pruned!!!

Object Location Year Score

T3 WA 2007 0.6

T3 NY 2007 0.3

T2 WA 2007 1.0

T1 NY 2008 0.5

T1 WA 2008 0.8

T2 WA 2008 1.0

T3 WA 2008 0.7

{2007} {2008}

Page 42: Promotion Analysis in Multi-Dimensional Space VLDB 2009 Tianyi Wu Tianyi Wu 1 Dong Xin 2 Qiaozhu Mei 2 Jiawei Han 1Dong Xin Jiawei Han 1 University of

Query execution framework Basic framework

To use a recursive process to partition and aggregate the data to compute the target object’s rank in each subspace.

Partition Aggregation

Start

{*}

{NY} {WA}

{NY,2008} {WA,2008}

SUM (T1) = 1.3, Rank (T1) = 3rd / 3

SUM (T1) = 0.5, Rank (T1) = 1st / 2

SUM (T1) = 0.8, Rank (T1) = 3rd / 3

SUM (T1) = 0.5, Rank (T1) = 1st / 1

SUM (T1) = 0.8, Rank (T1) = 2nd / 3

{NY,2007}

Pruned!!!

{WA,2007}

Pruned!!!

{2007} {2008}

Object Location Year Score

T3 WA 2007 0.6

T3 NY 2007 0.3

T2 WA 2007 1.0

T1 NY 2008 0.5

T1 WA 2008 0.8

T2 WA 2008 1.0

T3 WA 2008 0.7

Pruned!!!


Query execution framework Basic framework

To use a recursive process to partition and aggregate the data to compute the target object’s rank in each subspace.

Partition Aggregation

Start

{*}

{NY} {WA} {2008}

{NY,2008} {WA,2008}

SUM (T1) = 1.3, Rank (T1) = 3rd / 3

SUM (T1) = 0.5, Rank (T1) = 1st / 2

SUM (T1) = 0.8, Rank (T1) = 3rd / 3

SUM (T1) = 0.5, Rank (T1) = 1st / 1

SUM (T1) = 0.8, Rank (T1) = 2nd / 3

{NY,2007}

Pruned!!!

{WA,2007}

Pruned!!!

{2007}

Pruned!!!

Object Location Year Score

T3 WA 2007 0.6

T3 NY 2007 0.3

T2 WA 2007 1.0

T1 NY 2008 0.5

T1 WA 2008 0.8

T2 WA 2008 1.0

T3 WA 2008 0.7

SUM (T1) = 1.3, Rank (T1) = 1st / 3

Finish!!!


Query execution framework Basic framework

To use a recursive process to partition and aggregate the data to compute the target object’s rank in each subspace.

Partition Aggregation

Start

{*}

{NY} {WA} {2008}

{NY,2008} {WA,2008}

SUM (T1) = 1.3, Rank (T1) = 3rd / 3

SUM (T1) = 0.5, Rank (T1) = 1st / 2

SUM (T1) = 0.8, Rank (T1) = 3rd / 3

SUM (T1) = 0.5, Rank (T1) = 1st / 1

SUM (T1) = 0.8, Rank (T1) = 2nd / 3

{NY,2007}

Pruned!!!

{WA,2007}

Pruned!!!

{2007}

Pruned!!!

Object Location Year Score

T3 WA 2007 0.6

T3 NY 2007 0.3

T2 WA 2007 1.0

T1 NY 2008 0.5

T1 WA 2008 0.8

T2 WA 2008 1.0

T3 WA 2008 0.7

SUM (T1) = 1.3, Rank (T1) = 1st / 3

P(S,T) = Rank^{-1}(S,T)

Return Top-3 subspaces


Query execution framework Basic framework

To use a recursive process to partition and aggregate the data to compute the target object’s rank in each subspace.

Partition Aggregation

Start

{*}

{NY} {WA} {2008}

{NY,2008} {WA,2008}

SUM (T1) = 1.3, Rank (T1) = 3rd / 3

SUM (T1) = 0.5, Rank (T1) = 1st / 2

SUM (T1) = 0.8, Rank (T1) = 3rd / 3

SUM (T1) = 0.5, Rank (T1) = 1st / 1

SUM (T1) = 0.8, Rank (T1) = 2nd / 3

{NY,2007}

Pruned!!!

{WA,2007}

Pruned!!!

{2007}

Pruned!!!

Object Location Year Score

T3 WA 2007 0.6

T3 NY 2007 0.3

T2 WA 2007 1.0

T1 NY 2008 0.5

T1 WA 2008 0.8

T2 WA 2008 1.0

T3 WA 2008 0.7

SUM (T1) = 1.3, Rank (T1) = 1st / 3

P(S,T) = Rank^{-1}(S,T) * I(ObjCount(S) > 1)

Return Top-3 subspaces
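The walkthrough above can be reproduced with a short sketch. It enumerates every subspace of the toy Location/Year table by brute force instead of the recursive partitioning the slides describe, sums the scores per object, and ranks T1. The names `ROWS`, `rank_in_subspace` and `top_promotive` are illustrative, and defining the rank as one plus the number of objects with a strictly larger SUM is an assumption.

```python
from itertools import product

# Toy table from the slides: (object, location, year, score).
ROWS = [
    ("T1", "NY", 2008, 0.5), ("T3", "NY", 2007, 0.3),
    ("T2", "WA", 2007, 1.0), ("T3", "WA", 2007, 0.6),
    ("T1", "WA", 2008, 0.8), ("T2", "WA", 2008, 1.0),
    ("T3", "WA", 2008, 0.7),
]

def rank_in_subspace(rows, target, location=None, year=None):
    """Return (rank, object_count) of target, or None if target is absent.

    None for location/year plays the role of '*' (no constraint).
    """
    sums = {}
    for obj, loc, yr, score in rows:
        if location in (None, loc) and year in (None, yr):
            sums[obj] = sums.get(obj, 0.0) + score
    if target not in sums:
        return None  # corresponds to "Pruned!!!" on the slides
    # Assumption: rank = 1 + number of objects with a strictly larger SUM.
    rank = 1 + sum(1 for s in sums.values() if s > sums[target] + 1e-9)
    return rank, len(sums)

def top_promotive(rows, target, r):
    """Return the r subspaces with the largest promotiveness for target."""
    locations = [None] + sorted({row[1] for row in rows})
    years = [None] + sorted({row[2] for row in rows})
    scored = []
    for loc, yr in product(locations, years):
        result = rank_in_subspace(rows, target, loc, yr)
        if result is None:
            continue
        rank, count = result
        # P(S,T) = Rank^{-1}(S,T) * I(ObjCount(S) > 1)
        p = (1.0 / rank) if count > 1 else 0.0
        scored.append((p, (loc, yr)))
    scored.sort(key=lambda item: -item[0])
    return scored[:r]

top3 = top_promotive(ROWS, "T1", 3)
```

With this data, `top3` lists {2008} and {NY} (both with P = 1) followed by {WA,2008} (P = 1/2); {NY,2008} scores 0 because T1 is the only object there.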


Query execution framework


Query execution framework

The basic execution framework computes ALL subspaces, so the overall cost can be prohibitive for large datasets.

Develop optimization techniques based on thresholding:

Subspace pruning

Object pruning
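To see why computing all subspaces gets expensive, note that each dimension is either left as '*' or fixed to one of its values, so the number of subspaces is the product of (cardinality + 1) over the dimensions. A minimal sketch, where `count_subspaces` is an illustrative name:

```python
from math import prod

def count_subspaces(cardinalities):
    """Number of subspaces when each dimension is either '*' or one value."""
    return prod(c + 1 for c in cardinalities)

# Toy example from the earlier slides: Location in {NY, WA}, Year in {2007, 2008}.
toy = count_subspaces([2, 2])
```

For the toy table this gives 9, matching the lattice from {*} down to {WA,2008}. The count is an upper bound on the work for one target object, since subspaces that do not contain the target are skipped.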


Efficient computation methods Subspace pruning

Object pruning


Subspace pruning

Key motivation: if the upper bound of the promotiveness score of an unseen subspace is lower than the current top-R promotiveness score, we can prune the subspace.

How do we obtain the upper bound of the promotiveness scores of the unseen subspaces? Since P(S,T) = Rank^{-1}(S,T), obtain a lower bound of the rank.


Subspace pruning

Assumption: the aggregation measure is a monotone function (e.g. SUM). The Sig measure is also monotone.

Key observations

{*}

{A} {B} {C}

{AB}

{ABC}

{AC} {BC}

Observation 1: Every object in a child subspace must also be a member of its parent subspace.

Observation 2: An object's aggregate score in a child subspace must be smaller than or equal to its score in the parent subspace.


Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a}

S2 = {ab}

S3 = {abc}

S4 = {ac}

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)


Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Initialization step: We first scan the dataset once to calculate the aggregate of the target object t7 in each subspace.


Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Start: Compute the aggregate of objects in subspace {a}.


Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Current top-1 result : S1(1/3)


Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Current top-1 result : S1(1/3)


Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Current top-1 result : S2(1/2), replacing S1(1/3)


Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Current top-1 result : S2(1/2)


Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Current top-1 result : S2(1/2)


Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Current top-1 result : S2(1/2)

Can we deduce a lower bound of the rank in S4?


Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Current top-1 result : S2(1/2)

Can we deduce a lower bound of the rank in S4?

Given the tuples in S3, can we deduce some of the members of S4?


Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} t3(?) t6(?) t7(?) t1(?) t5(?) 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Can we deduce a lower bound of Rank of S4?

Can we say something about the scores of these members?

Current top-1 result : S2(1/2)

Objects in the child subspace must also appear in the parent subspace, so every object in S3 = {abc} is also a member of S4 = {ac}.


Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} t3(?) t6(?) t7(0.4) t1(?) t5(?) 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Can we deduce a lower bound of Rank of S4?

Can we say something about the scores of these members?

Current top-1 result : S2(1/2)


Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} t3(0.6) t6(0.5) t7(0.4) t1(0.1) t5(0.1) 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Can we deduce a lower bound of Rank of S4?

Can we say something about the scores of these members?

Current top-1 result : S2(1/2)

The aggregate score of an object in the child subspace must be smaller than or equal to its score in the parent subspace, so each score in S3 is a lower bound on that object's score in S4.


Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} t3(0.6) t6(0.5) t7(0.4) t1(0.1) t5(0.1) 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Can we deduce a lower bound of the rank in S4?

Current top-1 result : S2(1/2)

?


Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} t3(0.6) t6(0.5) t7(0.4) t1(0.1) t5(0.1) 0.4 >= 3

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Can we deduce a lower bound of the rank in S4?

Current top-1 result : S2(1/2)


Subspace pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t7(0.7) t1(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7 3 1/3

S2 = {ab} t6(0.7) t7(0.6) t3(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2) 0.6 2 1/2

S3 = {abc} t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1) 0.3 3 1/3

S4 = {ac} t3(0.6) t6(0.5) t7(0.4) t1(0.1) t5(0.1) 0.4 >= 3 ?

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

S4 Pruned!!!

Current top-1 result : S2(1/2)

The promotiveness score of S4 is at most 1/3, which is less than the current top-1 promotiveness score (1/2), so S4 can be pruned!
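The deduction for S4 can be sketched in a few lines. The function names are illustrative, and the sketch assumes a monotone aggregate (such as SUM over non-negative scores) and a rank defined as one plus the number of objects with a strictly larger score.

```python
def rank_lower_bound(child_scores, target, target_score_in_parent):
    """Lower-bound the target's rank in a parent subspace from one child.

    With a monotone aggregate, an object's score in the parent is at least
    its score in the child, so any child object already above the target's
    parent score stays above it in the parent.
    """
    above = sum(
        1 for obj, score in child_scores.items()
        if obj != target and score > target_score_in_parent
    )
    return above + 1

def can_prune(child_scores, target, target_score_in_parent, current_top_p):
    # Upper bound on P(S,T) is 1 / (lower bound on Rank(S,T)).
    upper_p = 1.0 / rank_lower_bound(child_scores, target, target_score_in_parent)
    return upper_p < current_top_p

# Numbers from the slides: S3 = {abc} has been computed, S4 = {ac} has not.
S3 = {"t3": 0.6, "t6": 0.5, "t7": 0.3, "t1": 0.1, "t5": 0.1}
bound = rank_lower_bound(S3, "t7", 0.4)   # t7 scores 0.4 in S4
pruned = can_prune(S3, "t7", 0.4, 0.5)    # current top-1 is S2 with P = 1/2
```

Here `bound` is 3 because t3 and t6 already exceed 0.4 in S3, and `pruned` is true because 1/3 < 1/2.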


Subspace pruning


Object pruning

Key motivation: prune objects by obtaining an upper bound on the aggregate score of unseen objects.

Unseen objects whose upper bound is smaller than the smallest aggregate score of the target object can be pruned.


Object pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Minimum of the aggregate scores of t7: = min{0.7, 0.6, 0.3, 0.4} = 0.3


Object pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t1(0.7) t7(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Minimum of the aggregate scores of t7: = min{0.7, 0.6, 0.3, 0.4} = 0.3

Question: Can we prune some objects in the subtree of S1?


Object pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t1(0.7) t7(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Minimum of the aggregate scores of t7: = min{0.7, 0.6, 0.3, 0.4} = 0.3

Question: Can we prune some objects in the subtree of S1?

0.2 is an upper bound on the aggregate score of t2 in the subtree of S1, i.e. its aggregate score there can only be <= 0.2!


Object pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t1(0.7) t7(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Minimum of the aggregate scores of t7: = min{0.7, 0.6, 0.3, 0.4} = 0.3

Since the minimum of the aggregate scores of t7 is 0.3, the aggregate scores of t2 will not affect the rank of t7 in the subtree of S1.

0.2 is an upper bound on the aggregate score of t2 in the subtree of S1, i.e. its aggregate score there can only be <= 0.2!


Object pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t1(0.7) t7(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Minimum of the aggregate scores of t7: = min{0.7, 0.6, 0.3, 0.4} = 0.3

Since the minimum of the aggregate scores of t7 is 0.3, the aggregate scores of t2 will not affect the rank of t7 in the subtree of S1.

Pruned!!!

0.2 is an upper bound on the aggregate score of t2 in the subtree of S1, i.e. its aggregate score there can only be <= 0.2!


Object pruning

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} t6(1.2) t3(1.0) t1(0.7) t7(0.7) t4(0.3) t5(0.3) t2(0.2) 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Minimum of the aggregate scores of t7: = min{0.7, 0.6, 0.3, 0.4} = 0.3

Similarly, t4, t5 can also be Pruned!!!
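The object-pruning test can be sketched as a simple filter over the scores in S1. The function name is illustrative. The motivation slide says "smaller than", but the example also prunes t4 and t5, whose score equals 0.3, so the sketch uses `<=` on the assumption that ties do not lower the target's rank (as with t1 and t7 at 0.7 in S1).

```python
def prune_objects(subspace_scores, target, target_scores_in_subtree):
    """Split the objects of a subspace into (kept, pruned) for its subtree.

    An object's score in the current subspace is an upper bound on its score
    in every descendant subspace (monotone aggregate). If that bound cannot
    exceed the target's smallest score, the object never outranks the target.
    """
    floor = min(target_scores_in_subtree)
    kept, pruned = [], []
    for obj, score in subspace_scores.items():
        # Assumption: ties do not lower the target's rank, so "<=" is safe.
        if obj != target and score <= floor:
            pruned.append(obj)
        else:
            kept.append(obj)
    return kept, pruned

S1 = {"t6": 1.2, "t3": 1.0, "t1": 0.7, "t7": 0.7,
      "t4": 0.3, "t5": 0.3, "t2": 0.2}
kept, pruned = prune_objects(S1, "t7", [0.7, 0.6, 0.3, 0.4])
```

With the slide's numbers, t2, t4 and t5 are pruned, and only t6, t3, t1 and t7 need to be aggregated in the subtree of S1.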


Object pruning


Promotion cube


Promotion cube

Promotion cell

Given a subspace S, a promotion cell S.Pcell is defined as the sequence of the top-k largest object aggregate scores in S.

Promotion cube

The promotion cube consists of a set of triples in the format (S, S.Pcell, Sig), where Sig is the significance of the subspace S.

Promotion cube

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Subspace Top-3 largest aggregate scores in the subspace

Sig

S1 = {a} (1.2) (1.0) (0.7) 1

S2 = {ab} (0.7) (0.6) (0.6) 1

S3 = {abc} (0.6) (0.5) (0.3) 1

S4 = {ac} (0.8) (0.7) (0.6) 1

In the promotion cube, we precompute the top-3 largest aggregate scores (not the objects) in each subspace.


Promotion cube

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Subspace Top-3 largest aggregate scores in the subspace

Sig

S1 = {a} (1.2) (1.0) (0.7) 1

S2 = {ab} (0.7) (0.6) (0.6) 1

S3 = {abc} (0.6) (0.5) (0.3) 1

S4 = {ac} (0.8) (0.7) (0.6) 1

In the promotion cube, we precompute the top-3 largest aggregate scores (not the objects) in each subspace.


Subspace Top-3 largest aggregate scores in the subspace

Sig

S1 = {a} (1.2) (1.0) (0.7) 1

S2 = {ab} (0.7) (0.6) (0.6) 1

S3 = {abc} (0.6) (0.5) (0.3) 1

S4 = {ac} (0.8) (0.7) (0.6) 1

Promotion cube

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} 0.7

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Can you tell the exact rank of t7 in S1? Yes: the aggregate score of t7 is 0.7, and exactly 2 other objects have a larger aggregate value, so t7 ranks 3rd.


Subspace Top-3 largest aggregate scores in the subspace

Sig

S1 = {a} (1.2) (1.0) (0.7) 1

S2 = {ab} (0.7) (0.6) (0.6) 1

S3 = {abc} (0.6) (0.5) (0.3) 1

S4 = {ac} (0.8) (0.7) (0.6) 1

Promotion cube

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} No need to compute 0.7 3 1/3

S2 = {ab} 0.6

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Current top-1 result : S1(1/3)

Can you tell the exact rank of t7 in S1? Yes: the aggregate score of t7 is 0.7, and exactly 2 other objects have a larger aggregate value, so t7 ranks 3rd.


Subspace Top-3 largest aggregate scores in the subspace

Sig

S1 = {a} (1.2) (1.0) (0.7) 1

S2 = {ab} (0.7) (0.6) (0.6) 1

S3 = {abc} (0.6) (0.5) (0.3) 1

S4 = {ac} (0.8) (0.7) (0.6) 1

Promotion cube

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} No need to compute 0.7 3 1/3

S2 = {ab} No need to compute 0.6 2 1/2

S3 = {abc} 0.3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Current top-1 result : S2(1/2)


Subspace Top-3 largest aggregate scores in the subspace

Sig

S1 = {a} (1.2) (1.0) (0.7) 1

S2 = {ab} (0.7) (0.6) (0.6) 1

S3 = {abc} (0.6) (0.5) (0.3) 1

S4 = {ac} (0.8) (0.7) (0.6) 1

Promotion cube

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} No need to compute 0.7 3 1/3

S2 = {ab} No need to compute 0.6 2 1/2

S3 = {abc} No need to compute 0.3 3 1/3

S4 = {ac} 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Current top-1 result : S2(1/2)


Subspace Top-3 largest aggregate scores in the subspace

Sig

S1 = {a} (1.2) (1.0) (0.7) 1

S2 = {ab} (0.7) (0.6) (0.6) 1

S3 = {abc} (0.6) (0.5) (0.3) 1

S4 = {ac} (0.8) (0.7) (0.6) 1

Promotion cube

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} No need to compute 0.7 3 1/3

S2 = {ab} No need to compute 0.6 2 1/2

S3 = {abc} No need to compute 0.3 3 1/3

S4 = {ac} No need to compute 0.4

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

Current top-1 result : S2(1/2)

Can you tell the exact rank of t7 in S4? No! The aggregate score of t7 is 0.4, and at least 3 objects have a larger aggregate value.


Subspace Top-3 largest aggregate scores in the subspace

Sig

S1 = {a} (1.2) (1.0) (0.7) 1

S2 = {ab} (0.7) (0.6) (0.6) 1

S3 = {abc} (0.6) (0.5) (0.3) 1

S4 = {ac} (0.8) (0.7) (0.6) 1

Promotion cube

Subspace Objects in the cuboid and their aggregate scores Aggregate of target object

Rank P

S1 = {a} No need to compute 0.7 3 1/3

S2 = {ab} No need to compute 0.6 2 1/2

S3 = {abc} No need to compute 0.3 3 1/3

S4 = {ac} No need to compute 0.4 >=3

S1={a}

S2={ab} S4={ac}

S3={abc}

Target object: t7. Return the top-1 promotive subspace. P(S,t7) = Rank^{-1}(S,t7)

S4 Pruned!!!

Current top-1 result : S2(1/2)

Can you tell the exact rank of t7 in S4? No! The aggregate score of t7 is 0.4, and at least 3 objects have a larger aggregate value.
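The lookups on these slides can be sketched as a function over a promotion cell. `rank_from_pcell`, `PCELLS` and `T7` are illustrative names, and the sketch assumes the target object belongs to the subspace and that rank counts only strictly larger scores.

```python
def rank_from_pcell(pcell, target_score):
    """Return (rank, exact) using only the top-k scores stored for a subspace.

    pcell is the descending list of the k largest aggregate scores.
    """
    larger = sum(1 for s in pcell if s > target_score)
    # If the target reaches the k-th stored score, no unstored object can
    # beat it, so the rank is exact; otherwise it is only a lower bound.
    exact = target_score >= pcell[-1]
    return larger + 1, exact

# Promotion cells (top-3 scores) and t7's aggregates, as given on the slides.
PCELLS = {
    "S1": [1.2, 1.0, 0.7], "S2": [0.7, 0.6, 0.6],
    "S3": [0.6, 0.5, 0.3], "S4": [0.8, 0.7, 0.6],
}
T7 = {"S1": 0.7, "S2": 0.6, "S3": 0.3, "S4": 0.4}
ranks = {s: rank_from_pcell(PCELLS[s], T7[s]) for s in PCELLS}
```

For S1, S2 and S3 the rank is exact (3, 2 and 3). For S4 the sketch reports a lower bound of 4, since three stored scores exceed 0.4; this is at least as tight as the ">=3" shown on the slide, and S4 is pruned against the current top-1 score of 1/2 either way.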


Experimental evaluations


Settings

Implementation

Pentium 3GHz processor

2GB of memory

160GB hard disk

WinXP / Microsoft Visual C# 2008 (in-memory)

Datasets

DBLP

TPC-H


Algorithms

PromoRank: the basic query execution framework.

PromoRank++: the basic query execution framework with subspace pruning and object pruning.

PromoCube


DBLP Dataset

Subspace dimensions: Conference (2,506), Year (50), Database (boolean), Data mining (boolean), Information retrieval (boolean), Machine learning (boolean)

Object dimension: Author (450K)

Score dimension: Paper count

Base tuples: 1.76M

DBLP Dataset

The running time increases as R increases, because the pruning threshold is determined by the current top-R promotiveness score. The threshold becomes looser as R becomes larger.

PromoCube performs extremely well when R is small, because in that case PromoCube can directly return the result using O(1) lookup time.


DBLP Dataset

[Figure; y-axis: Number of subspace aggregations]

PromoCube performs extremely well when R is small, because in that case PromoCube can directly return the result using O(1) lookup time.

The running time increases as R increases, because the pruning threshold is determined by the current top-R promotiveness score. The threshold becomes looser as R becomes larger.


TPC-H Dataset

Subspace dimensions: l_shipdate (2526), l_quantity (50), l_discount (11), l_tax (9), l_linenumber (7), l_returnflag (3)

Object dimension: l_suppkey (10,000)

Score dimension: l_extendedprice (ranges from 901.00 to 104949.50)

Base tuples: 6,001,215


TPC-H

The gap between PromoRank and PromoRank++ is not large when the number of dimensions is small. This is because the total number of target subspaces is itself quite small, so there is less chance to apply pruning that exploits the parent-child relationship.

PromoCube's speed advantage grows with the number of tuples.

This is because PromoCube saves much more of the actual aggregation and partition cost: PromoCube prunes subspaces before any aggregation happens, whereas PromoRank++ prunes subspaces during the aggregation process.

Runtime increases when dimensionality increases, because more dimensions mean more target subspaces.


TPC-H

All algorithms run faster when there are more objects, because each object then has fewer target subspaces. With other parameters unchanged, more objects means each object appears in fewer tuples, and therefore in fewer target subspaces.

Both PromoRank++ and PromoCube favor large cardinalities, because:

With other parameters unchanged, larger cardinality implies more subspaces.

With the same number of tuples, the chance of two tuples having the same dimension values becomes lower.

Therefore, it is more likely that the aggregate scores are equal across parent-child subspaces, which gives a tighter lower bound for Rank.

TPC-H

{*}

{NY} {WA} {2008}

{NY,2008} {WA,2008} {NY,2007} {WA,2007}

{2007}

{*}

{NY}

{NY,2007}

{2007}

{NY,2008}

{2008}

Both PromoRank++ and PromoCube favor large cardinalities, because… With other parameters unchanged, larger cardinality

implies more subspaces. With the same number of tuples, the chance of two

tuples having the same dimension value becomes lower.

Therefore, it is more likely that the aggregate scores would be equal across parent-child subspaces, thereby providing a tighter lower bound for Rank.


TPC-H

Both PromoRank++ and PromoCube favor large cardinalities, because…

With other parameters unchanged, larger cardinality implies more subspaces.

With the same number of tuples, the chance of two objects having the same dimension value becomes lower (sparse).

Therefore, it is more likely that the aggregate scores would be equal across parent-child subspaces, thereby providing a tighter lower bound for Rank.

Example with one Location value:

Object Location Year Score
T1 NY 2007 0.6
T2 NY 2008 0.4

Subspaces: {*}; {NY} {2007} {2008}; {NY,2007} {NY,2008}

Example with two Location values:

Object Location Year Score
T1 NY 2007 0.6
T2 WA 2008 0.4

Subspaces: {*}; {NY} {WA} {2007} {2008}; {NY,2007} {NY,2008} {WA,2007} {WA,2008}

[Figure: the two subspace lattices, annotated with aggregate scores (0.6, 0.4, 1)]
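The effect of sparsity on parent-child aggregates can be checked on the two example tables. The sketch below rests on two assumptions that the slide does not state: the rows T1 and T2 are treated as two tuples of the same object, and the aggregate is SUM. The function name `subspace_sums` is hypothetical.

```python
from itertools import combinations

def subspace_sums(rows):
    """Map each subspace (frozenset of (dim, value) pairs) to the SUM of scores.

    Assumption: SUM is the aggregate and all rows belong to one object.
    The empty frozenset stands for {*}.
    """
    sums = {}
    for dims, score in rows:
        items = sorted(dims.items())
        for k in range(len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                sums[key] = sums.get(key, 0.0) + score
    return sums

# One Location value: both rows fall into {NY}.
dense = subspace_sums([
    ({"Location": "NY", "Year": 2007}, 0.6),
    ({"Location": "NY", "Year": 2008}, 0.4),
])
# Two Location values: the rows are spread over {NY} and {WA}.
sparse = subspace_sums([
    ({"Location": "NY", "Year": 2007}, 0.6),
    ({"Location": "WA", "Year": 2008}, 0.4),
])

parent = frozenset({("Location", "NY")})
child = frozenset({("Location", "NY"), ("Year", 2007)})
print(dense[parent], dense[child])    # parent and child differ
print(sparse[parent], sparse[child])  # parent and child coincide
```

With the larger cardinality the parent {NY} and its child {NY,2007} carry the same aggregate, which is the situation in which a bound derived from the parent is tight for the child.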


Conclusion

Introduced the promotion analysis problem.

Presented a basic query execution framework.

Proposed two pruning techniques and the Promotion Cube for efficient query processing.


The End


Appendix


Object pruning

Target object: t7. Return the top-1 promotive subspace, where P(S, t7) = Rank^-1(S, t7) = 1 / Rank(S, t7).

Subspace      Objects in the cuboid and their aggregate scores            Aggregate of target object   Rank   P
S1 = {a}      t6(1.2) t3(1.0) t1(0.7) t7(0.7) t4(0.3) t5(0.3) t2(0.2)     0.7                          3      1/3
S2 = {ab}     t6(0.7) t3(0.6) t7(0.6) t1(0.4) t4(0.3) t2(0.2) t5(0.2)     0.6                          2      1/2
S3 = {abc}    t3(0.6) t6(0.5) t7(0.3) t1(0.1) t5(0.1)                     0.3                          3      1/3
S4 = {ac}     t3(0.8) t6(0.7) t1(0.6) t7(0.4) t5(0.2) t2(0.1) t4(0.1)     0.4                          4      1/4

Subspace lattice: S1 = {a} at the top; S2 = {ab} and S4 = {ac} below it; S3 = {abc} at the bottom.
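The Rank and P columns of this example can be reproduced with a few lines. This is a minimal sketch, assuming that ties do not push the target down (Rank is one plus the number of objects with a strictly higher aggregate score), which is consistent with the values shown for S1 and S2; the names `rank` and `promotiveness` are hypothetical.

```python
from fractions import Fraction

# Aggregate scores per subspace, copied from the example above.
cuboids = {
    "S1 = {a}":   {"t6": 1.2, "t3": 1.0, "t1": 0.7, "t7": 0.7, "t4": 0.3, "t5": 0.3, "t2": 0.2},
    "S2 = {ab}":  {"t6": 0.7, "t3": 0.6, "t7": 0.6, "t1": 0.4, "t4": 0.3, "t2": 0.2, "t5": 0.2},
    "S3 = {abc}": {"t3": 0.6, "t6": 0.5, "t7": 0.3, "t1": 0.1, "t5": 0.1},
    "S4 = {ac}":  {"t3": 0.8, "t6": 0.7, "t1": 0.6, "t7": 0.4, "t5": 0.2, "t2": 0.1, "t4": 0.1},
}

def rank(scores, target):
    """1 + number of objects scoring strictly higher than the target (assumed tie rule)."""
    return 1 + sum(1 for s in scores.values() if s > scores[target])

def promotiveness(scores, target):
    """P(S, t) = 1 / Rank(S, t), kept as an exact fraction."""
    return Fraction(1, rank(scores, target))

for name, scores in cuboids.items():
    print(name, rank(scores, "t7"), promotiveness(scores, "t7"))

# Top-1 promotive subspace for t7.
best = max(cuboids, key=lambda name: promotiveness(cuboids[name], "t7"))
print("top-1:", best)
```

The top-1 answer for t7 is S2 = {ab}, where its rank is 2 and P is 1/2.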




Query execution framework

Basic framework: use a recursive process to partition and aggregate the data to compute the target object’s rank in each subspace.

Depth-first manner

[Figure: subspace lattice over dimensions A, B, C, D: {*}; {A} {B} {C} {D}; {AB} {AC} {AD} {BC} {BD} {CD}; {ABC} {ABD} {ACD} {BCD}; {ABCD}]
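The recursive partition-and-aggregate idea can be sketched as follows. This is an illustration under stated assumptions, not the presenters' implementation: the aggregate is assumed to be SUM, Rank is assumed to be one plus the number of objects with a strictly higher aggregate, and the function name `rank_in_all_subspaces` and the sample tuples are hypothetical.

```python
from collections import defaultdict

def rank_in_all_subspaces(tuples, dims, target):
    """Depth-first enumeration of subspaces; returns {subspace: rank of target}.

    tuples: list of (object_id, {dimension: value}, score)
    dims:   ordered list of dimension names
    A subspace is a tuple of (dimension, value) pairs; () stands for {*}.
    """
    ranks = {}

    def recurse(part, subspace, start):
        # Aggregate (assumed SUM) the scores of each object in this partition.
        totals = defaultdict(float)
        for obj, _, score in part:
            totals[obj] += score
        if target not in totals:
            return  # the target has no tuple here, nor in any child subspace
        ranks[subspace] = 1 + sum(1 for s in totals.values() if s > totals[target])
        # Partition on each remaining dimension and descend depth-first.
        for i in range(start, len(dims)):
            groups = defaultdict(list)
            for t in part:
                groups[t[1][dims[i]]].append(t)
            for value, group in groups.items():
                recurse(group, subspace + ((dims[i], value),), i + 1)

    recurse(tuples, (), 0)
    return ranks

# Hypothetical data with two dimensions A and B.
data = [
    ("t1", {"A": "a1", "B": "b1"}, 0.6),
    ("t2", {"A": "a1", "B": "b2"}, 0.9),
    ("t1", {"A": "a2", "B": "b1"}, 0.5),
]
for subspace, r in rank_in_all_subspaces(data, ["A", "B"], "t1").items():
    print(subspace, r)
```

Passing the `start` index down ensures that each subspace is visited once, and stopping at partitions without the target restricts the traversal to target subspaces only.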


TPC-H


Effectiveness of promotion query