Efficient Processing of Top-k Spatial Preference Queries

João B. Rocha-Junior, Akrivi Vlachou, Christos Doulkeridis, and Kjetil Nørvåg

VLDB 2011 - Seattle, USA

www.ntnu.no


Outline

• Top-k spatial preference queries
• Current approaches
• Our approach
  – Mapping to distance-score space
  – Query processing
  – Materialization (index construction)
• Experimental evaluation
• Conclusion


Motivation

• Increasing number of Web information systems specialized in location-based queries
• Systems are limited to simple spatial queries
  – Example: return objects in a given spatial location
• Top-k spatial preference query
  – Ranks data objects based on the score of feature objects in their spatial neighborhood
  – Combines spatial and non-spatial scores


Top-k spatial preference queries

• Given a set of data objects and scored feature objects
• Query
  – Spatial neighborhood
  – Features of interest (e.g., bars)
• Score of a data object
  – Obtained from feature objects in its spatial neighborhood
• Returns
  – Ranked set of k best data objects

[Figure: hotels p1, p2, p3 in the x-y plane with scored feature objects: cafés c1(0.6), c2(0.4), c3(0.2), c4(0.8) and bars b1(0.9), b2(0.6), b3(0.3); the top-1 hotel is marked.]


Score function

• Aggregation of partial scores
  – Any monotone function: sum, max, and min
• Partial score
  – Score of a data object for a set of feature objects
  – Defined by the score of a single feature object
    • Highest score
    • Satisfies the spatial constraint
• Spatial constraint
  – Range, nearest neighbor, and influence
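The score function above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names, the `(x, y, score)` tuple layout, and the convention that an empty neighborhood yields 0 are all assumptions. The influence decay `score * 2^(-dist/r)` follows the definition used in the cited work of Yiu et al. and should be checked against the paper.

```python
import math

# Hypothetical helpers; data objects are (x, y), feature objects are (x, y, score).
def dist(p, t):
    return math.hypot(p[0] - t[0], p[1] - t[1])

def partial_range(p, features, r):
    """Highest score among feature objects within distance r of p (0 if none)."""
    return max((t[2] for t in features if dist(p, t) <= r), default=0.0)

def partial_nn(p, features):
    """Score of the feature object nearest to p (0 if the set is empty)."""
    if not features:
        return 0.0
    return min(features, key=lambda t: dist(p, t))[2]

def partial_influence(p, features, r):
    """Highest score after distance decay (assumed form: score * 2^(-dist/r))."""
    return max((t[2] * 2 ** (-dist(p, t) / r) for t in features), default=0.0)

def score(p, feature_sets, partial, agg=sum):
    """Aggregate one partial score per feature set with a monotone function."""
    return agg(partial(p, fs) for fs in feature_sets)

# Made-up coordinates for one hotel, two cafés and two bars.
hotel = (0.0, 0.0)
cafes = [(1.0, 0.0, 0.6), (3.0, 0.0, 0.8)]
bars = [(0.0, 2.0, 0.9), (0.0, 1.0, 0.3)]

range_score = score(hotel, [cafes, bars], lambda p, fs: partial_range(p, fs, 2.0))
nn_score = score(hotel, [cafes, bars], partial_nn)
influence_score = score(hotel, [cafes, bars], lambda p, fs: partial_influence(p, fs, 2.0))
```

Passing `max` or `min` as `agg` switches the aggregation without touching the partial-score functions.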


Example (agg=sum)

• Range: score(p) = 1.5
• Nearest neighbor: score(p) = 1.0
• Influence: score(p) = 0.6


Current approaches

• Naïve
  – Compute the score of all objects, select the top-k
  – Very costly
• State-of-the-art [1,2]
  – Data objects and feature objects are indexed by multi-dimensional indices
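For reference, the naïve baseline is simply "score everything, keep the k best". A minimal sketch, assuming a hypothetical `score_fn` that returns the full score of one data object (the scores in the usage lines are illustrative values):

```python
import heapq

def naive_top_k(objects, score_fn, k):
    """Score every data object once, then keep the k highest-scoring ones."""
    scored = [(p, score_fn(p)) for p in objects]  # one full score computation per object
    return heapq.nlargest(k, scored, key=lambda ps: ps[1])

# Illustrative scores for three data objects.
scores = {"p1": 1.1, "p2": 1.2, "p3": 1.1}
top1 = naive_top_k(scores, scores.get, 1)
```

The cost is dominated by the per-object score computation, which is what the indexed approaches try to avoid.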

[1] Yiu, M.L., Dai, X., Mamoulis, N., Vaitis, M.: “Top-k spatial preference queries”, ICDE, 2007.
[2] Yiu, M.L., Lu, H., Mamoulis, N., Vaitis, M.: “Ranking spatial data by quality preferences”, TKDE, 2011.


Current approaches

• Probing algorithms (SP and GP)
  – Require computing the score for all objects
• Branch and bound algorithms (BB and BB*)
  – Compute an upper-bound score for the entries in the data objects R-tree
  – Prune entries whose upper-bound score is smaller than the score of the k-th object found
• Feature join algorithm (FJ)
  – Create combinations of feature sets with high score
  – Combinations whose score is smaller than the score of the k-th object found are pruned


Motivation behind our idea…

• Few feature objects are necessary to compute the score of a data object
  – Features not dominated by any other feature in terms of both distance and score
• Nice properties
  – Small size in practice
  – Sufficient to support any neighborhood condition and query parameter

[Figure: hotel p1 in the x-y plane with cafés c1(0.5), c2(0.6), c3(0.2), c4(0.4), c5(0.8), and a question mark over which cafés matter for the score of p1.]


Our framework

• Mapping to distance-score space
  – Pairs of objects (p, t) with t ∈ Fi to be examined
• Identify SKY(p, Fi)
  – Minimum set of pairs required to compute the score of p according to Fi for any query
• Materialize SKY(p, Fi)
  – Stored in an R-tree, one R-tree Ri per feature set Fi
  – Efficient query processing and maintenance
• Query processing algorithm


Mapping to the distance-score space

• Mapping
  – Pairs (object, feature)
  – Space [distance × score]
• Skyline
  – Minimize: distance
  – Maximize: score

[Figure: hotels p1 and p2 with cafés c1(0.9), c2(0.7), c3(0.5), c4(0.3), alongside the pairs (p1,c1) to (p1,c4) and (p2,c1) to (p2,c4) plotted in the distance-score space.]
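The mapping and the skyline can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the `(name, x, y, score)` layout are hypothetical, the café scores are taken from the slide, and the coordinates are made up. The skyline uses a quadratic scan for clarity rather than an index-based algorithm.

```python
import math

def to_distance_score(p, features):
    """Map each feature (name, x, y, score) to a (distance to p, score, name) pair."""
    return [(math.hypot(p[0] - x, p[1] - y), s, name) for name, x, y, s in features]

def dominates(a, b):
    """a dominates b: no farther and no lower-scored, and strictly better in one."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])

def skyline(pairs):
    """Pairs not dominated by any other pair (minimize distance, maximize score)."""
    return [a for a in pairs if not any(dominates(b, a) for b in pairs)]

# Scores from the slide; coordinates are invented for the example.
p1 = (0.0, 0.0)
cafes = [("c1", 4.0, 0.0, 0.9), ("c2", 0.0, 3.0, 0.7),
         ("c3", 1.0, 0.0, 0.5), ("c4", 0.0, 2.0, 0.3)]

sky_p1 = skyline(to_distance_score(p1, cafes))
sky_names = [name for _, _, name in sky_p1]
```

With these coordinates c4 is farther from p1 than c3 and has a lower score, so it is the only pair dropped.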


Theoretical properties

• SKY(p, Fi) is sufficient to determine the partial score of p for any spatial preference query
  – Maintaining SKY(p, Fi) is sufficient to answer any spatial preference query (stored in an R-tree)
• SKY(p, Fi) is the minimum set required
  – The data required to process range queries permits processing nn and influence queries
• The proofs of the theorems can be found in the paper
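The sufficiency claim can be checked empirically on random data. The sketch below is not a proof and not the authors' code: it assumes pairs are `(distance, score)` tuples, that ties for the nearest neighbor are resolved in favor of the higher score, and that influence uses the decay `score * 2^(-dist/r)` from the cited work. It computes each partial score twice, once from all pairs and once from the skyline only.

```python
import random

def dominates(a, b):
    """a dominates b in the distance-score space."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])

def skyline(pairs):
    return [a for a in pairs if not any(dominates(b, a) for b in pairs)]

def range_score(pairs, r):
    return max((s for d, s in pairs if d <= r), default=0.0)

def nn_score(pairs):
    # Nearest feature; among equally near features take the best score (assumption).
    if not pairs:
        return 0.0
    d_min = min(d for d, _ in pairs)
    return max(s for d, s in pairs if d == d_min)

def influence_score(pairs, r):
    return max((s * 2 ** (-d / r) for d, s in pairs), default=0.0)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
pairs = [(rng.uniform(0, 10), rng.random()) for _ in range(200)]
sky = skyline(pairs)
```

Comparing `range_score(pairs, r)` with `range_score(sky, r)` for several radii, and likewise for the other two functions, should give matching values while `sky` holds only a fraction of the pairs.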


Access to partial scores

• Only node entries that satisfy the spatial constraint are accessed
  – Items are retrieved in decreasing order of score
• Minor modifications to support nn and influence

[Figure: R-tree with root entries e1 and e2 over leaf pairs such as (p3,t4), (p2,t1), (p1,t3), (p2,t4); for a range query with r=3 the max-heap goes from <e1(0.8)> to <p3(0.8), p2(0.6)>.]
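A best-first traversal with a max-heap is one way to obtain partial scores in decreasing order for a range query. The sketch below is an assumption-laden illustration, not the paper's algorithm: `Node` is a hypothetical stand-in for an R-tree node in the distance-score space, keeping only the minimum distance and maximum score of its subtree, and the entries are invented.

```python
import heapq
from itertools import count

class Node:
    """Hypothetical stand-in for an R-tree node in the distance-score space."""
    def __init__(self, children=(), pairs=()):
        self.children = list(children)   # child nodes (internal node)
        self.pairs = list(pairs)         # (object, distance, score) leaf entries
        below = [(c.min_dist, c.max_score) for c in self.children]
        below += [(d, s) for _, d, s in self.pairs]
        self.min_dist = min(d for d, _ in below)
        self.max_score = max(s for _, s in below)

def partial_scores(root, r):
    """Yield (object, partial score) in decreasing score order for range r."""
    tie = count()                        # tie-breaker so the heap never compares nodes
    heap = [(-root.max_score, next(tie), root, None)]
    seen = set()
    while heap:
        neg, _, node, obj = heapq.heappop(heap)
        if node is None:                 # a leaf pair reached the top of the heap
            if obj not in seen:
                seen.add(obj)
                yield obj, -neg
            continue
        for child in node.children:
            if child.min_dist <= r:      # skip subtrees entirely outside the range
                heapq.heappush(heap, (-child.max_score, next(tie), child, None))
        for p, d, s in node.pairs:
            if d <= r and p not in seen:
                heapq.heappush(heap, (-s, next(tie), None, p))

# Invented entries: two leaf nodes under one root.
e1 = Node(pairs=[("p3", 2.0, 0.8), ("p2", 1.0, 0.6), ("p1", 5.0, 0.9)])
e2 = Node(pairs=[("p1", 2.5, 0.4), ("p2", 2.0, 0.5), ("p3", 4.0, 0.3)])
root = Node(children=[e1, e2])
ordered = list(partial_scores(root, 3.0))
```

A node's maximum score is a valid upper bound even when some of its pairs fall outside the range, which is why the first time an object leaves the heap its score is its partial score.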


Query processing

• Compute top-k data objects by progressively aggregating partial scores retrieved from Ri
  – Similar to Fagin’s algorithm (NRA)
• Algorithm
  – Each time an object p is retrieved from Ri, any unseen object p’ in Ri has score(p’) ≤ score(p)
  – Keep track of lower and upper-bound score of the seen objects
  – Terminates when the lower-bound of the k-th object is better than the upper-bound of the remaining objects
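The bookkeeping can be sketched as an NRA-style loop over sorted lists. This is a simplified illustration, not the authors' implementation: it assumes `sum` aggregation, non-negative scores, in-memory lists standing in for the sorted access to each Ri, and that an object missing from a list contributes 0. The function name `top_k_nra` is hypothetical, and the returned values are lower bounds, which are exact only for objects seen in every list.

```python
def top_k_nra(lists, k):
    """lists[i] holds (object, partial score) pairs sorted by decreasing score."""
    m = len(lists)
    partial = {}                         # object -> {list index: partial score}
    for pos in range(max(len(lst) for lst in lists)):
        last = []                        # bound on anything not yet read from list i
        for i, lst in enumerate(lists):  # one sorted access per list per round
            if pos < len(lst):
                p, s = lst[pos]
                partial.setdefault(p, {})[i] = s
                last.append(s)
            else:
                last.append(0.0)         # exhausted list: missing entries contribute 0
        lower = {p: sum(v.values()) for p, v in partial.items()}
        upper = {p: sum(v.get(i, last[i]) for i in range(m))
                 for p, v in partial.items()}
        best = sorted(lower, key=lower.get, reverse=True)[:k]
        if len(best) == k:
            others = [upper[p] for p in partial if p not in best]
            others.append(sum(last))     # bound for objects not seen in any list
            if lower[best[-1]] >= max(others):
                return [(p, lower[p]) for p in best]
    # Every list fully read: lower bounds are now exact scores.
    lower = {p: sum(v.values()) for p, v in partial.items()}
    best = sorted(lower, key=lower.get, reverse=True)[:k]
    return [(p, lower[p]) for p in best]

# Sorted partial scores for two feature sets (values from the range example in this deck).
R1 = [("p3", 0.8), ("p2", 0.6), ("p1", 0.2)]
R2 = [("p1", 0.9), ("p2", 0.6), ("p3", 0.3)]
result = top_k_nra([R1, R2], 1)
```

On this input the loop needs all three rounds: after the second round p2 has a lower bound of 1.2, but p1 and p3 still have upper bounds of 1.5 and 1.4.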


Example (range, r=4.5)

R1: p3(0.8)   R2: p1(0.9)   0.8 + 0.9 = 1.7

Object  R1   R2   Score  Upper-bound
p1      -    0.9  0.9    1.7
p3      0.8  -    0.8    1.7

[Figure: range r=4.5 around each hotel, shown for restaurants and for bars.]


Example (range, r=4.5)

R1: p2(0.6)   R2: p2(0.6)   0.6 + 0.6 = 1.2

Object  R1   R2   Score  Upper-bound
p3      0.8  -    0.8    1.4
p1      -    0.9  0.9    1.5
p2      0.6  0.6  1.2    1.2


Example (range, r=4.5)

R1: p1(0.2)   R2: p3(0.3)   0.2 + 0.3 = 0.5

Object  R1   R2   Score  Upper-bound
p3      0.8  0.3  1.1    1.1
p1      0.2  0.9  1.1    1.1
p2      0.6  0.6  1.2    1.2    (Top-1)


Materialization

• Objects are partitioned into regions
  – The distance among objects in the same region is small
  – The skyline set of the objects in the same region is similar with high probability
• Compute SKY(R, Fi) for the region R
  – SKY(p, Fi) ⊆ SKY(R, Fi), ∀p ∈ R
• Advantage
  – The feature set is accessed only once to compute the dynamic skyline of all objects in the region
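One conservative way to obtain a superset of every SKY(p, Fi) in a region is to compare features by their minimum and maximum distance to the region. The sketch below is an assumed filter for illustration, not the materialization algorithm from the paper: it assumes regions are axis-aligned rectangles `(x1, y1, x2, y2)` and features are `(x, y, score)` tuples, and all names are hypothetical.

```python
import math

def min_max_dist(rect, t):
    """Smallest and largest distance from point t to rectangle (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    dx_min = max(x1 - t[0], 0.0, t[0] - x2)
    dy_min = max(y1 - t[1], 0.0, t[1] - y2)
    dx_max = max(abs(t[0] - x1), abs(t[0] - x2))
    dy_max = max(abs(t[1] - y1), abs(t[1] - y2))
    return math.hypot(dx_min, dy_min), math.hypot(dx_max, dy_max)

def region_skyline(rect, features):
    """Features that may belong to SKY(p, Fi) for some p inside rect."""
    bounds = [min_max_dist(rect, t) for t in features]
    keep = []
    for t, (lo_t, _) in zip(features, bounds):
        # t is dropped only if some u is at least as near for every p in rect
        # and at least as good in score, with one of the two strictly better.
        dominated = any(
            (hi_u <= lo_t and u[2] > t[2]) or (hi_u < lo_t and u[2] >= t[2])
            for u, (_, hi_u) in zip(features, bounds)
        )
        if not dominated:
            keep.append(t)
    return keep

# Invented region and features.
region = (0.0, 0.0, 1.0, 1.0)
a, b, c = (0.5, 0.5, 0.2), (5.0, 0.5, 0.9), (10.0, 0.5, 0.5)
sky_region = region_skyline(region, [a, b, c])
```

Here c is dropped because b is closer to every point of the region and has a higher score; a and b are kept because neither dominates the other for all points in the region.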


Experimental evaluation

• We compare our approach (SFA) against SP, GP, BB, BB*, and FJ algorithms [1,2]
• All approaches are implemented in Java
• Measures: response time, I/O, update time, index construction time, and index size


Variables studied

• Data distribution
  – Uniform (UN), Synthetic (CN), Real (RL)
• Cardinality (objects and features)
  – 50K, 100K, 200K, 400K, 800K, 1600K
• Number of results (k)
  – 10, 20, 30, 40, 50
• Number of feature sets
  – 1, 2, 3, 4, 5
• Query range (r), for range and influence queries
  – 10, 40, 160, 640, 2560


Datasets

Datasets        Number of data objects  Number of feature objects  Dynamic skyline set
Wal-Mart (WM)   11K                     4K                         1.98
Hotels (HT)     11K                     31K                        4.82
Synthetic (CN)  100K                    100K                       11.26
Uniform (UN)    100K                    100K                       12.04


Number of features

[Figure: (a) I/O varying the number of feature sets; (b) response time varying the number of feature sets.]


Scalability

[Figure: (a) response time varying |Fi|; (b) response time varying |O|.]


Real datasets

[Figure: results on real datasets for (a) range, (b) influence, (c) nearest neighbor.]


Conclusion

• Top-k spatial preference queries are a useful tool for novel location-based applications
• We propose a new approach for processing top-k spatial preference queries efficiently
  – We find and materialize SKY(p, Fi)
  – We prove that SKY(p, Fi) is sufficient to determine the partial score of p for any spatial preference query
  – The size of SKY(p, Fi) is small in practice
• We propose algorithms to process queries using our index
• The efficiency of our approach is verified through experiments on synthetic and real datasets


Thanks!

More information: João B. Rocha-Junior

www.idi.ntnu.no/~joao