www.ntnu.no
Efficient Processing of Top-k Spatial Preference Queries
João B. Rocha-Junior, Akrivi Vlachou, Christos Doulkeridis, and Kjetil Nørvåg
VLDB 2011 - Seattle, USA
Outline
• Top-k spatial preference queries
• Current approaches
• Our approach
  – Mapping to distance-score space
  – Query processing
  – Materialization (index construction)
• Experimental evaluation
• Conclusion
Motivation
• Increasing number of Web information systems specialized in location-based queries
• Systems are limited to simple spatial queries
  – Example: return objects in a given spatial location
• Top-k spatial preference query
  – Ranks data objects based on the score of feature objects in their spatial neighborhood
  – Combines spatial and non-spatial scores
Top-k spatial preference queries
• Given a set of data objects and scored feature objects
• Query
  – Spatial neighborhood
  – Features of interest (e.g., bars)
• Score of a data object
  – Obtained from feature objects in its spatial neighborhood
• Returns
  – Ranked set of k best data objects
[Figure: hotels p1, p2, p3 in the x-y plane, surrounded by cafés c1(0.6), c2(0.4), c3(0.2), c4(0.8) and bars b1(0.9), b2(0.6), b3(0.3); a Top-1 label marks the best hotel.]
Score function
• Aggregation of partial scores
  – Any monotone function: sum, max, and min
• Partial score
  – Score of a data object for a set of feature objects
  – Defined by the score of a single feature object
    • Highest score
    • Satisfies the spatial constraint
• Spatial constraint
  – Range, nearest neighbor, and influence
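The score function can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Feature`, `partial_score`, and `score` are hypothetical, and the influence formula (feature score weighted by 2^(-dist/r)) is an assumption taken from the definition used in [1], which the slides do not restate.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    x: float
    y: float
    score: float  # non-spatial score of the feature object

def dist(p, f):
    """Euclidean distance between data object p = (x, y) and feature f."""
    return math.hypot(p[0] - f.x, p[1] - f.y)

def partial_score(p, features, constraint, r=None):
    """Partial score of p for one feature set: the highest score among
    the feature objects that satisfy the spatial constraint."""
    if not features:
        return 0.0
    if constraint == "range":
        # best feature within distance r of p (0 if there is none)
        return max((f.score for f in features if dist(p, f) <= r), default=0.0)
    if constraint == "nn":
        # score of the nearest feature
        return min(features, key=lambda f: dist(p, f)).score
    if constraint == "influence":
        # ASSUMPTION: score decays with distance as 2^(-dist/r), as in [1]
        return max(f.score * 2 ** (-dist(p, f) / r) for f in features)
    raise ValueError(f"unknown constraint: {constraint}")

def score(p, feature_sets, constraint, r=None, agg=sum):
    """Aggregate the partial scores with a monotone function (sum, max, min)."""
    return agg(partial_score(p, fs, constraint, r) for fs in feature_sets)

# Hypothetical data: one data object, two feature sets.
p = (0.0, 0.0)
cafes = [Feature(1, 0, 0.6), Feature(3, 0, 0.9)]
bars = [Feature(0, 2, 0.5), Feature(0, 5, 1.0)]
for constraint, r in (("range", 3.0), ("nn", None), ("influence", 1.0)):
    print(constraint, round(score(p, [cafes, bars], constraint, r), 3))
```

Passing `agg=max` or `agg=min` swaps the aggregation without touching the partial scores, which is the point of requiring only monotonicity.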
Example (agg=sum)
[Figure: score of p under each spatial constraint]
Range: score(p) = 1.5
Nearest neighbor: score(p) = 1.0
Influence: score(p) = 0.6
Current approaches
• Naïve
  – Compute the score of all objects, select the top-k
  – Very costly
• State-of-the-art [1,2]
  – Data objects and feature objects are indexed by multi-dimensional indices
[1] Yiu, M.L., Dai, X., Mamoulis, N., Vaitis, M.: “Top-k spatial preference queries”, ICDE, 2007.
[2] Yiu, M.L., Lu, H., Mamoulis, N., Vaitis, M.: “Ranking spatial data by quality preferences”, TKDE, 2011.
Current approaches
• Probing algorithms (SP and GP)
  – Require computing the score for all objects
• Branch and bound algorithms (BB and BB*)
  – Compute an upper-bound score for the entries in the data objects R-tree
  – Prune entries whose upper-bound score is smaller than the score of the k-th object found
• Feature join algorithm (FJ)
  – Create combinations of feature sets with high score
  – Combinations whose score is smaller than the score of the k-th object found are pruned
Motivation behind our idea…
• Few feature objects are necessary to compute the score of a data object
  – Features not dominated by any other feature in terms of both distance and score
• Nice properties
  – Small size in practice
  – Sufficient to support any neighborhood condition and query parameter
[Figure: hotel p1 in the x-y plane with cafés c1(0.5), c2(0.6), c3(0.2), c4(0.4), c5(0.8).]
Our framework
• Mapping to distance-score space
  – Pairs of objects (p, t) with t ∈ Fi to be examined
• Identify SKY(p, Fi)
  – Minimum set of pairs required to compute the score of p according to Fi for any query
• Materialize SKY(p, Fi)
  – Stored in an R-tree, one R-tree Ri per feature set Fi
  – Efficient query processing and maintenance
• Query processing algorithm
Mapping to the distance-score space
• Mapping
  – Pairs (object, feature)
  – Space [distance × score]
[Figure: hotels p1, p2 and cafés c1(0.9), c2(0.7), c3(0.5), c4(0.3), with each pair (p1, c) and (p2, c) plotted as a point in the distance-score space.]
• Skyline
  – Minimize: distance
  – Maximize: score
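A small Python sketch may help make the mapping and the skyline concrete. It assumes features are given as hypothetical `(x, y, score)` tuples and uses a simple sort-and-scan in place of any index; the function names are illustrative, not taken from the paper.

```python
import math

def skyline(p, features):
    """Map each feature to a (distance, score) pair relative to p and keep
    the pairs not dominated by a pair that is at least as close and at
    least as good (minimize distance, maximize score)."""
    pairs = sorted(
        ((math.hypot(p[0] - x, p[1] - y), s) for (x, y, s) in features),
        key=lambda ds: (ds[0], -ds[1]),  # closest first, best score first on ties
    )
    sky, best = [], -math.inf
    for d, s in pairs:
        # a pair survives only if it beats the score of every closer pair
        # (exact duplicates of a kept pair are dropped as well)
        if s > best:
            sky.append((d, s))
            best = s
    return sky

def range_score(sky, r):
    """Partial score for a range query, computed from the skyline alone."""
    return max((s for d, s in sky if d <= r), default=0.0)

# Hypothetical feature set: the feature at distance 3 is dominated by the
# closer, better feature at distance 2.
features = [(1, 0, 0.3), (2, 0, 0.7), (3, 0, 0.5), (4, 0, 0.9)]
sky = skyline((0, 0), features)
print(sky)
print(range_score(sky, 3.5))
```

The nearest-neighbor score is simply the score of the first skyline pair, since the closest feature is never dominated.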
Theoretical properties
• SKY(p, Fi) is sufficient to determine the partial score of p for any spatial preference query
  – Maintaining SKY(p, Fi) is sufficient to answer any spatial preference query (stored in an R-tree)
• SKY(p, Fi) is the minimum set required
  – The data required to process range queries permits processing nn and influence queries
• The proofs of the theorems can be found in the paper
Access to partial scores
• Only node entries that satisfy the spatial constraint are accessed
  – Items are retrieved in decreasing order of score
• Minor modifications to support nn and influence
[Figure: traversal of an R-tree with root entries e1 and e2 for a range query with r=3; the max-heap goes from <e1(0.8)> to <p3(0.8), p2(0.6)>.]
Query processing
• Compute top-k data objects progressively aggregating partial scores retrieved from Ri
  – Similar to Fagin’s algorithm (NRA)
• Algorithm
  – Each time an object p is retrieved from Ri, any unseen object p’ in Ri has a score(p’) ≤ score(p)
  – Keep track of lower and upper-bound score of the seen objects
  – Terminates when the lower-bound of the k-th object is better than the upper-bound of the remaining objects
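The bookkeeping described above can be sketched in Python as an NRA-style loop. This is an illustration under simplifying assumptions, not the paper's algorithm: each Ri is replaced by a plain list of hypothetical `(object, partial_score)` pairs already sorted by decreasing score, aggregation is fixed to sum, lists are read round-robin, and an object missing from a list contributes 0.

```python
def top_k(sorted_lists, k):
    """Return up to k (object, lower_bound) pairs using sorted access only."""
    m = len(sorted_lists)
    iters = [iter(lst) for lst in sorted_lists]
    last = [0.0] * m     # last score read from each list (0 once exhausted)
    done = [False] * m   # True when a list has no more items
    known = {}           # object -> partial scores seen so far (None = unseen)
    while True:
        # one round of sorted access: read the next item of every list
        for i, it in enumerate(iters):
            item = next(it, None)
            if item is None:
                done[i], last[i] = True, 0.0
            else:
                obj, s = item
                known.setdefault(obj, [None] * m)[i] = s
                last[i] = s
        # lower bound: seen partial scores only; upper bound: unseen ones
        # replaced by the last score read from the corresponding list
        lower = {o: sum(v for v in ps if v is not None) for o, ps in known.items()}
        upper = {o: sum(last[i] if v is None else v for i, v in enumerate(ps))
                 for o, ps in known.items()}
        ranked = sorted(known, key=lambda o: lower[o], reverse=True)
        top, rest = ranked[:k], ranked[k:]
        # best score any object outside the current top-k could still reach
        threshold = max([sum(last)] + [upper[o] for o in rest])
        if all(done) or (len(top) == k and lower[top[-1]] >= threshold):
            return [(o, lower[o]) for o in top]

# Partial scores from the range example (r=4.5) in this deck.
R1 = [("p3", 0.8), ("p2", 0.6), ("p1", 0.2)]
R2 = [("p1", 0.9), ("p2", 0.6), ("p3", 0.3)]
print(top_k([R1, R2], 1))
```

As in NRA, the returned scores are lower bounds; they are exact only for objects whose partial scores have all been seen.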
Example (range, r=4.5)
Retrieved from R1: p3(0.8); from R2: p1(0.9). Sum of the last retrieved scores: 0.8 + 0.9 = 1.7

Object  R1   R2   Score  Upper-bound
p1      -    0.9  0.9    1.7
p3      0.8  -    0.8    1.7
Example (range, r=4.5)
Retrieved from R1: p2(0.6); from R2: p2(0.6). Sum of the last retrieved scores: 0.6 + 0.6 = 1.2

Object  R1   R2   Score  Upper-bound
p3      0.8  -    0.8    1.4
p1      -    0.9  0.9    1.5
p2      0.6  0.6  1.2    1.2
Example (range, r=4.5)
Retrieved from R1: p1(0.2); from R2: p3(0.3). Sum of the last retrieved scores: 0.2 + 0.3 = 0.5

Object  R1   R2   Score  Upper-bound
p3      0.8  0.3  1.1    1.1
p1      0.2  0.9  1.1    1.1
p2      0.6  0.6  1.2    1.2    (Top-1)
Materialization
• Objects are partitioned into regions
  – The distance among objects in the same region is small
  – The skyline set of the objects in the same region is similar with high probability
• Compute SKY(R, Fi) for the region R
  – SKY(p, Fi) ⊆ SKY(R, Fi), ∀p ∈ R
• Advantage
  – The feature set is accessed only once to compute the dynamic skyline of all objects in the region
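The superset property can be illustrated with a conservative region-level dominance test. This is an assumption made for illustration, since the slides do not give the paper's construction: a feature is discarded for a rectangular region R only if some other feature scores at least as well and its farthest possible distance from R is no larger than the discarded feature's nearest possible distance. Rectangles are hypothetical `(xmin, ymin, xmax, ymax)` tuples and features are `(x, y, score)` tuples.

```python
import math

def mindist(rect, t):
    """Smallest distance from any point of rect to feature t."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - t[0], 0.0, t[0] - xmax)
    dy = max(ymin - t[1], 0.0, t[1] - ymax)
    return math.hypot(dx, dy)

def maxdist(rect, t):
    """Largest distance from any point of rect to feature t."""
    xmin, ymin, xmax, ymax = rect
    dx = max(abs(t[0] - xmin), abs(t[0] - xmax))
    dy = max(abs(t[1] - ymin), abs(t[1] - ymax))
    return math.hypot(dx, dy)

def region_skyline(rect, features):
    """Features that cannot be ruled out for every point of rect at once."""
    def dominated(t):
        lo = mindist(rect, t)
        return any(
            u[2] >= t[2] and maxdist(rect, u) <= lo
            and (u[2] > t[2] or maxdist(rect, u) < lo)
            for u in features
        )
    return [t for t in features if not dominated(t)]

def point_skyline(p, features):
    """Exact skyline of a single data object p, for comparison."""
    def d(t):
        return math.hypot(p[0] - t[0], p[1] - t[1])
    return [
        t for t in features
        if not any(u[2] >= t[2] and d(u) <= d(t) and (u[2] > t[2] or d(u) < d(t))
                   for u in features)
    ]

# Hypothetical region and feature set.
R = (0.0, 0.0, 1.0, 1.0)
feats = [(2, 0.5, 0.9), (10, 0.5, 0.5), (5, 0.5, 0.95), (1.5, 0.5, 0.2)]
sky_R = region_skyline(R, feats)
print(sky_R)
for p in [(0, 0), (1, 1), (0.5, 0.5)]:
    print(p, set(point_skyline(p, feats)) <= set(sky_R))
```

Because the test only discards a feature when it is dominated for every point of R, the result contains SKY(p, Fi) for each p in R, at the cost of possibly keeping a few extra features.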
Experimental evaluation
• We compare our approach (SFA) against SP, GP, BB, BB*, and FJ algorithms [1,2]
• All approaches are implemented in Java
• Measures: response time, I/O, update time, index construction time, and index size
Variables studied
• Data distribution
  – Uniform (UN), Synthetic (CN), Real (RL)
• Cardinality (objects and features)
  – 50K, 100K, 200K, 400K, 800K, 1600K
• Number of results (k)
  – 10, 20, 30, 40, 50
• Number of feature sets
  – 1, 2, 3, 4, 5
• Query range (r), for range and influence queries
  – 10, 40, 160, 640, 2560
Datasets
Dataset         Number of data objects  Number of feature objects  Dynamic skyline set
Wal-Mart (WM)   11K                     4K                         1.98
Hotels (HT)     11K                     31K                        4.82
Synthetic (CN)  100K                    100K                       11.26
Uniform (UN)    100K                    100K                       12.04
Number of features
[Figure: a) I/O varying the number of feature sets; b) response time varying the number of feature sets]
Scalability
[Figure: a) response time varying |Fi|; b) response time varying |O|]
Real datasets
[Figure: a) range; b) influence; c) nearest neighbor]
Conclusion
• Top-k spatial preference queries are a useful tool for novel location-based applications
• We propose a new approach for processing top-k spatial preference queries efficiently
  – We find and materialize SKY(p, Fi)
  – We prove that SKY(p, Fi) is sufficient to determine the partial score of p for any spatial preference query
  – The size of SKY(p, Fi) is small in practice
• We propose algorithms to process queries using our index
• The efficiency of our approach is verified through experiments on synthetic and real datasets
Thanks!
More information: João B. Rocha-Junior
[email protected]
www.idi.ntnu.no/~joao