Never Stand Still Faculty of Engineering Computer Science and Engineering
SLICE: Reviving Regions-Based Pruning for Reverse k Nearest Neighbors Queries
Shiyu Yang 1, Muhammad Aamir Cheema 2,1, Xuemin Lin 1,3, Ying Zhang 4,1
1 The University of New South Wales, Australia
2 Monash University, Australia
3 East China Normal University, China
4 University of Technology, Sydney, Australia
School of Computer Science and Engineering
Introduction
• k Nearest Neighbor Query
– Find the k facilities closest to the query user.
• Reverse k Nearest Neighbor Query
– Find every user for which the query facility is one of the k closest facilities.
• RkNNs are the potential customers of a facility
[Figure: users u1, u2, u3 and facilities f1, f2, f3; k = 1]
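The RkNN definition above can be illustrated with a brute-force check. This is only a sketch for intuition, not the method presented in these slides; the function name and the coordinates in the usage note are hypothetical, and points are assumed to be 2D tuples with Euclidean distance.

```python
import math

def rknn_brute_force(q, facilities, users, k):
    """Return the users for which q is one of their k closest facilities.

    `facilities` holds the competing facilities (q itself excluded).
    A user u is an RkNN of q when fewer than k facilities are
    strictly closer to u than q is.
    """
    result = []
    for u in users:
        closer = sum(1 for f in facilities if math.dist(u, f) < math.dist(u, q))
        if closer < k:
            result.append(u)
    return result
```

With q = (0, 0), facilities (4, 0) and (0, 5), and users (1, 0), (3, 0), (0, 2), the user (3, 0) is excluded for k = 1 because facility (4, 0) is closer to it than q.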
Related Work
• Phases: Pruning, Verification
• Pruning approaches
– Half-space: TPL (VLDB 2004), FINCH (VLDB 2008), InfZone (ICDE 2011)
– Regions-based: Six-regions (SIGMOD 2000)
• Timeline: Six-regions (SIGMOD 2000), TPL (VLDB 2004), FINCH (VLDB 2008), Boost (SIGMOD 2010), InfZone (ICDE 2011)
Related Work
• Regions-based Pruning: Six-regions (SIGMOD 2000)
1. Divide the whole space centred at the query q into six equal regions.
2. Find the k-th nearest facility of q in each region.
3. The k-th nearest facility of q in each region defines the area that can be pruned.
• The user points that cannot be pruned must be verified by a range query.
[Figure: query q, points a, b, c, d and users u1, u2; k = 2]
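The three Six-regions steps and the verification step can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: 2D points as tuples, Euclidean distance, a linear scan standing in for the range query, and hypothetical function names. It is not the authors' indexed implementation.

```python
import math

def region_index(q, p, regions=6):
    """Index of the equal-angle region (centred at q) that contains p."""
    angle = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
    return min(int(angle / (2 * math.pi / regions)), regions - 1)

def six_regions_rknn(q, facilities, users, k):
    # Steps 1-2: distance from q to the k-th nearest facility in each region.
    dists = [[] for _ in range(6)]
    for f in facilities:
        dists[region_index(q, f)].append(math.dist(q, f))
    kth = [sorted(d)[k - 1] if len(d) >= k else math.inf for d in dists]

    result = []
    for u in users:
        # Step 3: a user farther from q than the k-th nearest facility
        # of its region is pruned.
        if math.dist(q, u) > kth[region_index(q, u)]:
            continue
        # Verification (stand-in for the range query): count the
        # facilities that are closer to u than q is.
        closer = sum(1 for f in facilities if math.dist(u, f) < math.dist(u, q))
        if closer < k:
            result.append(u)
    return result
```

The linear scan in the verification step is where a real system would issue one range query per candidate, which is the cost the later slides aim to avoid.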
Related Work
• Half-space Pruning: the space that is contained in at least k half-spaces can be pruned
– TPL (VLDB 2004)
1. Find the nearest facility f in the unpruned area.
2. Draw the bisector between q and f, and prune using the resulting half-space.
3. Iteratively access the next nearest facility in the unpruned area.
[Figure: query q and points a, b, c, d; k = 2]
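The half-space test itself is simple to state in code: a point lies on a facility's side of the bisector exactly when it is closer to that facility than to q. The sketch below checks a single point against a given set of facilities; the function names are hypothetical, and it does not reproduce TPL's incremental facility access or its geometric trimming.

```python
import math

def in_pruning_half_space(p, f, q):
    """True if p lies on f's side of the bisector between q and f."""
    return math.dist(p, f) < math.dist(p, q)

def pruned_by_half_spaces(p, q, facilities, k):
    """True if p is contained in at least k pruning half-spaces."""
    count = sum(1 for f in facilities if in_pruning_half_space(p, f, q))
    return count >= k
```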
Related Work
• Half-space Pruning: InfZone (ICDE 2011)
1. The influence zone is the area left unpruned after the bisectors of all the facilities have been considered for pruning.
2. A point p is an RkNN of q if and only if p lies inside the unpruned area.
3. No verification phase is needed.
• Half-space pruning is expensive, especially when k is large.
[Figure: query q and points a, b, c, d; k = 2]
Related Work
• Regions-based vs. Half-space
                    | Regions-based | Half-space | SLICE
Pruning Cost        | O(m log k)    | O(km^2)    | O(m log m)
Pruning Power       | Low           | High       | High
Verification Cost   | Range query   | O(log m)   | O(k)
• m is the number of facilities considered for pruning
• Can regions-based pruning do better?
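Notations
• Partition: P
• Subtended angle: ∠a
• Maximum (minimum) subtended angle w.r.t. P: $\theta_{max}$ ($\theta_{min}$)
• Upper (lower) arc
– Center: q
– Radius: $\frac{dist(f,q)}{2\cos(\theta_{max})}$ for the upper arc, $\frac{dist(f,q)}{2\cos(\theta_{min})}$ for the lower arc
[Figure: partition P with query q, facility f, point p, angles θmin and θmax, and the upper and lower arcs]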
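As a rough illustration of these notations, the sketch below computes the minimum and maximum subtended angles of a facility with respect to an angular partition, and from them the upper and lower arc radii. Several things are assumptions made for the example rather than taken from the slides: a partition is given by a start angle and a width in radians measured at q, the function names are hypothetical, and an infinite radius is used as a convention when the relevant angle is 90° or more.

```python
import math

TWO_PI = 2 * math.pi

def angular_gap(a, b):
    """Smallest angle between two directions, in [0, pi]."""
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

def subtended_angle_range(q, f, start, width):
    """(theta_min, theta_max) of f w.r.t. the partition [start, start + width]."""
    phi = math.atan2(f[1] - q[1], f[0] - q[0]) % TWO_PI
    gaps = (angular_gap(phi, start), angular_gap(phi, start + width))
    # The minimum is 0 when f's own direction falls inside the partition.
    f_inside = (phi - start) % TWO_PI <= width
    # The maximum is pi when the opposite direction falls inside it.
    antipode_inside = (phi + math.pi - start) % TWO_PI <= width
    theta_min = 0.0 if f_inside else min(gaps)
    theta_max = math.pi if antipode_inside else max(gaps)
    return theta_min, theta_max

def arc_radius(dist_fq, theta):
    """dist(f,q) / (2 cos(theta)); infinite (by convention here) if theta >= 90 degrees."""
    if theta >= math.pi / 2:
        return math.inf
    return dist_fq / (2 * math.cos(theta))

def upper_and_lower_arc(q, f, start, width):
    """Radii of the upper and lower arcs of f w.r.t. the partition."""
    theta_min, theta_max = subtended_angle_range(q, f, start, width)
    d = math.dist(q, f)
    return arc_radius(d, theta_max), arc_radius(d, theta_min)
```

For q = (0, 0), f = (2, 0) and a 60° partition starting at angle 0, θmin is 0 and θmax is 60°, so the upper arc radius is dist(f,q) and the lower arc radius is dist(f,q)/2.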
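Observation: Pruning
• A facility f prunes every point p ∈ P for which dist(p,q) > $\frac{dist(f,q)}{2\cos(\theta_{max})}$ (the upper arc), provided $\theta_{max} < 90^\circ$.
• Let a = dist(p,f), b = dist(p,q), c = dist(f,q), and let θ be the subtended angle of p. We can prove a < b.
– $a^2 = b^2 + c^2 - 2bc\cos(\theta)$
– $b > \frac{c}{2\cos(\theta_{max})}$
– $c^2 - 2bc\cos(\theta) < c^2 - 2\,\frac{c}{2\cos(\theta_{max})}\,c\cos(\theta) = c^2\left(1 - \frac{\cos(\theta)}{\cos(\theta_{max})}\right) \le 0$, since $\theta \le \theta_{max}$
– Hence $a^2 < b^2$, i.e. a < b.
• Facility f prunes the area outside the upper arc of f for every partition P for which $\theta_{max} < 90^\circ$.
[Figure: partition P with query q, facility f, point p, angles θ and θmax, sides a, b, c, and the upper arc]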
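The observation can be sanity-checked numerically by sampling points of a partition that lie outside the upper arc and confirming each is closer to f than to q. The sketch is illustrative only: q is placed at the origin, f is given by its distance c and direction phi, the partition is the angular range [start, end], and all angles are assumed small enough that no wrap-around occurs. The function name and sampling scheme are hypothetical.

```python
import math
import random

def pruned_outside_upper_arc(c, phi, start, end, samples=1000, seed=0):
    """Sample points of [start, end] beyond the upper arc; check dist(p,f) < dist(p,q)."""
    theta_max = max(abs(phi - start), abs(phi - end))
    if theta_max >= math.pi / 2:
        raise ValueError("the observation needs theta_max < 90 degrees")
    radius = c / (2 * math.cos(theta_max))      # upper arc radius
    f = (c * math.cos(phi), c * math.sin(phi))  # q is the origin
    rng = random.Random(seed)
    for _ in range(samples):
        psi = rng.uniform(start, end)             # direction inside the partition
        r = radius * (1.0001 + 5 * rng.random())  # strictly outside the arc
        p = (r * math.cos(psi), r * math.sin(psi))
        if not math.dist(p, f) < r:               # r equals dist(p, q)
            return False
    return True
```

A sampling check like this does not replace the proof; it only helps catch sign or angle mistakes when implementing the upper arc.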
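Comparison with Six-regions
                                  | Six-regions | SLICE
Partitions pruned by a facility f | One         | Every partition with $\theta_{max} < 90^\circ$
No. of partitions                 | 6           | Any
Area pruned (outside radius)      | dist(f,q)   | $\frac{dist(f,q)}{2\cos(\theta_{max})}$
[Figure: query q and facility f, areas pruned by Six-regions vs. SLICE]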
Pruning Algorithm
• Divide the space into t partitions.
• Compute the upper arc of each facility for each partition.
• The area outside the k-th smallest upper arc (the bounding arc, with radius rB) in each partition can be pruned.
• Users in the pruned area are pruned.
• Users in the unpruned area are verified by accessing significant facilities.
[Figure: query q, facilities f1, f2 and users u1, u2; k = 2]
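The pruning steps above can be sketched in memory as follows. This is an illustration under assumptions, not the authors' implementation: all facilities are scanned for every partition (the real algorithm accesses them through an index), partitions are equal angular slices around q, and the function names are hypothetical. The default t = 12 matches the setting reported in the experiments.

```python
import math

TWO_PI = 2 * math.pi

def angle_of(q, p):
    return math.atan2(p[1] - q[1], p[0] - q[0]) % TWO_PI

def angular_gap(a, b):
    d = abs(a - b) % TWO_PI
    return min(d, TWO_PI - d)

def theta_max(phi, start, width):
    """Largest subtended angle between direction phi and [start, start + width]."""
    if (phi + math.pi - start) % TWO_PI <= width:
        return math.pi  # the opposite direction lies inside the partition
    return max(angular_gap(phi, start), angular_gap(phi, start + width))

def bounding_radii(q, facilities, k, t):
    """k-th smallest upper arc radius (rB) for each of the t partitions."""
    width = TWO_PI / t
    radii = []
    for i in range(t):
        arcs = []
        for f in facilities:
            tm = theta_max(angle_of(q, f), i * width, width)
            if tm < math.pi / 2:  # f prunes in this partition only then
                arcs.append(math.dist(q, f) / (2 * math.cos(tm)))
        arcs.sort()
        radii.append(arcs[k - 1] if len(arcs) >= k else math.inf)
    return radii

def candidate_users(q, facilities, users, k, t=12):
    """Users that survive pruning and still need verification."""
    radii = bounding_radii(q, facilities, k, t)
    width = TWO_PI / t
    keep = []
    for u in users:
        i = min(int(angle_of(q, u) / width), t - 1)
        if math.dist(q, u) <= radii[i]:  # inside the bounding arc
            keep.append(u)
    return keep
```

A partition with fewer than k pruning facilities gets an infinite bounding radius here, so none of its users are pruned.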
Significant Facility Verification
• Significant facility:
– A facility f that prunes at least one point p ∈ P lying inside the bounding arc of P.
– A significant facility cannot be in the red area.
[Figure: partition P with query q, bounding arc of radius rB, points M and N, and the red area]
• Verification for a candidate
            | Regions-based                            | SLICE
Method      | Issuing a range query for each candidate | Accessing significant facilities (O(k))
I/O cost    | High I/O cost                            | No additional I/O cost
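Verification of a single candidate can then be sketched as a scan over the significant facilities of its partition. The list of significant facilities is assumed to be precomputed during pruning and is passed in as a plain list; how it is built is not shown here, and the function name is hypothetical.

```python
import math

def is_rknn(u, q, significant_facilities, k):
    """True if fewer than k significant facilities are closer to u than q is."""
    closer = 0
    for f in significant_facilities:
        if math.dist(u, f) < math.dist(u, q):
            closer += 1
            if closer >= k:  # early exit: u cannot be an RkNN
                return False
    return True
```

Because the list has O(k) entries per partition according to the slides, this check avoids issuing a separate range query per candidate.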
Theoretical Analyses
• Number of significant facilities
– 2.34k (θ → 0)
– 9k (θ = 60°)
• I/O Cost
– Pruning phase: same as a circular range query centered at q with radius 2rB
– Verification phase: same as a circular range query centered at q with radius rB
• More analyses can be found in the paper
Experiments
• Data sets
– Synthetic data: size 50,000, 100,000, 150,000 or 200,000; distribution Uniform or Normal
– Real data: 175,812 points in North America
• Algorithms: Six-regions, InfZone and SLICE
– Page size is 4KB and the number of buffers for Six-regions is 10
– Number of partitions for SLICE is 12
Experiments
• Effect of different values of k
[Figure: two panels, I/O and CPU]
Experiments
• Effect of data distribution
• Effect of % users
Experiments
• Effect of partitions
• Number of significant facilities
[Figures: axes labeled "Number of partitions" and "Value of k"]
Thanks! Q&A