PODS, May 23, 2012


Nearest-Neighbor Searching Under Uncertainty

Wuzhou Zhang

Department of Computer Science, Duke University

PODS, May 23, 2012

Joint work with Pankaj K. Agarwal, Alon Efrat, and Swaminathan Sankararaman

2

Nearest-Neighbor Searching

Applications
• Databases, Information Retrieval
• Statistical Classification, Clustering
• Pattern Recognition, Data Compression
• Computer Vision, etc.

$S$: a set of points in $\mathbb{R}^2$

$q$: any query point in $\mathbb{R}^2$

Find the closest point $p^*$ in $S$ to $q$

3

Voronoi Diagram

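Voronoi cell: $\mathrm{Vor}(p_i) = \{x \in \mathbb{R}^2 : \|x - p_i\| \le \|x - p_j\| \text{ for all } j\}$

Voronoi diagram $\mathrm{VD}(S)$: the decomposition of the plane induced by the Voronoi cells

For $n$ points in the plane, with a point-location structure built on the diagram:

Preprocessing time: $O(n \log n)$

Space: $O(n)$

Query time: $O(\log n)$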

4

Data Uncertainty

Location of data is imprecise: sensor databases, face recognition, mobile data, etc.

What is the "nearest neighbor" of $q$ now?

5

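Our Model and Problem Statement

Uncertain point $P$: represented as a probability density function (pdf) $f_P$

Expected distance: $\mathrm{Ed}(P, q) = \int f_P(p)\, d(p, q)\, dp$

Find the expected nearest neighbor (ENN) of $q$: $\arg\min_{P \in \mathcal{P}} \mathrm{Ed}(P, q)$

Or an $\varepsilon$-ENN: a point $P \in \mathcal{P}$ with $\mathrm{Ed}(P, q) \le (1 + \varepsilon) \min_{P' \in \mathcal{P}} \mathrm{Ed}(P', q)$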
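The definitions on this slide can be made concrete with a small sketch. It assumes each uncertain point is a discrete distribution given as a list of `((x, y), probability)` pairs, and it answers queries by a plain linear scan, not by the indexes presented in the talk. The function names and the `(1 + eps)` convention for an approximate answer are illustrative assumptions.

```python
import math

# Assumption: an uncertain point is a list of ((x, y), probability) pairs
# whose probabilities sum to 1.

def expected_distance(P, q, dist=math.dist):
    """Ed(P, q): the probability-weighted average of dist(p, q)."""
    return sum(w * dist(p, q) for p, w in P)

def expected_nearest_neighbor(points, q, dist=math.dist):
    """Index of the uncertain point minimizing Ed(P, q), by linear scan."""
    return min(range(len(points)),
               key=lambda i: expected_distance(points[i], q, dist))

def is_eps_enn(points, i, q, eps, dist=math.dist):
    """True if points[i] is within a (1 + eps) factor of the smallest Ed."""
    best = min(expected_distance(P, q, dist) for P in points)
    return expected_distance(points[i], q, dist) <= (1 + eps) * best

P1 = [((0.0, 0.0), 0.5), ((4.0, 0.0), 0.5)]  # two equally likely locations
P2 = [((3.0, 0.0), 1.0)]                     # a certain point
q = (2.0, 0.0)
```

Here `P1` is centered exactly on `q`, yet both of its possible locations are 2 away, so `P2` at distance 1 is the expected nearest neighbor.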

6

Previous work

Uncertain data

ENN
• The ENN under the $L_1$ metric: $\varepsilon$-approximation [Ljosa2007]
• No bounds on the running time

Most likely NN
• Heuristics [Cheng2008, Kriegel2007, Cheng2004, etc.]

Uncertain query

ENN
• Discrete uniform distribution: both exact and O(1)-factor approximation [Li2011, Sharifzadeh2010, etc.]
• No bounds on the running time

7

Our contribution

Settings covered, each with bounds on preprocessing time, space, and query time:
• Squared Euclidean distance: uncertain data, uncertain query
• $L_1$ metric: uncertain data, uncertain query
• Euclidean metric ($\varepsilon$-ENN): uncertain data, uncertain query

Results in $\mathbb{R}^2$; they extend to higher dimensions

First nontrivial methods for ENN queries with provable performance guarantees!

8

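Expected Voronoi Diagram

Expected Voronoi cell: $\mathrm{EVor}(P_i) = \{q : \mathrm{Ed}(P_i, q) \le \mathrm{Ed}(P_j, q) \text{ for all } j\}$

Expected Voronoi diagram $\mathrm{EVD}(\mathcal{P})$: the decomposition induced by the expected Voronoi cells

An example in the $L_1$ metric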

9

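Squared Euclidean distance: Uncertain data

$\hat{p}$: the centroid of $P$

Lemma: $\mathrm{Ed}(P, q) = \|q - \hat{p}\|^2 + \sigma^2$, where $\sigma^2$ is the variance of $P$ about $\hat{p}$

• The expected Voronoi diagram is the same as a weighted Voronoi diagram (WVD) of the centroids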

Preprocessing time

Space Query time

Remarks: Works for any distribution
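The lemma can be checked numerically. The sketch below assumes discrete uncertain points given as lists of `((x, y), probability)` pairs and reads the lemma as $\mathrm{Ed}(P, q) = \|q - \hat{p}\|^2 + \sigma^2$, with $\sigma^2$ the variance of $P$ about its centroid. All names are illustrative, and the query is a linear scan over (centroid, variance) summaries instead of a weighted Voronoi diagram.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points in the plane."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def centroid(P):
    """Probability-weighted mean location of a discrete uncertain point."""
    return (sum(w * x for (x, _), w in P), sum(w * y for (_, y), w in P))

def variance(P):
    """Expected squared distance from P to its own centroid."""
    c = centroid(P)
    return sum(w * sq_dist(p, c) for p, w in P)

def expected_sq_dist(P, q):
    """Direct definition: weighted average of squared distances to q."""
    return sum(w * sq_dist(p, q) for p, w in P)

def enn_squared(points, q):
    """ENN under squared Euclidean distance from summaries only."""
    summaries = [(centroid(P), variance(P)) for P in points]  # preprocessing
    return min(range(len(points)),
               key=lambda i: sq_dist(summaries[i][0], q) + summaries[i][1])

P1 = [((0.0, 0.0), 0.5), ((4.0, 0.0), 0.5)]   # centroid (2, 0), variance 4
P2 = [((3.0, 1.0), 0.25), ((3.0, -1.0), 0.75)]
q = (2.0, 0.0)
```

After preprocessing, each uncertain point is reduced to a centroid and one additive weight, which is why the answer does not depend on the shape of the distribution.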


10

$L_1$ metric: Uncertain data

Size of $\mathrm{EVD}(\mathcal{P})$

Lower bound construction

$\alpha$: the inverse Ackermann function

Remarks: Extends to the $L_\infty$ metric

11

$L_1$ metric: Uncertain data (cont.)

A near-linear size index exists despite the size of $\mathrm{EVD}(\mathcal{P})$

Preprocessing time

Space Query time

Remarks: Extends to higher dimensions

12

Euclidean metric ($\varepsilon$-ENN): Uncertain data

Approximate $\mathrm{Ed}(P, q)$ by a simpler function $g_P(q)$

Outside the grid:

Inside the grid:

Total # of cells:

Remarks: Extends to any metric

Figure labels: $8\,\mathrm{Ed}(P, \hat{p})/\varepsilon$, $\hat{p}$; cell size: $\varepsilon$

13

Euclidean metric ($\varepsilon$-ENN): Uncertain data (cont.)

A linear-size approximate $\mathrm{EVD}(\mathcal{P})$!


Preprocessing time

Space Query time

𝑔𝑃 1

𝑔𝑃 2

π‘ž

14

Conclusion and future work

Conclusion
• First nontrivial methods for answering exact or approximate ENN queries with provable performance guarantees
• ENN is not a good indicator when the variance is large

Future work
• Linear-size index for most likely NN queries in sublinear time
• Index for returning the probability distribution of NNs

THANKS

15

Squared Euclidean distance: Uncertain query

$\hat{q}$: the centroid of $Q$

Preprocessing
• Compute the Voronoi diagram VD of $S$

Query
• Given $Q$, compute its centroid $\hat{q}$, then query VD with $\hat{q}$

Preprocessing time

Space Query time

Remarks: Extends to higher dimensions and works for any distribution
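The reason a single centroid query suffices is that the expected squared distance from a data point $p$ to an uncertain query $Q$ equals $\|p - \hat{q}\|^2$ plus the variance of $Q$, and that variance is the same for every $p$. The sketch below checks this on a small example. It assumes a discrete query distribution given as `((x, y), probability)` pairs, uses a linear scan as a stand-in for the Voronoi diagram lookup, and all names are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points in the plane."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def centroid(Q):
    """Probability-weighted mean location of a discrete uncertain query."""
    return (sum(w * x for (x, _), w in Q), sum(w * y for (_, y), w in Q))

def enn_uncertain_query(S, Q):
    """Nearest data point to the centroid of Q (stand-in for a VD query)."""
    c = centroid(Q)
    return min(range(len(S)), key=lambda i: sq_dist(S[i], c))

def enn_by_definition(S, Q):
    """Minimize the expected squared distance directly, for comparison."""
    return min(range(len(S)),
               key=lambda i: sum(w * sq_dist(S[i], x) for x, w in Q))

S = [(0.0, 0.0), (5.0, 0.0), (2.0, 3.0)]                        # certain data
Q = [((1.0, 0.0), 0.5), ((3.0, 0.0), 0.25), ((3.0, 4.0), 0.25)]  # query pdf
```

Both functions pick the same data point; only the first one touches the distribution once and then works with a single ordinary query point.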

16

Rectilinear metric: Uncertain query

Similarly, linear pieces

Preprocessing time

Space

Query time

17

Euclidean metric ($\varepsilon$-ENN): Uncertain query

Preprocessing time

Space

Query time

Remarks: Extends to higher dimensions

18

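$L_1$ metric: Uncertain data (cont.)

A near-linear size index exists despite the size of $\mathrm{EVD}(\mathcal{P})$

linear pieces!

For a location $p_{ij}$ of $P_i$, the distance $|x_{p_{ij}} - x_q| + |y_{p_{ij}} - y_q|$ takes one of four forms, depending on which quadrant around $p_{ij}$ contains $q$:

$-(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$

$-(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$

$(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$

$(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$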

Linear!
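A short sketch of why the expected $L_1$ distance is piecewise linear in $q$: for each possible location, the signs of the two coordinate differences are fixed as long as $q$ stays in the same quadrant, so the weighted sum is a single linear function there. The code assumes a discrete uncertain point given as `((x, y), probability)` pairs; the function names and the `(a, b, c)` coefficient layout are illustrative choices, not taken from the talk.

```python
def l1(p, q):
    """Rectilinear (L1) distance between two points in the plane."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def signs(p, q):
    """Signs (sx, sy) with l1(p, q) == sx*(p.x - q.x) + sy*(p.y - q.y)."""
    sx = 1 if p[0] >= q[0] else -1
    sy = 1 if p[1] >= q[1] else -1
    return sx, sy

def expected_l1_piece(P, q):
    """Coefficients (a, b, c) such that Ed(P, q') = a*x' + b*y' + c for
    every q' = (x', y') lying in the same quadrants as q with respect to
    all locations of P."""
    a = b = c = 0.0
    for p, w in P:
        sx, sy = signs(p, q)
        a -= w * sx
        b -= w * sy
        c += w * (sx * p[0] + sy * p[1])
    return a, b, c

def expected_l1(P, q):
    """Direct definition of the expected L1 distance."""
    return sum(w * l1(p, q) for p, w in P)

P = [((0.0, 0.0), 0.25), ((4.0, 2.0), 0.75)]
q1, q2 = (1.0, 1.0), (2.0, 1.5)   # both lie between the two locations
a, b, c = expected_l1_piece(P, q1)
```

The coefficients computed at `q1` also reproduce the expected distance at `q2`, because both queries see every location of `P` in the same quadrant.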

𝑃 𝑖
