An Intelligent & Incremental Approach to kNN using R-trees DJ Oneil & Esten Rye (G01)

Presentation Outline

Motivation

Related Work

Problem Definition

Approach

Validation

Conclusion

Motivation

kNN is popular (GIS, AI, Pattern Recognition, Clustering, Outlier Detection)

kNN is a hard problem

R-tree is the industry standard (Oracle, Microsoft SQL Server, DB2, and MySQL)

Problems with higher dimensional spaces

GIS

Related Work

Voronoi Diagram

Incremental approach (find k+1 using k)

High dimensions (X-tree)

New data structures (k-d tree, P-range tree, X-tree, SS-tree, …)

What’s Missing???

Domain specific classifications

Informed, incremental approach to R-tree kNN

Problem Definition

Given: Spatial database with n objects and query point, q.

Find: The k ≤ n ranked nearest neighbors.

Objective:

  Use object classifications

  Incremental

Constraints: Spatial objects are stored in an R-Tree
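
As a point of reference for the problem statement, here is a minimal brute-force sketch of ranked kNN. It scans all n objects and ignores the R-tree constraint, so it is only a baseline for checking results, not the approach of this talk. The function name and the sample points are illustrative assumptions.

```python
import heapq
import math

def knn_brute_force(points, q, k):
    """Return the k points closest to q, ranked by Euclidean distance."""
    # heapq.nsmallest returns results sorted from nearest to farthest.
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))

# Hypothetical sample data, not taken from the presentation.
points = [(0, 0), (3, 4), (1, 1), (6, 8)]
print(knn_brute_force(points, (0, 0), 2))  # [(0, 0), (1, 1)]
```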

Key Ideas

Allow users to define domain-specific classifiers to decrease search space

Use informed, incrementally increasing query region to decrease search space

Don’t worry about finding exactly k nearest neighbors.

Approach

Object Classification

Distance Classification

Incrementally increasing concentric circle query regions

Detour: R-tree

Object Classification

Domain specific classifiers.

Only search MBBs that contain the requested classifications

Adds classification dimensions.

Example: Zoning Classifier {“Residential”, “Industrial”, “Commercial”}
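
The pruning idea can be sketched as follows, assuming each tree node records the set of classifications found anywhere beneath it. The `Node` class, its field names, and the sample tree are hypothetical stand-ins for a real R-tree; the slides do not specify how classifications are stored.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    mbb: tuple      # (xmin, ymin, xmax, ymax), the minimum bounding box
    labels: set     # classifications present anywhere below this node (assumed)
    children: list = field(default_factory=list)
    objects: list = field(default_factory=list)  # leaf payload: (point, label)

def search_by_class(node, wanted):
    """Collect objects whose label is in `wanted`, skipping MBBs without them."""
    if not (node.labels & wanted):
        return []  # prune: nothing under this MBB has a wanted classification
    hits = [obj for obj in node.objects if obj[1] in wanted]
    for child in node.children:
        hits.extend(search_by_class(child, wanted))
    return hits

# Hypothetical two-leaf tree using the zoning classifier from the slide.
leaf_a = Node((0, 0, 2, 2), {"Residential"},
              objects=[((1, 1), "Residential")])
leaf_b = Node((3, 3, 5, 5), {"Commercial", "Industrial"},
              objects=[((4, 4), "Commercial"), ((5, 3), "Industrial")])
root = Node((0, 0, 5, 5), leaf_a.labels | leaf_b.labels,
            children=[leaf_a, leaf_b])

print(search_by_class(root, {"Commercial"}))  # [((4, 4), 'Commercial')]
```

In this sketch a query for "Commercial" never descends into `leaf_a`, which is the search-space reduction the slide describes.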

Distance Classification

Maps Euclidean distance/increment generator to region

Default function

Separate R-tree
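
One possible reading of the distance classifier is a lookup from the region containing the query point to a generator of increasing radii, with a default generator when no region matches. The sketch below follows that reading. The names `default_increments` and `radii_for`, the fixed-step scheme, and the sample region are all assumptions, and a linear scan stands in for the separate R-tree mentioned on the slide.

```python
import itertools

def default_increments(step=1.0):
    """Assumed default: yield radii step, 2*step, 3*step, ... without end."""
    return (step * i for i in itertools.count(1))

def radii_for(q, region_steps, fallback=default_increments):
    """Pick the increment generator for the region that contains q.

    region_steps is a list of ((xmin, ymin, xmax, ymax), step) pairs.
    A linear scan is used here in place of a separate R-tree lookup.
    """
    for (xmin, ymin, xmax, ymax), step in region_steps:
        if xmin <= q[0] <= xmax and ymin <= q[1] <= ymax:
            return default_increments(step)
    return fallback()

# Hypothetical region: a dense area where small radius steps make sense.
region_steps = [((0, 0, 10, 10), 0.5)]
print(list(itertools.islice(radii_for((1, 1), region_steps), 3)))    # [0.5, 1.0, 1.5]
print(list(itertools.islice(radii_for((50, 50), region_steps), 3)))  # [1.0, 2.0, 3.0]
```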

Concentric Circles

Decrease candidate regions

Only consider MBBs that are completely contained in query region

Ignore previously searched MBBs
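
The three rules above can be sketched as a loop over increasing radii, assuming a flat list of leaf MBBs instead of a real R-tree traversal. The function names and sample data are hypothetical. Because MBBs that only partly overlap the circle are skipped, the result may contain more than k points and is not guaranteed to be the exact k nearest, in line with the key idea of not insisting on exactly k.

```python
import math

def farthest_corner_dist(mbb, q):
    """Distance from q to the farthest corner of an axis-aligned box."""
    xmin, ymin, xmax, ymax = mbb
    dx = max(abs(q[0] - xmin), abs(q[0] - xmax))
    dy = max(abs(q[1] - ymin), abs(q[1] - ymax))
    return math.hypot(dx, dy)

def incremental_knn(leaves, q, k, radii):
    """leaves: list of (mbb, points); radii: increasing sequence of radii."""
    searched = set()  # indices of MBBs already visited
    found = []
    for r in radii:
        for i, (mbb, pts) in enumerate(leaves):
            if i in searched:
                continue  # ignore previously searched MBBs
            # A box lies completely inside the circle exactly when its
            # farthest corner does.
            if farthest_corner_dist(mbb, q) <= r:
                searched.add(i)
                found.extend(pts)
        if len(found) >= k:
            break  # enough candidates; stop growing the query region
    return sorted(found, key=lambda p: math.dist(p, q))

# Hypothetical leaves, not taken from the presentation.
leaves = [((0, 0, 1, 1), [(0.5, 0.5), (1, 1)]),
          ((2, 2, 3, 3), [(2, 2)]),
          ((8, 8, 9, 9), [(8, 8)])]
print(incremental_knn(leaves, (0, 0), 3, [2, 5, 15]))
# [(0.5, 0.5), (1, 1), (2, 2)]
```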

Algorithm Example: 3 nearest squares

1) Get distance function

2) Search…

Validation

Find nearest gas stations (Zoning example)

1.7% total searchable area of Minneapolis

Complexity: p classifiers with q classifications

Computational: $O(p \log_\alpha q) \cdot O(\log_\alpha n) \approx O(\log_\alpha n)$

Spatial: $(p \cdot q \cdot s + t)(n + \alpha \log_\alpha n + \alpha)$
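
The approximation in the computational bound appears to rest on p and q being small and independent of n. Under that assumption, which the slide does not state explicitly, the classifier term reduces to a constant factor c:

$$
O(p \log_\alpha q) \cdot O(\log_\alpha n) = O(c \cdot \log_\alpha n) = O(\log_\alpha n), \qquad c = p \log_\alpha q .
$$

If the number of classifiers or classifications grew with n, the factor would no longer be constant and the simplification would not hold.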

Conclusion

Expand R-trees for kNN

User-defined, domain specific classifiers to decrease search space

User-defined incremental distance function

Increasing Euclidean distance, Concentric Circles

Future Work

Extend distance classifier to include many classifiers

Non-Euclidean distance (e.g. speed limit)

Combine distance classification tree with data tree

Experiment

Plan for incrementally upgrading existing R-tree implementations

Determine threshold for number of classifiers and classifications

Any Questions???