What is a Good Nearest Neighbors Algorithm for Finding Similar Patches in Images?

Neeraj Kumar*, Li Zhang†, Shree K. Nayar*

*Columbia University, †University of Wisconsin-Madison
http://www.cs.columbia.edu/CAVE/projects/nnsearch/

Motivation

Brute-force search for all patches would take >250 hrs/image!

Example searches (Query Patch | Results):

Brute-Force Search: 1375 ms | vp-Tree Search: 3.43 ms

Brute-Force Search: 2125 ms | vp-Tree Search: 1.25 ms

Fast search would greatly speed up many vision algorithms:

• Object Recognition – e.g., using SIFT [Lowe 2003]

• Image Denoising using non-local means [Buades et al. 2005]

• Shape Matching using self-similarity [Shechtman & Irani 2007]

• Texture Synthesis using image quilting [Efros & Freeman 2002]

Related Work

• Surveys: [Chavez et al. 2001, Shakhnarovich et al. 2006]

• Approximate NN: [Arya et al. 1998, Indyk and Motwani 1998]

• Nearest Neighbors using L∞: [Nene and Nayar 1997]

• Performance Comparisons: [Mikolajczyk and Matas 2007]

The Nearest Neighbors Problem

Given a set of points $P$ and a query point $q$, the NN problem is:

ε-NN Search: Find the set of points $P_C$ within distance ε of $q$:

$$\forall\, p_i \in P_C:\quad d(p_i, q) \le \varepsilon$$

k-NN Search: Find the set of $k$ points $P_C$ closest to $q$:

$$\forall\, p_i \in P_C \text{ and } p_j \notin P_C:\quad d(p_i, q) \le d(p_j, q)$$

Example results for query point q:

ε-NN: p4, p6, p8

2-NN: p6, p8

4-NN: p6, p8, p4, p3
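The two definitions above can be illustrated with a brute-force baseline. This is a minimal sketch, assuming Euclidean distance and points stored as rows of a NumPy array; the function names `eps_nn` and `k_nn` and the sample points are hypothetical and not taken from the poster.

```python
import numpy as np

def eps_nn(points, q, eps):
    """Indices of all points within distance eps of q (Euclidean)."""
    dists = np.linalg.norm(points - q, axis=1)
    return np.flatnonzero(dists <= eps)

def k_nn(points, q, k):
    """Indices of the k points closest to q, nearest first."""
    dists = np.linalg.norm(points - q, axis=1)
    return np.argsort(dists, kind="stable")[:k]

# Hypothetical 2-D points; real patches would be flattened pixel vectors.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
q = np.array([0.0, 0.5])
near = eps_nn(points, q, eps=1.2)
closest = k_nn(points, q, k=2)
```

Both functions evaluate the distance to every point, which is the cost the tree-based methods aim to avoid.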

Nearest Neighbor Approaches

Evaluation Dataset

kd-Tree
    dim = argmax_dim[var(pts[dim])]
    split = median(pts[dim])
    L_pts = [pts[dim] < split]
    R_pts = [pts[dim] >= split]

PCA Tree
    axis = pca(pts).eigvec[0]
    curpts = project(pts, axis)
    split = median(curpts)
    L_pts = [curpts < split]
    R_pts = [curpts >= split]

Vantage Point (vp) Tree
    pt = chooseRefPt(pts)
    split = median(d(pts, pt))
    L_pts = [d(pts, pt) < split]
    R_pts = [d(pts, pt) >= split]
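The vp-tree split can be turned into a small runnable sketch with an ε-NN search that prunes subtrees using the triangle inequality. Several details here are assumptions rather than the authors' implementation: the reference point is chosen at random (the poster does not say how `chooseRefPt` works), the distance is Euclidean, and `leaf_size`, the dictionary node layout, and the function names are hypothetical.

```python
import numpy as np

def build_vp_tree(pts, idx=None, leaf_size=4, rng=None):
    """Recursively split point indices by distance to a vantage point."""
    rng = np.random.default_rng(0) if rng is None else rng
    idx = np.arange(len(pts)) if idx is None else idx
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    vp = idx[rng.integers(len(idx))]  # assumption: random reference point
    rest = idx[idx != vp]
    d = np.linalg.norm(pts[rest] - pts[vp], axis=1)
    split = np.median(d)
    left, right = rest[d < split], rest[d >= split]
    if len(left) == 0:  # degenerate split, e.g. many duplicate points
        return {"leaf": idx}
    return {"vp": vp, "split": split,
            "left": build_vp_tree(pts, left, leaf_size, rng),
            "right": build_vp_tree(pts, right, leaf_size, rng)}

def eps_search(node, pts, q, eps, out):
    """Append to out the indices of all points within eps of q."""
    if "leaf" in node:
        d = np.linalg.norm(pts[node["leaf"]] - q, axis=1)
        out.extend(node["leaf"][d <= eps].tolist())
        return out
    d_vp = np.linalg.norm(pts[node["vp"]] - q)
    if d_vp <= eps:
        out.append(int(node["vp"]))
    # Left child holds points with d(p, vp) < split, so by the triangle
    # inequality it can contain a match only if d_vp - eps < split.
    if d_vp - eps < node["split"]:
        eps_search(node["left"], pts, q, eps, out)
    # Right child holds points with d(p, vp) >= split.
    if d_vp + eps >= node["split"]:
        eps_search(node["right"], pts, q, eps, out)
    return out

rng = np.random.default_rng(1)
pts = rng.random((200, 5))  # hypothetical data standing in for patches
tree = build_vp_tree(pts)
q = rng.random(5)
found = sorted(eps_search(tree, pts, q, eps=0.5, out=[]))
```

The search returns the same set as a brute-force scan; the saving comes from the subtrees that the two pruning tests skip.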

Performance Evaluation

Conclusions

Method      Cons. Perf.   ε-NN Speed   k-NN Speed
kd-Tree     Excellent     Poor         Poor
PCA Tree    Poor          Fair         Fair
Ball Tree   Fair          Excellent    Excellent
k-Means     Poor          Good         Good
vp-Tree     Excellent     Excellent    Excellent

Construction and Search Performance

Search Speed vs. Distance/Patch Size

Search Speed vs. Input Set Size

Implementation Tricks

We use various optimizations on all trees:

• Compute Lp norms using lookup tables

• Pre-calculate distances within each leaf node

• Use priority queues for k-NN searches
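Two of the optimizations listed above can be sketched as follows. This is one possible reading, not the authors' code: it assumes 8-bit pixel values (so absolute differences lie in 0..255), works with the p-th power of the Lp norm (which preserves the ordering of distances), and uses the priority queue to hold the k best candidates during a plain linear scan. The names `make_lp_table`, `lp_dist_pow`, and `k_nn_heap` are hypothetical.

```python
import heapq
import numpy as np

def make_lp_table(p, max_diff=255):
    """table[d] = d**p for every possible absolute pixel difference d."""
    return np.arange(max_diff + 1, dtype=np.float64) ** p

def lp_dist_pow(a, b, table):
    """Sum of |a_i - b_i|**p via table lookup (p-th power of the Lp norm)."""
    diff = np.abs(a.astype(np.int16) - b.astype(np.int16))
    return table[diff].sum()

def k_nn_heap(patches, q, k, table):
    """Linear scan that keeps the k best candidates in a bounded max-heap."""
    heap = []  # entries are (-distance, index); heap[0] is the worst kept
    for i, patch in enumerate(patches):
        d = lp_dist_pow(patch, q, table)
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i))
    return sorted((-neg_d, i) for neg_d, i in heap)

table = make_lp_table(p=2)
patches = np.array([[0, 0], [10, 0], [3, 4], [255, 255]], dtype=np.uint8)
query = np.array([0, 0], dtype=np.uint8)
best = k_nn_heap(patches, query, k=2, table=table)
```

The cast to `int16` avoids the wrap-around that subtracting unsigned 8-bit values would cause.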

All methods organize points into a tree structure; the only difference is the function used to split a set of points into child nodes (see the pseudocode for each tree).
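The shared structure can be sketched as a single builder that takes the split function as a parameter. This is an illustration under stated assumptions: `build_tree`, `kd_split`, `pca_split`, `leaves`, and `leaf_size` are hypothetical names, the PCA axis is taken from an SVD of the centered points, and the nodes store only the partition. A real search tree would also record the split parameters in each node.

```python
import numpy as np

def kd_split(pts):
    """Split on the coordinate with the largest variance, at its median."""
    dim = np.argmax(pts.var(axis=0))
    vals = pts[:, dim]
    return vals < np.median(vals)

def pca_split(pts):
    """Split on the projection onto the first principal axis, at its median."""
    centered = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[0]
    return proj < np.median(proj)

def build_tree(pts, idx, split_fn, leaf_size=8):
    """Generic builder: split_fn returns a boolean mask of left-child points."""
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    go_left = split_fn(pts[idx])
    if go_left.all() or not go_left.any():  # split failed to separate points
        return {"leaf": idx}
    return {"left": build_tree(pts, idx[go_left], split_fn, leaf_size),
            "right": build_tree(pts, idx[~go_left], split_fn, leaf_size)}

def leaves(node):
    """Collect the index arrays stored in all leaf nodes."""
    if "leaf" in node:
        return [node["leaf"]]
    return leaves(node["left"]) + leaves(node["right"])

rng = np.random.default_rng(0)
data = rng.random((100, 6))  # hypothetical data standing in for patches
kd_tree = build_tree(data, np.arange(100), kd_split)
pca_tree = build_tree(data, np.arange(100), pca_split)
```

Swapping `kd_split` for `pca_split` changes only how each node partitions its points; the recursion is identical.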

Random Images: no similarity within images

Video Frames: high similarity within frames

Plot annotations:

• Exponential dropoff in speed

• Closest k points increasingly distant

• Fewer points within same “average per-pixel” distance

• Cons. (construction) cost is the number of distance function evaluations

• Search speed is the improvement over brute-force search
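Both measures are simple to instrument. The sketch below assumes that "improvement over brute-force search" means the ratio of brute-force time to tree-search time, which the poster does not state explicitly; `CountingDistance` and `speedup` are hypothetical helpers, and the timings are the ones quoted in the Motivation examples.

```python
import numpy as np

class CountingDistance:
    """Euclidean distance that counts how many times it is evaluated."""
    def __init__(self):
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        return float(np.linalg.norm(a - b))

def speedup(brute_force_time, tree_time):
    """Assumed speed measure: brute-force time divided by tree-search time."""
    return brute_force_time / tree_time

dist = CountingDistance()
pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
d01 = dist(pts[0], pts[1])
d02 = dist(pts[0], pts[2])

first_example = speedup(1375, 3.43)   # timings in ms from the Motivation panel
second_example = speedup(2125, 1.25)
```

Passing a `CountingDistance` instance to a tree builder in place of the raw distance function would give the construction cost as `dist.calls`.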

Ball Tree
    pt1, pt2 = chooseRefPts(pts)
    d1 = d(pts, pt1)
    d2 = d(pts, pt2)
    L_pts = [d1 < d2]
    R_pts = [d2 <= d1]
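The ball-tree split assigns each point to the nearer of two reference points. A minimal sketch, assuming Euclidean distance and two distinct reference points drawn at random (the poster does not specify how `chooseRefPts` selects them); `ball_split` is a hypothetical name.

```python
import numpy as np

def ball_split(pts, rng=None):
    """Boolean masks (left, right): each point goes to its nearer pivot."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Assumption: pivots are two distinct points drawn uniformly at random.
    i, j = rng.choice(len(pts), size=2, replace=False)
    d1 = np.linalg.norm(pts - pts[i], axis=1)
    d2 = np.linalg.norm(pts - pts[j], axis=1)
    left = d1 < d2  # ties go to the right child
    return left, ~left

rng = np.random.default_rng(2)
sample = rng.random((50, 4))  # hypothetical data standing in for patches
left_mask, right_mask = ball_split(sample)
```

Unlike the median-based splits, this partition is not guaranteed to be balanced; the child sizes depend on where the two reference points fall.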
