Nearest neighbor
CSE 291
A: Finding similar items from the past
Application: Novelty detection
• Past observations: x1, x2, . . . , xn from some space X
• Now you see a new point x
• Is it something familiar? Or something new that warrants attention?
Nearest neighbor approach:
• Fix a distance function d on X
• Find min_i d(xi, x)
• If this distance is large: x is something new
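The nearest-neighbor rule above can be sketched in a few lines. This is a minimal illustration, not a scalable implementation: the function names (`novelty_score`, `is_novel`) and the choice of a fixed `threshold` parameter are illustrative assumptions, and Euclidean distance stands in for the generic d.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two equal-length vectors
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def novelty_score(past, x, d=euclidean):
    # Distance from x to its nearest neighbor among the past observations
    return min(d(xi, x) for xi in past)

def is_novel(past, x, threshold, d=euclidean):
    # Flag x as new if even its nearest neighbor is far away
    return novelty_score(past, x, d) > threshold
```

A linear scan over all past points works for small n; for large n one would use a spatial index (k-d tree, LSH) instead.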
Application: Implicit classification problems
• Database of past medical patients: x1, . . . , xn
• A new patient x arrives
• Will this patient get disease y?
Nearest neighbor approach:
• Fix a distance function d on the space of patient records
• Find the k nearest patients xi using this distance function
• Find out how many of these patients got disease y
Can do this with any number of different y ’s!
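The k-nearest-neighbor estimate described above can be sketched as follows. The name `knn_fraction` and the toy 2-D "patient records" are illustrative assumptions; in practice the records would be feature vectors built from medical data, and the same neighbor set answers any outcome y just by swapping the label list.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_fraction(records, labels, x, k, d=euclidean):
    # Indices of the k past records closest to the new point x
    nearest = sorted(range(len(records)), key=lambda i: d(records[i], x))[:k]
    # Fraction of those neighbors with label 1 (e.g. "got disease y")
    return sum(labels[i] for i in nearest) / k
```

The returned fraction can be read as a rough estimate of the probability that the new patient gets disease y.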
B: Picking the right distance function
Example: MNIST digits
• Each image is 28 × 28 and grayscale
• Can represent as a vector in R^784
MNIST and Euclidean distance
Euclidean distance between two images, represented by x, x′ ∈ R^784:

‖x − x′‖ = √( ∑_{i=1}^{784} (xᵢ − x′ᵢ)² ).
An experiment:
• Remember a set of 60,000 images.
• Then 10,000 more appear.
• What fraction of the time are their nearest neighbors the same digit? 96.91%
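The experiment can be sketched schematically as a 1-NN classifier over flattened images. This is an assumption-laden sketch: the function name `nn_accuracy` is my own, the arrays stand in for the actual MNIST split, and a brute-force scan like this is slow at MNIST scale (the reported 96.91% comes from the real 60,000/10,000 split).

```python
import numpy as np

def nn_accuracy(train_x, train_y, test_x, test_y):
    # train_x: (n, 784) float array of flattened 28x28 images
    correct = 0
    for x, y in zip(test_x, test_y):
        # Squared Euclidean distance from x to every stored image
        dists = ((train_x - x) ** 2).sum(axis=1)
        # Predict the label of the single nearest neighbor
        pred = train_y[np.argmin(dists)]
        correct += (pred == y)
    return correct / len(test_x)
```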
Failures
[Figure: query digits (top row) and their nearest neighbors (bottom row) for cases where the nearest neighbor has the wrong digit label]
Euclidean distance doesn’t respect basic invariances for images, e.g. small translations, rotations, and deformations of a digit.
Another distance function for images: Shape context

From D’Arcy Thompson’s On Growth and Form:
• Identify a set of “keypoints” in each image
• Summarize each by its position relative to other keypoints in the image
• Match up keypoints between the two images
• Find a transformation between the images
• Distance is the cost of this transformation
MNIST “accuracy”: 99.37%
C: Picking the right representation
From Herbert Simon, Sciences of the Artificial:
Solving a problem simply means representing it so as to make the solution transparent.
Representations for text
• Bag of words
• Latent semantic indexing
• Brown clustering
• Topic models
• Word2Vec and GloVe
• BERT and beyond
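The simplest of these, bag of words, fits in a few lines: each document becomes an unordered word-count vector, and documents can then be compared with, e.g., cosine similarity. The helper names below are illustrative, and real pipelines would add tokenization, stop-word handling, and TF-IDF weighting.

```python
from collections import Counter

def bag_of_words(doc):
    # Map a document to word counts, discarding word order
    return Counter(doc.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors (Counters)
    dot = sum(a[w] * b[w] for w in a)
    norm = lambda v: sum(c * c for c in v.values()) ** 0.5
    return dot / (norm(a) * norm(b))
```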
Representations for images
• Principal component analysis, for images or image-patches
• Wavelets
• Sparse coding
• SIFT
• HOG
• Deep belief nets
• Self- or fully-supervised deep representations
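The first item, principal component analysis, can be sketched via the SVD of the centered data: each image is summarized by its coefficients along the top k principal directions. The function name and return convention below are illustrative assumptions.

```python
import numpy as np

def pca_representation(images, k):
    # images: (n, d) array, e.g. d = 784 for flattened MNIST digits
    mean = images.mean(axis=0)
    centered = images - mean
    # Principal directions = top right-singular vectors of the centered data
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    # Each image is represented by its k projection coefficients
    codes = centered @ components.T
    return codes, components, mean
```

An image is approximately reconstructed as `codes @ components + mean`; with k = d the reconstruction is exact.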
Example: Histogram of oriented gradients (HoG)
1 Normalize gamma and color
2 Compute gradients
3 Spatial/orientation binning
4 Contrast normalization
5 Concatenation
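Steps 2–4 above can be sketched for a single cell. This is a simplified sketch, not the full HoG pipeline: real HoG tiles the image into cells, normalizes histograms over overlapping blocks, and concatenates them (step 5); the function name and bin count are illustrative assumptions.

```python
import numpy as np

def orientation_histogram(img, n_bins=9):
    # Step 2: finite-difference image gradients in y and x
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    # Unsigned gradient orientation, folded into [0, pi)
    ang = np.arctan2(gy, gx) % np.pi
    # Step 3: accumulate gradient magnitude into orientation bins
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    # Step 4: contrast-normalize the histogram
    return hist / (np.linalg.norm(hist) + 1e-6)
```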
Other domains
• Audio: speech, music, etc.
• Genetic sequences
• What else?