Object Recognition using Local Descriptors Javier Ruiz-del-Solar, and Patricio Loncomilla Center for Web Research Universidad de Chile


Outline

•Motivation & Recognition Examples

•Dimensionality problems

•Object Recognition using Local Descriptors

•Matching & Storage of Local Descriptors

•Conclusions

Motivation

• Object recognition approaches based on local invariant descriptors (features) have become increasingly popular and have developed rapidly in recent years.

• Invariance to scale, in-plane rotation, partial occlusion, partial distortion, and partial changes of viewpoint.

• The recognition process consists of two stages:
1. Scale-invariant local descriptors (features) of the observed scene are computed.
2. These descriptors are matched against descriptors of object prototypes already stored in a model database. These prototypes correspond to images of objects under different view angles.

Recognition Examples (1/2)

Recognition Examples (2/2)

Image Matching Examples (1/2)

Image Matching Examples (2/2)

Some applications

• Object retrieval in multimedia databases (e.g., the Web)
• Image retrieval by similarity in multimedia databases
• Robot self-localization
• Binocular vision
• Image alignment and matching
• Movement compensation
• …

However … there are some problems

Dimensionality problems

• A given image can produce ~100-1,000 descriptors of 128 components (real values) each.
• In some special applications, the model database can contain up to 1,000-10,000 objects.
• => a large number of comparisons => long processing time
• => a large database size
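To make these orders of magnitude concrete, here is a back-of-the-envelope sketch. It assumes one reference image per object (real databases store several views per object, which multiplies every figure), 4-byte floats per descriptor component, and brute-force matching in which every query descriptor is compared with every stored one. The function name `workload` is hypothetical.

```python
def workload(descriptors_per_image, objects, dims=128,
             images_per_object=1, bytes_per_component=4):
    """Rough storage and matching cost of brute-force descriptor matching.

    Assumptions (not from the slides): each object has `images_per_object`
    reference images, each yielding `descriptors_per_image` descriptors, and
    every component is stored as a 4-byte float.
    """
    stored = objects * images_per_object * descriptors_per_image
    db_bytes = stored * dims * bytes_per_component
    # Brute force: each query descriptor is compared with every stored one.
    distance_computations = descriptors_per_image * stored
    component_ops = distance_computations * dims
    return stored, db_bytes, distance_computations, component_ops


for n_desc, n_obj in [(100, 1_000), (1_000, 10_000)]:
    stored, size, comps, ops = workload(n_desc, n_obj)
    print(f"{n_desc} descriptors/image, {n_obj} objects: "
          f"{stored:,} stored descriptors, {size / 1e9:.2f} GB, "
          f"{comps:,} distance computations ({ops:,} component operations)")
```

At the upper end this gives 10^10 distance computations (about 1.3 × 10^12 component operations) per query image and roughly 5 GB of descriptor data, which is why exhaustive comparison does not scale.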

Main motivation of this talk:

• To get some ideas about how to compare local descriptors efficiently, as well as how to store them efficiently…

Recognition Process


Figure: recognition pipeline.
Offline Database Creation: Reference Image → Interest Points Detection → Scale Invariant Descriptors (SIFT) Calculation → SIFT Database.
Recognition: Input Image → Interest Points Detection → SIFT Calculation → SIFT Matching (against the SIFT Database) → Affine Transform Calculation → Affine Transform Parameters.

Interest Points Detection (1/2)

Interest points correspond to maxima of the SDoG (Subsampled Difference of Gaussians) scale space (x, y, σ).

Figure: Scale Space and SDoG.

Ref: Lowe 1999
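A minimal pure-Python sketch of DoG interest-point detection, using the usual construction D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y): blur the image at increasing scales, subtract adjacent levels, and keep samples that are larger than all 26 neighbours in their 3×3×3 scale-space neighbourhood. Assumptions: a single octave without the subsampling step of the SDoG, illustrative parameters σ0 = 1.6 and k = √2, and only maxima as on the slide (Lowe also keeps minima). All function names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, math.ceil(3 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def blur(img, sigma):
    """Separable Gaussian blur; image borders are replicated."""
    h, w = len(img), len(img[0])
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    rows = [[sum(k[j] * img[y][min(max(x + j - r, 0), w - 1)] for j in range(len(k)))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j] * rows[min(max(y + j - r, 0), h - 1)][x] for j in range(len(k)))
             for x in range(w)] for y in range(h)]


def dog_stack(img, sigma0=1.6, k=math.sqrt(2), levels=4):
    """DoG levels D_s = G(sigma0 * k**(s+1)) * I - G(sigma0 * k**s) * I."""
    gauss = [blur(img, sigma0 * k ** s) for s in range(levels + 1)]
    h, w = len(img), len(img[0])
    return [[[gauss[s + 1][y][x] - gauss[s][y][x] for x in range(w)] for y in range(h)]
            for s in range(levels)]


def dog_maxima(dog):
    """(x, y, s) samples strictly greater than their 26 scale-space neighbours."""
    points = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[0]) - 1):
            for x in range(1, len(dog[0][0]) - 1):
                v = dog[s][y][x]
                if all(v > dog[s + ds][y + dy][x + dx]
                       for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                       if (ds, dy, dx) != (0, 0, 0)):
                    points.append((x, y, s))
    return points


# A dark disc of radius 4 on a bright background gives a DoG maximum at its
# centre, at the level whose scale best matches the disc size.
size, c = 31, 15
image = [[0.0 if (x - c) ** 2 + (y - c) ** 2 <= 16 else 1.0 for x in range(size)]
         for y in range(size)]
print(dog_maxima(dog_stack(image)))
```

A full detector would add several octaves with subsampling, plus contrast and edge-response thresholds; the sketch only shows how an extremum is located in (x, y, σ).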

Interest Points Detection (2/2)

Examples of detected interest points.

Our improvement: subpixel localization of interest points by a 3D quadratic approximation around each detected interest point in scale space.
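The slide does not give the exact formulation of this 3D quadratic approximation. One standard way to realise it is the Taylor-expansion fit described in Lowe 2004: estimate the gradient g and Hessian H of D at the detected sample with finite differences, then take the offset x̂ = −H⁻¹g and the interpolated value D + ½ gᵀx̂. The sketch below assumes a DoG stack indexed as `D[s][y][x]`; `solve3` and `refine_extremum` are hypothetical names.

```python
def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def refine_extremum(D, s, y, x):
    """Fit a 3D quadratic around sample D[s][y][x] of a DoG stack.

    Returns ((dx, dy, ds), value): the sub-sample offset of the extremum and the
    interpolated DoG value there. An offset above 0.5 in any coordinate means
    the extremum lies closer to a neighbouring sample.
    """
    c = D[s][y][x]
    # Gradient by central differences.
    g = [(D[s][y][x + 1] - D[s][y][x - 1]) / 2,
         (D[s][y + 1][x] - D[s][y - 1][x]) / 2,
         (D[s + 1][y][x] - D[s - 1][y][x]) / 2]
    # Hessian by second-order finite differences.
    dxx = D[s][y][x + 1] + D[s][y][x - 1] - 2 * c
    dyy = D[s][y + 1][x] + D[s][y - 1][x] - 2 * c
    dss = D[s + 1][y][x] + D[s - 1][y][x] - 2 * c
    dxy = (D[s][y + 1][x + 1] - D[s][y + 1][x - 1]
           - D[s][y - 1][x + 1] + D[s][y - 1][x - 1]) / 4
    dxs = (D[s + 1][y][x + 1] - D[s + 1][y][x - 1]
           - D[s - 1][y][x + 1] + D[s - 1][y][x - 1]) / 4
    dys = (D[s + 1][y + 1][x] - D[s + 1][y - 1][x]
           - D[s - 1][y + 1][x] + D[s - 1][y - 1][x]) / 4
    hessian = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    offset = solve3(hessian, [-gi for gi in g])
    value = c + 0.5 * sum(gi * oi for gi, oi in zip(g, offset))
    return tuple(offset), value


def f(s, y, x):
    """Quadratic with its maximum (value 5) at (x, y, s) = (1.2, 0.9, 1.1)."""
    return (5 - (x - 1.2) ** 2 - (y - 0.9) ** 2
            - 0.5 * (x - 1.2) * (y - 0.9) - 0.5 * (s - 1.1) ** 2)


D = [[[f(s, y, x) for x in range(3)] for y in range(3)] for s in range(3)]
offset, value = refine_extremum(D, 1, 1, 1)
print([round(o, 6) for o in offset], round(value, 6))  # [0.2, -0.1, 0.1] 5.0
```

Because finite differences are exact for a quadratic, the sketch recovers the true peak of the test function exactly; on real DoG data the fit is only a local approximation.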


SIFT Calculation

For each detected keypoint, a descriptor (feature vector) based on the gradient values around the keypoint is computed. These descriptors are called SIFTs (Scale-Invariant Feature Transform).

SIFTs provide invariance to scale and orientation.

Ref: Lowe 2004
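A simplified sketch of how the 128 components arise, following the layout in Lowe 2004: a 16×16 patch around the keypoint is divided into a 4×4 grid of cells, each cell accumulates an 8-bin histogram of gradient orientations weighted by gradient magnitude (4 × 4 × 8 = 128), and the vector is normalised, clipped at 0.2, and renormalised. Simplifying assumptions: the patch is already rotated to the keypoint orientation and resampled to its scale, and the Gaussian weighting and trilinear interpolation of the full method are omitted. `sift_like_descriptor` is a hypothetical name.

```python
import math


def sift_like_descriptor(patch, grid=4, bins=8, clip=0.2):
    """128-D gradient-orientation histogram descriptor of a 16x16 patch."""
    size = len(patch)
    cell = size // grid
    hist = [0.0] * (grid * grid * bins)
    for y in range(size):
        for x in range(size):
            # Central differences, clamped at the patch border.
            dx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            b = int(angle / (2 * math.pi / bins)) % bins
            hist[((y // cell) * grid + (x // cell)) * bins + b] += magnitude

    def normalize(v):
        n = math.sqrt(sum(c * c for c in v))
        return [c / n for c in v] if n > 0 else v

    # Unit length, clip large values (reduces the effect of illumination
    # changes), then renormalise.
    return normalize([min(c, clip) for c in normalize(hist)])


# Horizontal intensity ramp: every gradient points along +x (orientation bin 0).
patch = [[float(x) for x in range(16)] for _ in range(16)]
desc = sift_like_descriptor(patch)
print(len(desc), sorted(set(round(v, 6) for v in desc)))  # 128 [0.0, 0.25]
```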


SIFT Matching

The Euclidean distance between SIFTs (vectors) is used.


Affine Transform Calculation (1/2)

Several stages are used:

1. Object Pose Prediction
• In pose space, a Hough transform is used to obtain a coarse prediction of the object pose: each matched keypoint votes for all the object poses that are consistent with it.
• A candidate object pose is obtained if at least 3 entries are found in a Hough bin.

2. Affine Transformation Calculation
• A least-squares procedure is used to find an affine transformation that correctly accounts for each obtained pose:

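$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}$$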
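A minimal sketch of the least-squares step, assuming (as in Lowe 2004) that (x, y) is a point in the reference image and (u, v) the matching point in the input image. Each match contributes two rows of a linear system in the six unknowns m1, m2, m3, m4, tx, ty, so at least 3 matches are needed; the normal equations are solved with plain Gaussian elimination. `solve` and `fit_affine` are hypothetical names.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_affine(pairs):
    """Least-squares affine transform from ((x, y), (u, v)) correspondences.

    Returns [m1, m2, m3, m4, tx, ty] with u = m1*x + m2*y + tx and
    v = m3*x + m4*y + ty.
    """
    if len(pairs) < 3:
        raise ValueError("need at least 3 correspondences")
    rows, rhs = [], []
    for (x, y), (u, v) in pairs:
        rows += [[x, y, 0.0, 0.0, 1.0, 0.0], [0.0, 0.0, x, y, 0.0, 1.0]]
        rhs += [u, v]
    # Normal equations (A^T A) p = A^T b.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(6)]
    return solve(ata, atb)


def apply(p, x, y):
    return (p[0] * x + p[1] * y + p[4], p[2] * x + p[3] * y + p[5])


true = [1.5, -0.2, 0.3, 0.9, 10.0, -5.0]
pairs = [((x, y), apply(true, x, y)) for x, y in [(0, 0), (10, 0), (0, 10), (10, 10), (5, 3)]]
print([round(p, 6) for p in fit_affine(pairs)])  # [1.5, -0.2, 0.3, 0.9, 10.0, -5.0]
```

With noisy matches the same code returns the transformation that minimises the summed squared residuals over all matches of a pose.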

Affine Transform Calculation (2/2)

3. Affine Transformation Verification Stages:
• Verification using a probabilistic model (Bayes classifier)
• Verification based on Geometrical Distortion
• Verification based on Spatial Correlation
• Verification based on Graphical Correlation
• Verification based on the Object Rotation

4. Transformations Merging based on Geometrical Overlapping

In blue: the verification stages we proposed to improve the detection of robot heads.
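Stage 1 (object pose prediction) can be sketched as follows, assuming each keypoint carries (x, y, scale, orientation), as SIFT keypoints do, and following the general scheme of Lowe 2004: each match predicts a similarity pose (translation, scale ratio, rotation) and votes for one bin of a 4-D Hough space, and bins with at least 3 votes become candidate poses. The bin sizes (32-pixel location bins, 30° orientation bins, scale bins a factor of 2 wide) are illustrative assumptions, and, unlike Lowe's version, each match here votes for a single bin rather than also for neighbouring bins. `predict_pose` and `hough_poses` are hypothetical names.

```python
import math
from collections import defaultdict


def predict_pose(ref_kp, in_kp):
    """Similarity pose (tx, ty, scale, rotation) implied by one keypoint match.

    Keypoints are assumed to be (x, y, scale, orientation_in_radians).
    """
    xr, yr, sr, orr = ref_kp
    xi, yi, si, ori = in_kp
    s = si / sr
    theta = (ori - orr) % (2 * math.pi)
    c, sn = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated, scaled reference point onto the input point.
    tx = xi - s * (c * xr - sn * yr)
    ty = yi - s * (sn * xr + c * yr)
    return tx, ty, s, theta


def hough_poses(matches, loc_bin=32.0, ori_bin=math.radians(30), min_votes=3):
    """Group matches by Hough bin; bins with >= min_votes give candidate poses."""
    bins = defaultdict(list)
    for ref_kp, in_kp in matches:
        tx, ty, s, theta = predict_pose(ref_kp, in_kp)
        key = (math.floor(tx / loc_bin), math.floor(ty / loc_bin),
               round(math.log2(s)),  # scale bins a factor of 2 wide
               math.floor(theta / ori_bin))
        bins[key].append((ref_kp, in_kp))
    return [votes for votes in bins.values() if len(votes) >= min_votes]


# Three matches consistent with scale 2, no rotation, translation (100, 50),
# plus one outlier.
matches = [((10, 10, 1.0, 0.0), (120, 70, 2.0, 0.0)),
           ((20, 5, 1.5, 0.3), (140, 60, 3.0, 0.3)),
           ((0, 30, 2.0, 1.0), (100, 110, 4.0, 1.0)),
           ((5, 5, 1.0, 0.0), (300, 300, 1.0, 2.0))]
candidates = hough_poses(matches)
print(len(candidates), [len(c) for c in candidates])  # 1 [3]
```

Each candidate group would then be passed to the least-squares affine fit and the verification stages.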

AIBO Head Pose Detection Example (figure: input image and reference images)

Matching & Storage of Local Descriptors

• Each reference image gives a set of keypoints.
• Each keypoint has a graphical descriptor, which is a 128-component vector.
• All the (keypoint, vector) pairs corresponding to a set of reference images are stored in a set T.

Figure: a reference image yields keypoints (1), (2), (3), (4), …; each one is stored as a record x, y, n, v1, v2, …, v128, and the collection of all records is T.

More compact notation: writing each keypoint as pi and its descriptor as di, a reference image gives T = {(p1, d1), (p2, d2), (p3, d3), (p4, d4), …}.


• In the matching-generation stage, an input image gives another set of keypoints and vectors.

• For each input descriptor, the first and second nearest descriptors in T must be found.

• Then, a pair of nearest descriptors (d,dFIRST) gives a pair of matched keypoints (p,pFIRST).


Figure: for an input pair (p, d), a search in T = {(p1, d1), (p2, d2), …} returns the nearest pair (pFIRST, dFIRST) and the second-nearest pair (pSEC, dSEC).

• The match is accepted if the ratio between the distance to the first nearest descriptor and the distance to the second nearest descriptor is lower than a given threshold.

• A low ratio indicates that there is no possible confusion in the search results: the nearest descriptor is clearly closer than any other.


Accepted if:

$$\operatorname{distance}(d, d_{FIRST}) < \tau \cdot \operatorname{distance}(d, d_{SEC})$$

where τ is the given threshold.
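A brute-force sketch of this matching rule: for each input descriptor, find the nearest and second-nearest descriptors in T by Euclidean distance and accept the match only if the distance ratio is below the threshold. The default threshold of 0.8 is the value suggested in Lowe 2004, not necessarily the one used by the authors; T is assumed to be a list of (keypoint, descriptor) pairs, and `match_descriptors` is a hypothetical name. This exhaustive search over T is exactly the cost that the kd-tree storage and BBF search discussed next are meant to reduce.

```python
import math


def match_descriptors(query, T, ratio=0.8):
    """Brute-force matching with the nearest/second-nearest distance-ratio test.

    query and T are lists of (keypoint, descriptor) pairs; returns the accepted
    (p, pFIRST) keypoint pairs.
    """
    matches = []
    for p, d in query:
        first = second = None                 # (distance, keypoint)
        for pt, dt in T:
            dist = math.dist(d, dt)           # Euclidean distance
            if first is None or dist < first[0]:
                first, second = (dist, pt), first
            elif second is None or dist < second[0]:
                second = (dist, pt)
        # Accept only unambiguous matches: dist(d, dFIRST) < ratio * dist(d, dSEC).
        if second is not None and first[0] < ratio * second[0]:
            matches.append((p, first[1]))
    return matches


# 2-D descriptors for brevity; real SIFTs have 128 components.
T = [("a", (0.0, 0.0)), ("b", (10.0, 0.0)), ("c", (0.0, 10.0))]
query = [("q1", (1.0, 0.0)), ("q2", (5.0, 0.0))]
print(match_descriptors(query, T))  # [('q1', 'a')]
```

The query q2 is rejected because it is equally far from a and b, which is the kind of confusion the ratio test is designed to filter out.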

• One way to store the set T in an ordered way is to use a kd-tree.

• In this case, we use a 128-d tree.

• As is well known, in a kd-tree the elements are stored in the leaves; the other (internal) nodes divide the space along one dimension.

Storage: Kd-trees

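Example 2-d tree (figure): the root division node 1>2 sends all the vectors with more than 2 in the first dimension to the right side. Its children are the division nodes 2>3 (left) and 2>5 (right), which split on the second dimension. The storage nodes (leaves) hold, from left to right, (1, 3), (2, 7), (6, 5) and (8, 9).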

• Generation of balanced kd-trees:

• We have a set of vectors

• We calculate the means and variances for each dimension i.

Storage: Kd-trees

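Given N vectors a = (a1, a2, …), b = (b1, b2, …), c = (c1, c2, …), d = (d1, d2, …), …, the mean and variance of dimension i are:

$$M_i = \frac{1}{N}\left(a_i + b_i + c_i + \dots\right), \qquad V_i = \frac{1}{N}\left((a_i - M_i)^2 + (b_i - M_i)^2 + \dots\right)$$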

Tree construction:

• Select the dimension iMAX with the largest variance.

• Order the vectors with respect to the iMAX dimension.

• Select the median M in this dimension.

• Create a division node that splits the vectors at M.

• Repeat the process recursively on each half.

Storage: Kd-trees

Figure: the division node iMAX>M. Nodes with iMAX component less than M go to the left subtree; nodes with iMAX component greater than M go to the right subtree.
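The construction steps above can be sketched in Python as follows. Assumptions: dimensions are 0-based (the slides count from 1), each leaf stores exactly one vector, and the split value M is taken as the largest iMAX component of the lower half, so that "component ≤ M" goes left. `Node` and `build_kdtree` are hypothetical names.

```python
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    dim: int = -1                    # division node: split dimension iMAX (0-based)
    median: float = 0.0              # split value M; components > M go right
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    point: Optional[tuple] = None    # storage node (leaf): the stored vector


def build_kdtree(points):
    """Balanced kd-tree over a non-empty list of equal-length vectors."""
    if len(points) == 1:
        return Node(point=points[0])
    # Dimension with the largest variance V_i.
    dims = range(len(points[0]))
    i_max = max(dims, key=lambda i: statistics.pvariance([p[i] for p in points]))
    ordered = sorted(points, key=lambda p: p[i_max])
    half = len(ordered) // 2
    # With repeated values, points equal to M can also end up on the right;
    # a backtracking search such as BBF still reaches them.
    m = ordered[half - 1][i_max]
    return Node(dim=i_max, median=m,
                left=build_kdtree(ordered[:half]), right=build_kdtree(ordered[half:]))


# The four vectors of the storage example reproduce its tree:
# root 1>2, children 2>3 and 2>5 (printed here with 0-based dimensions).
root = build_kdtree([(1, 3), (2, 7), (6, 5), (8, 9)])
print(root.dim, root.median, root.left.dim, root.left.median, root.right.median)
# 0 2 1 3 5
```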

Search process for the nearest neighbors, two alternatives:

• Compare almost all the descriptors in T with the given descriptor and return the nearest one, or

• Compare at most Q nodes and return the nearest of them (compare = compute the Euclidean distance).

  • Requires a good search strategy.

  • It can fail.

  • The failure probability is controllable through Q.

We choose the second option and use the BBF (Best Bin First) algorithm.

Search Process

• Set:
  • v: query vector
  • Q: priority queue ordered by distance to v (initially empty)
  • r: initially the root of T
  • vFIRST: initially undefined, with an infinite distance to v
  • ncomp: number of comparisons, initially zero

• While (!finish):
  • Descend in T from r following v until a leaf c is reached.
  • Add all the directions not taken during the descent to Q, in order (each division node on the path gives one not-taken direction).
  • If c is nearer to v than vFIRST, then vFIRST = c.
  • Set r = the first node in Q (the one nearest to v), remove it from Q, and do ncomp++.
  • If distance(r, v) > distance(vFIRST, v), finish = 1.
  • If ncomp > ncompMAX, finish = 1.
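A minimal Python sketch of this search loop. Assumptions: the priority of a queued branch is the distance from v to the splitting plane of the division node where that branch was left (the slides only say "distance to v"); the counter counts leaf comparisons, as in the search example that follows; and `max_comparisons` plays the role of ncompMAX. `Node` and `bbf_search` are hypothetical names.

```python
import heapq
import itertools
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    dim: int = -1                    # division node: split dimension (0-based)
    split: float = 0.0               # go right if v[dim] > split
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    point: Optional[tuple] = None    # storage node (leaf): the stored vector


def bbf_search(root, v, max_comparisons=200):
    """Approximate nearest neighbour of v; returns (point, distance, comparisons)."""
    best, best_dist = None, math.inf          # vFIRST / cMIN on the slides
    queue, tie = [], itertools.count()        # heap of (bound, tie-breaker, node)
    r, comparisons = root, 0
    while True:
        # Descend from r to a leaf, queueing every branch not taken. Its priority
        # is the distance from v to the splitting plane, a lower bound on the
        # distance to anything stored in that branch.
        while r.point is None:
            diff = v[r.dim] - r.split
            near, far = (r.right, r.left) if diff > 0 else (r.left, r.right)
            heapq.heappush(queue, (abs(diff), next(tie), far))
            r = near
        d = math.dist(r.point, v)
        comparisons += 1
        if d < best_dist:
            best, best_dist = r.point, d
        # Stop when the queue is empty, the comparison budget is spent, or no
        # queued branch can contain a nearer point.
        if not queue or comparisons >= max_comparisons:
            break
        bound, _, r = heapq.heappop(queue)
        if bound >= best_dist:
            break
    return best, best_dist, comparisons


def leaf(p):
    return Node(point=p)


# The tree of the search example (dimensions are 0-based here).
root = Node(0, 2, Node(1, 3, leaf((1, 3)), leaf((2, 7))),
            Node(1, 7, leaf((20, 7)), Node(0, 6, leaf((5, 1500)), leaf((9, 1000)))))
print(bbf_search(root, (20, 8)))  # ((20, 7), 1.0, 2)
```

Lowering `max_comparisons` trades accuracy for speed: with `max_comparisons=1` the same query stops at the first leaf, (9, 1000), which is the kind of failure whose probability the comparison budget controls.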

Search Process: BBF Algorithm

Search Example

Tree (figure): the root division node 1>2 has a left subtree 2>3, with leaves (1, 3) and (2, 7), and a right subtree 2>7, whose left leaf is (20, 7) and whose right child is the division node 1>6, with leaves (5, 1500) and (9, 1000). Requested vector: (20, 8).

1. At 1>2: 20 > 2, so go right. The not-taken option (left of 1>2) is added to the queue with distance 18, the distance between 2 and 20.
2. At 2>7: 8 > 7, so go right. Queue: 2>7 (1), 1>2 (18).
3. At 1>6: 20 > 6, so go right. Queue: 2>7 (1), 1>6 (14), 1>2 (18). Comparisons: 0.
4. We arrive at a leaf, (9, 1000), and store it as the nearest leaf in cMIN (distance ≈ 992). Comparisons: 1.
5. The distance from the best-in-queue node (2>7, distance 1) is less than the distance from cMIN, so we delete the best node from the queue and start a new search from it. Queue: 1>6 (14), 1>2 (18).
6. Going down from there, we arrive at the leaf (20, 7), at distance 1, and store it as the new nearest leaf in cMIN. Comparisons: 2.
7. The distance from the best-in-queue node (1>6, distance 14) is NOT less than the distance from cMIN (1), so the search finishes after 2 comparisons, returning (20, 7).

Conclusions

• BBF + kd-trees: a good trade-off between short search time and high success probability.

• However, BBF + kd-trees may not be the optimal solution.

• Finding a better methodology is very important for massive applications (for example, Web image retrieval).
