Object Recognition using Local Descriptors Javier Ruiz-del-Solar, and Patricio Loncomilla Center for Web Research Universidad de Chile


Object Recognition using Local Descriptors

Javier Ruiz-del-Solar and Patricio Loncomilla

Center for Web Research

Universidad de Chile


Outline

•Motivation & Recognition Examples

•Dimensionality problems

•Object Recognition using Local Descriptors

•Matching & Storage of Local Descriptors

•Conclusions


Motivation

• Object recognition approaches based on local invariant descriptors (features) have become increasingly popular and have developed rapidly in recent years.

• These descriptors provide invariance to scale, in-plane rotation, partial occlusion, partial distortion, and partial changes of viewpoint.

• The recognition process consists of two stages:
1. Scale-invariant local descriptors (features) of the observed scene are computed.
2. These descriptors are matched against descriptors of object prototypes already stored in a model database. These prototypes correspond to images of objects under different view angles.


Recognition Examples (1/2)


Recognition Examples (2/2)


Image Matching Examples (1/2)


Image Matching Examples (2/2)


Some applications

• Object retrieval in multimedia databases (e.g., the Web)
• Image retrieval by similarity in multimedia databases
• Robot self-localization
• Binocular vision
• Image alignment and matching
• Movement compensation
• …


However … there are some problems

Dimensionality problems:
• A given image can produce ~100-1,000 descriptors, each with 128 real-valued components.
• In some special applications, the model database can contain up to 1,000-10,000 objects.
• => a large number of comparisons => long processing times
• => a large database size
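The following back-of-the-envelope sketch multiplies out the upper ends of these ranges. It assumes, for illustration only, a single reference image per object, 1,000 descriptors per image and 4-byte (float32) components; a database holding several views per object would be proportionally larger.

```python
# Back-of-the-envelope cost of exhaustive descriptor matching.
# Assumptions (illustrative only): one reference image per object,
# 1,000 descriptors per image, float32 (4-byte) components.
DESCRIPTORS_PER_IMAGE = 1_000
COMPONENTS = 128
BYTES_PER_COMPONENT = 4
OBJECTS = 10_000

db_descriptors = OBJECTS * DESCRIPTORS_PER_IMAGE
db_bytes = db_descriptors * COMPONENTS * BYTES_PER_COMPONENT

# Exhaustive matching compares every query descriptor with every stored one.
pairwise_comparisons = DESCRIPTORS_PER_IMAGE * db_descriptors
# One Euclidean distance: 128 subtractions, 128 multiplications, 127 additions.
flops = pairwise_comparisons * (3 * COMPONENTS - 1)

print(f"stored descriptors: {db_descriptors:,}")
print(f"database size: {db_bytes / 1e9:.2f} GB")
print(f"distance computations per query image: {pairwise_comparisons:,}")
print(f"floating-point operations per query image: {flops:.2e}")
```

Under these assumptions the database holds 10 million descriptors (5.12 GB), and a single query image needs 10^10 distance computations if every pair is compared.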

Main motivation of this talk:
• To explore how to compare local descriptors efficiently and how to store them efficiently.


Recognition Process

The recognition process consists of two stages:
1. Scale-invariant local descriptors (features) of the observed scene are computed.
2. These descriptors are matched against descriptors of object prototypes already stored in a model database. These prototypes correspond to images of objects under different view angles.


[Figure: recognition pipeline. Offline Database Creation: Reference Image → Interest Points Detection → Scale Invariant Descriptors (SIFT) Calculation → SIFT Database. Online: Input Image → Interest Points Detection → Scale Invariant Descriptors (SIFT) Calculation → SIFT Matching (against the SIFT Database) → Affine Transform Calculation → Affine Transform Parameters.]


Interest Points Detection (1/2)

Interest points correspond to maxima of the SDoG (Subsampled Difference of Gaussians) scale-space (x, y, σ).

[Figure: scale space and SDoG]

Ref: Lowe 1999
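A minimal pure-Python sketch of the detection step, under stated assumptions: a single octave with no subsampling (the "S" of SDoG is omitted), D_i = L(σ_i) − L(σ_{i+1}) so that bright blobs give maxima, and a contrast threshold of 0.03 in the spirit of Lowe 2004 (which also keeps minima). The function names and sigma values are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Sampled 1D Gaussian truncated at 3*sigma, normalized to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(img, sigma):
    """Separable Gaussian blur of a 2D list, replicating border pixels."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])

    def clamp(i, hi):
        return min(max(i, 0), hi)

    rows = [[sum(k[j + r] * img[y][clamp(x + j, w - 1)] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j + r] * rows[clamp(y + j, h - 1)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def dog_maxima(img, sigmas, threshold=0.03):
    """(layer, y, x) of 3x3x3 local maxima of D_i = L(sigma_i) - L(sigma_{i+1})."""
    levels = [blur(img, s) for s in sigmas]
    dog = [[[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(la, lb)]
           for la, lb in zip(levels, levels[1:])]
    h, w = len(img), len(img[0])
    found = []
    for i in range(1, len(dog) - 1):          # needs a DoG layer above and below
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                value = dog[i][y][x]
                if value <= threshold:        # discard low-contrast responses
                    continue
                neighbours = (dog[i + di][y + dy][x + dx]
                              for di in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                              if (di, dy, dx) != (0, 0, 0))
                if all(value > n for n in neighbours):
                    found.append((i, y, x))
    return found


# A Gaussian blob (variance 2.83) centred at (7, 7) in a 15x15 image.
img = [[math.exp(-((x - 7) ** 2 + (y - 7) ** 2) / (2 * 2.83)) for x in range(15)]
       for y in range(15)]
print(dog_maxima(img, [1.0, 1.414, 2.0, 2.828]))  # → [(1, 7, 7)]
```

The blob is reported once, at its centre and at the DoG layer whose scale best matches its size, which is what makes the detected point scale-invariant.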


Interest Points Detection (2/2)

Examples of detected interest points.

Our improvement: subpixel localization of interest points, using a 3D quadratic approximation of the scale-space around each detected interest point.
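The slides do not spell out how the quadratic is fitted. One common way, assumed here, is the second-order Taylor expansion of Lowe 2004: estimate the gradient g and Hessian H of D at the sample point by central finite differences, then take the offset x̂ = −H⁻¹g in (scale, y, x). The authors' own fit may differ; the helper names are hypothetical.

```python
def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a, b):
    """Solve the 3x3 system a @ x = b by Cramer's rule."""
    det = _det3(a)
    if abs(det) < 1e-12:
        raise ValueError("singular Hessian: no unique quadratic extremum")
    solution = []
    for k in range(3):
        ak = [row[:] for row in a]
        for r in range(3):
            ak[r][k] = b[r]
        solution.append(_det3(ak) / det)
    return solution


def subpixel_offset(D, s, y, x):
    """Offset (ds, dy, dx) of the extremum of a quadratic fitted around D[s][y][x].

    D is a scale-space stack indexed [scale][row][col]; (s, y, x) must not be on its border.
    """
    def at(p):
        return D[s + p[0]][y + p[1]][x + p[2]]

    def neg(p):
        return tuple(-c for c in p)

    unit = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
    centre = D[s][y][x]
    # Central-difference gradient g.
    g = [(at(e) - at(neg(e))) / 2.0 for e in unit]
    # Central-difference Hessian H.
    h = [[0.0] * 3 for _ in range(3)]
    for i, ei in enumerate(unit):
        for j, ej in enumerate(unit):
            if i == j:
                h[i][i] = at(ei) + at(neg(ei)) - 2.0 * centre
            else:
                pp = tuple(a + b for a, b in zip(ei, ej))
                pm = tuple(a - b for a, b in zip(ei, ej))
                h[i][j] = (at(pp) - at(pm) - at(neg(pm)) + at(neg(pp))) / 4.0
    # Setting the derivative of the quadratic to zero gives H * offset = -g.
    return _solve3(h, [-c for c in g])
```

In Lowe 2004, an offset larger than 0.5 in any coordinate means the extremum lies closer to a neighbouring sample point, and the fit is repeated there.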


SIFT Calculation

For each detected keypoint, a descriptor (feature vector) built from the gradient values around the keypoint is computed. These descriptors are called SIFTs (Scale-Invariant Feature Transform).

SIFTs provide invariance to scale and orientation.

Ref: Lowe 2004
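A simplified illustration of where the 128 components come from: a 4×4 grid of cells, each contributing an 8-bin histogram of gradient orientations weighted by gradient magnitude (4 × 4 × 8 = 128). This is an assumption-laden sketch, not Lowe's full method: it omits rotation to the keypoint's dominant orientation (the source of rotation invariance), scale selection, Gaussian weighting and trilinear interpolation, but keeps the normalize, clip at 0.2, renormalize step of Lowe 2004. The 18×18 patch size and the function names are hypothetical.

```python
import math


def _unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0 else v


def normalize_clip(v, clip=0.2):
    """Normalize, clip large components, renormalize (robustness to illumination)."""
    return _unit([min(c, clip) for c in _unit(v)])


def sift_like_descriptor(patch):
    """Simplified 128-d SIFT-style descriptor of an 18x18 grey-level patch.

    Gradients are taken on the inner 16x16 pixels, which form a 4x4 grid of 4x4 cells.
    """
    if len(patch) != 18 or any(len(row) != 18 for row in patch):
        raise ValueError("expected an 18x18 patch")
    hist = [0.0] * 128
    for y in range(1, 17):
        for x in range(1, 17):
            dx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            dy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)       # in [0, 2*pi)
            orientation_bin = int(angle / (2 * math.pi) * 8) % 8
            cell = ((y - 1) // 4) * 4 + (x - 1) // 4          # 0..15
            hist[cell * 8 + orientation_bin] += magnitude
    return normalize_clip(hist)
```

For a horizontal intensity ramp, every gradient points in the same direction, so each of the 16 cells puts all its weight in orientation bin 0 and the unit-length descriptor has 16 equal non-zero components.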


SIFT Matching

The Euclidean distance between SIFT vectors is used.


Affine Transform Calculation (1/2)

Several stages are employed:

1. Object Pose Prediction
• A Hough transform in pose space gives a coarse prediction of the object pose: each matched keypoint votes for all object poses that are consistent with it.
• A candidate object pose is obtained if at least 3 entries are found in a Hough bin.

2. Affine Transformation Calculation
• A least-squares procedure is used to find an affine transformation that correctly accounts for each candidate pose:

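$$
\begin{bmatrix} u \\ v \end{bmatrix} =
\begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} +
\begin{bmatrix} t_x \\ t_y \end{bmatrix}
$$

where (x, y) is a keypoint in the reference image and (u, v) the matched keypoint in the input image.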
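The least-squares step can be sketched as follows. Each correspondence contributes u = m1·x + m2·y + tx and v = m3·x + m4·y + ty, so the six unknowns split into two 3-unknown problems that share the same normal matrix. This pure-Python version solves the normal equations; the names fit_affine and solve are hypothetical, and at least three non-collinear matches are needed, consistent with the minimum of 3 entries per Hough bin.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_affine(matches):
    """Least-squares affine fit from ((x, y), (u, v)) correspondences.

    Returns (m1, m2, m3, m4, tx, ty) with u ~ m1*x + m2*y + tx and v ~ m3*x + m4*y + ty.
    """
    if len(matches) < 3:
        raise ValueError("need at least 3 matches")
    rows = [(x, y, 1.0) for (x, y), _ in matches]
    # Normal matrix A^T A, shared by the u- and v-equations.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atu = [sum(r[i] * uv[0] for r, (_, uv) in zip(rows, matches)) for i in range(3)]
    atv = [sum(r[i] * uv[1] for r, (_, uv) in zip(rows, matches)) for i in range(3)]
    m1, m2, tx = solve(ata, atu)
    m3, m4, ty = solve(ata, atv)
    return m1, m2, m3, m4, tx, ty
```

With noise-free correspondences the fit recovers the generating parameters exactly (up to rounding); with real matches, the residuals are what the later verification stages can inspect.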


Affine Transform Calculation (2/2)

3. Affine Transformation Verification Stages:
• Verification using a probabilistic model (Bayes classifier)
• Verification based on Geometrical Distortion
• Verification based on Spatial Correlation
• Verification based on Graphical Correlation
• Verification based on the Object Rotation

4. Transformations Merging based on Geometrical Overlapping

In blue: verification stages proposed by us to improve the detection of robot heads.


AIBO Head Pose Detection Example

[Figure: input image and reference images]


Matching & Storage of Local Descriptors

• Each reference image gives a set of keypoints.
• Each keypoint has a graphical descriptor, which is a 128-component vector.
• All the (keypoint, vector) pairs corresponding to a set of reference images are stored in a set T.

[Figure: a reference image gives keypoints (1), (2), (3), (4), ...; each is stored as a record (x, y, n, v1, v2, ..., v128), and together these records form T.]


More compact notation:

T = {(p1, d1), (p2, d2), (p3, d3), (p4, d4), ...}, where each pi is a keypoint of a reference image and di is its descriptor.



• In the matching-generation stage, an input image gives another set of keypoints and descriptors.

• For each input descriptor d, the first and second nearest descriptors in T (dFIRST and dSEC) must be found.

• A pair of nearest descriptors (d, dFIRST) then gives a pair of matched keypoints (p, pFIRST).

[Figure: an input pair (p, d) is searched in T = {(p1, d1), (p2, d2), ...}, returning (pFIRST, dFIRST) and (pSEC, dSEC).]


• The match is accepted if the ratio between the distance to the first nearest descriptor and the distance to the second nearest descriptor is lower than a given threshold τ.

• A low ratio indicates that there is no possible confusion in the search results: the nearest descriptor is clearly closer than the second nearest.

Accepted if:

$$ \mathrm{distance}(d, d_{FIRST}) < \tau \cdot \mathrm{distance}(d, d_{SEC}) $$
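A brute-force sketch of this matching rule: for each input descriptor it finds dFIRST and dSEC by exhaustive Euclidean comparison and applies the ratio test. The default τ = 0.8 is an assumption borrowed from Lowe 2004, since the slides do not give the threshold value; match_descriptors is a hypothetical name. Exhaustive search is exactly the cost that the kd-tree storage and BBF search below aim to reduce.

```python
import math


def match_descriptors(query, database, tau=0.8):
    """Nearest-neighbour matching with the distance-ratio test (exhaustive search).

    query, database: lists of equal-length descriptor vectors.
    Returns (query_index, database_index) pairs whose nearest neighbour is
    closer than tau times the second-nearest neighbour.
    """
    if len(database) < 2:
        raise ValueError("the ratio test needs at least two stored descriptors")
    matches = []
    for qi, d in enumerate(query):
        # Euclidean distance to every stored descriptor, sorted ascending.
        ranked = sorted((math.dist(d, e), ei) for ei, e in enumerate(database))
        (d_first, i_first), (d_sec, _) = ranked[0], ranked[1]
        if d_first < tau * d_sec:
            matches.append((qi, i_first))
    return matches
```

For example, with stored descriptors (0, 0), (10, 0) and (0, 10), the query (1, 0) is accepted (1 < 0.8 · 9), while (5, 0) is rejected because it is equally close to two stored descriptors.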


Storage: Kd-trees

• One way to store the set T in an ordered way is a kd-tree.

• In this case, we use a 128-d tree (k = 128).

• As is well known, in a kd-tree the elements are stored in the leaves (storage nodes); the other nodes (division nodes) divide the space along some dimension.

[Figure: example kd-tree. The root division node 1>2 stores all the vectors with more than 2 in the first dimension on its right side; its children are the division nodes 2>3 and 2>5, and the storage nodes are (1,3), (2,7), (6,5) and (8,9).]


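• Generation of balanced kd-trees:

• We have a set of N vectors a, b, c, d, ..., with components a1, a2, ..., b1, b2, ..., and so on.

• We calculate the mean M_i and the variance V_i of each dimension i:

$$ M_i = \frac{1}{N}\left(a_i + b_i + c_i + \dots\right), \qquad V_i = \frac{1}{N}\left((a_i - M_i)^2 + (b_i - M_i)^2 + \dots\right) $$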


Tree construction:
• Select the dimension iMAX with the largest variance.
• Order the vectors with respect to dimension iMAX.
• Select the median M in this dimension.
• Create a division node iMAX>M: vectors whose iMAX component is less than M go to the left subtree, and vectors whose iMAX component is greater than M go to the right subtree.
• Repeat the process recursively.
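A minimal pure-Python sketch of this construction; the class and function names are hypothetical. It uses the population variance V_i defined above, takes the lower median of the sorted values as M, and sends a vector to the right subtree when its iMAX component is greater than M.

```python
from statistics import pvariance


class Leaf:
    """Storage node: holds one vector."""
    def __init__(self, vector):
        self.vector = vector


class Division:
    """Division node 'dim > split': larger components go to the right subtree."""
    def __init__(self, dim, split, left, right):
        self.dim, self.split = dim, split
        self.left, self.right = left, right


def build_kdtree(vectors):
    """Balanced kd-tree: split on the highest-variance dimension at its median."""
    if not vectors:
        raise ValueError("cannot build a kd-tree from an empty set")
    if len(vectors) == 1:
        return Leaf(vectors[0])
    k = len(vectors[0])
    # V_i = (1/N) * sum((x_i - M_i)^2) is the population variance of dimension i.
    i_max = max(range(k), key=lambda i: pvariance([v[i] for v in vectors]))
    ordered = sorted(vectors, key=lambda v: v[i_max])
    mid = len(ordered) // 2
    split = ordered[mid - 1][i_max]  # lower median of dimension i_max
    return Division(i_max, split,
                    build_kdtree(ordered[:mid]), build_kdtree(ordered[mid:]))


def depth(node):
    """Number of division nodes on the longest root-to-leaf path."""
    if isinstance(node, Leaf):
        return 0
    return 1 + max(depth(node.left), depth(node.right))
```

For the four vectors (1,3), (2,7), (6,5), (8,9) of the kd-tree example, the first split is on the first dimension at 2 and its children split on the second dimension at 3 and 5, i.e. the tree 1>2, 2>3, 2>5 (the code numbers dimensions from 0). Splitting at the median keeps the tree balanced, so N vectors give a depth of about log2 N.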


Search Process

Search process of the nearest neighbors, two alternatives:

• Compare almost all the descriptors in T with the given descriptor and return the nearest one, or

• Compare at most Q nodes and return the nearest of them (comparing means calculating the Euclidean distance):
  • Requires a good search strategy
  • It can fail
  • The failure probability is controllable by Q

We choose the second option and use the BBF (Best Bin First) algorithm.


Search Process: BBF Algorithm

• Set:
  • v: query vector
  • Q: priority queue ordered by distance to v (initially empty)
  • r: initially the root of T
  • vFIRST: initially undefined, with an infinite distance to v
  • ncomp: number of comparisons, initially zero

• While (!finish):
  • Search for v in T starting from r => arrive at a leaf c
  • Add all the directions not taken during the search to Q in order (each division node on the path gives one not-taken direction)
  • If c is nearer to v than vFIRST, then vFIRST = c
  • Make r = the first node in Q (the nearest to v), ncomp++
  • If distance(r, v) > distance(vFIRST, v), finish = 1
  • If ncomp > ncompMAX, finish = 1
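A sketch of this loop in pure Python, using heapq as the priority queue Q. As in the search example, the priority of a not-taken branch is the distance between the query component and the division value (a lower bound on the distance to any vector in that branch), and ncomp counts leaf comparisons. The tree builder repeats a median-split construction so the block is self-contained; all names are hypothetical.

```python
import heapq
import math
from statistics import pvariance


class Leaf:
    def __init__(self, vector):
        self.vector = vector


class Division:
    # Division node "dim > split": larger components go to the right subtree.
    def __init__(self, dim, split, left, right):
        self.dim, self.split, self.left, self.right = dim, split, left, right


def build_kdtree(vectors):
    # Balanced kd-tree: split on the highest-variance dimension at its lower median.
    if len(vectors) == 1:
        return Leaf(vectors[0])
    i_max = max(range(len(vectors[0])), key=lambda i: pvariance([v[i] for v in vectors]))
    ordered = sorted(vectors, key=lambda v: v[i_max])
    mid = len(ordered) // 2
    return Division(i_max, ordered[mid - 1][i_max],
                    build_kdtree(ordered[:mid]), build_kdtree(ordered[mid:]))


def bbf_nearest(root, v, ncomp_max=200):
    """Best Bin First search for the nearest neighbour of v.

    Returns (vector, distance, ncomp). The result is exact when ncomp_max is
    large enough; otherwise it is the best leaf seen within ncomp_max comparisons.
    """
    queue = []        # Q: (distance to the not-taken branch, tie-breaker, node)
    tie = 0           # keeps heapq from ever comparing two nodes
    v_first, d_first, ncomp = None, math.inf, 0
    r = root
    while True:
        # Search for v from r down to a leaf, queueing each not-taken direction.
        while isinstance(r, Division):
            diff = v[r.dim] - r.split
            taken, not_taken = (r.right, r.left) if diff > 0 else (r.left, r.right)
            heapq.heappush(queue, (abs(diff), tie, not_taken))
            tie += 1
            r = taken
        ncomp += 1
        d = math.dist(v, r.vector)
        if d < d_first:
            v_first, d_first = r.vector, d
        if ncomp >= ncomp_max or not queue:
            break
        bound, _, r = heapq.heappop(queue)
        # No queued branch can contain a strictly nearer vector: finish.
        if bound >= d_first:
            break
    return v_first, d_first, ncomp
```

With the five vectors of the search example and the query (20,8), this returns (20,7) at distance 1; the median-built tree differs from the one drawn in the example, so the number of comparisons differs too. A small ncomp_max trades accuracy for speed, which is the controllable failure probability mentioned above. To feed the ratio test, the same loop can keep the two nearest leaves instead of one.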

Search Example

[Figure: kd-tree used in the example. The root division node 1>2 has the division node 2>3 on its left (leaves (1,3) and (2,7)) and the division node 2>7 on its right; node 2>7 has the leaf (20,7) on its left and the division node 1>6 on its right (leaves (5,1500) and (9,1000)).]

Requested vector: (20,8).

• At the root, 20 > 2: go right. The not-taken option (the left branch of 1>2) enters the queue with distance 18, the distance between 2 and 20. Comparisons: 0.
• At node 2>7, 8 > 7: go right. The not-taken option enters the queue with distance 1.
• At node 1>6, 20 > 6: go right. The not-taken option enters the queue with distance 14.
• We arrive at a leaf, (9,1000), and store it in CMIN (distance ≈ 992). Comparisons: 1.
• The distance of the best node in the queue (1, from node 2>7) is less than the distance to CMIN (992): start a new search from the best node in the queue and delete it from the queue.
• Going down from there, we arrive at the leaf (20,7), which is nearer (distance 1), so it is stored in CMIN. Comparisons: 2.
• The distance of the best node remaining in the queue (14) is NOT less than the distance to CMIN (1): finish. The nearest vector found is (20,7).


Conclusions

• BBF + kd-trees: a good trade-off between short search time and high success probability.

• However, BBF + kd-trees is perhaps not the optimal solution.

• Finding a better methodology is very important for massive applications (for example, Web image retrieval).