EECS 274 Computer Vision Local Invariant Features

Preview:

Citation preview

EECS 274 Computer Vision

Local Invariant Features

:Local features

• Matching with Harris Detector• Scale-invariant Feature Detection• Scale Invariant Image Descriptors• Affine-invariant Feature Detection• Object Recognition• SIFT Features

• Reading: S Chapter 4

Examples

Features

What local features to use?

Aperture problem

Ambiguity of 1-dimensional motion perception

Stripes moved upward 6 pixels

Stripes moved left 5 pixels

Introduction

Local invariant photometric descriptors

( )local descriptor

Local : robust to occlusion/clutter + no segmentationPhotometric : distinctiveInvariant : to image transformations + illumination changes

History - Recognition

Color histogram [Swain 91]

b

g

r

Each pixel is describedby a color vector

Distribution of color vectorsis described by a histogram

not robust to occlusion, not invariant, not distinctive

History - Recognition

Eigenimages [Turk 91]• Each face vector is represented in the eigenimage space

– eigenvectors with the highest eigenvalues = eigenimages

• The new image is projected into the eigenimage space– determine the closest face

. .1v.

3v

2v.

not robust to occlusion, requires segmentation, not invariant, discriminant

History - Recognition

Geometric invariants [Rothwell 92]• Function with a value independent of the

transformation

• Invariant for image rotation : distance of two points

• Invariant for planar homography : cross-ratio local and invariant, not discriminant, requires sub-pixel extraction of primitives

tt yxTyxyxfyxf ),(),( where),(),(

History - Recognition

Problems : occlusion, clutter, image transformations, distinctiveness

Solution : recognition with local photometric invariants

[ Local greyvalue invariants for image retrieval, C. Schmid and R. Mohr, PAMI 1997 ]

Approach

1) Extraction of interest points (characteristic locations)

2) Computation of local descriptors

3) Determining correspondences

4) Selection of similar images

( )local descriptor

Matching with interest points

• Extraction of interest points with the Harris detector

• Comparison of points with cross-correlation

• Verification with the fundamental matrix

Moravec corner detector

• Developed for Stanford Cart in 1977

Moravec corner detectorChange of intensity for the shift [u,v]:

2,

( , ) ( , ) ( , ) ( , )x y

E u v w x y I x u y v I x y

IntensityShifted intensity

Window function

Four shifts: (u,v) = (1,0), (1,1), (0,1), (-1, 1)Look for local maxima in min(E)

Problems of Moravec detector• Noisy response due to a binary window function• Only a set of shifts at every 45 degree is

considered• Only minimum of E is taken into account

Harris corner detector (1988) solves these problems.

Harris detector

Based on the idea of auto-correlation

Important difference in all directions interest point

Interest points

Geometric featuresrepeatable under transformations

2D characteristics of the signalhigh informational content

Comparison of different detectors [Schmid98] Harris detector

Harris detectorAuto-correlation function for a point and a shift

Discrete shifts can be avoided with the auto-correlation matrix

),( yx ),(),( yxvu

x y

vyuxIyxIyxwvuE 2)),(),()(,(),(

v

uyxIyxIyxIvyuxI yx )),(),((),(),(with

2

2

,

2

,

,,

2

2

*

)),()(,(),(),(*),(

),(),(*),()),((*),(

),(),(),(),(

yyx

yxx

T

yxy

yxyx

yxyx

yxx

yyx

x

III

IIIwA

A

v

u

yxIyxwyxIyxIyxw

yxIyxIyxwyxIyxw

vu

v

uyxIyxIyxwvuE

uu

Comparison of detectors: Rotation

[Comparing and Evaluating Interest Points, Schmid, Mohr & Bauckhage, ICCV 98]

repeatability = #good matches/mean(#points)

Comparison of detectors: Perspective

[Comparing and Evaluating Interest Points, Schmid, Mohr & Bauckhage, ICCV 98]

repeatability = #good matches/mean(#points)

Harris corner detectorIntensity change in shifting window: eigenvalue analysis

1, 2 – eigenvalues of A

direction of the slowest change

direction of the fastest change

(max)-1/2

(min)-1/2

Ellipse E(u,v) = const

2

2

*

),()(

yyx

yxx

T

III

IIIwA

AvuEE uuu

uncertainty ellipse

Shi and Tomasi use min(1, 2)

to locate good features to track

Harris corner detector

1

2

Corner1 and 2 are large, 1 ~ 2;E increases in all directions

1 and 2 are small;E is almost constant in all directions

edge 1 >> 2

edge 2 >> 1

flat

Classification of image points using eigenvalues of M:

Harris detection• Auto-correlation matrix

– captures the structure of the local neighborhood

– measure based on eigenvalues of this matrix• 2 strong eigenvalues => interest point• 1 strong eigenvalue => contour• 0 eigenvalue => uniform region

• Interest point detection– threshold on the eigenvalues– local maximum for localization

Harris corner detectorMeasure of corner response:

(k – empirical constant, k = 0.04-0.06)

22121

2

)(

))(()det(

k

AtracekAR

Example

Example

Example

Good features

Using auto-correlation or Hessian matrix

Local descriptors

Descriptors characterize the local neighborhood of a point

local descriptor

( )

Local jet

Convolution of image I with Gaussian derivatives

)(*),(

)(*),(

)(*),(

)(*),(

)(),(

)(),(

),(

yy

xy

xx

y

x

GyxI

GyxI

GyxI

GyxI

GyxI

GyxI

yxv

)2

),(exp(

2

1),),((

),(),()(),(

2

2

2

tt yx

yxG

ydxdyyxxIyxGGyxI

N-Jet, local jet

nsor epsilon te ricantisymmet theis where

2

2

)(

ij

yyyyxyxyxxxx

yyxx

yyyyyxxyxxxx

yyxx

kjiijk

lkijklij

ljiijkkkjiij

llijkklkijklij

ijij

ii

jiji

ii

LLLLLL

LL

LLLLLLLL

LLLL

L

LLLL

LLLL

LLLLLLLL

LLLLLLLL

LL

L

LLL

LL

L

• Invariance to image rotation : differential invariants [Koen87]

Local descriptors

Robustness to illumination changes

In case of an affine transformation

2

2

2

2

2/1

2/3

)(

)(

)(

)

)(

)(

)(

)(

ii

kjiijk

ii

lkijklij

ii

kjiijkkkjiij

ii

llijkklkijklij

ii

jiij

ii

ii

ii

jiji

LL

LLLL

LL

LLLL

LL

LLLLLLLL

LL

LLLLLLLL

LL

LL

LL

L

LL

LLL

baII )()( 21 xx

or normalization of the image patch with mean and variance

Determining correspondences

Vector comparison using the Mahalanobis distance

=?

)()(),( 1 qpqpqp TMdist

( )( )

Selection of similar images

• In a large database – voting algorithm– additional constraints

• Rapid acces with an indexing mechanism

Voting algorithm

local characteristicsvector of

( )

Voting algorithm

} }

1 1 02 1 1

I is the corresponding model image1

2 1 1

Additional constraints

• Semi-local constraints– neighboring points should match– angles, length ratios should be similar

• Global constraints– robust estimation of the image transformation (homogaphy,

epipolar geometry)

11

2

3

2

3

Results

database with ~1000 images

Results

Harris detector

Interest points extracted with Harris (~ 500 points)

Cross-correlation matching

Initial matches (188 pairs)

Global constraints

Robust estimation of the fundamental matrix

99 inliers 89 outliers

Summary of Harris detector

• Very good results in the presence of occlusion and clutter– local information– discriminant greyvalue information – invariance to image rotation and illumination

• Not invariance to scale and affine changes

• Solution for more general view point changes– local invariant descriptors to scale and rotation– extraction of invariant points and regions

Scale Invariant Feature Detection• Consider two images of the same scene,

related by a scale change (i.e. zooming)• How are their scale space representations

related? Scale Space Theory (Lindeberg ’98):

Normalized derivatives have the same value atcorresponding relative scales

handat task for theparameter free a is

),,()',','('

'

)','(),(

),()','(

, s,derivative normalized

)1(

2

2/2/

tyxLstyxL

tst

yxIyxI

sysxyx

tt

m

yx

Harris detector + scale changes

Scale Adapted Harris Detector

Many corresponding points at which the scale factor correspondsto scale change between images

Scale invariant Harris points• Multi-scale extraction of Harris interest points

• Selection of points at characteristic scale in scale space

Laplacian

Chacteristic scale :

- maximum in scale space

- scale invariant

Scale invariant interest points

invariant points + associated regions [Mikolajczyk & Schmid’01] Harris-Laplacian Feature

multi-scale Harris points

selection of points

at the characteristic scale

with Laplacian

Harris detector – adaptation to scale

Evaluation of scale invariant detectors

repeatability – scale changes

SIFT: Overview

• 1999• Generates image features,

“keypoints”– invariant to image scaling and rotation– partially invariant to change in

illumination and 3D camera viewpoint– many can be extracted from typical

images– highly distinctive

Algorithm overview

1. Scale-space extrema detection– Uses difference-of-Gaussian function

2. Keypoint localization– Sub-pixel location and scale fit to a

model

3. Orientation assignment– 1 or more for each keypoint

4. Keypoint descriptor– Created from local image gradients

Scale space

• Definition: where

),(),,(),,( yxIyxGyxL 222 2/)(

22

1),,(

yxeyxG

Scale space

• Keypoints are detected using scale-space extrema in difference-of-Gaussian function D

• D definition:

• Efficient to compute

),,(),,(

),()),,(),,((),,(

yxLkyxL

yxIyxGkyxGyxD

Relationship of D to

• Close approximation to scale-normalized Laplacian of Gaussian,

• Diffusion equation:

• Approximate ∂G/∂σ:– giving,

• When D has scales differing by a constant

factor it already incorporates the σ2 scale normalization required for scale-invariance

• e.g.,

G22

G22

GG 2

k

yxGkyxGG ),,(),,(

Gk

yxGkyxG 2),,(),,(

GkyxGkyxG 22)1(),,(),,(

2k

Scale space construction

2k2σ

2kσ

σ

2kσ

σ

Scale space• A collection of images obtained by progressively smoothing

the input image• Analogous to gradually reducing image resolution • See Vedaldi’s implementation (http://www.vlfeat.org/ ) for

details• Discretized scales

)),(min(logO 3,S

octaves of # : octave,per scales of # :

1,,,1,,0,6.1

,2),(

2

minmin0

),()1,(),()/(

0

hw

soosoosooSs

II

Os

OoooSs

IIDoGos

Scale space images

first octave

second octave

third octave

fourth octave

Difference-of-Gaussian images

first octave

second octave

third octave

fourth octave

Frequency of sampling

• There is no minimum• Best frequency determined experimentally

Prior smoothing for each octave

• Increasing σ increases robustness, but costs• σ = 1.6 a good tradeoff• Doubling the image initially increases

number of keypoints

Finding extrema

• Sample point D(x,y,σ) is selected only if it is a minimum or a maximum of these points

DoG scale space

Extrema in this image

Localization

• 3D quadratic function is fit to the local sample points

• Start with Taylor expansion with sample point as the origin– where

• Take the derivative with respect to X, and set it to 0, giving

• is the location of the keypoint• This is a 3x3 linear system

2

2

2

1)(

DDDD T

T

Tyx ),,(

XX

D

X

D ˆ02

2

DD2

12

ˆ

Localization

• Hessian and derivative approximated by finite differences,– example:

• If X is > 0.5 in any dimension, process repeated

x

Dy

D

D

x

y

x

D

yx

D

x

Dyx

D

y

D

y

Dx

D

y

DD

2

222

2

2

22

22

2

2

4

)()(

1

2

2

,11

,11

,11

,11

2

,1

,,1

2

2

,1

,1

jik

jik

jik

jik

jik

jik

jik

jik

jik

DDDD

y

D

DDDD

DDD

XX

D

X

D ˆ02

2

Filtering

• Contrast (use prev. equation):– If |D(X)| < 0.03, throw it out

• Edge-iness:– Use ratio of principal curvatures to throw out

poorly defined peaks– Curvatures come from Hessian:– Ratio of Trace(H)2 and Determinant(H)

– If ratio > (r+1)2/(r), throw it out (SIFT uses r=10)

XD

DDT

ˆ2

1)ˆ(

yyxy

xyxx

DD

DDH

22

2 )(

)(

)(,)()(

)(

HDet

HTrratioDDDHDet

DDHTr

xyyyxx

yyxx

Orientation assignment

• Descriptor computed relative to keypoint’s orientation achieves rotation invariance

• Precomputed along with mag. for all levels (useful in descriptor computation)

• Multiple orientations assigned to keypoints from an orientation histogram– Significantly improve stability of matching

))),1(),1(/())1,()1,(((2tan),(

))1,()1,(()),1(),1((),( 22

yxLyxLyxLyxLayx

yxLyxLyxLyxLyxm

Keypoint images

Keypoint Selection

Finding extremawith DoG

Removing|D(X)| < 0.03

Removing|D(X)| < 0.03

Select canonical orientation• Create histogram of local

gradient directions computed at selected scale

• Assign canonical orientation at peak of smoothed histogram

• Each key specifies stable 2D coordinates (x, y, scale, orientation)

0 2

Descriptor

• Descriptor has 3 dimensions (x,y,θ)• Orientation histogram of gradient magnitudes• Position and orientation of each gradient sample

rotated relative to keypoint orientation

Descriptor

• Weight magnitude of each sample point by Gaussian weighting function

• Distribute each sample to adjacent bins by trilinear interpolation (avoids boundary effects)

Descriptor• Best results achieved with 4x4x8 =

128 descriptor size• Normalize to unit length

– Reduces effect of illumination change

• Cap each element to 0.2, normalize again– Reduces non-linear illumination changes– 0.2 determined experimentally

Object detection

• Create a database of keypoints from training images

• Match keypoints to a database– Nearest neighbor

search

PCA-SIFT

• Different descriptor (same keypoints)• Apply PCA to the gradient patch• Descriptor size is 20 (instead of 128)• More robust, faster

SIFT: Summary

• Scale space• Difference-of-Gaussian• Localization• Filtering• Orientation assignment• Descriptor, 128 elements

Object recognition

• Definition: Identify an object and determine its pose and model parameters

• Commercial object recognition– $4 billion/year industry for inspection and

assembly– Almost entirely based on template matching

• Upcoming applications– Mobile robots, toys, user interfaces– Location recognition– Digital camera panoramas, 3D scene modeling

Invariant local features• Image content => local feature coordinates invariant to translation, rotation, scale• Technical details regarding

– Keypoint matching (using 1st and 2nd nearest neighbors)– Efficient nearest neighbor indexing – Clustering with Hough transform (model location, orientation, scale)– Account for affine distortion

Features

Advantages of invariant local features

• Locality: features are local, so robust to occlusion and clutter (no prior segmentation)

• Distinctiveness: individual features can be matched to a large database of objects

• Quantity: many features can be generated for even small objects

• Efficiency: close to real-time performance

Experimental evaluation

Scale change (factor 2.5)

Harris-Laplace DoG

Viewpoint change (60 degrees)

Harris-Affine (Harris-Laplace)

Descriptors - conclusion

• SIFT + steerable perform best

• Performance of the descriptor independent of the detector

• Errors due to imprecision in region estimation, localization

Image retrieval

…> 5000images

change in viewing angle

Matches

22 correct matches

Image retrieval

…> 5000images

change in viewing angle

+ scale change

Matches

33 correct matches

Multiple panoramas from an unordered image set

Image registration and blending

Location recognition

Robot localization• Joint work with Stephen Se, Jim Little

Recognizing panoramas• Matthew Brown and David Lowe• Recognize overlap from an unordered set of

images and automatically stitch together• SIFT features provide initial feature matching• Image blending at multiple scales hides the

seams

Panorama automatically assembled from 143 images

Map continuously built over time

Planar recognition• Planar surfaces can be

reliably recognized at a rotation of 60° away from the camera

• Affine fit approximates perspective projection

• Only 3 points are needed for recognition

3D object recognition• Extract

outlines with background subtraction

3D object recognition

• Only 3 keys are needed for recognition, so extra keys provide robustness

• Affine model is no longer as accurate

Recognition under occlusion

Comparison to template matching

• Costs of template matching– 250,000 locations x 30 orientations x 4 scales =

30,000,000 evaluations– Does not easily handle partial occlusion and other

variation without large increase in template numbers– Viola & Jones cascade must start again for each

qualitatively different template

• Costs of local feature approach– 3000 evaluations (reduction by factor of 10,000)– Features are more invariant to illumination, 3D rotation,

and object variation– Use of many small subtemplates increases robustness to

partial occlusion and other variations

More local features

• Harris Laplace• Harris Affine• Hessian detector• Hessian Laplace• Hessian Affine• MSER (Maximally Stable Extremal

Regions)• SURF (Speeded-Up Robust Feature)

Recommended