776 Computer Vision Jan-Michael Frahm Spring 2012

Page 1: 776 Computer Vision

776 Computer Vision

Jan-Michael Frahm

Spring 2012

Page 2: 776 Computer Vision

Image alignment

Image from http://graphics.cs.cmu.edu/courses/15-463/2010_fall/

Page 3: 776 Computer Vision

A look into the past

http://blog.flickr.net/en/2010/01/27/a-look-into-the-past/

Page 4: 776 Computer Vision

A look into the past

• Leningrad during the blockade

http://komen-dant.livejournal.com/345684.html

Page 6: 776 Computer Vision

Image alignment: Applications

Panorama stitching

Recognition of object instances

Page 7: 776 Computer Vision

Image alignment: Challenges

Small degree of overlap

Occlusion, clutter

Intensity changes

Page 8: 776 Computer Vision

Image alignment

• Two families of approaches:
  o Direct (pixel-based) alignment
    • Search for alignment where most pixels agree
  o Feature-based alignment
    • Search for alignment where extracted features agree
    • Can be verified using pixel-based alignment

Page 9: 776 Computer Vision

Alignment as fitting

• Previous lectures: fitting a model to features in one image

Find model $M$ that minimizes

\[ \sum_i \mathrm{residual}(x_i, M) \]

Page 10: 776 Computer Vision

Alignment as fitting

• Previous lectures: fitting a model to features in one image

Find model $M$ that minimizes

\[ \sum_i \mathrm{residual}(x_i, M) \]

• Alignment: fitting a model to a transformation between pairs of features (matches) in two images

Find transformation $T$ that minimizes

\[ \sum_i \mathrm{residual}(T(x_i), x_i') \]

Page 11: 776 Computer Vision

2D transformation models

• Similarity (translation, scale, rotation)

• Affine

• Projective (homography)

Page 12: 776 Computer Vision

Let’s start with affine transformations

• Simple fitting procedure (linear least squares)
• Approximates viewpoint changes for roughly planar objects and roughly orthographic cameras
• Can be used to initialize fitting for more complex models

Page 13: 776 Computer Vision

Fitting an affine transformation

• Assume we know the correspondences. How do we get the transformation?

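$(x_i, y_i) \leftrightarrow (x_i', y_i')$

\[
\begin{bmatrix} x_i' \\ y_i' \end{bmatrix}
=
\begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix}
\begin{bmatrix} x_i \\ y_i \end{bmatrix}
+
\begin{bmatrix} t_1 \\ t_2 \end{bmatrix}
\]

\[
\begin{bmatrix}
 & & \cdots & & & \\
x_i & y_i & 0 & 0 & 1 & 0 \\
0 & 0 & x_i & y_i & 0 & 1 \\
 & & \cdots & & &
\end{bmatrix}
\begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_1 \\ t_2 \end{bmatrix}
=
\begin{bmatrix} \cdots \\ x_i' \\ y_i' \\ \cdots \end{bmatrix}
\]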

Page 14: 776 Computer Vision

Fitting an affine transformation

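• Linear system with six unknowns
• Each match gives us two linearly independent equations: need at least three to solve for the transformation parameters

\[
\begin{bmatrix}
 & & \cdots & & & \\
x_i & y_i & 0 & 0 & 1 & 0 \\
0 & 0 & x_i & y_i & 0 & 1 \\
 & & \cdots & & &
\end{bmatrix}
\begin{bmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_1 \\ t_2 \end{bmatrix}
=
\begin{bmatrix} \cdots \\ x_i' \\ y_i' \\ \cdots \end{bmatrix}
\]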
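The six-unknown linear system above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the lecture: the function name `fit_affine` and the synthetic points are assumptions made for the example. Each match contributes the two rows shown in the matrix, and `np.linalg.lstsq` solves for $(m_1, m_2, m_3, m_4, t_1, t_2)$.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine fit so that dst ~ M @ src + t.

    src, dst: (n, 2) arrays of matched points, n >= 3 and not all collinear.
    Returns (M, t) with M of shape (2, 2) and t of shape (2,).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    A = np.zeros((2 * n, 6))
    # Even rows encode x_i' : [x_i, y_i, 0, 0, 1, 0]
    A[0::2, 0:2] = src
    A[0::2, 4] = 1.0
    # Odd rows encode y_i' : [0, 0, x_i, y_i, 0, 1]
    A[1::2, 2:4] = src
    A[1::2, 5] = 1.0
    b = dst.reshape(-1)  # [x_1', y_1', x_2', y_2', ...]
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params[:4].reshape(2, 2), params[4:]

# Synthetic matches generated from a known transformation (illustrative values).
M_true = np.array([[1.2, -0.3], [0.4, 0.9]])
t_true = np.array([5.0, -2.0])
src_pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
dst_pts = src_pts @ M_true.T + t_true
M_est, t_est = fit_affine(src_pts, dst_pts)
```

With noise-free matches the recovered parameters agree with the generating ones; with noisy matches the same call returns the least-squares estimate.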

Page 15: 776 Computer Vision

Fitting a plane projective transformation

• Homography: plane projective transformation (transformation taking a quad to another arbitrary quad)

Page 16: 776 Computer Vision

Homography

• The transformation between two views of a planar surface

• The transformation between images from two cameras that share the same center

Page 17: 776 Computer Vision

Application: Panorama stitching

Source: Hartley & Zisserman

Page 18: 776 Computer Vision

Fitting a homography

• Recall: homogeneous coordinates

Converting to homogeneous image coordinates: $(x, y) \Rightarrow [x, y, 1]^T$

Converting from homogeneous image coordinates: $[x, y, w]^T \Rightarrow (x/w, y/w)$

Page 19: 776 Computer Vision

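Fitting a homography

• Recall: homogeneous coordinates

Converting to homogeneous image coordinates: $(x, y) \Rightarrow [x, y, 1]^T$

Converting from homogeneous image coordinates: $[x, y, w]^T \Rightarrow (x/w, y/w)$

• Equation for homography (equality holds up to a scale factor $\lambda$):

\[
\lambda \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]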
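As a small illustration of the homogeneous-coordinate conversions and the homography equation, the sketch below maps one point through a 3x3 matrix. The helper names and the matrix `H_example` are made up for this example; they do not come from the slides.

```python
def to_homogeneous(x, y):
    """(x, y) -> [x, y, 1]."""
    return (x, y, 1.0)

def from_homogeneous(p):
    """[x, y, w] -> (x / w, y / w); w == 0 is a point at infinity."""
    x, y, w = p
    if w == 0:
        raise ValueError("point at infinity has no image coordinates")
    return (x / w, y / w)

def apply_homography(H, x, y):
    """Map (x, y) through the 3x3 matrix H given as nested lists."""
    p = to_homogeneous(x, y)
    q = tuple(sum(H[r][c] * p[c] for c in range(3)) for r in range(3))
    return from_homogeneous(q)

# Illustrative matrix: a translation by (2, 3) with overall scale 2.
H_example = [[1.0, 0.0, 2.0],
             [0.0, 1.0, 3.0],
             [0.0, 0.0, 2.0]]
mapped = apply_homography(H_example, 4.0, 6.0)  # [6, 9, 2] -> (3.0, 4.5)
```

Multiplying `H_example` by any nonzero constant gives the same `mapped` point, which is why only the ratios of the entries of a homography matter.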

Page 20: 776 Computer Vision

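Fitting a homography

• Equation for homography:

\[ \lambda \mathbf{x}_i' = \mathbf{H} \mathbf{x}_i \]

\[
\lambda \begin{bmatrix} x_i' \\ y_i' \\ 1 \end{bmatrix}
=
\begin{bmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{bmatrix}
\begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}
\]

\[ \mathbf{x}_i' \times \mathbf{H} \mathbf{x}_i = \mathbf{0} \]

With $\mathbf{h}_1^T, \mathbf{h}_2^T, \mathbf{h}_3^T$ denoting the rows of $\mathbf{H}$:

\[
\begin{bmatrix} x_i' \\ y_i' \\ 1 \end{bmatrix}
\times
\begin{bmatrix} \mathbf{h}_1^T \mathbf{x}_i \\ \mathbf{h}_2^T \mathbf{x}_i \\ \mathbf{h}_3^T \mathbf{x}_i \end{bmatrix}
=
\begin{bmatrix}
y_i' \, \mathbf{h}_3^T \mathbf{x}_i - \mathbf{h}_2^T \mathbf{x}_i \\
\mathbf{h}_1^T \mathbf{x}_i - x_i' \, \mathbf{h}_3^T \mathbf{x}_i \\
x_i' \, \mathbf{h}_2^T \mathbf{x}_i - y_i' \, \mathbf{h}_1^T \mathbf{x}_i
\end{bmatrix}
\]

\[
\begin{bmatrix}
\mathbf{0}^T & -\mathbf{x}_i^T & y_i' \, \mathbf{x}_i^T \\
\mathbf{x}_i^T & \mathbf{0}^T & -x_i' \, \mathbf{x}_i^T \\
-y_i' \, \mathbf{x}_i^T & x_i' \, \mathbf{x}_i^T & \mathbf{0}^T
\end{bmatrix}
\begin{bmatrix} \mathbf{h}_1 \\ \mathbf{h}_2 \\ \mathbf{h}_3 \end{bmatrix}
= \mathbf{0}
\]

3 equations, only 2 linearly independent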

Page 21: 776 Computer Vision

Direct linear transform

• H has 8 degrees of freedom (9 parameters, but scale is arbitrary)

• One match gives us two linearly independent equations

• Four matches needed for a minimal solution (null space of 8x9 matrix)

• More than four: homogeneous least squares

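\[
\begin{bmatrix}
\mathbf{0}^T & -\mathbf{x}_1^T & y_1' \, \mathbf{x}_1^T \\
\mathbf{x}_1^T & \mathbf{0}^T & -x_1' \, \mathbf{x}_1^T \\
 & \cdots & \\
\mathbf{0}^T & -\mathbf{x}_n^T & y_n' \, \mathbf{x}_n^T \\
\mathbf{x}_n^T & \mathbf{0}^T & -x_n' \, \mathbf{x}_n^T
\end{bmatrix}
\begin{bmatrix} \mathbf{h}_1 \\ \mathbf{h}_2 \\ \mathbf{h}_3 \end{bmatrix}
= \mathbf{0}
\]

\[ \mathbf{A} \mathbf{h} = \mathbf{0} \]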
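A minimal NumPy sketch of the direct linear transform follows. It stacks the two independent rows per match into $\mathbf{A}$ and takes the right singular vector with the smallest singular value as $\mathbf{h}$, which gives the null space for four matches and the homogeneous least-squares solution for more. The function name and test points are assumptions for illustration, and the sketch omits the coordinate normalization that is commonly recommended for numerical stability with pixel-scale coordinates.

```python
import numpy as np

def fit_homography_dlt(src, dst):
    """Estimate H (3x3, unit Frobenius norm) from n >= 4 point matches."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        X = np.array([x, y, 1.0])
        zero = np.zeros(3)
        # Two linearly independent equations per match:
        rows.append(np.concatenate([zero, -X, yp * X]))   # [0^T, -x^T, y' x^T]
        rows.append(np.concatenate([X, zero, -xp * X]))   # [x^T, 0^T, -x' x^T]
    A = np.vstack(rows)
    # h minimizes ||A h|| subject to ||h|| = 1: last row of V^T.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)

# Synthetic matches from a known homography (illustrative values).
H_true = np.array([[1.1, 0.1, 0.3],
                   [-0.2, 0.9, 0.1],
                   [0.1, 0.2, 1.0]])
src_pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.3]])
src_h = np.hstack([src_pts, np.ones((len(src_pts), 1))])
proj = src_h @ H_true.T
dst_pts = proj[:, :2] / proj[:, 2:3]

H_est = fit_homography_dlt(src_pts, dst_pts)
H_est = H_est / H_est[2, 2]  # fix the arbitrary scale for comparison
```

Because the scale of $\mathbf{H}$ is arbitrary, the estimate is compared with `H_true` only after dividing by its bottom-right entry.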

Page 22: 776 Computer Vision

Robust feature-based alignment

• So far, we’ve assumed that we are given a set of “ground-truth” correspondences between the two images we want to align

• What if we don’t know the correspondences?

$(x_i, y_i) \leftrightarrow (x_i', y_i')$

Page 24: 776 Computer Vision

Robust feature-based alignment

Page 25: 776 Computer Vision

Robust feature-based alignment

• Extract features

Page 26: 776 Computer Vision

Robust feature-based alignment

• Extract features
• Compute putative matches

Page 27: 776 Computer Vision

Robust feature-based alignment

• Extract features
• Compute putative matches
• Loop:
  o Hypothesize transformation T

Page 28: 776 Computer Vision

Robust feature-based alignment

• Extract features
• Compute putative matches
• Loop:
  o Hypothesize transformation T
  o Verify transformation (search for other matches consistent with T)

Page 30: 776 Computer Vision

Generating putative correspondences


Page 31: 776 Computer Vision

Generating putative correspondences

• Need to compare feature descriptors of local patches surrounding interest points

[Figure: is the feature descriptor of one patch equal to the feature descriptor of the other?]

Page 32: 776 Computer Vision

Feature descriptors

• Recall: covariant detectors => invariant descriptors

Detect regions → Normalize regions → Compute appearance descriptors

Page 33: 776 Computer Vision

Feature descriptors

• Simplest descriptor: vector of raw intensity values

• How to compare two such vectors?
  o Sum of squared differences (SSD)

    \[ \mathrm{SSD}(u, v) = \sum_i (u_i - v_i)^2 \]

    • Not invariant to intensity change

  o Normalized correlation

    \[ \rho(u, v) = \frac{\sum_i (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\left( \sum_j (u_j - \bar{u})^2 \right) \left( \sum_j (v_j - \bar{v})^2 \right)}} \]

    • Invariant to affine intensity change
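The two comparison measures can be written directly from their definitions. The sketch below is illustrative only: the function names and the toy patch values are assumptions. It compares a patch with a copy whose intensities were changed by the affine map $2u + 5$.

```python
import math

def ssd(u, v):
    """Sum of squared differences between two equal-length intensity vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def normalized_correlation(u, v):
    """Correlation of the mean-subtracted vectors, in [-1, 1]."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    denom = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    if denom == 0:
        raise ValueError("normalized correlation is undefined for a constant patch")
    return sum(a * b for a, b in zip(du, dv)) / denom

patch = [10.0, 20.0, 30.0, 50.0]
brighter = [2.0 * p + 5.0 for p in patch]   # affine intensity change

ssd_value = ssd(patch, brighter)                      # large despite identical structure
ncc_value = normalized_correlation(patch, brighter)   # stays at (numerically) 1
```

The SSD grows with the intensity change even though the patch structure is identical, while the normalized correlation is unaffected by the gain and offset.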

Page 34: 776 Computer Vision

Disadvantage of intensity vectors as descriptors

• Small deformations can affect the matching score a lot

Page 35: 776 Computer Vision

Feature descriptors: SIFT

• Descriptor computation:
  o Divide patch into 4x4 sub-patches
  o Compute histogram of gradient orientations (8 reference angles) inside each sub-patch
  o Resulting descriptor: 4x4x8 = 128 dimensions

David G. Lowe. "Distinctive image features from scale-invariant keypoints." IJCV 60 (2), pp. 91-110, 2004.

Page 36: 776 Computer Vision

Feature descriptors: SIFT

• Descriptor computation:
  o Divide patch into 4x4 sub-patches
  o Compute histogram of gradient orientations (8 reference angles) inside each sub-patch
  o Resulting descriptor: 4x4x8 = 128 dimensions

• Advantage over raw vectors of pixel values
  o Gradients less sensitive to illumination change
  o Pooling of gradients over the sub-patches achieves robustness to small shifts, but still preserves some spatial information

David G. Lowe. "Distinctive image features from scale-invariant keypoints." IJCV 60 (2), pp. 91-110, 2004.
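The 4x4x8 layout can be sketched as follows. This is a simplified, SIFT-like descriptor written for illustration, not Lowe's implementation: it omits Gaussian weighting, interpolation between bins, rotation to a dominant orientation, and the clamping step, and the function name `sift_like_descriptor` is an assumption.

```python
import numpy as np

def sift_like_descriptor(patch, grid=4, n_bins=8):
    """Gradient-orientation histograms over a grid x grid layout of sub-patches."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)                # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)   # orientation in [0, 2*pi)
    bins = np.minimum((angle / (2 * np.pi) * n_bins).astype(int), n_bins - 1)

    cell_h = patch.shape[0] // grid
    cell_w = patch.shape[1] // grid
    desc = np.zeros((grid, grid, n_bins))
    for i in range(grid):
        for j in range(grid):
            rows = slice(i * cell_h, (i + 1) * cell_h)
            cols = slice(j * cell_w, (j + 1) * cell_w)
            # Magnitude-weighted orientation histogram of this sub-patch.
            desc[i, j] = np.bincount(bins[rows, cols].ravel(),
                                     weights=magnitude[rows, cols].ravel(),
                                     minlength=n_bins)
    desc = desc.ravel()
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc

# A 16x16 horizontal intensity ramp: every gradient points along +x.
ramp = np.tile(np.arange(16.0), (16, 1))
descriptor = sift_like_descriptor(ramp)
```

For the ramp, all of the gradient energy falls into the first orientation bin of each of the 16 sub-patches, and the descriptor has 4 x 4 x 8 = 128 entries.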

Page 37: 776 Computer Vision

Feature matching

• Generating putative matches: for each patch in one image, find a short list of patches in the other image that could match it based solely on appearance

Page 38: 776 Computer Vision

Feature space outlier rejection

• How can we tell which putative matches are more reliable?
• Heuristic: compare distance of nearest neighbor to that of second nearest neighbor
  o Ratio of closest distance to second-closest distance will be high for features that are not distinctive
  o Threshold of 0.8 provides good separation

David G. Lowe. "Distinctive image features from scale-invariant keypoints." IJCV 60 (2), pp. 91-110, 2004.
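The ratio heuristic can be sketched with a brute-force nearest-neighbor search. The function name, the 2D toy "descriptors", and the use of Euclidean distance are assumptions for the example; only the 0.8 threshold comes from the slide.

```python
import math

def ratio_test_matches(desc_a, desc_b, max_ratio=0.8):
    """Return (i, j) pairs whose nearest neighbor is clearly better than the second."""
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio needs two candidates
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d2 > 0 and d1 / d2 < max_ratio:
            matches.append((i, j1))
    return matches

desc_a = [(0.0, 0.0), (5.0, 5.0)]
desc_b = [(0.1, 0.0), (4.0, 4.0), (6.0, 6.0), (10.0, 0.0)]
kept = ratio_test_matches(desc_a, desc_b)
```

The first descriptor has one clearly closest neighbor and is kept. The second is equally close to two candidates, so its ratio is 1 and it is rejected as not distinctive.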

Page 39: 776 Computer Vision

Reading

David G. Lowe. "Distinctive image features from scale-invariant keypoints." IJCV 60 (2), pp. 91-110, 2004.

Page 40: 776 Computer Vision

RANSAC

• The set of putative matches contains a very high percentage of outliers

• RANSAC loop:
  1. Randomly select a seed group of matches
  2. Compute transformation from seed group
  3. Find inliers to this transformation
  4. If the number of inliers is sufficiently large, recompute least-squares estimate of transformation on all of the inliers

• Keep the transformation with the largest number of inliers
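The loop above can be sketched for the simplest model, a pure translation, where a single match is enough to form a hypothesis. The function name, threshold, iteration count, and data are assumptions for the example. As a simplification, the least-squares refit (the mean displacement of the inliers) is done once for the best hypothesis rather than inside the loop.

```python
import random

def ransac_translation(matches, threshold=3.0, n_iters=100, seed=0):
    """matches: list of ((x, y), (x', y')). Returns ((tx, ty), inliers)."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iters):
        # 1-2. Seed group of one match gives a translation hypothesis.
        (x, y), (xp, yp) = rng.choice(matches)
        tx, ty = xp - x, yp - y
        # 3. Inliers: matches whose predicted position is within the threshold.
        inliers = [
            (p, q) for p, q in matches
            if (p[0] + tx - q[0]) ** 2 + (p[1] + ty - q[1]) ** 2 <= threshold ** 2
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # 4. Least-squares translation over the inliers is their mean displacement.
    n = len(best_inliers)
    tx = sum(q[0] - p[0] for p, q in best_inliers) / n
    ty = sum(q[1] - p[1] for p, q in best_inliers) / n
    return (tx, ty), best_inliers

# Eight matches that agree on a shift of (10, -5), plus four outliers.
points = [(float(7 * i % 50), float(13 * i % 40)) for i in range(8)]
good = [(p, (p[0] + 10.0, p[1] - 5.0)) for p in points]
bad = [((1.0, 1.0), (40.0, 40.0)), ((5.0, 9.0), (0.0, 30.0)),
       ((20.0, 3.0), (2.0, 2.0)), ((8.0, 8.0), (60.0, -20.0))]
t_est, inliers = ransac_translation(good + bad)
```

Models with more parameters need larger seed groups: three matches for an affine transformation and four for a homography, as counted earlier in the lecture.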

Page 41: 776 Computer Vision

RANSAC example: Translation

Putative matches

Page 42: 776 Computer Vision

RANSAC example: Translation

Select one match, count inliers

Page 44: 776 Computer Vision

RANSAC example: Translation

Select translation with the most inliers

Page 45: 776 Computer Vision

Scalability: Alignment to large databases

• What if we need to align a test image with thousands or millions of images in a model database?
  o Efficient putative match generation
    • Approximate descriptor similarity search, inverted indices

[Figure: test image queried against the model database]

Page 46: 776 Computer Vision

Scalability: Alignment to large databases

• What if we need to align a test image with thousands or millions of images in a model database?
  o Efficient putative match generation
    • Fast nearest neighbor search, inverted indices

D. Nistér and H. Stewénius, Scalable Recognition with a Vocabulary Tree, CVPR 2006

[Figure: test image, database, vocabulary tree with inverted index]
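A toy version of the vocabulary tree with an inverted index is sketched below. Everything in it is a simplifying assumption: the tree has two levels with hand-picked centers over 2D "descriptors", whereas the cited paper builds the tree by hierarchical k-means on real descriptors, and images are ranked here by raw vote counts rather than a weighted score.

```python
import math
from collections import Counter, defaultdict

# Hypothetical two-level tree: each node is (center, children); leaves have no children.
TREE = [
    ((0.0, 0.0), [((-1.0, 0.0), []), ((1.0, 0.0), [])]),
    ((10.0, 10.0), [((9.0, 10.0), []), ((11.0, 10.0), [])]),
]

def quantize(desc, nodes=TREE, path=()):
    """Descend the tree by nearest center; the leaf path is the visual word."""
    idx = min(range(len(nodes)), key=lambda k: math.dist(desc, nodes[k][0]))
    _, children = nodes[idx]
    path = path + (idx,)
    return quantize(desc, children, path) if children else path

def build_inverted_index(database):
    """Map each visual word to the images (and counts) in which it occurs."""
    index = defaultdict(Counter)
    for image_id, descriptors in database.items():
        for d in descriptors:
            index[quantize(d)][image_id] += 1
    return index

def query(index, descriptors):
    """Vote for every database image that shares a visual word with the query."""
    votes = Counter()
    for d in descriptors:
        for image_id, count in index.get(quantize(d), {}).items():
            votes[image_id] += count
    return votes.most_common()

database = {
    "img_a": [(-1.2, 0.1), (0.9, -0.2)],
    "img_b": [(9.1, 10.0), (11.3, 9.8), (10.9, 10.1)],
}
index = build_inverted_index(database)
ranked = query(index, [(11.0, 10.2), (8.8, 9.9)])
```

The query touches only the index entries for its own visual words, so lookup cost depends on the number of query features and the length of those entries rather than on a descriptor-by-descriptor comparison with every database image.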

Page 47: 776 Computer Vision

Descriptor space

Slide credit: D. Nistér

Page 48: 776 Computer Vision

Hierarchical partitioning of descriptor space (vocabulary tree)

Slide credit: D. Nistér

Page 49: 776 Computer Vision

Vocabulary tree/inverted index

Slide credit: D. Nistér

Page 50: 776 Computer Vision

Populating the vocabulary tree/inverted index

Slide credit: D. Nistér

Model images

Page 54: 776 Computer Vision

Looking up a test image

Slide credit: D. Nistér

[Figure: test image, model images]

Page 55: 776 Computer Vision

Voting for geometric transformations

• Modeling phase: For each model feature, record 2D location, scale, and orientation of model (relative to normalized feature coordinate frame)

[Figure: model, index]

Page 56: 776 Computer Vision

Voting for geometric transformations

• Test phase: Each match between a test and model feature votes in a 4D Hough space (location, scale, orientation) with coarse bins

• Hypotheses receiving at least a minimum number of votes can be subjected to more detailed geometric verification

[Figure: model, test image, index]
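The voting scheme can be sketched as follows. Each feature is assumed to be a tuple (x, y, scale, orientation), and each match predicts the pose of the model in the test image as a similarity transform. The function names, bin sizes, and vote threshold are illustrative assumptions, and the sketch does not handle clusters that straddle a bin boundary.

```python
import math
from collections import defaultdict

def hough_votes(matches, loc_bin=20.0, angle_bin=math.radians(30)):
    """matches: list of (model_feature, test_feature). Returns bin -> match ids."""
    bins = defaultdict(list)
    for k, (model, test) in enumerate(matches):
        mx, my, m_scale, m_ori = model
        tx, ty, t_scale, t_ori = test
        s = t_scale / m_scale                        # relative scale
        theta = (t_ori - m_ori) % (2 * math.pi)      # relative orientation
        c, sn = math.cos(theta), math.sin(theta)
        # Predicted position of the model origin in the test image.
        ox = tx - s * (c * mx - sn * my)
        oy = ty - s * (sn * mx + c * my)
        key = (math.floor(ox / loc_bin), math.floor(oy / loc_bin),
               round(math.log2(s)), math.floor(theta / angle_bin))
        bins[key].append(k)
    return bins

def hypotheses(bins, min_votes=3):
    """Keep bins with enough votes for detailed geometric verification."""
    return {key: ids for key, ids in bins.items() if len(ids) >= min_votes}

def transform_feature(f, s, theta, t):
    """Helper for the example: move a model feature by a similarity transform."""
    x, y, scale, ori = f
    c, sn = math.cos(theta), math.sin(theta)
    return (s * (c * x - sn * y) + t[0], s * (sn * x + c * y) + t[1],
            scale * s, ori + theta)

model_feats = [(0.0, 0.0, 1.0, 0.0), (10.0, 0.0, 1.5, 0.3),
               (0.0, 10.0, 2.0, 1.0), (7.0, 4.0, 1.0, 2.0)]
consistent = [(f, transform_feature(f, 2.0, math.pi / 4, (105.0, 55.0)))
              for f in model_feats]
spurious = [((10.0, 10.0, 1.0, 0.0), (300.0, 300.0, 1.0, 3.0)),
            ((5.0, 5.0, 2.0, 1.0), (0.0, 0.0, 1.0, 1.0))]
found = hypotheses(hough_votes(consistent + spurious))
```

The four consistent matches land in a single 4D bin, while each spurious match votes alone, so only one hypothesis is passed on to verification.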