Lecture 9: Feature Extraction and Motion Estimation
Slides by: Michael Black, Clark F. Olson, Jean Ponce



Rover Localization With Descent Imagery


Motion

Rather than using two cameras, we can extract information about the environment by moving a single camera.

Some motion problems are similar to stereo:
- correspondence
- reconstruction

New problem: motion estimation

Sometimes another problem is also present:
- Segmentation: which image regions correspond to rigidly moving objects?

Structure From Motion

Given m pictures of n points, can we recover:
- the three-dimensional configuration of these points? (structure)
- the camera configurations? (motion)

Some textbooks treat motion largely from the perspective of small camera motions. We will not be so limited!

[Figure: a 3D point Xj projects to image points x1j, x2j, x3j in cameras P1, P2, P3.]

Structure From Motion

Several questions must be answered:

- What image points should be matched? (feature selection)
- What are the correct matches between the images? (feature tracking; unlike stereo, there is no epipolar constraint)
- Given the matches, what is the camera motion?
- Given the matches, where are the points?

Simplifying assumption: the scene is static; objects don't move relative to each other.

Feature Selection

We could track all image pixels, but this requires excessive computation.

We want to select features that are easy to find in other images.

Edges are easy to localize in the direction perpendicular to the edge, but not along it: the aperture problem!

Corner points (with gradients in multiple directions) can be precisely located.

Corner Detection

We should be able to recognize the point easily by looking through a small window: shifting the window in any direction should give a large change in intensity.
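The window-shifting test can be made concrete with a few lines of NumPy. This is a minimal sketch (the image, window size, and shifts are illustrative): at a corner, every shift changes the window contents, while in a flat region no shift does.

```python
import numpy as np

def shift_energy(img, x, y, u, v, half=7):
    """Sum of squared differences between the window centered at (x, y)
    and the same window shifted by (u, v)."""
    w0 = img[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    w1 = img[y + v - half:y + v + half + 1, x + u - half:x + u + half + 1].astype(float)
    return float(np.sum((w1 - w0) ** 2))

# Synthetic 50x50 image whose lower-right quadrant is bright,
# giving a corner at (25, 25) and a flat region around (35, 35).
img = np.zeros((50, 50))
img[25:, 25:] = 1.0

shifts = [(3, 0), (0, 3), (3, 3)]
corner = [shift_energy(img, 25, 25, u, v) for u, v in shifts]
flat = [shift_energy(img, 35, 35, u, v) for u, v in shifts]
```

Every shift at the corner produces a large change, while all shifts in the flat region produce none; an edge point would change for some shifts but not others.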


Intensity change under window shifts:
- edge: no change along the edge direction
- corner: significant change in all directions
- flat region: no change in any direction
(Source: A. Efros)

Basic idea for corner detection: find image patches with gradients in multiple directions.

[Figure: input image and the corners selected.]

Corner Detection

Harris uses a 2×2 matrix M of image derivatives, averaged in a neighborhood of each point:

M = Σ_{x,y} w(x,y) [ Ix²   IxIy ;  IxIy   Iy² ]

Classification of image points using the eigenvalues λ1, λ2 of M:
- λ1 and λ2 are large, λ1 ≈ λ2: corner (E increases in all directions)
- λ1 >> λ2 or λ2 >> λ1: edge
- λ1 and λ2 are small: flat region (E is almost constant in all directions)

Harris Corner Detector
1. Compute the M matrix for each image window to get its cornerness score.
2. Find points whose surrounding window gave a large corner response.
3. Take the points of local maxima, i.e., perform non-maximum suppression.
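A minimal sketch of the Harris response in NumPy follows. The gradient operator, the 5×5 box average (standing in for the Gaussian weighting w), and the constant k are simplifying assumptions; it uses the common cornerness score R = det(M) - k·(trace M)².

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris cornerness R = det(M) - k * (trace M)^2 at every pixel.
    M is built from finite-difference gradients with a 5x5 box average
    standing in for the Gaussian weighting w(x, y)."""
    img = img.astype(float)
    Iy, Ix = np.gradient(img)          # np.gradient returns d/drow, d/dcol
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def box(a, r=2):
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return out / (2 * r + 1) ** 2

    Sxx, Syy, Sxy = box(Ixx), box(Iyy), box(Ixy)
    return Sxx * Syy - Sxy * Sxy - k * (Sxx + Syy) ** 2

# Bright square: its corners give positive R, its edges negative R,
# and flat regions give R = 0.
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
R = harris_response(img)
```

The sign of R reflects the eigenvalue classification above: both eigenvalues large gives R > 0 (corner), one dominant eigenvalue gives R < 0 (edge), and both small gives R ≈ 0 (flat).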

[Figures: Harris corner detector pipeline on example images: input images, cornerness scores, thresholded scores, local maxima, corners output.]

Harris Detector Properties

Rotation invariant? Yes.

Scale invariant? No: at a fine scale, all points along a large corner are classified as edges; only at a coarser scale is it detected as a corner.

Automatic Scale Selection

Intuition: find the scale that gives a local maximum of some function f in both position and scale.

Choosing a Detector

What do you want it for?
- Precise localization in x-y: Harris
- Good localization in scale: Difference of Gaussians (DoG)
- Flexible region shape: MSER

The best choice is often application dependent:
- Harris-/Hessian-Laplace and DoG work well for many natural categories.
- MSER works well for buildings and printed things.

Why choose? Get more points by using more detectors.

There have been extensive evaluations and comparisons [Mikolajczyk et al., IJCV 2005; PAMI 2005]; all of the detectors/descriptors shown here work well.

Feature Tracking

Determining the corresponding features is similar to stereo vision.

Problem: the epipolar lines are unknown, so the matching point could be anywhere in the image.

If the motion between images is small, we can search only in a small neighborhood. Otherwise, a large search space is necessary; coarse-to-fine search is used to reduce computation time.

Feature Tracking

Challenges:
- Figuring out which features can be tracked.
- Efficiently tracking across frames.
- Some points may change appearance over time (e.g., due to rotation or moving into shadows).
- Drift: small errors can accumulate as the appearance model is updated.
- Points may appear or disappear: we need to be able to add and delete tracked points.
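The small-neighborhood search can be sketched as an exhaustive SSD (sum of squared differences) scan; the window size and search radius here are illustrative, and a coarse-to-fine version would run this on an image pyramid instead.

```python
import numpy as np

def track_feature(img0, img1, x, y, half=5, radius=8):
    """Locate the feature at (x, y) in img0 by exhaustive SSD search
    over a (2*radius+1)^2 neighborhood of (x, y) in img1."""
    tmpl = img0[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    best, best_xy = np.inf, (x, y)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = img1[y + dy - half:y + dy + half + 1,
                        x + dx - half:x + dx + half + 1].astype(float)
            ssd = float(np.sum((cand - tmpl) ** 2))
            if ssd < best:
                best, best_xy = ssd, (x + dx, y + dy)
    return best_xy

# A feature in img0 reappears in img1 shifted by (dx, dy) = (3, 2).
rng = np.random.default_rng(0)
img0 = rng.uniform(size=(40, 40))
img1 = np.roll(img0, (2, 3), axis=(0, 1))   # shift rows by 2, cols by 3
```

The cost of the scan grows quadratically with the search radius, which is why coarse-to-fine search matters when the inter-frame motion can be large.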

Feature Matching

Example:

The set of vectors from each image location to the corresponding location in the subsequent image is called a motion field.

Feature Matching

Example:

If the camera motion is a pure translation, the motion vectors all pass through a single image point: the focus of expansion.
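The focus of expansion can be estimated by intersecting the motion-field lines in a least-squares sense. This is a sketch under idealized assumptions (noise-free vectors, pure translation); the function name is illustrative.

```python
import numpy as np

def focus_of_expansion(points, vectors):
    """Least-squares intersection of the lines p_i + t * v_i through
    each feature point p_i along its motion vector v_i."""
    p = np.asarray(points, dtype=float)
    v = np.asarray(vectors, dtype=float)
    # Each line gives one linear equation:
    #   v_y * e_x - v_x * e_y = v_y * p_x - v_x * p_y
    A = np.stack([v[:, 1], -v[:, 0]], axis=1)
    b = v[:, 1] * p[:, 0] - v[:, 0] * p[:, 1]
    e, *_ = np.linalg.lstsq(A, b, rcond=None)
    return e

# Motion vectors that all radiate from a focus of expansion at (50, 40).
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 100.0, size=(10, 2))
vecs = pts - np.array([50.0, 40.0])
foe = focus_of_expansion(pts, vecs)
```

With noisy vectors the same system is simply solved in the least-squares sense, which is why the linear formulation is convenient.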

Ambiguity

The relative position between the cameras has six degrees of freedom (six parameters):
- translation in x, y, z
- rotation about x, y, z

Problem: the images look exactly the same if everything is scaled by a constant factor. For example:
- cameras twice as far apart
- scene twice as big and twice as far away

We can therefore only recover 5 parameters: the scale can't be determined unless it is known in advance (scale ambiguity).
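The scale ambiguity is easy to verify numerically: under a pinhole model, doubling both the scene and the camera baseline leaves every image measurement unchanged. A minimal sketch (unit focal length, translation-only cameras, made-up points):

```python
import numpy as np

def project(X, t):
    """Pinhole projection (focal length 1) of 3D points X, with the
    camera translated to t and looking along +z."""
    Xc = X - t
    return Xc[:, :2] / Xc[:, 2:3]

rng = np.random.default_rng(1)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(6, 3))
t = np.array([0.2, 0.0, 0.1])        # second camera position

# Original scene vs. scene and baseline both scaled by 2:
pair_a = project(X, np.zeros(3)), project(X, t)
pair_b = project(2 * X, np.zeros(3)), project(2 * X, 2 * t)
```

Both image pairs are identical, so no image-based method can recover the absolute scale.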

Structure From Motion

Given a set of corresponding points in two or more images, compute the camera parameters and the 3D point coordinates.

[Figure: cameras 1-3 with unknown poses R1,t1, R2,t2, R3,t3 observing unknown 3D points. Slide credit: Noah Snavely.]

Solving for Structure and Motion

Total number of unknowns:
- 5 camera motion parameters
- n point depths (where n is the number of points matched)

Total number of equations:
- 2n (each point match gives one constraint on the row and one on the column)

We can (in principle) solve for the unknowns if 2n ≥ 5 + n, i.e., n ≥ 5.

Usually, many more matches than necessary are used; this improves performance with respect to noise.

Solving for Structure and Motion

Once the motion is known, dense matching is possible using the epipolar constraint.

Multiple Images

If there are more than two images, similar ideas apply:
- Perform matching between all images.
- Use the constraints given by the matches to estimate structure and motion.

For m images and n points, we have:
- 6(m-1) - 1 + n unknowns = 6m - 7 + n
- 2(m-1)n constraints = 2mn - 2n

We can (in principle) solve when n is at least (6m-7)/(2m-3).

Bundle Adjustment

A non-linear method for refining structure and motion by minimizing the reprojection error.
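The counting argument above reduces to a one-line helper; the function name is illustrative. For m = 2 it reproduces the two-view result (5 points), and for m ≥ 3 it gives 4 points.

```python
from math import ceil

def min_points(m):
    """Smallest n with 2*(m-1)*n >= 6*(m-1) - 1 + n,
    i.e. n >= (6m - 7) / (2m - 3), per the counting argument."""
    return ceil((6 * m - 7) / (2 * m - 3))
```

In practice far more points are used, since redundancy is what makes the estimate robust to noise.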

E(P, X) = Σ_i Σ_j d(x_ij, P_i X_j)²

where x_ij is the observed projection of point X_j in image i, and P_i X_j is its predicted projection under camera P_i.

Stereo Ego-motion

One application of structure from motion is to determine the path of a robot by examining the images that it takes.
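As a concrete reference for the bundle-adjustment objective above, a minimal sketch of evaluating the total reprojection error follows; the 3×4 camera matrices and 3D points here are made up for illustration.

```python
import numpy as np

def reprojection_error(Ps, Xs, obs):
    """Total squared reprojection error sum_ij ||x_ij - proj(P_i X_j)||^2.
    Ps: list of 3x4 camera matrices; Xs: (n, 4) homogeneous 3D points;
    obs[i]: (n, 2) observed positions of the points in image i."""
    err = 0.0
    for P, x_obs in zip(Ps, obs):
        proj = (P @ Xs.T).T                    # (n, 3) homogeneous projections
        proj = proj[:, :2] / proj[:, 2:3]      # perspective divide
        err += float(np.sum((proj - x_obs) ** 2))
    return err

# Two cameras (identity pose and a unit baseline in x) and three points.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
Xs = np.array([[0.0, 0.0, 5.0, 1.0],
               [1.0, -1.0, 6.0, 1.0],
               [2.0, 1.0, 4.0, 1.0]])

def proj_of(P):
    h = (P @ Xs.T).T
    return h[:, :2] / h[:, 2:3]

obs = [proj_of(P1), proj_of(P2)]   # noise-free observations
```

Bundle adjustment minimizes this quantity over all P_i and X_j simultaneously, typically with a non-linear least-squares solver such as Levenberg-Marquardt.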

The use of stereo provides several advantages:
- The scale is known, since we can compute scene depths.
- There is more information for matching points (depth).

Stereo Ego-motion

Stereo ego-motion loop:
1. Feature selection in the first stereo pair.
2. Stereo matching in the first stereo pair.
3. Feature tracking into the second stereo pair.
4. Stereo matching in the second stereo pair.
5. Motion estimation using the 3D feature positions.
6. Repeat with new images until done.
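The motion-estimation step, given matched 3D feature positions from the two stereo pairs, is a rigid alignment problem. One standard way to solve it (not necessarily the method used in these slides) is the SVD-based Kabsch/Procrustes solution, sketched here:

```python
import numpy as np

def rigid_motion(A, B):
    """Least-squares rigid motion (R, t) with B ~ A @ R.T + t, from matched
    3D point sets A, B of shape (n, 3) (Kabsch / orthogonal Procrustes)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca
    return R, t

# Recover a known motion: rotation about z by 0.3 rad plus a translation.
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, 3.0])

rng = np.random.default_rng(2)
A = rng.uniform(-1.0, 1.0, size=(8, 3))
B = A @ R_true.T + t_true
R_est, t_est = rigid_motion(A, B)
```

Because the stereo depths fix the scale, this 3D-to-3D alignment avoids the scale ambiguity of monocular structure from motion.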

[Figure: ego-motion steps: features selected; features matched in the right image; features tracked in the left image; features tracked in the right image.]

Stereo Ego-motion

[Figure: the Urbie rover; odometry track vs. actual track (GPS) vs. estimated track.]

Advanced Feature Matching

[Figure: left image, right image, and the left image after affine optimization.]