From 2D to 3D: Monocular Vision

With application to robotics/AR

Motivation

How many sensors do we really need?

Motivation

● What is the limit of what can be inferred from a single embodied (moving) camera frame?

Aim

● AR with a hand-held camera
● Visual tracking provides registration
● Track without a prior model of the world
● Challenges:
  ● Speed
  ● Accuracy
  ● Robustness
  ● Interaction with the real world

Existing attempts: SLAM

● Simultaneous Localization and Mapping
● Well-established in robotics (using a rich array of sensors)
● Demonstrated with a single hand-held camera by Davison (2003)

Model-based tracking vs SLAM

Model-based tracking vs SLAM

● Model-based tracking is:
  ● More robust
  ● More accurate
● Why?
  ● Is SLAM fundamentally harder?

Pinhole camera model

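$$(X, Y, Z) \mapsto (fX/Z,\; fY/Z)$$

$$\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} fX \\ fY \\ Z \end{pmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}, \qquad \mathbf{x} = P\mathbf{X}$$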

Pinhole camera model

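$$\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} fX + Zp_x \\ fY + Zp_y \\ Z \end{pmatrix} = \begin{bmatrix} f & 0 & p_x & 0 \\ 0 & f & p_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

$$K = \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix} \text{ (calibration matrix)}, \qquad P = K\,[\,I \mid 0\,]$$

Principal point: $(p_x, p_y)$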

Camera rotation and translation

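In non-homogeneous coordinates:

$$\tilde{X}_{\text{cam}} = R\,(\tilde{X} - \tilde{C})$$

In homogeneous coordinates:

$$X_{\text{cam}} = \begin{bmatrix} R & -R\tilde{C} \\ 0 & 1 \end{bmatrix} \begin{pmatrix} \tilde{X} \\ 1 \end{pmatrix} = \begin{bmatrix} R & -R\tilde{C} \\ 0 & 1 \end{bmatrix} X$$

$$x = K\,[\,I \mid 0\,]\,X_{\text{cam}} = K\,[\,R \mid -R\tilde{C}\,]\,X, \qquad P = K\,[\,R \mid t\,], \qquad t = -R\tilde{C}$$

Note: the camera center C (in homogeneous coordinates) lies in the null space of the camera projection matrix, i.e. PC = 0.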
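As a quick numerical check of these formulas, the sketch below builds $P = K[R \mid t]$ with $t = -R\tilde{C}$, projects a point on the optical axis, and confirms that the camera center maps to zero. It is a minimal NumPy sketch; the focal length, principal point, rotation and center are arbitrary assumed values, not taken from the slides.

```python
import numpy as np

def projection_matrix(K, R, C):
    """P = K [R | t] with t = -R C, where C is the camera center in world coordinates."""
    t = -R @ C
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Project a non-homogeneous 3D point X to pixel coordinates."""
    x = P @ np.append(X, 1.0)      # homogeneous image point
    return x[:2] / x[2]            # perspective division

# Assumed example values (not from the slides)
f, px, py = 500.0, 320.0, 240.0
K = np.array([[f, 0.0, px], [0.0, f, py], [0.0, 0.0, 1.0]])
theta = np.deg2rad(10.0)           # rotation about the y-axis
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
C = np.array([0.5, 0.0, -2.0])

P = projection_matrix(K, R, C)
# A point 3 units along the optical axis (third row of R) projects to the principal point
on_axis = C + 3.0 * R[2]
print(project(P, on_axis))                     # approximately (px, py)
print(np.allclose(P @ np.append(C, 1.0), 0))   # True: PC = 0
```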

Triangulation

• Given projections of a 3D point in two or more images (with known camera matrices), find the coordinates of the point

[Figure: camera centers O1 and O2 observe image points x1 and x2; the 3D point X is unknown]
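One standard way to do this is linear (DLT) triangulation: each view contributes two linear equations in the homogeneous point X, and the solution is the null vector of the stacked system. A minimal NumPy sketch; the two normalized cameras (K = I) and the test point are assumed values for illustration.

```python
import numpy as np

def triangulate_dlt(Ps, xs):
    """Linear (DLT) triangulation of one point from two or more views.

    Ps: list of 3x4 camera matrices; xs: list of matching (u, v) image points.
    """
    rows = []
    for P, (u, v) in zip(Ps, xs):
        # x ~ P X gives u (p3 . X) - p1 . X = 0 and v (p3 . X) - p2 . X = 0
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.array(rows))
    X = Vt[-1]                 # null vector of A (smallest singular value)
    return X[:3] / X[3]        # back to non-homogeneous coordinates

# Two assumed normalized cameras: O1 at the origin, O2 one unit along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 4.0])
x1 = P1 @ np.append(X_true, 1.0)
x2 = P2 @ np.append(X_true, 1.0)
X_est = triangulate_dlt([P1, P2], [x1[:2] / x1[2], x2[:2] / x2[2]])
print(X_est)   # recovers X_true up to floating-point error
```

With noisy measurements the rays no longer intersect, and the DLT result is usually refined by minimizing reprojection error.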

Structure from Motion (SfM)

• Given: m images of n fixed 3D points, $x_{ij} = P_i X_j$, $i = 1, \dots, m$, $j = 1, \dots, n$

• Problem: estimate m projection matrices $P_i$ and n 3D points $X_j$ from the mn correspondences $x_{ij}$

[Figure: point $X_j$ observed by cameras $P_1$, $P_2$, $P_3$ at image points $x_{1j}$, $x_{2j}$, $x_{3j}$]

SfM ambiguity

• If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor of 1/k, the projections of the scene points in the image remain exactly the same:

$$\mathbf{x} = P\mathbf{X} = \left(\tfrac{1}{k}P\right)(k\mathbf{X})$$

It is impossible to recover the absolute scale of the scene!
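The claim is easy to check numerically. In Euclidean terms, scaling the scene by k means multiplying every 3D point and every camera translation by k; the sketch below (arbitrary assumed intrinsics, pose and points) shows that the pixel coordinates do not change, so the two reconstructions cannot be told apart from images alone.

```python
import numpy as np

rng = np.random.default_rng(0)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.3, -0.1, 0.5])
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(5, 3))   # points in front of the camera

def project(K, R, t, X):
    """Pixel coordinates of the rows of X for the camera x = K (R X + t)."""
    x = (K @ (R @ X.T + t[:, None])).T
    return x[:, :2] / x[:, 2:]

k = 3.7
original = project(K, R, t, X)
scaled = project(K, R, k * t, k * X)   # scene and camera translation scaled together
print(np.allclose(original, scaled))   # True
```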

Structure from Motion (SfM)

• Given: m images of n fixed 3D points, $x_{ij} = P_i X_j$, $i = 1, \dots, m$, $j = 1, \dots, n$

• Problem: estimate m projection matrices $P_i$ and n 3D points $X_j$ from the mn correspondences $x_{ij}$

• With no calibration info, cameras and points can only be recovered up to a 4x4 projective transformation Q:

$$X \to QX, \qquad P \to PQ^{-1}$$

• We can solve for structure and motion when $2mn \ge 11m + 3n - 15$ (each correspondence gives 2 equations; each camera has 11 degrees of freedom, each point 3, and the projective ambiguity Q accounts for 15)

• For two cameras, at least 7 points are needed
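The counting argument (2 equations per correspondence, 11 unknowns per uncalibrated camera, 3 per point, minus the 15 degrees of freedom of Q) gives the condition $2mn \ge 11m + 3n - 15$. The hypothetical helper below solves it for the smallest integer n; this is only a necessary count, not a guarantee that a particular configuration is solvable.

```python
import math

def min_points(m):
    """Smallest n satisfying 2mn >= 11m + 3n - 15 for m >= 2 uncalibrated views."""
    # Rearranged: n * (2m - 3) >= 11m - 15
    return math.ceil((11 * m - 15) / (2 * m - 3))

for m in (2, 3, 4):
    print(m, "views ->", min_points(m), "points")
# 2 views -> 7 points
# 3 views -> 6 points
# 4 views -> 6 points
```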

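Bundle Adjustment

• Non-linear method for refining structure and motion (Levenberg-Marquardt)

• Minimizes the re-projection error:

$$E(P, X) = \sum_{i=1}^{m} \sum_{j=1}^{n} D\left(x_{ij}, P_i X_j\right)^2$$

[Figure: point $X_j$ seen by cameras $P_1$, $P_2$, $P_3$; observed image points $x_{1j}$, $x_{2j}$, $x_{3j}$ versus reprojections $P_1X_j$, $P_2X_j$, $P_3X_j$]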
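To make the cost function concrete, here is a toy bundle adjustment using SciPy's `least_squares` with `method="lm"` (Levenberg-Marquardt). To keep it short it assumes identity rotations, unit focal length and noise-free observations. It refines only camera translations and point positions, and it fixes the first camera to remove the translation gauge freedom; the overall scale stays free, consistent with the SfM scale ambiguity. A real system would also parameterize rotations and exploit sparsity.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
n_pts, n_cams = 10, 3
true_pts = rng.uniform([-1, -1, 4], [1, 1, 8], size=(n_pts, 3))
true_t = np.array([[0.0, 0.0, 0.0], [-0.5, 0.0, 0.0], [0.0, -0.5, 0.1]])

def project(t, pts):
    """Pinhole projection with R = I and K = I."""
    cam = pts + t
    return cam[:, :2] / cam[:, 2:3]

observations = [project(t, true_pts) for t in true_t]     # x_ij

def residuals(params):
    """Stacked reprojection errors P_i(X_j) - x_ij over all cameras and points."""
    t = np.vstack([np.zeros(3), params[:3 * (n_cams - 1)].reshape(n_cams - 1, 3)])
    pts = params[3 * (n_cams - 1):].reshape(n_pts, 3)
    return np.concatenate([(project(t[i], pts) - observations[i]).ravel()
                           for i in range(n_cams)])

x0 = np.concatenate([true_t[1:].ravel(), true_pts.ravel()])
x0 = x0 + rng.normal(scale=0.05, size=x0.shape)            # perturbed initial guess

result = least_squares(residuals, x0, method="lm")
rms_before = np.sqrt(np.mean(residuals(x0) ** 2))
rms_after = np.sqrt(np.mean(result.fun ** 2))
print(f"RMS reprojection error: {rms_before:.2e} -> {rms_after:.2e}")
```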

Self-calibration

• Self-calibration (auto-calibration) is the process of determining intrinsic camera parameters directly from uncalibrated images

• For example, when the images are acquired by a single moving camera, we can use the constraint that the intrinsic parameter matrix remains fixed for all the images

• Compute an initial projective reconstruction and find a 3D projective transformation matrix Q such that all camera matrices are in the form $P_i = K\,[R_i \mid t_i]$

• Can use constraints on the form of the calibration matrix: zero skew

Why is this cool?

http://www.youtube.com/watch?v=sQegEro5Bfo

Why is this still cool?

http://www.youtube.com/watch?v=p16frKJLVi0

The SLAM Problem

• Simultaneous Localization And Mapping
• A robot is exploring an unknown, static environment
• Given:
  • The robot's controls
  • Observations of nearby features
• Estimate:
  • Map of features
  • Path of the robot

Structure of the landmark-based SLAM Problem

SLAM: a hard problem?

SLAM: robot path and map are both unknown

Robot path error correlates errors in the map

SLAM: a hard problem?

[Figure: robot pose uncertainty]

• In the real world, the mapping between observations and landmarks is unknown

• Picking wrong data associations can have catastrophic consequences

• Pose error correlates data associations

SLAM

● Full SLAM: estimates the entire path and map!

$$p(x_{1:t}, m \mid z_{1:t}, u_{1:t})$$

● Online SLAM: estimates the most recent pose and map!

$$p(x_t, m \mid z_{1:t}, u_{1:t}) = \int \int \cdots \int p(x_{1:t}, m \mid z_{1:t}, u_{1:t}) \, dx_1 \, dx_2 \cdots dx_{t-1}$$

Integrations are typically done one at a time.
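Doing the integrations one at a time can be illustrated on a deliberately tiny, hypothetical discrete example: three possible poses, a known map (so m is dropped), and made-up motion and measurement probabilities, with sums in place of integrals. Marginalizing $x_1, \dots, x_{t-1}$ out of the full path posterior gives the same result as summing out one pose per time step, which is what a recursive (Bayes-filter style) online estimator does.

```python
import itertools
import numpy as np

# Hypothetical toy model: 3 discrete poses, known map (so m is dropped).
T = np.array([[0.8, 0.2, 0.0],     # T[a, b] = p(x_k = b | x_{k-1} = a, u_k)
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
prior = np.full(3, 1.0 / 3.0)      # p(x_1)
lik = np.array([[0.9, 0.1, 0.1],   # lik[k, s] = p(z_k | x_k = s)
                [0.2, 0.7, 0.1],
                [0.1, 0.3, 0.6]])
t = lik.shape[0]

# Full posterior over every path x_{1:t}, by brute-force enumeration
full = {}
for path in itertools.product(range(3), repeat=t):
    p = prior[path[0]] * lik[0, path[0]]
    for k in range(1, t):
        p *= T[path[k - 1], path[k]] * lik[k, path[k]]
    full[path] = p
norm = sum(full.values())

# Online posterior: sum x_1 ... x_{t-1} out of the full posterior
online_batch = np.zeros(3)
for path, p in full.items():
    online_batch[path[-1]] += p / norm

# Same result, summing out one pose at a time (recursive filter)
belief = prior * lik[0]
belief /= belief.sum()
for k in range(1, t):
    belief = (belief @ T) * lik[k]   # predict: sum out x_{k-1}; correct: weight by p(z_k | x_k)
    belief /= belief.sum()

print(np.allclose(online_batch, belief))   # True
```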

Graphical Model of Full SLAM

$$p(x_{1:t}, m \mid z_{1:t}, u_{1:t})$$

Graphical Model of Online SLAM

$$p(x_t, m \mid z_{1:t}, u_{1:t}) = \int \int \cdots \int p(x_{1:t}, m \mid z_{1:t}, u_{1:t}) \, dx_1 \, dx_2 \cdots dx_{t-1}$$

Scan Matching

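$$\hat{x}_t = \arg\max_{x_t} \left\{ p\left(z_t \mid x_t, \hat{m}^{[t-1]}\right) \cdot p\left(x_t \mid u_{t-1}, \hat{x}_{t-1}\right) \right\}$$

where $p(z_t \mid x_t, \hat{m}^{[t-1]})$ scores the current measurement against the map constructed so far, and $p(x_t \mid u_{t-1}, \hat{x}_{t-1})$ models the robot motion.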

● Maximize the likelihood of the i-th pose and map relative to the (i-1)-th pose and map

● Compute the map using “mapping with known poses”, based on the estimated poses and the observations
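A minimal sketch of this argmax under strong simplifying assumptions: a 2D translation-only pose, a point map, a Gaussian measurement model on the distance from each scan point to its nearest map point, a Gaussian motion model centered on the odometry prediction, and brute-force grid search as the optimizer. All function names, noise levels and data are hypothetical; practical scan matchers also estimate rotation and use likelihood fields or correlative search.

```python
import numpy as np

def log_measurement(pose, scan, map_pts, sigma=0.1):
    """log p(z_t | x_t, m): Gaussian on nearest-map-point distances (assumed model)."""
    world = scan + pose                       # translation-only pose (assumption)
    d = np.linalg.norm(world[:, None, :] - map_pts[None, :, :], axis=2).min(axis=1)
    return -0.5 * np.sum(d ** 2) / sigma ** 2

def log_motion(pose, prev_pose, odom, sigma=0.5):
    """log p(x_t | u_{t-1}, x_{t-1}): Gaussian around the odometry prediction."""
    return -0.5 * np.sum((pose - (prev_pose + odom)) ** 2) / sigma ** 2

def scan_match(scan, map_pts, prev_pose, odom, radius=0.5, step=0.02):
    """Grid-search argmax of measurement likelihood times motion prior."""
    centre = prev_pose + odom
    offsets = np.arange(-radius, radius + step / 2, step)
    best, best_score = centre, -np.inf
    for dx in offsets:
        for dy in offsets:
            cand = centre + np.array([dx, dy])
            score = log_measurement(cand, scan, map_pts) + log_motion(cand, prev_pose, odom)
            if score > best_score:
                best, best_score = cand, score
    return best

rng = np.random.default_rng(2)
map_pts = rng.uniform(0, 5, size=(40, 2))         # map constructed so far
true_pose = np.array([1.0, 2.0])
scan = map_pts[:25] - true_pose                   # noise-free scan in the robot frame
estimate = scan_match(scan, map_pts, prev_pose=np.array([0.8, 1.9]), odom=np.array([0.1, 0.2]))
print(estimate)   # close to [1.0, 2.0]
```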

SLAM approach

PTAM (Parallel Tracking and Mapping) approach

Tracking & Mapping threads

Mapping thread

Stereo Initialization

● 5-point pose algorithm (Stewénius et al. '06)
● Requires a pair of frames and feature correspondences
● Provides an initial (sparse) 3D point cloud
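The five-point algorithm solves a polynomial system and is too long to sketch here. As a plainly swapped-in, simpler stand-in, the example below estimates the essential matrix with the linear eight-point algorithm on normalized (calibrated) image coordinates and checks it against $E = [t]_\times R$ built from an assumed synthetic motion. Decomposing E into R and t and triangulating the matches would then give the initial sparse point cloud.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])

def essential_eight_point(x1, x2):
    """Linear estimate of E with x2^T E x1 = 0 from >= 8 normalized homogeneous points."""
    A = np.array([np.kron(b, a) for a, b in zip(x1, x2)])   # row: vec(x2 x1^T)
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    # Enforce the essential-matrix structure: singular values (s, s, 0)
    U, s, Vt = np.linalg.svd(E)
    s_mean = (s[0] + s[1]) / 2.0
    return U @ np.diag([s_mean, s_mean, 0.0]) @ Vt

# Assumed synthetic motion and scene
rng = np.random.default_rng(4)
ang = np.deg2rad(5.0)
R = np.array([[np.cos(ang), 0.0, np.sin(ang)], [0.0, 1.0, 0.0], [-np.sin(ang), 0.0, np.cos(ang)]])
t = np.array([1.0, 0.0, 0.2])
X1 = rng.uniform([-1, -1, 4], [1, 1, 8], size=(12, 3))    # points in camera-1 frame
X2 = X1 @ R.T + t                                          # same points in camera-2 frame
x1 = X1 / X1[:, 2:3]                                       # normalized homogeneous coordinates
x2 = X2 / X2[:, 2:3]

E_est = essential_eight_point(x1, x2)
E_true = skew(t) @ R
# E is only defined up to scale and sign
E_est /= np.linalg.norm(E_est)
E_true /= np.linalg.norm(E_true)
err = min(np.linalg.norm(E_est - E_true), np.linalg.norm(E_est + E_true))
print(err)   # close to zero
```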

Wait for new keyframe

● Keyframes are only added if:
  ● There is a sufficient baseline to the other keyframes
  ● Tracking quality is good
● When a keyframe is added:
  ● The mapping thread stops whatever it is doing
  ● All points in the map are measured in the keyframe
  ● New map points are found and added to the map

Add new map points

● Want as many map points as possible
● Check all maximal FAST corners in the keyframe:
  ● Check Shi-Tomasi score
  ● Check if already in map
● Epipolar search in a neighboring keyframe
● Triangulate matches and add to map
● Repeat in four image pyramid levels
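The Shi-Tomasi score is the smaller eigenvalue of the 2x2 structure tensor summed over a window around the corner; it is large only when the image varies strongly in two directions. A minimal NumPy sketch, assuming central-difference gradients and an unweighted 7x7 window (the window and weighting PTAM actually uses may differ):

```python
import numpy as np

def shi_tomasi_score(img, r, c, half=3):
    """Smaller eigenvalue of the structure tensor over a (2*half+1)^2 window at (r, c)."""
    gy, gx = np.gradient(img.astype(float))      # derivatives along rows and columns
    win = (slice(r - half, r + half + 1), slice(c - half, c + half + 1))
    Ixx = np.sum(gx[win] ** 2)
    Iyy = np.sum(gy[win] ** 2)
    Ixy = np.sum(gx[win] * gy[win])
    M = np.array([[Ixx, Ixy], [Ixy, Iyy]])
    return np.linalg.eigvalsh(M)[0]              # eigenvalues come back in ascending order

img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0                          # bright square on a dark background
corner = shi_tomasi_score(img, 10, 10)
edge = shi_tomasi_score(img, 20, 10)
flat = shi_tomasi_score(img, 20, 20)
print(corner, edge, flat)   # only the corner gets a clearly positive score
```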

Optimize map

● Use batch SfM method: Bundle Adjustment*
● Adjusts map point positions and keyframe poses
● Minimizes re-projection error of all points in all keyframes (or uses only the last N keyframes)
● Cubic complexity with keyframes, linear with map points
● Compatible with M-estimators (we use Tukey)

Map maintenance

● When the camera is not exploring, the mapping thread has idle time: use it to improve the map

● Data association in bundle adjustment is reversible

● Re-attempt outlier measurements
● Try to measure new map features in all old keyframes

Tracking thread

Pre-process frame

● Make mono and RGB versions of the image
● Make 4 pyramid levels
● Detect FAST corners
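A minimal sketch of the first two pre-processing steps, assuming a plain channel average for the mono image and 2x2 box-filter downsampling between levels (the filter PTAM actually uses may differ). FAST detection on each level is left to a feature-detection library and is not shown.

```python
import numpy as np

def build_pyramid(img, levels=4):
    """Halve the resolution at each level by averaging 2x2 pixel blocks."""
    pyramid = [img.astype(float)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = (prev.shape[0] // 2) * 2, (prev.shape[1] // 2) * 2   # drop an odd last row/column
        p = prev[:h, :w]
        pyramid.append(0.25 * (p[0::2, 0::2] + p[1::2, 0::2] + p[0::2, 1::2] + p[1::2, 1::2]))
    return pyramid

rgb = np.random.default_rng(3).uniform(0, 255, size=(480, 640, 3))   # stand-in frame
mono = rgb.mean(axis=2)                                              # simple grayscale version
levels = build_pyramid(mono)
print([lvl.shape for lvl in levels])   # [(480, 640), (240, 320), (120, 160), (60, 80)]
```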

Project Points

● Use motion model to update camera pose
● Project all map points into image to see which are visible, and at what pyramid level
● Choose subset to measure:
  ● ~50 biggest features for coarse stage
  ● 1000 randomly selected for fine stage

Measure Points

● Generate 8x8 matching template (warped from source keyframe)

● Search a fixed radius around projected position
● Use zero-mean SSD
● Only search at FAST corner points

● Up to 10 inverse compositional iterations for subpixel position (for some patches)

● Typically find 60-70% of patches
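A sketch of the patch search, assuming the template has already been warped (here it is simply cut from the same image with a brightness offset added, which zero-mean SSD ignores) and that FAST corner positions are given as (row, column) pairs. Function names and example values are hypothetical, and the sub-pixel refinement step is omitted.

```python
import numpy as np

def zmssd(a, b):
    """Zero-mean SSD: subtract each patch's mean before summing squared differences."""
    return float(np.sum(((a - a.mean()) - (b - b.mean())) ** 2))

def search_patch(img, template, predicted, corners, radius):
    """Best-matching corner within `radius` pixels of `predicted` (row, col)."""
    half = template.shape[0] // 2
    best, best_score = None, np.inf
    for r, c in corners:
        if (r - predicted[0]) ** 2 + (c - predicted[1]) ** 2 > radius ** 2:
            continue                                  # outside the fixed search radius
        patch = img[r - half:r + half, c - half:c + half]
        if patch.shape != template.shape:
            continue                                  # patch would cross the image border
        score = zmssd(patch, template)
        if score < best_score:
            best, best_score = (r, c), score
    return best, best_score

rng = np.random.default_rng(5)
img = rng.uniform(0, 255, size=(100, 100))
template = img[46:54, 56:64] + 20.0           # same 8x8 patch, brighter
corners = [(50, 60), (52, 58), (20, 20)]      # hypothetical FAST corners
best, score = search_patch(img, template, predicted=(51, 59), corners=corners, radius=5)
print(best, score)   # (50, 60) with a score near 0
```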

Update camera pose

● 6-DOF problem
● 10 iterations
● Tukey M-estimator to minimize a robust objective function of the re-projection error

where $e_j$ is the re-projection error vector
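The robust objective itself is not reproduced in the text, so the sketch below only illustrates the general idea of a Tukey M-estimator solved by iteratively reweighted least squares (IRLS). To stay short it estimates a 2D image shift rather than a 6-DOF pose; the tuning constant 4.685 is the commonly used Tukey value, and the median-based scale estimate and synthetic data are rough assumptions.

```python
import numpy as np

def tukey_weight(r, c=4.685):
    """IRLS weight of Tukey's biweight: (1 - (r/c)^2)^2 for |r| < c, else 0."""
    return np.where(np.abs(r) < c, (1.0 - (r / c) ** 2) ** 2, 0.0)

def robust_shift(observed, predicted, iters=10):
    """2D shift s minimizing a Tukey objective of e_j = observed_j - (predicted_j + s)."""
    shift = np.median(observed - predicted, axis=0)          # robust initial guess
    for _ in range(iters):
        e = observed - (predicted + shift)                     # residual vectors e_j
        norms = np.linalg.norm(e, axis=1)
        sigma = 1.4826 * np.median(norms) + 1e-12              # rough robust scale
        w = tukey_weight(norms / sigma)
        shift = shift + (w[:, None] * e).sum(axis=0) / w.sum() # weighted least-squares update
    return shift

rng = np.random.default_rng(6)
predicted = rng.uniform(0, 640, size=(50, 2))
true_shift = np.array([3.0, -2.0])
observed = predicted + true_shift + rng.normal(scale=0.5, size=(50, 2))
observed[:10] += 50.0                                          # 20% gross outliers
shift_est = robust_shift(observed, predicted)
print(shift_est)   # close to [3, -2]; the outliers receive zero weight
```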

Bundle-adjustment

● Global bundle-adjustment

● Local bundle-adjustment
  ● X - The newest 5 keyframes in the keyframe chain
  ● Z - All of the map points visible in any of these keyframes
  ● Y - Keyframes for which a measurement of any point in Z has been made

● That is, local bundle-adjustment optimizes the pose of the most recent keyframe and its closest neighbors, and all of the map points seen by these, using all of the measurements ever made of these points.
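The three sets can be computed directly from a record of which keyframe measured which map point. A minimal sketch with hypothetical names and data: X gets optimized poses, Z gets optimized point positions, and Y contributes the remaining measurements of those points. Here Y is assumed to exclude the keyframes already in X, and its poses are assumed to stay fixed during the optimization.

```python
def local_ba_sets(keyframes, observations, n_recent=5):
    """Split keyframes and map points for local bundle adjustment.

    keyframes:    keyframe ids in insertion order (oldest first)
    observations: dict mapping keyframe id -> set of map point ids measured in it
    Returns (X, Z, Y) as sets.
    """
    X = set(keyframes[-n_recent:])                        # newest keyframes: poses optimized
    Z = set().union(*(observations[k] for k in X))        # points visible in any keyframe of X
    Y = {k for k in keyframes                             # other keyframes measuring a point of Z
         if k not in X and observations[k] & Z}
    return X, Z, Y

# Hypothetical keyframe chain with the map points each keyframe measured
observations = {0: {1, 2}, 1: {2, 3}, 2: {10}, 3: {3, 4}, 4: {4, 5},
                5: {5, 6}, 6: {6, 7}, 7: {7, 8}}
X, Z, Y = local_ba_sets(list(observations), observations)
print(sorted(X), sorted(Z), sorted(Y))   # [3, 4, 5, 6, 7] [3, 4, 5, 6, 7, 8] [1]
```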

Video

http://www.youtube.com/watch?v=Y9HMn6bd-v8

http://www.youtube.com/watch?v=pBI5HwitBX4

Capabilities

Capabilities

[Figures: "Multi-scale Compactly Supported Basis Functions"; "Bundle-adjusted point cloud with PTAM"]

Video

http://www.youtube.com/watch?v=CZiSK7OMANw

RGB-D Sensor

● Principle: structured light
● IR projector + IR camera
● RGB camera
● Dense depth images
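Each depth pixel can be lifted to a 3D point by inverting the pinhole model from the earlier slides. A minimal sketch, assuming depth in meters, zero meaning "no reading", and illustrative intrinsics (fx = fy = 525, cx = 319.5, cy = 239.5 are assumed values, not calibration data for any particular sensor):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image into camera-frame 3D points.

    Inverts u = fx X / Z + cx and v = fy Y / Z + cy; pixels with depth 0 are dropped.
    """
    v, u = np.indices(depth.shape)          # row (v) and column (u) index grids
    Z = depth.astype(float)
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    pts = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]

depth = np.full((480, 640), 2.0)            # synthetic flat wall 2 m away
depth[0, 0] = 0.0                           # one missing reading
cloud = depth_to_points(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(cloud.shape)   # (307199, 3)
```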

Kinect-based mapping

System Overview

● Frame-to-frame alignment
● Global optimization (SBA for loop closure)

Feature matching

RANSAC

● Feature correspondences are established; outliers are robustly removed
● The transformation between the two keyframes can now be estimated
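Because the RGB-D sensor gives each matched feature a 3D position, the transformation between two frames can be estimated as a rigid motion (R, t) from 3D-3D correspondences. The sketch below pairs RANSAC over minimal 3-point samples with the SVD-based (Kabsch) least-squares fit. The iteration count, inlier threshold, synthetic data and function names are assumptions for illustration.

```python
import numpy as np

def rigid_transform(A, B):
    """Least-squares R, t with B ~ A R^T + t (SVD / Kabsch method, no scale)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)                       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # rule out reflections
    R = Vt.T @ D @ U.T
    return R, cb - R @ ca

def ransac_rigid(A, B, iters=200, thresh=0.02, seed=0):
    """RANSAC over minimal 3-point samples, then a refit on all inliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(A), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(A), size=3, replace=False)
        R, t = rigid_transform(A[idx], B[idx])
        inliers = np.linalg.norm(A @ R.T + t - B, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    R, t = rigid_transform(A[best], B[best])
    return R, t, best

rng = np.random.default_rng(7)
A = rng.uniform(-1, 1, size=(100, 3))               # 3D features in frame 1
ang = np.deg2rad(30.0)
R_true = np.array([[np.cos(ang), -np.sin(ang), 0.0], [np.sin(ang), np.cos(ang), 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
B = A @ R_true.T + t_true                           # matched features in frame 2
B[75:] += rng.uniform(0.5, 1.0, size=(25, 3))       # corrupt 25 matches (outliers)
R, t, inliers = ransac_rigid(A, B)
print(inliers.sum())   # 75
```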

Global Optimization (RGBD-ICP)

Benefits

● Visual and depth information used jointly for real-time mapping applications
● Reconstruct a dense map of the environment
● Avoid dense stereo for every pair of keyframes
● Optimize over a sparse set of feature points
● Results in dramatic speed improvements
● Leaves computation free for other valuable algorithms to run simultaneously (e.g. navigation, obstacle avoidance, scene understanding)

Video

http://www.cs.washington.edu/ai/Mobile_Robotics/projects/rgbd-3d-mapping/

Kinect + Real-time reconstruction

Video

http://research.microsoft.com/apps/video/dl.aspx?id=152815

Conclusion

● So much information is available from a single camera
● We have yet to truly understand what we can infer from a single camera
● Several exciting technologies in the recent past
● A software problem, not a hardware limitation
● Monocular vision can be sufficient for many use cases

Thanks!