AIMS Computer Vision
Lecture 4.1: Reconstruction
HT 2017
Andrea Vedaldi
For slides and up-to-date information: http://www.robots.ox.ac.uk/~vedaldi/teach.html
AIMS Computer Vision
1. Matching, indexing, and search
2. Object category detection
3. Visual geometry 1/2: Camera models and triangulation
4. Visual geometry 2/2: Reconstruction from multiple views
5. Segmentation, tracking, and depth sensors
Outline
Introduction
Computing H or F from point matches
Feature detection and matching
RANSAC
Determining the ego-motion from F
Structure and motion from more than two views
Introduction
In the previous lectures we have seen stereo reconstruction from two views:
1. Obtain (somehow) the camera parameters $P = K[I\,|\,0]$ and $P' = K'[R\,|\,t]$.
2. Compute the fundamental matrix $F = K'^{-\top}[t]_\times R K^{-1}$.
3. Match points $x$ in the first image to corresponding points $x'$ in the second along the epipolar lines $l' = Fx$.
4. Triangulation: compute the 3D points $X$ from $x$, $x'$, $P$, $P'$.
Next: what happens if
1. you do not know the camera parameters?
2. you have more than two images?
You get Structure from Motion
[Carl Olsson]
The Structure from Motion (SFM) problem
Given two or more images of a scene, taken by cameras $C$ and $C'$, compute (i) the camera motion and (ii) the scene structure.
Assumptions:
- Known intrinsic calibration $K$, $K'$.
- Unknown extrinsic calibration $R$, $t$ (egomotion).
Prototypical SFM pipeline
1. Match corner points to find point correspondences. This is harder than before as the epipolar geometry is unavailable.
2. Compute the egomotion $R$, $t$:
   - For planar scenes:
     - Compute the homography matrix $H$ (e.g. the four-point algorithm seen in B14);
     - Extract the egomotion from $H$.
   - For general 3D scenes:
     - Compute the fundamental matrix $F$ (e.g. the eight-point algorithm);
     - Extract the egomotion from $F$.
3. Triangulate as before to obtain the 3D points.
Corner Points computed for each frame
Start from two views, then extract some corner points in each, for example using the Harris detector.
[Figures: the two input views; the corners detected in each view]
Egomotion from corner points
Egomotion = transformation between the cameras.
[Figure: cameras $C$ and $C'$ with image points $x$ and $x'$; moving $C'$ by $(R, t)$ makes the two rays intersect at the 3D point $X$]
- Given point correspondences $x_i \leftrightarrow x'_i$ for $i = 1, \dots, n$, we want to determine $R$ and $t$.
- Intuition: keep $C$ still, and move $C'$ until all rays intersect.
- Obviously three correspondences are not enough to fix $C'$. How many do we need?
Outline of egomotion computation
1. Compute the fundamental matrix $F$ from the correspondences $x_i \leftrightarrow x'_i$.
2. Decompose $F = K'^{-\top}[t]_\times R K^{-1}$ to find $R$, $t$ (given the known $K$ and $K'$).
3. Compute the projection matrices $P$ and $P'$ if needed.
Actually, t is found only up to scale
- $F$ is a homogeneous matrix, so for any $\lambda \neq 0$
  $$F \propto K'^{-\top}[t]_\times R K^{-1} \propto K'^{-\top}[\lambda t]_\times R K^{-1}.$$
- Therefore the translation and all lengths are recovered only up to scale:
  $$t \equiv \lambda t, \qquad X_i \equiv \lambda X_i.$$
Depth/scale ambiguity
We cannot distinguish:
- a large translation when viewing a large distant scene; from
- a small translation when viewing a small nearby scene.
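The ambiguity can be checked numerically. The following is a minimal sketch, assuming NumPy; the camera values, the points and the helper name `project` are illustrative and not taken from the lecture. Scaling both the translation and the scene by the same factor leaves every image point unchanged.

```python
import numpy as np

def project(K, R, t, X):
    """Project (n, 3) points X with the camera P = K [R | t]; returns (n, 2) pixels."""
    x = (X @ R.T + t) @ K.T
    return x[:, :2] / x[:, 2:]

# Illustrative values (assumptions, not from the lecture).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.2, 0.1])
X = np.array([[0.0, 0.0, 5.0], [1.0, -1.0, 6.0], [-0.5, 0.5, 4.0]])

lam = 3.0  # any non-zero scale
I, zero = np.eye(3), np.zeros(3)
# First camera P = K [I | 0]: scaling the scene does not move the image points.
same_first = np.allclose(project(K, I, zero, X), project(K, I, zero, lam * X))
# Second camera P' = K' [R | t]: scaling t and X together does not either.
same_second = np.allclose(project(K, R, t, X), project(K, R, lam * t, lam * X))
print(same_first, same_second)  # True True
```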
Question: How might you resolve the depth/scale ambiguity?
How many correspondences are required?
- Because of the depth/scale ambiguity, the rotation (3 DoF) can be determined completely, but only the translation direction (2 DoF) is recoverable.
- This allows us to evaluate the number of correspondences needed:
  1. For $n$ scene points there are $3n$ unknowns.
  2. Between 2 views there are $5 = 3$ (rotation) $+\ 2$ (translation direction) unknowns.
  3. Each correspondence yields 4 measurements.
  4. Hence $4n \geq 3n + 5$, and $n \geq 5$ correspondences are needed.
- For $n < 7$ the solutions are non-linear, so we'll see solutions for $n = 7$ and $n = 8$.
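Computing the fundamental matrix for n ≥ 8
- Task: given $n$ correspondences $x_i \leftrightarrow x'_i$, compute $F$ such that
  $$\forall i: \quad x_i'^{\top} F x_i = 0.$$
- Solution: each correspondence generates one constraint
  $$\begin{bmatrix} x'_i & y'_i & 1 \end{bmatrix}
  \begin{bmatrix} f_1 & f_2 & f_3 \\ f_4 & f_5 & f_6 \\ f_7 & f_8 & f_9 \end{bmatrix}
  \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix} = 0$$
  which can be written as
  $$x'_i x_i f_1 + x'_i y_i f_2 + x'_i f_3 + y'_i x_i f_4 + y'_i y_i f_5 + y'_i f_6 + x_i f_7 + y_i f_8 + f_9 = 0$$
  or
  $$\begin{bmatrix} x'_i x_i & x'_i y_i & x'_i & y'_i x_i & y'_i y_i & y'_i & x_i & y_i & 1 \end{bmatrix}
  \begin{bmatrix} f_1 \\ \vdots \\ f_9 \end{bmatrix} = 0.$$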
Computing the fundamental matrix /ctd
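- For $n$ correspondences build up the $n \times 9$ system
  $$A_{n \times 9}\, f =
  \begin{bmatrix}
  x'_1 x_1 & x'_1 y_1 & x'_1 & y'_1 x_1 & y'_1 y_1 & y'_1 & x_1 & y_1 & 1 \\
  \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
  x'_n x_n & x'_n y_n & x'_n & y'_n x_n & y'_n y_n & y'_n & x_n & y_n & 1
  \end{bmatrix}
  \begin{bmatrix} f_1 \\ \vdots \\ f_9 \end{bmatrix} = 0.$$
- For $n = 8$ points, $f$ can be found as the null space of $A$, and so $f$ and $F$ are determined up to scale (as expected).
- Since the points are noisy, in general one wants to use $n > 8$. This can be done using least squares.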
A least squares version of the 8-point algorithm
Due to noise, there will not be an exact solution to $Af = 0$ ($A$ has full rank).
Least squares formulation
Find the unit vector $f$ that minimizes the norm of the residual $r = Af$:
$$f^* = \operatorname*{argmin}_{f:\ \|f\| = 1} \|Af\|^2$$
Solution with eigenvalues
Compute the eigendecomposition of the matrix $M = A^\top A$ and set $f$ to the (unit) eigenvector $e_1$ corresponding to the smallest eigenvalue $\lambda_1$.
Solution with SVD
Compute the SVD of the matrix $A$ and set $f$ to the (unit) right singular vector $e_1$ corresponding to the smallest singular value $\sigma_1$.
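The SVD solution can be sketched in a few lines. This is a minimal illustration, assuming NumPy and matched points given as `(n, 2)` arrays `x` and `xp`; the function names `build_A` and `eight_point` are hypothetical. It deliberately omits the coordinate normalisation and rank-2 enforcement that practical implementations usually add. NumPy returns singular values in descending order, so the vector called $e_1$ in the slides is the last row of `Vt`.

```python
import numpy as np

def build_A(x, xp):
    """One row per correspondence x_i <-> x'_i; x and xp are (n, 2) arrays."""
    u, v = x[:, 0], x[:, 1]
    up, vp = xp[:, 0], xp[:, 1]
    return np.column_stack([up * u, up * v, up,
                            vp * u, vp * v, vp,
                            u, v, np.ones(len(x))])

def eight_point(x, xp):
    """Least-squares estimate of F (up to scale) from n >= 8 correspondences."""
    A = build_A(x, xp)
    _, _, Vt = np.linalg.svd(A)
    f = Vt[-1]              # unit right singular vector of the smallest singular value
    return f.reshape(3, 3)  # rows are (f1 f2 f3), (f4 f5 f6), (f7 f8 f9)
```

With noise-free correspondences the residuals $x_i'^{\top} F x_i$ of the returned matrix are zero up to rounding; with noisy data they are small but non-zero.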
Proof of the eigendecomposition solution
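- The sum of squared residuals $r = Af$ is
  $$\|r\|^2 = r^\top r = f^\top A^\top A f = f^\top M f.$$
- $M = A^\top A$ is a $9 \times 9$ symmetric real matrix (since $A$ is $n \times 9$); hence it can be decomposed as
  $$M = V \Lambda V^\top = V
  \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_9 \end{bmatrix}
  V^\top = \sum_{i=1}^{9} \lambda_i\, e_i e_i^\top$$
  where
  - $V = \begin{bmatrix} e_1 & \dots & e_9 \end{bmatrix}$ is the orthonormal matrix of eigenvectors;
  - the eigenvalues are sorted in non-decreasing order: $0 \leq \lambda_1 \leq \lambda_2 \leq \dots \leq \lambda_9$.
- The eigenvalues are non-negative because
  $$M e_i = \lambda_i e_i \;\Rightarrow\; e_i^\top M e_i = \lambda_i e_i^\top e_i \;\Rightarrow\; [A e_i]^\top [A e_i] = \lambda_i \geq 0.$$
  Then
  $$f^\top M f = \lambda_1 (f^\top e_1)^2 + \lambda_2 (f^\top e_2)^2 + \dots + \lambda_9 (f^\top e_9)^2.$$
- Since $\|f\| = 1$ and $V$ is orthonormal, the weights $(f^\top e_i)^2$ sum to 1, so this expression is a weighted average of the eigenvalues. It is minimised when $f = e_1$.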
Proof of the SVD solution
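- Any $m \times n$ matrix $A$ with $m \geq n$ can be decomposed as
  $$A_{m \times n} = U_{m \times n}\, \Sigma_{n \times n}\, V^\top_{n \times n},
  \qquad
  \Sigma = \begin{bmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_n \end{bmatrix}$$
  where $U$ is column-orthogonal, $V$ is fully orthogonal, and $\Sigma$ contains the singular values, ordered so that $0 \leq \sigma_1 \leq \sigma_2 \leq \dots \leq \sigma_n$.
- The singular vectors $V$ of $A$ are the same as the eigenvectors of $M = A^\top A$:
  $$M = A^\top A = V \Sigma^\top U^\top U \Sigma V^\top = V \Sigma^2 V^\top$$
- In particular $f = e_1$ is the first column of $V$.
- The SVD is usually preferred to the eigenvalue decomposition because it is numerically more stable: forming $A^\top A$ squares the condition number of the problem.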
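The equivalence of the two solutions can be checked on a random matrix. This is a small sketch assuming NumPy; the random `A` is only a stand-in for a noisy constraint matrix. Note the ordering conventions: `np.linalg.eigh` returns eigenvalues in ascending order, while `np.linalg.svd` returns singular values in descending order, so the slide's $e_1$ is the first eigenvector column but the last row of `Vt`.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 9))     # stand-in for a noisy n x 9 constraint matrix

lam, V = np.linalg.eigh(A.T @ A)     # eigenvalues in ascending order
_, sigma, Vt = np.linalg.svd(A)      # singular values in descending order

e1 = V[:, 0]                         # eigenvector of the smallest eigenvalue
v_min = Vt[-1]                       # right singular vector of the smallest singular value

# The two vectors agree up to sign, and lambda_1 = sigma_1^2.
same_direction = min(np.linalg.norm(e1 - v_min), np.linalg.norm(e1 + v_min)) < 1e-8
same_value = np.isclose(lam[0], sigma[-1] ** 2)
```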
Computing F from 7 points
- For the $7 \times 9$ set of equations $Af = 0$ we know that $f$ is in the null space of $A$.
- This null space is 2-dimensional and hence spanned by two vectors $f_1$ and $f_2$. Since $f$ is determined up to scale, all solutions are given by
  $$f = \alpha f_1 + (1 - \alpha) f_2.$$
- Reshaping the vectors results in a family of candidate fundamental matrices
  $$F = \alpha F_1 + (1 - \alpha) F_2.$$
- To find which one is a “proper” fundamental matrix, use the non-linear constraint $\det F = 0$. This gives a cubic equation in $\alpha$. Can you see why?
- The cubic has either one or three real solutions for $\alpha$.
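The 7-point method can be sketched as follows, assuming NumPy and matched points as `(7, 2)` arrays; the name `seven_point` is hypothetical. Each entry of $F$ is linear in $\alpha$ and the determinant is a degree-3 polynomial in the entries, which is why $\det F(\alpha)$ is a cubic. Rather than expanding it symbolically, this sketch recovers the four coefficients by evaluating the determinant at four values of $\alpha$.

```python
import numpy as np

def seven_point(x, xp):
    """x, xp: (7, 2) arrays of matched points. Returns 1 to 3 candidate 3x3 matrices."""
    u, v, up, vp = x[:, 0], x[:, 1], xp[:, 0], xp[:, 1]
    A = np.column_stack([up * u, up * v, up, vp * u, vp * v, vp, u, v, np.ones(7)])
    _, _, Vt = np.linalg.svd(A)                   # A is 7x9, so Vt is 9x9
    F1 = Vt[-1].reshape(3, 3)                     # the last two rows of Vt span
    F2 = Vt[-2].reshape(3, 3)                     # the 2-dimensional null space
    # det(alpha*F1 + (1-alpha)*F2) is a cubic in alpha: fit it exactly
    # through four sampled values, then find its roots.
    samples = np.array([-1.0, 0.0, 1.0, 2.0])
    dets = [np.linalg.det(s * F1 + (1 - s) * F2) for s in samples]
    roots = np.roots(np.polyfit(samples, dets, 3))
    alphas = roots[np.abs(roots.imag) < 1e-9].real  # keep the real solutions only
    return [a * F1 + (1 - a) * F2 for a in alphas]
```

One limitation of the parametrisation $\alpha F_1 + (1-\alpha) F_2$ is that it cannot represent a solution proportional to $F_1 - F_2$; the sketch ignores that degenerate case.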
A Visual Compass
If the motion of the camera is known to be a pure rotation, then the images are related by a homography
$$x' = H_\infty x, \qquad \text{where } H_\infty = K' R K^{-1}.$$
Algorithm:
- Find correspondences $x_i \leftrightarrow x'_i$.
- Compute $H_\infty$ from the correspondences (see B14).
- Extract $R$ to find relative rotations.
But of course we cannot recover any scene structure!
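The "extract $R$" step follows from inverting $H_\infty = K' R K^{-1}$. The sketch below assumes NumPy and known intrinsics; the name `rotation_from_H` is hypothetical. Because an estimated homography is only defined up to scale, the scale is removed using $\det R = 1$, and the result is snapped to the nearest rotation with an SVD to absorb estimation noise.

```python
import numpy as np

def rotation_from_H(H, K, Kp):
    """Recover R from H_inf = K' R K^{-1}, where H is known only up to scale."""
    M = np.linalg.inv(Kp) @ H @ K          # equals s * R for some unknown scale s
    M = M / np.cbrt(np.linalg.det(M))      # det(s R) = s^3, so this removes s (and its sign)
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt                          # closest orthogonal matrix to M
```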
A Visual Compass
Use $H_\infty$ to register images to a common reference frame to create a panoramic mosaic:
[Figure: panoramic mosaic]
Feature detection, matching and the F matrix
So far, we have not discussed matching. The reason is that computation of the fundamental matrix can be incorporated into the matching.
Outline:
- Extract image points as corners. Why corners?
- Obtain initial corner matches using local descriptors.
- Remove outliers and estimate the fundamental matrix $F$ using RANSAC.
- Obtain further corner matches using $F$.
Why corner points, especially?
- Why not use lines, or take a dense pixel-based approach?
- The key reason is that the search for matches is no longer 1D when the camera motion is unknown. A 2D region has to be searched.
- A dense approach is then likely to be too expensive, and matching sections of a line suffers from the aperture problem.
- Corners are:
  - relatively sparse;
  - reasonably cheap to compute;
  - well-localized;
  - detected quite robustly from frame to frame.
- Hence corners are good for matching.
Corner Points computed for each frame
Recall that points with distinctively high autocorrelation provide the best chance of deriving a distinctive cross-correlation signal.
[Figure: autocorrelation for image structure that is 2D (corner), 1D (edge) and uniform]
Initial matching
- Extract corners in both images (feature detection).
- For each corner $x$ in $C$, make a list of potential matches $x'$ in a region in $C'$ around $x$ (heuristic).
- Rank the matches by comparing the regions around the corners using cross-correlation.
- Sift them to reconcile forward-backward inconsistencies.
- The idea here is not to do too much work, just enough to get some good matches.
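The ranking and forward-backward steps can be sketched as below, assuming NumPy and that the patch around each corner has already been cut out as a small array; `ncc` and `mutual_matches` are hypothetical names. The sketch uses zero-mean normalised cross-correlation as the score (one common choice, not necessarily the one used in the lecture) and, for brevity, skips the restriction to a search region around $x$.

```python
import numpy as np

def ncc(p, q):
    """Zero-mean normalised cross-correlation of two equally sized patches."""
    p = p - p.mean()
    q = q - q.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(q)
    return float((p * q).sum() / denom) if denom > 0 else 0.0

def mutual_matches(patches1, patches2):
    """Keep (i, j) only if j is the best match for i AND i is the best match for j."""
    S = np.array([[ncc(p, q) for q in patches2] for p in patches1])
    fwd = S.argmax(axis=1)   # best candidate in image 2 for each corner of image 1
    bwd = S.argmax(axis=0)   # best candidate in image 1 for each corner of image 2
    return [(i, int(j)) for i, j in enumerate(fwd) if bwd[j] == i]
```

Matches that survive the mutual test are still not guaranteed to be correct, which is why the robust estimation step that follows is needed.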
Initial matching /ctd
[Figure: a source patch compared with candidate target patches (a), (b), (c), etc.]
Initial Matching /ctd
The result contains some good matches and some mismatches. $F$ can still be computed with around 50% mismatches. How?
RANSAC: RANdom SAmple Consensus
- Suppose you tried to fit a straight line to data containing outliers, i.e. points which are not properly described by the assumed probability distribution.
- The usual methods of least squares are hopelessly corrupted.
- Need to detect outliers and exclude them.
⇒ Use estimation based on robust statistics. RANSAC was the first, devised by the vision researchers Fischler & Bolles (1981).
RANSAC algorithm for lines
1. For many repeated trials:
   1.1 Select a random sample of two points.
   1.2 Fit a line through them.
   1.3 Count how many other points are within a threshold distance of the line (inliers).
2. Select the line with the largest number of inliers.
3. Refine the line by fitting it to all the inliers (using least squares).
Remarks:
- Sample a minimal set of points for your problem (2 for lines).
- Repeat enough times that there is a high chance that at least one minimal set contains only inliers (see tutorial sheet).
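The steps above can be sketched as follows, assuming NumPy and 2D points in an `(n, 2)` array; the names `ransac_line` and `num_trials` and the default parameter values are hypothetical. Two choices here are assumptions rather than the lecture's prescription: the final refinement uses total (orthogonal) least squares via an SVD, and the trial count uses the standard formula $N = \log(1-p) / \log(1 - w^s)$ for inlier fraction $w$, sample size $s$ and desired success probability $p$.

```python
import numpy as np

def num_trials(w, s, p=0.99):
    """Trials needed so that, with probability p, at least one sample is all inliers."""
    return int(np.ceil(np.log(1 - p) / np.log(1 - w ** s)))

def ransac_line(pts, n_trials=200, thresh=0.05, rng=None):
    """Returns (point on line, unit direction, boolean inlier mask)."""
    rng = np.random.default_rng() if rng is None else rng
    best = np.zeros(len(pts), dtype=bool)
    for _ in range(n_trials):
        i, j = rng.choice(len(pts), size=2, replace=False)   # minimal sample
        d = pts[j] - pts[i]
        norm = np.hypot(d[0], d[1])
        if norm == 0:
            continue                                         # degenerate sample
        normal = np.array([-d[1], d[0]]) / norm
        dist = np.abs((pts - pts[i]) @ normal)               # point-to-line distances
        inliers = dist < thresh
        if inliers.sum() > best.sum():
            best = inliers
    # Refinement on all inliers: the line passes through their centroid,
    # along the direction of largest variance.
    c = pts[best].mean(axis=0)
    _, _, Vt = np.linalg.svd(pts[best] - c)
    return c, Vt[0], best
```

The exponent $s$ explains why minimal samples matter: with half the data corrupt, the required number of trials grows quickly with the sample size.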
[Figure: data with about 50% outliers; a random two-point sample; two candidate lines with support ∼ 10 and support ∼ 50]
RANSAC algorithm for F
1. For many repeated trials:
   1.1 Select a random sample of seven correspondences.
   1.2 Compute $F$ using the cubic method.
   1.3 Count how many other correspondences are within a threshold distance of the epipolar lines (inliers).
2. Select the $F$ with the largest number of inliers.
3. Refine $F$ by fitting it to all the inliers (using the SVD method).
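The inlier test in step 1.3 needs the distance between a point $x'$ and its epipolar line $l' = Fx$. A minimal sketch, assuming NumPy and points as `(n, 2)` arrays; `epipolar_distance`, `count_inliers` and the default threshold are hypothetical. It measures the distance in the second image only; a symmetric version that also uses $F^\top x'$ in the first image is a common alternative.

```python
import numpy as np

def epipolar_distance(F, x, xp):
    """Distance from each x'_i to the epipolar line l'_i = F x_i."""
    xh = np.column_stack([x, np.ones(len(x))])
    xph = np.column_stack([xp, np.ones(len(xp))])
    l = xh @ F.T                                   # row i is the line F x_i = (a, b, c)
    return np.abs(np.sum(l * xph, axis=1)) / np.hypot(l[:, 0], l[:, 1])

def count_inliers(F, x, xp, thresh=1.0):
    """Number of correspondences within `thresh` of their epipolar line."""
    return int(np.sum(epipolar_distance(F, x, xp) < thresh))
```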
RANSAC algorithm for H
1. For many repeated trials:
   1.1 Select a random sample of four correspondences.
   1.2 Compute $H$ (as in B14).
   1.3 Count how many other correspondences are within a threshold distance of the predicted locations (inliers).
2. Select the $H$ with the largest number of inliers.
3. Refine $H$ by fitting it to all the inliers, optimizing the reprojection error
   $$\min_H \sum_{(x, x') \in \text{Inliers}} d^2(x', Hx) + d^2(H^{-1}x', x).$$
Correspondences consistent with epipolar geometry
[Figure: initial matches (left) and the inliers consistent with the epipolar geometry (right)]
Epipolar geometry
Computing R and t from F
Recall that $F = K'^{-\top}[t]_\times R K^{-1}$. We now show how to recover $R$ and $t$ from $F$ (given $K$ and $K'$).
1. Compute the essential matrix $E = [t]_\times R = K'^{\top} F K$.
2. Compute $t$ as the null vector of $E^\top$ (i.e. $E^\top t = 0$).
   - $t$ is determined up to a scaling factor $\mu$;
   - there are two solutions $\pm\mu t$.
3. Compute $R$ from $E$:
   - the algorithm for this step is given later;
   - it returns two solutions $R_1$ and $R_2$.
4. Overall, there are four solutions for the projection matrix:
   $$P' = K'[R_1\,|\,\mu t], \quad P' = K'[R_1\,|\,{-\mu t}], \quad P' = K'[R_2\,|\,\mu t], \quad P' = K'[R_2\,|\,{-\mu t}].$$
5. Exclude three of these using a visibility test.
The four solutions
The 3D point is in front of both cameras in only one case.
[Figure: the four configurations of $C$ and $C'$; the point is visible in both cameras in only one of them, and invisible to at least one camera in the other three]
Note these are “computer vision” cameras, so to be visible a ray must pass through the image on its way to the optic centre!
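Computing $R_{1,2}$ from the essential matrix E (non-examinable)
Recall that $E = [t]_\times R$; we now recover $R$ from $E$. Algorithm:
1. Compute the singular value decomposition (SVD) of $E$:
   $$E = U \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} V^\top.$$
2. Set
   $$W = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
3. The two solutions are:
   $$R_1 = U W V^\top, \qquad R_2 = U W^\top V^\top.$$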
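The decomposition and the visibility test can be put together in a short sketch, assuming NumPy and one correspondence in normalised image coordinates (i.e. with $K^{-1}$ and $K'^{-1}$ already applied); the names `decompose_E`, `depths` and `pick_solution` are hypothetical. The sign flips on `U` and `Vt` are an addition not shown in the slides: since $E$ is only defined up to sign, they make $R_1$ and $R_2$ proper rotations with determinant $+1$. The visibility test solves $\lambda_2 x' = \lambda_1 R x + t$ for the two depths and requires both to be positive.

```python
import numpy as np

def decompose_E(E):
    """Return the four candidate (R, t) pairs, with t a unit vector."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]                         # null vector of E^T (E^T t = 0)
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

def depths(R, t, x1, x2):
    """Least-squares depths (lam1, lam2) with lam2 * x2h = lam1 * R @ x1h + t."""
    x1h, x2h = np.append(x1, 1.0), np.append(x2, 1.0)
    M = np.column_stack([R @ x1h, -x2h])
    lam, *_ = np.linalg.lstsq(M, -t, rcond=None)
    return lam

def pick_solution(E, x1, x2):
    """Keep the (R, t) for which the point is in front of both cameras."""
    for R, t in decompose_E(E):
        lam1, lam2 = depths(R, t, x1, x2)
        if lam1 > 0 and lam2 > 0:
            return R, t
    return None
```

In practice the test would be run on several correspondences and the solution with the most points in front of both cameras kept, since a single noisy match can be misleading.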
Structure and motion for more than two views
- Why bother?
  1. Matching becomes more verifiable, as 3D point estimates are available to reproject.
  2. 3D point estimates improve as further views over a range of angles are obtained.
  3. There is no increase in the degree of ambiguity, though the overall scale ambiguity persists.
Notation for three+ views
- For three views let the cameras be $C$, $C'$, $C''$ with projection matrices $P$, $P'$ and $P''$, and with image points $x$, $x'$ and $x''$.
[Figure: a 3D point $X$ imaged at $x$, $x'$ and $x''$ in three cameras]
- For $m$ views, a point $X_j$ is imaged in the $i$-th camera $C^i$ at $x_j^i = P^i X_j$.
Point correspondence over 3 views
- Given the projection matrices and $x \leftrightarrow x'$, how is the point $x''$ found?
[Figure: the point $X$ seen at $x$ and $x'$; its position in the third image is unknown]
- Algorithm:
  1. Compute the 3D point from $x$ and $x'$.
  2. Then re-project using $P''$.
- The search in the third image is zero-dimensional, and the size of the search region depends only on uncertainty.
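The two-step transfer can be sketched as follows, assuming NumPy, known $3 \times 4$ projection matrices and pixel coordinates as length-2 arrays; `triangulate` and `transfer` are hypothetical names. The triangulation is the linear (DLT) method, used here as one simple option.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation; returns the homogeneous 3D point (4-vector)."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    return np.linalg.svd(A)[2][-1]      # null vector of the 4x4 system

def transfer(P1, P2, P3, x1, x2):
    """Predict x'' in the third view from the match x <-> x'."""
    x3 = P3 @ triangulate(P1, P2, x1, x2)
    return x3[:2] / x3[2]
```

With noisy matches the prediction is only approximate, which is where the uncertainty-sized search region in the third image comes from.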
Problem statement: structure and motion
- Given: $n$ matching image points $x_j^i$ over $m$ views.
- Find: the cameras $P^i$ and the 3D points $X_j$ such that $x_j^i \approx P^i X_j$ by finding
  $$\min_{P^i, X_j} \sum_{j=1}^{n} \sum_{i=1}^{m} d^2(x_j^i, P^i X_j).$$
- This is a serious minimization:
  - For each camera, 6 parameters.
  - For each 3D point, 3 parameters.
  - Total of $6m + 3n - 1$ ($-1$ for scale) parameters overall.
[Figure: a 3D point $X$ imaged at $x$, $x'$ and $x''$ in three cameras]
- For 50 frames and 1000 points, we have about $3.3 \times 10^3$ unknowns!
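The cost and the parameter count can be written down directly. This is a minimal sketch assuming NumPy and, as in the double sum above, that every point is observed in every view; `reprojection_cost` and `num_unknowns` are hypothetical names. It only evaluates the objective; the optimisation itself is not shown.

```python
import numpy as np

def reprojection_cost(Ps, Xs, obs):
    """Sum of squared reprojection errors.

    Ps: (m, 3, 4) cameras, Xs: (n, 3) points, obs: (m, n, 2) observed image points.
    """
    Xh = np.column_stack([Xs, np.ones(len(Xs))])   # (n, 4) homogeneous points
    proj = np.einsum('mij,nj->mni', Ps, Xh)        # (m, n, 3) projections P^i X_j
    pred = proj[..., :2] / proj[..., 2:]
    return float(np.sum((obs - pred) ** 2))

def num_unknowns(m, n):
    """6 per camera, 3 per point, minus 1 for the overall scale."""
    return 6 * m + 3 * n - 1

print(num_unknowns(50, 1000))  # 3299
```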
/ctd
The building block is computing correspondences $x_j^i \leftrightarrow x_j^{i+1}$, finding $F_i^{i+1}$ and then the matrices $P^i$, $P^{i+1}$.
Algorithm
1. Compute interest points in each image.
2. Compute matches between consecutive image pairs $i$, $i+1$.
3. Compute $F_i^{i+1}$. Recover $P^i$, $P^{i+1}$.
4. Compute scene points.
5. Extend correspondences over image triples.
6. Extend correspondences over all images.
7. Optimize over all $P^i$, $X_j$.
[Figure: a scene point $X_j$ imaged at $x_j^1, \dots, x_j^4$ by cameras $P^1, \dots, P^4$]
2d3’s Boujou system
Zisserman, Fitzgibbon, Torr, Beardsley
[Video: original sequence, followed by the augmented sequence]
Batch SFM
- Up to now: batch, offline processing of video sequences.
[Figure: all data blocks processed together]
- Post-production, 3D model reconstruction, etc.
Real-time, sequential SFM
- Real-time, sequential, fixed time budget (tens of milliseconds).
- Build and maintain a map, and localise w.r.t. the map.
[Figure: data blocks incorporated one at a time]
- Real-time robotics applications, but in simplified 2D environments, with specialised sensors, etc.
- Reliable, repeated measurement is crucial: it mitigates drift, giving repeatable accuracy.
Sequential structure from motion: visual SLAM
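- Represent the joint distribution over camera and feature positions using a single multivariate Gaussian:
  $$x = \begin{bmatrix} x_v \\ y_1 \\ y_2 \\ \vdots \end{bmatrix}, \qquad
  P = \begin{bmatrix}
  P_{xx} & P_{xy_1} & P_{xy_2} & \dots \\
  P_{y_1 x} & P_{y_1 y_1} & P_{y_1 y_2} & \dots \\
  P_{y_2 x} & P_{y_2 y_1} & P_{y_2 y_2} & \dots \\
  \vdots & \vdots & \vdots &
  \end{bmatrix}$$
- Use the Kalman filter (see C4B Mobile Robotics) predict → measure → update framework to propagate uncertainty and fuse measurement data.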
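The predict and update steps can be sketched for the linear case, assuming NumPy. The names `kf_predict`, `kf_update`, `Phi` (motion model), `Q` (process noise), `Hobs` (measurement model) and `Rnoise` (measurement noise) are all hypothetical, chosen to avoid clashing with the $F$, $H$ and $R$ used earlier. Camera projection is non-linear, so a real visual SLAM system would linearise these models at each step; that is outside this sketch.

```python
import numpy as np

def kf_predict(x, P, Phi, Q):
    """Propagate the state mean x and covariance P through a linear motion model."""
    return Phi @ x, Phi @ P @ Phi.T + Q

def kf_update(x, P, z, Hobs, Rnoise):
    """Fuse a measurement z = Hobs @ x + noise into the state estimate."""
    y = z - Hobs @ x                          # innovation
    S = Hobs @ P @ Hobs.T + Rnoise            # innovation covariance
    K = P @ Hobs.T @ np.linalg.inv(S)         # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ Hobs) @ P
    return x_new, P_new
```

Because $P$ holds the cross-covariances between the camera and every feature, one measurement of one feature updates the whole state, at a cost that grows with the square of the state size.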
Example: real-time, sequential structure from motion
[Figure: loop of state estimate → predict → measurement search over image patches → update]
Davison, Reid, Smith, Williams, Klein, et al.