Vision-based SLAM
Simon Lacroix
Robotics and AI group, LAAS/CNRS, Toulouse
With contributions from: Anthony Mallet, Il-Kyun Jung, Thomas Lemaire and Joan Sola
Benefits of vision for SLAM?
• Cameras: low cost, light and power-saving
• Perceive data:
  – In a volume
  – Very far
  – Very precisely
  (1024 x 1024 pixels, 60º x 60º FOV: 0.06º pixel resolution, i.e. 1.0 cm at 10.0 m)
• Stereovision: 2 cameras provide depth
• Images carry a vast amount of information
• A vast know-how exists in the computer vision community
• The way humans perceive depth
[Figures: stereo camera, stereo image pair, stereo images viewer]
0. A few words on stereovision
• Very popular in the early 20th century
• Anaglyphs
PolarizationRed/Blue
Principle of stereovision
In 2 dimensions (two linear cameras):
[Figure: two linear cameras separated by baseline b; a point seen at angle α_l in the left image and α_r in the right image yields disparity d]

Depth z of a point seen at angles α_l and α_r:
b = z (tan α_l + tan α_r), i.e. z = b / (tan α_l + tan α_r)
Principle of stereovision
In 3 dimensions (two standard matrix cameras):
1. Establish the geometry of the system (offline)
2. Establish matches between the two images, compute the disparity
3. From the matched disparities, compute the 3D coordinates
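Step 3 can be sketched as follows for rectified pin-hole cameras. The focal length, baseline and principal point values below are illustrative, not those of any system shown in these slides.

```python
import numpy as np

def triangulate(u_left, v, disparity, f, b, cu, cv):
    """Recover 3D coordinates from a rectified stereo match.
    f: focal length in pixels, b: baseline in metres,
    (cu, cv): principal point. All names are illustrative."""
    z = f * b / disparity          # depth from disparity
    x = (u_left - cu) * z / f      # lateral offset
    y = (v - cv) * z / f           # vertical offset
    return np.array([x, y, z])

# A pixel at the principal point with 10 px disparity,
# f = 500 px, b = 0.1 m
p = triangulate(320.0, 240.0, 10.0, 500.0, 0.1, 320.0, 240.0)
```

Note that the depth is inversely proportional to the disparity: small disparities (far points) are the most sensitive to matching errors.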
Geometry of stereovision
[Figure: a 3D point P projects to p_l and p_r through the optical centers O_l and O_r; all points P1, P2 on the ray of p_l project onto the same right epipolar line (p_r1, p_r2)]
Geometry of stereovision
[Figure: the optical centers O_l, O_r and a point P define the epipolar plane; the projections Q_l, Q_r of a point Q lie on conjugate epipolar lines]
Geometry of stereovision
[Figure: the relative pose between the two camera frames (O_l, O_r) defines the epipolar geometry]
Epipolar geometry
[Figure: epipoles and epipolar lines in the two images]
Stereo images rectification
Goal: transform the images so that epipolar lines are parallel
Interest: computational cost reduction of the matching process
Dense pixel-based stereovision
Problem: "For each pixel in the left image, find its corresponding pixel in the right image"
… 3 6 3 7 9 2 8 7 6 8 9 6 4 9 0 9 9 0 …
… 3 5 7 4 9 6 3 9 6 5 8 6 3 0 1 9 7 5 …
Left line
Right line
???
The matches are computed on windows
Several ways to compare windows: “SAD”, “SSD”, “ZNCC”, Hamming distance on census-transformed images…
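A minimal sketch of window-based matching with one of the listed scores (SAD), assuming rectified images so that the search is restricted to the same row. Real systems add sub-pixel interpolation and left/right consistency checks; the window size and disparity range below are illustrative.

```python
import numpy as np

def match_pixel_sad(left, right, row, col, half_win=2, max_disp=16):
    """Find the disparity of left[row, col] by scanning the same
    row of the right image and comparing square windows with the
    Sum of Absolute Differences (SAD)."""
    h = half_win
    ref = left[row - h:row + h + 1, col - h:col + h + 1].astype(float)
    best_d, best_cost = 0, float("inf")
    for d in range(0, max_disp + 1):
        c = col - d                      # candidate column in the right image
        if c - h < 0:
            break
        cand = right[row - h:row + h + 1, c - h:c + h + 1].astype(float)
        cost = np.abs(ref - cand).sum()  # SAD score: lower is better
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d

# Synthetic pair: the right image is the left one shifted by 3 pixels
left = np.tile(np.arange(40.0), (20, 1)) ** 2 % 37
right = np.roll(left, -3, axis=1)
d = match_pixel_sad(left, right, row=10, col=20)
```

SSD, ZNCC or a Hamming distance on census-transformed windows slot into the same loop by replacing the cost line.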
Dense pixel-based stereovision
[Figures: original image, disparity map, 3D image]
Outline
0. A few words on stereovision
0-bis. Visual odometry
1. Stereovision
2. Pixel selection
3. Pixel tracking
4. Motion estimation
(stereovision is applied to each new frame)
Visual odometry principle
Visual odometry
• Fairly good precision (up to 1% on 100 m trajectories)
• But:
  – Depends on odometry (to track pixels)
  – No error model available
Visual odometry
• Applied on the Mars Exploration Rovers
50 % slip
Outline
0. A few words on stereovision
0-bis. Visual odometry
1. Stereovision SLAM
What kind of landmarks?
Interest points = sharp peaks of the autocorrelation function
Harris detector (precise version [Schmid 98])
Auto-correlation matrix (s: scale of the detection):
M = G(s) ⊗ [ I_x²  I_x I_y ; I_x I_y  I_y² ]
Principal curvatures: defined by the two eigenvalues λ1, λ2 of the matrix
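A minimal sketch of the Harris response built from the auto-correlation matrix. It uses simple finite differences and a 3x3 box window instead of the Gaussian smoothing at scale s, and the standard det − k·trace² score rather than the exact variant of the slides.

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris corner response from the auto-correlation matrix
    M = [[<Ix^2>, <IxIy>], [<IxIy>, <Iy^2>]] averaged over a
    small window: high where both principal curvatures are large."""
    img = img.astype(float)
    Ix = np.gradient(img, axis=1)          # horizontal derivative
    Iy = np.gradient(img, axis=0)          # vertical derivative

    def box(a):                            # 3x3 box average
        return sum(np.roll(np.roll(a, i, 0), j, 1)
                   for i in (-1, 0, 1) for j in (-1, 0, 1)) / 9.0

    Ixx, Iyy, Ixy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)
    det = Ixx * Iyy - Ixy ** 2             # = lambda1 * lambda2
    trace = Ixx + Iyy                      # = lambda1 + lambda2
    return det - k * trace ** 2            # high at sharp corners

# The corner of a bright square scores higher than a point on its edge
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
```

Interest points are then the local maxima of R, i.e. the sharp peaks of the autocorrelation function.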
Landmarks : interest points
• Landmark matching
?
Interest points stability
Interest point repeatability:
Repeatability = Repeated points / Detected points
= 70% (7 repeated points out of 10 detected points)

Interest point similarity: resemblance measure of the two principal curvatures of repeated points:
S_p(x, x') = [min(λ1, λ1') / max(λ1, λ1')] × [min(λ2, λ2') / max(λ2, λ2')]
Maximum point similarity: 1
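The two measures translate directly into code; the similarity formula is as reconstructed from the slide (product of min/max ratios of the two principal curvatures), and the curvature values below are illustrative.

```python
def repeatability(n_repeated, n_detected):
    """Fraction of detected points found again in the second image."""
    return n_repeated / n_detected

def point_similarity(l1, l2, l1p, l2p):
    """Resemblance of the principal curvatures (lambda1, lambda2)
    of a repeated point: 1.0 means identical curvatures."""
    return (min(l1, l1p) / max(l1, l1p)) * (min(l2, l2p) / max(l2, l2p))

r = repeatability(7, 10)                    # the 70% example above
s = point_similarity(4.0, 2.0, 4.0, 2.0)    # identical curvatures
```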
Interest points stability
Repeatability and point similarity evaluation:
Evaluated with known artificial rotation and scale changes
Interest points matching
Principle: combine signal and geometric information to match groups of points [Jung ICCV 01]
[Figures: matches on consecutive images, under a large viewpoint change, and with small overlap]
Landmark matching results
1.5 scale change 3.0 scale change
Landmark matching results
Landmark matching results (cont'd)
[Figures: detected points, matched points; another example]
– Landmark detection
– Relative observations (measures): of the landmark positions, of the robot motions
– Observation associations
– Refinement of the landmark and robot positions

Stereovision SLAM:
– Landmark detection: vision (interest points)
– Landmark position observations: stereovision
– Robot motion observations: visual motion estimation
– Observation associations: interest points matching
– Refinement: extended Kalman filter
Dense stereovision actually not required
IP matching applied on stereo frames (even easier !)
Visual motion estimation
1. Stereovision
2. Interest point detection
3. Interest points matching
4. Stereovision
5. Motion estimation
– Landmark detection
– Relative observations (measures): of the landmark positions, of the robot motions
– Observation associations
– Refinement of the landmark and robot positions

Stereovision SLAM:
– Landmark detection: vision (interest points) OK
– Landmark position observations: stereovision OK
– Robot motion observations: visual motion estimation OK
– Observation associations: interest points matching OK
– Refinement: extended Kalman filter
Setting up the Kalman filter

• System state:
x(k) = [x_p, m_1, ..., m_N], with x_p = [φ, θ, ψ, t_x, t_y, t_z] and m_i = [x_i, y_i, z_i]

• System equation:
x(k+1) = f(x(k), u(k+1)) + v(k+1), v with covariance P_v(k)

• Observation equation:
z(k) = h(x(k)) + w(k), w with covariance P_w(k)

• Prediction: motion estimates
u(k+1) = (Δφ, Δθ, Δψ, Δt_x, Δt_y, Δt_z)

• Landmark "discovery": stereovision, m_i = [x_i, y_i, z_i]

• Observation: matching + stereovision, innovation
ν_i(k+1) = z_i(k+1) − ẑ_i(k+1|k)

• State covariance:
P(k) = [ P_pp(k)  P_pm(k) ; P_pm(k)^T  P_mm(k) ]

Need to estimate the errors
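The prediction/update cycle set up above can be sketched generically. The 1-D toy below (one robot coordinate plus one landmark coordinate, with a relative observation) stands in for the full 6-DoF pose and 3D landmarks; all numbers are illustrative.

```python
import numpy as np

def ekf_predict(x, P, f, u, Pv, F):
    """Prediction step: propagate state and covariance
    (F: Jacobian of f, Pv: process noise covariance)."""
    x = f(x, u)
    P = F @ P @ F.T + Pv
    return x, P

def ekf_update(x, P, z, h, Pw, H):
    """Update step: correct with the innovation z - h(x)
    (H: Jacobian of h, Pw: observation noise covariance)."""
    nu = z - h(x)                       # innovation
    S = H @ P @ H.T + Pw                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ nu
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# State: [robot position, landmark position];
# observation: landmark relative to the robot
x = np.array([0.0, 5.0])
P = np.diag([1.0, 4.0])
F = np.eye(2)
H = np.array([[-1.0, 1.0]])
x, P = ekf_predict(x, P, lambda x, u: x + np.array([u, 0.0]), 1.0,
                   np.diag([0.1, 0.0]), F)
x, P = ekf_update(x, P, np.array([3.8]), lambda x: H @ x,
                  np.array([[0.05]]), H)
```

After the update the state uncertainty shrinks and robot and landmark estimates become correlated, which is what lets later observations refine both.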
Error estimates (1)

• Errors on the disparity estimates: empirical study, σ_d = f(c)
• Errors on the 3D coordinates:
  Maximal errors: 0.4 m baseline: σ_x ≤ 10⁻³ x² ; 1.2 m baseline: σ_x ≤ 3·10⁻⁴ x²
• Online estimation of the errors:
  Stereovision error: x = α/d ⇒ σ_x = (α/d²) σ_d = (x²/α) σ_d
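The quadratic growth of the depth error can be checked numerically. Here alpha lumps the focal length times the baseline, and both alpha and the disparity error are illustrative values, not the calibration of the systems above.

```python
def depth_std(x, sigma_d, alpha):
    """Standard deviation of a depth estimate x = alpha / d:
    propagating the disparity error sigma_d gives
    sigma_x = (x**2 / alpha) * sigma_d, so the error grows
    with the square of the depth."""
    return (x ** 2 / alpha) * sigma_d

# With alpha = 50 (e.g. f = 500 px, b = 0.1 m) and a 0.1 px
# disparity error: doubling the depth quadruples the error
e5 = depth_std(5.0, 0.1, 50.0)
e10 = depth_std(10.0, 0.1, 50.0)
```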
Error estimates (2)

• Interest point matching error (not mismatching)
  – Correlation surface built thanks to rotation- and scale-adaptive correlation, fitted with a Gaussian distribution
  [Figures: correlation surface and fitted Gaussian distribution]
• Combination of matching and stereo error
  – Driven by the 8 neighboring 3D points, projecting the one-sigma covariance ellipse (1 pixel) onto the 3D surface:
  σ_X0² = (1/8) Σ_{k=1..8} w_k [ (X_0 − X_k)² + σ_0² + σ_k² ]
  σ_0², σ_k²: variances of the stereovision error
Error estimates (3)

Visual motion estimation error
• Propagating the uncertainty of the matched 3D point set to the optimal motion estimate
  – Matched 3D point set: Q̂ = Q + ΔQ = [X_1, ..., X_N, X'_1, ..., X'_N]
  – Optimal motion estimate: û = u + Δu = (Θ̂, Φ̂, Ψ̂, t̂_x, t̂_y, t̂_z)
  – Cost function: J(û, Q̂) = Σ_{n=1..N} ‖X'_n − R(Θ̂, Φ̂, Ψ̂) X_n − t̂‖²
• Covariance of the random perturbation Δu: propagation using a Taylor series expansion of the Jacobian of the cost function around (û, Q̂)
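The same first-order propagation idea can be sketched with a numerical Jacobian: the covariance of a function of a noisy input is approximated by J Σ J^T. The toy function g below stands in for the motion-estimate-from-points mapping; it is not the cost minimizer of the slides.

```python
import numpy as np

def numerical_jacobian(g, x, eps=1e-6):
    """Finite-difference Jacobian of g at x."""
    y0 = g(x)
    J = np.zeros((len(y0), len(x)))
    for i in range(len(x)):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (g(x + dx) - y0) / eps
    return J

def propagate_covariance(g, x, Px):
    """First-order propagation: Cov(g(x)) ~= J Px J^T, the same
    Taylor-expansion argument used to get the motion-estimate
    covariance from the matched-points covariance."""
    J = numerical_jacobian(g, x)
    return J @ Px @ J.T

# Toy check: g doubles the first coordinate, so its variance is x4
g = lambda x: np.array([2.0 * x[0], x[1]])
Py = propagate_covariance(g, np.array([1.0, 1.0]), np.diag([0.1, 0.2]))
```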
Results
70m loop, altitude from 25 to 30m, 90 stereo pair processed
Landmark error ellipses (x40)
[Figures: trajectory and landmarks; position and attitude variances]
Results
70m loop, altitude from 25 to 30m, 90 stereo pair processed
Frame 1/90 | Reference | Reference Std. Dev. | VME result | VME abs. error | SLAM result | SLAM Std. Dev. | SLAM abs. error
Θ  |    6.19° |  0.18° |   11.93° |  5.74° |    6.01° |  0.16° |  0.18°
Φ  |    2.31° |  0.66° |    4.00° |  1.69° |    1.42° |  0.55° |  0.89°
Ψ  | -105.94° |  0.06° | -105.52° |  0.41° | -106.03° |  0.08° |  0.09°
tx |   3.17 m | 0.26 m |   5.31 m | 2.14 m |   3.13 m | 0.09 m | 0.04 m
ty |   0.61 m | 0.07 m |   2.01 m | 1.40 m |   0.26 m | 0.19 m | 0.35 m
tz |  -1.52 m | 0.04 m |  -3.25 m | 1.73 m |  -1.51 m | 0.03 m | 0.01 m
Results (cont'd)
270 m loop, altitude from 25 to 30 m, 400 stereo pairs processed, 350 landmarks mapped
Landmark error ellipses (x30)
[Figures: trajectory and landmarks; position and attitude variances]
Results (cont'd)
270 m loop, altitude from 25 to 30 m, 400 stereo pairs processed, 350 landmarks mapped
Frame 1/400 | Reference | Reference Std. Dev. | VME result | VME abs. error | SLAM result | SLAM Std. Dev. | SLAM abs. error
Θ  |  -0.12° |  0.87° |  -0.13° |   0.01° |  -3.68° |  0.38° |  3.56°
Φ  |   2.87° |  1.14° |  -4.99° |   7.86° |   5.54° |  0.40° |  1.64°
Ψ  | 105.44° |  0.23° | 101.82° |   3.62° | 104.32° |  0.19° |  1.12°
tx | -4.73 m | 0.57 m |  5.45 m | 10.38 m | -3.98 m | 0.21 m | 0.95 m
ty |  0.14 m | 0.46 m |  3.04 m |  2.90 m | -2.16 m | 0.22 m | 2.12 m
tz |  3.89 m | 0.15 m | 19.81 m | 15.94 m |  3.46 m | 0.11 m | 0.43 m
Application to ground rovers
landmark uncertainty ellipses (x5)
• 110 stereo pairs processed, 60m loop
Application to ground rovers
Frame 1/100 | Reference | Reference Std. Dev. | VME result | VME abs. error | SLAM result | SLAM Std. Dev. | SLAM abs. error
Θ  |    0.52° |   0.31° |    2.75° |   2.23° |    0.88° |   0.98° |   0.36°
Φ  |    0.36° |   0.25° |   -0.11° |   0.47° |    0.72° |   0.74° |   0.36°
Ψ  |   -0.14° |   0.16° |    1.89° |   2.03° |    1.24° |   1.84° |   1.38°
tx | -0.012 m | 0.010 m |  0.057 m | 0.069 m | -0.077 m | 0.069 m | 0.065 m
ty | -0.243 m | 0.019 m | -1.018 m | 0.775 m | -0.284 m | 0.064 m | 0.041 m
tz |  0.019 m | 0.015 m |  0.144 m | 0.125 m |  0.018 m | 0.019 m | 0.001 m

• 110 stereo pairs processed, 60 m loop
Application to indoor robots
About 30 m long trajectory, 1300 stereo image pairs
Application to indoor robots
Cov. ellipses (x10)
About 30 m long trajectory, 1300 stereo image pairs
Application to indoor robots
About 30 m long trajectory, 1300 stereo image pairs
Cov. ellipses (x10)
[Figures: beginning of loop, middle of loop, end of loop]
Application to indoor robots
[Figure: camera Phi, Theta and Elevation along the trajectory]
– The two rotation angles (Phi, Theta) and the elevation must be zero
About 30 m long trajectory, 1300 stereo image pairs
Outline
0. A few words on stereovision
0-bis. Visual odometry
1. Stereovision SLAM
2. Monocular (bearing-only) SLAM
Bearing-only SLAM
Generic SLAM:
– Landmark detection
– Relative observations (measures): of the landmark positions, of the robot motions
– Observation associations
– Refinement of the landmark and robot positions

Stereovision SLAM:
– Landmark detection: vision (interest points)
– Landmark position observations: stereovision
– Robot motion observations: visual motion estimation
– Observation associations: interest points matching
– Refinement: extended Kalman filter
Bearing-only SLAM
Generic SLAM:
– Landmark detection
– Relative observations (measures): of the landmark positions, of the robot motions
– Observation associations
– Refinement of the landmark and robot positions

Monocular SLAM:
– Landmark detection: vision (interest points)
– Landmark position observations: "multi-view stereovision"
– Robot motion observations: INS, motion model, GPS…
– Observation associations: interest points matching
– Refinement: particle filter + extended Kalman filter ("observation filter" ≈ Gaussian particles)
1. Landmark initialisation
2. Landmark observations
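One common way to realize the "Gaussian particles" idea for landmark initialisation is to spread weighted Gaussian depth hypotheses along the bearing ray, then down-weight the hypotheses that are inconsistent with later observations. The geometric spacing, sigma ratio and numbers below are illustrative, not the tuning of the original system.

```python
import numpy as np

def init_depth_hypotheses(s_min=0.5, s_max=10.0, n=6):
    """Bearing-only initialisation sketch: the unobserved depth
    along the viewing ray is covered by weighted Gaussians with
    geometric spacing and uncertainty growing with depth."""
    depths = np.geomspace(s_min, s_max, n)   # hypothesis means
    sigmas = 0.3 * depths                    # growing uncertainty
    weights = np.full(n, 1.0 / n)            # uniform prior
    return depths, sigmas, weights

def reweight(depths, sigmas, weights, observed_depth, obs_sigma):
    """Down-weight hypotheses inconsistent with a new observation:
    Gaussian likelihood of the observed depth under each hypothesis."""
    sig = np.hypot(sigmas, obs_sigma)
    lik = np.exp(-0.5 * ((observed_depth - depths) / sig) ** 2) / sig
    w = weights * lik
    return w / w.sum()

d, s, w = init_depth_hypotheses()
w = reweight(d, s, w, observed_depth=4.0, obs_sigma=0.5)
```

Once a single hypothesis dominates, the landmark can be switched to a plain Gaussian in the EKF state.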
Landmark observations
Bearing-only SLAM
Overview of the whole algorithm
Bearing-only SLAM
Comparison stereo / bearing-only
Mapped landmarks (bearing-only case)
Looking forward / looking sidewards
stereovision bearing-only
Using panoramic vision
Data association is still an issue
« View-based » qualitative navigation can help to focus the search
View-based navigation
Indexing with global attributes
– Local characteristics histograms based on Gaussian derivatives
– Color histograms
– Texture histograms
→ Local Characteristics Histograms Family (LCHF)
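A sketch of indexing with a global histogram attribute. A single image channel stands in for the color, texture and Gaussian-derivative histograms of the LCHF, and histogram intersection is one standard resemblance measure (the original system may use a different one).

```python
import numpy as np

def global_histogram(img, bins=8):
    """Global histogram of an image channel, normalised so that
    images of different sizes compare directly."""
    h, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    return h / h.sum()

def histogram_intersection(h1, h2):
    """Similarity between two normalised histograms:
    1.0 for identical distributions, 0.0 for disjoint ones."""
    return np.minimum(h1, h2).sum()

# A view matches itself better than a differently lit scene
rng = np.random.default_rng(0)
a = rng.random((32, 32))
b = np.clip(a * 0.5, 0.0, 1.0)   # darker version of the same view
sim_same = histogram_intersection(global_histogram(a), global_histogram(a))
sim_diff = histogram_intersection(global_histogram(a), global_histogram(b))
```

Because such attributes are cheap to compare, they can rank all stored views and focus the expensive interest-point matching on the best candidates.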
View-based navigation
Empirical relation between image distance and Cartesian distance
Closing the loop
1. Image processing at each image acquisition
Closing the loop2. SLAM processes at each image acquisition
Closing the loop
Outline
0. A few words on stereovision
0-bis. Visual odometry
1. Stereovision SLAM
2. Monocular (bearing-only) SLAM
3. Bearing-only SLAM using line segments
Using line segments
Initializing line segment landmarks
Line segment representation: Plücker coordinates
In 2 dimensions:
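A sketch of the Plücker representation for a 3D line: direction u and moment n, with the constraint n · u = 0 holding by construction. The point-to-line distance below follows directly from the moment; the exact parametrization used in the talk (e.g. a normalized or minimal form) may differ.

```python
import numpy as np

def plucker_from_points(a, b):
    """Plucker coordinates (n, u) of the 3D line through a and b:
    u is the direction, n = a x b is the moment about the origin
    (equal to a x u, so n . u = 0 always holds)."""
    u = b - a                 # direction of the line
    n = np.cross(a, b)        # moment about the origin
    return n, u

def point_line_distance(p, n, u):
    """Distance from point p to the line (n, u):
    || n - p x u || / ||u||."""
    return np.linalg.norm(n - np.cross(p, u)) / np.linalg.norm(u)

# Line through (0, 1, 0) and (1, 1, 0): the origin is 1 m away
n, u = plucker_from_points(np.array([0.0, 1.0, 0.0]),
                           np.array([1.0, 1.0, 0.0]))
d0 = point_line_distance(np.array([0.0, 0.0, 0.0]), n, u)
```

The direction u is exactly what a single bearing observation constrains, which is why this representation fits bearing-only initialisation of segments.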
Bearing-only SLAM with line segments
Bearing-only SLAM with line segments
Summary
0. A few words on stereovision
0-bis. Visual odometry
1. Stereovision SLAM
2. Monocular (bearing-only) SLAM
3. Bearing-only SLAM using line segments