Page 1: Vision-based SLAM

Vision-based SLAM

Simon Lacroix, Robotics and AI group, LAAS/CNRS, Toulouse

With contributions from: Anthony Mallet, Il-Kyun Jung, Thomas Lemaire and Joan Sola

Page 2: Vision-based SLAM

Benefits of vision for SLAM?

• Cameras: low cost, lightweight and power-saving

• Perceive data:
  – in a volume
  – very far
  – very precisely

  e.g. 1024 × 1024 pixels over a 60° × 60° FOV gives a 0.06° pixel resolution, i.e. 1.0 cm at 10.0 m

• Stereovision: two cameras provide depth

• Images carry a vast amount of information

• A vast know-how exists in the computer vision community

Page 3: Vision-based SLAM

0. A few words on stereovision

• The way humans perceive depth

• Very popular in the early 20th century

• Anaglyphs: red/blue, polarization

(Figures: stereo camera, stereo image pair, stereo image viewer)

Page 4: Vision-based SLAM

Principle of stereovision

In 2 dimensions (two linear cameras): a point seen at angle $\alpha$ by the left camera and at angle $\beta$ by the right camera, with baseline $b$ between the cameras, lies at depth

$$z = \frac{b}{\tan(\alpha) + \tan(\beta)}$$

The disparity $d$ is the shift between the projections of the point in the left and right images.

(Figure: left and right linear cameras, baseline b, left and right image lines, disparity d)

Page 5: Vision-based SLAM

Principle of stereovision

In 3 dimensions (two usual matrix cameras):

1. Establish the geometry of the system (off-line)
2. Establish matches between the two images, compute the disparity
3. From the matched disparities, compute the 3D coordinates

Page 6: Vision-based SLAM

Geometry of stereovision

(Figure: two camera frames of origins Ol and Or; a 3D point P projects to pl in the left image and pr in the right image; points P1 and P2 along the ray of pl project to pr1 and pr2 in the right image)

Page 7: Vision-based SLAM

Geometry of stereovision

(Figure: the same two-camera geometry, with a second point Q projecting to Ql and Qr)

Page 8: Vision-based SLAM

Geometry of stereovision

Epipolar geometry: the possible matches of a point in one image lie on a line in the other image.

(Figure: the two camera frames related by the rotation R, with their epipoles and epipolar lines)

Page 9: Vision-based SLAM

Stereo images rectification

Goal: transform the images so that epipolar lines are parallel

Benefit: reduces the computational cost of the matching process, since correspondences can be searched along image rows

Page 10: Vision-based SLAM

Dense pixel-based stereovision

Problem: « for each pixel in the left image, find the corresponding pixel in the right image »

… 3 6 3 7 9 2 8 7 6 8 9 6 4 9 0 9 9 0 …

… 3 5 7 4 9 6 3 9 6 5 8 6 3 0 1 9 7 5 …

Left line

Right line

???

The matches are computed on windows

Several ways to compare windows: “SAD”, “SSD”, “ZNCC”, Hamming distance on census-transformed images…
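To make the window comparison concrete, here is a minimal sketch (not the implementation behind these slides) of SAD-based matching along one rectified scan line, assuming the lines are 1-D arrays of intensities; the window size and disparity range are illustrative:

```python
import numpy as np

def sad_match_line(left_line, right_line, window=5, max_disparity=16):
    """For each pixel of a rectified left scan line, find the disparity that
    minimizes the Sum of Absolute Differences (SAD) over a small window."""
    left_line = np.asarray(left_line, dtype=int)
    right_line = np.asarray(right_line, dtype=int)
    half = window // 2
    disparity = np.zeros(len(left_line), dtype=int)
    for x in range(half, len(left_line) - half):
        ref = left_line[x - half:x + half + 1]
        best_d, best_cost = 0, np.inf
        # in rectified images the match lies on the same line, shifted
        # towards smaller coordinates in the right image (positive disparity)
        for d in range(0, min(max_disparity, x - half) + 1):
            cand = right_line[x - d - half:x - d + half + 1]
            cost = int(np.abs(ref - cand).sum())
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparity[x] = best_d
    return disparity
```

Replacing the SAD cost with SSD or ZNCC only changes the window comparison; ZNCC additionally normalizes for local brightness and contrast.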

Page 11: Vision-based SLAM

Dense pixel-based stereovision

(Figures: original image, disparity map, 3D image)

Page 12: Vision-based SLAM

Outline

0. A few words on stereovision

0-bis. Visual odometry

Page 13: Vision-based SLAM

Visual odometry principle

1. Stereovision
2. Pixel selection
3. Pixel tracking (with stereovision on the new image pair)
4. Motion estimation

Page 14: Vision-based SLAM

(Video)

Page 15: Vision-based SLAM

Visual odometry

• Fairly good precision (up to 1% on 100 m trajectories)

• But:
  – depends on odometry (to track pixels)
  – no error model available

Page 16: Vision-based SLAM

Visual odometry

• Applied on the Mars Exploration Rovers

(Figure: rover traverse with 50% slip)

Page 17: Vision-based SLAM

Outline

0. A few words on stereovision

0-bis. Visual odometry

1. Stereovision SLAM

Page 18: Vision-based SLAM

What kind of landmarks ?

Interest points = sharp peaks of the autocorrelation function

Harris detector (precise version [Schmid 98])

Autocorrelation matrix ($s$: scale of the detection):

$$M = G(s) \ast \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$

Principal curvatures: defined by the two eigenvalues $\lambda_1, \lambda_2$ of the matrix
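As an illustration of the detector, here is a hedged sketch of Harris interest point extraction with numpy/scipy; note it scores points with the classic det − k·trace² criterion rather than the eigenvalue-based "precise version" of the slides, and all parameters are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def harris_points(image, s=1.5, k=0.04, n_points=200):
    """Minimal Harris detector: smoothed autocorrelation matrix at scale s,
    cornerness score, and 3x3 non-maximum suppression."""
    I = np.asarray(image, dtype=float)
    Iy, Ix = np.gradient(I)
    # entries of the autocorrelation matrix M, smoothed at the detection scale s
    Sxx = gaussian_filter(Ix * Ix, s)
    Syy = gaussian_filter(Iy * Iy, s)
    Sxy = gaussian_filter(Ix * Iy, s)
    # score is large where both eigenvalues (principal curvatures) are large
    R = (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
    peaks = (R == maximum_filter(R, size=3)) & (R > 0)
    ys, xs = np.nonzero(peaks)
    order = np.argsort(R[ys, xs])[::-1][:n_points]
    return [(int(x), int(y)) for x, y in zip(xs[order], ys[order])]
```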

Page 19: Vision-based SLAM

Landmarks: interest points

• Landmark matching

(Figure: interest points detected in two images, to be associated)

Page 20: Vision-based SLAM

Interest points stability

• Interest point repeatability:

$$\text{Repeatability} = \frac{\text{repeated points}}{\text{detected points}}$$

e.g. 70% (7 repeated points out of 10 detected points)

• Interest point similarity: resemblance measure of the two principal curvatures of repeated points:

$$S_{p_1}(x, x') = \frac{\min(\lambda_1, \lambda_1')}{\max(\lambda_1, \lambda_1')} \qquad S_{p_2}(x, x') = \frac{\min(\lambda_2, \lambda_2')}{\max(\lambda_2, \lambda_2')}$$

Maximum point similarity: 1
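The two measures translate directly into code; a tiny helper matching the definitions above:

```python
def repeatability(n_repeated, n_detected):
    """Fraction of detected points found again after the image change,
    e.g. repeatability(7, 10) -> 0.7, as in the example above."""
    return n_repeated / n_detected

def point_similarity(l1, l2, l1_p, l2_p):
    """Similarity of the principal curvatures (lambda_1, lambda_2) of two
    repeated points; both values reach the maximum 1 for identical curvatures."""
    return (min(l1, l1_p) / max(l1, l1_p),
            min(l2, l2_p) / max(l2, l2_p))
```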

Page 21: Vision-based SLAM

Interest points stability

Repeatability and point similarity evaluation:

Evaluated with known artificial rotation and scale changes

Page 22: Vision-based SLAM

Interest points matching

Principle: combine signal and geometric information to match groups of points [Jung ICCV 01]

Page 23: Vision-based SLAM

Landmark matching results

(Figures: consecutive images; large viewpoint change; small overlap)

Page 24: Vision-based SLAM

Landmark matching results

(Figures: 1.5 scale change; 3.0 scale change)

Page 25: Vision-based SLAM

Landmark matching results (cont'd)

(Figures: detected points; matched points; another example)

Page 26: Vision-based SLAM

Stereovision SLAM

– Landmark detection → vision: interest points
– Relative observations (measures):
  • of the landmark positions → stereovision
  • of the robot motions → visual motion estimation
– Observation associations → interest points matching
– Refinement of the landmark and robot positions → extended Kalman filter

Page 27: Vision-based SLAM

Dense stereovision actually not required

IP matching applied on stereo frames (even easier!)


Page 29: Vision-based SLAM

Visual motion estimation

1. Stereovision
2. Interest point detection
3. Interest point matching
4. Stereovision
5. Motion estimation

Page 30: Vision-based SLAM

Stereovision SLAM

– Landmark detection → vision: interest points (OK)
– Relative observations (measures):
  • of the landmark positions → stereovision (OK)
  • of the robot motions → visual motion estimation (OK)
– Observation associations → interest points matching (OK)
– Refinement of the landmark and robot positions → extended Kalman filter

Page 31: Vision-based SLAM

Setting up the Kalman filter

• System state: $x(k) = [x_p, m_1, \ldots, m_N]$, with $x_p = [\phi, \theta, \psi, t_x, t_y, t_z]$ and $m_i = [x_i, y_i, z_i]$, and covariance

$$P(k) = \begin{bmatrix} P_{pp}(k) & P_{pm}(k) \\ P_{pm}(k)^T & P_{mm}(k) \end{bmatrix}$$

• System equation: $x(k+1) = f(x(k), u(k+1)) + v(k+1)$, $v$ with covariance $P_v(k)$

• Observation equation: $z(k) = h(x(k)) + w(k)$, $w$ with covariance $P_w(k)$

• Prediction: motion estimates $u(k+1) = (\Delta\phi, \Delta\theta, \Delta\psi, \Delta t_x, \Delta t_y, \Delta t_z)$

• Landmark "discovery": stereovision gives $m_i = [x_i, y_i, z_i]$

• Observation: matching + stereovision, with innovation $\upsilon_i(k+1) = z_i(k+1) - \hat{z}_i(k+1/k)$

→ Need to estimate the errors
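For reference, a minimal sketch of one EKF cycle matching these equations; f, h and their Jacobians are supplied by the caller, and the function names are illustrative:

```python
import numpy as np

def ekf_predict(x, P, u, f, F_jac, Pv):
    """Prediction: propagate state and covariance through the motion model f."""
    x_pred = f(x, u)
    F = F_jac(x, u)                 # Jacobian of f w.r.t. the state
    P_pred = F @ P @ F.T + Pv       # add the motion noise covariance Pv
    return x_pred, P_pred

def ekf_update(x, P, z, h, H_jac, Pw):
    """Update from a landmark observation z = h(x) + w."""
    H = H_jac(x)                    # Jacobian of the observation model
    innovation = z - h(x)           # v_i(k+1) = z_i(k+1) - predicted z_i
    S = H @ P @ H.T + Pw            # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x + K @ innovation
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new
```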

Page 32: Vision-based SLAM

Error estimates (1)

• Errors on the disparity estimates: empirical study, $\sigma_d = f(c)$

• Stereovision error: $x = \alpha / d \;\Rightarrow\; \sigma_x = \frac{\alpha}{d^2}\,\sigma_d$, so the error on the 3D coordinates grows quadratically with the distance

• Maximal errors on the 3D coordinates:
  – 0.4 m baseline: $\sigma_x \le 10^{-3}\, x^2$
  – 1.2 m baseline: $\sigma_x \le 3 \cdot 10^{-4}\, x^2$

→ Online estimation of the errors
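The propagation rule above is a one-liner; a small helper, with alpha the constant of the depth/disparity relation (illustrative naming):

```python
def stereo_sigma_x(alpha, d, sigma_d):
    """Propagate a disparity error to depth: x = alpha / d implies
    sigma_x = (alpha / d**2) * sigma_d = (x**2 / alpha) * sigma_d."""
    return (alpha / d ** 2) * sigma_d
```

With the 0.4 m baseline bound σ_x ≤ 10⁻³·x², a point at 10 m thus carries a depth standard deviation of at most 0.1 m.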

Page 33: Vision-based SLAM

Error estimates (2)

• Interest point matching error (not mismatching): the correlation surface, built with a rotation and scale adaptive correlation, is fitted with a Gaussian distribution (within 1 pixel)

(Figures: correlation surface and fitted Gaussian distribution)

• Combination of matching and stereo errors: driven by the 8 neighboring 3D points $X_k$ around $X_0$, projecting the one-sigma covariance ellipse onto the 3D surface:

$$\sigma_X^2 = \sum_{k=1}^{8} w_k \left( (X_0 - X_k)^2 + \sigma_0^2 + \sigma_k^2 \right)$$

where $\sigma_0^2$ and $\sigma_k^2$ are the variances of the stereovision error.
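A direct transcription of this combination formula, assuming the weights w_k come from the fitted correlation surface:

```python
import numpy as np

def combined_point_variance(X0, sigma0_sq, neighbours):
    """sigma_X^2 = sum_k w_k * (||X0 - Xk||^2 + sigma_0^2 + sigma_k^2),
    where neighbours is a list of (Xk, sigma_k^2, w_k) for the 8 neighbours."""
    X0 = np.asarray(X0, dtype=float)
    total = 0.0
    for Xk, sigmak_sq, wk in neighbours:
        total += wk * (np.sum((X0 - np.asarray(Xk, dtype=float)) ** 2)
                       + sigma0_sq + sigmak_sq)
    return total
```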

Page 34: Vision-based SLAM

Error estimates (3)

Visual motion estimation error

• Propagate the uncertainty of the matched 3D point set to the optimal motion estimate:
  – 3D matched point set: $\hat{Q} = Q + \Delta Q = [X_1, \ldots, X_N, X'_1, \ldots, X'_N]$
  – Optimal motion estimate: $\hat{u} = u + \Delta u = (\hat{\Theta}, \hat{\Phi}, \hat{\Psi}, \hat{t}_x, \hat{t}_y, \hat{t}_z)$
  – Cost function: $J(\hat{u}, \hat{Q}) = \sum_{n=1}^{N} \left( X'_n - R(\hat{\Theta}, \hat{\Phi}, \hat{\Psi})\, X_n - \hat{t} \right)^2$

• Covariance of the random perturbation $\Delta u$: propagated using a Taylor series expansion of the Jacobian of the cost function around $(\hat{u}, \hat{Q})$
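The minimisation of J over the rotation and translation has a well-known closed form; here is a standard SVD-based (Kabsch) solution as a reference sketch, not necessarily the method used in the presentation:

```python
import numpy as np

def estimate_rigid_motion(X, X_prime):
    """Least-squares (R, t) minimizing sum_n ||X'_n - R X_n - t||^2,
    for matched 3D point sets X, X_prime of shape (N, 3)."""
    X, X_prime = np.asarray(X, float), np.asarray(X_prime, float)
    c0, c1 = X.mean(axis=0), X_prime.mean(axis=0)
    A, B = X - c0, X_prime - c1          # centred point sets
    U, _, Vt = np.linalg.svd(A.T @ B)    # SVD of the cross-covariance
    # guard against an improper rotation (reflection)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = c1 - R @ c0
    return R, t
```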

Page 35: Vision-based SLAM

(Video)

Page 36: Vision-based SLAM

Results

70 m loop, altitude from 25 to 30 m, 90 stereo pairs processed

(Figures: trajectory and landmarks, with landmark error ellipses (×40); position and attitude variances)

Page 37: Vision-based SLAM

Results

70 m loop, altitude from 25 to 30 m, 90 stereo pairs processed

Frame 1/90 | Reference | Ref. Std. Dev. | VME result | VME Abs. error | SLAM result | SLAM Std. Dev. | SLAM Abs. error
Θ  |    6.19° | 0.18°  |   11.93° | 5.74°  |    6.01° | 0.16°  | 0.18°
Φ  |    2.31° | 0.66°  |    4.00° | 1.69°  |    1.42° | 0.55°  | 0.89°
Ψ  | -105.94° | 0.06°  | -105.52° | 0.41°  | -106.03° | 0.08°  | 0.09°
tx |   3.17 m | 0.26 m |   5.31 m | 2.14 m |   3.13 m | 0.09 m | 0.04 m
ty |   0.61 m | 0.07 m |   2.01 m | 1.40 m |   0.26 m | 0.19 m | 0.35 m
tz |  -1.52 m | 0.04 m |  -3.25 m | 1.73 m |  -1.51 m | 0.03 m | 0.01 m

Page 38: Vision-based SLAM

Results (cont'd)

270 m loop, altitude from 25 to 30 m, 400 stereo pairs processed, 350 landmarks mapped

(Figures: trajectory and landmarks, with landmark error ellipses (×30); position and attitude variances)

Page 39: Vision-based SLAM

Results (cont'd)

270 m loop, altitude from 25 to 30 m, 400 stereo pairs processed, 350 landmarks mapped

Frame 1/400 | Reference | Ref. Std. Dev. | VME result | VME Abs. error | SLAM result | SLAM Std. Dev. | SLAM Abs. error
Θ  |  -0.12° | 0.87°  |  -0.13° |  0.01°  |  -3.68° | 0.38°  | 3.56°
Φ  |   2.87° | 1.14°  |  -4.99° |  7.86°  |   5.54° | 0.40°  | 1.64°
Ψ  | 105.44° | 0.23°  | 101.82° |  3.62°  | 104.32° | 0.19°  | 1.12°
tx | -4.73 m | 0.57 m |  5.45 m | 10.38 m | -3.98 m | 0.21 m | 0.95 m
ty |  0.14 m | 0.46 m |  3.04 m |  2.90 m | -2.16 m | 0.22 m | 2.12 m
tz |  3.89 m | 0.15 m | 19.81 m | 15.94 m |  3.46 m | 0.11 m | 0.43 m

Page 40: Vision-based SLAM

Application to ground rovers

• 110 stereo pairs processed, 60 m loop

(Figure: trajectory with landmark uncertainty ellipses (×5))

Page 41: Vision-based SLAM

Application to ground rovers

• 110 stereo pairs processed, 60 m loop

Frame 1/100 | Reference | Ref. Std. Dev. | VME result | VME Abs. error | SLAM result | SLAM Std. Dev. | SLAM Abs. error
Θ  |  0.52° | 0.31° |  2.75° | 2.23° |  0.88° | 0.98° | 0.36°
Φ  |  0.36° | 0.25° | -0.11° | 0.47° |  0.72° | 0.74° | 0.36°
Ψ  | -0.14° | 0.16° |  1.89° | 2.03° |  1.24° | 1.84° | 1.38°
tx | -0.012 m | 0.010 m |  0.057 m | 0.069 m | -0.077 m | 0.069 m | 0.065 m
ty | -0.243 m | 0.019 m | -1.018 m | 0.775 m | -0.284 m | 0.064 m | 0.041 m
tz |  0.019 m | 0.015 m |  0.144 m | 0.125 m |  0.018 m | 0.019 m | 0.001 m

Page 42: Vision-based SLAM

Application to indoor robots

About 30 m long trajectory, 1300 stereo image pairs

(Figure: sample images along the trajectory)

Page 43: Vision-based SLAM

Application to indoor robots

About 30 m long trajectory, 1300 stereo image pairs

(Figure: estimated trajectory with covariance ellipses (×10))

Page 44: Vision-based SLAM

Application to indoor robots

About 30 m long trajectory, 1300 stereo image pairs

(Figures: covariance ellipses (×10) at the beginning, middle and end of the loop)

Page 45: Vision-based SLAM

Application to indoor robots

About 30 m long trajectory, 1300 stereo image pairs

• On this trajectory, the two rotation angles (Phi, Theta) and the elevation of the camera must be zero

(Figure: estimated Phi, Theta and elevation along the trajectory)

Page 46: Vision-based SLAM

Outline

0. A few words on stereovision

0-bis. Visual odometry

1. Stereovision SLAM

2. Monocular (bearing-only) SLAM

Page 47: Vision-based SLAM

Bearing-only SLAM

Generic SLAM → Stereovision SLAM:

– Landmark detection → vision: interest points
– Relative observations (measures):
  • of the landmark positions → stereovision
  • of the robot motions → visual motion estimation
– Observation associations → interest points matching
– Refinement of the landmark and robot positions → extended Kalman filter

Page 48: Vision-based SLAM

Bearing-only SLAM

Generic SLAM → Monocular SLAM:

– Landmark detection → vision: interest points
– Relative observations (measures):
  • of the landmark positions → « multi-view stereovision »
  • of the robot motions → INS, motion model, GPS…
– Observation associations → interest points matching
– Refinement of the landmark and robot positions → particle filter + extended Kalman filter

Page 49: Vision-based SLAM

Landmark observations

1. Landmark initialisation: « observation filter » ≈ Gaussian particles (see the sketch below)
2. Landmark observations
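A hedged sketch of the initialisation step: since one bearing gives no depth, the landmark is represented by Gaussian hypotheses spread along the viewing ray; all numeric parameters below are illustrative assumptions, not the values of the presented system:

```python
import numpy as np

def init_ray_particles(origin, direction, d_min=0.5, d_max=50.0, n=8):
    """Spread n Gaussian 'particles' geometrically along the viewing ray,
    each with a standard deviation proportional to its depth."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction /= np.linalg.norm(direction)
    ratio = (d_max / d_min) ** (1.0 / (n - 1))
    particles, d = [], d_min
    for _ in range(n):
        mean = origin + d * direction   # landmark hypothesis at depth d
        sigma = 0.3 * d                 # assumed: uncertainty grows with range
        particles.append((mean, sigma, 1.0 / n))  # (mean, std dev, weight)
        d *= ratio
    return particles
```

Subsequent observations reweight and prune these hypotheses until a single Gaussian can be handed over to the extended Kalman filter.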

Page 50: Vision-based SLAM

Bearing-only SLAM

Page 51: Vision-based SLAM

Bearing-only SLAM

Overview of the whole algorithm

Page 52: Vision-based SLAM

Comparison stereo / bearing-only

Mapped landmarks (bearing-only case)

Page 53: Vision-based SLAM

Looking forward / looking sideways

(Figures: stereovision, bearing-only)

Page 54: Vision-based SLAM

Using panoramic vision

Page 55: Vision-based SLAM

Data association is still an issue

« View-based » qualitative navigation can help to focus the search

Page 56: Vision-based SLAM

View-based navigation

Indexing with global attributes: the Local Characteristics Histograms Family (LCHF)

– local characteristics histograms based on Gaussian derivatives
– color histograms
– texture histograms

A sketch of such global-signature indexing follows.
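A hypothetical stand-in for such a global signature, using histograms of the image and of its Gaussian-derivative responses (the actual LCHF attributes are richer):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def global_signature(image, bins=32):
    """Concatenated, normalized histograms of the image and of its x/y
    Gaussian-derivative responses, as a compact view signature."""
    I = np.asarray(image, dtype=float)
    dx = gaussian_filter1d(I, sigma=1.0, axis=1, order=1)
    dy = gaussian_filter1d(I, sigma=1.0, axis=0, order=1)
    hists = [np.histogram(c, bins=bins)[0].astype(float) for c in (I, dx, dy)]
    sig = np.concatenate(hists)
    return sig / sig.sum()

def image_distance(sig_a, sig_b):
    """Histogram-intersection distance between two signatures (0 = identical)."""
    return 1.0 - np.minimum(sig_a, sig_b).sum()
```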

Page 57: Vision-based SLAM

View-based navigation

Empirical relation between image distance and Cartesian distance

Page 58: Vision-based SLAM

Closing the loop

1. Image processing at each image acquisition

Page 59: Vision-based SLAM

Closing the loop

2. SLAM processes at each image acquisition

Page 60: Vision-based SLAM

Closing the loop

(Video)

Page 61: Vision-based SLAM

Outline

0. A few words on stereovision

0-bis. Visual odometry

1. Stereovision SLAM

2. Monocular (bearing-only) SLAM

3. Bearing-only SLAM using line segments

Page 62: Vision-based SLAM

Using line segments

Page 63: Vision-based SLAM

Initializing line segment landmarks

Line segment representation: Plücker coordinates

(Figure: the representation illustrated in 2 dimensions)
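To make the representation concrete, here is a hedged sketch (helper names are illustrative) of building Plücker coordinates for a 3D line and using them to measure a point-to-line distance:

```python
import numpy as np

def plucker_from_points(p, q):
    """Plücker coordinates (u, v) of the line through 3D points p and q:
    u is the unit direction, v = p x u the moment of the line about the origin."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    u = q - p
    u /= np.linalg.norm(u)
    v = np.cross(p, u)
    return u, v

def point_line_distance(x, u, v):
    """Distance from point x to the line (u, v), with ||u|| = 1:
    ||(x - p) x u|| = ||x x u - v||."""
    return np.linalg.norm(np.cross(np.asarray(x, dtype=float), u) - v)
```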

Page 64: Vision-based SLAM

Bearing-only SLAM with line segments

(Video)

Page 65: Vision-based SLAM

Bearing-only SLAM with line segments

Page 66: Vision-based SLAM

Summary

0. A few words on stereovision

0-bis. Visual odometry

1. Stereovision SLAM

2. Monocular (bearing-only) SLAM

3. Bearing-only SLAM using line segments

