Robust monocular SLAM towards motion disturbance

Artic le Information Processing

Robust monocular SLAM towards motion disturbance

Wei Liu • Nanning Zheng • Zejian Yuan •

Pengju Ren • Tao Wang

Received: 19 September 2013 / Accepted: 8 November 2013 / Published online: 4 April 2014

� Science China Press and Springer-Verlag Berlin Heidelberg 2014

Abstract The standard extended Kalman filter-based

simultaneously localization and mapping (EKF-SLAM)

algorithm has a drawback that it could not handle the

sudden motion caused by the motion disturbance. This

prevents the SLAM system from real applications. Many

techniques have been developed to make the system more

robust to the motion disturbance. In this paper, we propose

a robust monocular SLAM algorithm. First, when the

motion model-based system failed to track the features, a

KLT tracker will be activated for each feature. Second, the

KLT tracked features are used to update the camera states.

Third, the difference between the camera states and the

predictions is used to adjust the input motion noise. Finally,

we do the standard EKF-SLAM with the new input motion

noise. In order to make the system more reliable, a joint

compatibility branch and bound algorithm are used to

check the outliers, and an IEKF filter is used to make the

motion estimation smoother when the camera encounters

sudden movement. The experiments are done on an image

sequence caught by a shaking hand-held camera, which

show that the proposed method is very robust to large

motion disturbance.

Keywords Visual SLAM � Motion disturbance �Noise adjustment � KLT tracker

1 Introduction

Estimating a robot’s location in an environment and

simultaneously describing the environment are the basic

work of a real robot system. This is the problem of

simultaneously localization and mapping (SLAM), which

has also attracted much attention. Applications such as

wearable computing (location awareness), augmented

reality, map building, unknown place exploration, etc.,

require big progress in this problem. Many kinds of sensors

such as laser, sonar, inertial instruments, GPS, and cameras

are used to get the information of the environment. But

systems with only cameras are most concerned.

The monocular extended Kalman filter (EKF) SLAM

system introduced by Davison et al. [1] has achieved great

success. A linear motion model is used to predict the sys-

tem changes. But the real movements are always combined

with unpredictable motion disturbance. In order to increase

the tolerance of the system to the motion disturbance, Choi

and Oh [2] inflate the state uncertainty using the magnitude

of the innovation. Lee et al. [3] use the information from

the SLAM to recover the blurred images to get more robust

results. Montemerlo et al. [4] introduce the particle filter to

the SLAM problem, called fast SLAM. It is a combination

of a number of EKF-SLAM with different distributions. It

behaves like a non-optimal local search algorithm and

degenerates with time [5].

Except for the model-based methods, Nister et al. [6]

presented a visual odometry algorithm. In this algorithm,

a KLT tracker [7] is used to match the features between

successive frames. This data association method could go

over the limitation of the motion model-based method,

but wrong data associations are also prone to be intro-

duced due to big and independent search regions.

Therefore, the RANSAC optimization algorithm [8, 9] is

W. Liu � N. Zheng � Z. Yuan � P. Ren (&)

Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong

University, Xi’an 710049, China

e-mail: [email protected]

W. Liu � T. Wang

AVIC Xi’an Flight Automatic Control Research Institute,

Xi’an 710065, China

123

Chin. Sci. Bull. (2014) 59(17):2050–2056 csb.scichina.com

DOI 10.1007/s11434-014-0254-1 www.springer.com/scp

www.springer.com/scp

used to reject the outliers to get better estimations. Singh

et al. [10] use optical flow to help the motion estimation

in a Leg system, and a Hybrid EKF approach is used in

different phases of the moving leg. Inertial sensors [11]

are also used to catch the motions, but it is complicated

by motion disturbance and sensor misalignment.

In this paper, we use the KLT tracker [7] to help the

monocular EKF-SLAM system to overcome the tracking

lost caused by sudden motions, and the flowchart is

shown in Fig. 1. The KLT tracker can work fast without

motion model and is also robust to motion blurs to some

extent. If the feature tracking is failed due to large

motion disturbance, the KLT tracker will be activated.

The tracking results are used to estimate the current

camera’s linear and angle velocities; the differences

between these results and the predictions are used to

adjust the value of the motion noise. Then, we can re-

predict the system states with the new motion noise. In

our former work [12], Horn and Schunck [13] optical

flow method is used in every frame to guide the feature

tracking. The camera states are updated directly by the

optical flow matched features. It could help the system to

track the features in large motion disturbance, but with a

drawback that it may introduce wrong matches that are

not compatible with each other.

Although the KLT tracker is very robust for feature

tracking, unreliable matches may also be introduced due to

feature detection and description methods, or processing

noise, etc. So, a joint compatibility test [14] algorithm is

used to reject the unreliable matches by the joint distri-

butions of all the visible features. Except for this, the

estimation error may increases largely in the situation of

large motion disturbance. This problem could benefit from

an IEKF filter [15] with only a little increase in computa-

tional complicity.

2 Robust monocular EKF-SLAM

2.1 Classical EKF-SLAM

The system states are represented by a state vector x, which

is composed of a group of parameters referring to the

camera state vector xc and n visual features {y1, y2, …, yn},

x ¼ xc; y1; y2; . . .; ynð Þ: ð1Þ

The camera state vector is composed of the camera

position r and its rotation quaternion q, as well as its linear

velocity v and angular velocity x in the world coordinates,

xc ¼ ðr; q; v;xÞ: ð2Þ

The ith feature position is represented by

yi ¼ xi; yi; zið Þ; ð3Þ

in which xi, yi, zi represent for the feature’s 3D position in

the world coordinate. The relationships between the

features and the camera are kept in a covariance matrix

P. In the prediction step, the system states and covariance

are

bxkjk�1 ¼ Fkbxkjk�1 þ nk; ð4Þ

bPkjk�1 ¼ FkbPk�1jk�1FT

k þ Qk: ð5Þ

Fk is system state transformation matrix, and nk is the

processing noise with covariance Qk. The system states

observation function is,

eyk ¼ zk � h bxkjk�1 þ nz

� �

; ð6Þ

where k is the discrete time step, nz is the observation noise with

covariance Qz, h is the system states observation function, H is

the partial derivative of h to x, and zk is the feature matching

results. The system states update functions are

Fig. 1 The process of robust monocular EKF-SLAM toward motion disturbance. The camera states are composed of its 3D location, pose,

linear, and angular velocities. The feature states are the feature’s 3D locations. The system states are both of them

Chin. Sci. Bull. (2014) 59(17):2050–2056 2051

123

bKk ¼ Pkjk�1Hk�1jk�1 HkPkjk�1Hk þ Qz

� ��1; ð7Þ

bxkjk ¼ bxkjk�1 þ Kkeyk; ð8Þ

Pkjk ¼ I � KkHkð ÞPkjk�1: ð9Þ

2.2 Feature detection and matching

The repeatability of the features is of vital importance for

being recognized in different frames. The Harris corner

[16] features are used in our system, which are robust to

moderate image distortions [17] and scale changes. The

shape matrix for Harris corner detection is

M ¼ A C

C B

� �

¼P

Gr � IxIx

P

Gr � IxIyP

Gr � IxIy

P

Gr � IyIy

� �

; ð10Þ

where Ix and Iy denote the horizontal and vertical deriva-

tives of the image, respectively, and Gr is the Gauss

template. The corner response is R = Det - k 9 Tr2, in

which Det = AB - C2 and Tr = A ? B, and the parame-

ter k is normally set to 0.04. But here, we prefer a modi-

fication of the response function, R = Det/(Tr ? eps), eps

represents for the positive minimum number, which is

suggested by Noble [18]. The features that have bigger

responses are recognized as Harris corners. The ones that

have a minimum distance of dmin are selected, because

well-distributed features will produce well position result.

We try to keep the visible features in a certain number. If

the visible features are too few, new features will be

detected and the ones that are well distributed and have

bigger responses will be added.

The features are described with image patches, which

are centered on the features’ positions. When we get a new

frame of observation, the features’ positions and the cor-

responding uncertainties are predicted by a Kalman filter.

Then, the mutual covariance of the selected features and

every patch centered inside the predicted region are cal-

culated. The patch that has biggest mutual covariance is

considered as the matched feature. However, when

encounters large motion disturbance, the potentially mat-

ched features may fall out of the predicted regions. In this

situation, the features will be tracked by KLT, which is not

limited by a motion model and is also robust to motion

blurs.

2.3 Noise adjustment

The EKF-SLAM could perform very well in linear process.

If the camera moves smoothly, the filter can predict the

feature’s locations accurately. However, if the camera

encounters large motion disturbance that is beyond the

default level of the noise, the features may drop out of the

predicted regions. This will make the features unable to

match correctly and eventually make the system lost. In

this situation, the KLT tracker is activated to track the

features. But these matched features could not be directly

added to the system as we used to do in [12], because they

are not compatible with each other, it may potentially

damage the system. In our new approach, the KLT matched

features are used to update the camera states to get an

approximate estimation of the camera speed. The differ-

ences between these results and the predictions are used as

the average value of the motion noise. Then, we will do the

standard step of the EKF-SLAM again with the new

motion noise.

When we do the prediction with the new motion noise,

all the features are predicted again. Then, the KLT matched

features are checked by the new predictions. If their dif-

ference is below a certain threshold, then the KLT results

will be accepted. Otherwise, it will be rejected. All the

unmatched features will be searched in the new predicted

regions again. Then, the system total states are updated.

The Kalman filter update is a time-consuming process.

However, only the information of the matched features is

responsible for the camera state changes. So, in the first

step, the KLT matched features are used to update the

camera states.

if zi 2 zKLT;HKLT ¼ ½ hix; � � � �T; ð11Þ

where hix corresponds to the optical flow matched features,

if zi 2 zKLT and zj 2 zKLT;

PKLT ¼

pxcxcpxcyi

pxcyj. . .

pyixcpyiyi

pyiyj. . .

pyjxcpyjyi

pyjyj. . .

..

. ... ..

. . ..

2

6

6

6

4

3

7

7

7

5

; ð12Þ

the corresponding covariance pyiyjis extracted from the

original covariance matrix. The extracted HKLT and PKLT

are used to calculate the Kalman file gain KKLT, and then

we can get the new camera state vector,

bxkjkc KLT ¼ bxkjk�1

c þ KkKLT � zk

KLT � bzkjk�1� �

: ð13Þ

So, the average value of the processing noise can be

written as

nk ¼ bxkjkc KLT � bxkjk�1

c ¼ KkKLT � zk

KLT � bzkjk�1� �

; ð14Þ

in function (14), only the velocity are assigned to nk. With

the help of the KLT tracker, the system is more robust to

motion disturbance. This strategy also shows robust to the

confusion of the KLT tracker. With more accurate pre-

dicted positions, we could also find out or reject the fea-

tures that are wrong predicted or wrong matched,

especially those ones that are near the edge of the image.

2052 Chin. Sci. Bull. (2014) 59(17):2050–2056

123

2.4 Joint compatibility

As is known that data association is very important for a

SLAM system, wrong association is very harmful to the

filter. Especially, the newly observed features, whose 3D

position uncertainties are large, are prone to be wrong

matched because of large search areas in the image. In

order to increase the reliability of the feature association, a

joint compatibility test is used to reject the outlier features.

The joint distributions of the observations are verified

using the Mahalanobis distance and the Chi-squared

distribution:

D2hi¼ HT

hiC�1

hiHhi

\v2d;a; ð15Þ

where Hhirepresents the innovation of the matched feature

pairs, C�1hi

is the covariance of the joint innovation, a is the

desired confidence level, and d is the dimension of all the

matched features’ states. A Joint Compatibility Branch and

Bound (JCBB) [14] method is used in our experiment.

However, when the camera encounters large motion

disturbance and the KLT tracker are used to match the

features, the JCBB test may reject most of the KLT mat-

ched features, because they may be far from the predicted

locations and have large innovations. So, we disable it

when the KLT tracker is working. The JCBB test will be

activated as soon as the motion noise adjustment is

finished.

3 Experimental results

We use an ordinary USB camera with 320 9 240 resolu-

tion and 40 � 9 30 � field angle at 30 fps. In order to test

the robustness of the system to motion disturbance, the

frame sequence is captured by a hand-hold camera with

large motion disturbance.

The camera and the features’ tracks and positions errors

are shown in Fig. 2. In the picture, the dash lines with

triangles show the camera coordinates and the corre-

sponding covariance of the original method, the solid lines

with stars are the result of the proposed method. We can

see that when the camera encounters large motion distur-

bance, the original method could not make correct response

and fails. But the proposed method could still track the

features and estimate the system states.

The KLT tracker is very robust for feature tracking. But

without motion model, the KLT tracker is prone to get

incongruous velocity vectors. In our system, we do not use

it directly for system states update, but to suggest the

average value of the camera motion noise. Figs. 3–5 show

the same selected result sequence of the classical algo-

rithm, our former optical flow based method and the cur-

rent approach. In Fig. 3, the system is lost due to large

motion disturbance. In Fig. 4, optical flow could give the

tracking result, but they may not compatible with each

other. In Fig. 5, the motion noise is adjusted and could

produce better results.

When the camera encounters large motion disturbance,

the normalized estimation error squared (NEES) becomes

extremely large. This can be seen from Fig. 6. The IEKF

filter could help decrease the NEES error. However, it

could also magnify the feature matching error and cause

serious damage to the system. So, here, we use an IEKF

filter with dynamic iterate times. In serious situation that

the system states become largely ambiguous, two iterations

are used. If the camera moves smooth, only do one

Fig. 2 The left picture is the xyz coordinates and covariance of the camera, and the right picture is one selected feature’s xyz coordinates and

their covariance. When the camera encounters large motion disturbance in frame 10, 30, 39, 50 etc., the original EKF-SLAM is lost and restarted,

while the proposed method can still work well and produce smooth camera trajectory and more stable feature locations

Chin. Sci. Bull. (2014) 59(17):2050–2056 2053

123

iteration, which will degenerate to EKF. The result shows

that this strategy could largely decrease the estimation

error.

Figure 7 shows the joint compatibility test result. The

unstable feature 16 is rejected by JCBB algorithm. Since

only the visible features with a small number are tested

Fig. 3 The selected result sequence a, b, c, d without optical flow information. The points in the images are the predicted positions, the ellipses

are the corresponding predicted regions, reds are matched, cyans are unmatched, and the crosses are the matched features. The camera is largely

affected by motion disturbance, so the predicted region based on the linear motion model do not contain the actual feature locations, and could

not correctly match the features in c and d

Fig. 4 The same selected result sequence a, b, c, d as in Fig. 3 with optical flow. This is the result of our former approach [12]. When the motion

between c and d is beyond the expectation, the optical flow will be used to track the features. The crosses are the optical flow matched locations

in c, which are out of the predicted regions

Fig. 5 The same selected result sequence a, b, c, d as in Fig. 3 with the KLT tracker. With adjusted motion noise, the system could work well

with large motion disturbance

2054 Chin. Sci. Bull. (2014) 59(17):2050–2056

123

with JCBB algorithm, the computational cost is

affordable.

4 Conclusion

We propose a robust monocular EKF-SLAM algorithm

against motion disturbance utilizing optical flow based

noise adjustment. Our approach uses the KLT tracker to

match the features and estimate the camera movement

when the motion model-based features tracking fail. We

have demonstrated that the proposed method could handle

the critical situation that the camera encounters large

motion disturbance and fails, and make the system much

more reliable. We also use a JCBB test to reject unreliable

features. With an IEKF filter, we could largely decrease the

estimate error of the system states. Large motion distur-

bance may also introduce image blur, so some more reli-

able feature or optical flow algorithm could be used in this

strategy to get more robust result.

Acknowledgments This work was supported by the National Nat-

ural Science Foundation of China (91120006, 61273366 and

61231018).

References

1. Davison AJ, Reid ID, Molton ND et al (2007) MonoSLAM: Real-

time single camera SLAM. IEEE T Pattern Anal 29:1052–1067.

doi:10.1109/TPAMI.2007.1049

2. Choi W, Oh S (2011) Robust EKF-SLAM method against distur-

bance using the shifted mean based covariance inflation technique.

In: IEEE (ed) International conference on robotics and automation,

Shanghai, 9–13 May 2011. IEEE, Piscataway, pp 4054–4059

3. Lee H-S, Kwon J, Lee KM (2011) Simultaneous localization,

mapping and deblurring. In: IEEE (ed) International conference

on computer vision, Barcelona, 6–13 November 2011. IEEE,

New York, pp 1203–1210

4. Montemerlo M, Thrun S, Koller D et al (2002) Fast-SLAM: a

factored solution to the simultaneous localization and mapping

problem. In: Proceedings of AAAI national conference on arti-

ficial intelligence, Edmonton, Canada

5. Bailey T, Nieto J, Nebot E Consistency of the FastSLAM algo-

rithm. In: IEEE (ed) International conference on robots

Fig. 6 The NEES error of the camera in the left picture and the features’ in the right. It shows that the IEKF filter could produce smaller NEES

error than the EKF filter in our experiment

Fig. 7 Feature 16 is unstable for some reason, and it is rejected in the right image by the JCBB test

Chin. Sci. Bull. (2014) 59(17):2050–2056 2055

123

http://dx.doi.org/10.1109/TPAMI.2007.1049

automation, Orlando FL, 15–19 May 2006. IEEE, New York,

pp 424–429

6. Nister D, Naroditsky O, Bergen J (2004) Visual odometry. In:

Proceedings of the IEEE Computer Society conference on com-

puter vision and pattern recognition, Washington, DC, 27 June–2

July 2004. IEEE Computer Soc, Los Alamitos, pp 652–659

7. Tomasi C, Kanade T (1991) Detection and tracking of point

features. Int J Comput Vis

8. Scaramuzza D, Fraundorfer F, Siegwart R (2009) Real-time

monocular visual odometry for on-road vehicles with 1-point

RANSAC. In: IEEE (ed) International conference on robotics and

automation, Kobe, 12–17 May 2009. IEEE, New York,

pp 488–494

9. Alcantarilla P-F, Bergasa LM, Dellaert F (2010) Visual odometry

priors for robust EKF-SLAM. In: IEEE (ed) International con-

ference on robotics and automation, Anchorage AK, 3–7 May

2010. IEEE, New York, pp 3501–3506

10. Singh S-P-N, Csonka PJ, Waldron KJ (2006) Optical flow aided

motion estimation for legged locomotion. In: IEEE (ed)/RSJ

international conference on intelligent robots and systems, Bei-

jing, Oct 2006. IEEE, New York, pp 1738–1743

11. Klein G, Drummond T (2004) Tightly integrated sensor fusion for

robust visual tracking. Image Vis Comput 22:769–776

12. Liu W, Zheng NN, Yuan ZJ et al. (2008) Optical flow based

vision-SLAM. In: International workshop on vision, communi-

cations and circuits, Xi’an, 1–3 November 2008

13. Horn B, Schunck BG (1980) Determining optical flow. Artif in-

tell 17:185–203

14. Neira J, Tardos JD (2001) Data association in stochastic mapping

using the joint compatibility tests. IEEE Trans Robo Auto

17:890–897

15. Clumente L, Davison A, Reid I et al (2007) Mapping large loops

with a single hand-held camera. In: Proceedings of robotics:

science and systems conference on robotics, Atlanta, June 2007

16. Harris C, Stephens M (1988) A combined corner and edge

detector. In: Proceedings of the 4th alvey vision conference,

Manchester, England, Organising Committec AVC, pp 147–151

17. Schmid C, Mohr R, Bauckhage C (2000) Evaluation of interest

point detectors. Int J Comput Vis 37:151–172

18. Noble A (1989) Descriptions of image surfaces. Ph.D. Thesis,

Department of Engineering Science, University of Oxford

2056 Chin. Sci. Bull. (2014) 59(17):2050–2056

123

Documents

Robust monocular SLAM towards motion disturbance