Upload
tao-wang
View
214
Download
0
Embed Size (px)
Citation preview
Artic le Information Processing
Robust monocular SLAM towards motion disturbance
Wei Liu • Nanning Zheng • Zejian Yuan •
Pengju Ren • Tao Wang
Received: 19 September 2013 / Accepted: 8 November 2013 / Published online: 4 April 2014
� Science China Press and Springer-Verlag Berlin Heidelberg 2014
Abstract The standard extended Kalman filter-based
simultaneously localization and mapping (EKF-SLAM)
algorithm has a drawback that it could not handle the
sudden motion caused by the motion disturbance. This
prevents the SLAM system from real applications. Many
techniques have been developed to make the system more
robust to the motion disturbance. In this paper, we propose
a robust monocular SLAM algorithm. First, when the
motion model-based system failed to track the features, a
KLT tracker will be activated for each feature. Second, the
KLT tracked features are used to update the camera states.
Third, the difference between the camera states and the
predictions is used to adjust the input motion noise. Finally,
we do the standard EKF-SLAM with the new input motion
noise. In order to make the system more reliable, a joint
compatibility branch and bound algorithm are used to
check the outliers, and an IEKF filter is used to make the
motion estimation smoother when the camera encounters
sudden movement. The experiments are done on an image
sequence caught by a shaking hand-held camera, which
show that the proposed method is very robust to large
motion disturbance.
Keywords Visual SLAM � Motion disturbance �Noise adjustment � KLT tracker
1 Introduction
Estimating a robot’s location in an environment and
simultaneously describing the environment are the basic
work of a real robot system. This is the problem of
simultaneously localization and mapping (SLAM), which
has also attracted much attention. Applications such as
wearable computing (location awareness), augmented
reality, map building, unknown place exploration, etc.,
require big progress in this problem. Many kinds of sensors
such as laser, sonar, inertial instruments, GPS, and cameras
are used to get the information of the environment. But
systems with only cameras are most concerned.
The monocular extended Kalman filter (EKF) SLAM
system introduced by Davison et al. [1] has achieved great
success. A linear motion model is used to predict the sys-
tem changes. But the real movements are always combined
with unpredictable motion disturbance. In order to increase
the tolerance of the system to the motion disturbance, Choi
and Oh [2] inflate the state uncertainty using the magnitude
of the innovation. Lee et al. [3] use the information from
the SLAM to recover the blurred images to get more robust
results. Montemerlo et al. [4] introduce the particle filter to
the SLAM problem, called fast SLAM. It is a combination
of a number of EKF-SLAM with different distributions. It
behaves like a non-optimal local search algorithm and
degenerates with time [5].
Except for the model-based methods, Nister et al. [6]
presented a visual odometry algorithm. In this algorithm,
a KLT tracker [7] is used to match the features between
successive frames. This data association method could go
over the limitation of the motion model-based method,
but wrong data associations are also prone to be intro-
duced due to big and independent search regions.
Therefore, the RANSAC optimization algorithm [8, 9] is
W. Liu � N. Zheng � Z. Yuan � P. Ren (&)
Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong
University, Xi’an 710049, China
e-mail: [email protected]
W. Liu � T. Wang
AVIC Xi’an Flight Automatic Control Research Institute,
Xi’an 710065, China
123
Chin. Sci. Bull. (2014) 59(17):2050–2056 csb.scichina.com
DOI 10.1007/s11434-014-0254-1 www.springer.com/scp
used to reject the outliers to get better estimations. Singh
et al. [10] use optical flow to help the motion estimation
in a Leg system, and a Hybrid EKF approach is used in
different phases of the moving leg. Inertial sensors [11]
are also used to catch the motions, but it is complicated
by motion disturbance and sensor misalignment.
In this paper, we use the KLT tracker [7] to help the
monocular EKF-SLAM system to overcome the tracking
lost caused by sudden motions, and the flowchart is
shown in Fig. 1. The KLT tracker can work fast without
motion model and is also robust to motion blurs to some
extent. If the feature tracking is failed due to large
motion disturbance, the KLT tracker will be activated.
The tracking results are used to estimate the current
camera’s linear and angle velocities; the differences
between these results and the predictions are used to
adjust the value of the motion noise. Then, we can re-
predict the system states with the new motion noise. In
our former work [12], Horn and Schunck [13] optical
flow method is used in every frame to guide the feature
tracking. The camera states are updated directly by the
optical flow matched features. It could help the system to
track the features in large motion disturbance, but with a
drawback that it may introduce wrong matches that are
not compatible with each other.
Although the KLT tracker is very robust for feature
tracking, unreliable matches may also be introduced due to
feature detection and description methods, or processing
noise, etc. So, a joint compatibility test [14] algorithm is
used to reject the unreliable matches by the joint distri-
butions of all the visible features. Except for this, the
estimation error may increases largely in the situation of
large motion disturbance. This problem could benefit from
an IEKF filter [15] with only a little increase in computa-
tional complicity.
2 Robust monocular EKF-SLAM
2.1 Classical EKF-SLAM
The system states are represented by a state vector x, which
is composed of a group of parameters referring to the
camera state vector xc and n visual features {y1, y2, …, yn},
x ¼ xc; y1; y2; . . .; ynð Þ: ð1Þ
The camera state vector is composed of the camera
position r and its rotation quaternion q, as well as its linear
velocity v and angular velocity x in the world coordinates,
xc ¼ ðr; q; v;xÞ: ð2Þ
The ith feature position is represented by
yi ¼ xi; yi; zið Þ; ð3Þ
in which xi, yi, zi represent for the feature’s 3D position in
the world coordinate. The relationships between the
features and the camera are kept in a covariance matrix
P. In the prediction step, the system states and covariance
are
bxkjk�1 ¼ Fkbxkjk�1 þ nk; ð4Þ
bPkjk�1 ¼ FkbPk�1jk�1FT
k þ Qk: ð5Þ
Fk is system state transformation matrix, and nk is the
processing noise with covariance Qk. The system states
observation function is,
eyk ¼ zk � h bxkjk�1 þ nz
� �
; ð6Þ
where k is the discrete time step, nz is the observation noise with
covariance Qz, h is the system states observation function, H is
the partial derivative of h to x, and zk is the feature matching
results. The system states update functions are
Fig. 1 The process of robust monocular EKF-SLAM toward motion disturbance. The camera states are composed of its 3D location, pose,
linear, and angular velocities. The feature states are the feature’s 3D locations. The system states are both of them
Chin. Sci. Bull. (2014) 59(17):2050–2056 2051
123
bKk ¼ Pkjk�1Hk�1jk�1 HkPkjk�1Hk þ Qz
� ��1; ð7Þ
bxkjk ¼ bxkjk�1 þ Kkeyk; ð8Þ
Pkjk ¼ I � KkHkð ÞPkjk�1: ð9Þ
2.2 Feature detection and matching
The repeatability of the features is of vital importance for
being recognized in different frames. The Harris corner
[16] features are used in our system, which are robust to
moderate image distortions [17] and scale changes. The
shape matrix for Harris corner detection is
M ¼ A C
C B
� �
¼P
Gr � IxIx
P
Gr � IxIyP
Gr � IxIy
P
Gr � IyIy
� �
; ð10Þ
where Ix and Iy denote the horizontal and vertical deriva-
tives of the image, respectively, and Gr is the Gauss
template. The corner response is R = Det - k 9 Tr2, in
which Det = AB - C2 and Tr = A ? B, and the parame-
ter k is normally set to 0.04. But here, we prefer a modi-
fication of the response function, R = Det/(Tr ? eps), eps
represents for the positive minimum number, which is
suggested by Noble [18]. The features that have bigger
responses are recognized as Harris corners. The ones that
have a minimum distance of dmin are selected, because
well-distributed features will produce well position result.
We try to keep the visible features in a certain number. If
the visible features are too few, new features will be
detected and the ones that are well distributed and have
bigger responses will be added.
The features are described with image patches, which
are centered on the features’ positions. When we get a new
frame of observation, the features’ positions and the cor-
responding uncertainties are predicted by a Kalman filter.
Then, the mutual covariance of the selected features and
every patch centered inside the predicted region are cal-
culated. The patch that has biggest mutual covariance is
considered as the matched feature. However, when
encounters large motion disturbance, the potentially mat-
ched features may fall out of the predicted regions. In this
situation, the features will be tracked by KLT, which is not
limited by a motion model and is also robust to motion
blurs.
2.3 Noise adjustment
The EKF-SLAM could perform very well in linear process.
If the camera moves smoothly, the filter can predict the
feature’s locations accurately. However, if the camera
encounters large motion disturbance that is beyond the
default level of the noise, the features may drop out of the
predicted regions. This will make the features unable to
match correctly and eventually make the system lost. In
this situation, the KLT tracker is activated to track the
features. But these matched features could not be directly
added to the system as we used to do in [12], because they
are not compatible with each other, it may potentially
damage the system. In our new approach, the KLT matched
features are used to update the camera states to get an
approximate estimation of the camera speed. The differ-
ences between these results and the predictions are used as
the average value of the motion noise. Then, we will do the
standard step of the EKF-SLAM again with the new
motion noise.
When we do the prediction with the new motion noise,
all the features are predicted again. Then, the KLT matched
features are checked by the new predictions. If their dif-
ference is below a certain threshold, then the KLT results
will be accepted. Otherwise, it will be rejected. All the
unmatched features will be searched in the new predicted
regions again. Then, the system total states are updated.
The Kalman filter update is a time-consuming process.
However, only the information of the matched features is
responsible for the camera state changes. So, in the first
step, the KLT matched features are used to update the
camera states.
if zi 2 zKLT;HKLT ¼ ½ hix; � � � �T; ð11Þ
where hix corresponds to the optical flow matched features,
if zi 2 zKLT and zj 2 zKLT;
PKLT ¼
pxcxcpxcyi
pxcyj. . .
pyixcpyiyi
pyiyj. . .
pyjxcpyjyi
pyjyj. . .
..
. ... ..
. . ..
2
6
6
6
4
3
7
7
7
5
; ð12Þ
the corresponding covariance pyiyjis extracted from the
original covariance matrix. The extracted HKLT and PKLT
are used to calculate the Kalman file gain KKLT, and then
we can get the new camera state vector,
bxkjkc KLT ¼ bxkjk�1
c þ KkKLT � zk
KLT � bzkjk�1� �
: ð13Þ
So, the average value of the processing noise can be
written as
nk ¼ bxkjkc KLT � bxkjk�1
c ¼ KkKLT � zk
KLT � bzkjk�1� �
; ð14Þ
in function (14), only the velocity are assigned to nk. With
the help of the KLT tracker, the system is more robust to
motion disturbance. This strategy also shows robust to the
confusion of the KLT tracker. With more accurate pre-
dicted positions, we could also find out or reject the fea-
tures that are wrong predicted or wrong matched,
especially those ones that are near the edge of the image.
2052 Chin. Sci. Bull. (2014) 59(17):2050–2056
123
2.4 Joint compatibility
As is known that data association is very important for a
SLAM system, wrong association is very harmful to the
filter. Especially, the newly observed features, whose 3D
position uncertainties are large, are prone to be wrong
matched because of large search areas in the image. In
order to increase the reliability of the feature association, a
joint compatibility test is used to reject the outlier features.
The joint distributions of the observations are verified
using the Mahalanobis distance and the Chi-squared
distribution:
D2hi¼ HT
hiC�1
hiHhi
\v2d;a; ð15Þ
where Hhirepresents the innovation of the matched feature
pairs, C�1hi
is the covariance of the joint innovation, a is the
desired confidence level, and d is the dimension of all the
matched features’ states. A Joint Compatibility Branch and
Bound (JCBB) [14] method is used in our experiment.
However, when the camera encounters large motion
disturbance and the KLT tracker are used to match the
features, the JCBB test may reject most of the KLT mat-
ched features, because they may be far from the predicted
locations and have large innovations. So, we disable it
when the KLT tracker is working. The JCBB test will be
activated as soon as the motion noise adjustment is
finished.
3 Experimental results
We use an ordinary USB camera with 320 9 240 resolu-
tion and 40 � 9 30 � field angle at 30 fps. In order to test
the robustness of the system to motion disturbance, the
frame sequence is captured by a hand-hold camera with
large motion disturbance.
The camera and the features’ tracks and positions errors
are shown in Fig. 2. In the picture, the dash lines with
triangles show the camera coordinates and the corre-
sponding covariance of the original method, the solid lines
with stars are the result of the proposed method. We can
see that when the camera encounters large motion distur-
bance, the original method could not make correct response
and fails. But the proposed method could still track the
features and estimate the system states.
The KLT tracker is very robust for feature tracking. But
without motion model, the KLT tracker is prone to get
incongruous velocity vectors. In our system, we do not use
it directly for system states update, but to suggest the
average value of the camera motion noise. Figs. 3–5 show
the same selected result sequence of the classical algo-
rithm, our former optical flow based method and the cur-
rent approach. In Fig. 3, the system is lost due to large
motion disturbance. In Fig. 4, optical flow could give the
tracking result, but they may not compatible with each
other. In Fig. 5, the motion noise is adjusted and could
produce better results.
When the camera encounters large motion disturbance,
the normalized estimation error squared (NEES) becomes
extremely large. This can be seen from Fig. 6. The IEKF
filter could help decrease the NEES error. However, it
could also magnify the feature matching error and cause
serious damage to the system. So, here, we use an IEKF
filter with dynamic iterate times. In serious situation that
the system states become largely ambiguous, two iterations
are used. If the camera moves smooth, only do one
Fig. 2 The left picture is the xyz coordinates and covariance of the camera, and the right picture is one selected feature’s xyz coordinates and
their covariance. When the camera encounters large motion disturbance in frame 10, 30, 39, 50 etc., the original EKF-SLAM is lost and restarted,
while the proposed method can still work well and produce smooth camera trajectory and more stable feature locations
Chin. Sci. Bull. (2014) 59(17):2050–2056 2053
123
iteration, which will degenerate to EKF. The result shows
that this strategy could largely decrease the estimation
error.
Figure 7 shows the joint compatibility test result. The
unstable feature 16 is rejected by JCBB algorithm. Since
only the visible features with a small number are tested
Fig. 3 The selected result sequence a, b, c, d without optical flow information. The points in the images are the predicted positions, the ellipses
are the corresponding predicted regions, reds are matched, cyans are unmatched, and the crosses are the matched features. The camera is largely
affected by motion disturbance, so the predicted region based on the linear motion model do not contain the actual feature locations, and could
not correctly match the features in c and d
Fig. 4 The same selected result sequence a, b, c, d as in Fig. 3 with optical flow. This is the result of our former approach [12]. When the motion
between c and d is beyond the expectation, the optical flow will be used to track the features. The crosses are the optical flow matched locations
in c, which are out of the predicted regions
Fig. 5 The same selected result sequence a, b, c, d as in Fig. 3 with the KLT tracker. With adjusted motion noise, the system could work well
with large motion disturbance
2054 Chin. Sci. Bull. (2014) 59(17):2050–2056
123
with JCBB algorithm, the computational cost is
affordable.
4 Conclusion
We propose a robust monocular EKF-SLAM algorithm
against motion disturbance utilizing optical flow based
noise adjustment. Our approach uses the KLT tracker to
match the features and estimate the camera movement
when the motion model-based features tracking fail. We
have demonstrated that the proposed method could handle
the critical situation that the camera encounters large
motion disturbance and fails, and make the system much
more reliable. We also use a JCBB test to reject unreliable
features. With an IEKF filter, we could largely decrease the
estimate error of the system states. Large motion distur-
bance may also introduce image blur, so some more reli-
able feature or optical flow algorithm could be used in this
strategy to get more robust result.
Acknowledgments This work was supported by the National Nat-
ural Science Foundation of China (91120006, 61273366 and
61231018).
References
1. Davison AJ, Reid ID, Molton ND et al (2007) MonoSLAM: Real-
time single camera SLAM. IEEE T Pattern Anal 29:1052–1067.
doi:10.1109/TPAMI.2007.1049
2. Choi W, Oh S (2011) Robust EKF-SLAM method against distur-
bance using the shifted mean based covariance inflation technique.
In: IEEE (ed) International conference on robotics and automation,
Shanghai, 9–13 May 2011. IEEE, Piscataway, pp 4054–4059
3. Lee H-S, Kwon J, Lee KM (2011) Simultaneous localization,
mapping and deblurring. In: IEEE (ed) International conference
on computer vision, Barcelona, 6–13 November 2011. IEEE,
New York, pp 1203–1210
4. Montemerlo M, Thrun S, Koller D et al (2002) Fast-SLAM: a
factored solution to the simultaneous localization and mapping
problem. In: Proceedings of AAAI national conference on arti-
ficial intelligence, Edmonton, Canada
5. Bailey T, Nieto J, Nebot E Consistency of the FastSLAM algo-
rithm. In: IEEE (ed) International conference on robots
Fig. 6 The NEES error of the camera in the left picture and the features’ in the right. It shows that the IEKF filter could produce smaller NEES
error than the EKF filter in our experiment
Fig. 7 Feature 16 is unstable for some reason, and it is rejected in the right image by the JCBB test
Chin. Sci. Bull. (2014) 59(17):2050–2056 2055
123
automation, Orlando FL, 15–19 May 2006. IEEE, New York,
pp 424–429
6. Nister D, Naroditsky O, Bergen J (2004) Visual odometry. In:
Proceedings of the IEEE Computer Society conference on com-
puter vision and pattern recognition, Washington, DC, 27 June–2
July 2004. IEEE Computer Soc, Los Alamitos, pp 652–659
7. Tomasi C, Kanade T (1991) Detection and tracking of point
features. Int J Comput Vis
8. Scaramuzza D, Fraundorfer F, Siegwart R (2009) Real-time
monocular visual odometry for on-road vehicles with 1-point
RANSAC. In: IEEE (ed) International conference on robotics and
automation, Kobe, 12–17 May 2009. IEEE, New York,
pp 488–494
9. Alcantarilla P-F, Bergasa LM, Dellaert F (2010) Visual odometry
priors for robust EKF-SLAM. In: IEEE (ed) International con-
ference on robotics and automation, Anchorage AK, 3–7 May
2010. IEEE, New York, pp 3501–3506
10. Singh S-P-N, Csonka PJ, Waldron KJ (2006) Optical flow aided
motion estimation for legged locomotion. In: IEEE (ed)/RSJ
international conference on intelligent robots and systems, Bei-
jing, Oct 2006. IEEE, New York, pp 1738–1743
11. Klein G, Drummond T (2004) Tightly integrated sensor fusion for
robust visual tracking. Image Vis Comput 22:769–776
12. Liu W, Zheng NN, Yuan ZJ et al. (2008) Optical flow based
vision-SLAM. In: International workshop on vision, communi-
cations and circuits, Xi’an, 1–3 November 2008
13. Horn B, Schunck BG (1980) Determining optical flow. Artif in-
tell 17:185–203
14. Neira J, Tardos JD (2001) Data association in stochastic mapping
using the joint compatibility tests. IEEE Trans Robo Auto
17:890–897
15. Clumente L, Davison A, Reid I et al (2007) Mapping large loops
with a single hand-held camera. In: Proceedings of robotics:
science and systems conference on robotics, Atlanta, June 2007
16. Harris C, Stephens M (1988) A combined corner and edge
detector. In: Proceedings of the 4th alvey vision conference,
Manchester, England, Organising Committec AVC, pp 147–151
17. Schmid C, Mohr R, Bauckhage C (2000) Evaluation of interest
point detectors. Int J Comput Vis 37:151–172
18. Noble A (1989) Descriptions of image surfaces. Ph.D. Thesis,
Department of Engineering Science, University of Oxford
2056 Chin. Sci. Bull. (2014) 59(17):2050–2056
123