
IEEE/RSJ IROS 2005 Workshop on

Robot Vision for Space Applications

Aug 2, 1:30 PM to 5 PM in Salon 15 Shaw Conference Centre, Edmonton, Alberta, Canada

Organizers: Leo Hartman, Martin Jagersand, Piotr Jasiobedzki, Ioannis Rekleitis

Proceedings editor: Martin Jagersand

Schedule and Proceedings Table of Contents

Time   Talk

13:30  Welcome and introductory lecture on Robot Vision for the Space Shuttle and Space Station Robot Arms. Piotr Jasiobedzki, MD Robotics

14:10  Satellite Pose Acquisition and Tracking with Variable Dimensional Local Shape Descriptors. Babak Taati and Michael Greenspan, Queen's University

14:30  Development of a Scaled Ground Testbed for Lidar-based Pose Estimation. Andrew C.M. Allen, Nikolai N. Mak, Christopher S. Langley, MDA Corporation

14:50  Coffee break

15:10  Scalable real-time vision-based SLAM for planetary rovers. Robert Sim, Matt Griffin, Alex Shyr, and James J. Little, UBC

15:30  Estimation of the aerodynamical parameters of an experimental airship. Diego Patino, Leonardo Solaque, Simon Lacroix and Alain Gauthier, LAAS-CNRS and Andes University

15:50  End-point Control of a Flexible Structure Mounted Manipulator Based on Wavelet Basis Function Networks. David X.P. Cheng, Defence Research and Development Canada

16:10  ARTag Fiducial Marker System Applied to Vision Based Spacecraft Docking. Mark Fiala, National Research Council

16:30  End notes and survey of Space Agency activities. Leo Hartman and Ioannis Rekleitis, CSA


IEEE/RSJ IROS 2005 Workshop on

Robot Vision for Space Applications

Introduction

The demanding nature of space operations obliges continuous innovation while at the same time requiring high reliability and robustness. This provides a "trial by fire" for any new research, and much of this technology, once space qualified, subsequently makes its way into other applications. Of particular interest for the IROS audience are the high demands that space applications place on robotic hardware and software. For instance, weight and power constraints force hardware designers to depart from the rigid and heavy manipulators used in terrestrial manufacturing. In turn, these new lightweight designs pose new challenges in controller and software development, in operations (e.g., the flexibility of the space shuttle arm), in the types of feedback used (e.g., joint feedback is not sufficient for a non-rigid arm, so end-point feedback from visual tracking or other wide area sensing is needed), and in how the robots are employed to solve tasks (e.g., supervisory control that stratifies the human-robot interface but does not require full robot autonomy).

Vision and other wide area sensing technologies are critical to future space robotics applications in order to allow safe interaction with the environment and efficient environment modeling. In addition to the usual challenges of environment modeling, target acquisition and tracking, and planning and monitoring, space vision systems have to address additional problems, including qualification for operation in space, severe lighting variation, limited onboard computing resources, and limited viewpoints and mobility. A space vision system also has to be tightly integrated with other onboard subsystems (robot control system, command and telemetry, onboard networking, real-time infrastructure) in such a way that its behavior is always well defined, that its performance can be monitored and verified, and that, in the interest of fault tolerance, the system of which it is a part can work both with it and without it. Due to the mission and safety critical status of space robotic equipment, a space vision system controlling such equipment must be highly reliable: if someone has to monitor it, using it will save little human effort.

The workshop will bring together researchers interested in robotic vision methods particularly suitable for space applications. A particular aim of the workshop is to connect academic researchers with applied problems in the space industry and institutes. To support this objective we will have participation from the Canadian Space Agency (CSA) and MD Robotics (builder of the robotic arms on the space shuttle and the International Space Station). The program was selected from submitted papers and spans academic, government and industrial research. Each paper was reviewed by two or three reviewers. We wish to thank the reviewers for their efforts. The paper acceptance rate was 46%.

Many of the new technologies will not only improve robotics capabilities in space but promise to also widen the applicability of robotics on earth and take the field beyond its traditional center of gravity in manufacturing into other fields such as exploration, mining and natural resource extraction, emergency response, decontamination, etc. Therefore we expect this workshop to be of interest not only to those who currently work in space applications but also to those who want to stay up to date on recent progress and get an idea of what technologies may be ripe to transfer between space and other application areas.

We hope the workshop will be an interesting and rewarding experience for all participants.

The organizers:
Leo Hartman, Canadian Space Agency, [email protected]
Martin Jagersand, University of Alberta, [email protected]
Piotr Jasiobedzki, MD Robotics, [email protected]
Ioannis Rekleitis, Canadian Space Agency, [email protected]


Invited Lecture
Robotic Vision for Autonomous Operations in Space

Piotr Jasiobedzki, MDA Space Missions

9445 Airport Rd., Brampton ON, Canada

Abstract

Space missions have involved humans performing complex tasks requiring perception, decision making, and dexterity. The inherent dangers of space travel, the complexity of space systems and paramount safety requirements have been driving up the cost of manned missions. Ground based control via teleoperation offers operator safety, employs remote sensors to provide information for human decision making, and uses remote robots to perform tasks. Communication links with low latency, high bandwidth and high availability are essential for teleoperation. Such links are expensive for Earth orbiting spacecraft and are not at all available for planetary systems. One solution is to increase the autonomy of remote agents (servicer spacecraft, planetary rovers, and exploration probes) by providing them with the ability to make their own decisions using data from on-board sensors. Multi-dimensional data such as camera images and three-dimensional measurements from rangefinders offer the most complete representations of observed scenes. Vision systems can be used to process and analyse such data, facilitating interpretation and autonomous decision making.

Applications of vision systems in Earth orbit focus primarily on autonomous servicing. Vision guided rendezvous of an unmanned servicer spacecraft with a target satellite may start at a distance of several kilometers and conclude with docking under vision system control. Alternatively, the servicer may hover in proximity to the target and capture it with a robotic arm controlled in a visual servoing mode. Robotic servicing will rely on vision systems to model workspaces in 3D and to compare them with blueprints, as well as to guide end-effectors and tools during operations. Planetary applications that require vision technologies may include landing systems that can guide a lander to a safe landing site and allow for precision landing. Surface exploration with rovers requires vision systems to provide data for local and long range navigation, localization and terrain mapping. Virtual presence and mission monitoring require an ability to create photorealistic and three dimensional models of visited sites.

Challenges of computer vision for use on-orbit include the high contrast and high dynamic range of scenes, caused by the intense sun illumination of foreground objects and the lack of diffuse lighting of background objects, and the complex shapes of space structures with their featureless and reflective surfaces. Highly unstructured and self-similar planetary surfaces pose challenges for vision systems used in planetary exploration, as these systems will have to not only detect local obstacles but also incrementally create terrain maps that can be used for vehicle localization and future terrain traversals.

MDA Space Missions has been developing a range of space vision technologies, such as the Spaceborne Scanning Lidar that has been operating since April 2005 on the XSS-11 spacecraft, a vision system that will guide a robotic arm to capture a free floating satellite in the Orbital Express space demonstration, and vision systems for servicing of the Hubble Space Telescope. Several on-going research and development projects address the development of vision systems for satellite servicing, localization and terrain mapping for planetary rovers, and environment modeling.

Bio

Dr Piotr Jasiobedzki leads research and development in vision systems for space and terrestrial applications at MDA Space Missions (formerly Spar Space Systems and MD Robotics) in Brampton ON. Before joining MDA in 1995, he worked at the University of Toronto (Canada) and the University of Manchester (UK). His research interests include stereo and 3D vision, sensor fusion, biomedical image interpretation, and robot control and navigation. Dr Jasiobedzki has two patents and has published over 40 papers in technical journals and conference proceedings.


Satellite Pose Acquisition and Tracking with Variable Dimensional Local Shape Descriptors

Babak Taati (1), Michael Greenspan (1, 2)

(1) Department of Electrical and Computer Engineering, (2) School of Computing
Queen's University, Kingston, Ontario, Canada

[email protected] [email protected]

Abstract

A set of local shape descriptors for range data are proposed for reliable point matching in pose acquisition and tracking applications. Several local shape properties are extracted from the principal component space of various neighbourhoods, and the similarity between histograms of these properties is used for hypothesizing point matches. The dimensionality of the descriptors may vary, and the effectiveness of several 1D through 9D histograms is experimentally analysed in tracking a range data sequence of a satellite model. The best performing descriptors are used for pose determination between pairs of range images and are shown to outperform two well-known existing descriptors.

Keywords: 3D pose acquisition, tracking, local shape descriptors, principal component analysis, range data

1 Introduction

Establishing feature correspondences reliably is necessary for efficient pose acquisition and tracking. Local Shape Descriptors (LSDs) have been utilised by researchers for encapsulating the local geometry information of various neighbourhoods, and their similarity is used as an indicator of the similarity between the corresponding geometries. Point matches are then used for registering range image sequences for tracking, or for registering a range image with a complete surface model for pose determination.

Ideally, descriptors should be computationally and memory efficient, discriminating, and robust with respect to a variety of inevitable conditions such as noise and self-occlusion. They should also be robust with respect to the viewing angle.

In the Spin Images technique [8], 2D histograms are formed based on the distances of the neighbouring points from the normal vector and the tangent plane. The Point Signatures method [3] constructs 1D arrays based on the distance profile of the intersection of a sphere with the object from the tangent plane. Other local shape descriptors developed for pose estimation include Surface Signatures [12], Pairwise Geometric Histograms [1], Statistical Matrices [11], and Point Fingerprints [9]. A review of some of these techniques is offered in [2].

The abovementioned methods take a minimalist approach, in that they try to construct low dimensional and compact descriptors, presumably to favour efficiency. We propose that using high dimensional descriptors could be the key to more reliable and robust point matching without sacrificing computational and memory efficiency. To this end, we have developed a large variety of variable dimensional shape descriptors, based on local properties derived from principal component analysis. These descriptors can be considered as a generalization of well-known LSDs such as Spin Images and Point Signatures, and our experimental results confirm their effectiveness in comparison to the existing methods.

The main motivation for our research is developing a pose acquisition technique for tracking a satellite based on LIDAR data, as part of a larger system that could enable automatic rendezvous and robotic capture of malfunctioning satellites by an unmanned spacecraft for service and redeployment [7, 6]. Other applications for a 3D pose estimation system include 3D model building, industrial inspection, augmented reality, and computer aided surgery.

2 PCA-Based Descriptors

We construct our shape descriptors based on properties extracted from the principal component space of the neighbourhood of each point in a point cloud. Eigenvalue decomposition of the covariance matrix of each neighbourhood is used to associate an orthonormal frame (i, j, k) and three eigenvalue scalars (e1, e2, e3), representing vector lengths along each axis of the frame, to each point.


Table 1: Basic and Extended Position Properties

Property      Description                           Range
x, y, z       coordinates along the main axes       [-R, R]
Xa, Ya, Za    axes distances:                       [0, R]
              Xa = √(y² + z²),
              Ya = √(z² + x²),
              Za = √(x² + y²)
dist          reference point distance:             [0, R]
              d = √(x² + y² + z²)

Using these vectors and scalars we generate nine basic and several extended properties for each point in the neighbourhood, and can form a variety of histograms that carry various levels of information about the local geometry.

The orthonormal frame of each point can be treated as a Cartesian coordinate frame known as the principal component space. While constructing a descriptor for a point, we refer to it as the reference point (p') and to its frame as the reference frame. For each neighbouring point (p), the basic properties define its relationship to the reference point and consist of three position scalars, three direction scalars, and three dispersion scalars.

The coordinates of each neighbouring point expressed in the reference frame, x, y, and z along i' the major, j' the semi-major, and k' the minor axes respectively, form the three basic position properties. Several extended position properties can be calculated based on these coordinates. Table 1 lists some possible extended position properties, the nomenclature we use for referring to them, and the range of possible values for each property, where R denotes the neighbourhood radius.

The rotation that aligns the orthonormal frame of a neighbouring point with the reference frame can be defined with three parameters. We chose the inner products of the corresponding axes between the two frames as the three basic direction properties. The rotation can be represented in various forms and therefore it is possible to construct several extended properties. Some of these extended direction properties are listed in Table 2, where Ci is the shorthand notation for cos(i).

Eigenvalues of the neighbourhood covariance matrix form the three basic dispersion scalars. Three scale-independent extended dispersion properties are generated by normalizing the basic values by the corresponding dispersion property of the reference point. Table 3 lists the dispersion properties, where ei refers to a basic dispersion property of the reference point.
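To make the construction concrete, the following sketch (our own illustration, not the authors' implementation; the function names are hypothetical) computes the local PCA frame and eigenvalues of a neighbourhood with NumPy and evaluates the nine basic properties of a neighbouring point p with respect to a reference point p'.

```python
import numpy as np

def pca_frame(neighbourhood):
    """Eigen-decomposition of the neighbourhood covariance: returns an
    orthonormal frame (columns = major, semi-major, minor axes) and the
    three eigenvalues sorted so that e1 >= e2 >= e3."""
    centred = neighbourhood - neighbourhood.mean(axis=0)
    cov = centred.T @ centred / len(neighbourhood)
    evals, evecs = np.linalg.eigh(cov)   # ascending eigenvalues
    order = np.argsort(evals)[::-1]      # reorder to descending
    return evecs[:, order], evals[order]

def basic_properties(p, frame_p, evals_p, p_ref, frame_ref):
    """Nine basic properties of neighbour p w.r.t. the reference point p_ref:
    three position, three direction, and three dispersion scalars."""
    # position: coordinates of p expressed in the reference frame
    x, y, z = frame_ref.T @ (p - p_ref)
    # direction: inner products of corresponding axes of the two frames
    c_theta, c_phi, c_psi = np.sum(frame_p * frame_ref, axis=0)
    # dispersion: eigenvalues of p's own neighbourhood covariance
    e1, e2, e3 = evals_p
    return (x, y, z), (c_theta, c_phi, c_psi), (e1, e2, e3)
```

The extended properties of Tables 1 to 3 (e.g. Xa = √(y² + z²), or the Euler-angle forms of the relative rotation) follow directly from these nine scalars.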

Figure 1 shows a satellite range image containing nearly 50,000 3D points.

Table 2: Basic and Extended Direction Properties

Property            Description                              Range
Cθ, Cφ, Cψ          inner products of axes:                  [-1, 1]
                    Cθ = i · i', Cφ = j · j', Cψ = k · k'
roll, pitch, yaw    ZYX Euler angles                         [-π, π]
α, β, γ             ZYZ Euler angles                         [-π, π]

Table 3: Basic and Extended Dispersion Properties

Property      Description                                      Range
e1, e2, e3    eigenvalues, |e1| > |e2| > |e3|                  [-R², R²]
ē1, ē2, ē3    eigenvalues normalized by the corresponding      [-∞, ∞]
              eigenvalue of the reference point

Figure 1: Two points on a satellite point cloud, their neighbourhoods, and their reference frames

The neighbourhoods of two selected points (p and p') are shown in a darker shade. The local frames of the two points are illustrated and are labeled with (i, j, k) and (i', j', k'). Values of the nine basic properties for point p are listed in Table 4 with respect to point p' as the reference point.

For each reference point, a histogram could be constructed based on one or any combination of the properties of neighbouring points. We note that only nine independent properties exist for each neighbour point, and therefore we limit our histograms to have one through nine dimensions, with each dimension being one of the 22 properties listed in Tables 1, 2, and 3.

Histograms are normalized to the number of neighbouring points that contribute to their construction, so the descriptors are robust to sampling resolution.


Table 4: Basic properties of point p in Figure 1 with respect to point p' as the reference point

position (mm)          direction                 dispersion (mm²)
x      y      z        Cθ      Cφ      Cψ        e1      e2      e3
-356   -265   16       -0.53   -0.46   0.90      4652    3429    118

The neighbourhood radii for performing the PCA and for selecting points that contribute to each histogram do not necessarily need to be the same, but were set to the same value in the experiments. Histogram intersection [10] is used for measuring the disparity between histograms and provides a continuous similarity measure in the [0, 1] interval, where 1 denotes identical histograms.
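As an illustration of the descriptor comparison described above, here is a minimal sketch, under our own assumptions, of building a normalized n-dimensional property histogram and comparing two such histograms with histogram intersection; the function names are ours.

```python
import numpy as np

def property_histogram(samples, ranges, bins_per_dim):
    """n-D histogram over the selected property values of the neighbouring
    points (`samples` is an (N, n) array), normalized by the number of
    contributing points so descriptors are robust to sampling resolution."""
    hist, _ = np.histogramdd(samples, bins=bins_per_dim, range=ranges)
    return hist / len(samples)

def histogram_intersection(h1, h2):
    """Similarity of two histograms: sum of element-wise minima of the
    sum-normalized histograms; 1 denotes identical histograms."""
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return np.minimum(h1, h2).sum()
```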

Since the range for the scale-independent dispersion properties is unlimited, we clip the values beyond a predetermined maximum before constructing a histogram based on them. We have also chosen to clip the values of the basic dispersion properties beyond a certain range, since the great majority of their values lie within a range much smaller than their absolute possible maximum. In the experiments, the clipping ranges were empirically set to [0, R²/2], [0, R²/4], and [0, R²/8] for the e1, e2, and e3 properties respectively, and to [1/2, 2] for the extended dispersion properties.

3 Experiments

A set of experiments was performed to analyse the point matching capabilities of the PCA-based descriptors and compare their performance with that of Spin Images and Point Signatures. A one-fifth scale model of the RADARSAT satellite and an Optech ILRIS 3D LIDAR sensor were mounted on two manipulator robots so that the LIDAR could scan the satellite from various orientations and distances while the robotic arms moved to simulate the motion of a spacecraft approaching the satellite. Movement of the arms was slow relative to the data acquisition rate, so the motion skew was negligible. Since the configuration of both manipulators was known during the scanning period, a ground truth rigid transformation existed between the various scans. A VRML model of the RADARSAT satellite is shown in Figure 2(a).
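A hedged sketch of how the ground truth rigid transformation between two scans can be derived from the known manipulator configurations, assuming both arms are calibrated to a common base frame and all poses are expressed as 4x4 homogeneous transforms; the variable names are hypothetical.

```python
import numpy as np

def ground_truth_relative_transform(T_base_lidar_1, T_base_lidar_2,
                                    T_base_sat_1, T_base_sat_2):
    """Rigid transform mapping satellite points expressed in the first
    lidar frame into the second lidar frame, given the poses of the lidar
    and the satellite in a common base frame at each scan."""
    # satellite pose seen from each lidar
    T_lidar_sat_1 = np.linalg.inv(T_base_lidar_1) @ T_base_sat_1
    T_lidar_sat_2 = np.linalg.inv(T_base_lidar_2) @ T_base_sat_2
    # scan-1 coordinates -> satellite frame -> scan-2 coordinates
    return T_lidar_sat_2 @ np.linalg.inv(T_lidar_sat_1)
```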

Two range images from the sequence were selected for the first point matching experiments, in order to select a small subset of PCA-based descriptors. The disparity in the viewing angle for the two range images was 10° in the roll angle and 10° in the yaw angle.

Figure 2: (a) RADARSAT VRML model (b) Corresponding points from the second image for the 100 randomly selected points on the first image

One hundred points were randomly selected on the first point cloud such that they were also visible on the second point cloud. Based on the ground truth transformation, the corresponding point for each of these random points on the second scan was identified. The second point cloud and the selected points on it are illustrated in Figure 2(b).

Figure 3 illustrates a sample confusion matrix generated for the hundred points based on 6D [e2, e3, Xa, e1, Za, γ] histograms of size 4^6. This confusion matrix is a 100×100 matrix in which the rows and columns represent the selected points on the first and second point clouds respectively. In the illustration, higher similarities are shown in darker shades, and it can be seen that the main diagonal elements of the matrix are mostly darker than the off-diagonal elements. Each diagonal element that is larger than all the other elements in its row (or column) represents a point that is correctly matched to its true match on the other range image. In this case, there were 73 points that were correctly identified.

Since we use RANSAC [4] for generating statistically stable poses from candidate correspondences, it is not strictly necessary for the points to be matched exactly with their true match, as long as their true match gains a high rank based on the similarity between their descriptors. We define Rank_i as the number of true matches that were ranked within the top-i highest similarities. Rank3 was used as a performance measure in the experiments. The results showed that the relationship between Rank3 and Rank1 (i.e. the number of correct matches) was roughly linear.
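The Rank_i measure can be computed directly from a confusion (similarity) matrix; the following is a small sketch under our own assumptions (row r holds the similarities of point r of the first image against all candidates on the second image, with its true match on the diagonal), not code from the paper.

```python
import numpy as np

def rank_i(similarity, i):
    """Number of points whose true match (the diagonal entry of the square
    confusion matrix) ranks within the top-i similarities of its row.
    rank_i(S, 1) counts exact correct matches."""
    n = similarity.shape[0]
    count = 0
    for r in range(n):
        # rank of the true match: 1 + entries with strictly higher similarity
        rank = 1 + np.sum(similarity[r] > similarity[r, r])
        if rank <= i:
            count += 1
    return count
```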

The bounding box for the satellite model is approximately 2.1 m × 0.7 m × 0.5 m.


Figure 3: A sample confusion matrix based on 6D [e2, e3, Xa, e1, Za, γ] histograms

The sampling resolution is such that range images of the tracking sequence each contain about 40k-70k 3D points (47,884 and 56,743 for the two selected images). The neighbourhood radius was empirically set to 15 cm. The number of bins for histograms along each dimension was set to 100, 35, 11, 6, 5, 4, 3, 3, and 3 for 1D through 9D histograms respectively.

Based on the 22 properties listed in Tables 1 to 3, over a million possible histograms could be constructed for each point (1). A forward selection scheme was utilised for obtaining a suitable subset of descriptors from the vast pool of possible PCA-based histograms. First, 22 1D histograms based on the properties listed in Tables 1, 2, and 3 were tested and the top 8 descriptors were selected based on the Rank3 measure. The top 8 1D descriptors were then combined with all the 22 properties to form several 2D histograms, and the 8 best performing 2D histograms were identified. Similarly, the top 8 histograms of higher dimensions were selected and each in turn was used to construct 8×22 higher dimensional histograms. In the end, we identified 8 top-performing descriptors for each dimensionality based on their point matching capabilities.
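The sketch below is our reading of the greedy forward selection procedure just described; `evaluate` stands for the Rank3 score computed on a validation pair of range images and is an assumed callback, not a function from the paper.

```python
def forward_select(properties, evaluate, keep=8, max_dim=9):
    """Greedy forward selection of histogram dimensions.  `properties` is
    the list of 22 candidate properties; `evaluate` scores a tuple of
    properties (e.g. the Rank3 measure of the corresponding histogram)."""
    # dimension 1: score every single-property histogram, keep the top few
    scored = sorted(((evaluate((p,)), (p,)) for p in properties),
                    key=lambda c: c[0], reverse=True)
    best_per_dim = {1: scored[:keep]}
    # grow each retained histogram by one property at a time
    for dim in range(2, max_dim + 1):
        candidates = []
        for _, combo in best_per_dim[dim - 1]:
            for p in properties:
                extended = combo + (p,)
                candidates.append((evaluate(extended), extended))
        best_per_dim[dim] = sorted(candidates, key=lambda c: c[0],
                                   reverse=True)[:keep]
    return best_per_dim
```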

Table 5 lists the 2 top performing histograms for each dimensionality. The basic dispersion properties seem to provide the highest discrimination power among the one dimensional histograms. Figure 4(a) illustrates the Rank3 and Rank1 measures for the top performing descriptor, and the average of these measures for the top 8 descriptors, at each dimensionality. The Rank1 and Rank3 measures are larger than 50% and 70% respectively for all 2D and higher dimensional descriptors. The measures are larger than 60% and 80% respectively for 4D and higher dimensional descriptors. It can also be observed that the higher dimensional descriptors (≥ 4D) provided better point matching than the lower dimensional ones, despite the fact that they were more coarsely sampled.

(1) C(22,1) + C(22,2) + ... + C(22,9) = 1,097,789

Table 5: Top-2 selected histograms at each dimension

Dim.  Selected Histograms
1D    [e1], [e2]
2D    [e2, e3], [e2, e3]
3D    [e2, e3, yaw], [e2, e3, e1]
4D    [e2, e3, Xa, e1], [e2, e3, Za, e1]
5D    [e2, e3, Xa, e1, Za], [e2, e3, Xa, e1, dist]
6D    [e2, e3, Xa, e1, Za, γ], [e2, e3, Xa, e1, dist, γ]
7D    [e2, e3, Xa, e1, Za, Ya, z], [e2, e3, Xa, e1, Za, Ya, dist]
8D    [e2, e3, Xa, e1, Za, Ya, z, dist], [e2, e3, Xa, e1, Za, e3, e1, pitch]
9D    [e2, e3, Xa, e1, Za, e3, e1, α, e2], [e2, e3, Xa, e1, Za, e3, e1, pitch, α]

Figure 4: Rank3 and Rank1 measures (maximum and average over the top 8 descriptors, plotted as number of points versus descriptor dimension) for the first (a) and the second (b) pair of satellite range images

A similar experiment was performed on a second pair of range images to confirm the discrimination power of the selected descriptors. One of the range images from the first pair was replaced with another image from the scanning sequence such that the new pair had a larger viewpoint disparity, with 20° and 10° differences in roll and yaw angles respectively. Figure 4(b) plots the Rank3 and Rank1 measures for the previously selected top performing descriptors applied for point matching across the second pair. As expected, the numbers are smaller due to the larger disparity between the viewpoints, but we notice that Rank3 is still over 50% for all 2D and higher dimensional descriptors. We also observe that the 6D histograms provide the best point matching capability.


Figure 5: Rank3, Rank1, and Rank10 for matching sparse scene points to dense model points, for the 1D through 9D PCA-based descriptors, Spin Images (S1, S2), and Point Signatures (P1, P2)

Our final experiment was point matching between the second pair of satellite range images for pose estimation. Descriptors were constructed for 2% of the points on one image and for 0.1% of the points on the second image. Due to memory limitations, the number of bins along each direction for 9D histograms was set to 2 in this experiment.

Since the LIDAR images are dense, the 2% subsampling provides a rather large set of points, and this experiment is equivalent to constructing descriptors for a dense subset of model points offline and for a sparse set of scene points online. Therefore, in this experiment we refer to the densely subsampled image as the model and to the sparsely sampled image as the scene. The subsampling rates lead to almost 1,000 descriptors on the model and 57 descriptors on the scene.

Scene descriptors were matched to model descriptors based on the selected PCA-based descriptors, as well as Spin Images and Point Signatures. Figure 5 shows the Rank1, Rank3, and Rank10 measures for the PCA-based descriptor at each dimensionality, for Spin Images of size 35×35 (S1) and size 50×50 (S2), and for Point Signatures of length 24 (P1) and length 100 (P2). The parameters for Spin Images and Point Signatures were set according to the suggested values in [8] and [3]. The sphere radius for Point Signatures was set to 10 cm. The neighbourhood radius for constructing Spin Images was set to 15 cm, and a neighbourhood radius of 4 cm was used for estimating the surface normal directions. We observe that even the low dimensional PCA-based descriptors outperform both Spin Images and Point Signatures.

To further illustrate the point matching qualities of our PCA-based histograms against Spin Images and Point Signatures, Figure 6 shows rank histograms of the top PCA-based 1D and 6D histograms, 35×35 Spin Images, and Point Signatures of length 24.

Figure 6: Rank histograms for (a) PCA-1D, (b) PCA-6D, (c) Spin Image 35×35, and (d) Point Signature 100

The rank histograms contain 20 bins, each representing 50 points. The first column therefore represents Rank50, or the number of scene points (out of 57) for which their true match on the model was ranked between 1 and 50 based on their histogram similarity. The second column represents Rank51-100, and so on. We observe that the first column contains a significantly larger number of points for the PCA-based descriptors than it does for Spin Images and Point Signatures.

Our implementations of Spin Images and Point Signatures were slightly simplified versions of the original methods. The simplifications in Spin Images include not compressing images and using histogram intersection rather than the image matching technique presented in [8]. Neither simplification should affect the point matching performance of Spin Images, since image compression is performed solely for the purpose of gaining memory efficiency, and the suggested image matching in [8] is devised for dealing with outliers, and no outliers exist in our experimental data. We use simple histogram intersection for comparing Point Signatures instead of using the shift-and-compare strategy devised in [3]. This affects the matching quality for signatures with multiple global maxima.

4 Conclusion

We have developed a large set of variable dimensional local shape descriptors for 3D point matching across range images. The descriptors are based on nine basic and several extended properties extracted from the principal component analysis of each point's neighbourhood. The extended properties are different expressions of the same information as the basic properties.


Nevertheless, different representations of the same information could lead to histograms that differ in their point matching capacity, and experimenting with various extended properties is therefore a useful exercise.

Unlike the minimalistic approach of the previously developed descriptors, our descriptors include variable dimensional histograms of up to 9 dimensions. Previous LSD approaches such as Spin Images obtain view independence by exploiting properties such as distance from the tangent plane or distance from the normal vector. They form a low dimensional array that carries some information about the distribution of these properties for some neighbourhood around each point. View independent properties used in these methods are a subset of our PCA-based properties. For instance, when a small radius is used for neighbourhood generation, the xy plane approximates the tangent plane (possibly with an offset along the z axis), and the distance from the normal vector and the distance from the tangent plane are equivalent to the Za and z position properties respectively. Therefore, our PCA-based descriptors encompass well-known existing descriptors such as Spin Images as well as a large variety of novel descriptors.

We have conjectured that since the existing descriptors were selected in an ad hoc manner, they did not necessarily contain the optimal subset of information from the vast pool of available properties, and that we could empirically determine a more optimal subset. Our experimental results confirmed that our descriptors indeed provided more reliable point correspondences than Point Signatures and Spin Images.

While our current implementation is in the Matlab environment and does not provide meaningful results regarding the processing times, we note that the amount of computation required for constructing the PCA-based descriptors is at about the same level as that of Spin Images or Point Signatures. Furthermore, the increased reliability of the point matches provided by the PCA-based descriptors could drastically reduce the RANSAC phase of a pose estimation routine and could lead to more efficient and reliable pose estimation for vision based tracking.

Our future work involves performing further experiments in registering various models with their range images. We also intend to utilize nonparametric statistical methods [5] for efficiently comparing the high dimensional descriptors in order to deal with the curse of dimensionality.

Acknowledgments

The authors would like to thank MDA Space Missions and NSERC for their support for this work.

References

[1] A. Ashbrook, R. B. Fisher, C. Robertson, and N. Werghi. Finding surface correspondence for object recognition and registration using pairwise geometric histograms. Proceedings of the 5th European Conference on Computer Vision, pages 674-680, 1998.

[2] R. Campbell and P. Flynn. A survey of free-form object representation and recognition techniques. Computer Vision and Image Understanding, pages 166-210, 2001.

[3] C. S. Chua and R. Jarvis. Point signatures: a new representation for 3D object recognition. International Journal of Computer Vision, Vol. 25, No. 1, pages 63-85, 1997.

[4] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, Vol. 24, pages 381-395, 1981.

[5] I. Fraser and M. Greenspan. Color indexing by nonparametric statistics. International Conference on Image Analysis and Recognition, 2005.

[6] M. Greenspan, L. Shang, and P. Jasiobedzki. Efficient tracking with the bounded Hough transform. Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pages 520-527, 2004.

[7] P. Jasiobedzki, M. Greenspan, and G. Roth. Pose determination and tracking for autonomous satellite capture. The 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, June 2001.

[8] A. E. Johnson and M. Hebert. Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, No. 3, pages 433-449, 1999.

[9] Y. Sun, J. A. Koschan, D. L. Page, and M. A. Abidi. Point fingerprints: A new 3-D object representation scheme. IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 33, No. 4, pages 712-717, 2003.

[10] M. J. Swain and D. H. Ballard. Color indexing. International Journal of Computer Vision, pages 11-13, 1991.

[11] G. Xiao, S. H. Ong, and K. W. C. Foong. Three-dimensional registration and recognition method for tooth crown. International Congress on Biological and Medical Engineering - The Bio-Era: New Frontiers, New Challenges, 2002.

[12] S. M. Yamany and A. A. Farag. Surface signatures: An orientation independent free-form surface representation scheme for the purpose of objects registration and matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 8, pages 1105-1120, 2002.


Development of a Scaled Ground Testbed for Lidar-based Pose Estimation

Andrew C.M. Allen, Nikolai N. Mak, Christopher S. Langley
MDA, 9445 Airport Road, Brampton, Ontario, Canada

{andrew.allen, nikolai.mak, chris.langley}@mdacorporation.com

Abstract

Lidar (Light Detection and Ranging) sensors have been used successfully to provide range and bearing for on-orbit rendezvous of spacecraft from far to medium range. At close range lidar is also capable of producing multiple returns from a single target which permits three-dimensional imaging. Lidar-based pose estimation using sparse 3-D data is a promising enhancement of the sensor for orbital rendezvous. Unfortunately, given the ranges over which a lidar system incorporating pose estimation should be tested, development of a representative ground testbed is not straightforward. A satellite model and laboratory setup have been developed to predict rendezvous lidar sensor performance during initial pose acquisition, approach and capture phases. A technique is presented for geometric and temporal scaling of the experimental setup. Results from the laboratory tests are used to validate a high-fidelity simulation of the rendezvous system. Following simulator validation, analysis and numerical simulation are used to predict the on-orbit performance of the flight system.

1. Introduction

On-orbit satellite servicing is a concept for cost-effective preservation of satellites, whereby a servicer spacecraft docks and performs resupply or maintenance functions on a second craft. One of the key technical challenges to developing any feasible satellite servicing system is the ability to execute rendezvous and docking manoeuvres either automatically (with robotic navigation and guidance) or with a human pilot. This operation, shown in Figure 1, requires a chaser spacecraft (i.e., the servicer) to successfully approach and form a mechanical interface with a target spacecraft (the satellite to be serviced).

Communication latencies can make real-time ground control of the chaser spacecraft impossible. A level of supervised autonomy is certainly possible, whereby ground controllers monitor the rendezvous status and send high level commands; but high-bandwidth control of the chaser’s trajectory usually must be performed by the on-board avionics.

Lidars have been proposed as a useful sensor for an autonomous rendezvous GN&C system. Lidars have two advantages over camera-based vision systems: they provide 3-D data at both long and short ranges, and they are much less sensitive to ambient illumination conditions. Lidars typically fulfil the requirements for range and bearing to a target. At long range, each valid lidar pulse returns the range and bearing of the target. By taking the centroid of the lidar point cloud data, range and bearing can be derived. At medium and close range, a 3-D image of the target can be formed for further estimation of relative attitude and the corresponding rates.

Figure 1 – Terminal rendezvous


However, the resolution of the sensor is such that pose estimation (i.e., position and orientation) of the target spacecraft can be determined only from the close-range point cloud data. Figure 1 shows the approximate ranges where range/bearing data and pose estimates are required.
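A minimal sketch of the centroid-based range and bearing measurement mentioned above, assuming an (N, 3) array of returns expressed in the sensor frame; the function name and angle conventions are our own, not part of the paper.

```python
import numpy as np

def range_bearing_from_cloud(points):
    """Long-range target measurement: centroid of the lidar returns,
    reported as range, azimuth and elevation in the sensor frame."""
    cx, cy, cz = points.mean(axis=0)
    rng = np.sqrt(cx**2 + cy**2 + cz**2)
    azimuth = np.arctan2(cy, cx)
    elevation = np.arcsin(cz / rng)
    return rng, azimuth, elevation
```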

In pulsed detection lidar rangefinding systems, short light pulses emitted by a laser source are directed towards a target and the light pulses scattered from the target are detected by an optical receiver. The time-of-flight is then measured by the associated measurement optics and electronics. The main functional blocks of this type of lidar-based rangefinding system are shown in Figure 3. In the Rendezvous Lidar System (RLS), the time-of-flight lidar that MDA has built for use in orbital rendezvous, the laser pulses are steered using a high-precision mirror to permit scanning of a single target; this enables the generation of 3-D range maps of the target vehicle.
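For reference, the time-of-flight principle reduces to halving the round-trip travel distance of the pulse; a one-line sketch of ours, not taken from the RLS design.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def range_from_time_of_flight(tof_seconds):
    """Pulsed lidar ranging: the pulse travels to the target and back,
    so the one-way range is half the round-trip distance."""
    return SPEED_OF_LIGHT * tof_seconds / 2.0
```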

Figure 2 – Rendezvous Lidar System (RLS)

Figure 3 – Block diagram of lidar

The mission requirement is to determine initial pose and track the relative pose changes of the target with sufficient accuracy and frequency to allow the chaser GN&C system to perform the rendezvous and docking.

Ground testing of lidar-based pose estimation systems is not straightforward due to issues of geometric and temporal scaling. This paper describes our approach in dealing with these complex issues. Section 2 gives some background information on rendezvous and spaceborne pose estimation. Section 3 outlines our approach to lidar-based pose estimation and the issues associated with scaled laboratory testing. Experimental results and discussion are given in Section 4, and conclusions follow in Section 5.

2. Background

A survey of the literature on Automatic/Autonomous Rendezvous yields a variety of papers, books, and documents from the Russian, American, European, and Japanese space programs. These include definitive references and compendiums such as Automated Rendezvous and Docking of Spacecraft [3] and An Assessment of the Technology of Automated Rendezvous and Capture in Space [12]. Much of the algorithmic treatment of rendezvous is based upon the seminal work of Clohessy and Wiltshire [2] which relies on the earlier lunar orbit studies of Hill [6]. Space-based pose estimation systems include the Space Vision System (SVS) [10], the Video Guidance Sensor (VGS) [7], and its enhanced version [4]. It is worth noting that these predecessor systems use visual targets to determine pose. The RLS is a targetless system, meaning it can be applied to target spacecraft which have not been designed to cooperate with the sensor.

Table 1 – Target performance measures

Measurement             Medium Range (~100 m)    Close Range (~20 m)
Position Accuracy       ± 100 cm                 ± 30 cm
Orientation Accuracy    ± 10°                    ± 3°
Update Rate             1 Hz                     5 Hz

3. Ground Testbed Development

3.1 Objectives

Our objective is to perform a ground demonstration of lidar-based pose acquisition and tracking for a representative target spacecraft with representative relative motion. The value of a ground test bed is in the lidar sensor performance evaluation and prediction for all phases of autonomous rendezvous and docking. The design case is a rendezvous with the Hubble Space Telescope (HST, Figure 4). The HST relative velocities considered are as high as 8 cm/s with angular rates up to 0.3 º/s. Target performance measures for the system are given in Table 1, and are appropriate for the requirements of a rendezvous mission.

In determining the requirements of the ground testbed, several issues present themselves. Naturally, experiments cannot be performed using the full-size objects (~5 metres) and full-scale distances (~200 metres) at a reasonable cost.


The use of scale modelling is appealing because 6 degree-of-freedom (DoF) trajectories can be readily actualized using manipulators with scale models of spacecraft. The question then becomes, how does one apply scaled laboratory results to predict the actual on-orbit performance of a satellite rendezvous system? As the following sections demonstrate, the answer is not as straightforward as one might think.

Figure 4 – HST frame of reference

3.2 Geometric Scaling

Scaling of linear measurements is determined by the scale of the satellite mockup. If the target object is scale 1:k, then all distances under consideration are divided by k as well. For example, in the 1:10 scale setup shown in Figure 7, a desired linear velocity of 40 mm/s is executed as 4 mm/s in the laboratory. Angles are scale-invariant, so orientations are kept the same. Angular velocities are likewise unaffected by geometric scaling.
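The geometric scaling rule can be written compactly; the following sketch is ours and the function name is hypothetical.

```python
def scale_commands_geometric(distance, linear_velocity, angular_velocity, k=10):
    """Apply 1:k geometric scaling to a commanded trajectory: distances and
    linear velocities are divided by k, while angles and angular velocities
    are scale-invariant (e.g. 40 mm/s commanded becomes 4 mm/s at 1:10)."""
    return distance / k, linear_velocity / k, angular_velocity
```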

Many ground tests would perform the previous scaling and stop there. However, the crucial and unavoidable issue is the noise inherent in the sensor. Although the size of the target object is decreased, it is still being imaged by a sensor with the same amount of noise. Put another way, this is equivalent to scanning a full scale object with k times the sensor noise.

Even though the sensor noise cannot be changed, it can at least be quantified. Range and bearing noise can be measured by scanning known targets. The bearing spatial sampling (or image pixelation) and laser spot size effects can be combined into a single metric, called the effective instantaneous field of view (EIFOV) [9]. The EIFOV is derived from an average modulation transfer function (AMTF), which is a function of the spot size and spot spacing of the lidar beam. This function quantifies the spatial frequency response of the lidar, in much the same way that the modulation transfer function describes the spatial response of a camera. Because the spot size typically varies with range, the EIFOV must be calculated based on the actual distance in the lab and then analyzed with respect to the EIFOV in the real scale geometry. Ideally, the EIFOV of the lidar in the lab and in the real range should be equivalent in order to provide the same angular resolution. As can be seen in Figure 5, the laboratory test with the scaled Ilris is expected to have a much coarser angular resolution than that of the flight sensor.

Figure 5 – AMTF curves for RLS (blue) and scaled Ilris (red) at 30 metres range

3.3 Temporal Scaling

Unlike stereocameras which acquire a single frame of data via simultaneous measurement of pixels in two camera images, a scanning lidar acquires point data sequentially. Skewing of point cloud data due to object motion during a scan can have a significant effect on the accuracy of the pose estimation technique. Because of this, it is important that the temporal motion of the object be scaled against the scan time of the sensor. The reduced size of the test object changes the frame rate of the sensor, which also causes a time scale variation.

The time scale variation can be induced by the difference in the frame rate between the laboratory sensor and the flight sensor. In the prototype experiments discussed below, a terrestrial Optech Ilris-3D was used in lieu of the RLS optical head; the Ilris is intended for terrestrial metrology and has been designed with a slower scan speed. If the time to acquire a single frame in the lab is m times longer than the expected flight scan time, the relative motion rates (both linear and angular) of the target object must be decreased by a factor of m.
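A hedged sketch of the temporal scaling rule; the function name is ours.

```python
def scale_commands_temporal(linear_velocity, angular_velocity,
                            lab_scan_time_s, flight_scan_time_s):
    """Temporal scaling: if one laboratory scan takes m times longer than a
    flight scan, both linear and angular rates are slowed by the factor m so
    that the motion-induced skew per scan matches the flight case."""
    m = lab_scan_time_s / flight_scan_time_s
    return linear_velocity / m, angular_velocity / m
```

Combined with the geometric scaling above, a laboratory linear rate is the flight rate divided by k·m, while an angular rate is divided by m only.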

Figure 6 – Side-by-side scaling comparison


Figure 7 – Laboratory setup

Top: The Hubble Space Telescope mockup mounted on the "target" manipulator. Bottom: The terrestrial lidar mounted on the "chaser" manipulator, imaging the Radarsat-2 mockup.

3.4 Testbed Setup

The testbed setup is shown in Figure 7. Scale mockups of satellites can be mounted on the end of an industrial “target” manipulator, so that their motion can be controlled. Sensor packages may be fixed (e.g., on a tripod) or mounted on a second “chaser” manipulator if real-time control is to be tested. In the experiments described below, the lidar is mounted on the chaser, but the target manipulator was able to provide all of the relative motions required.

The mockup currently used is a 1:10 scale model of the HST. The mockup was designed to be a reduced-order representation of the key geometric features of the HST. Complex objects (such as external mechanical interfaces) are modelled with primitive shapes. This not only simplified the fabrication and verification of the mockup, but allowed the creation of a precisely matching digital model based on the as-built dimensions. Having a precise model is beneficial because we are interested in seeing how well the system works with real-world lidar measurements and real-world processing restrictions, rather than investigating modelling inconsistencies.

Ground truth is established independently using a total station which has been rigidly installed in the laboratory. The lidar sensor is localized by comparing scans of fixed targets with measurements from the total station. The frame of resolution of the target robot (i.e., the frame of reference of the mockup) can be found by measuring known points on the surface of the mockup. Together with the robot's position telemetry, this measurement can be used to localize the base of the robot in the laboratory frame. Once these three transformations have been established, the ground truth pose of the mockup relative to the sensor can be calculated at any time from the robot's telemetry, with an order-of-magnitude greater accuracy than the output of the pose estimation system.
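A minimal sketch, under our own naming and the assumption of 4x4 homogeneous transforms, of composing the three calibrated transforms described above to recover the ground truth pose of the mockup in the sensor frame.

```python
import numpy as np

def ground_truth_pose(T_lab_sensor, T_lab_base, T_base_mockup):
    """Ground-truth pose of the mockup in the sensor frame, composed from:
    lab->sensor (total-station survey of the lidar), lab->robot base
    (surveyed mockup points plus telemetry), and base->mockup (telemetry).
    Each T_a_b is a 4x4 homogeneous transform mapping frame b into frame a."""
    T_sensor_lab = np.linalg.inv(T_lab_sensor)
    return T_sensor_lab @ T_lab_base @ T_base_mockup
```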

Pose tracking is performed using our hybrid implementation of the Bounded Hough Transform [5] and Iterative Closest Point [1], running on flight-equivalent avionics.

3.5 Simulation Setup

Figure 8 – Actual scan data vs. simulated scan data. The point cloud on the left was generated in the laboratory; dropped points are due to high intensity specular reflections. The data on the right was generated synthetically, using the nominal error values for the Ilris-3D lidar.

Despite having a well-characterized testbed, it is still desirable to have a means of testing the pose estimation algorithm under flight-like circumstances. To that end, we have created a VRML-based simulator which generates synthetic lidar returns (see Figure 8). Having created a digital model of the mockup and having characterized the EIFOV and range noise in the sensor, it is possible to simulate the laboratory conditions, including relative motions, and compare the results. By using empirical data to validate and tune the simulator, simulated experiments can be performed using the actual flight parameters in order to determine the expected performance of the RLS system. Currently our simulator includes the following parameters:

• Target model
• Target initial position
• Target relative velocity
• Field of view
• Scan density
• Range and bearing resolution and EIFOV (see Figure 8)
• Range measurement noise
• Angular measurement noise
• Pulse repetition frequency

Figure 9 (a-f) – Comparison of pose performance using both real and simulated data. Panels plot the 3-sigma error against range (scale m) for X, Y, Z (mm) and roll, pitch, yaw (deg), each showing the as-tested, simulation, and target curves.


4. Results and Discussion

4.1 Laboratory Results and Validation of Simulation

Figure 9 compares the pose estimation accuracy as tested in the lab with that derived for the same parameters in simulation. Experiments were performed with the target simultaneously translating and rotating. The scaled range (28 to 41 metres) is limited by the range of motion of the manipulator.

Results are stated in the model frame of reference. Dotted lines show the required performance from Table 1.

The results are quite similar between the lab test and the simulation, with the possible exception of X and roll. The most difficult degree of freedom to localize is roll, which corresponds to rotation about the long axis of the satellite (see Figure 4). Based on these results, the simulator has been validated for predicting the performance of the flight system under similar conditions.

4.2 Flight Simulation Results

Figure 10 shows the expected flight performance based on simulated data. Results are stated in the sensor frame; the HST roll axis was aligned with the yaw axis of the lidar. The advantage of the simulation is clearly demonstrated by the extent of ranges which can be tested.


Figure 10 – Expected flight performance. Three-sigma error vs. range (m) for position (X, Y, Z, in mm) and orientation (roll, pitch, yaw, in deg), each shown against the target requirement.

These results imply that the RLS system will be capable of producing pose estimates within the required accuracy; this information would not be attainable from ground tests without the scaling approach outlined in Section 3.

5. Conclusions

The detailed scaling approach taken herein allows us to perform scaled lidar-based pose estimation experiments using representative targets and relative motions. These results have been used to validate a simulation of the flight system and make predictions of its performance under circumstances which cannot be tested on the ground.

References

[1] Besl PJ, McKay ND, “A Method for Registration of 3-D Shapes”, IEEE PAMI, 14(2): 239-256 (1992)

[2] Clohessy WH, Wiltshire RS, “Terminal Guidance System for Satellite Rendezvous”, J. Aerosp. Sci., 27(5): 653-658 (1960)

[3] Fehse W, Automated Rendezvous And Docking Of Spacecraft, Cambridge: Cambridge UP (2003)

[4] Granade SR, “Advanced Video Guidance Sensor and Next-Generation Autonomous Docking Sensors”, Proc. SPIE, 5418: 38-49 (2004)

[5] Greenspan M, Shang L, Jasiobedzki P, “Efficient Tracking with the Bounded Hough Transform”, Proc. IEEE CVPR, 1: I520-I527 (2004)

[6] Hill GW, “Researches in Lunar Theory”, Amer. J. Math., 1:5-26 (1878)

[7] Howard RT, Bryan TC, Book ML, “Video Guidance Sensor Flight Experiment Results”, Proc. SPIE, 3380: 315-326 (1998)

[8] Jasiobedzki P, Greenspan M, Roth G, Ng HK, Witcomb N, “Video-based System for Satellite Proximity Operations”, Proc. ASTRA (2002)

[9] Lichti DD, “New Angular Resolution Measure for 3-D Laser Scanners”, SPIE Optical Engineering, 43(10): 2218-2219 (2004)

[10] MacLean SG, Pinkney HFL, “Machine Vision in Space”, Canadian Aeronaut. Space J., 39(2): 63-77 (1993)

[11] Martin E, Maharaj D, Richards R, Tripp JW, Bolger J, King D, “RELAVIS: The Development of a 4D Laser Vision System for Spacecraft Rendezvous and Docking Operations”, Proc. SPIE, 5418: 69-80 (2004)
[12] Polites ME, An Assessment of the Technology of Automated Rendezvous and Capture in Space, Marshall Space Flight Center: NASA/TP-1998-208528 (1998)


Scalable real-time vision-based SLAM for planetary rovers

Robert Sim, Matt Griffin, Alex Shyr, and James J. Little
Laboratory for Computational Intelligence
University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
{simra,mgriffin,shyr,little}@cs.ubc.ca

Abstract

Simultaneous localization and mapping is an important problem for autonomous planetary rovers and other space vehicles. While many authors have addressed the SLAM problem, few have done so in the context of producing large-scale maps in real time using vision. This paper is concerned primarily with the issues presented by the large numbers of candidate features obtained from vision sensors, and the implications for data association. We present a Rao-Blackwellised particle filter (RBPF) SLAM implementation that employs a stereo camera and the SIFT feature detector, and demonstrate that we can build maps of unknown environments with an extensive number of visual landmarks.

1 Introduction

Autonomous planetary rovers that rely on vision sensing require the ability to construct dense visual representations of their environment for the purposes of both navigation and data collection. A central problem in constructing these representations is that as a mobile rover explores, it accumulates error in its pose estimate, and subsequently its acquired map becomes inaccurate. This problem, generally referred to as simultaneous localization and mapping, or SLAM, has been widely studied and a variety of solutions have been proposed (for example [3, 5, 8, 9]). However, there are a limited number of vision-based solutions that address real-time mapping, and representations that can scale up to thousands of mapped features. This paper presents an approach and experimental results for achieving SLAM solutions in real time over long trajectories (73 m or more), resulting in maps consisting of many thousands of landmarks.

Our approach to solving the SLAM problem with a vision sensor is to combine a Rao-Blackwellised particle filter (RBPF)-based approach to mapping [8] with the efficient data structures developed by Montemerlo et al. for representing a distribution over maps (referred to as FastSLAM [7]), and with fast data association techniques for matching the relatively unambiguous feature descriptors obtained using the SIFT feature detector [6].

Figure 1. Left: A rendering of the map from a sensor's-eye view. Right: an image of the lab from a nearby position.

RBPF-based SLAM solutions operate by maintaining multiple map hypotheses, each associated with a stochastically sampled trajectory through the environment. The complete set of sampled trajectories and inferred maps approximates the probability distribution of maps conditioned on the vehicle's actions and observations, p(M | A, Z), where M = {m_1, m_2, . . . , m_n} is the set of maps, each consisting of a set of probability distributions describing landmark positions, A = {u_1, u_2, . . . , u_m} are the control inputs to the vehicle (that is, the vehicle's actions), and Z = {z_1, z_2, . . . , z_m} are the vehicle's observations of the world (for brevity, we assume actions and observations are interleaved). One of the important contributions of the FastSLAM algorithm is the data structure it employs to share information between trajectory samples with common history. This facilitates real-time performance of the algorithm as the trajectory length grows.

As an exploratory vehicle moves through the environment, the number of landmarks in its map can grow to number in the hundreds of thousands. This is especially true for vision-based mapping, where feature detectors might typically return 500 feature observations in a single image. This poses a difficult problem for data association, where a single feature observation might require comparison with all of the landmarks in the map. Such an extensive comparison might be required when extracted features are generic, without any uniquely defining characteristics (such as those typically employed in mapping algorithms employing laser range sensors). Furthermore, the computed data association is rarely unique, and often highly ambiguous. While FastSLAM allows for multiple data association hypotheses, these can reduce the robustness of the particle filter and potentially lead to sample starvation.

In vision, however, there is usually a great deal of contextual information associated with a feature that can constrain data association and reduce the cost of matching. In our work, we employ the SIFT feature descriptor, which provides descriptions of feature observations that have been shown to be very robust for feature correspondence. In addition, we apply a kd-tree over the space of SIFT features to facilitate approximate nearest-neighbor lookups in time logarithmic in the number of visually distinct landmarks.

The main contributions of this paper are two-fold. First, we present an implementation of FastSLAM which is based on vision-based sensing, rather than traditional range sensing with a laser. Second, we present methods for performing rapid data association of hundreds of landmark observations in a single image against a database of tens of thousands of mapped landmarks. These results leverage the strengths of particle filter-based approaches for uncertainty estimation (such as the possibility of multi-modal and non-Gaussian estimates), with data association techniques that were previously only applied to Kalman filter-based estimators (for example, [9]). Furthermore, where previous implementations of the FastSLAM algorithm have generally employed sensors with a wide field of view, our experimentation demonstrates the performance of the algorithm using sensors with a comparatively narrow field of view. Finally, we demonstrate experimentally that robust SLAM solutions can be achieved in real time over long trajectories (more than 70 m).

The remainder of this paper is structured to provide coverage of the strengths and weaknesses of current methods, elaborate on the details of our implementation, present and discuss experimental results, and finally discuss planned improvements.

2 Related Work

There is a significant body of literature on SLAM using the Extended Kalman Filter and its inverse, the Extended Information Filter [3, 5, 12]. These approaches model the posterior distribution over maps as a unimodal Gaussian distribution. Of particular interest is the view-based approach of Eustice et al. [3], which enables constant-time filter updating without significant sparsification approximations. However, a significant difficulty with a view-based approach is that the resulting map does not lend itself well to evaluation or human inspection, a strong prerequisite for an exploratory vehicle.

Two map representations are popular in the literature, landmark-based [7, 9, 11] and occupancy grid-based [2, 4]. Occupancy grids are effective for dense but ambiguous information, while landmarks are more suited to sparse but distinguishable features. Very impressive occupancy grids have been produced online by recent scan matching techniques which also use particle filters for pose estimation [2, 4]. Landmark- and vision-based approaches have also performed well in the past, as in [9]. In the latter case, a reasonably small environment was successfully mapped by using a Kalman Filter and assuming independence between landmark and pose estimates. For large environments, this approach is likely to be overconfident and lead to filter divergence.

In related work, we have also applied our approach to mapping where control and odometry information is unknown. In Sim et al. [10], we demonstrated an approach to solving the SLAM problem using the method outlined here, coupled with a visual odometry estimate for motion estimation.

3 Simultaneous Localization and Mapping

This paper represents map estimation as the evolution of a Rao-Blackwellised particle filter [8]. In this context, the trajectory and landmark distribution is modeled as a dynamic Bayes network, where trajectories are instantiated as samples, and the landmark distribution can be expressed analytically for each trajectory. At time t, let s_t denote the vehicle pose, m_t the map learned thus far, and x_t = {s_t, m_t} be the complete state. Also, let u_t denote a control signal or a measurement of the vehicle's motion from time t−1 to time t, and let z_t be the current observation. The sets of observations and controls from time 0 to t are denoted z^t and u^t, respectively. Our goal is to estimate the density

p(s_t, m_t | z^t, u^t) = p(x_t | z^t, u^t)    (1)

It has been demonstrated elsewhere that p(s_t, m_t | z^t, u^t) can be approximated by factoring the distribution in terms of sampled trajectories s^t and independent landmark distributions conditioned on the sampled trajectories [8]:

p(s_t, m_t | z^t, u^t) ≈ p(s^t | z^t, u^t) ∏_k p(m(k) | s^t, z^t, u^t)    (2)

where m(k) denotes the k-th landmark in the map. That is, we instantiate a set of samples s^t, propagate them according to u_t, and construct maps for each according to z_t.
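Read as an algorithm, the factorization in Eq. (2) says: sample trajectories, and give each sample its own set of independent landmark filters. The following minimal Python sketch (ours, not the authors' code) illustrates that structure with a simplified identity observation model; the class and function names are assumptions for illustration only.

```python
import numpy as np

class Particle:
    """One sampled trajectory with its own landmark Gaussians (Eq. 2)."""
    def __init__(self):
        self.pose = np.zeros(3)      # [x, y, theta]
        self.landmarks = {}          # landmark id -> (mean, covariance)
        self.weight = 1.0

def propagate(particle, u, motion_noise=0.05):
    """Sample from the motion model p(s_t | s_{t-1}, u_t)."""
    particle.pose = particle.pose + u + np.random.normal(0.0, motion_noise, 3)

def update_landmark(particle, lm_id, z, R):
    """Per-landmark Kalman update conditioned on this particle's trajectory.
    An identity observation model is assumed here for brevity."""
    if lm_id not in particle.landmarks:
        particle.landmarks[lm_id] = (np.array(z, dtype=float), np.array(R, dtype=float))
        return
    mu, P = particle.landmarks[lm_id]
    S = P + R                                   # innovation covariance
    dz = z - mu
    particle.weight *= float(np.exp(-0.5 * dz @ np.linalg.solve(S, dz)))
    K = P @ np.linalg.inv(S)                    # Kalman gain
    particle.landmarks[lm_id] = (mu + K @ dz, (np.eye(len(mu)) - K) @ P)
```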


A simplistic approach to running an RBPF for SLAM would yield an update complexity of O(MN), where M is the number of particles at each step and N is the number of landmarks. However, Montemerlo et al. introduced in their FastSLAM work a tree-based structure which refines this complexity to O(M log N) by sharing landmark estimates between samples [7]. Each sample in the filter will share unaltered landmark estimates with its ancestor particles (those landmarks that have not been observed from the time of the ancestor to the present). Each landmark observation results in a landmark being copied from its parent and updated, but the rest of the tree remains the same.

3.1 Data Association

In an unmodified natural environment, landmarks are difficult to uniquely identify. This problem is known as data association or correspondence, and incorrectly matching observations to landmarks can lead to inconsistencies in a map. Stereo vision can quickly provide 3D information, and when coupled with a scale-invariant feature transform (SIFT) detector [6] it can provide distinct landmarks. SIFT features are desirable as landmarks because they are somewhat invariant to image scale, rotation and translation, as well as to illumination changes and affine or 3D projection. This combination can result in many viable landmarks from an unaltered environment.

4 Implementation

4.1 State Representation

We describe samples of the vehicle's pose with the vector s_t = [x, y, θ], situated in a plane (our current work is aimed at extending these results to a full 6-DOF motion model).

At each time step, the N pose samples are propagated according to the motion model p(s_t | s_{t−1}, u_t), which is user-defined. Over time the distribution of samples can become non-Gaussian, and even multi-modal. The noise model we apply is designed to take a conservative approach to estimating the possible drift in the robot's pose over time, while keeping in mind that noisier models require more particles to prevent starvation as the underlying pose distribution disperses.

The specific action sequence is dependent on the robot's exploration policy. For this paper, we drive the robot by hand, and infer the robot's actions from odometry measurements between observations. For each action the filter produces a new generation of particles that inherit properties from the previous generation. The set of hypothetical trajectories executed by the robot is represented by this tree of particles. After taking an observation (described in the next section), each particle in the current generation of particles is weighted according to the probability of the current observation z_t, conditioned on that particle's trajectory:

w_i = p(z_t | s_{i,t}, m_{i,t})    (3)
    = k exp(−0.5 Δz^T Σ^{−1} Δz)    (4)

where Δz = h(s_{i,t}) − z_t, h(·) is a generative model of the observations as a function of pose, and Σ is the sum of the observation covariance and the observed landmark covariance. The particle is weighted by how consistent the current observation is with the map constructed from that particle's trajectory. The weights are subsequently normalized across the population of samples, and then sampled probabilistically with replacement to produce the next generation of particles. Should any particle not be chosen for advancement, it is pruned, and all particles with no children are then recursively removed from the tree. If an insufficient number of particles is used, or resampling takes place too frequently, this can lead to starvation as hypotheses are pruned.
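A minimal sketch of the weighting and resampling step just described (Eq. 4 plus sampling with replacement), reusing the Particle sketch above; the function names are ours, and in a full FastSLAM implementation the surviving particles would share map structure rather than being reused directly.

```python
import numpy as np

def observation_weight(dz, Sigma, k=1.0):
    """Eq. (4): w = k * exp(-0.5 * dz^T Sigma^{-1} dz)."""
    return k * float(np.exp(-0.5 * dz @ np.linalg.solve(Sigma, dz)))

def resample(particles, rng=None):
    """Normalize weights and sample with replacement; particles that are
    not chosen are implicitly pruned."""
    rng = rng or np.random.default_rng()
    w = np.array([p.weight for p in particles], dtype=float)
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), replace=True, p=w)
    return [particles[i] for i in idx]
```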

4.2 Sensing and Data Association

Figure 2. Each particle has an associated map, organized by SIFT descriptor. Similarly, each SIFT descriptor might have multiple landmark estimates, each spatially distinct.

We employ a data structure similar to that described in [7] as a map representation. Each particle has associated with it a set of landmark estimates, described by Gaussian distributions. However, in the vision-based case, we take advantage of the descriptive power of the SIFT transform (described below), enabling us to improve the quality of data association. In this formulation, each particle maintains a list of SIFT IDs, and these IDs in turn point to a linked list of one or more 3D landmark estimates (Figure 2). Note that one SIFT ID can point to multiple landmarks: landmarks that have similar appearance but are spatially distinct.
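One way to realize this structure is a copy-on-write map keyed by SIFT ID, so that unobserved entries are shared with the parent particle in the spirit of the FastSLAM tree. The following sketch is illustrative only; the class and method names are not taken from the paper.

```python
class ParticleMap:
    """Per-particle map: SIFT ID -> list of spatially distinct 3D landmark Gaussians.
    Entries that have not been modified are shared with the parent particle."""
    def __init__(self, parent=None):
        self.parent = parent
        self.local = {}                      # entries this particle has modified

    def get(self, sift_id):
        if sift_id in self.local:
            return self.local[sift_id]
        if self.parent is not None:
            return self.parent.get(sift_id)
        return []

    def update(self, sift_id, landmark):
        # Copy the inherited list before modifying, so sibling particles are unaffected.
        entries = list(self.get(sift_id))
        entries.append(landmark)             # or replace the matched estimate in place
        self.local[sift_id] = entries
```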


We are using a Point Grey Research BumbleBee stereo camera as our primary sensor, and extract SIFT features using a difference-of-Gaussian detector [6]. The features' partial invariance to image scale, rotation, translation and 3D or affine projection is what makes them desirable landmarks. Each SIFT feature has a 36-dimension identifier, or key, associated with it, and matching is based on finding a suitably distinct match. We perform a linear search of the keys in the left image for each key in the right. The two keys with the smallest Euclidean distances from our target key are found, and if the ratio of the best and second-best distances is below a set threshold (currently 0.6) it is considered a good match. That is, for keys k_l1, k_l2 and k_r in the left and right images, according to subscripts, a successful match of k_l1 to k_r satisfies the property

‖k_l1 − k_r‖ / ‖k_l2 − k_r‖ < 0.6    (5)

Once a 3D feature is extracted from the stereo pair, we determine if this feature corresponds to one we have seen before. Our approach to data association is depicted in Figure 3. To efficiently store and access what can quickly become a large number of keys, we use a kd-tree. The kd-tree facilitates nearest-neighbor matching in time logarithmic in the size of the tree, and has been demonstrated to be reliable for object recognition tasks [1]. The disadvantage of using a kd-tree is that it can sometimes produce not the nearest match but a close match. We maintain a single tree for the sensor and associate an arbitrary integer ID with each SIFT identifier we add. New keys are considered to be candidate keys and are not passed as an observation to the particle filter until they have been observed for a sufficient number of frames. Since we do not currently use negative information to remove erroneous landmarks from the maps, this is an effort to limit their number.

Each particle's map is indexed by a set of IDs associated with SIFT descriptors, and each node contains a linked list of landmarks sharing that descriptor. Multiple data associations can be entertained by the filter because each particle determines the specific landmark to which an observation corresponds. The number of landmarks associated with an ID is typically quite small, as shown by Table 1. A particle's weight is updated for a given landmark observation according to Equation 4 by first selecting, from the linked list for the matched landmark ID, the landmark estimate that is closest to the observed point in the global frame of reference. The maximum distance threshold for this comparison is based on an approximation of the camera's error, and if multiple landmarks fall within this range the closest is chosen. Clearly, since the filter is initiated without a map, any observation with an unknown ID or a 3D position which does not match is treated as a new landmark.
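The distance-ratio test of Eq. (5) and the nearest-neighbour lookup are easy to prototype. The sketch below uses SciPy's kd-tree for the two-nearest-neighbour query and the 0.6 threshold from the text; note that the paper describes a linear search for stereo matching and reserves the kd-tree for the landmark database, and the function names here are ours.

```python
import numpy as np
from scipy.spatial import cKDTree

RATIO = 0.6  # distance-ratio threshold used in Eq. (5)

def match_ratio_test(left_keys, right_keys):
    """Match SIFT keys between images with the ratio test of Eq. (5).
    Returns (left_index, right_index) pairs; a sketch, not the authors' code."""
    tree = cKDTree(np.asarray(right_keys))
    matches = []
    for i, key in enumerate(np.asarray(left_keys)):
        (d1, d2), (j1, _) = tree.query(key, k=2)   # two nearest right-image keys
        if d2 > 0 and d1 / d2 < RATIO:             # best match suitably distinct
            matches.append((i, j1))
    return matches
```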

In the following section we describe our experimental results.

5 Experimental Results

For the purposes of our experiments, we used an RWI B14 robot with a BumbleBee stereo head from Point Grey Research. The robot was driven through a laboratory environment, and it collected 5000 images along a trajectory of approximately 74 m. We ran the system using 100 particles, which enabled a frame rate of approximately 2.1 Hz (Figure 5). Table 1 describes, at approximately 1000-time-step intervals, the average number of landmarks associated with each map, the total distance traveled, the total number of SIFT IDs, the time step in history at which the filter converges to a single map, and the total number of landmark instances in the system (these can outnumber the product of samples and mean landmarks, as many instances are not promoted to full landmarks until they have been observed at least 3 times).

Table 1. Map summary (see text for details).

Time   Avg. LMs per sample   Dist. traveled (m)   Tracked SIFT features   Filter convergence   Total LMs
5043   31577                 72.64                29462                   4847                 108636
4009   25300                 59.66                23766                   4000                 84146
2987   18830                 42.02                17826                   2889                 70369
2043   12840                 27.05                12338                   1989                 60871
1021   6083                  14.04                5922                    919                  23688

Figure 4 depicts the map constructed for the maximum-likelihood particle at the end of exploration, beside a map computed using dead reckoning alone. The filter-based map is clearly more accurate in that it correctly captures the rectilinear alignment of the three rooms traversed by the robot.

6 Discussion

Among the key observations from our experiments is that we are able to successfully map a large environment in real time. At the end of map construction, we are matching 29,462 SIFT features, and each map consists of more than 31,000 landmarks, with a total of only 109,000 landmarks shared across all the maps.


[Figure 3 (summary of the illustrated example): The robot takes a series of relative measurements; the sensor measurements are passed to the individual particles, which find a matching ID for each in the kd-tree. The conditioned observations are then compared against the existing entries in the particle's map with the same ID; each landmark entry's associated uncertainty (illustrated as an ellipse) is used to determine the likelihood of a match. In the example, ID 8 is a new ID, unknown to all particles, and is automatically initialized as a new entry, whereas ID 4 has three previous entries associated with it in both maps (other particles might have more or fewer). Particle 1 matches three of the four observations, and the matches increase its weight by an amount determined by the quality of each match. Particle 2 matches only one of the observations and creates three new entries, so it will likely be weighted much lower. The quality of the data association is thus assessed by the weighting step of the particle filter, and particles with low weights will tend to be pruned.]

Figure 3. SIFT-based data association.

Figure 4. Left: The constructed map for the best sample at the end of exploration. Yellow: maximum-weight trajectory. Pink: dead reckoning trajectory. Grid lines indicate 25 cm intervals. Right: the constructed map using dead reckoning.


Figure 5. Per-frame computation time in milliseconds (processing time per frame versus frame number). The mean frame rate is 2.1 Hz on a 3.2 GHz Intel Xeon processor.

As the maps grow in size, only a small slow-down in computation time is exhibited. More importantly, relative to dead reckoning, we have demonstrated that our resulting map is more accurate. This can be observed in the fact that the three rooms observed by the robot are correctly aligned, as well as the fact that when the robot returns to the first room it correctly locates the position of the door into the third room (near the center of the constructed map).

7 Conclusion

We have presented an implementation of an RBPF-based SLAM algorithm using a vision-based sensor. The key goal of our work is to facilitate scalable maps that incorporate large numbers of visual landmarks (on the order of hundreds of thousands). The primary contributions of this work are the facilitation of vision-based sensing in association with a particle filter that supports multiple data associations. We have also experimentally demonstrated the success of the system on a real robot.

Among the goals of our work are the construction of visual representations of very large environments (on the order of 100 m in diameter), in particular outdoor environments. We also hope to extend this work to full 6-DOF representations, which will be better suited to space vehicles and exploration over rough terrain. We believe that the data structures developed in this work can facilitate the construction of maps of this size, and our ongoing goal is to accomplish this task experimentally. One outstanding question for our work is how to reliably cull SIFT features that are rarely observed; these features tend to clutter the kd-tree, and deletion requires costly re-balancing of the tree. Another open question is whether loop closure can be successful in very large environments, and how many particles might be required to successfully close the loop. Finally, we are interested in developing approaches to autonomous exploration based on particle-based representations of the environment.

Acknowledgements

The authors wish to thank Pantelis Elinas for providing assistance with key matching and stereo correspondence. This work was supported in part by the Canadian Space Agency/NSERC Postdoctoral Fellowship, and NSERC undergraduate summer research assistantships.

References

[1] J. S. Beis and D. G. Lowe. Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pages 1000–1006, Puerto Rico, June 1997. IEEE Press.

[2] A. I. Eliazar and R. Parr. DP-SLAM 2.0. In Proc. ICRA 2004, New Orleans, LA, 2004. IEEE Press.

[3] R. Eustice, H. Singh, and J. Leonard. Exactly sparse delayed-state filters. In Proc. of the 2005 IEEE Int. Conf. on Robotics and Automation, pages 2428–2435, Barcelona, Spain, April 2005.

[4] D. Hahnel, D. Fox, W. Burgard, and S. Thrun. A highly efficient FastSLAM algorithm for generating cyclic maps of large-scale environments from raw laser range measurements. In Proc. Conf. Intelligent Robots and Systems (IROS), 2003.

[5] J. J. Leonard and H. F. Durrant-Whyte. Simultaneous map building and localization for an autonomous mobile robot. In Proc. of the IEEE Int. Workshop on Intelligent Robots and Systems, pages 1442–1447, Osaka, Japan, November 1991.

[6] D. G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the Int. Conf. on Computer Vision, pages 1150–1157, Corfu, Greece, September 1999. IEEE Press.

[7] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM: A factored solution to the simultaneous localization and mapping problem. In Proceedings of the AAAI National Conf. on Artificial Intelligence, Edmonton, Canada, 2002. AAAI.

[8] K. Murphy. Bayesian map learning in dynamic environments. In 1999 Neural Information Processing Systems, 1999.

[9] S. Se, D. G. Lowe, and J. Little. Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks. Int. J. Robotics Research, 21(8):735–758, 2002.

[10] R. Sim, P. Elinas, M. Griffin, and J. J. Little. Vision-based SLAM using the Rao-Blackwellised particle filter. In Proceedings of the IJCAI Workshop on Reasoning with Uncertainty in Robotics, Edinburgh, Scotland, 2005.

[11] R. Smith, M. Self, and P. Cheeseman. Estimating uncertain spatial relationships in robotics. In Autonomous Robot Vehicles, I. J. Cox and G. T. Wilfong, eds., pages 167–193, Springer-Verlag, 1990.

[12] S. Thrun, Y. Liu, D. Koller, A. Ng, Z. Ghahramani, and H. Durrant-Whyte. Simultaneous localization and mapping with sparse extended information filters. Int. J. Robotics Research, 23(7-8):693–716, 2004.


Estimation of the aerodynamical parameters of an experimental airship

Diego Patino (1,2), Leonardo Solaque (1,2)
(1) LAAS - CNRS, 7 Avenue du Colonel Roche, 31077 Toulouse Cedex 4, France
Email: [email protected], [email protected]

Simon Lacroix (1), Alain Gauthier (2)
(2) Andes University, Mathematics and Electronic Department, Carrera 1a No. 18A-10, Bogota, Colombia
Email: [email protected], [email protected]

Abstract— In this article we show the results of a method to identify the aerodynamic parameters of the model of a blimp in cruise flight conditions. The proposed model can be used later on for the design of control laws. The approach consists in analysing data from various flights: it relies on the use of Kalman filtering methods to minimize a cost function in a computationally efficient way. The performance of the approach is remarkable, as shown by a comparison with results obtained in wind tunnel experiments. It can therefore provide a quite precise estimation of the dynamics of a blimp, including its aerodynamic parameters, avoiding costs since we do not use wind tunnel measurements.

Index terms—airship, identification, Kalman filter, state space model.

I. INTRODUCTION

In this work we present the parameter identification of a dynamic model for the robot KARMA [6], [9]. Its flight is strongly affected by nonlinear and meteorological phenomena such as wind currents and sudden pressure changes. This complex situation is a good test of our method, and it is also a good source of interesting applications.

Usually, scientists describe the behavior of this kind of aircraft by making thorough measurements of the dynamics of a scale model in the wind tunnel [10]. The methods and techniques proposed in this work will provide us with a better description of the real system. Indeed, we present a general computational scheme to simulate the flight of the dirigible under different atmospheric conditions.

When we face the problem of describing non-linear dynamical systems, we can use a technique based on parameter estimation. It consists in determining the coefficients of the system by fitting time series data obtained from measurements of the dynamical parameters of the body. Usually, this is done at many time periods all along several real flights. This methodology is particularly useful for the analysis of the dynamics of non-linear systems. Furthermore, it also allows us to design proper control laws according to the operational aims of the robot [4].

On the other hand, the performance and reliability of the model are strongly based on the quality of the available observations from the real robot and its scale models. We must face here the particular problem of data quality and the presence of noise. Therefore, the method used is well adapted to accomplish a particularly hard task: parameter estimation from noisy and corrupted signals.

By doing a good computational implementation of the parameter estimation, we can simulate several flight conditions that can be used as a substitute for the standard wind tunnel measurements. The most important aim of this work is to determine the dynamical parameters of the robot.

To describe the dynamics of an airship such as KARMA, we must measure the following variables (these quantities are measured using an electronic compass and a Differential Global Positioning System, DGPS):

• the roll (φ), the angle of rotation around the X axis,
• the pitch (θ), the angle of rotation around the Y axis,
• the yaw (ψ), the angle of rotation around the Z axis,
• the position in 3D space (η1 = [x y z]^T),
• the velocity in 3D space (V = [v_x v_y v_z]^T).

All this data is registered with respect to the reference frame ENU (East, North, Up). After the signals have been registered, we focus on the data quality problem, because all real sensors have a limited accuracy, which makes our task harder. Many algorithms for parameter estimation of dynamical systems have been proposed recently [2], [3], [4]. However, most of them are applicable only to limited cases such as models with weak non-linearities. On the other hand, previous proposals cannot deal with the problem of noise; only a few of them work well under reduced amounts of noise.

Because we must pay special attention to the suppression of noise and the non-linear nature of the system, we propose to use Kalman filter realizations. This filtering procedure is based on the minimization of the quadratic error between the model and the observed data, cf. [7]. Among the different possibilities offered by Kalman filtering procedures, we have selected two: the classical Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF). The first one was proposed in [8], [12]; it is based on the linearization about one particular trajectory. The second one has a better performance even under chaotic regimes [1], [11]. As we explained before, the estimation of aerodynamic parameters of meteorological dirigibles is an interesting source of applications for the methods and procedures proposed in this work. Moreover, testing on scale models does not provide us with satisfactory answers about the dynamic behavior of the real system. This is owed mostly to the huge amounts of data that practitioners must store and analyze. In this work we will show how to



apply Kalman filtering to do realistic and cheap computational simulations of the dynamical behavior of blimps used in surveying.

The present paper is organized as follows. In Section 2, we give a short introduction to the mathematical modelling of the dynamics of the dirigible by using the physical laws of dynamical systems in a general way. In Section 3 we introduce the essentials of Kalman filtering in signal processing. Section 4 is devoted to the applications of the methods that we have devised; we also describe the dynamics of a particular airship, KARMA. In Section 5 we give some concluding remarks and present our future research work.

II. DYNAMICAL MODEL

In this section, we present the dynamical model of the airship in a brief way. The equations are quite complex and, since they are not the main topic of this paper, we will not dwell on them here. For detailed information, cf. [10], [9], [6].

To describe the dynamics of the blimp, we consider the following assumptions:

• the dirigible is a solid body,
• the total mass of the dirigible is constant,
• the center of the Archimedes force is exactly the same as the volumetric center (C) of the dirigible envelope,

along with Newton's classical laws of mechanics and aerodynamics. Next, we define three reference frames: R0 as the global frame fixed to the earth, Rd as the local frame fixed to C, and Ra as the aeronautic frame fixed to C. This makes the description of the navigation of the blimp easier. Figure 2 shows the chosen reference frames.

Fig. 1. Robot KARMA

Fig. 2. The different frames

Now, the dynamical model of our blimp is given by:

M_d ν̇ = −T_d(ν) − T_a(ν) − g(η) + T_p(F_m, µ)    (1)

where:
• ν = [ν1, ν2]^T = [u, v, w, p, q, r]^T is the state vector composed of the translation and rotation velocities in the local frame Rd;
• η = [η1, η2]^T = [x, y, z, φ, θ, ψ]^T is the state vector of the position and angles in the global frame R0;
• M_d is the matrix composed of masses and moments of inertia;
• the vector T_d(ν) is composed of the Coriolis and centrifugal forces;
• g(η) contains the Archimedes force and the body weight;
• T_p(F_m, µ) is the propulsion force vector;
• T_a(ν) is the aerodynamical vector that contains the aerodynamical forces.

The vector T_a(ν) is formulated as:

T_a(ν) = A ν + D1(ν2) ν + T_sta    (2)

where A is the matrix of added masses, D1(ν2) represents the "translation-rotation" and "rotation-rotation" phenomena from Bryson's theory [14], and T_sta contains the stationary efforts and moments at C of the dirigible.

The control variables of the airship are: the vectorization angle of the motors µ, the elevation angle δe, the angle for rotations δg, and the torque generated by the motors Fm.
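As a concrete reading of Eq. (1), the sketch below solves for the body-frame accelerations given user-supplied implementations of each term. The `model` object, its attribute names, and the explicit Euler step are our assumptions, not the KARMA code; how T_a depends on the control surfaces is detailed by Eqs. (2) and (6) below.

```python
import numpy as np

def nu_dot(nu, eta, Fm, mu, model):
    """Right-hand side of Eq. (1): M_d * d(nu)/dt = -T_d(nu) - T_a(nu) - g(eta) + T_p(Fm, mu).
    'model' bundles placeholder callables T_d, T_a, g, T_p and the matrix M_d."""
    rhs = -model.T_d(nu) - model.T_a(nu) - model.g(eta) + model.T_p(Fm, mu)
    return np.linalg.solve(model.M_d, rhs)

def euler_step(nu, eta, Fm, mu, model, eta_rate, dt=0.1):
    """One explicit Euler step (the simulated test in Section IV uses dT = 0.1).
    eta_rate(eta, nu) is the kinematic transform of Eq. (5) below."""
    return nu + dt * nu_dot(nu, eta, Fm, mu, model), eta + dt * eta_rate(eta, nu)
```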

We clarify that the measurements of the sensors are in the frame R0. Thus, we have to consider the transformation matrices J1(η2) and J2(η2) between the local frame Rd and the global frame R0:

J1(η2) =
[ cosψ cosθ    cosψ sinθ sinφ − sinψ cosφ    cosψ sinθ cosφ + sinψ sinφ ]
[ sinψ cosθ    sinψ sinθ sinφ + cosψ cosφ    sinψ sinθ cosφ − cosψ sinφ ]    (3)
[ −sinθ        cosθ sinφ                     cosθ cosφ ]

J2(η2) =
[ 1    sinφ tanθ    cosφ tanθ ]
[ 0    cosφ         −sinφ     ]    (4)
[ 0    sinφ/cosθ    cosφ/cosθ ]

so that

η̇ = [ J1(η2)   0_{3×3} ] ν    (5)
    [ 0_{3×3}  J2(η2)  ]
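For illustration, the transform of Eqs. (3)-(5) can be coded directly; the following sketch uses the standard ZYX Euler-angle convention implied by the matrices above, and the function names are ours.

```python
import numpy as np

def J1(phi, theta, psi):
    """Rotation from the local frame Rd to the global frame R0 (Eq. 3)."""
    cph, sph = np.cos(phi), np.sin(phi)
    cth, sth = np.cos(theta), np.sin(theta)
    cps, sps = np.cos(psi), np.sin(psi)
    return np.array([
        [cps*cth, cps*sth*sph - sps*cph, cps*sth*cph + sps*sph],
        [sps*cth, sps*sth*sph + cps*cph, sps*sth*cph - cps*sph],
        [-sth,    cth*sph,               cth*cph],
    ])

def J2(phi, theta):
    """Euler-angle rate transform (Eq. 4)."""
    cph, sph = np.cos(phi), np.sin(phi)
    cth, tth = np.cos(theta), np.tan(theta)
    return np.array([
        [1.0, sph*tth, cph*tth],
        [0.0, cph,     -sph],
        [0.0, sph/cth, cph/cth],
    ])

def eta_dot(eta, nu):
    """Eq. (5): map body-frame velocities nu to global position/attitude rates."""
    phi, theta, psi = eta[3:]
    J = np.block([[J1(phi, theta, psi), np.zeros((3, 3))],
                  [np.zeros((3, 3)),    J2(phi, theta)]])
    return J @ nu
```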

Usually, the expression of T_sta is cumbersome [10]. After several assumptions about the system, we can simplify T_sta and still obtain a general representation² which includes the aerodynamic phenomena. This is:

T_sta = [ K1(c_T1 + c_T2 α + c_T3 δe)
          K1(c_L1 + c_L2 β + c_L3 δg)
          K1(c_N1 + c_N2 α + c_N3 β)
          K2(c_lC1 + c_lC2 α + c_lC3 δe)    (6)
          K2(c_mC1 + c_mC2 β + c_mC3 δg)
          K2(c_nC1 + c_nC2 α + c_nC3 β) ]

where K1 and K2 are constants depending on the geometry of the dirigible, C_T1 . . . C_T3 are the coefficients of the tangential effort, C_L1 . . . C_L3 are the coefficients of the lateral effort, C_N1 . . . C_N3 are the coefficients of the normal effort, C_lC1 . . . C_lC3 are the roll coefficients with respect to C, C_mC1 . . . C_mC3 are the pitch coefficients with respect to C, C_nC1 . . . C_nC3 are the yaw coefficients with respect to C, α = f1(u, v, w) is the angle of incidence, and β = f2(u, v, w) is the angle of slip.

²This approach is used in airship control.


TABLE I – NUMBER OF PARAMETERS TO ESTIMATE

Matrix A D1 CT CL CN ClN CmN CnN

Parameters 11 7 3 3 3 3 3 3

The representation (6) provides a simpler model of the dynamics of the airship in cruise flight. It means that the aerodynamical velocity and the height of the airship are constant. It also simplifies the formulation of the control laws.

Our main aim is to estimate all the unknown parameters. Table I shows the 36 parameters required. We stress that all of them affect the aerodynamic behavior of the dirigible. To refer to the set of parameters we use the vectorial notation λ.

III. DESCRIPTION OF THE KALMAN FILTER

The problem of filtering in statistics consists in determining the probability density. We analyze the non-linear discrete case with mutually uncorrelated Gaussian noises ε ~ N(0, σε) and ν ~ N(0, σν) up to time tf:

X_{k+1} = f(X_k, u_k) + ε_k
Y_k = h(X_k) + ν_k    (7)

where X is the state variable defined in R^n. All signals are sampled using the same time interval kΔT. Cf. [12], [16], [17].

Kalman filtering consists in doing one prediction and one correction at every time step. The prediction estimates the mean of the filter density and produces the state estimate x̂(k|k−1). The correction uses the a priori information from the measurements Y = {Y1, Y2, . . . , Y_{k−1}} to adjust the prediction step. Cf. [7], [8].

The important task here is to find the Kalman gain matrix K_k of the model:

x̂(k|k) = x̂(k|k−1) + K_k (y_k − ŷ(k|k−1))    (8)

This can be done with the Extended Kalman Filter (EKF) [4] or with the Unscented Kalman Filter (UKF). Although these techniques are based on the same philosophy, they differ in the way that they compute the Kalman gain matrix.

A. Extended Kalman Filter (EKF):

This method uses the stochastic linear system:

X_{k+1} = A_k X_k + B_k u_k + ε_k
Y_k = C_k X_k + ν_k    (9)

and it estimates the mean and the variance in the following way:

x̂(k|k−1) = A_k f(x̂(k−1|k−1)) + B_k u_k    (10)

We highlight the fact that the control input (u_k) is deterministic. After solving one particular optimization problem, as explained in [7], we obtain the covariance equations for the associated Kalman filter:

P(k|k−1) = A_{k−1} P(k−1|k−1) A_{k−1}^T + σε    (11)

P(k|k) = (I − K_k C_k) P(k|k−1)    (12)

Eq. (11) is called the prediction equation of the covariance and must be computed first. Eq. (12) is called the correction equation of the covariance; we compute it after we have solved the prediction equation. Finally, we can write the Kalman gain as:

K_k = P(k|k−1) C_k^T (C_k P(k|k−1) C_k^T + σν)^{−1}    (13)

In this way, the algorithm that we propose implements equations (11)-(13)-(12)-(8) in the listed order.

For the non-linear case, we approximate the nonlinear functions f and h in (7) by their first-order Taylor series expansions. This approximation implies that we have to compute the Jacobians of the non-linear functions and evaluate them in the sampling procedure explained above. Cf. [13].
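A compact sketch of one EKF cycle in the order given in the text ((11)-(13)-(12)-(8)), using finite-difference Jacobians in place of analytic ones; the function names and the use of the standard nonlinear prediction x̂(k|k−1) = f(x̂, u) are our assumptions, not the paper's implementation.

```python
import numpy as np

def numerical_jacobian(fun, x, eps=1e-6):
    """First-order Taylor linearization: finite-difference Jacobian of fun at x."""
    x = np.asarray(x, dtype=float)
    fx = fun(x)
    J = np.zeros((len(fx), len(x)))
    for i in range(len(x)):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (fun(x + dx) - fx) / eps
    return J

def ekf_step(x, P, u, y, f, h, Q, R):
    """One EKF cycle; f(x, u) and h(x) are the models of Eq. (7)."""
    A = numerical_jacobian(lambda s: f(s, u), x)
    # Prediction, Eqs. (10)-(11)
    x_pred = f(x, u)
    P_pred = A @ P @ A.T + Q
    # Gain (13) and correction (12), (8)
    C = numerical_jacobian(h, x_pred)
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - h(x_pred))
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new
```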

B. Unscented Kalman Filter (UKF):

There are many ways to deal with non-linear state space models, proposed by other authors; cf. [1], [5], [11]. This kind of procedure belongs to the class of statistical linearization schemes in which densities are truncated instead of the models f and h. They use a sample set χ_0^{2n}, called sigma points, which maintains the same mean and covariance while it is propagated through the non-linear state space model:

χ_0(k−1|k−1) = x̂(k−1|k−1)
χ_i(k−1|k−1) = x̂(k−1|k−1) + β_i
χ_{i+n}(k−1|k−1) = x̂(k−1|k−1) − β_i
β_i = [√((n+λ) P(k−1|k−1))]_i    (14)

where i = 1, . . . , n and β_i denotes the i-th column of the matrix square root. The value λ is a scaling parameter that determines the spread of the sigma points around the state and is usually set to a small positive value; √· is the square root of the matrix P, which can be computed by Cholesky factorization [15].

The data set χ_0^{2n} is propagated through the full nonlinearities f and h:

χ_i(k|k−1) = f(χ_i(k−1|k−1))    (15)

φ_i(k|k−1) = h(χ_i(k|k−1))    (16)

These results represent a probability density characterized by the mean:

x̂(k|k−1) = Σ_{i=0}^{2n} W_i χ_i(k|k−1)
ŷ(k|k−1) = Σ_{i=0}^{2n} W_i φ_i(k|k−1)    (17)

and by the covariances

P_YY(k|k−1) = Σ_{i=0}^{2n} W_i (φ_i(k|k−1) − ŷ(k|k−1)) (φ_i(k|k−1) − ŷ(k|k−1))^T    (18)

P_XY(k|k−1) = Σ_{i=0}^{2n} W_i (χ_i(k|k−1) − x̂(k|k−1)) (φ_i(k|k−1) − ŷ(k|k−1))^T    (19)

P(k|k−1) = Σ_{i=0}^{2n} W_i (χ_i(k|k−1) − x̂(k|k−1)) (χ_i(k|k−1) − x̂(k|k−1))^T    (20)

where the values W_i are called "weights" and can be computed in different ways, cf. [1]. The Kalman gain can be obtained as:

K_k = P_XY P_YY^{−1}    (21)

The prediction step of the Kalman filter is defined by equations (18) to (20). The correction step is defined by the following formula:

P(k|k) = P(k|k−1) − K_k P_YY(k|k−1) K_k^T    (22)

In short, the UKF method can be described by the sequence (14)-(15)-(16)-(17)-(18)-(19)-(20)-(21)-(8)-(22).
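The sequence above maps almost line for line onto code. The sketch below uses equal weights W_i = 1/(2n+1) and adds the process and measurement noise covariances to the predicted covariances, which the paper leaves implicit; these choices, and the function names, are assumptions for illustration.

```python
import numpy as np

def ukf_step(x, P, u, y, f, h, Q, R, lam=1e-3):
    """One UKF cycle following Eqs. (14)-(22); a sketch, not the authors' code."""
    n = len(x)
    # (14) sigma points around the current estimate
    S = np.linalg.cholesky((n + lam) * P)
    chi = np.vstack([x, x + S.T, x - S.T])          # shape (2n+1, n)
    W = np.full(2 * n + 1, 1.0 / (2 * n + 1))       # simplified equal weights
    # (15)-(16) propagate through the full nonlinearities
    chi_pred = np.array([f(c, u) for c in chi])
    Y = np.array([h(c) for c in chi_pred])
    # (17) predicted means
    x_pred = W @ chi_pred
    y_pred = W @ Y
    # (18)-(20) covariances, with noise terms added
    dX, dY = chi_pred - x_pred, Y - y_pred
    Pyy = dY.T @ (W[:, None] * dY) + R
    Pxy = dX.T @ (W[:, None] * dY)
    P_pred = dX.T @ (W[:, None] * dX) + Q
    # (21), (8), (22) gain and correction
    K = Pxy @ np.linalg.inv(Pyy)
    x_new = x_pred + K @ (y - y_pred)
    P_new = P_pred - K @ Pyy @ K.T
    return x_new, P_new
```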

C. Kalman filter extension for parameter estimation

In this kind of method, we treat the parameters as if they were a dynamical system in themselves. Thus, in the particular case of a steady-state regime, they must behave as constants:

λ_k = λ_{k−1}    (23)

Notice that the variable λ_k is not really a constant, because the equations of the Kalman filter will modify the parameters with every update of the measurement. Thus, the overall state space for our application has increased: 12 variables corresponding to the original states and 36 variables corresponding to the aerodynamical parameters.
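In practice the augmentation is just a wrapper around the dynamics: the parameter block is copied through unchanged, exactly as Eq. (23) prescribes, and the same EKF/UKF step then estimates states and parameters jointly. A sketch (the names are ours):

```python
import numpy as np

def augment(f_dynamics, n_states=12, n_params=36):
    """Build the augmented transition: the 36 parameters are appended to the 12
    physical states and propagated as constants (lambda_k = lambda_{k-1})."""
    def f_aug(xa, u):
        x, lam = xa[:n_states], xa[n_states:]
        x_next = f_dynamics(x, u, lam)        # physical dynamics use current parameters
        return np.concatenate([x_next, lam])  # parameters carried over unchanged
    return f_aug
```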

IV. RESULTS

To determine the performance and reliability of our method, we have carried out two particular tests. These examples show the strength of the method when it is applied to specific situations.

A. Test 1: Parameter estimation of AS500

This test is based on simulated signals, because the dynamical system has been determined from aerodynamical experiments [6], [10].

Here we make an approximation to the correct parameters of the system by using the Kalman filter based method proposed previously. We use N = 1500 samples and the Euler method for integrating the ODE [15] with a step size ΔT = 0.1. Notice that the integration step size may differ from the sampling time. The variables η and V are corrupted with a small additive Gaussian noise. Initial conditions for the parameters are chosen randomly, which affects the convergence of the overall method.

After processing the signal, we estimate the parameters under the assumption that the dynamical model and the statistical features of the noise are known in advance. The results are shown in Table II.

TABLE II – PARAMETER ESTIMATION FROM SIMULATED DATA

Parameter   Real        EKF         UKF
a11         1.5276      1.2633      1.9660
a22         21.0933     20.8660     21.1896
a33         20.4220     20.2018     21.4239
a44         16.3905     15.9151     16.8548
a55         382.1290    380.0030    380.1463
a66         388.0972    379.9988    384.0155
a15         0           0.1309      0.0001
a24         0           −0.0958     0.1628
a26         −69.6939    −59.9030    −69.8978
a35         67.7045     70.0684     70.3361
a46         0           0.0801      0.0101
m13         1.2801      2.0621      1.5235
m33         −49.7019    −48.0249    −48.5013
xm11        25.6919     23.0748     24.5183
xm22        23.6878     20.0129     21.0075
xm13        −4.5582     −9.1165     −5.4699
x2m11       −173.4906   −150.0044   −170.8227
x2m22       −166.3538   −149.9994   −158.8524
CT1         −0.5878     −1.8974     −0.6579
CT2         0.4452      0.5487      0.5789
CT3         −0.8574     −2.8752     −0.6877
CL1         20.4587     50.8756     15.4789
CL2         −10.5897    −15.9874    −11.5582
CL3         −5.8537     −1.2234     −7.2243
CN1         −0.0553     −0.0417     −0.0664
CN2         0.1069      0.1071      0.1069
CN3         0.205       1.0258      0.389
ClC1        −0.0400     −0.0405     −0.0415
ClC2        −0.1730     −0.1919     −0.1271
ClC3        0.2000      0.2930      0.2173
CmC1        0.0240      0.0205      0.0304
CmC2        0.9370      0.7975      0.9982
CmC3        0.0850      0.0170      0.0266
CnC1        0.9370      1.0833      0.6207
CnC2        −0.7412     −0.8574     −0.7589
CnC3        0           0.0450      −0.0010

TABLE III – TABLE OF ERROR

State   Mean UKF   Variance UKF   Mean EKF   Variance EKF
x       0.0439     0.0015         0.0639     0.0016
y       0.0670     0.0025         0.0209     0.0027
z       0.0217     0.0038         0.0086     0.0040
φ       0.0626     0.0009         0.0651     0.0009
θ       0.0179     0.0001         0.0209     0.0003
ψ       0.0153     0.0171         0.0603     0.0018

Fig. 3 shows the time evolution of the parameter CT2 estimated by the proposed methods; the rest of the parameters behave in a similar way, so to simplify the presentation we only exhibit a few of the graphics. Figures 4 and 5 compare the dynamics of the real model against the results of the simulation. These results were estimated by using the parameters shown in Table II. In order to check the reliability of our results, we use the relative error:

e = |x − x̂| / |x|    (24)

where x̂ is the simulation that we obtain with the new parameters. Table III shows the mean and variance of the error in some of the states.
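The statistics in Table III follow directly from Eq. (24) applied sample by sample; a tiny sketch (variable names are ours, and states whose true values pass near zero would need guarding against division by zero):

```python
import numpy as np

def relative_error_stats(x_true, x_sim):
    """Eq. (24) per sample, then the mean and variance reported in Table III.
    x_true, x_sim: arrays of shape (N_samples,) for one state."""
    e = np.abs(x_true - x_sim) / np.abs(x_true)
    return e.mean(), e.var()
```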


Fig. 3. Evolution of the parameter CT2

Fig. 4. Comparison between the real model and the EKF estimation: t [s] vs. v [m/s], z [m] and θ [rad]

B. Test 2: Parameter estimation of KARMA

After proving that the method works for simulated signals, we can apply it to real signals from the robot KARMA.

In this test we use real measurements of velocities and position reported by the GPS every second, and measurements of angles reported by the electronic compass every 0.8 seconds. These two instruments are placed in the nacelle of the dirigible and are connected to a CPU that stores the data on a hard disk.

In order to estimate the parameters we must deal with the problem of non-synchronized measurements. We address this with special data fusion routines, cf. [4].

The flight for the identification was made in cruise conditions, which means that we keep a constant aerodynamical speed and a constant height.

The maneuvers for this flight were basically turns executed without slip and with a radius greater than 28.17 meters (3 times the length of the dirigible). To better visualize the maneuvers, we show an East vs. North plot in Figure 6, and in Figure 7 we can see part of the movement of this test in a 3D view.

Fig. 5. Comparison between the real model and the UKF estimation: t [s] vs. v [m/s], z [m] and θ [rad]

Fig. 6. East [m] vs. North [m] (trajectory of Test 2, with start and end points marked)

Summarizing, we use a real flight trajectory lasting 922.8 seconds, which gave us 917 values of position and speed. We also take 9235 values for the different angles involved in the dynamical system. Then, we synchronize the observations by taking into account the measurement instrument having the lowest frequency among all the measurement frequencies: this is the electronic compass. After the signal has been preprocessed, it is ready for estimating the parameters. We use here the UKF-based algorithm because it showed a good performance when applied to the simulated experiments.

Figure 8 shows the results obtained for several parameters taken from real flight trajectories. We stress that the convergence towards a constant value proves the soundness of our method, and it is also noticeable that this good performance takes place even in the presence of noisy data. Table IV shows all the estimated values for every dynamical parameter of the experimental dirigible in which we are interested. It also shows the covariance of the signal obtained in every case.


Fig. 7. 3D view of the flight

Fig. 8. Parameter convergence for the real model

We can use the parameters as a new model for the real dynamical system; then we can simulate its behavior with arbitrary input data. In particular, we take the same input signals that we used in the real flight to compare the simulation against the real responses. Fig. 9 shows this comparison. Finally, we conclude that the UKF algorithm gives reliable parameter estimation from incomplete and noise-corrupted data taken from experiments with flying robots.

V. CONCLUDING REMARKS

In this work we have shown two different techniques for parameter estimation based on Kalman filtering. These methods were applied on real data to estimate the aerodynamic parameters of experimental dirigibles under non-linear flight conditions.

The algorithms proposed here are the EKF and the UKF. Both of them have shown a good computational performance. The fact that the UKF algorithm presents a better performance, because it can work under chaotic dynamical systems, is remarkable. This algorithm involves a routine to approximate the solution of nonlinear differential equations. As it strongly depends on the numerical solution of the differential equations, we take the fourth-order Runge-Kutta method as the most appropriate routine.

TABLE IV – PARAMETER ESTIMATION FROM REAL DATA

Parameter   UKF         Covariance (Pii)
a11         11.2428     0.1580
a22         90.4922     0.0925
a33         70.5841     0.0922
a44         42.3387     0.0901
a55         383.7979    0.0837
a66         419.9314    0.0872
a15         6.9103      0.1309
a24         1.2382      0.1240
a26         195.3407    0.1269
a35         59.4323     0.1053
a46         −28.5030    0.1053
m13         −33.7772    0.0621
m33         93.7707     0.0982
xm11        76.4093     0.0905
xm22        54.7163     0.0192
xm13        75.3240     0.0962
x2m11       201.9972    0.0335
x2m22       224.8353    0.0896
CT1         −2.9074     0.1290
CT2         −13.9818    0.0949
CT3         −0.7970     0.0767
CL1         15.0799     0.0744
CL2         −7.6177     0.0644
CL3         −3.2706     0.0249
CN1         −2.1196     0.0676
CN2         −0.2250     0.0446
CN3         0.6837      0.0508
ClC1        −0.0725     0.1442
ClC2        2.9208      0.1509
ClC3        1.0168      0.0582
CmC1        5.1576      0.0538
CmC2        −1.8937     0.0814
CmC3        −1.1017     0.0762
CnC1        −0.1082     0.0942
CnC2        −0.5101     0.0415
CnC3        0.0115      0.0227

Fig. 9. Comparison between the model and real data: t [s] vs. vx [m/s], y [m], φ [rad]

The EKF can be used with other numerical routines intended for solving ODEs, but there is no advantage in doing so, as these routines usually require the expensive calculation of the Jacobian or Hessian matrices.

We stress that the choice of the initial approximation is crucial for the convergence of the EKF algorithm. However, this is not the case for the UKF algorithm, as it performs well even under chaotic dynamical systems.

Although the performance of the UKF algorithm does not depend on the initial approximations of the parameters, it strongly depends on the initial choice of the covariance matrices.

The parameter identification of robots used as dirigibles is a very hard task because it is usually addressed by means of experimental aerodynamic measurements in wind tunnels. The most remarkable achievement of this work is that we can estimate such non-linear dynamical parameters by doing numerical simulations based on Kalman filtering.

One particular limitation of our approach is that we need to know the specific model of the dynamical system in advance. We will focus on this particular limitation in future work.

ACKNOWLEDGMENTS

Universidad de los Andes. L.S. thanks COLCIENCIAS and its programs National Doctorates and Bomplan. D.P. thanks the Departamento de Matematicas and Dr. Rene Meziat for his help and support, and the research Grant 1204-05-13627-COLCIENCIAS-2004.

REFERENCES

[1] Sitz A., Schwarz U., et al., Estimation of parameters and unobserved components for nonlinear systems from noisy time series, Physical Review E, Statistical, Vol. 66, No. 1, 2002.

[2] Sayed A., A Framework for State-Space Estimation with Uncertain Models, IEEE Transactions on Automatic Control, Vol. 46, No. 7, July 2001.

[3] Saab S., A Heuristic Kalman Filter for a Class of Nonlinear Systems, IEEE Transactions on Automatic Control, Vol. 49, No. 12, December 2004.

[4] Leung H., An Aperiodic Phenomenon of the Extended Kalman Filter in Filtering Noisy Chaotic Signals, IEEE Transactions on Signal Processing, Vol. 48, No. 6, June 2000.

[5] Julier S.J., Uhlmann J.K., A New Extension of the Kalman Filter to Nonlinear Systems, in Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defence Sensing, Simulation and Controls, 1997.

[6] Lacroix S., Jung I-K., et al., The autonomous blimp project of LAAS/CNRS: Current status and research challenges, in 8th International Symposium on Experimental Robotics, Sant'Angelo d'Ischia (Italy), 2002.

[7] Chui C.K., Chen G., Kalman Filtering with Real-Time Applications, Springer, Germany, 1999.

[8] Grewal S.M., Andrews P.A., Kalman Filtering: Theory and Practice, Prentice Hall, USA, 1993.

[9] Hygounenc E., Jung I.-K., Soueres P., Lacroix S., The Autonomous Blimp Project of LAAS-CNRS: Achievements in Flight Control and Terrain Mapping, Special Issue on the 8th International Symposium on Experimental Robotics, 2002.

[10] Hygounenc E., Soueres P., Dynamic and aerodynamic modeling of a small size airship, in 5th Workshop on Electronics, Control, Modeling, Measurement and Signals, Toulouse, France, June 2001.

[11] Wan E.A., Merwe R., The Unscented Kalman Filter for nonlinear estimation, Oregon Graduate Institute of Science and Technology, 2000.

[12] Kalman R.E., A new approach to linear filtering and prediction problems, Transactions of the ASME, Journal of Basic Engineering, vol. 82, 1960.

[13] Schmidt G.T., Practical Aspects of Kalman Filtering Implementation, AGARD-LS-82, NATO Advisory Group for Aerospace Research and Development, England, 1976.

[14] Bryson A.E., The aerodynamical forces on a slender low (or high) wing-body-vertical-tail configuration, Journal of Aeronautical Sciences - Reader's Forum, vol. 21, No. 8, p. 574, August 1954.

[15] Burden R.L., Numerical Analysis, Edamsa Printers, 2002.

[16] Haykin S., Kalman Filtering and Neural Networks, Wiley-Interscience, 2001.

[17] Bertsekas D.P., Nonlinear Programming, Athena Scientific, 1995.


End-point Control of a Flexible Structure Mounted Manipulator Based on Wavelet Basis Function Networks

David X.P. Cheng
Defence Research and Development Canada - Suffield
PO Box 4000, Medicine Hat, Alberta T1A 8K6
[email protected]

Abstract

In this paper, we present a wavelet basis function (WBF) network based control scheme for end-point tracking control of a system consisting of a rigid micro manipulator attached at the end of a flexible macro manipulator. The objective is to suppress vibrations in the macro manipulator and at the same time achieve desired motions of the end-effector of the micro manipulator. A WBF network is utilized to approximate the dynamic behavior of the macro-micro manipulator (M3) system in real time, and the controller is developed without any need for prior knowledge of the dynamics. A weight-tuning algorithm for the WBF network is derived using Lyapunov theory. It is shown that both the path tracking error and the damped vibrations are uniformly ultimately bounded under this new control scheme. A case of end-point tracking via kinematic redundancy is studied and the results are compared to those obtained using an MLP network controller and a PD joint controller.

1. Introduction

Long-reach manipulators have been proposed for a range of applications that include Space Station maintenance and operation, and nuclear waste disposal. In such applications, the lightweight structure of long-reach manipulators allows the actuators to move faster and carry heavier loads than conventional rigid manipulators. However, the significant structural flexibility makes it difficult to control the position and force at the end-effector accurately and reliably. The incorporation of a small, rigid micro manipulator at the tip of a large, flexible macro manipulator has been proposed as a solution to achieve the desired accurate and robust performance. In order to utilize the macro-micro manipulator (M3) system effectively, one must address the problem of controlling and compensating for vibrations resulting from the flexibility in the macro-manipulator links.

Considerable research has been done to address differ-

ent aspects of the M3 system. Nenchev et al. [1] utilized thereaction null space concept to decompose the joint space ofthe flexible base and the micro manipulator into orthogonalsubspaces such that no vibrations of the flexible base areinduced when the end-effector moves along a reactionlessreference path. Yoshikawa et al. [2] used the redundancyof the M3 system to generate joint trajectories that enablethe micro manipulator to maximize its ability to compen-sate for the tracking error resulting from the deformationof the macro manipulator. The dynamics of the system arenot considered in this control scheme and therefore it is notsuitable for a fast tracking control task. Yoshikawa et al.[3] modified their previous work by taking into account thesystem dynamics. However, the problem of closed-loopstability and vibration damping in the macro-manipulatorwere not addressed in either of these papers. Cheng andPatel [4, 5] proposed a control scheme to suppress vibra-tions in the macro manipulator while achieving stable de-sired motions of the end-effector of the micro manipulator.A multi-layer perceptron (MLP) network is utilized to es-timate the nonlinear dynamic behavior for the M3 system,and the resulting estimates are used to develop controllersfor the macro and micro manipulators without any need forprior knowledge of the dynamic model of the M3 system.Under the MLP based control scheme, both the trackingerror of the end-effector of the micro manipulator and thevibration in the macro manipulator are rapidly suppressedand constrained within an arbitrarily small vicinity of theorigin, while the magnitudes of the joint torques are keptbounded. However, the number of weights required to beadjusted in the MLP network may rapidly increase as thesize of the network increases.

To reduce the computational complexity while preserv-ing the advantages of the MLP network based controlscheme described above, we propose in this paper to usewavelet basis function (WBF) networks for learning thedynamics of the M3 system and use the end-point feed-back from visual tracking for formulating the joint torquecommand. The closed-loop stability of the control schemeis guaranteed while fast, precise end-point tracking is

IEEE IROS 2005 Workshop on Robot Vision for Space Applications

29

Page 30: Robot Vision for Space Applicationsrobvis05/ProceedingsCom...IEEE/RSJ IROS 2005 Workshop on Robot Vision for Space Applications Aug 2, 1:30 PM to 5 PM in Salon 15 Shaw Conference Centre,

achieved. The performance of the control scheme is studiedon a redundant M3 system working in a 3-D task space.

2. Control Schemes Based on Wavelet Basis Function Networks

2.1. Tracking Error Dynamics

Figure 1. Schematic of a macro-micro manipulator

Consider a system consisting of a macro manipulator with M flexible links and a micro manipulator with m rigid links (Figure 1), with the equations of motion given by

M(x)\ddot{x} + C(x,\dot{x})\dot{x} + G(x) + \tau_d = \tau    (1)

where M is the inertia matrix, C is the matrix of the Coriolis and centrifugal terms, G is the vector of gravity and elastic torques, and \tau_d represents unmodelled dynamics. The position of the end-effector p is given by

p = p(\theta_M, \theta_m, \delta)    (2)

where \theta_M and \theta_m are the joint positions of the macro and micro arms, respectively, and \delta is the flexural displacement vector of the macro arm. A small variation in the tip position p can be expressed as

\Delta p = J_{\theta_M}\Delta\theta_M + J_{\theta_m}\Delta\theta_m + J_\delta\Delta\delta    (3)

where J_{\theta_M} \in R^{n \times M}, J_{\theta_m} \in R^{n \times m} and J_\delta \in R^{n \times e} are Jacobian matrices of p with respect to \theta_M, \theta_m and \delta, respectively, and \Delta\theta_M, \Delta\theta_m and \Delta\delta denote small changes in \theta_M, \theta_m and \delta, respectively.

Denote the desired trajectory as

x_d = [\theta_{Md}^T, \theta_{md}^T, \delta_d^T]^T    (4)

and define the tracking error \tilde{x} and the filtered tracking error r = [r_1^T, r_2^T, r_3^T]^T by

\tilde{x} \triangleq x - x_d = [\tilde{\theta}_M^T, \tilde{\theta}_m^T, \tilde{\delta}^T]^T    (5)

r_1 \triangleq \dot{\tilde{\theta}}_M + \Lambda_1\tilde{\theta}_M + \kappa G\dot{\tilde{\delta}} + \Lambda_1\kappa G\tilde{\delta}    (6)

r_2 \triangleq \dot{\tilde{p}} + \Lambda_2\tilde{p}    (7)

r_3 \triangleq \dot{\tilde{\delta}} + \Lambda_3\tilde{\delta}    (8)

where \tilde{\theta}_M = \theta_M - \theta_{Md}, \tilde{\delta} = \delta - \delta_d, \tilde{\theta}_m = \theta_m - \theta_{md}, G and \Lambda_i, i = 1,2,3, are symmetric positive-definite matrices, and \kappa > 0 is a design parameter. The vector r can then be expressed in the matrix form

r = \Gamma\dot{\tilde{x}} + \dot{\Gamma}\tilde{x} + \Lambda\Gamma\tilde{x}    (9)

where \Lambda = diag\{\Lambda_1, \Lambda_2, \Lambda_3\} and

\Gamma = \begin{bmatrix} I_M & 0 & \kappa G \\ J_{\theta_M} & J_{\theta_m} & J_\delta \\ 0 & 0 & I_e \end{bmatrix}    (10)

with \dot{J}_{(\cdot)} representing the time derivatives of the Jacobians, and I_M and I_e representing the M \times M and e \times e identity matrices, respectively. Then the dynamics of the M3 system can be expressed in terms of the filtered tracking error as

M\Upsilon\dot{r} + (C\Upsilon + M\dot{\Upsilon})r = f - \tau_d + \tau    (11)

where

\Upsilon \triangleq \Gamma^{-1} = \begin{bmatrix} I_M & 0 & -\kappa G \\ -J_{\theta_m}^{-1}J_{\theta_M} & J_{\theta_m}^{-1} & -J_{\theta_m}^{-1}(J_\delta - \kappa J_{\theta_M}G) \\ 0 & 0 & I_e \end{bmatrix}    (12)

and f is an unknown nonlinear function of parameters such as the inertia, centrifugal and Coriolis terms, DH parameters, etc., of the M3 system, and is given by

f(u) = -G - M\ddot{x}_d - C\dot{x}_d + M\Upsilon\left(2\dot{\Gamma}\dot{\tilde{x}} + \ddot{\Gamma}\tilde{x} + \Lambda(\Gamma\dot{\tilde{x}} + \dot{\Gamma}\tilde{x})\right) + C\Upsilon(\dot{\Gamma}\tilde{x} + \Lambda\Gamma\tilde{x}) + M\dot{\Upsilon}r    (13)

where u can be chosen as

u \triangleq [\tilde{x}^T, \dot{\tilde{x}}^T, x_d^T, \dot{x}_d^T, \ddot{x}_d^T]^T    (14)

Expressing the error dynamics as (11) will enable us to derive a WBF network based control scheme in the next section.

2.2. Controller Design

Let u \in R^\mu and f \in L^2(R^\mu) on a compact set S \subset R^\mu. By the multi-resolution analysis theory [6], for a given positive number \varepsilon_N > 0, there exists a collection of wavelet basis functions \{\varphi_{j_0,k} : k \in K\} with a sufficiently small scale j_0 such that

f_i(u) = \sum_{k \in K} c_{i,j_0,k}\varphi_{j_0,k}(u) + \varepsilon(u), \quad i = 1,2,\cdots,n    (15)

and

\|\varepsilon(u)\| < \varepsilon_N, \quad \forall u \in S    (16)

where \varepsilon(u) is the reconstruction error and

\varphi_{j_0,k}(u) = 2^{-j_0/2}\varphi(2^{-j_0}u - k)    (17)

c_{i,j_0,k} = \langle f_i, \varphi_{j_0,k}\rangle    (18)


Let W = \{c_{i,j_0,k}\} and \varphi = [\varphi_{j_0,k_1}, \varphi_{j_0,k_2}, \cdots, \varphi_{j_0,k_N}]^T, where k_1, k_2, \cdots, k_N are the translation vectors in the index set K. We can then express (15) in a matrix form:

f(u) = W^T\varphi(u) + \varepsilon    (19)

We assume that on any compact subset of R^n, the weighting matrix is bounded, i.e.,

\|W\|_F \le b_W    (20)

where \|\cdot\|_F denotes the Frobenius norm. The approximation given by (19) can be implemented by a wavelet basis function (WBF) network. The weights in the input layer of the WBF network are identically set to 1, while the weights in the output layer represent the weighting coefficients of the basis functions \varphi_{j_0,k} in the wavelet expansion of f(u) given by (15).

Let an estimate of the nonlinear function f(u) defined in (13) using a WBF network be given by

\hat{f}(u) = \hat{W}^T\varphi(u)    (21)

where \hat{W} is an estimate of the weights of the WBF network determined by a tuning algorithm to be specified. Let \hat{f} be partitioned into \hat{f} = [\hat{f}_1^T, \hat{f}_2^T, \hat{f}_3^T]^T with \hat{f}_1 \in R^M, \hat{f}_2 \in R^m and \hat{f}_3 \in R^e. As in the case of the MLP network [4], the structure of the error dynamics given by (11) and the stability analysis motivate us to design the WBF network based controllers for the macro and micro arms of the M3 system as follows:

\tau_M = -\hat{f}_1 + \xi_1 - \frac{\bar{r}_1\bar{r}_3^T}{\|\bar{r}_1\|^2 + \epsilon}\left(\hat{f}_3 - \xi_3 + K_3\bar{r}_3\right) - K_1\bar{r}_1    (22)

\tau_m = -\hat{f}_2 + \xi_2 - K_2\bar{r}_2    (23)

\xi_i = -k_\xi\frac{\bar{r}_i}{\|\bar{r}\|}, \quad i = 1,2,3    (24)

\bar{r} = [\bar{r}_1^T, \bar{r}_2^T, \bar{r}_3^T]^T = \Upsilon r, \quad \bar{r}_1 \in R^M, \bar{r}_2 \in R^m, \bar{r}_3 \in R^e    (25)

where K_i, i = 1,2,3, are positive definite matrices, \epsilon > 0 and k_\xi > 0 are positive design constants, \Upsilon is the nonlinear transformation given by (12), and \xi_i, i = 1,2,3, are robustifying components incorporated to compensate for the reconstruction errors of the neural network and the unmodelled dynamics.
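As an illustration of how (22) through (25) fit together in an implementation, the following Python sketch computes the two torque commands from the current WBF estimate, the filtered error and the gain matrices. The function and variable names, the partition sizes and the small constant added to the norm to avoid division by zero are assumptions made for the sketch rather than details taken from the paper.

import numpy as np

def wbf_torques(f_hat, r, Upsilon, K1, K2, K3, k_xi, eps, M, m, e):
    # Transformed filtered error, Eq. (25)
    r_bar = Upsilon @ r
    r1, r2, r3 = r_bar[:M], r_bar[M:M + m], r_bar[M + m:]
    f1, f2, f3 = f_hat[:M], f_hat[M:M + m], f_hat[M + m:]

    # Robustifying terms, Eq. (24); a tiny constant guards the norm
    xi = -k_xi * r_bar / (np.linalg.norm(r_bar) + 1e-12)
    xi1, xi2, xi3 = xi[:M], xi[M:M + m], xi[M + m:]

    # Macro-arm torque, Eq. (22)
    coupling = np.outer(r1, r3) / (r1 @ r1 + eps)
    tau_M = -f1 + xi1 - coupling @ (f3 - xi3 + K3 @ r3) - K1 @ r1

    # Micro-arm torque, Eq. (23)
    tau_m = -f2 + xi2 - K2 @ r2
    return tau_M, tau_m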

2.3. Weight Tuning Algorithm

Based on the control scheme given by (22) through (25), a weight tuning algorithm for the WBF network can be designed as a natural extension of the weight tuning algorithm for the MLP network developed in [4] for achieving stable tracking control of the closed-loop flexible M3 system. The algorithm is determined by the following equation:

\dot{\hat{W}} = P\left(\varphi\bar{r}^T - \mu\|\bar{r}\|\hat{W}\right)    (26)

where P = P^T is any constant symmetric positive-definite matrix, \mu > 0 is a small scalar design parameter, and \Upsilon is the nonlinear transformation given by (12). A block diagram of the M3 control system is shown in Figure 2.
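In discrete time, the tuning law (26) can be applied with a simple forward-Euler step, as in the sketch below; the names and the integration scheme are illustrative choices, not prescribed by the paper.

import numpy as np

def update_wbf_weights(W_hat, phi, r_bar, P, mu, dt):
    # W_hat: (N, n) output-layer weights, phi: (N,) basis activations,
    # r_bar: (n,) transformed filtered error, P: (N, N) SPD gain matrix.
    W_dot = P @ (np.outer(phi, r_bar) - mu * np.linalg.norm(r_bar) * W_hat)
    return W_hat + dt * W_dot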


Figure 2. Block diagram of the M3 control system

2.4. Algorithm Analysis

The algorithm presented above is derived by an extension of Lyapunov theory. It can be shown that under the WBF network based control scheme and the weight tuning algorithm proposed above, the tracking errors for the M3 system can be uniformly driven to an arbitrarily small neighborhood around zero. Choose the Lyapunov function candidate

V(r, \tilde{W}, t) = \frac{1}{2}r^T\Upsilon^T M\Upsilon r + \frac{1}{2}tr\{\tilde{W}^T P^{-1}\tilde{W}\}    (27)

where tr\{\cdot\} denotes the trace of a square matrix and \tilde{W} = W - \hat{W} is the weight estimation error. Differentiating V along the solution of the error dynamics system yields

\dot{V} = \frac{1}{2}r^T\Upsilon^T(\dot{M} - 2C)\Upsilon r + r^T\Upsilon^T(f + \tau - \tau_d) + tr\{\tilde{W}^T P^{-1}\dot{\tilde{W}}\}    (28)

Using \bar{r} = \Upsilon r and the skew-symmetric property of \dot{M} - 2C, we obtain

\dot{V} = \bar{r}_1^T(f_1 + \tau_M) + \bar{r}_2^T(f_2 + \tau_m) + \bar{r}_3^T f_3 - \bar{r}^T\tau_d + tr\{\tilde{W}^T P^{-1}\dot{\tilde{W}}\}    (29)


Substituting the control laws (22) and (23) into the above equation, we obtain

\dot{V} = \bar{r}_1^T\left(f_1 - \hat{f}_1 + \xi_1 - \frac{\bar{r}_1\bar{r}_3^T}{\|\bar{r}_1\|^2 + \epsilon}(\hat{f}_3 - \xi_3 + K_3\bar{r}_3) - K_1\bar{r}_1\right) + \bar{r}_3^T f_3 + \bar{r}_2^T(f_2 - \hat{f}_2 + \xi_2 - K_2\bar{r}_2) - \bar{r}^T\tau_d + tr\{\tilde{W}^T P^{-1}\dot{\tilde{W}}\}
\le -\bar{r}^T K\bar{r} + \bar{r}^T\tilde{f} + \bar{r}^T\xi - \bar{r}^T\tau_d + tr\{\tilde{W}^T P^{-1}\dot{\tilde{W}}\}    (30)

where K = diag\{K_1, K_2, K_3\}, \tilde{f} = f - \hat{f}, and \xi = [\xi_1^T, \xi_2^T, \xi_3^T]^T. From (19) and (21), the second term on the right-hand side of (30) can be expressed as

\bar{r}^T\tilde{f} = \bar{r}^T\left(W^T\varphi - \hat{W}^T\varphi + \varepsilon\right) = \bar{r}^T\tilde{W}^T\varphi + \bar{r}^T\varepsilon    (31)

From (24), the third term on the right-hand side of (30) can be expressed as

\bar{r}^T\xi = -\bar{r}^T\cdot k_\xi\frac{\bar{r}}{\|\bar{r}\|} = -k_\xi\|\bar{r}\|    (32)

From (26), the last term on the right-hand side of (30) can be expressed as

tr\{\tilde{W}^T P^{-1}\dot{\tilde{W}}\} = -tr\{\tilde{W}^T P^{-1}\dot{\hat{W}}\} = -tr\left\{\tilde{W}^T P^{-1}P\left(\varphi\bar{r}^T - \mu\|\bar{r}\|\hat{W}\right)\right\} = -\bar{r}^T\tilde{W}^T\varphi + \mu\|\bar{r}\|tr\{\tilde{W}^T\hat{W}\}    (33)

Substituting (31), (32) and (33) into (30), we obtain

\dot{V} \le -\bar{r}^T K\bar{r} + \bar{r}^T\tilde{W}^T\varphi + \bar{r}^T\varepsilon - k_\xi\|\bar{r}\| - \bar{r}^T\tau_d - \bar{r}^T\tilde{W}^T\varphi + \mu\|\bar{r}\|tr\{\tilde{W}^T\hat{W}\}
\le -\bar{r}^T K\bar{r} - k_\xi\|\bar{r}\| + \bar{r}^T(\varepsilon - \tau_d) + \mu\|\bar{r}\|tr\{\tilde{W}^T\hat{W}\}
\le -\bar{r}^T K\bar{r} - k_\xi\|\bar{r}\| + \bar{r}^T(\varepsilon - \tau_d) + \mu\|\bar{r}\|\left(b_W\|\tilde{W}\|_F - \|\tilde{W}\|_F^2\right)
\le -\|\bar{r}\|\left\{\lambda_{min}(K)\|\bar{r}\| + \mu\left(\|\tilde{W}\|_F^2 - b_W\|\tilde{W}\|_F\right) - \varepsilon_N\right\}    (34)

where we have applied the known condition k_\xi > b_d (with b_d a bound on \|\tau_d\|) and the inequality tr\{\tilde{W}^T\hat{W}\} \le b_W\|\tilde{W}\|_F - \|\tilde{W}\|_F^2 [4] to bound the right-hand side of (34). Completing the square results in

\dot{V} \le -\|\bar{r}\|\left\{\lambda_{min}(K)\|\bar{r}\| + \mu\left(\|\tilde{W}\|_F - \frac{b_W}{2}\right)^2 - \frac{\mu b_W^2}{4} - \varepsilon_N\right\}    (35)

Therefore, \dot{V} < 0 if

\|\bar{r}\| > \frac{1}{\lambda_{min}(K)}\left(\frac{\mu b_W^2}{4} + \varepsilon_N\right) \triangleq b_r    (36)

or

\|\tilde{W}\|_F > \frac{b_W}{2} + \sqrt{\frac{b_W^2}{4} + \frac{\varepsilon_N}{\mu}} \triangleq b_{\tilde{W}}    (37)

Here b_r and b_{\tilde{W}} denote the regions of convergence for the filtered tracking error and the weight estimation error in the controller, respectively. \dot{V} becomes negative and V decreases outside the compact set defined by \|\bar{r}\| \le b_r and \|\tilde{W}\|_F \le b_{\tilde{W}}. According to the LaSalle extension of Lyapunov analysis, this shows that both \bar{r} and \tilde{W} are uniformly ultimately bounded (UUB).

3. End-point Tracking by Kinematic Redundancy

To demonstrate the effectiveness of the proposed control scheme, we have applied it to an M3 system with one flexible boom and four rigid micro-arms. The flexible boom has a thickness of 1.3 mm, width of 3.14 cm, length of 1 m, mass of 1 kg and flexural rigidity of 20 Nm^2. The first and second rigid links have equal lengths of 0.1 m and equal masses of 0.1 kg, and the third and fourth have equal lengths of 0.2 m and equal masses of 0.2 kg. The assumed-modes method with clamped-mass boundary conditions was used to model the M3 system. Two orthonormal mode shapes were taken into account for simplifying the inertia and stiffness matrices. The resulting natural frequencies of vibration are 2.50 Hz and 15.6 Hz. The redundancy of the micro manipulator was fully used to suppress vibrations in the flexible boom. All the joints of the M3 system were controlled to track the end-point trajectory given by

x_d = 0.3\cos(2t) + 0.80
y_d = \frac{\sqrt{3}}{25\pi}t + 0.5\sqrt{3}
z_d = 0.1\sin(2t)    (38)

which is a helix with a radius of 0.2 m. The desired joint velocity \dot{\theta}_d is given by

\dot{\theta}_d = J^+\dot{p}_d + (I - J^+J)\eta    (39)

where J^+ is the pseudo-inverse of [J_{\theta_M}, J_{\theta_m}] and \eta is an arbitrary 4 \times 1 vector. The desired joint trajectory \theta_d is obtained by integrating the above equation.
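A minimal sketch of how (39) can be integrated to obtain the desired joint trajectory is shown below. The Jacobian, task-velocity and null-space-vector callables are placeholders for whatever model is available, and the forward-Euler integration and the function names are assumptions made for the illustration.

import numpy as np

def joint_velocity(J, p_dot_d, eta):
    # Eq. (39): theta_dot_d = J+ p_dot_d + (I - J+ J) eta
    J_pinv = np.linalg.pinv(J)
    n = J.shape[1]
    return J_pinv @ p_dot_d + (np.eye(n) - J_pinv @ J) @ eta

def desired_joint_trajectory(theta0, jacobian, task_velocity, null_vector, dt, steps):
    # Euler integration of (39) to obtain theta_d(t)
    theta = np.array(theta0, dtype=float)
    trajectory = [theta.copy()]
    for k in range(steps):
        t = k * dt
        theta_dot = joint_velocity(jacobian(theta, t), task_velocity(t), null_vector(t))
        theta = theta + dt * theta_dot
        trajectory.append(theta.copy())
    return np.array(trajectory)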

A WBF network was designed to approximate the unknown nonlinear function f. The inputs to the network were taken as u = [\tilde{x}^T, \dot{\tilde{x}}^T, x_d^T, \dot{x}_d^T, \ddot{x}_d^T]^T. Twenty-five hidden units were used in the network, with the activation wavelets selected to be the radial Mexican hat functions given by

\varphi(u) = (\mu - u^Tu)e^{-u^Tu/2}, \quad u \in R^\mu    (40)
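For concreteness, the sketch below evaluates a WBF network output \hat{f}(u) = \hat{W}^T\varphi(u) with radial Mexican hat activations of the form (40), one hidden unit per translation of the mother wavelet. The translation centres, the omission of the dyadic scaling of (17), and all names are illustrative assumptions; the paper does not list the centres it used.

import numpy as np

def mexican_hat(v):
    # Radial Mexican hat of Eq. (40): (mu - v^T v) exp(-v^T v / 2), with mu = dim(v)
    s = float(v @ v)
    return (v.size - s) * np.exp(-s / 2.0)

def wbf_forward(u, centers, W_hat):
    # u: (mu,) network input of Eq. (14); centers: (N, mu) assumed translations;
    # W_hat: (N, n) output-layer weights. Returns (f_hat, phi).
    u = np.asarray(u, dtype=float)
    phi = np.array([mexican_hat(u - np.asarray(c, dtype=float)) for c in centers])
    return W_hat.T @ phi, phi

The returned activation vector phi is the same quantity used by the weight update of (26) sketched earlier.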


The WBF network has a total of 150 weights, which require no prior knowledge of the parameters of the M3 system dynamics and were simply initialized at zero and updated on-line by the weight tuning algorithm given by (26).

Figure 3. Arm motions of the M3 system with kinematic redundancy (t = 3.25-6.25 sec)

Figure 3 shows the sequences of arm motions controlled by the WBF network for the end-effector of the M3 system to track the desired helix in 3-D task space. Figures 4 to 6 compare the results of the tracking control of the M3 system using the WBF network controller against those using an MLP controller [4]. We can see from Figures 4 and 5 that the amount of vibration in the macro manipulator was effectively constrained in small regions around the origin by the WBF as well as the MLP controller, with the WBF controller being slightly more efficient than the MLP controller in suppressing the vibrations. Figure 6 shows the 2-norms of the task space tracking errors. We can observe that stable and robust tracking was achieved by the MLP and WBF controllers. In this test case, the overall performance of the WBF controller is comparable to that of the MLP controller in the sense of tracking accuracy. Further comparison of the WBF and MLP controllers is provided in Table 1, where we compare the WBF controller with the MLP controller in terms of computational complexity. It can be seen from Table 1 that the total number (150) of weights required to be tuned in the WBF network is much less than that (376) required by the MLP network, which makes the WBF controller faster in learning the dynamics of the M3 system. Figure 7 compares the tracking errors in task space using the WBF network based controller and a PD controller. The PD controller was implemented by

Table 1. Comparison of the WBF and MLP controllers (n_h = number of hidden units, n_W = number of weights, n_I = number of integrators, ‖e‖ = average 2-norm of tracking error)

Item    WBF network    MLP network
n_h     25             10
n_W     150            376
n_I     150            376
‖e‖     8.532×10^-3    7.296×10^-3

shutting down the WBF network in the outer loop of the system and using the state position and velocity feedback in the inner loop. It is shown that the tracking error is significantly reduced under the proposed WBF network based control scheme. This is because the WBF network controller is capable of suppressing the vibrations in the flexible boom induced by acceleration of the micro manipulator during the end-point tracking.

Figure 4. Flexural deflections (1st mode) of the macro manipulator (MLP: —, WBF: - -)

4. Conclusions

In this paper, we present a novel control scheme for time-varying end-point tracking control of a macro-micro manipulator (M3) system based on wavelet basis function (WBF) networks.


Figure 5. Flexural deflections (2nd mode) of the macro manipulator (MLP network: —, WBF: - -)

Figure 6. Cartesian tracking errors and error norms: 2-norm (m) versus time (sec) (MLP: —, WBF: - -)

The control scheme allows us to constrain the tracking errors of the micro manipulator, in the presence of vibrations due to the flexibility of the macro manipulator links, within an arbitrarily small region around the origin by applying bounded control torques at the joints of the M3 system. The WBF network is designed to perform the learning and control tasks online simultaneously, and no off-line training procedure is required for the neural network to identify the dynamic model of the M3 system. The stability and convergence properties of the control scheme provide assurances of the reliability needed to make the controller feasible in practical real-time control. The performance of the control scheme is tested and

Figure 7. Cartesian tracking errors and error norms (WBF network: —, PD: - -)

compared to that of an MLP network controller and a PD joint controller on a four-link rigid micro manipulator attached at the tip of a flexible structure. It is shown that the WBF network controller significantly improves the tracking performance over the PD controller, and it is capable of achieving comparable performance to that of the MLP controller while significantly reducing the amount of on-line computation for updating the weights of the network.

References

[1] D.N. Nenchev, K. Yoshida, P. Vichitkulsawat, and M. Uchiyama, "Reaction null-space control of flexible structure mounted manipulator systems," IEEE Transactions on Robotics and Automation, vol. 15, pp. 1011–1023, 1999.

[2] T. Yoshikawa, K. Hosoda, T. Doi, and H. Murakami, "Quasi-static trajectory tracking control of flexible manipulator by a macro-micro manipulator system," in Proceedings of the IEEE International Conference on Robotics and Automation, 1993, pp. 210–215.

[3] T. Yoshikawa, K. Hosoda, T. Doi, and H. Murakami, "Dynamic trajectory tracking control of flexible manipulator by a macro-micro manipulator system," in Proc. IEEE Int. Conf. Robotics and Automation, San Diego, CA, June 1994, pp. 1804–1809.

[4] D.X.P. Cheng and R.V. Patel, "Neural network based tracking control of a flexible macro-micro manipulator system," Neural Networks, vol. 16, pp. 271–286, 2003.

[5] D.X.P. Cheng and R.V. Patel, "Vibration control of a flexible macro-micro manipulator system using neural networks," in Proc. 15th IFAC World Congress, Barcelona, Spain, 2002.

[6] S.G. Mallat, A Wavelet Tour of Signal Processing, Academic Press, New York, NY, 2nd edition, 1999.


ARTag Fiducial Marker System Applied to Vision Based Spacecraft Docking

Mark Fiala
National Research Council of Canada

Bldg M-50, 1200 Montreal Rd., Ottawa, Canada K1A 0R6
Email: [email protected]

Abstract— Computer vision systems can be used for spacecraft docking, where one or more cameras on one craft detect visual features on the other. Such a system can be used in the final tens or hundreds of metres of a docking procedure. Relative pose can be determined accurately and inexpensively using machine vision. Fiducial marker patterns can be used to robustify the vision processing to provide high reliability and confidence in the machine vision's result. The ARTag fiducial marker system encodes digital codes in 2-D marker patterns; the ARTag system was designed to maximize and quantify reliability of detection. ARTag markers are used in a multi-scale array to facilitate both a large distance range and enhanced reliability. The relative pose between the two spacecraft is determined automatically. A real-time prototype is shown.

I. INTRODUCTION

The final stages of docking two spacecraft can be accomplished using fiducial markers mounted on one craft and a camera (or cameras) mounted on the other. Automatic processing of the resulting imagery (known as computer vision or machine vision) can provide the relative pose (position and orientation) between the two craft. Examples include the Space Shuttle docking with the International Space Station, or a robotic service craft docking with a satellite to repair it or alter its orbit. The fiducial markers would be mounted on the space station or satellite at different locations and sizes so that several are always in view as the docking craft approaches.

Fig. 1. Proposed mounting of ARTag fiducial markers on the International Space Station (ISS) to facilitate automatic docking. (Left) Close-up of an ISS docking module. (Right) Large markers placed elsewhere on unused surfaces for detection at greater distances.

Computer vision systems are a natural choice for docking systems since video cameras can be low cost and provide high angular accuracy. These cameras are also present on manned spacecraft, and their imagery can be used for pose determination. Computer vision relies on being able to automatically recognize and track features visible in the camera image. Computer vision has the advantage of being able to track the position of features with a high degree of accuracy compared to human vision; however, automated methods often fall short in the task of recognition. Features such as a circular white dot on a black background can look similar to many other round features that are not the intended feature. System integrators, with good reason, have been slow to adopt computer vision in many critical systems due to issues of robustness.

Fiducial marker systems are created to address this issue for a whole host of robotics, augmented reality, photo-modeling and other applications where it is important to make it possible to find correspondences in an image with points in 3D space. ARTag is a recently designed fiducial marker system intended to provide a high degree of reliability and functionality in uncontrolled lighting situations. Like other fiducial marker systems, ARTag consists of a library of 2D patterns which are detected in digitized camera images.

ARTag markers were specially designed for a vanishingly low false detection rate (when a marker is falsely reported when no real marker is present) and a very low inter-marker confusion rate (where one marker is mistaken for another). These markers, very robust on their own, are combined into arrays to provide yet higher confidence and to span a larger space, providing a more accurate pose determination. Since there are up to 2002 possible ARTag markers, they do not need to be repeated, and many can be placed around a spacecraft with varying sizes to permit detection at far and close distances.

II. FIDUCIAL MARKER SYSTEMS

Fiducial marker systems consist of the printed patterns and the computer vision algorithms to identify them in an image. Some numerical metrics for appraising marker systems are: 1) the false positive rate, 2) the inter-marker confusion rate, 3) the false negative rate, 4) the minimal marker size, 5) the vertex jitter characteristics, 6) the marker library size, and 7) the speed performance. The false positive rate is the rate of falsely reporting the presence of a marker when none is present. The inter-marker confusion rate is the rate at which a marker is detected but the wrong id is given, i.e. one marker is mistaken for another. The false negative rate is the probability that a marker is present in an image but not reported. Another metric is 4) the minimal marker size, which is the size in pixels required for reliable detection. The smaller the marker needs to be in the image, the larger the usable range from the camera at which the marker system can be used. The library size is the number of unique markers that the system can support. Speed performance is a consideration; a vision based fiducial marker tracking system must be real time with available computing power. Two other factors of fiducial marker system performance, not quantitatively measured herein but described loosely, are the immunity to lighting conditions and occlusion. Occlusion is when a fiducial marker is partially or completely blocked from view of the camera by some other object; a marker system should be able to still detect a marker despite partial occlusion. Space provides extreme lighting variation, and a marker system destined for space docking should be able to detect markers with widely varying lighting.

There are several fiducial marker systems available, several of which are shown below in Fig. 2; ARTag is the system used in this paper. Most use a two-step process of first locating an identifying feature, followed by a verification and identification step.

Most use a quadrilateral boundary as an identifying feature, with the interior containing the identification and verification information. The four corners can be used to create a homography to sample the interior in a perspectively correct manner. ARToolkit uses correlation methods to attempt to identify the interior (making it sensitive to camera properties and lighting) while most others attempt a symbolic encoding method with the interior divided up into several cells which are coloured white or black. The rotation of the quadrilateral markers must be found in order to correctly interpret the interior; Canon's marker [5] and Binary Square Marker [3] use the four corner cells in their 3x3 or 4x4 cells respectively, leaving 5 or 12 digital bits left to encode identity. Cybercode [12] uses a black bar to locate one side, and can be extended to an arbitrary size but does not provide information that allows it to be detected under perspective distortion. The ARVIKA markers (SCR, IGD, HOM) 1 use a 4x4 array; information is not available for these proprietary markers but it is clear that they can store either 16 or 20 bits. The ARTag marker system [6], [8] also has a quadrilateral border but has 36 internal cells; how these are used to achieve high reliability is mentioned in the next section. ARToolkit Plus 2 is a new system, inspired by the high reliability of ARTag, but not containing all ARTag's features such as edge-based detection and Hamming distance analysis.

"Blob" analysis can be used instead of quadrilateral outlines; connectivity is performed on a binarized image to identify adjacent and enclosed regions. Several commercially available circular fiducial marker systems exist using a single broken annular ring around a circular dot, such as Photomodeler's "Coded Marker Module" 3. Naimark and Foxlin [10] describe a system extending the number of rings, carrying 15 bits of digital information. Bencina's "reacTIVision" marker [2] creates a tree of the topology of which blobs are contained inside other blobs; trees matching a set library are used to find markers.

1 http://www.arvika.de/www/index.htm
2 http://studierstube.org/handheld ar/artoolkitplus.php
3 http://www.photomodeler.com

Fig. 2. Fiducial marker systems.

Most of these marker systems are not available for evaluation, due to non-disclosure or to not being a fully developed system (many are only created for an application in an academic paper or two and then discarded). There are several comparisons for the available and developed systems. Zhang [13] and Claus [4] perform surveys of several fiducial marker systems (the ARVIKA markers and ARToolkit) with respect to processing time, identification, and image position accuracy with respect to viewing angle and distance. ARToolkit is popular because it is simple, relatively robust, and freely available, and it is used in many augmented reality applications. ARTag and ARToolkit are compared in [9].

How the unique features are found influences the reliability of the system with respect to the false negative rate, which should ideally be as low as possible, meaning that the detection stage does not often miss a marker when it is in view. Most of the above systems use binary morphology to find blobs and quadrilateral contours; this relies on a greyscale threshold operation to divide white and black pixels. This thresholding can use a single greyscale value; however, a single value will often not be able to separate all white from all black pixels in an image, since the white area in one part of the image is often darker than the black in another, or vice versa. Automatic thresholding, based either on the neighborhood around a pixel or on a histogram of the entire image, is often applied. However, automatic thresholding only works in some cases, and the aperture problem of how large a neighborhood window to use causes problems and artifacts. An edge-based method to find polygon contours solves this problem in most cases, provides a substantial improvement in lighting performance, and allows the use of simple heuristics to close borders broken by occlusion or by markers extending beyond the image boundary. It appears ARTag is the only system to date to use edge-based methods for the unique feature location step.

There are other visual markers used in industry to convey information; some of them are included in this discussion for completeness, however they are not suitable as fiducial markers, as will be explained. Their purpose is to carry information, not to be identified in a camera image under perspective distortion as is needed for fiducial marker systems. Five systems are shown below in Fig. 3.

Fig. 3. 1-D and 2-D pattern systems used in industry.

ARTag was inspired by ARToolkit and Datamatrix, and was designed in an attempt to capture the best qualities of both for the purpose of creating a small but highly reliable fiducial. Datamatrix, Maxicode and QR all have a common thread of encoding data using binary values for reflectance; the pattern is typically bitonal, reducing the decision made per pixel to a threshold decision. This reduces the lighting and camera sensitivity requirements and removes the need for photometric calibration and linearization of the signal (i.e. no attempts are made to identify shades of grey). ARTag was chosen by the criteria described in Section II-B.

A. ARTag

ARTag is a planar fiducial marker system consisting of either 1001 or 2002 possible markers (depending on whether both black-on-white and white-on-black border polarities are used). Each marker has a square border and an interior containing 36 cells which encode a 10-bit marker ID with error detection and correction. The markers (Fig. 4) are bitonal, thus not requiring photometric camera calibration, and the square border unique feature is found by an edge-based method to handle lighting variation and occlusion. In the ARTag module itself, each marker is independently found, with no history of previous image frames. Because the ID is the result of an algorithm, no pre-stored patterns such as ARToolkit's pattern files are required.

Fig. 4. 12 examples of ARTag markers out of the library of 2002 available markers (or only 1001 if only one polarity of background colour is used). ARTag markers are 10 units wide, with the center 6x6 cells carrying 10 bits of ID.

The whole marker is 10 x 10 units, with a border of thickness 2 units, leaving 36 cells in the interior to carry information. Each cell is only black or white and carries one bit of digital data. Thus a 36-bit word can be extracted from a camera image of the marker once the boundary is determined. Digital processing is performed on the 36 bits to recognize or reject the quadrilateral as an ARTag marker, and to report its ID if recognized.

Space applications can provide uncontrolled lighting with large variations of illumination of markers between and within an image frame. ARTag's quadrilateral detection using edges allows it to still function as long as the camera's shutter and dynamic range provide digital data where there is some difference between white and black (at least 5 grey levels out of 256). Edge pixels are thresholded and linked into segments, which are in turn grouped into "quads". The four corners of each quad boundary are used to create a homography mapping to sample the marker's interior.

Once quadrilateral border contours have been located, the internal region is sampled with a 6 x 6 grid and assigned digital symbols '0' or '1'. The 36-bit binary sequence encoded in the marker encapsulates a 10-bit ID using digital methods. The extra 26 bits provide redundancy to reduce the chances of false detection and identification, and to provide uniqueness over the four possible rotations. The system was designed as if it were a communication system, with the marker creation as the transmission, the lighting, occlusion, perspective distortion and noise constituting the communications medium, and the real-time marker decoding as the receiver. A Cyclical Redundancy Check (CRC) and forward error correction (FEC) are the digital methods used to identify whether the 36-bit code is part of the ARTag marker set, and to extract its ID. The 36 data bits from the marker cells were provisioned into 10 for the payload ID, 16 for a CRC-16 checksum, and the remaining 10 bits for the FEC, which can correct 2 bit errors. The 10 payload bits do not reside in specific bits, but are convolved, and the ID and redundancy are spread among all bits. The set of 1024 possible marker interiors is reduced to 1001 to improve the Hamming distance between markers.
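To make the checksum idea concrete, the sketch below packs a 10-bit ID with a generic CRC-16 and rejects words whose checksum does not match. It is only an illustration of the principle: the polynomial, the bit layout, the convolution of the payload over the 36 cells and ARTag's forward error correction are not reproduced, and all names are invented for the sketch.

def crc16(bits, poly=0x1021, init=0xFFFF):
    # Plain bitwise CRC-16 over a sequence of 0/1 values (generic, not ARTag's exact scheme)
    reg = init
    for b in bits:
        feedback = ((reg >> 15) & 1) ^ (b & 1)
        reg = (reg << 1) & 0xFFFF
        if feedback:
            reg ^= poly
    return reg

def encode_id(marker_id):
    # 10 ID bits followed by their 16 CRC bits (26 bits; FEC and bit spreading omitted)
    id_bits = [(marker_id >> i) & 1 for i in range(9, -1, -1)]
    crc = crc16(id_bits)
    return id_bits + [(crc >> i) & 1 for i in range(15, -1, -1)]

def accept_word(bits):
    # A random 26-bit word passes this test with probability about 2^-16
    id_bits, crc_bits = bits[:10], bits[10:]
    value = 0
    for b in crc_bits:
        value = (value << 1) | b
    return crc16(id_bits) == value

For example, accept_word(encode_id(5)) returns True, while a word with flipped checksum bits is rejected.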

ARTag marker corners are located with sub-pixel precision, with a typical jitter of about 0.05 pixels, and a minimum marker size of 15-17 pixels with a focused camera.

ARTag markers are detected individually and then combined in the system proposed herein; the docking pose for an image frame is calculated by considering all the markers detected in the frame, as described in Section III.

B. Comparing Marker Systems

A space vision docking system must have a very high reliability: it should function in adverse lighting conditions and should have a vanishingly low (ideally zero) rate of providing a false inter-craft pose estimate. The false-positive and inter-marker confusion rates describe how outlier detections can provide a dangerous false positive pose estimate, while the false-negative rate affects how often the system will simply not be able to report a pose, which could potentially be as deadly if too much time goes by with no pose estimate provided.

The false-positive detection rate is critical; a marker system should have a very low and calculable rate of reporting a fiducial marker when none is present. With the ARToolkit correlation based approach, these false detections can be quite common. With the digital code based systems, the number of data bits provisioned for error checking determines the possible immunity to false detection. For every quadrilateral found in the image, assuming a high-entropy interior pattern, the false positive rate can be calculated as 2^-c, where c is the number of checksum bits. For example, with the 4x4 systems (SCR, HOM, IGD), if c bits are allocated for a checksum the system will have an estimated false positive rate of 2^-c. Matrix [11] (Fig. 2) has 25 bits with 9 bits or less for checksum, giving a similar result. ARTag's c = 16 corresponds to a false positive rate of 2^-16; if ARTag had no FEC it would have a false positive rate of 2^-26. ARToolkit Plus [1] has no checksum but checks for the 9-bit payload ID to be repeated 4 times, which also gives a low false positive rate. Whatever marker system is used, the combination of several markers into a composite pattern reduces the final false positive result; the odds of false positives combining to match the geometric arrangement of a composite pattern reduce the false positive rate to below that of a single marker.
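The 2^-c reasoning above can be turned into a quick numerical estimate. The snippet below assumes a non-marker quadrilateral produces 36 independent random cell values and counts how many of the 2^36 possible words would be accepted, with and without 2-bit error correction; it is a back-of-envelope bound (the correction regions are treated as disjoint), not a figure from the ARTag design documents.

from math import comb

N_BITS = 36       # interior cells per marker
N_CODES = 1001    # valid codewords for one border polarity
T = 2             # bit errors the FEC can repair

ball = sum(comb(N_BITS, i) for i in range(T + 1))   # words within distance T of a codeword

p_exact = N_CODES / 2**N_BITS          # accept only exact codewords (no FEC)
p_fec = N_CODES * ball / 2**N_BITS     # upper bound with 2-bit correction

print(f"false positive per quad, exact match  : {p_exact:.2e}")
print(f"false positive per quad, with 2-bit FEC: {p_fec:.2e}")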

The false negative rate is also important; the docking system should provide a pose estimate for every image frame. Error correction schemes allow the marker detection process to still recognize a marker. Despite ARToolkit Plus's very low false positive rate, it has no error correction and will not detect a marker as often as ARTag, despite having the same capacity of 36 bits. The Binary Square Marker system [3] is the only marker system other than ARTag which is known to use error correction. Binary Square Marker can correct 2 bit errors using parity error correction; however, using this brings the false positive detection rate up substantially.

The library size is also an issue; the marker system should support a large number of markers so that they can be placed all over the spacecraft(s) without needing to repeat markers. ARToolkit Plus provides a library size of 512 markers, Binary Square Marker provides 2048, and ARTag provides up to 2002.

The false positive rate, false negative rate, and library size are at odds with one another, and represent a trade-off between missing a marker, seeing a non-existent one, and how many different markers are available. These three parameters depend on how many bits are provisioned for error checking, error correction, and ID, respectively. Because of the rotational symmetry of the quadrilateral markers described, several systems not using checksums allocate 4 bits (usually the corner cells) to determine orientation (Binary Square Marker and the Canon system are known to do this). However, this introduces a special vulnerability to these 4 cells: if even one is corrupted, the orientation is unknown.

Clearly, the larger the number of data bits/cells, the greater the possibility for implementing error detection and correction, with only ARTag performing both of these functions. This would suggest making the marker larger; however, this increases the minimum size in pixels that the marker needs. Keeping the marker small is important to retain the highest possible range of distances at which detection can take place.

A factor of great importance not yet mentioned is the inter-marker confusion rate: confusing one marker for another creates outliers in image to 3D-point correspondences, which are likely as problematic as false positives. The ARToolkit Plus system simply uses the ID repeated four times (with an XOR mask), which results in markers with sequential IDs having a Hamming distance of only 4 between them. Hamming distances are a measure of how different digital codes are; they are simply the count of differing bits, and can be used to calculate one component of the inter-marker confusion rate (described in [6], [8]). Many pairs of ARToolkit Plus's markers have a Hamming distance of only 4 between them, resulting in a not unlikely probability of markers being mistaken one for another. The ARTag system addressed this concern directly in the marker system design and has greater Hamming distances between all markers in the library, taking rotation and reflections into account.
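As a small illustration of this kind of analysis, the sketch below computes the minimum pairwise Hamming distance over a marker library while considering the four rotations of each 6x6 cell grid. The library here consists of random placeholder grids rather than the actual ARTag codes, and reflections are left out for brevity.

import numpy as np

def rotations(grid):
    # The four 90-degree rotations of a square cell grid
    return [np.rot90(grid, k) for k in range(4)]

def min_inter_marker_distance(library):
    # Smallest Hamming distance between any two markers over all rotations
    best = None
    for i in range(len(library)):
        for j in range(i + 1, len(library)):
            for rot in rotations(library[j]):
                d = int(np.sum(library[i] != rot))
                best = d if best is None else min(best, d)
    return best

rng = np.random.default_rng(0)
library = [rng.integers(0, 2, size=(6, 6)) for _ in range(20)]   # placeholder codes
print("minimum inter-marker Hamming distance:", min_inter_marker_distance(library))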

Considering digital coding factors, ARTag was selected for its low false positive and inter-marker confusion rates, as well as its ability to reduce the false negative rate due to bit errors and its large library size. Also, its edge based method provides a much needed immunity from lighting variation. The above analysis has focused on single marker detection; the individual markers are now combined into arrays.

III. PROPOSED AUTOMATIC DOCKING SYSTEM

A robust pose estimate can be determined by computer vision algorithms processing imagery from one or more cameras on one of the spacecraft viewing markers mounted on the other craft. The 6-DOF (degree of freedom) pose can be converted by a simple position offset and rotation to determine the pose error between the incoming craft and where it would be if correctly docked. ARTag markers can be mounted in unused spots on the first spacecraft (space station, satellite, etc.) with different sizes to allow detection over a long range, from far away to very close. The markers would be spread out so that when the docking craft is near, markers extend across the field of view to give good pose accuracy.
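The conversion from a measured pose to a docking error can be done by composing 4x4 homogeneous transforms, as in the sketch below; the frame names and the convention that both transforms map target coordinates into the camera frame are assumptions made for the illustration.

import numpy as np

def invert_pose(T):
    # Inverse of a rigid transform [R t; 0 1]
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

def docking_error(T_measured, T_docked):
    # Residual motion between the current pose and the nominal docked pose;
    # the identity transform means the craft is exactly where it should be when docked.
    return invert_pose(T_docked) @ T_measured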

For each marker detected, it is matched to its 2D or 3D location known a priori, providing 4 point correspondences between the image and the world. Let X = [x, y]^T be a marker corner point if the markers are planar, or X = [x, y, z]^T be a point in 3-space if the markers are not co-planar, and let u = [u, v]^T be its image location adjusted for non-ideal lens parameters (most commonly radial and thin prism distortion). In the coplanar case u is related to the world coordinates X by a homography H (Eqn. 1), whereas in the non-coplanar case u is related to the world coordinates X by a 3 x 4 projection matrix P (Eqn. 2); K (Eqn. 3) is the matrix of internal camera parameters.

s[u, v, 1]^T = H[x, y, 1]^T    (1)

s[u, v, 1]^T = P[x, y, z, 1]^T    (2)

K = \begin{bmatrix} f_x & \gamma & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}    (3)

Ah = 0    (4)

\begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -ux & -uy & -u \\ 0 & 0 & 0 & x & y & 1 & -vx & -vy & -v \end{bmatrix}    (5)

Bp = c    (6)

\begin{bmatrix} x & y & z & 1 & 0 & 0 & 0 & 0 & -ux & -uy & -uz \\ 0 & 0 & 0 & 0 & x & y & z & 1 & -vx & -vy & -vz \end{bmatrix}    (7)

Each adjusted point provides two equations in a linear expansion of H or P. If the matrix H is rewritten as a 9-dimensional vector h = [h_11, h_12, h_13, h_21, h_22, h_23, h_31, h_32, h_33]^T, the problem can be expressed as finding h to satisfy Eqn. 4, where A is a matrix with two rows per correspondence and 9 columns. The number of marker corners matched between the image and the a priori model is a multiple of 4, since every ARTag marker provides 4 correspondence points. Each correspondence point, consisting of X = [x, y]^T and u = [u, v]^T, creates two rows in A (Eqn. 5). Eqn. 4 is solved by the method of least squares by applying singular value decomposition (SVD) to A. Similarly, P is rewritten as an 11-dimensional vector p (its last element fixed to 1) to solve Eqn. 6 by SVD, where c stacks the image measurements. Two rows in B are filled (Eqn. 7) by each correspondence X = [x, y, z]^T and u = [u, v]^T.

Thus from a set of image-to-2D or image-to-3D point correspondences, a matrix H or P can be found for that image frame. If the camera parameters (K) are known a priori, as is assumed in a spacecraft, the inverse matrix K^-1 can be pre-multiplied by H or P to extract the pose [R | t] (rotation matrix R, translation vector t); K (Eqn. 3) contains the focal lengths f_x and f_y, the skew factor \gamma, and the principal point (c_x, c_y).
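A compact version of the planar (homography) branch of this calculation is sketched below: each 2D-2D correspondence contributes the two rows of Eqn. 5 to A, and h is taken as the right singular vector associated with the smallest singular value. Array shapes and names are illustrative.

import numpy as np

def homography_dlt(world_xy, image_uv):
    # world_xy: (N, 2) planar marker-corner coordinates
    # image_uv: (N, 2) undistorted image coordinates, N >= 4
    rows = []
    for (x, y), (u, v) in zip(world_xy, image_uv):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    h = Vt[-1]                      # least-squares solution of A h = 0
    H = h.reshape(3, 3)
    return H / H[2, 2]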

As with camera calibration algorithms, the pose could be adjusted by a few non-linear iterations that attempt to minimize the reprojection error. Good results were obtained without this stage, but it may be desirable for a real system.

A system can be based solely on a planar array, or could be a hybrid system where the pose is determined either by the homography or by the projection matrix calculation. A decision criterion has to decide which of the two to use; a possible scheme is to decide whether the 3D points of the correspondence set found in an image can be fit to a plane within a threshold. This could be done by eigen-plane methods. This warrants further investigation, as there may be issues of pose consistency when switching between planar and 3D pose estimation.
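One way such a planarity check could be realized is sketched below: fit a plane to the matched 3D corners by taking the direction of smallest variance and compare the RMS out-of-plane residual to a threshold. The threshold value and the use of SVD for the eigen-plane are choices made for the illustration.

import numpy as np

def is_coplanar(points_3d, tol=0.01):
    # points_3d: (N, 3) matched marker corners; tol in the same units as the points
    P = np.asarray(points_3d, dtype=float)
    centred = P - P.mean(axis=0)
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    normal = Vt[-1]                              # direction of smallest variance
    rms = np.sqrt(np.mean((centred @ normal) ** 2))
    return rms < tol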

Two prototype systems are shown: one with a planar array (Fig. 5) using Eqn. 4 to determine pose, and one with a 3D arrangement of markers (Eqn. 6). The planar docking system was created using a 43 cm wide array and allowed pose extraction in a range of 10 cm to 7 metres; the largest markers were 14 cm across, which corresponds to a marker width of 22 pixels. A 3D system was made with a satellite mock-up 25 cm wide with markers placed around the exterior. The marker positions were automatically determined with bundle adjustment from a set of images (see [7]); however, this was done for convenience, and in a space system these would likely be measured accurately instead. The "satellite's" pose was detected up to 4 metres away; the largest markers were 7 cm across, corresponding to a marker width of 19.5 pixels in the image. The ARTag marker system was measured in [8] to be detectable down to about 15 pixels with the same camera. The camera used in both cases was a greyscale digital video camera of resolution 640x480 pixels with an 8 mm lens providing a focal length of 1100 pixels (footnote 4).

Fig. 5. Prototype coplanar array of ARTag markers; the detected corners for each marker in a camera image are combined with the known spatial coordinates of the marker to calculate a homography and determine pose.

The software ran with a frame rate of greater than 10 fps on a 3.0 GHz Pentium PC. The not quite real-time speed performance is partly due to the non-ideal software and device driver infrastructure, and the system could easily run at 60 or 100 fps with the ARTag algorithm optimized into a DSP or FPGA.

Some screen shots are shown below in Figs. 6 and 7; the planar system has the 6-DOF information superimposed. In a real system the full 6-DOF pose would be used to control the incoming craft, but for visualization this prototype system provides a control signal of the greatest error component. To demonstrate the system, a user can watch these control signals (arrow at middle bottom of the screen shots) alone and guide the camera to the center point of the array.

As mentioned, in a real system the ARTag markers would be mounted all over at least one of the two spacecraft, in ad hoc non-coplanar positions. The 3D positions of the markers can be measured by taking many images of the space station or satellite and automatically creating the 3D coordinates of each marker corner. This was demonstrated for a non-space application with many ARTag markers placed on an irregular curved object in [7].

In general, the more cameras that are used, the lower the undesirable probability that many false negatives can create moments when no pose information is available (most likely caused by a camera being over-saturated or recovering from over-saturation, such as after seeing the sun).

4 IEEE-1394 Dragonfly Camera from Point Grey Research


Fig. 6. (Top and bottom) ARTag marker array detected at 6.7 and 0.1 metres, respectively.

Fig. 7. Satellite mock-up with a 3D arrangement of markers; pose is found by first calculating a projection matrix.

The marker corner results would be combined from all the cameras in a single pose determination. The system can also be enhanced by placing cameras and markers on both spacecraft; with communication between the craft, an even more reliable and accurate pose could be determined.

IV. CONCLUSIONS

Pose determination for an automatic computer vision based spacecraft docking system was proposed, and a prototype system demonstrated. The system uses fiducial marker arrays mounted on surfaces of one craft viewed by a camera (or cameras) on the other craft. Such a system can be used in the final tens or hundreds of metres of a docking procedure, after RADAR and LIDAR systems have brought the spacecraft within a hundred metres of each other. Fiducial marker systems for computer vision were discussed, and all known such systems were described and compared. The ARTag fiducial marker system was chosen for its low false positive, inter-marker confusion, and false negative rates, as well as its immunity to lighting variation and partial occlusion. Overall, a robust and fully automatic pose determination system using low cost and accurate computer vision algorithms was created, addressing the need for very high reliability in space applications.

REFERENCES

[1] http://studierstube.org/handheld ar/artoolkitplus.php (ARToolkit Plus webpage).

[2] R. Bencina, M. Kaltenbrunner, and S. Jorda. Improved topological fiducial tracking in the reacTIVision system. In Proceedings of the IEEE International Workshop on Projector-Camera Systems (Procams 2005), San Diego, USA, 2005.

[3] P. Boulanger. Application of augmented reality to industrial tele-training. In Proc. of CRV'04 (Canadian Conference on Computer and Robot Vision), pages 320–328, May 2004.

[4] D. Claus and A. Fitzgibbon. Reliable fiducial detection in natural scenes. In Proceedings of the 8th European Conference on Computer Vision, Prague, Czech Republic, May 2004.

[5] H. Yamamoto, D. Kotake, S. Uchiyama. A marker calibration method utilizing a priori knowledge on marker arrangement, Nov. 2004.

[6] M. Fiala. ARTag, a fiducial marker system using digital techniques. In CVPR'05, pages 590–596.

[7] M. Fiala. The Squash 1000 tangible user interface system. In 2005 IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR 2005), Vienna, Austria.

[8] M. Fiala. ARTag revision 1, a fiducial marker system using digital techniques. National Research Council Publication 47419/ERB-1117, November 2004.

[9] M. Fiala. Fiducial marker systems for augmented reality: Comparison between ARTag and ARToolkit. In MIRAGE 2005, INRIA Rocquencourt, France, Mar 2005.

[10] L. Naimark and E. Foxlin. Circular data matrix fiducial system and robust image processing for a wearable vision-inertial self-tracker. In ISMAR 2002: IEEE/ACM International Symposium on Mixed and Augmented Reality, Darmstadt, Germany, Sept. 2002.

[11] J. Rekimoto. Matrix: A realtime object identification and registration method for augmented reality. In Proceedings of the 3rd Asia Pacific Computer Human Interaction (APCHI '98), pages 63–68, 1998.

[12] J. Rekimoto and Y. Ayatsuka. Cybercode: Designing augmented reality environments with visual tags. In Proceedings of DARE 2000 on Designing Augmented Reality Environments, pages 1–10, Elsinore, Denmark, Oct 2000.

[13] X. Zhang, S. Fronz, and N. Navab. Visual marker detection and decoding in AR systems: A comparative study. In IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR), pages 97–106, Sep 2002.


End Notes and Survey

Current and future challenges in space robotics and vision

Leo Hartman and Ioannis Rekleitis

Canadian Space Agency [email protected], [email protected]

6767 Airport Rd., St-Hubert QC J3Y 8Y9, Canada
www.space.gc.ca

Abstract

The Canadian Space Agency is involved in several robotics activities, including on-orbit servicing and assembly and planetary exploration. With NASA's Vision for Exploration likely to dominate the global space sector for the coming decades, the importance of and our dependence on robotics, and therefore on vision and other wide-area sensing technologies, for reliable and lower cost missions will only increase. This presentation will provide a short overview of CSA's current robotic challenges and, by way of summarizing the workshop, the possible challenges and opportunities of the future.

Biographies

Leo Hartman received his PhD in Computer Science from the University of Rochester in 1990. His dissertation investigates the use of decision theory for allocating computational resources in deductive planning. After a post-doc at the University of Waterloo, he joined the Canadian Space Agency in 1993 as a research scientist. His work there focuses on computing, networking and automation for space missions.

Ioannis Rekleitis is currently a visiting fellow at the Canadian Space Agency and adjunct Professor at the School of Computer Science, McGill University. His work focuses on developing software systems for autonomous planetary exploration and on-orbit servicing. He also works on sensor networks and distributed robotics. During 2002-2003 he was a post-doctoral fellow at Carnegie Mellon University with Prof. Howie Choset, working on multi-robot coverage and single-robot exploration. He obtained his Ph.D. and M.Sc. degrees from the School of Computer Science, McGill University, in 2002 and 1995 respectively. His Ph.D. work was on "Cooperative Localization and Multi-Robot Exploration", supervised by Gregory Dudek and Evangelos Milios. His M.Sc. thesis was titled "Optical Flow Estimation by Analysis of Motion Blur", supervised by Godfried T. Toussaint and David Jones. He holds a B.Sc. degree from the Department of Informatics, Faculty of Sciences, University of Athens, Greece (1991). His interests are in robotics, computer vision, artificial intelligence, sensor networks, multi-robot systems, and software architectures for complex systems.
