
Sensor-Based Behavior Control for an Autonomous Underwater Vehicle

Junaed Sattar, Philippe Giguère, and Gregory Dudek

Centre for Intelligent Machines, McGill University
{junaed, philg, dudek}@cim.mcgill.ca
http://www.cim.mcgill.ca/~mrl

Abstract. In this paper, we evaluate a set of core functions that allow an underwater robot to perform surveillance under operator control. Specifically, we are interested in behaviors that facilitate the monitoring of organisms on a coral reef, and we present behaviors and interaction modes for a small underwater robot. In particular, we address some challenging issues arising from the underwater environment: visual processing, interactive communication with an underwater crew, and finally orientation and motion of the vehicle through a hovering mode. The visual processing consists of target tracking using various techniques (color segmentation, color histogram and mean shift). Communication underwater is achieved through printed cards with robustly identifiable visual markers on them. Finally, the hovering gait developed for this vehicle relies on the planned motion of six flippers to generate the appropriate forces.

1 Motivation and Problem Statement

This paper considers non-contact guidance for an amphibious robot, based on visual cues. We are developing basic robotics and sensing technologies, but our specific application domain in the short term is environmental assessment of marine life on coral reefs. This paper builds on our earlier work [Dudek et al.2006] by elaborating on the details of the visual guidance system and by introducing a novel technique for visual human-robot interaction. At present, visual measurements of marine life are typically acquired manually by biologists who "swim the reef", repeatedly making observations. This is tiring, error prone and dangerous. By providing robotic assistance, or letting the robot carry out some of these data-gathering missions, it should be possible to acquire more reliable data with improved efficiency and reduced risk. To do this, the robot needs to communicate with a diver and interact with them. This drives the development of fundamental competencies, including visual guidance. In particular, this is motivated by the challenges of communication underwater. Teleoperation for underwater vehicles is complicated by several factors:

• Wireless radio communication is impossible. Conventional radio communications are infeasible since the energy of the beam is rapidly absorbed in conductive salt water.


Fig. 1. The AQUA robot being deployed during open-sea experimentation (on the left) and operated in untethered mode (on the right).

• Difficulties in using a tether. The use of a tether is awkward both on land and underwater, since it requires managing the deployment of the cable over some distance (200 m for example) while minimizing the disturbance of the tether pull on the vehicle in the face of buoyancy issues and 6 degrees-of-freedom (DOF) motion.

• Bandwidth limitation of acoustic modems. Acoustic modems currently deployed in marine systems have severe bandwidth limitations (in the range of 1-10 Kb/s) while requiring significant power to transmit over long distances. This small bandwidth makes the transmission of video information in real time impossible.

Notably, even human scuba divers commonly resort to simple sign language and similar short-range visual communications for underwater task coordination. In a similar manner, our underwater vehicle is being developed to combine behaviors and operating modes based on visual cues.

Our target application is the visual surveillance of reef environments to assess the health and behavior of the marine life. This task, like many related inspection tasks, can be decomposed into two canonical behaviors: transition between waypoints, and station keeping at a fixed location. In our application, each of these is modulated by visual cues from a diver or in response to environmental stimuli (such as recognized visual landmarks). These cues take two forms: symbolic tokens used to select specific behavior classes or tune the behaviors, and target motions used for visual servoing with respect to either a diver or an environmental stimulus. Motion between the waypoints is performed by executing one of several swimming gaits. Station keeping, however, entails the use of a hovering gait which is both unique and challenging.

Our vehicle, a descendant of the RHex hexapod robot [Altendorfer2001], has well-developed "kicking" gaits for forward locomotion that also permit pitch, roll and yaw rotations. These gaits are based on simple oscillatory motions of the flippers with various phase and amplitude offsets, akin to the standard up-and-down kick of a human swimmer. In this standard mode of motion, however, rotational motion is coupled to forward motion and thus the robot can only turn if it is moving forward.


Furthermore, thrust can only be "instantaneously" applied in the forward direction (or backward if the flipper orientation is reversed).

Visual servoing for station keeping has been implemented in the past for autonomous underwater vehicles [Balasuriya and Ura2001] [Rives and Borrelly1997a] [Rives and Borrelly1997b] [Lots et al.2001]. In such prior applications, the vision system uses (simpler synthetic or engineered) line or circular features to establish a baseline reference, and the control loop maintains vehicle pose by continuously minimizing the error function. A notable system that did this was the Twin-Burger robot, on board which a cable-following maneuver has been successfully implemented. In this particular system, the visual processors on board the robot extract only line features from the grayscale image using a Hough transform. These features are not sufficiently robust to track natural targets such as the human divers we interact with. In that prior work, to reduce the chances of false tracking, the robot pose estimate from dead reckoning is used to ensure that the correct cable is being followed. In that case, however, the target is stationary and has no motion of its own. Also, the input images are grayscale, and no color information is used for visual servoing. Prior work on visual tracking for underwater vehicles, to our knowledge, does not entail a statistical or probabilistic component.

In our work, the robot is equipped with a visual servoing mechanism for following moving natural targets. We impose no constraints on the size or shape of the target. In particular, since we wish to track humans, it is critical that the system be robust enough to track non-rigid objects (which we do). The tracking algorithms are color-based, and multiple algorithms are used that employ techniques such as thresholding and statistical estimation. Our vehicle is actuated by flippers, and the PID control algorithm is designed to generate control commands for an 18-dimensional space (offset, amplitude and period parameters for each of the six legs), making it significantly more complicated than generating thruster forces. Our robot is capable of amphibious operations, and the versatility of the visual servoing architecture enables it to be applied for servoing during terrestrial operations.

The hovering gaits were conceived with several requirements in mind. First, those gaits had to be able to move the robot in five degrees of freedom: pitch, roll, yaw, heave and surge. They also had to be able to combine several commands at the same time, for example a pitch and a heave command, to be useful during station keeping. Furthermore, to ensure good controllability, the reaction time of the robot had to be as short as possible, particularly in the case of command reversal. Finally, the cross-talk between the degrees of freedom had to be minimized: in order to hover in place effectively, the robot needs to be able to apply rotational moments in any direction with very limited net forward motion. The intrinsically non-holonomic behavior of the flippers presented a significant challenge in the design of those hovering gaits.

2 Technical Approach

In order to adapt to natural environments and compensate for unforeseen forces in the environment, a key aspect of our work is the sensing of environmental conditions and the recognition of the current environment.


To do this we use a combination of internal sensors akin to biological proprioception as well as computer vision. This adaptation process therefore falls into two distinct categories: visual processing and interpretation, and gait synthesis, selection and control. The visual processing is further subdivided into learning-based visual servoing, and symbol detection and interpretation (i.e. a visual language akin to a simplified sign language).

2.1 Visual Processing and Interpretation

A primary interaction mechanism for controlling the robot is via visual servoing using a chromatic target. By detecting a target object of a pre-specified color, the robot is able to track the target in image space up to a nominal maximum distance of approximately two meters [Sattar et al.2005]. This limitation on the distance is primarily due to color absorption and diffusion in sea water, particularly in the red area of the spectrum. We currently use color segmentation, color histogram and mean-shift [Comaniciu et al.2003] based tracking algorithms for target tracking. A proportional-integral-derivative (PID) controller takes the tracker outputs and generates yaw and pitch (but not roll) commands which determine the robot trajectory and make the target-following behavior possible. For the surveillance of a motionless target, the hovering gait of the robot is used together with the visual tracker.

Fig. 2. ARTag Markers.

The servoing mechanism is configurable through a number of parameters that include target color properties, robot speed, gait selection and yaw/pitch controller gains. We use symbolic markers provided by the ARTag toolkit [Fiala2004] to visually communicate with the robot and effect changes in robot behavior. Examples of ARTag markers are shown in Fig. 2. These markers include both symbolic and geometric content, and have an embedded error-correcting mechanism to enhance robustness. Switching in and out of the hovering gait, for example, is performed by detecting a particular ARTag marker.

2.2 Gait Control Overview

The gait design and control issues we consider are for a swimming robot that uses flippers (i.e. legs) for locomotion underwater. By using legs for locomotion, our vehicle is able to exhibit amphibious behaviors, being capable of swimming underwater and walking on land.


This "walking" behavior can also be used underwater, by using the flippers as resting support on the sea bottom to provide for limited-disturbance surveillance tasks. A large class of activities is facilitated, however, by being able to hold a fixed position and attitude in middle depths. Monitoring sea life on a coral reef, a key application of our robot, would be a prime example of this.

3 Methodology and Results

Our experimental methodology comprises three sequential phases: numerical simulation and validation on recorded video data, pool trials in a controlled environment, and open-water tests on a live coral reef. The latter scenario is the real test of performance, but tends to provide qualitative results and video data. The deployment of the robot at sea can take place from shore or from a boat.

3.1 Visual Servoing and Tag-based Control

The integrated monocular vision system in the Aqua robot currently serves two purposes: visual servoing and tag-based robot control. The visual servoing behavior is used to track and follow an object of interest (e.g. a fish, a diver, etc.) underwater, although the same mechanism will work in terrestrial environments as well. The tag-based navigation mode is used to send basic motion control commands to the robot. These tags are used to modify the behavior of the robot, and could be used to assist in a context of behavior selection such as described in [Benjamin et al.2006]. In the context of reef monitoring, this visual language scheme is useful for the diver to instruct the robot to perform certain motion trajectories (e.g. reef transects) that are essential for marine habitat inspection. In the first stages of the Scene Acquisition and Re-inspection (SAR) task, the robot will need explicit guidance from a human operator to acquire landmarks of interest, and the tag-based control provides a direct way for the diver to interact with the robot to achieve this goal. Both these modes run simultaneously; the visual servoing mode can be preempted (i.e. interrupted or assisted) by motion control commands sent by the tag-based motion control subsystem. Vision processing is performed on board the robot; a fiber-optic tether is nevertheless used to collect video data (which can require significant storage) off-board, and to provide external driver assistance if required. We discuss both these systems in the two subsections that follow.

3.2 Visual Servoing

The visual servoing subsystem is made up of two functional components: the visual tracker and the PID controller (see Fig. 6). The visual tracker follows objects of interest, or targets, based on the color features of the targets. The tracking system localizes a target in image space with the null location (0,0) being the center of the image. To track an object, we use its color properties. Both single- and multi-color objects can be tracked. Currently, our system combines two different methods for visual tracking. One approach is a naive, brute-force technique for target localization.


An example of this is the color segmentation tracker that uses threshold-based color segmentation to detect the target in the image frame. We employ a statistical approach to visual tracking as well. A color-histogram-based tracker and a mean-shift tracker use statistical similarity measures between color histograms to detect the probable target location between successive image frames. These trackers differ in their methods of locating the target in consecutive frames; the histogram tracker does a global search in the entire image to locate the target, whereas the mean-shift tracker uses the "mean-shift" vector to detect the displacement in target location and thereby re-acquire the target in the subsequent frames. We have performed a quantitative evaluation of the performance of the three types of trackers used [Sattar and Dudek2006]. The following sections discuss the details of the tracking algorithms.

3.3 Color Segmentation Tracker

The color segmentation ("blob") tracker uses color threshold values to detect the presence of the target object in image space. We use the normalized-RGB color space to perform segmentation; N-RGB is less susceptible to variations in lighting and thus makes tracking more robust in underwater environments. The color properties of the target are provided to the tracker as RGB triplets, with low and high threshold values for each channel (see Tab. 1). In successive frames, the tracker produces a binary image from the input, where each pixel lying within the target color thresholds is turned on, while the others remain off. The image thus produced contains approximately the target mass, and the center of mass of this "blob" is taken as the new location of the target. An example of an input frame and the resulting segmented output is shown in Fig. 3.

Channel   Low        High
Red       0.486135   0.526135
Green     0.421718   0.461718
Blue      0.0321472  0.0721472

Table 1. Color threshold examples
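As a concrete illustration of the procedure above, the following is a minimal sketch of a normalized-RGB threshold segmenter with a center-of-mass output, using NumPy and OpenCV. The thresholds are the example values from Table 1; the function name and structure are our own and not the robot's actual code.

```python
import cv2
import numpy as np

# Example thresholds from Table 1 (normalized RGB, (low, high) per channel).
THRESHOLDS = {
    "r": (0.486135, 0.526135),
    "g": (0.421718, 0.461718),
    "b": (0.0321472, 0.0721472),
}

def blob_track(frame_bgr, thresholds=THRESHOLDS):
    """Return the (x, y) centroid of pixels inside the color thresholds,
    with (0, 0) at the image center, or None if no pixel matches."""
    img = frame_bgr.astype(np.float32) + 1e-6           # avoid divide-by-zero
    total = img.sum(axis=2, keepdims=True)
    b, g, r = cv2.split(img / total)                    # normalized RGB (chromaticity)

    mask = ((r >= thresholds["r"][0]) & (r <= thresholds["r"][1]) &
            (g >= thresholds["g"][0]) & (g <= thresholds["g"][1]) &
            (b >= thresholds["b"][0]) & (b <= thresholds["b"][1]))

    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    cx, cy = xs.mean(), ys.mean()                       # center of mass of the blob
    h, w = mask.shape
    return cx - w / 2.0, cy - h / 2.0                   # image-center-relative coordinates
```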

The obvious disadvantage of using a naive blob tracker is that it can easily get confused if another object with color characteristics similar to the actual target appears in the robot's field of view, as can be seen in Fig. 3(b). In such cases, the color segmentation process will create two (or more) blobs and thus the center of mass of the blobs will not be on the actual target. The use of priors such as motion models and spatial information can reduce the chances of mistracking. Eliminating such confusion with naive blob trackers, however, is a daunting task. Another problem with blob trackers is that they are inherently suited to tracking objects with a uniform color property, i.e. monochromatic objects. A colorful object would be difficult to track with the same approach, since there is no single color threshold that would describe its color properties completely. These shortcomings forced us to look for more versatile and robust tracking algorithms.


(a) Input image

(b) Blob tracker output

Fig. 3. Color blob tracking of the yellow flippers.

3.4 Color Histogram Tracker

To address the disadvantages of the blob tracker, we looked at statistical properties of object color beyond the trivial threshold values. One approach that has proven successful in video indexing and content-based image retrieval from large databases [Swain and Ballard1991] is the use of color histograms. A color histogram represents the distribution of color across an image. In the basic histogram tracker, we model the target object in question by creating a square window with a fixed side length and computing a histogram of the image area falling within the window. In the detection phase, we split the image up into slices (as in Fig. 4), where each slice is of the same size as the original target window. We then sequentially run a matching process, where at each step we match the target model histogram against the histogram created from each of the slices. The slice which has the maximum similarity to the model histogram is chosen as the probable location of the target object.

The question of measuring 'similarity' between discrete statistical distributions such as histograms is a well-studied area [Fukunaga1990, Cha and Srihari2002]. Many different proposals have been made to measure similarity (or conversely, distance) between distributions, many of which have shown good performance in computer vision applications such as visual tracking and image retrieval.

Page 8: Sensor-Based Behavior Control for an Autonomous Underwater ...people.clarkson.edu/~jsattar/allpapers/ijrr2009.pdf · Sensor-Based Behavior Control for an Autonomous Underwater Vehicle

8 J. Sattar and P. Giguere and G. Dudek

Fig. 4. Color histogram tracking: Sliced image.

We have chosen the Bhattacharyya distance [Bhattacharyya1943] to calculate the degree of similarity between color histograms. The Bhattacharyya distance provides bounds on the Bayes probability of misclassification; i.e. it is the probability of one cluster being similar to the other. Geometrically, the Bhattacharyya measure (also known as the B-measure in the literature) is the cosine of the angle between two distributions in $N$ dimensions, where $N$ is the number of bins in each of the histograms. If two histograms $h_x$ and $h_y$ each have $n$ bins, then the Bhattacharyya distance between them can be expressed as:

$$\rho_{\mathrm{Bhattacharyya}} = \sqrt{1 - \rho(h_x, h_y)} \qquad (1)$$

$$\rho(h_x, h_y) = \sum_i \sqrt{h_x(i)\, h_y(i)} \qquad (2)$$

The calculation of the Bhattacharyya measure at each image slice is quadratic in the number of bins per histogram. The number of bins is a trade-off parameter that either gives finer degrees of similarity measures, or faster computation at the expense of coarser levels of comparison.
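To make Eqs. (1)-(2) and the slice search concrete, here is a minimal sketch of a color-histogram tracker with Bhattacharyya matching. The bin count, window size and stride are illustrative parameters, not the values used on the robot.

```python
import numpy as np

def color_histogram(patch, bins=16):
    """Normalized joint RGB histogram of an image patch (H x W x 3, uint8)."""
    hist, _ = np.histogramdd(patch.reshape(-1, 3), bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

def bhattacharyya(hx, hy):
    """Eqs. (1)-(2): rho is the similarity, the distance is sqrt(1 - rho)."""
    rho = np.sum(np.sqrt(hx * hy))
    return np.sqrt(max(0.0, 1.0 - rho)), rho

def histogram_search(frame, target_hist, win=64, step=32):
    """Scan the frame in fixed-size slices and return the slice whose
    histogram is closest (smallest Bhattacharyya distance) to the target."""
    best = (np.inf, None)
    h, w = frame.shape[:2]
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            d, _ = bhattacharyya(color_histogram(frame[y:y+win, x:x+win]),
                                 target_hist)
            if d < best[0]:
                best = (d, (x, y))
    return best   # (distance, top-left corner of the best slice)
```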

Fig. 5 shows a target object and the image slice picked by the histogram tracker, together with their corresponding histograms. The Bhattacharyya measure and the value of ρ are given in the caption (on a scale of 0 to 1). The lower the Bhattacharyya distance, the more similar the histograms.

We use the output of the color blob tracker in concert with the histogram tracker to assure that the target has the correct scale. In addition, the often-uniform background found underwater makes the histogram tracker quite robust to changes in scale.


(a) Target image (b) Candidate image

(c) Target Histogram (d) Candidate Histogram

Fig. 5. Histogram tracking: $\rho(h_x, h_y) = 0.95$ and $\rho_{\mathrm{Bhattacharyya}} = 0.22$

3.5 Mean-Shift Tracking

While an improvement over the naive blob tracker, the histogram tracker suffers from the fact that it scans the entire image, creating histograms of each slice, before a match can be made. This gives the algorithm a cubic running time ($O(n^3)$), which can be expensive for real-time operation. To address this issue, we exploit the spatial locality property of real targets; i.e. the translational distance between frames is usually small compared to the field of view of the robot. This property is even stronger in the case of underwater objects, since the high viscosity of water limits translational velocities, making this domain suitable for applying a gradient-based method like mean-shift tracking. Instead of scanning the entire image for the probable target location, we use the mean-shift tracker locally; i.e. we utilize the position information from the previous frame to restrict our search to a small area of the entire frame.

Mean-shift tracking is an iterative process, where each iteration consists of histogram computation, a similarity measure, and finding the new (probable) target location using the mean-shift property. The reader might observe that the process is very similar to that of the histogram-based tracker described earlier; that indeed is a correct observation. The mean-shift tracker has a performance advantage over the aforementioned tracker since only a small area of the entire frame is ever scanned and histograms are computed.


Mean-shift is itself an iterative procedure [Comaniciu and Meer1999] that shifts each data point to the average of the data points in its neighborhood. By weighting neighboring pixels using a kernel function, the resulting gradient information is used to 'shift' the estimated position of the target object between successive frames. The most likely location is chosen by detecting substantial changes in color and by maximizing the similarity between color histogram models of candidate locations. As before, we use the Bhattacharyya measure as a distance/similarity metric for color histogram comparisons.
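The sketch below illustrates the local, gradient-style update described above using OpenCV's built-in mean-shift routine on a back-projected color histogram. It is only an approximation of the approach: the robot's tracker uses its own histogram models and Bhattacharyya checks, whereas here the model is a simple hue histogram and the function names are ours.

```python
import cv2
import numpy as np

def make_target_model(frame_bgr, roi):
    """Hue histogram of the initial target region roi = (x, y, w, h)."""
    x, y, w, h = roi
    hsv = cv2.cvtColor(frame_bgr[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [32], [0, 180])
    return cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

def mean_shift_step(frame_bgr, window, target_hist):
    """One tracking iteration: back-project the model and let mean shift
    move the search window toward the densest matching region."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    backproj = cv2.calcBackProject([hsv], [0], target_hist, [0, 180], 1)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
    _, window = cv2.meanShift(backproj, window, criteria)
    return window   # updated (x, y, w, h), fed back in on the next frame
```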

3.6 PID Control

We use a PID controller that takes as input the image coordinates (and size) of the target in image space, and computes yaw and pitch commands (and optionally speed commands) as outputs that are sent to an auxiliary gait-control computer to modify the robot's behavior and thus its pose. The controller also embodies a low-pass filter, smoothing out random changes in the yaw and pitch commands that are sent from the visual tracker. Since the robot's swimming gaits produce a low-frequency oscillatory body motion, these motions need to be accounted for in the tracking stage, and gait tuning must be done carefully to avoid disrupting the ongoing gaits. An overview of the visual servoing architecture can be seen in Fig. 6.

Fig. 6. Visual servoing architecture in AQUA.

In Aqua, there are individual PID controllers for each of the yaw and pitch commands. Each controller has a unique set of nine parameters, which include the proportional, integral and derivative gains and their limits, the width of the time window for the integral term, the dead-band value (i.e. the change in input value needed to trigger an output change from the controller) and the output limit.
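A minimal per-axis sketch of such a controller is shown below. It is a simplification of the nine-parameter scheme described above (the gain limits are omitted for brevity), and all class, parameter and gain values are illustrative, not the robot's actual settings.

```python
from collections import deque

class AxisPID:
    """One controller per axis (yaw or pitch): gains, a finite integral
    window, a dead-band on the input and a hard output limit."""

    def __init__(self, kp, ki, kd, window=30, dead_band=2.0, out_limit=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.errors = deque(maxlen=window)   # finite time window for the integral term
        self.dead_band, self.out_limit = dead_band, out_limit
        self.prev_error = 0.0

    def update(self, error, dt):
        if abs(error - self.prev_error) < self.dead_band:
            error = self.prev_error          # ignore input changes below the dead-band
        self.errors.append(error * dt)
        integral = sum(self.errors)
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * integral + self.kd * derivative
        return max(-self.out_limit, min(self.out_limit, out))   # clamp to output limit

# Target offset in image space drives yaw (x) and pitch (y); gains are illustrative.
yaw_pid, pitch_pid = AxisPID(0.8, 0.05, 0.1), AxisPID(0.6, 0.05, 0.1)
```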

3.7 Tag-based Navigation

Control of the swimming robot typically entails the manipulation of various parameters spanning power control (on/off), gait selection, gait tuning and behavior control. As noted above, we have developed an interface mechanism to allow a diver to control the robot without recourse to a tether or cumbersome or costly consoles.


The technique is based on the use of robust visual targets (called "ARTags") that can be manipulated and used to iconically express commands to the robot. The ARTag detector system operates in parallel with the servoing system, and either can be used or suppressed as needed. In practice, ARTag (icon) detection is currently performed every 2 to 3 seconds. This latency keeps the overhead of tag image processing minimal, and is sufficient given the relatively slow motions of the divers. When a tag that corresponds to a motion command is detected, for example, the vision system interpolates the command into any ongoing servo-control stream (accounting for dynamics and gait changes) and sends the appropriate yaw and pitch commands to the gait controller to achieve the desired behavior. Currently, the primitive motion commands included are pitch up and down, yaw left and right, and start and stop. The mapping from a tag number to a motion command is preset beforehand and cannot be modified once the system is in operation.

The icon interpreter also includes the ability to express complex control sequences. We have developed a visual language using the ARTag icons that we have coined RoboChat (although the dependence on the ARTag system is purely for convenience; the language semantics are independent of the token system being used). The language specification allows the presentation of complex command sequences to the robot. The sequences may include a collection of motion commands to make the robot swim over a specific region of a coral reef, for example. A diver or operator with a tag dictionary may also modify the operating parameters of the robot using RoboChat. Herein lie two of the competing priorities that arise when designing a scheme for gestural control of robots: the language must have a small vocabulary set, and it must also be expressive enough to command the robot using a small set of language tokens. RoboChat aims to address these issues by using a small set of basic tokens to construct the language core, but it also provides IF...ELSE and iterative control loops as well as macros. Arguably the most important feature currently is this ability to record macros; this way, a diver can show a set of commands to the robot, save them in the language buffer and reuse them at a later time without needing to show all the tags again. This feature can be very useful in underwater environments, since the cognitive load on the diver is significantly reduced for repetitive tasks. We have performed a preliminary set of human interface studies [Dudek et al.2007] in an attempt to quantify the performance benefits of the ARTag-based RoboChat system against a simple hand-gesture signaling scheme, such as that used by scuba divers.
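As a rough illustration of how a fixed tag-to-command dictionary with servoing preemption might be wired up, consider the sketch below. The tag numbers, function names and controller interfaces are entirely hypothetical; the real mapping is preset before a dive and is not specified here.

```python
# Hypothetical tag-id assignments; the real dictionary is preset before operation.
TAG_COMMANDS = {
    1: ("yaw",   +1),   # yaw right
    2: ("yaw",   -1),   # yaw left
    3: ("pitch", +1),   # pitch up
    4: ("pitch", -1),   # pitch down
    5: ("start", 0),
    6: ("stop",  0),
}

def handle_tag(tag_id, servo_controller, gait_controller, duration_s=3.0):
    """Translate a detected tag into a primitive motion command.  Tag
    commands preempt the visual-servoing output for their duration."""
    if tag_id not in TAG_COMMANDS:
        return
    axis, sign = TAG_COMMANDS[tag_id]
    servo_controller.suspend(duration_s)          # preempt the servoing stream
    if axis == "stop":
        gait_controller.stop()
    elif axis == "start":
        gait_controller.start()
    else:
        gait_controller.command(axis, sign, duration_s)
```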

Fig. 7. Visual servoing off a yellow target, open ocean trials.


(a) RoboChat BNF grammar (b) Tools used as tag dictionary

Fig. 8. RoboChat grammar and tag presentation tools.

Two different vocabulary sizes and different commands were used to measure the correctness and speed of access of the gesture mechanisms. A small subset of the hand signals used in the experiment can be seen in Fig. 9.

The initial system contained a minimalistic way to modify the duration of the motion commands sent through the ARTags. We perform simple addition of two numbers ranging from 1 to 10, identified by their tag numbers. The sum of the two digits is taken when a third digit is detected. This new value becomes the duration of the motion commands, until another arithmetic operation is performed to change the command length. By default, the robot is programmed to perform a maneuver as directed by a tag for 3 seconds. While this method of showing numbers and performing basic addition to change the duration is inherently simple, it demonstrates the programmability of the system: internal parameters of the robot controller can be modified simply by showing tags.
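Read literally, the digit-tag arithmetic could look like the following sketch; the class name and the handling of the triggering third digit are our own guesses at the behavior described above, not the actual implementation.

```python
class DurationSetter:
    """Digit-tag arithmetic sketch: the first two digit tags are stored, and
    when a third digit tag is seen their sum becomes the new command
    duration (the default of 3 s applies until then)."""

    def __init__(self):
        self.pending = []
        self.duration = 3.0            # default maneuver length in seconds

    def on_digit_tag(self, value):     # value in 1..10, decoded from the tag number
        self.pending.append(value)
        if len(self.pending) == 3:     # third digit triggers the addition
            self.duration = float(self.pending[0] + self.pending[1])
            self.pending.clear()
        return self.duration
```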

Fig. 9. A small subset of hand gestures used in user interface studies of RoboChat (image courtesy of Anqi Xu).


Fig. 10 shows a diver commanding the robot to turn right by showing the appropriate ARTag.

3.8 Hovering Gait Implementation

A key function for our system is video acquisition, both while the robot is moving as well as at a fixed location. For in-place surveillance the robot can land, but it has proven valuable to implement a novel hovering gait. This gait can be used for station keeping at middle depths, as well as providing a way to execute low-speed precision maneuvers. The latter can be used when following a slow target, doing visual inspection, when the vehicle is to dock with or exchange data with a sensor module [Dunbabin et al.2006a], or when collaborating with another vehicle at low speed [Dunbabin et al.2006b].

The gait implementation is divided into two component modules. The first module (the Vectored-Thrust Computation Module) computes the required thrust at each flipper, based on the five possible commands (pitch $C_p$, roll $C_r$, yaw $C_y$, heave $C_h$ and surge $C_s$). It can be seen as the higher level of the implementation of the gait. It is almost completely independent of the actual thrust generation mechanism, as it is derived from simple free-body dynamics (the only dependency is the assignment of flippers to certain forces, see Tab. 2). The second module (the Flipper Thrust Module) computes the individual flipper motion needed to generate the desired thrust vectors, and is therefore tightly coupled with the physics of the system (namely, the flipper and fluid interactions).

Vectored-Thrust Computation Module. To satisfy the constraints of decoupled moments, quick command reversal, and simultaneous command execution (e.g. a pitch and a yaw command at the same time), each flipper was restricted to a set of commands it can execute. This represents a compromise between using more flippers to increase the net values of the forces generated and keeping some flippers available to allow certain command combinations.

(a) Robot's eye view, showing ARTag icon and human-readable translation

(b) External view with diver holding tag dictionary

Fig. 10. Tag-based navigation in action.


Fig. 11. Flipper placement, orientation and numbering convention. Note that flippers 0 and 5 are facing forward to provide an extended moment arm and symmetric pitching moments with flippers 2 and 3, as well as to be able to quickly provide a reverse surge.

For example, the middle flippers are set aside for yaw and surge commands. Table 2 shows the complete flipper assignment for all net commands.

Flipper      Pitch  Roll  Yaw  Heave  Surge
Fore L (0)   +/-    +/-   +    +/-    -
Fore R (5)   +/-    +/-   -    +/-    -
Mid L (1)                 -           +
Mid R (4)                 +           +
Aft L (2)    +/-    +/-   -    +/-    +
Aft R (3)    +/-    +/-   +    +/-    +

Table 2. Flipper assignment for net thrust command generation. +/- indicates that both thrust directions are possible, while a single + or - indicates that only a positive or negative net thrust will be generated by the flipper. The number in brackets matches the numbering convention of Fig. 11.

Mathematically, this module can be expressed as follows. Let $C = [C_p\ C_r\ C_y\ C_h\ C_s]^T$ be the input command column vector. Let $F_x$ and $F_z$ be the column vectors representing the desired thrust at each flipper location in the x and z directions (given the axes of rotation of the flippers, no thrust can be generated in the y direction). The Vectored-Thrust Computation formulation can be expressed as:

$$
F_x =
\begin{bmatrix}
0 & 0 & -k_y & 0 & k_s \\
0 & 0 & -k_y & 0 & k_s \\
0 & 0 & -k_y & 0 & k_s \\
0 & 0 & k_y & 0 & k_s \\
0 & 0 & k_y & 0 & k_s \\
0 & 0 & k_y & 0 & k_s
\end{bmatrix} C,
\qquad
F_z =
\begin{bmatrix}
k_p & k_r & 0 & k_h & 0 \\
0 & k_r & 0 & k_h & 0 \\
-k_p & k_r & 0 & k_h & 0 \\
-k_p & -k_r & 0 & k_h & 0 \\
0 & -k_r & 0 & k_h & 0 \\
k_p & -k_r & 0 & k_h & 0
\end{bmatrix} C
$$

The constants $k_p$, $k_r$, $k_y$, $k_h$ and $k_s$ are used to scale the input so that the absolute maximum value of the output is less than or equal to 1.


The size of the column vectors $F_x$ and $F_z$ is equal to the number of flippers on the robot (six). The index representing the flippers is shown in Fig. 11.

The selected thrust angle and magnitude for flipper $n$ are thus:

$$\theta_n = \arctan\!\left(\frac{F_{z_n}}{F_{x_n}}\right), \qquad T_n = \sqrt{F_{x_n}^2 + F_{z_n}^2} \qquad (3)$$

Flipper Thrust Module. The most efficient way to generate thrust in direction $\theta_r$ with our flat flipper is to make it oscillate rapidly (period of 250-500 ms) around that angle $\theta_r$. The magnitude of the thrust is approximately proportional to the amplitude of the oscillation. The flipper angle $\theta_f$ over time is thus:

$$\theta_f(t) = T_n \sin(\omega_f t + \phi_f) + \theta_r \qquad (4)$$

where $\omega_f$ is the fixed frequency of oscillation and $\phi_f$ is a phase shift used between flippers.

One significant issue with generating thrust this way is the holonomic planning constraint on the flippers with respect to thrust direction: the flippers must first be oriented at the new desired thrust angle before they can provide useful thrust. However, flipper re-orientation produces a force normal to the flipper surface through pressure drag. For example, if a flipper is pointing forward but forward thrust is needed, then the flipper must first be rotated 180°. This rotation generates parasitic forces whose orientation depends on the direction of rotation. Moreover, it also implies delays in the execution of commands. To mitigate these problems, we first limit the range of thrust angles for each flipper to a region of 180° (see Fig. 12(a)), as is done similarly in [Kemp et al.2005]. This reduces the average reorientation angle responsible for the parasitic forces, at the cost of reduced maximum vehicle thrust. For example, the front flippers are not used when a forward thrust is commanded, thereby reducing the maximum possible forward thrust for the vehicle.

(a) Flipper angle limitations. (b) Forces generated by a flipper.

Fig. 12. (a) Flipper angle limitations for flippers 3, 4 and 5. Flipper 5 will be generating backward thrust, while 3 and 4 will be generating forward thrust. (b) Schematic force diagram for one flipper.

To further improve the reaction time of the robot, we make use of the pressure drag forces generated when the flippers are re-oriented.


When the difference between the desired thrust angle and the current flipper angle is greater than 45°, the flipper is rotated at a rate that generates a pressure drag consistent with the desired thrust $T_c$ via the constant $K_{PD}$. As the flipper surface passes the 45° region, the oscillation amplitude is increased until it reaches its selected amplitude as given by Eq. 4. Using discrete-time equations and letting $\theta_r$ be the ramped value of the computed thrust angle $\theta_c$ and magnitude $T_c$, Eq. 5 and Eq. 6 give our complete Flipper Thrust Module. The ramp(rate, a, b) function ramps the value b toward a at the constant rate rate.

$$\theta_r[t+1] =
\begin{cases}
\theta_r[t] & \text{if } \theta_c[t] \text{ outside thrust range} \\
\mathrm{ramp}(K_{PD}\, T_c,\ \theta_r[t],\ \theta_c[t]) & \text{otherwise}
\end{cases} \qquad (5)$$

$$\theta_f[t+1] =
\begin{cases}
\theta_r[t] & \text{if } |\theta_c[t] - \theta_f[t]| > 45^\circ \\
T_c \dfrac{|\theta_r[t] - \theta_f[t]|}{45^\circ} \sin(\omega_f t + \phi_f) + \theta_r[t] & \text{otherwise}
\end{cases} \qquad (6)$$

Finally, to improve the response to slowly changing commands, when the demanded thrust $T_c$ reaches zero, the flippers are gradually moved back to the stand-by range of 90° (see Fig. 12(a)). This guarantees that the flippers are always able to generate the proper thrust rapidly, by keeping the flipper surface or its normal no more than 45° away from any desired thrust within the 180° window.
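A discrete-time sketch of Eqs. 5-6 for a single flipper is shown below, following the expressions as printed. The oscillation period, the pressure-drag constant and the ramp helper are illustrative assumptions, not the values or code used on the vehicle.

```python
import numpy as np

OMEGA_F = 2 * np.pi / 0.4     # oscillation frequency for a ~400 ms period (illustrative)
K_PD = 0.5                    # illustrative pressure-drag rate constant

def ramp(rate, current, target):
    """Move `current` toward `target` by at most `rate` per step (Eq. 5 helper)."""
    return current + np.clip(target - current, -rate, rate)

def flipper_thrust_step(theta_r, theta_f, theta_c, T_c, t, phi_f, thrust_range):
    """One discrete-time update of the ramped thrust angle (Eq. 5) and the
    flipper angle (Eq. 6) for one flipper."""
    lo, hi = thrust_range
    if not (lo <= theta_c <= hi):                      # command outside the 180-degree window
        theta_r_next = theta_r
    else:
        theta_r_next = ramp(K_PD * T_c, theta_r, theta_c)

    if abs(theta_c - theta_f) > np.radians(45):        # large error: re-orient, exploit drag forces
        theta_f_next = theta_r_next
    else:                                              # small error: oscillate about theta_r
        amplitude = T_c * abs(theta_r_next - theta_f) / np.radians(45)
        theta_f_next = amplitude * np.sin(OMEGA_F * t + phi_f) + theta_r_next
    return theta_r_next, theta_f_next
```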

4 Results and Experiments

We have evaluated the visual signaling and servoing subsystems on land, in simulated environments, and also in closed- and open-water trials (i.e. in the pool and in the open ocean). The open-ocean servoing trials were held in the sea off Barbados. The lab tests were performed in a glass tank filled with water containing particles and sediment to simulate the natural underwater environment of the open seas. We tracked single- and multi-colored objects and evaluated the performance of the three trackers by manually comparing the number of mis-tracked frames against those where successful tracking occurred. In the pool and sea trials, a yellow ball of approximately 15 cm in diameter was used as a target. The robot followed it over a distance of 27 meters (see Fig. 7), limited by the length of the fiber-optic tether. Quantitative performance figures for the tracking algorithms were obtained from the lab trials, whereas the open-ocean trials provided information about the robot's behavior under the visual guidance mode.

The tag-based navigation system has been tested in a pool environment to date. We performed both stand-alone tag command navigation as well as servo-and-navigate experiments. In the first case, the diver was successfully and comfortably able to guide the robot around the pool using tags expressing 6 distinct behaviors and various numerical parameters (see Fig. 10). The vision system detected tags with a 4 cm diameter at a distance of 2 meters, although that distance will diminish in the open ocean if visibility is not as good as in a pool.

The usability of the system appeared to be excellent and, in particular, was far more convenient than prior methods based on visual communication with an observing human (who adjusted parameters via a tether). We are moving toward definitive usability studies.


(a) Yaw rate, actual and desired, hovering mode.

(b) Pitch rate compared to desired rate, in hovering mode.

Fig. 13. Yaw and pitch accuracy.



The hovering gaits were evaluated in a pool as well as in open water, both separately and in conjunction with visual servoing. This allowed us to assess their performance and correct any major model failures. When testing hovering in isolation, the 5 commands were manually entered in real time via a control interface connected by a fiber-optic cable. The human operator would then qualitatively evaluate the response of the robot using the divers as visual landmarks (as well as fixed markers in a pool setting). Quantitative data was then collected in the form of orientations and angular velocities coming from an inertial measurement unit (calibrated in advance). Further tests included deploying the robot in the open sea, within a triangular formation of divers separated by roughly 5 m. Fig. 14 shows some snapshots taken during this deployment. Figures 13(a) and 13(b) show the results of testing single yaw and pitch commands. For clarity, the input signals $C_y$ and $C_p$ have been scaled to roughly match the output angular rate of the vehicle. As can be seen from those graphs, the response of the robot is relatively linear, with the final rate close to the demanded rate. The same figures show that our strategy was also effective in the case of rapid command reversal, satisfying one of the design criteria outlined earlier. The general handling of the robot proved to be good as well, although the yawing motion entails greater delays than the other motions. This phenomenon has already been explained in previous work [Giguere et al.2006], and is due to the absence of pressure-drag forces in the direction of rotation when the flippers are executing this command.

4.1 Conclusion

We have developed and validated an approach to vision-based control of an aquatic robot, first with position control through visual servoing, and then with high-level command execution through a visual tag-based communication language. We have also validated a novel swimming gait for station keeping, the hovering gait. This swimming-in-place gait is a relatively novel ability which we have deployed using judicious flipper positioning and control. This new gait complied with strict design criteria for good controllability in five degrees of freedom.

The effectiveness of our behaviors and subsystems was evaluated in various combinations, both in a controlled pool setting as well as in open water (Caribbean Sea and coastal Nova Scotia, Canada). Results have been very successful, but in each case the acquisition of accurate quantitative performance data was a major challenge. Inertial measurement data has been useful, but due to its potential for drift the results are not unequivocal. Overall, purely visual guidance and control seems much more effective than originally anticipated.

References

[Altendorfer2001] Altendorfer, R. 2001. RHex: A biologically inspired hexapod runner. Autonomous Robots, 11:207–213.


(a) Pitch command: the fore and aft flippers point in opposite directions.

(b) Heave command: the fore and aft flippers point in the same direction.

(c) Simultaneous heave and pitch commands: only the aft flippers are used, providing half the maximum heave and pitching rate at the same time.

(d) Yaw command: two flippers on the visible side of the vehicle and one flipper on the opposite side are used.

Fig. 14. Hovering gait snapshots while performing slow-speed maneuvers near a diver in the open sea. Each picture represents a basic command or a combination of basic commands.

[Balasuriya and Ura2001] Balasuriya, A. P. and Ura, T. 2001. Underwater cable following by Twin-Burger 2. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 920–925, Seoul, Korea.

[Benjamin et al.2006] Benjamin, M., Curcio, J., Leonard, J., and Newman, P. 2006. Navigation of unmanned marine vehicles in accordance with the rules of the road. In International Conference on Robotics and Automation, ICRA 2006, Orlando, Florida.

[Bhattacharyya1943] Bhattacharyya, A. 1943. On a measure of divergence between two statistical populations defined by their probability distributions. Bulletin of the Calcutta Mathematical Society, 35:99–110.

[Cha and Srihari2002] Cha, S.-H. and Srihari, S. N. 2002. On measuring the distance between histograms. Pattern Recognition, 35(6):1355–1370.

[Comaniciu and Meer1999] Comaniciu, D. and Meer, P. 1999. Mean shift analysis and applications. In Proceedings of the International Conference on Computer Vision, pages 1197–1203.

[Comaniciu et al.2003] Comaniciu, D., Ramesh, V., and Meer, P. 2003. Kernel-based object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5):564–575.

[Dudek et al.2006] Dudek, G., Giguere, P., and Sattar, J. 2006. Sensor-based behavior control for an autonomous underwater vehicle. In Proceedings of the International Symposium on Experimental Robotics, ISER, Rio de Janeiro, Brazil.


[Dudek et al.2007] Dudek, G., Sattar, J., and Xu, A. 2007. A visual language for robot control and programming: A human-interface study. In Proceedings of the International Conference on Robotics and Automation, ICRA, Rome, Italy.

[Dunbabin et al.2006a] Dunbabin, M., Vasilescu, I., Corke, P., and Rus, D. 2006a. Data muling over underwater wireless sensor networks using an autonomous underwater vehicle. In International Conference on Robotics and Automation, ICRA 2006, Orlando, Florida.

[Dunbabin et al.2006b] Dunbabin, M., Vasilescu, I., Corke, P., and Rus, D. 2006b. Experiments with cooperative control of two autonomous underwater vehicles. In Proceedings of the International Symposium on Experimental Robotics, ISER, Rio de Janeiro, Brazil.

[Fiala2004] Fiala, M. 2004. ARTag Revision 1, a fiducial marker system using digital techniques. Technical Report NRC 47419, National Research Council, Canada.

[Fukunaga1990] Fukunaga, K. 1990. Introduction to Statistical Pattern Recognition. Academic Press Incorporated, second edition.

[Giguere et al.2006] Giguere, P., Dudek, G., and Prahacs, C. 2006. Characterization and modeling of rotational responses for an oscillating foil underwater robot. In IEEE/RSJ/GI International Conference on Intelligent Robots and Systems, Beijing, China.

[Kemp et al.2005] Kemp, M., Hobson, B., and Long, J. H., Jr. 2005. Madeleine: an agile AUV propelled by flexible fins. In 14th International Symposium on Unmanned Untethered Submersible Technology (UUST), Lee, New Hampshire.

[Lots et al.2001] Lots, J.-F., Lane, D. M., Trucco, E., and Chaumette, F. 2001. A 2-D visual servoing for underwater vehicle station keeping. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 2767–2772, Seoul, Korea. IEEE.

[Rives and Borrelly1997a] Rives, P. and Borrelly, J. J. 1997a. Underwater pipe inspection task using visual servoing techniques. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, volume 1, pages 63–68, Grenoble, France.

[Rives and Borrelly1997b] Rives, P. and Borrelly, J. J. 1997b. Visual servoing techniques applied to an underwater vehicle. In Proceedings of the IEEE International Conference on Robotics and Automation, volume 3, pages 1851–1856, Albuquerque, NM, USA.

[Sattar and Dudek2006] Sattar, J. and Dudek, G. 2006. On the performance of color tracking algorithms for underwater robots under varying lighting and visibility. In International Conference on Robotics and Automation, ICRA 2006, Orlando, Florida.

[Sattar et al.2005] Sattar, J., Giguere, P., Dudek, G., and Prahacs, C. 2005. A visual servoing system for an aquatic swimming robot. In IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, Alberta, Canada.

[Swain and Ballard1991] Swain, M. J. and Ballard, D. H. 1991. Color indexing. International Journal of Computer Vision, 7(1):11–32.