

Applied Acoustics 70 (2009) 1277–1287


Automated acoustic seabed classification of multibeam images of Stanton Banks

Jon Preston *

Quester Tangent Corporation, Unit 201, 9865 West Saanich Road, Sidney, BC, Canada V8L 5Y8

Article history: Available online 11 October 2008

Keywords: Seabed classification; Acoustic classification; Sonar image compensation; Clustering

doi:10.1016/j.apacoust.2008.07.011

* Tel.: +1 250 654 3316; fax: +1 250 655 4696. E-mail address: [email protected]

Abstract

Dividing sidescan images into regions that have similar seabeds is often done by expert interpretation. Automated classification systems are becoming more widely used. This paper describes techniques, based on image amplitudes and texture, that lead to useful and practical automated segmentation of multibeam images. Seabed (or riverbed or lakebed) type affects amplitudes and texture, but so do system operating details and survey geometry. Effects of the last two must be compensated to isolate the effects of seabed type. Images from multibeam surveys are accompanied by bathymetric data from which grazing angles of all sonar footprints can be calculated. By compiling tables of amplitude against range and grazing angle, systematic changes in amplitude with these two variables can be removed consistently. Classification, based on a large number of features, is done in image space to avoid artifacts common in mosaics. Unsupervised segmentation requires clustering, in which records are divided into their natural classes. An objective clustering method using simulated annealing assigns points to classes based on their Bayesian distances from cluster centres. Stanton Banks is a rocky area 100 km north of County Donegal, Ireland, that rises about 100 m above the ocean floor at 180 m. Multibeam images and data from an 80-km² survey were classified into regions of acoustic similarity. Assigning labels of physical properties to these regions requires non-acoustic ground truth, which was obtained from a series of 105 photographs. Photographic geological assignments were found to correlate well with the acoustic classes.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Sonar images have been used as qualitative tools for seabed classification for many years, e.g. [1]. The trained geologist identifies regions that are acoustically similar and fit a mental library of image types. Image artifacts such as beam-pattern effects are easily set aside by the human brain. Automated classification, the goal of this work, requires a higher standard of image preparation. This effort is justified because class maps from automated processes are objective and can be made with little effort.

Statistical methods are used for segmentation because inversion of models of high-frequency backscatter has limited applicability. Given a full geoacoustic description of the bottom, as well as a sonar calibration, the echo time series can be calculated [2,3]. The challenge in estimating geoacoustic parameters by matching these envelopes to actual echoes, even with calibrated amplitudes, is uniqueness. Unique geoacoustic solutions can be found in some cases with specialised techniques [4]. For surveys over a wide range of rocks and sediments, including mixtures, a phenomenological approach, such as the one described here, is the preferred practical choice [5]. The approach is a search for regions that are acoustically similar and differ, in some manner, from other regions. The first step is careful compensation of the image for radiometric, geometric, and systematic effects. Features for many sub-images are then calculated, ignoring image regions of low quality. The features could be used to segment the image into regions, but it is convenient to reduce their dimensionality with principal components analysis before clustering. The sonar need not be calibrated.

Phenomenological methods can divide a survey area into regions based on acoustic similarity, but cannot label the regions geologically or biologically. Associating grain-size or roughness values, for example, with each region requires ground truth. The justification for acoustic seabed classification, even though non-acoustic ground truth is required, is the high area-coverage rate of acoustic surveys combined with the ability to apply the labels derived from ground truth to entire regions, even areas remote in space and time from the ground-truth sites.

This paper starts with some background on beam-formed images. It then presents the principles and practice of the stages of automated image classification as used here: quality control, image compensation, feature generation, reducing dimensionality, clustering, and mapping. The resulting acoustic classes are compared with classifications of photographs, and the similarities discussed.


2. Images from multibeam bathymetric sonars

The transmitter of a multibeam sonar insonifies most of a thin athwartship underwater plane. The pulse length is much less than the travel time, so there are two insonified areas that move along the seabed from directly under the ship outward to port and to starboard. Beamforming is done on receive, so one can imagine an array of beams, also to port and to starboard, through which the insonified footprints pass. The echo in each beam is roughly triangular in time, starting as the insonified area begins to enter the beam footprint, rising to a maximum, and falling as the insonified area departs. These echoes are digitized and then bottom picked, as described in [6], to find the ranges from transducer to seabed which, when combined with beam angle, provide the bathymetric data that are often the primary goal of a multibeam survey.

Short time series from each beam around its pick, sometimes called snippets, can be combined to make a beam-formed image [6]. To do so, start with an empty raster and locate the centre of each beam's echo at the sample number corresponding to its range. Echo samples that occurred just before the pick go into adjacent pixels with smaller sample numbers, those from just after the pick into pixels with larger sample numbers. Usually, the time series from neighbouring beams overlap. Averaging overlapping echo amplitudes can smooth image texture, so it is not appropriate if the image is to be used for classification. Instead, priority in filling an overlap pixel is best given to the sample closest to the pick of its parent beam. This work was done in almost the same way: the priority was set by the length of the beam's echo time series, with the longest snippets laid down first and the shortest last, with replacement. This stitching process gave an image raster for each ping. For a survey line, each ping contributes a row to an image in which the horizontal axis is sample number and the vertical axis is ping number.
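The stitching priority can be sketched as below. This is a minimal illustration, assuming snippets are already available per beam as a bottom-pick sample number and an amplitude array centred on the pick; the function and variable names are illustrative, not the sonar manufacturer's data format.

    import numpy as np

    def stitch_ping(snippets, n_samples):
        """Build one ping's image row from per-beam snippets.

        snippets: list of (centre_sample, amplitudes) pairs, one per valid beam,
        where centre_sample is the bottom-pick sample number and amplitudes is
        a 1-D array centred on that pick.  Longest snippets are laid down first
        and the shortest last, with replacement, so an overlap pixel ends up
        holding the sample closest to the pick of its parent beam.
        """
        row = np.zeros(n_samples)
        for centre, amps in sorted(snippets, key=lambda s: len(s[1]), reverse=True):
            start = centre - len(amps) // 2
            for k, a in enumerate(amps):
                j = start + k
                if 0 <= j < n_samples:
                    row[j] = a  # shorter snippets, laid down later, overwrite
        return row

Stacking the rows from successive pings then gives the image raster for a survey line.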

In any multibeam survey some beams are invalid. This designation can be assigned by the sonar system if the backscatter amplitude or signal-to-noise ratio in that beam is too low, or by the hydrographer if that beam's depth is out of line with its neighbours. Neither the depth nor the image pixels from invalid beams should be used in classification. Setting them aside is a matter of bookkeeping, which in this work was done with a mask and a map. The mask for a survey line has entries that are zero for a beam that meets all the quality standards, but has bits set for various reasons such as system-assigned invalidity or out-of-line depth. It is a single-byte matrix with dimensions number of beams by number of pings. A map is a matrix the same size as the image in which the elements are the number of the parent beam of each pixel. Pixels in gaps between snippets have a map value that indicates no parent. With the two together, image pixels from flawed beams can be identified and set aside from image classification.
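A sketch of this bookkeeping, with an illustrative bit assignment (the paper does not specify the actual bit layout or value used for "no parent"):

    import numpy as np

    # Illustrative mask bits; the actual bit meanings are not given in the text.
    SONAR_INVALID = 0x01    # beam flagged invalid by the sonar system
    DEPTH_OUTLIER = 0x02    # beam depth out of line with its neighbours
    NO_PARENT = -1          # assumed map value for pixels in gaps between snippets

    def pixel_is_usable(mask, beam_map, ping, sample):
        """True if the pixel has a parent beam that passed every quality check.

        mask: beams x pings array of single-byte flags (zero means good).
        beam_map: image-sized (pings x samples) array of parent-beam numbers.
        """
        beam = beam_map[ping, sample]
        if beam == NO_PARENT:
            return False
        return mask[beam, ping] == 0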

Among the advantages of beam-formed images is the absence of a surface return, because it arrives at the transducer outside any receive beam. Nevertheless, some manufacturers (notably older Reson systems) provide non-beamformed images generated by adding the signals from the receive elements without any time or phase shifts. The Kongsberg images used in this work were beam-formed.

3. Image compensation

Backscatter amplitudes in sonar images are affected by the seabed but also by the survey sonar and how it is used. Image compensation is the process of suppressing as much as possible of the instrument and survey effects, thus isolating the seabed effects. The more complete the compensation, the more accurate the classification.

Geometrical survey effects are based on range and angles. Multibeams have fairly uniform beam patterns and collect backscatter over a wide range of grazing angles. Sidescan sonars are somewhat the reverse: non-uniform beam patterns and a small range of grazing angles. Because at least one angle is important, compensation was done in this work in image space, that is, with the axes being sample number and ping number, and not as a geographic mosaic. In image space, the range, angles, and system settings are known for every pixel. Not only does this facilitate accurate compensation, it allows the user to mask pixels that lie beyond the high-quality regions by simply specifying angular or range borders. Masked image regions are set aside. Class maps derived from mosaics tend to exhibit artifacts from nadir regions and from borders between survey lines, particularly if compensation was fully automated. Classifying in image space eases these issues.

Compensating for system settings such as transmit power and receiver gain is just multiplicative. Image files logged by Kongsberg multibeam systems already include these corrections [7]. Changing the transmit pulse length can be partially compensated based on transmitted energy, but accurate compensation is impossible because the pulse length sets one dimension of the insonified area, which controls the statistics of the backscatter and hence the image texture. In this survey the pulse length was constant at 0.20 ms, which Kongsberg calls Shallow Mode [7].

Artifacts due to survey angles and ranges are obvious in many sonar images. Sometimes the surveyor compensates for them during the survey by adjusting the coefficients of the time-variable gain (TVG) to make the image appear consistent across-track. The coefficients may be tweaked often as the sediments or depths change. If the images are to be used for classification, it is important that pre-tweaking amplitude data be logged (as Kongsberg does), or that the gain coefficients be logged so their effects can be undone.

The aim of image compensation for classification is to suppress angle and range effects on the image. With poor compensation, artifact classes with borders parallel to the ship track are often seen. Two approaches to compensation exist: one that requires system calibration and one that is strictly empirical. A multibeam sonar system can be calibrated in two ways: bathymetric and for backscatter. For image-based classification, we are concerned only with the latter. A backscatter calibration involves measuring the gain of each beam [8]. When combined with vessel roll, the backscatter of each pixel of an image can then be expressed in engineering units. If further combined with a model of the dependence of seabed backscatter on grazing angle, plus calculated grazing angles, the seabed backscatter can also be plotted in absolute units. In this way, image artifacts due to angles and ranges can be suppressed to a substantial extent [9]. However, convenience is a major argument in favour of an empirical method. Such methods fit well with acoustic seabed classification in that the sediment type is being sought, rather than having to be assumed in the backscatter model. A common empirical approach is to fit the observed image amplitudes in a survey line as a function of slant range, and then divide each measured amplitude by the line-mean at that range [10]. This falls short in two ways, though. One is that if the sediments in more than one survey line are to be classified, compensation has to be consistent across all the survey lines. For example, if some lines cover only low-reflectivity muds and others cover only highly reflective sand, the differences in mean amplitude between the sediments could be removed if compensated line-by-line. The second is that backscatter amplitudes can depend on up to three independent variables: range, beam angle, and grazing angle at the seabed.

The empirical compensation method used in this work, and patented in 2005 [11], was to compile a table of every observed image amplitude as a function of two independent variables: travel time and grazing angle at the seabed. These data were compiled during a first pass through all the data, before any survey line is classified. Specifically, a tangent plane is fitted to bathymetric data surrounding each beam footprint (in geographic coordinates). The grazing angle for that beam is the complement of the angle between the beam and the normal to that tangent plane. A compensation table is three arrays of bins, organized by grazing angle and travel time. The bin widths were 1° and 1 ms. The first array stores the number of pixels that were found to have the angle-time combination of that bin; the second holds the sum of the amplitudes of those pixels, and the third holds the sum of the squares of the amplitudes. Once the table has been compiled, the images that contributed to the table are compensated following Eq. (1). For each pixel of amplitude a with known travel time and grazing angle, use the mean, \bar{a}, and standard deviation, \sigma_a, from the corresponding bin to find the normalized amplitude, \tilde{a}:

\tilde{a} = \frac{a - \bar{a}}{\sigma_a}        (1a)

These amplitudes are distributed in a small range around zero. The amplitude in the 8-bit compensated image is then

\hat{a} = 255\,\frac{\tilde{a} - \tilde{a}_{\min}}{\tilde{a}_{\max} - \tilde{a}_{\min}}        (1b)

where \tilde{a}_{\min} and \tilde{a}_{\max} are typical values near the 1st and 99th percentiles of the \tilde{a} distribution. The final image amplitudes are rounded to integers after clipping negative values to zero and values greater than 255 to 255. Both amplitude and standard deviation are compensated; the latter is likely important in that the images are being prepared for a statistical segmentation process.
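The two passes of this table-based compensation can be sketched as below. The 1° and 1-ms bin widths and the three-part table (counts, sums, sums of squares) are taken from the text; the function names are illustrative, and a_min and a_max stand for the clipping values near the 1st and 99th percentiles of the normalized amplitudes.

    import numpy as np

    ANG_BIN_DEG, TIME_BIN_MS = 1.0, 1.0   # bin widths from the text

    def accumulate(table, amplitude, grazing_deg, travel_ms):
        """First pass: add one pixel to the three-part compensation table."""
        count, total, total_sq = table
        i, j = int(grazing_deg / ANG_BIN_DEG), int(travel_ms / TIME_BIN_MS)
        count[i, j] += 1
        total[i, j] += amplitude
        total_sq[i, j] += amplitude ** 2

    def compensate(table, amplitude, grazing_deg, travel_ms, a_min, a_max):
        """Second pass: apply Eq. (1a) then Eq. (1b) and return an 8-bit value."""
        count, total, total_sq = table
        i, j = int(grazing_deg / ANG_BIN_DEG), int(travel_ms / TIME_BIN_MS)
        mean = total[i, j] / count[i, j]
        sigma = np.sqrt(max(total_sq[i, j] / count[i, j] - mean ** 2, 0.0))
        a_norm = (amplitude - mean) / sigma                       # Eq. (1a)
        a_8bit = 255.0 * (a_norm - a_min) / (a_max - a_min)       # Eq. (1b)
        return int(round(np.clip(a_8bit, 0, 255)))

Because the table is compiled over the whole survey before any line is compensated, the correction is consistent across all survey lines.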

Fig. 1 shows the means and standard deviations used to compensate the Stanton Banks images, derived from the three-part table described above. Our experience has been that this approach is quite successful at suppressing range-angle artifacts, sometimes through the full range of grazing angles and sometimes omitting grazing angles between 90° and about 80°, which covers only a small fraction of any multibeam image. Fig. 2 shows raw and compensated versions of part of a survey line of Stanton Banks. The difference between these images is not dramatic because Kongsberg prepares the raw image with a sophisticated TVG that considers grazing angle and includes a crossover angle between the near-nadir and Lambert regions [7]. Differences are often more striking with images from other sonar systems.

Fig. 1. Means and standard deviations of the populated time-angle bins of the compensation tables used with the Stanton Banks survey. They were calculated from the counts, sums, and sums of squares of each bin, which were stored in the three-part compensation table.

An unusual issue arose in compensating this survey. A very large fraction of the pixels fell into one of two groups, either deep and low amplitude or shallow and high amplitude. Because there were very few long-range high-amplitude pixels, the shallow and deep regions essentially were compensated separately, giving a middling compensated amplitude for both. In fact, there was a big difference in reflectivity between the bank and the surrounding sediments. This difference was partially restored in the compensated images by adding to the compensation tables eight survey lines that had been recorded by the same sonar system on the previous day. The compensation tables are indexed by transmit-pulse length, meaning that of these eight lines only the portions recorded in Shallow Mode were used. Pixels from bright deep regions, deeper than 140 m, were added, raising the deep-water means and restoring the difference. This issue is not common because compensation tables usually contain data from much more than the 8.5 h taken to survey Stanton Banks.

All sonar systems include a time-variable gain (TVG) that corrects roughly for range. Few systems, however, include any correction for absorption, since this depends on water temperature and salinity, which may not be known. This bin-based method corrects for absorption empirically by basing its compensation on observed echo amplitudes from all ranges.

The Kongsberg Simrad EM1002, the sonar used in this survey, operates with its angular range divided into sectors with slightly different frequencies [7]. The benefit is detecting weak reflections from near-horizontal angles even if they arrive simultaneously with multipath echoes from near nadir. This can complicate image compensation, particularly with a calibration- and model-based approach. We have not found it necessary to modify this bin-based technique to accommodate sectors. If vessel roll were zero and the sector boundaries coincided with bin boundaries, compensation would proceed on both sides of the sector boundary to give a continuous image. In practice, vessel roll and other effects blur the sector borders somewhat, but rarely do they affect classifications.

4. Features

The backscatter that appears in a compensated image is, ideally, determined only by the nature of the seabed sediments. The image has been compensated for the amount by which variables such as transmit power, receiver gains, range, and grazing angle differ from some baseline values. The next step in acoustic seabed classification is to capture backscatter characteristics into some statistical variables, which are called features [12]. Features are calculated from portions of a sonar image. Mean, standard deviation, and other first-order statistics can be calculated for pixels from an image sub-sample of any shape. Rectangular sub-samples are particularly well suited for seabed classification, because the contained portions of each raster can be analyzed for spectral character [13] and the rectangular array can be analyzed for texture [14].

The features used in this work capture image amplitude and texture in both first- and second-order statistics. To prepare for feature calculation, images of each survey line are divided into rectangular sub-images. For each sub-sample to represent a roughly square patch of the bottom, there are more pixels across-track than along, because the across-track pixel spacing is set by the speed of sound and the sample rate of the echo digitizer, while the along-track spacing depends on ping rate and ship speed. In this work, the rectangle size was 257 pixels across-track and 17 along-track.

Fig. 2. A portion of line 148 of the Stanton Banks survey. The shipboard image recorded from the Simrad EM 1002 (left) shows some range and angle artifacts that have been suppressed in the image compensated as described in the text (centre). The multibeam image has been divided into port and starboard sides at the beam with the shortest range for that ping; between them is a synthetic water column with the shallowest depth for this portion of the survey line subtracted. The right image shows the division into rectangles for classification; one classified record will be generated for each rectangle, and rectangles are placed only where the image is not masked. The seabed slants down to port, so there are more samples on that side and therefore more distance and more rectangles.

Rectangles were placed on the image as densely as possible without overlap (Fig. 2), but only on high-quality parts of the image. Trial positions are assessed, and the program accepts them as actual rectangle positions if fewer than 5% of the enclosed pixels are masked. One reason a pixel might be masked is its parent beam having a depth out of line with its neighbours. In this work, no manual cleaning had been done, so flawed beams were identified by comparing each beam depth with an average of its neighbours, and flagging the beam as low quality if that difference exceeded a threshold. A display with all the beam depths of a line and a threshold slider facilitates choosing a threshold. The Stanton Banks data are very high quality, so no gaps are evident in most arrays of rectangles, such as in Fig. 2. Note that the furthest rectangles to port and starboard are within the maximum beam ranges measured on that ping.
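The depth-based flagging can be sketched as follows; comparing each beam with the mean of its two along-swath neighbours is an illustrative choice, with the threshold chosen interactively as described above.

    import numpy as np

    def flag_depth_outliers(depths, threshold):
        """Flag beams whose depth is out of line with their neighbours.

        depths: beams x pings array of bottom-picked depths.
        Returns a boolean array of the same shape; edge beams are left
        unflagged in this sketch.
        """
        flags = np.zeros(depths.shape, dtype=bool)
        for b in range(1, depths.shape[0] - 1):
            neighbour_mean = 0.5 * (depths[b - 1, :] + depths[b + 1, :])
            flags[b, :] = np.abs(depths[b, :] - neighbour_mean) > threshold
        return flags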

The matrix of amplitudes in each rectangular image sub-sample is the raw material for a set of features, called a full feature vector (FFV). In this work, each vector has 132 elements, listed in Table 1. Some of these, particularly mean, standard deviation, histogram, and quantile, capture image amplitudes and their distribution. These are first-order statistics in that they do not depend on the order or arrangement of the pixel amplitudes. Texture is a second-order statistic that captures transitions among amplitudes in a neighbourhood. Many of the 132 features are calculated from grey-level co-occurrence matrices (GLCM). A GLCM is a square matrix with the length of each side equal to the dynamic range of the image. The (n,m) element of a GLCM is the number of times that the image amplitude changes from n to m for a given direction and step size within the image. In this work step sizes of 1, 2, and 3 pixels were used in each of three directions: along-track, across-track, and 45° between them, resulting in nine GLCMs for each image segment. The GLCMs themselves are not suitable as features; rather, features such as prominence, as listed in Table 1, are calculated from each matrix [14]. These features capture the distribution of entries in a GLCM, which depends on image texture. The GLCM of a smooth image has large elements near its diagonal because nearby pixels in the image are rarely very different; by contrast, an image with rough texture can have many transitions between very different pixels even with a small step size, giving its GLCM large off-diagonal elements.

Table 1. Features calculated from image patches and used for segmentation. Seven algorithms generate 132 features from the backscatter in each rectangular patch.

Algorithm                                      Number of features   Colour in Fig. 3
Mean, standard deviation, skewness, kurtosis   4                    Red
Quantiles                                      9                    Green
Pace features from power spectral ratios       15                   Blue
GLCM correlation                               9                    Magenta
GLCM shade                                     9                    Cyan
GLCM prominence                                9                    Yellow
GLCM contrast                                  9                    Red
GLCM energy                                    9                    Green
GLCM entropy                                   9                    Blue
GLCM homogeneity                               9                    Magenta
Power spectrum                                 32                   Cyan
Histogram                                      8                    Yellow
Fractal dimension                              1                    Red
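As a concrete illustration of the GLCM features in Table 1, the sketch below builds the nine co-occurrence matrices (step sizes of 1, 2, and 3 pixels along-track, across-track, and at 45°) and computes two of the Haralick statistics [14] from each. The requantization to 32 grey levels is an assumption made to keep the matrices compact; the paper does not state the number of levels used.

    import numpy as np

    def glcm(patch, d_row, d_col, levels=32):
        """Normalized grey-level co-occurrence matrix for one direction and step."""
        q = (patch.astype(float) * levels / 256.0).astype(int).clip(0, levels - 1)
        g = np.zeros((levels, levels))
        rows, cols = q.shape
        for r in range(rows):
            for c in range(cols):
                r2, c2 = r + d_row, c + d_col
                if 0 <= r2 < rows and 0 <= c2 < cols:
                    g[q[r, c], q[r2, c2]] += 1
        return g / g.sum()

    def glcm_features(patch):
        """Two example features (contrast, entropy) from each of the nine GLCMs."""
        feats = []
        for step in (1, 2, 3):
            # rows are pings (along-track), columns are samples (across-track)
            for d_row, d_col in ((step, 0), (0, step), (step, step)):
                g = glcm(patch, d_row, d_col)
                i, j = np.indices(g.shape)
                feats.append(np.sum(g * (i - j) ** 2))                  # contrast
                feats.append(-np.sum(g[g > 0] * np.log(g[g > 0])))      # entropy
        return feats

A smooth patch concentrates the entries of g near the diagonal, giving low contrast; a rough patch spreads them off-diagonal.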

Up to 5% of the pixels within a rectangle can be masked as low quality, and they require special treatment. The amplitude of a masked pixel was replaced by the median amplitude of its neighbours. If any of those eight neighbours is masked, it is not considered in calculating the median. To ensure the median is defined, no rectangle is placed to enclose a 3 × 3 array of pixels that are all masked.

An FFV is made from the matrix of amplitudes in each image rectangle of each survey line. FFV files from all the survey lines, treated separately to this point, are merged. The merged set of FFVs is a matrix with a row for each rectangle and a column for each feature, namely 48875 × 132 for the Stanton Banks survey.

Stored with each FFV are its rectangle position, ping and sample numbers, date, time, and the like. These are later used for filtering inappropriate data and for map-making. As an example of filtering, a plot of rectangle locations shows areas in which the ship was turning and in which the seabed was over- or under-sampled. Over-sampled regions can be valuable for assessing class consistency and are usually retained. Under-sampled regions, or those that are unusual in some other way, can be set aside, although that was not done in this work.


Fig. 3. Contributions of features to Q values averaged over all records in the acoustic mud class (left) and rock classes (right). Each feature algorithm has a colour as given in Table 1; the colours of individual features increase in saturation within that set. The first feature (mean) is at the bottom, the last feature (fractal dimension) at the top. Some contributions are negative, but all are shown increasing upward to avoid overwriting; this means the total height of a bar does not equal the Q value.


5. Reducing dimensionality and clustering

So far, the classification process has been to compensate the sonar images for geometrical effects, after setting aside beam footprints, beams, and regions of poor quality, to divide the image into sub-images, and to calculate features from these sub-images. To complete the process, we need only to group the merged FFVs, one from each sub-image, into groups that are acoustically similar and distinct from the other groups in some useful ways. This step is called clustering. Before clustering, it is convenient to reduce the dimensionality of the feature vectors.

Overall, the process used here was designed to classify a wide range of sonar images, from high-resolution short-range sidescan images to deep-water multibeam images. A single pixel can represent the backscatter from less than 0.1 m² up to several hundred square meters of seabed. A very large number of features are calculated to accommodate images with so much diversity. In any particular case, many of these are highly correlated. To facilitate further processing and human comprehension of the acoustic diversity of the survey, it is convenient to reduce the dimensionality of feature space from the 132 dimensions of the merged feature matrix to only three. Principal components analysis (PCA) gives orthogonal linear combinations ordered by the amount of variance they represent. The first three principal components typically capture 90% to 95% of the variance because so many features are correlated. In this work, clustering was therefore done in the space of the first three principal components, in which points can conveniently be plotted and visualized. While this choice is arbitrary, it is usual to limit the number of dimensions in some arbitrary way [15], and experience has shown that classification in three-dimensional space gives realistic and useful class maps. The three axes of the reduced feature set are here called Q1, Q2, and Q3. Each rectangle is responsible for one point in Q-space, a point whose coordinates are the product of its FFV and the first three columns of the PCA eigenvector matrix, which we call the reduction matrix. It is important that PCA be done on the merged FFVs from the entire survey area.
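A sketch of the reduction step, written with an explicit eigendecomposition so the reduction matrix is visible; whether the merged features are standardized before PCA is not stated in the text, so simple mean-centring is assumed here.

    import numpy as np

    def pca_reduce(ffv, n_components=3):
        """Project merged full feature vectors onto the first principal components.

        ffv: records x features array (48875 x 132 for Stanton Banks).
        Returns the Q coordinates (records x 3), the reduction matrix
        (features x 3), and the fraction of variance each component explains.
        """
        centred = ffv - ffv.mean(axis=0)
        cov = np.cov(centred, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)            # ascending order
        order = np.argsort(eigvals)[::-1][:n_components]
        reduction = eigvecs[:, order]                     # the reduction matrix
        q = centred @ reduction                           # Q1, Q2, Q3 per record
        return q, reduction, eigvals[order] / eigvals.sum()

Saving the reduction matrix allows new records to be projected into the same Q space later without repeating PCA.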

The relative importance of the various features to the three Q coordinates deserves comment. Fig. 3, with distinct but related colours for each feature, illustrates the feature contributions averaged over the mud and rock regions. The first feature, the mean amplitude in the rectangle, appears at the bottom of each bar and is more important to Q1 and Q3 than to Q2, for which skewness (feature 3) is significant. Table 2 lists the most important features in each case. This should not be regarded as a general result; different features may well play important roles on different surveys.

Table 2. The most important features in the coordinates in reduced feature space (Q space).

Coordinate   Acoustic class                 Feature or feature algorithm
Q1           Mud with burrows               Mean, standard deviation, GLCM entropy, GLCM shade
Q1           Rocks, boulders, or bedrock    Mean, standard deviation, GLCM entropy
Q2           Mud with burrows               Skewness, mean, quantile, GLCM correlation
Q2           Rocks, boulders, or bedrock    Skewness, mean, GLCM correlation, quantile
Q3           Mud with burrows               Mean, GLCM entropy, skewness, quantile
Q3           Rocks, boulders, or bedrock    Mean, GLCM entropy, skewness, standard deviation

Clustering starts with Q space containing many points, in this case 48875, that are to be assigned to some unknown number of clusters such that the acoustic character of the points in a cluster is reasonably homogeneous and distinct from points in other clusters in some way. Many clustering methods have been developed over the years [16], but there is no consensus about optimal methods. In general, they start by choosing cluster centres by some method, assigning each point to the cluster to whose centre it is closest, and iterating positions and assignments until the assignments are consistent (the cluster centre is the mean position of the points assigned to that cluster) and some criteria have been met or minimized. A Euclidean metric is not well suited for this assignment with acoustic records after PCA because the three axes represent distinctly different amounts of variance and thus should not be treated equally. Clustering was done here with the Mahalanobis distance, which normalizes a directed distance by the variance in that direction. Also added is a term that increases the number of points assigned to clusters that already have the largest populations. Populations in the previous iteration are used for this purpose, serving as a prior expectation and thus attracting the term Bayesian to the metric used here. These three distances between a point at position x (a vector) and the centre of cluster i at m_i can be written [15]

Euclidean:     d_i(x) = (x - m_i)^T (x - m_i)        (2a)
Mahalanobis:   d_i(x) = (x - m_i)^T C_i^{-1} (x - m_i)        (2b)
Bayesian:      d_i(x) = -2\log(n_i/N) + \log|C_i| + (x - m_i)^T C_i^{-1} (x - m_i)        (2c)


where cluster i has n_i points and covariance C_i, and there are a total of N points (superscript T denotes transpose). The Bayesian metric was used in this work. The clustering iterations were done in inner and outer loops in which the assignments, means, covariances, and populations were all adjusted.

In selecting a clustering process, an important criterion is objectivity. The class map should not depend on the user's personal choices. A clustering method that is objective, automated, and uses a Bayesian metric was developed a few years ago for acoustic seabed classification [5]. It iterates toward an optimal cluster solution using simulated annealing. Simulated annealing is a search technique for the global minimum of a complicated cost function while avoiding local minima. A suitable cost function for clustering must be minimum when it describes the best-fit set of clusters. Fig. 4 shows that the Bayesian information content (BIC) of the clusters has this characteristic. It compares three attempts at clustering a synthetic data set, showing that the result that fits best has the lowest BIC. (This cannot be done with field data, for which the true set of clusters is not known.) The BIC also has a heritage in the theory of image compression [17]. It is the Bayesian distances summed over all the points in all the clusters plus a term that biases the result away from a large number of clusters. This parsimony term is ((10k - 1)/2) log(N), where k is the number of clusters and 10k - 1 is the number of degrees of freedom, consisting of 3k coordinates of means, 6k independent covariance matrix elements, and k - 1 fractions of the total number of points (N) in a cluster.
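A sketch of this cost function, assuming each cluster is described by its mean, covariance, and population in three-dimensional Q space; the data structures are illustrative.

    import numpy as np

    def bayesian_distance(x, mean, cov, frac):
        """Eq. (2c): -2 log(n_i/N) + log|C_i| + (x - m_i)^T C_i^{-1} (x - m_i)."""
        d = x - mean
        return -2.0 * np.log(frac) + np.log(np.linalg.det(cov)) + d @ np.linalg.solve(cov, d)

    def bic(points, labels, clusters):
        """Bayesian distances summed over all points plus the parsimony term.

        clusters: list of (mean, covariance, n_points); labels: cluster index per point.
        """
        n_total, k = len(points), len(clusters)
        cost = 0.0
        for x, lab in zip(points, labels):
            mean, cov, n = clusters[lab]
            cost += bayesian_distance(x, mean, cov, n / n_total)
        return cost + 0.5 * (10 * k - 1) * np.log(n_total)   # (10k - 1)/2 * log(N)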

Fig. 4. Ellipsoids depicting distributions in feature space of a synthetic data set with five clusters. The ellipsoid surface is one standard deviation from that cluster's centre. The true description (blue) is of points randomly normally distributed about five selected cluster centres with selected covariances. A simulated annealing process seeks these centres and covariances, with typical results shown in red. The lower the BIC, the closer the agreement, indicating that BIC can serve as a cost function to be minimized.

Fig. 5. BIC evolution during a typical search for an optimal set of clusters (left); the search is halted once the changes in BIC drop below a criterion. Comparing the minimum BIC found from individual searches over a range of k shows that the optimal number of clusters for the Stanton Banks dataset is fifteen (right).

In simulated annealing, candidate solutions are perturbed. If the perturbed model has a lower cost function than the original it becomes the new candidate; if not, it randomly may or may not be discarded. Large increases are unlikely to be accepted late in the search because the random choice to keep or discard involves a comparison of the increase with a declining "temperature". The perturbations are implemented by randomly selecting a "thief" cluster, so called because it steals points from other clusters, favouring those that are populous and nearby [18]. The full process is implemented in a program suite called the automated clustering engine (ACE). An ACE run consists of multiple separate attempts to divide the points into k clusters, for a user-selected range of k. Several searches are made at each k, usually giving distinct cluster solutions because of the randomness, which includes the selection of a starting candidate. The lowest BIC value at each k is the optimal division for that number of clusters, and the overall lowest BIC value objectively gives the optimal set of clusters for the survey area. Fig. 5 shows, at left, the progress of an individual search for an optimal set of 13 clusters and, at right, the lowest BIC from separate searches between 4 and 25 clusters, indicating that the optimal number of clusters for this dataset is 15.
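A sketch of one annealing step using the BIC above as the cost function. The thief perturbation here simply reassigns a small random subset of points to a randomly chosen cluster; the published method favours points from populous, nearby clusters [18] and uses its own temperature schedule, so the details below are illustrative only.

    import numpy as np

    rng = np.random.default_rng()

    def refit(points, labels, k):
        """Recompute mean, covariance, and population of each cluster.
        Assumes every cluster keeps enough points for a covariance estimate."""
        return [(points[labels == i].mean(axis=0),
                 np.cov(points[labels == i], rowvar=False),
                 int(np.sum(labels == i))) for i in range(k)]

    def anneal_step(points, labels, clusters, temperature):
        """Perturb with a 'thief' cluster and accept or reject the new solution."""
        new_labels = labels.copy()
        thief = rng.integers(len(clusters))
        victims = rng.choice(len(points), size=max(1, len(points) // 100), replace=False)
        new_labels[victims] = thief
        new_clusters = refit(points, new_labels, len(clusters))
        old_cost = bic(points, labels, clusters)
        new_cost = bic(points, new_labels, new_clusters)
        # keep a worse solution only with a probability that shrinks as the
        # temperature declines, so large increases are rarely accepted late on
        if new_cost < old_cost or rng.random() < np.exp((old_cost - new_cost) / temperature):
            return new_labels, new_clusters
        return labels, clusters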

6. Stanton Banks acoustic classes

This multibeam survey of Stanton Banks, covering 80 km², was done on 20 and 21 November 2005 with the Kongsberg Simrad EM1002 multibeam sonar (Kongsberg Maritime AS, Kongsberg, Norway) on the RV Celtic Explorer. An EM1002 has 111 beams athwartship, 2° beamwidth, and a nominal sonar frequency of 95 kHz. All lines were surveyed in shallow mode with a transmit pulse length of 0.2 ms. It usually operates with an angular sector exceeding 100°, as in this survey; when this is so, an inner sector operates at 98 kHz with two outer sectors at 93 kHz [7]. The photographic ground-truth survey was done on 19 June 2006, with a camera on a sled towed very near the seabed. Selected images from the seven tows were described in geological and biological terms. Further details on both surveys are in [19].

Previous sections of this paper present the classification process in detail, including some particulars of the application to the Stanton Banks survey. Here we recapitulate those details, starting with beam quality. Of the total of 3865797 beam footprints in this survey (34827 pings × 111 beams per ping), 26204 (0.68%) were deemed invalid by the sonar system. The worst survey line had only 2.8% of its footprints invalid, which is unusually pure. The other reason beams were masked was having a depth out of line with their neighbours. 5267 (0.14%) of these were found with a search using a threshold of 1000. Thus 99.18% of the beam footprints met the quality standards for classification. These were further filtered by not placing rectangles where the grazing angle exceeded 70°, because the image compensation was not fully adequate in this narrow high-reflectivity nadir region. Rectangles 17 pixels along-track and 257 pixels across-track were distributed over the remainder of the image, 48875 of them in total. Data for compensation, as stored in the compensation table, can be from just a single survey if it covers many sediments and depths, but the Stanton Banks survey did not, so additional data from the same sonar and mode were included. Compensation gives image amplitudes in a distribution about zero, which was then mapped to an eight-bit image (Eq. (1)). All 132 features listed in Table 1 were generated from the matrix of amplitudes in each rectangle. The number of dimensions of this feature space was reduced from 132 to three with PCA, in which the first principal component captured 59.8% of the total variance, the second 29.2%, and the third 5.1%, for a total of 94.1%. Finally, the data points in the reduced feature space (Q space) were segmented with ACE, an objective clustering method. The ACE optimal result was 15 classes, with class assignments as plotted in Fig. 6.

Fig. 6. Fifteen acoustic classes derived and coloured as described in the text, overlaid on sun-illuminated bathymetry. Contour interval is 20 m. Each coloured dot is based on amplitudes in a rectangle placed on an image. Dots appear at the position of the centre of the parent rectangle, with colour indicating the acoustic class assignment.

The colours in Fig. 6 are called similarity colours because they indicate how similar the classes are, based on their positions in Q space. The standard approach is to colour classes according to their numbers as entries in a list of primary colours. Cluster numbering is arbitrary, though, and carries no information about cluster size or location. With similarity colours, all points in a cluster are coloured depending on the position of the cluster centre. Its red saturation depends on its Q1 coordinate, green on Q2, and blue on Q3. One is then reminded that points on a map with similar colours are from clusters that are neighbours in feature space, and thus of similar acoustic character.
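A sketch of the similarity-colour mapping; the exact scaling of the Q axes to colour saturations is not given in the text, so a linear rescale of each axis to the 0-255 range is assumed.

    import numpy as np

    def similarity_colours(centres):
        """Map cluster centres in Q space to RGB class colours.

        centres: k x 3 array of (Q1, Q2, Q3) cluster-centre coordinates.
        Red saturation follows Q1, green Q2, and blue Q3.
        """
        q = np.asarray(centres, dtype=float)
        lo, hi = q.min(axis=0), q.max(axis=0)
        return (255 * (q - lo) / (hi - lo)).astype(int)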

Fig. 7. The 15 acoustic classes interpolated categorically to fill the survey area, in the same similarity colours. The geological descriptions of those from which samples were taken are shown colour-coded. Photograph locations are plotted as triangles (in the ellipses), coloured to indicate geological interpretation. The region within the yellow rectangle is shown enlarged in the upper right.

The geographic distribution of classified points in Fig. 6 is irregular. Some regions were covered three times, while others have no classified point within 100 m, particularly directly under the ship. Interpolation would facilitate using these class maps in a GIS application. Class number cannot be interpolated as if it were a continuous variable. Not only must the interpolated result be an integer, intermediate class numbers are not correct because their numbering sequence is arbitrary (for example, interpolation of a collection of points in classes 1 and 3 cannot yield class 2, since that class may be quite different; it must yield either 1 or 3). Categorical interpolation is easily done by taking the mode, rather than the mean, of the points being interpolated. A histogram approach is well suited if weights are used. Few commercial GIS suites do categorical interpolation. The interpolated maps in Figs. 7–9 are on a 20-m grid and were made in QTC CLAMS, interpolating the closest 20 field points within a 200-m search radius of each grid point with an exponential weighting function.
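The weighted-mode interpolation can be sketched as follows. The 20-point and 200-m limits and the 20-m grid are from the text; the exponential decay scale and everything else about the implementation are assumptions.

    import numpy as np

    def interpolate_classes(grid_xy, point_xy, point_class, n_classes,
                            max_points=20, radius=200.0, decay=50.0):
        """Categorical (weighted-mode) interpolation of class numbers onto grid nodes.

        grid_xy: m x 2 array of grid-node coordinates; point_xy: n x 2 array of
        classified-record positions; point_class: n integer class numbers.
        Returns -1 where no field point lies within the search radius.
        """
        out = np.full(len(grid_xy), -1, dtype=int)
        for g, xy in enumerate(grid_xy):
            dist = np.hypot(*(point_xy - xy).T)
            near = np.argsort(dist)[:max_points]
            near = near[dist[near] <= radius]
            if near.size == 0:
                continue
            votes = np.zeros(n_classes)
            for idx in near:
                votes[point_class[idx]] += np.exp(-dist[idx] / decay)
            out[g] = int(np.argmax(votes))   # the mode, never an intermediate class
        return out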

Ground truth for the Stanton Banks survey consisted of 112 photographs that were described in terms such as "Nephrops burrows on mud" or "Sand and pebbles". Setting aside seven with non-geologic descriptions left 105 that were grouped into the 6 classes listed in Table 3 and plotted in Fig. 7. This ground truth was used to assign labels to the 15 acoustic classes by interpolating to identify the acoustic classes at the site of each photograph, with identifications shown in Table 4. Each entry is the number of sites that are in both the corresponding photographic and acoustic classes; for example, 22 sites were in both photographic class 6 and acoustic class 2. By selecting the largest entry in each column, labels from photography could be associated with eight of the acoustic classes, as listed in Table 5. Of these 105 associations, 18 sites (17%) are mixed, in that their photographic class does not match the label assigned to that acoustic class. For example, acoustic class 5 has been matched with photographic class 5, sand with pebbles or cobbles, but one site in acoustic class 5 had a photograph in the sand class and four in the rocks, boulders, or bedrock class. Seven acoustic classes could not be labelled because no photographs were taken in their areas.

The upper right corner of Fig. 7 shows classes in a transect of increasing grain size. Photographic classes are, from south to north, 3, 4, and 5, thus transitioning from sand and mud, to sand, to finally sand with pebbles or cobbles. Acoustic classes show the same trend: class 1 (medium blue), then 13 (green-blue), then 3 (deep green), then 15 (russet), which, using labels as assigned above, is sand with or without mud (1), to mud with shell fragments (13), to sand (3), to rocks, boulders, and bedrock (15). The class borders agree reasonably well, to better than 100 m.


Fig. 8. Confidence of correct assignment overlaid on interpolated acoustic classes. Confidence values exceeding 85% appear transparent. Class colours as in Fig. 7.


Associated with clustering are two statistics based on the position of each point in feature space. One is the confidence that a point has been assigned correctly, which is based on comparisons among the Bayesian distance to the centre of its assigned cluster and the distances to all other cluster centres. These are shown in Fig. 8. One statistic is not adequate to express the relationship between a point and its assigned cluster. To illustrate this, consider the right-most of a set of clusters in one dimension. The confidence that a point even further right should be assigned to this cluster is very high, no matter how far right the point is. The second statistic completes the description; it is the probability that a point will be found at the number of standard deviations of the cluster distribution (assumed normal) at which the point lies. It is based on only the point and its assigned cluster, unlike confidence, which involves all the cluster centres. Fig. 9 shows the probabilities. One role of confidence and probability is in selecting sites for gathering ground truth. Any point in the transparent regions of both Figs. 8 and 9 would be suitable (except those close to inter-class borders, to ease the need for positioning accuracy). The photographs used as ground truth in this work were taken before these maps were made.
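The paper does not give formulas for these two statistics, so the sketch below is only one plausible reading of the description: the confidence compares the Bayesian distances (Eq. (2c)) to all cluster centres, and the probability is taken as the chi-squared tail probability of the squared Mahalanobis distance to the assigned centre, with three degrees of freedom for Q space. Neither formula should be taken as the published QTC definition.

    import numpy as np
    from scipy.stats import chi2

    def assignment_statistics(x, clusters, assigned, n_total):
        """Illustrative confidence and probability for one classified record.

        clusters: list of (mean, covariance, n_points); assigned: index of the
        record's cluster.  Uses bayesian_distance defined earlier (Eq. (2c)).
        """
        d = np.array([bayesian_distance(x, m, c, n / n_total) for m, c, n in clusters])
        rel = np.exp(-0.5 * (d - d.min()))        # relative plausibility of each cluster
        confidence = rel[assigned] / rel.sum()
        mean, cov, _ = clusters[assigned]
        maha_sq = (x - mean) @ np.linalg.solve(cov, x - mean)
        probability = chi2.sf(maha_sq, df=3)      # chance of lying this far from the centre
        return confidence, probability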

The classification process used in this work is contained in the software suite QTC MULTIVIEW (Quester Tangent Corp., Sidney, BC, Canada, www.questertangent.com). In fact, these results (save for interpolating with QTC CLAMS) were generated from QTC MULTIVIEW alone. Loading the Kongsberg .all files and manual quality control, including selecting the depth threshold for cleaning, took an hour. The next stages, including generating features, took 5 hours of unattended batch processing on a 1-GHz PC. Each image is compensated in the background during these stages. Automatic clustering with ACE took another unattended overnight run.

7. Discussion

Images and supporting data from a multibeam survey of Stanton Banks were classified with a phenomenological approach, giving practical classes. Before segmenting the survey area into regions, or classes, that are acoustically similar, low-quality data were set aside with masks, whose entries denote bad beams, and maps, which relate image pixels to their parent beams. Images were compensated and classified in image space, that is, no image mosaic is made, which ensures that the heritage and geometry of every pixel are known. Compensation involves finding the grazing angle of each beam footprint by fitting tangent planes to depths in the vicinity of that footprint. Features were generated from rectangular patches of the compensated images. The records were clustered in a feature space whose axes were the first three principal components of the features, with an objective clustering engine. The resulting acoustic classes were labelled by comparison with photographic ground truth, with only 17% of the ground-truth sites not in the best-match acoustic class.

Fig. 9. Probability, a measure of how near class records are to the centre of their cluster, overlaid on interpolated acoustic classes. Probability values exceeding 20% appear transparent. Class colours as in Fig. 7.

Table 3. Descriptions of photographic ground-truth classes.

Photo class   Description                     Colour
1             Mud with burrows                Cyan
2             Mud with shell fragments        Blue
3             Sand and mud                    Yellow
4             Sand                            Orange
5             Sand with pebbles or cobbles    Red
6             Rocks, boulders, or bedrock     Black

Table 4. Number of ground-truth sites in each photographic and acoustic class. Rows are the six photographic classes and columns are the 15 acoustic classes. Photographic classes (6) were assigned from descriptions. An acoustic class was assigned to each ground-truth location by categorical interpolation among the 15 acoustic classes.

Table 5. Descriptions of acoustic classes, derived from Tables 3 and 4. Acoustic class 1 has a combined description.

Acoustic class   Description
1                Sand with or without mud
2, 10, 15        Rocks, boulders, or bedrock
3                Sand
5                Sand with pebbles or cobbles
11               Mud with burrows
13               Mud with shell fragments

The only important choice the user makes in this classification process is the size of the rectangular patches. This choice determines the resolution of the class map. If the spatial scale of seabed features in the survey area can be estimated, it is appropriate to choose a rectangle size somewhat smaller. An associated choice is the grid spacing for categorical interpolation. To benefit from the interpolation, this should be several times the length of a side of a rectangle.

An ongoing issue in acoustic seabed classification is how well particular features discriminate. One expression of this is the size of their contributions to reduced feature vectors. Some heavily weighted features in this work were mean, standard deviation, skewness, and two of the eight GLCM features: entropy and correlation. Their importance in this process does not guarantee their value for classifying other data sets. Discrimination by GLCM features has been studied extensively. With RADARSAT images, Arzandeh and Wang [20] concluded that three features were better than two and that two-pixel steps were optimal, and recommended averaging over directions. Blondel reports good classes with only GLCM entropy and homogeneity [21].

Before comparisons with ground truth, the quality of the acoustic classification of an area can be assessed based on consistency in overlap regions and the absence of artifact class borders parallel to the ship track. The latter can indicate inadequate compensation for radiometric and geometric effects. These artifacts can be suppressed, if necessary, by adding data from more surveys to the compensation table so that it includes data from more sediments at more depths, and by restricting the region to be classified based on grazing angle or range. Adequate compensation throughout the nadir region is not always possible, but pixels from these areas can easily be set aside since the survey geometry of every pixel is known. Classifications extend only out to the maximum range that gave bathymetry of acceptable quality. The range could be reduced if system noise were evident.

Acoustic classes 2, 10, and 15 were all given the same label: rocks, boulders, or bedrock. It is not unusual to find the same sediment types associated with several acoustic classes. Surface roughness can be very important to the character and amplitude of high-frequency backscatter. The roughness of rocks, boulders, and bedrock can be quite different, yet the three can be largely equivalent in terms of habitat.

The phenomenological method used in this work does not require any sonar calibration. It is important that the sonar operates consistently and without changes to transmit pulse length, but it can be uncalibrated.

The process described here was one of unsupervised classification, in that the acoustic classification was done with no knowledge of the sediments in the survey area. Ground-truth data were used after the classification to apply labels to the acoustic classes. Having done so, it is possible to use the PCA reduction matrix and the class centres and covariances in Q space to classify new image data without repeating PCA or clustering. The reduction matrix and class descriptors are stored in a catalogue file for this purpose.

References

[1] Reed TB, Hussong D. Digital image processing techniques for enhancement and classification of SeaMARC II side scan sonar imagery. J Geophys Res 1989;94:7469–90.
[2] Pouliquen E, Bergem O, Pace NG. Time-evolution modeling of seafloor scatter. I. Concept. J Acoust Soc Am 1999;105:3136–41.
[3] Sternlicht DD, de Moustier CP. Time-dependent seafloor acoustic backscatter (10–100 kHz). J Acoust Soc Am 2003;114:2709–25.
[4] Sternlicht DD, de Moustier CP. Remote sensing of sediment characteristics by optimized echo-envelope matching. J Acoust Soc Am 2003;114:2727–43.
[5] Preston JM, Christney AC, Beran LS, Collins WT. Statistical seabed segmentation – from images and echoes to objective clustering. In: Proceedings of the seventh European conference on underwater acoustics, Delft, NL; 2004. p. 813–8.
[6] Lurton X. An introduction to underwater acoustics: principles and applications. Chichester, UK: Praxis Publishing; 2002. Sec 8–4.
[7] Kongsberg Simrad AS. EM1002 multibeam echo sounder operator manual. Bergen, Norway; 2000.
[8] Foote KG, Chu D, Hammar TR, Baldwin KC, Mayer LA, Hufnagle LC, et al. Protocols for calibrating multibeam sonar. J Acoust Soc Am 2005;117:2013–27.
[9] Hellequin L, Boucher JM, Lurton X. Processing of high-frequency multibeam echo sounder data for seafloor characterization. IEEE J Oc Eng 2003;28:78–89.
[10] Capus C, Ruiz IT, Petillot Y. Compensating for changing beam pattern and residual TVG effects with sonar altitude variation for sidescan mosaicing and classification. In: Proceedings of the seventh European conference on underwater acoustics, Delft, NL; 2004. p. 827–32.
[11] Preston JM, Christney AC. Compensation of sonar image data primarily for seabed classification. US Patent 6868041, UK Patent 2403013; 2005.
[12] Kil DH, Shin FB. Pattern recognition and prediction with applications to signal characterization. Woodbury, NY: American Institute of Physics; 1996.
[13] Pace NG, Gao H. Swathe seabed classification. IEEE J Oc Eng 1988;13:83–90.
[14] Haralick RM, Shanmugam K, Dinstein I. Textural features for image classification. IEEE Trans Syst Man Cybernet 1973;SMC-3:610–21.
[15] Preston JM, Kirlin RL. Comment on acoustic seabed classification: improved statistical method. Can J Fish Aquat Sci 2003;60:1299–300.
[16] Legendre P, Legendre L. Numerical ecology. 2nd English ed. Amsterdam: Elsevier; 1998 [chapter 8].
[17] Fraley C, Raftery AE. How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 1998;41:578–88.
[18] Bandyopadhyay S, Maulik U, Pakhira MK. Clustering using simulated annealing with probabilistic redistribution. Int J Patt Recog Art Intell 2001;15:269.
[19] Brown CJ, Blondel Ph. Developments in the application of multibeam sonar backscatter data for seafloor habitat mapping. Appl Acoust [this volume].
[20] Arzandeh S, Wang J. Texture evaluation of RADARSAT imagery for wetland mapping. Can J Remote Sensing 2002;28:653–66.
[21] Blondel Ph, Sempéré JC, Robigou V. Textural analysis and structure-tracking for geological mapping: applications to sonar images from Endeavour Segment, Juan de Fuca ridge. In: Proceedings of the IEEE Oceans'93, Victoria, Canada, III; 1993. p. 208–13.