A Risk Map for Gully Locations in Central Queensland, Australia


  • European Journal of Soil Science, June 2011, 62, 431–441 doi: 10.1111/j.1365-2389.2011.01375.x

    A risk map for gully locations in central Queensland, Australia

    A. H. Eustace, M. J. Pringle & R. J. Denham

    Queensland Department of Environment and Resource Management, Remote Sensing Centre, Ecosciences Precinct, 41 Boggo Road, Dutton Park, Queensland 4102, Australia

    Summary

    In central Queensland, Australia, relatively little is known about where gullies occur (gully presence). This is despite a general acceptance among scientists and politicians that gully erosion in the region is an ecologically important process, exacerbated by grazing pressure. We aimed to create a risk map of gully presence for a 4.86 × 10⁶-ha area of central Queensland dominated by grazing and thought to be particularly prone to gully erosion. We achieved this by (i) using light detection and ranging (lidar) technology (vertical accuracy < 0.15 m; spatial resolution 0.5 m) to observe topography on transects at eight selected sites within the study area, (ii) using object-oriented classification to derive gully presence from the lidar observations, (iii) using a random forest to model the relationship between gully presence and a set of readily available explanatory variables (comprising soil, topography and vegetation information; finest spatial resolution 25 m) and (iv) extrapolating the model to unsampled locations. Cross-validation indicated that the predictive ability of the model was modest, with an average area under the receiver operating characteristic curve of 0.62 (where 1.0 is a perfect model and 0.5 is no better than chance). The greatest risk of gully presence was associated with areas of large topographic variation, and where, coincidentally, there was relatively little long-term vegetation cover. Ultimately, however, we acknowledge that the quality of the map is limited by the small area of observed lidar data relative to the study area, the relatively coarse spatial resolution of the explanatory variables and the possibility that gully presence is the result of different processes at different locations.

    Introduction

    Soil erosion by water can have dramatic and far-reaching consequences. This is especially true in central Queensland, Australia, where east-flowing rivers carry sediment to the southern reaches of the Great Barrier Reef lagoon (Figure 1). The adverse effect of sediment on the water quality of the World Heritage-listed marine ecosystem stimulates environmental and political interest. A plausible hypothesis for the source of at least some of the sediment is gully erosion (Prosser et al., 2001; Rustomji, 2006). We follow Hughes et al. (2001) in defining a gully as a steep-walled, poorly vegetated incision in the landscape with a catchment area of 10 km² or less.

    This study was motivated by the Australian and Queensland governments' political and environmental imperatives (Reef Water Quality Protection Plan Secretariat (2009) and the Delbessie Agreement (DERM, 2007)) to improve water quality and land condition in the catchments that drain into the Great Barrier Reef lagoon. These imperatives emphasize on-ground investment by land managers for the prevention or remediation of erosional features, including gullies. There is, therefore, a clear need for accurate, fine-scale mapping of where gullies occur in the landscape to (i) help target sites for investment, (ii) assist post-investment monitoring and (iii) quantify the contribution of gullies to the sediment budget. Hughes et al. (2001) predicted that the Nogoa River catchment in central Queensland had the largest gully density (line-length of gullies per unit area) of anywhere in Australia. Unfortunately, their mapping was not at a spatial scale suitable to address these imperatives. They also acknowledged that their model was most uncertain in central Queensland, because of a lack of data. With these results in mind, we targeted the Nogoa catchment and the surrounding area for an investigation of where gullies occur. We use the term 'gully presence' to describe a binomial variable, determined on a fine grid over the surface of the land, coded as 'Gully' at locations where incised areas occur, and as 'Non-gully' elsewhere.

    Correspondence: A. H. Eustace. E-mail: [email protected]
    Received 29 September 2009; revised version accepted 28 February 2011

    © 2011 The Authors. Journal compilation © 2011 British Society of Soil Science

    To map gully presence is an ambitious undertaking, particularly when the area of interest is large. In such a case, the conventional approach by expert interpretation of fine-resolution imagery followed by extensive field validation might not be an efficient use of resources. A more efficient approach, which we follow here, would be to map gully presence according to its relation to a set of readily available environmental attributes. McBratney et al. (2003) describe the general framework by which this might proceed: (i) a set of explanatory variables (the environmental attributes) are assembled from, for example, a database of historical spatial information, (ii) the variables are sampled to correspond to the locations of the observed response variable (gully presence), and fed into an empirical statistical model and (iii) the model is used to predict the value of the response variable at unsampled locations. The procedure might not create a map as accurate as a conventional assessment of gully presence but has the advantage of generating a quantitative estimate of uncertainty (McBratney et al., 2003). For gully presence, the predictions of (iii) are arguably more useful if presented as the probability of occurrence as a risk map.

    Figure 1 Six catchments comprising the Fitzroy Basin (Fitzroy, Isaac, Mackenzie, Dawson, Comet and Nogoa); the boundaries of each are shown as bold grey lines. The main drainage line in each catchment is shown as a black line. The study area is the western portion of the basin, bounded by the black box. The locations of the eight x-configured sites inside the study area are labelled (NB not drawn to scale). Each site contains two lidar transects. Inset: the location of the Fitzroy Basin relative to Queensland (the grey polygon) and to Australia.

    For a statistical model of gully presence, spatial information on topography and vegetation will be essential, in line with the definition of a gully proffered above. The susceptibility of soil to erosion will also be affected by its inherent physical, chemical and biological attributes (Lal et al., 1999), although Lentz et al. (1993) showed that such relationships are likely to be site-specific. There is a generalization that dispersivity of soil (measured by the exchangeable sodium percentage and/or the sodium adsorption ratio) has a strong influence on erosion (Rienks et al., 2000; Faulkner et al., 2003). Of the previous studies that have used statistical modelling to characterize the spatial distribution of gully attributes (Table 1), all but one considered topographic information, while information related to soil, hydrology and vegetation cover has only been used occasionally.

    Table 1 Studies that have used statistical modelling to characterize the spatial distribution of gully attributes

    | Study | Country (area of interest) | Response variable | Explanatory variables (a) | Model | Accuracy |
    |---|---|---|---|---|---|
    | Meyer & Martínez-Casasnovas (1999) | Spain (two catchments, 2500 ha each) | Presence | T, S, L, H | Logistic regression | Overall accuracy 85%. |
    | Hughes et al. (2001) | Australia (continent) | Density/mm2 | T, S, L, C, G, V | Piece-wise regression | Correlation of predicted with observed ranged from r = 0.43 to 0.83. |
    | Martínez-Casasnovas et al. (2004) | Spain (60 ha) | Presence of sidewall erosion | T, H, B | Logistic regression | Model accounted for 87% of the variation. |
    | Hyde et al. (2006) | USA (three watersheds; 8750 ha, 2600 and 2448 ha) | Rejuvenation | T, H, B | Logistic regression | Overall accuracy 78%. |
    | Bou Kheir et al. (2007) | Lebanon (67 600 ha) | Distribution and size | T, S, H, G | Tree-based models | Best model explained 80% of variation in gully size. |
    | Vrieling et al. (2007) | Brazil (5200 ha) | Presence | ASTER satellite imagery | Maximum likelihood classifier | Best overall accuracy 75%. |
    | Vanwalleghem et al. (2008) | Belgium (1329 ha) | Presence | T, S, P, height above sea-level | Logistic regression | Overall accuracy 77%. |
    | Gutierrez et al. (2009) | Spain (54 farms, each of at least 100 ha) | Presence | T, V, C | Multivariate adaptive regression splines (MARS) | Areas under the ROC curves between 0.75 and 0.98. |

    (a) Many individual variables were used, but they can be grouped as: B, basin metrics; C, climate; G, geology; H, hydrology; L, land-use; P, proximity metrics; S, soil; T, topography; V, vegetation cover.


    A potential barrier to developing a statistical model of gully presence concerns the response variable itself, because obtaining adequate spatial information on gully presence is a difficult task. One possibility, which we follow, is to gather fine-resolution topographic information at various locations about the landscape. Light detection and ranging (lidar) technology (Petrie & Toth, 2009) is a source of detailed topographic information, and has been used successfully to quantify change in the landscape caused by erosion (Ritchie, 1995; Thoma et al., 2005). Lidar is costly, and so is generally used for relatively small areas only; however, the data that it provides (typically accurate to


    spatial heterogeneity thresholds for the image-objects (Brennan & Webster, 2006). The heterogeneity of the pixels within an image-object is minimized according to a user-defined scale factor. Following a process of trial-and-error, we used a scale factor of 50, and shape and colour factors of 0.5 each to segment the images.

    From the properties of the image-objects, we created a set of rules that allocated an image-object to the Gully class (Table 2a). We found that these rules tended to over-allocate the occurrence of Gully image-objects, so additional rules were added to reallocate misclassified Gully image-objects to Non-gully (Table 2b). We applied a visual check to the polygons to ensure that water-holding bodies (creeks and rivers) had not been allocated to Gully. This check was informed by the backscatter intensity, Quickbird satellite imagery (DigitalGlobe Incorporated, 2010) associated with each transect (0.6-m resolution, panchromatic-sharpened), and a Queensland-wide map of the drainage network (at 1:250 000 scale). As a further check on the quality of the object-oriented classification, we asked an independent expert to delineate the centre-line of the gullies in each of the 16 transects, using only the Quickbird imagery as a guide.

    Table 2 The rules required to classify lidar image-objects as either gully or non-gully categories

    | Operation | Rule |
    |---|---|
    | (a) Allocate to gully | Mean slope 15 |
    | | Mean standard deviation of DEM (3 × 3 window) 50 (m²) |
    | | Mean standard deviation of slope 6 |
    | | Length of longest edge of polygon 18 m |
    | (b) Reallocate gully to non-gully | Mean standard deviation of slope 7 |
    | | Polygon rectangular fit 0.9 (proportion between 0 and 1) |
    | | Polygon length-to-width ratio 1.5 and rectangular fit 0.9 |

    We produced a gully extent image by merging the Gully/Non-gully image-objects into a binary raster with 0.5-m pixels for each of the eight sites. An intermediate variable, gully depth, was obtained for each transect to remove potential spurious lidar measurements within the vertical error range of the lidar observations. Gully depth was calculated by linear interpolation of the DEM values associated with Non-gully at the locations of the Gully class, then subtracting the Non-gully DEM. Considering the difference in the spatial resolution of the lidar data and the explanatory variables (see below; 0.5-m pixels versus, at best, 25-m pixels), we aggregated (by arithmetic averaging) gully depth to the 25-m pixels of Landsat imagery. From the image of aggregated gully depth (z) we derived an image of gully presence, coded thematically as Gully at pixels where z > 0.15 m and Non-gully where z = 0 m. Pixels where 0 m < z ≤ 0.15 m were excluded from further analysis to avoid confusion with possible lidar inaccuracies. The final Gully observations were analysed further at a 25-m scale for each of the eight sites.
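    The aggregation and thresholding of gully depth can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names `aggregate_mean` and `classify_presence` are hypothetical, the grid is a toy example, and an aggregation factor of 2 stands in for the factor of 50 that would take 0.5-m pixels to 25-m pixels.

```python
def aggregate_mean(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D grid (list of rows)."""
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            block = [grid[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def classify_presence(z, vertical_error=0.15):
    """Return 'Gully', 'Non-gully' or None (excluded) for an aggregated depth z in metres."""
    if z > vertical_error:
        return "Gully"
    if z == 0:
        return "Non-gully"
    # 0 < z <= vertical_error lies within the lidar error and is excluded
    # (negative depths, which are not expected, are excluded as well).
    return None


# Toy 4 x 4 grid of gully depth (metres), aggregated with factor 2.
depth = [
    [0.4, 0.6, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
coarse = aggregate_mean(depth, 2)
presence = [[classify_presence(z) for z in row] for row in coarse]
```

    In this toy grid the lower-left block averages 0.05 m, so it falls inside the excluded band rather than being counted as either class.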

    Explanatory variables

    From the digital archives of the Queensland Government we gathered a set of 17 ancillary variables that might plausibly relate to gully presence (Table 3). This information covered the extent of the study area, and related aspects of soil, topography and vegetation.

    Explanatory variables originating in polygon formats were converted into 25-m pixel rasters. Differences in resolution were resolved by using a near-neighbour algorithm to resample all the explanatory variables to the same grid. Continuous soil attributes were retrieved from an archive of nationwide soil information (CSIRO, 2006; Brough et al., 2006). The attributes were stored as interpolated surfaces in a polygon-based format, unfortunately without estimation variances. This was not ideal, but we retained the information because attributes such as dispersivity and texture are known to affect the susceptibility of a site to erosion. We also retrieved a map of the soil order from the Australian Soil Classification (Isbell, 1996), and a map of the perceived salinity hazard for the study area.

    Table 3 Ancillary spatial information for the study area, used as explanatory variables for modelling gully presence

    | Variable (a) | Label | Units | Comment |
    |---|---|---|---|
    | Soil | | | |
    | Clay content | clay_a | % | A horizon |
    | | clay_b | % | B horizon |
    | CEC | cec_a | cmol kg⁻¹ | A horizon |
    | | cec_b | cmol kg⁻¹ | B horizon |
    | ESP | xna_a | % | A horizon |
    | | xna_b | % | B horizon |
    | Organic carbon | oc_a | % | A horizon |
    | | oc_b | % | B horizon |
    | Soil order | ord | | 10 classes (b) |
    | Salinity hazard | sal | | 3 classes (low, medium, high) |
    | Topographic | | | |
    | DSM | dsm | m | See Tickle et al. (2009) |
    | Local slope (3 × 3 window) | slo_vr | (°)² | Variance |
    | | slo_mx | ° | Maximum |
    | | slo_mn | ° | Minimum |
    | Drainage network | drn | | 2 classes (in, out) |
    | Vegetation | | | |
    | Bare-ground index | bgi_me | | Mean, 1988–2006 |
    | | bgi_sd | | Standard deviation, 1988–2006 |

    (a) Acronym key: CEC, cation exchange capacity; DSM, digital surface model; ESP, exchangeable sodium percentage.
    (b) The orders (followed by the per cent coverage of the study area): Calcarosols (2%), Dermosols (5%), Ferrosols (4%), Kandosols (

    Topographic information was derived from a one-second (approximately 30-m spatial resolution) digital surface model (DSM), obtained from the Space Shuttle Radar Topography Mission (Farr et al., 2007; Tickle et al., 2009). The DSM was corrected for striping artefacts that affected the original signal; however, the coverage for Australia has not yet been corrected for vegetation height. From the DSM we computed slope, to which we applied a 3 × 3-pixel moving window that derived the local variance, minima and maxima of the slope. The remaining topographic information was a binary variable that determined whether a location fell within 25 m of the drainage network, derived from a map of the stream network of Queensland (1:250 000 scale).
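    The 3 × 3 moving-window summaries of slope can be illustrated with a short sketch. It assumes a slope grid is already available as a list of rows; the function name `local_slope_stats`, the use of the population variance and the treatment of edge pixels (left as None) are assumptions made for illustration, because the text does not specify them.

```python
from statistics import pvariance


def local_slope_stats(slope):
    """3 x 3 moving-window variance, maximum and minimum of a slope grid.

    Edge pixels, which lack a full window, are returned as None.
    """
    n_rows, n_cols = len(slope), len(slope[0])
    var = [[None] * n_cols for _ in range(n_rows)]
    mx = [[None] * n_cols for _ in range(n_rows)]
    mn = [[None] * n_cols for _ in range(n_rows)]
    for r in range(1, n_rows - 1):
        for c in range(1, n_cols - 1):
            window = [slope[r + i][c + j] for i in (-1, 0, 1) for j in (-1, 0, 1)]
            var[r][c] = pvariance(window)  # assumption: population variance
            mx[r][c] = max(window)
            mn[r][c] = min(window)
    return var, mx, mn


# Toy slope grid (degrees) with a single steep pixel.
slope = [
    [2.0, 2.0, 2.0, 2.0],
    [2.0, 2.0, 11.0, 2.0],
    [2.0, 2.0, 2.0, 2.0],
]
slo_vr, slo_mx, slo_mn = local_slope_stats(slope)
```

    A single steep pixel raises the variance and maximum of every window that contains it, which is why these layers respond to local topographic variation rather than to slope alone.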

    Vegetation information was based on a calibrated empirical model, known as the Bare-Ground Index (BGI) (Scarth et al., 2006), applied to Landsat imagery. BGI is the proportion of ground not covered by vegetation (living or dead) when viewed vertically downwards from a standing position on the ground. The BGI model is applied to the Landsat imagery on a per-pixel basis, but only at those pixels considered to have less than a threshold proportion of tree cover; we used

    Model validation and cross-validation

    We excluded pixels where any of the explanatory variables were associated with null values. There were n pixels with which to build and validate a model of gully presence. The model validation proceeded in two stages. The first stage was a standard validation, where the observations were partitioned randomly into a training dataset (66%) and a validation dataset (33%). The training sample was used to train model M0, which was then used to predict the probability of Gully at the locations associated with the validation dataset. The second stage was a modified cross-validation. We indexed the n observations according to the site (out of eight) to which they belonged and then used all observations in each site in turn to validate predictions from a model formed from the remaining seven sites.
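    The two partitioning schemes can be sketched as below. This is an illustrative outline only: `random_split` and `leave_one_site_out` are hypothetical helper names, the site labels are invented, and the model-fitting step is omitted.

```python
import random


def random_split(indices, train_fraction=0.66, seed=0):
    """Shuffle indices and split them into training and validation lists."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def leave_one_site_out(site_ids):
    """Yield (held_out_site, train_indices, test_indices) for each distinct site."""
    for site in sorted(set(site_ids)):
        train = [i for i, s in enumerate(site_ids) if s != site]
        test = [i for i, s in enumerate(site_ids) if s == site]
        yield site, train, test


site_ids = [1, 1, 2, 2, 2, 3]  # hypothetical site label for each pixel
train, valid = random_split(range(len(site_ids)))
folds = list(leave_one_site_out(site_ids))
```

    The random split mixes pixels from every site into both subsets, whereas each leave-one-site-out fold withholds an entire site, which is the property the modified cross-validation relies on.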

    It was possible to observe a class within a categorical variable that was not included in the training of the random forest model during cross-validation because of the unique characteristics of some sites and the spatial distance between some sites. In these rare cases, we switched the untrained categorical class with one of those used to train the model, selected at random. The soil variable ord was the only categorical variable that required this class-switching method to ensure predictions of the model could be carried out at all validation locations, despite some ord values that did not occur in the sampled training data appearing in the validation data. This was a conservative yet pragmatic approach when our limited sample area is considered.
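    A minimal sketch of the class-switching step follows. The function name `switch_unseen_classes` and the example class lists are hypothetical; the sketch only shows the substitution of an unseen class by a randomly selected trained class, as described above.

```python
import random


def switch_unseen_classes(values, trained_classes, seed=0):
    """Replace any class absent from training with a randomly chosen trained class."""
    rng = random.Random(seed)
    choices = sorted(trained_classes)
    return [v if v in trained_classes else rng.choice(choices) for v in values]


trained = {"Sodosol", "Vertosol", "Dermosol"}                      # classes seen in training
validation_ord = ["Vertosol", "Ferrosol", "Sodosol", "Kandosol"]  # includes unseen classes
switched = switch_unseen_classes(validation_ord, trained)
```

    Values already known to the model pass through unchanged; only the unseen classes are replaced, so the substitution affects the prediction solely at those locations.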

    Validation and cross-validation served different purposes. The former allowed us to assess the performance of a random forest when the model was extrapolated at locations relatively close to where the model was trained. Cross-validation, on the other hand, was needed to address concerns about the relatively small area covered by the transects, and their limited geographical spread. As an example of how the two methods differed, the mean minimum distance between the training sample and the validation sample was 25.7 m on the ground, that is, just more than one pixel; however, for cross-validation the mean minimum distance was 4.5 × 10⁴ m.
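    One plausible reading of the 'mean minimum distance' is the average, over validation locations, of the distance to the nearest training location. The sketch below computes that quantity by brute force; the function name and the coordinates are hypothetical, and the authors' exact calculation is not given in the text.

```python
from math import dist


def mean_minimum_distance(validation_xy, training_xy):
    """Mean, over validation points, of the distance to the nearest training point."""
    nearest = [min(dist(v, t) for t in training_xy) for v in validation_xy]
    return sum(nearest) / len(nearest)


# Hypothetical pixel-centre coordinates in metres on a 25-m grid.
training_xy = [(0.0, 0.0), (25.0, 0.0), (0.0, 25.0)]
validation_xy = [(25.0, 25.0), (50.0, 0.0)]
d = mean_minimum_distance(validation_xy, training_xy)
```

    The brute-force search compares every validation point with every training point, which is adequate for a sketch but would be slow for very large samples.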

    The correspondence of observed with predicted probability of gully presence was assessed with a receiver operating characteristic (ROC) curve (Zou et al., 2007). Prior to modelling, all n locations had been identified as either Gully or Non-gully. On the other hand, the random forest returns a probability that a particular location belongs to Gully. At a particular probability threshold, all the predictions greater than the threshold were allocated to Gully and vice versa; tabulation of the results reveals the proportion of correctly identified Gully locations (a true-positive prediction), as well as the proportion of Non-gully locations wrongly allocated as Gully (a false-positive prediction). The proportions change as the threshold changes. A ROC curve summarizes the trade-off between true-positive and false-positive proportions. Ideally, the proportion of true positives should be 1.0 and that of false positives should be 0. Thus, when plotting true positives (ordinate) against false positives (abscissa), the ROC curve for a good model will align closely with the top-left corner; a ROC curve that lies on the 1:1 line implies that the model is no better than random chance. The area under the ROC curve, A, is a useful metric to quantify the predictive accuracy of a model of a binomial variable, where a value of 1.0 indicates perfect agreement, and a value of 0.5 indicates no agreement. We computed ROC curves with the ROCR library (Sing et al., 2005) written for the R statistical software (R Development Core Team, 2009).
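    The construction of a ROC curve and its area A can be sketched in a few lines of plain Python. This is not the ROCR code used in the study: the function names are hypothetical, the labels and scores are invented, predictions at or above each threshold are allocated to Gully, and the area is obtained with the trapezoidal rule.

```python
def roc_curve(labels, scores):
    """Return (false-positive rate, true-positive rate) pairs over all thresholds.

    labels: 1 for Gully, 0 for Non-gully; scores: predicted probability of Gully.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented example: three Gully and three Non-gully locations.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
A = area_under_curve(roc_curve(labels, scores))  # 8/9, about 0.89
```

    In the example, eight of the nine Gully/Non-gully pairs are ranked correctly by the scores, which is why the area equals 8/9.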

    Model extrapolation

    Following appraisal of the validation and cross-validation results, we used all the data to make a final model, Mf, to use for extrapolation across the grazing areas of the study site. If, during the process of extrapolation, a class of a categorical variable that was not included in Mf was found, we switched this class with one of those available in the model, selected at random.

    Results

    Delineation of gullies

    Gullies were visually apparent in the 0.6-m resolution Quickbird images associated with the lidar transects (Figure 2a,e). The rules used to classify image-objects as Gully (Table 2a) tended to over-allocate the class, shown as the red areas of Figure 2(b,f). The over-allocated areas included hills with variable slopes, some roads and infestations of Currant Bush (Carissa ovata R. Br.), a low-lying woody weed. The C. ovata heights were included in the original lidar classification of Ground because the sprawling, dense structure of the shrub could not be penetrated by the lidar signal. These artefacts were removed by using the rules in Table 2(b), which reduced greatly the gully-affected area (Figure 2c,g) such that it conformed to our perception of the features in the Quickbird images. For comparative purposes, the centre-lines of the gullies delineated by the independent expert are also shown (Figure 2d,h). There was good agreement between the contrasting methods. By her own admission the expert's line-work was conservative, because of obstruction by trees or, in some cases, cloud. The lidar-based method, which computes the area of gullies, could be useful for future studies of how gullies change through time. Overall, Figure 2 gave us confidence that lidar, coupled to the rules devised for object-oriented classification (Table 2), adequately characterized gully presence.

    Modelling

    Following the removal of null values from the explanatory variables, n = 21 312 pixels remained for modelling. The out-of-bag error of the random forest M0 was 7.5% for both t = 500 and t = 100. This implied that in more than 90% of cases the models allocated a location correctly to Gully or Non-gully. However, this is a misleading result that reflects the fact that about 90% of the pixels were Non-gully, which the models could predict with relatively good accuracy. The importance of each explanatory variable in M0 (for t = 500) is shown in Table 4, ranked in order from most important to the least important. The five most important variables in the model were the DSM, the BGI-related variables and the maximum slope and variance of the slope. This was consistent with the general notion that topography and vegetative cover determine the propensity of soil to erode. The soil order was the next most important variable, which supported the notion that some soil types are intrinsically more erodible than others. Rather than exchangeable sodium percentage or texture as expected, the most important individual soil attributes were the organic carbon of the topsoil and the subsoil. The least important variable was the map of the drainage network. The fact that the individual soil attributes were relatively unimportant suggests one of two possibilities: either soil attributes do not influence gully formation in the study area, or the soil information, as held by the database, was not suited to our particular task. We suspect the latter, because the soil attributes are interpolated surfaces intended for use in models that operate at scales much coarser than 25-m pixels.

    Figure 2 Gully extents mapped using lidar and object-oriented classification: (a) a true-colour Quickbird image (0.6-m resolution, panchromatic-sharpened), for part of one lidar transect, (b) the extent of gullies according to the allocation rules in Table 2(a), (c) the extent of gullies according to the reallocation rules in Table 2(b), and (d) the centre-line of gullies delineated by expert assessment of (a). Panels (e–h) illustrate the same, but for part of another lidar transect.
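    The caution about the out-of-bag error can be made concrete with a baseline calculation. The sketch below uses an invented sample with the roughly 9:1 class balance described in the text; the function name `majority_baseline_accuracy` is hypothetical.

```python
def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common class."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return max(counts.values()) / len(labels)


# Hypothetical sample: 90 Non-gully pixels and 10 Gully pixels.
labels = ["Non-gully"] * 90 + ["Gully"] * 10
baseline = majority_baseline_accuracy(labels)  # 0.9
```

    Under these assumed proportions, a rule that never predicts Gully would already be right 90% of the time, so an out-of-bag error of 7.5% (92.5% correct) is only a small improvement on that baseline. This is why a threshold-independent measure such as the area under the ROC curve is more informative here.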

    Table 4 The importance of each explanatory variable to gully presence, calculated by the random forest (t = 500)

    | Rank | Variable | Importance |
    |---|---|---|
    | 1 | dsm | 0.038 |
    | 2 | bgi_me | 0.029 |
    | 3 | slo_mx | 0.023 |
    | 4 | bgi_sd | 0.023 |
    | 5 | slo_vr | 0.015 |
    | 6 | ord | 0.012 |
    | 7 | slo_mn | 0.011 |
    | 8 | sal | 0.011 |
    | 9 | oc_b | 0.007 |
    | 10 | oc_a | 0.006 |
    | 11 | clay_a | 0.004 |
    | 12 | xna_a | 0.004 |
    | 13 | xna_b | 0.004 |
    | 14 | cec_a | 0.003 |
    | 15 | clay_b | 0.003 |
    | 16 | cec_b | 0.002 |
    | 17 | drn | 0.002 |

    The ROC curves associated with the fitted values of the random forest models and the validation and cross-validation predictions are shown in Figure 3. The ROC curves for the fitted values of random forest M0 showed that, at both t = 500 and t = 100, the models predicted gully presence accurately (Figure 3a); the ROC curves are relatively close to the top-left corner of the plot, and the areas under each ROC curve were identical at A = 0.81. Similar results were seen for the validation data (Figure 3b). It was clear that, for predictive purposes, a 100-tree forest would suffice. The ROC curve for the cross-validation predictions (t = 100) of the eight sites is shown in Figure 3(c). The predictive ability of the random forest varied markedly across the study area, with the data in sites 1, 3 and 7 being predicted less well than those in other sites. Site 1 was unique in that its gullies were widespread rather than the localized incisions seen in other sites. This suggests that different processes determine gully presence at different locations. Sites 3 and 7 were associated mainly with minority soil orders rather than the dominant Sodosols and Vertosols (Table 3). During cross-validation, these minority classes were systematically excluded from the random forests. In locations where soil orders in the validation data did not exist in the training data, a soil order from the training data was randomly substituted to enable predictions at these locations. This was carried out on the basis that a less accurate prediction is better than no prediction. As ord was a relatively important variable (Table 4) the predictions at these locations were effectively random. The average area under the ROC curve for cross-validation was A = 0.62, which suggested that the model had a relatively weak ability to predict gully presence accurately over a large area. We contend, however, that the modified cross-validation procedure is likely to have under-estimated predictive ability. As the sample for each step in the modified cross-validation was based on a spatial region rather than a random sample of all the data, it is possible that the excluded data contained useful information about gully presence not found elsewhere. This is particularly the case for sites 1, 3 and 7. For t = 100, it is likely that the true area under the curve is somewhere between A = 0.62 and A = 0.80.

    Figure 3 Receiver operating characteristic (ROC) curves (mean true-positive rate plotted against mean false-positive rate) for the random forest models of gully presence: (a) fitted values of model M0, for t = {500, 100} (A = 0.81 for both), (b) values predicted by M0 at the validation locations, for t = {500, 100} (A = 0.83 and 0.80, respectively), and (c) values predicted by models M1,...,8 at the cross-validation locations, for t = 100 (mean A = 0.62; the curves for sites 1, 3 and 7 are labelled). The area under the ROC curve is denoted A.

    The risk of gully presence predicted by Mf (t = 100) for the study area is shown in Figure 4. Only a minor proportion of the study area was affected by class-switching among the ord variable, certainly too small to make a visual impression on the map. In the insets of Figure 4, we have magnified selected areas to highlight their interesting features. Figure 4(b) highlights an area of dramatic topographic variation, known to have historically variable vegetative cover; the risk of gully presence in this area is relatively large. Figure 4(c) shows that the risk of a gully increases at the base of remnant volcanic plugs (the circular features), while the surrounding landscape has a relatively small risk of gully presence. The obvious discontinuities in the spatial pattern in Figure 4(d) are related to the boundaries of soil orders. These boundaries are particularly uncertain, and diminish the quality of the risk map at these locations.

    Discussion

    We have shown that lidar and object-oriented classification characterize gully presence (Figure 2) in a useful way. However, it is not practical or reasonable to acquire lidar information for the entire study area because of current costs. A viable alternative, however, is based on the premise that gully presence is determined by soil, topography and vegetation cover, which can be characterized through statistical modelling. We have not been able to consider important history-related variables that can trigger gully development, such as tree clearing or animal stocking rates, because such information was not readily available for the entire study area. Four studies (Meyer & Martínez-Casasnovas, 1999; Vrieling et al., 2007; Vanwalleghem et al., 2008; Gutierrez et al., 2009) have adopted a similar approach for modelling gully presence and reported results with varying accuracies. The only study that used the area under the ROC curve for accuracy assessment was Gutierrez et al. (2009). The performance of our model was worse than that reported in their study. Our model of gully presence for central Queensland had reasonable accuracy at locations near to training sites, but accuracy diminished as spatial distance from the training sites increased.

    There are three reasons for our modest result. First, the lidar information was concentrated in too few locations across the study site, with the x-configured transects effectively halving the amount of topographic information that might otherwise have been gained. Second, the soil-related explanatory variables were not suited to a mapping exercise at a scale as fine as 25-m pixels. Third, gullies may be caused by different processes at different locations. Lentz et al. (1993) found that, even in small study areas (

    Figure 4 (a) Risk map of gully presence. White represents locations either outside the study area or masked (because of water, tree cover or a non-grazing land-use). (b) Relatively large probabilities are found where there is a large variation in terrain and variable vegetation cover. (c) Volcanic plugs have relatively large probabilities around their bases. (d) Discontinuities in the surface signify a change in soil order.

    McBratney et al. (2003) proposed a framework for digital soil mapping, which we have tried to follow. They also foresaw potential problems, such as (i) missing, uninformative or circularly-derived explanatory variables, (ii) poor quality of soil information in databases, (iii) black box data-mining techniques and (iv) over-fitting of the model. Each of these problems has, to some degree, influenced our study: (i) and (ii) are the reality of digital soil mapping, where there is an innate urge to make as much use of existing data as possible; we encouraged the possibility of (iii) and (iv) by electing to use a random forest. Breiman (2001b) argued that the predictive ability and the parsimony of a model are mutually exclusive concepts: simple models are undoubtedly easier to interpret but are less accurate. In our case, we considered robust prediction of gully presence to be more important than inference about the underlying mechanism of the process. Random forest is known to be a robust predictor (Breiman, 2001b; Prasad et al., 2006; Moriondo et al., 2008). Breiman (2001a) showed that a random forest does not overfit the information in the sense that increasing infinitely the number of trees will not change the model error. This may be so, but we argue that the complexity of a random forest should always concern the user. We have shown that a relatively small forest can predict as well as a larger forest. When there are many millions of predictions to make, as there were for our study site, the smaller forest will complete the task more efficiently.
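    The scale of the prediction task can be illustrated with some simple arithmetic. The sketch below assumes that the whole 4.86 × 10^6-ha study area (the figure given in the Summary) is tiled with 25-m pixels, so it gives an upper bound because masked pixels are ignored. It also assumes that prediction cost grows in proportion to the number of trees. The forest sizes of 100 and 1000 trees are hypothetical and are not the values used in the study.

```python
STUDY_AREA_HA = 4_860_000   # 4.86 x 10^6 ha, as stated in the Summary
PIXEL_SIZE_M = 25           # finest resolution of the explanatory variables
M2_PER_HA = 10_000          # square metres in one hectare


def pixel_count(area_ha: int, pixel_size_m: int) -> int:
    """Number of square pixels needed to tile an area (masking ignored)."""
    return (area_ha * M2_PER_HA) // (pixel_size_m ** 2)


def tree_evaluations(n_pixels: int, n_trees: int) -> int:
    """Each prediction passes one pixel through every tree in the forest."""
    return n_pixels * n_trees


n_pixels = pixel_count(STUDY_AREA_HA, PIXEL_SIZE_M)

# Hypothetical forest sizes, chosen only to illustrate the scaling.
small_forest, large_forest = 100, 1000

print(n_pixels)  # 77760000 pixels before masking
ratio = tree_evaluations(n_pixels, large_forest) / tree_evaluations(n_pixels, small_forest)
print(ratio)     # 10.0: ten times as many tree evaluations for the larger forest
```

    Under these assumptions the map requires close to 78 million predictions, so any reduction in forest size that leaves accuracy unchanged translates directly into a proportional saving in computation.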

    While the accuracy of the gully risk map can only be regarded as being modest, we anticipate that it will be used by policy-makers to identify areas for gully prevention and remediation. Furthermore, the risk map could conceivably be used by hydrological modellers interested in calculating sediment budgets for particular subcatchments. Additional lidar acquisitions about the Fitzroy Basin will enable us to update the model and increase the spatial extent of the risk map.

    Conclusions

    We have created a risk map of gully presence for our study area within central Queensland, Australia. This has been achieved by (i) using fine-resolution lidar to quantify local topography at eight sites in the study area, (ii) carrying out object-oriented classification to derive gully extent from the lidar observations, (iii) developing a random forest to model the relationship between gully presence and soil, topography and vegetation status and (iv) extrapolating the model across the study area at the scale of 25-m pixels. The predictive ability of the model was modest. The risk map of gully presence showed that there is a large probability of gully presence in areas of large variation in topography coincident with relatively low long-term vegetation cover. This agrees with our expectation of where gullies should occur. The quality of the map is constrained by the small area of lidar information collected relative to the study area, the relatively coarse spatial resolution of the explanatory variables and the possibility that gully presence is the result of different processes at different locations.

    The accuracy of the risk map of gully presence would improve with further lidar acquisitions. A finer-resolution, nationwide, bare-earth digital elevation model and improved soil mapping over the area of interest would also enhance the risk map.

    Acknowledgements

    This study was funded by the Fitzroy Basin Association and the Queensland Department of Environment and Resource Management (DERM). We have greatly appreciated the support of Christian Witte, Neil Flood, Ken Brook and Cameron Dougall as the study progressed. We thank Dan Tindall and Tessa Chamberlain for the comments on a draft version, and Rebecca Trevithick, DERM's expert gully-delineator.

    References

    Armston, J.D., Denham, R.J., Danaher, T.J., Scarth, P.F. & Moffiet, T.N. 2009. Prediction and validation of foliage projective cover from Landsat-5 TM and Landsat-7 ETM+ imagery. Journal of Applied Remote Sensing, 3, 335340.

    Baatz, M. & Schäpe, A. 2000. Multiresolution segmentation: an optimization approach for high quality multi-scale image segmentation. In: Angewandte Geographische Informationsverarbeitung, Volume XII (eds J. Strobl, T. Blaschke & G. Griesebner), pp. 12–23. Wichmann-Verlag, Heidelberg.

    Benz, U.C., Hofmann, P., Willhauck, G., Lingenfelder, I. & Heynen, M. 2004. Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GIS-ready information. ISPRS Journal of Photogrammetry & Remote Sensing, 58, 239–258.

    Bou Kheir, R., Wilson, J. & Deng, Y. 2007. Use of terrain variables for mapping gully erosion susceptibility in Lebanon. Earth Surface Processes & Landforms, 32, 1770–1782.

    Breiman, L. 2001a. Random forests. Machine Learning, 45, 5–32.

    Breiman, L. 2001b. Statistical modelling: the two cultures. Statistical Science, 16, 199–231.

    Breiman, L., Friedman, J.H., Olshen, R.A. & Stone, C.J. 1984. Classification and Regression Trees. Wadsworth, Belmont, CA.

    Brennan, R. & Webster, T.L. 2006. Object-oriented land cover classification of lidar-derived surfaces. Canadian Journal of Remote Sensing, 32, 162–172.

    Brough, D.M., Claridge, J. & Grundy, M.J. 2006. Soil and Landscape Attributes: A Report on the Creation of a Soil and Landscape Information System for Queensland. Natural Resources, Mines & Water, Brisbane. QNRM06186.

    CSIRO Australia. 2006. ASRIS (Australian Soil Resource Information System) [WWW document]. URL http://www.asris.csiro.au/methods.html [accessed on 20 April 2010].

    Definiens 2006. Definiens Professional 5 Reference Book. Version 5.0.6.1. Definiens AG, München, Germany.

    DERM (Queensland Department of Environment & Resource Management) 2007. Delbessie Agreement [WWW document]. URL http://www.derm.qld.gov.au/land/state/rural_leasehold/pdf/agreement.pdf [accessed on 6 April 2010].

    DERM (Queensland Department of Environment & Resource Management) 2008. Land Cover Change in Queensland 2007–2008 [WWW document]. URL http://www.derm.qld.gov.au/slats/pdf/slats_report_and_regions_0708/slats_report07_08.pdf [accessed on 6 April 2010].

    Díaz-Uriarte, R. & Alvarez de Andrés, S. 2006. Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3.

    DigitalGlobe Incorporated 2010. DigitalGlobe Constellation: Quickbird Imaging Satellite [WWW document]. URL http://www.digitalglobe.com/digitalglobe2/file.php/784/QuickBird-DS-QB.pdf [accessed on 16 December 2010].

    Farr, T.G., Rosen, P.A., Caro, E., Crippen, R., Duren, R., Hensley, S. et al. 2007. The shuttle radar topography mission. Reviews of Geophysics, 45, RG2004.

    Faulkner, H., Alexander, R. & Wilson, B.R. 2003. Changes to the dispersive characteristics of soils along an evolutionary slope sequence in the Vera badlands, southeast Spain: implications for site stabilisation. Catena, 50, 243–254.

    Grimm, R., Behrens, T., Märker, M. & Elsenbeer, H. 2008. Soil organic carbon concentrations and stocks on Barro Colorado Island: digital mapping using random forests analysis. Geoderma, 146, 102–113.

    Gutiérrez, A.G., Schnabel, S. & Felicísimo, A.M. 2009. Modelling the occurrence of gullies in rangelands of southwest Spain. Earth Surface Processes & Landforms, 34, 1894–1902.

    Hubble, G. & Isbell, R.F. 1983. Eastern highlands. In: Soils: An Australian Viewpoint (eds Lenaghan, J. & Katsntoni, G.), pp. 219–230. CSIRO, Melbourne, Australia/Academic Press, London.

    Hughes, A.O., Prosser, I.P., Stevenson, J., Scott, A., Lu, H., Gallant, J. et al. 2001. Gully Erosion Mapping for the National Land and Water Resources Audit. Technical Report 26/01, CSIRO Land and Water, Canberra [WWW document]. URL http://www.clw.csiro.au/publications/technical2001/tr26-01.pdf [accessed on 6 April 2010].

    Hyde, K., Woods, S.W. & Donahue, J. 2006. Predicting gully rejuvenation after wildfire using remotely sensed burn severity data. Geomorphology, 86, 496–511.

    Isbell, R.F. 1996. The Australian Soil Classification. CSIRO Publishing, Melbourne.

    Kuhnert, P.M., Kinsey-Henderson, A., Bartley, R. & Herr, A. 2009. Incorporating uncertainty in gully erosion calculations using the random forests modelling approach. Environmetrics, 21, 493–509.

    Lal, R., Mokma, D. & Lowery, B. 1999. Relation between soil quality and erosion. In: Soil Quality and Soil Erosion (ed. R. Lal), pp. 237–258. Soil and Water Conservation Society/CRC Press, Boca Raton, FL.

    Lentz, R.D., Dowdy, R.H. & Rust, R.H. 1993. Soil property patterns and topographic parameters associated with ephemeral gully erosion. Journal of Soil & Water Conservation, 48, 354–360.

    Liaw, A. & Wiener, M. 2002. Classification and regression by randomForest. R News, 2, 18–22 [WWW document]. URL http://cran.r-project.org/doc/Rnews/Rnews_2002-3.pdf [accessed on 7 April 2010].

    Martínez-Casasnovas, J.A., Ramos, M.C. & Poesen, J. 2004. Assessment of sidewall erosion in large gullies using multi-temporal DEMs and logistic regression analysis. Geomorphology, 58, 305–321.

    McBratney, A.B., Mendonça Santos, M.L. & Minasny, B. 2003. On digital soil mapping. Geoderma, 117, 3–52.

    Meyer, A. & Martínez-Casasnovas, J.A. 1999. Prediction of existing gully erosion in vineyard parcels of NE Spain: a logistic modelling approach. Soil & Tillage Research, 50, 319–331.

    Moriondo, M., Stefanini, F.M. & Bindi, M. 2008. Reproduction of olive tree habitat suitability for global change impact assessment. Ecological Modelling, 218, 95–109.

    Petrie, G. & Toth, C.K. 2009. Airborne and spaceborne laser profilers and scanners. In: Topographic Laser Ranging and Scanning (eds J. Shan & C.K. Toth), pp. 29–85. CRC Press, Boca Raton, FL.

    Prasad, A.M., Iverson, L.R. & Liaw, A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems, 9, 181–199.

    Prosser, I.P., Rutherfurd, I.D., Olley, J.M., Young, W.J., Wallbrink, P.J. & Moran, C.J. 2001. Large-scale patterns of erosion and sediment transport in river networks, with examples from Australia. Marine & Freshwater Research, 52, 81–99.

    R Development Core Team 2009. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna [WWW document]. URL http://www.R-project.org [accessed on 7 April 2010]. (ISBN 3-900051-07-0).

    Reef Water Quality Protection Plan Secretariat 2009. Reef Water Quality Protection Plan [WWW document]. URL http://www.reefplan.qld.gov.au/library/pdf/reefplan2009.pdf [accessed on 6 April 2010].

    Rienks, S.M., Botha, G.A. & Hughes, J.C. 2000. Some physical and chemical properties of sediments exposed in a gully (donga) in northern KwaZulu-Natal, South Africa and their relationship to the erodibility of the colluvial layers. Catena, 39, 11–31.

    Ritchie, J.C. 1995. Airborne laser altitude measurements of landscape topography. Remote Sensing of Environment, 53, 91–96.

    Rowland, T., van den Berg, D., Denham, R., O'Donnell, T. & Witte, C. 2006. Land Use Change Mapping from 1999 to 2004 for the Fitzroy River Catchment. Queensland Department of Natural Resources & Water, Brisbane.

    Rustomji, P. 2006. Analysis of gully dimensions and sediment texture from southeast Australia for catchment sediment budgeting. Catena, 67, 119–127.

    Scarth, P., Byrne, M., Danaher, T., Henry, B., Hassett, R., Carter, J. et al. 2006. State of the paddock: monitoring condition and trend in groundcover across Queensland. In: Proceedings of the 13th Australasian Remote Sensing and Photogrammetry Conference: Earth observation: From Science to Solutions. 20–24 November 2006, Canberra.

    Sing, T., Sander, O., Beerenwinkel, N. & Lengauer, T. 2005. ROCR: visualizing classifier performance in R. Bioinformatics, 21, 3940–3941.

    Thoma, D.P., Gupta, S.C., Bauer, M.E. & Kirchoff, C.E. 2005. Airborne laser scanning for riverbank erosion assessment. Remote Sensing of Environment, 95, 493–501.

    Tickle, P., Wilson, N., Inskeep, C., Gallant, J., Dowling, T. & Read, A. 2009. Digital Surface Model (DSM) & Digital Elevation Model (DEM) (1 Second SRTM Derived): User Guide, Version 1.0. Geoscience Australia, Canberra.

    Vanwalleghem, T., Van Den Eeckhaut, M., Poesen, J., Govers, G. & Deckers, J. 2008. Spatial analysis of factors controlling the presence of closed depressions and gullies under forest: application of rare event logistic regression. Geomorphology, 95, 504–517.

    Vrieling, A., Rodrigues, S.C., Bartholomeus, H. & Sterk, G. 2007. Automatic identification of erosion gullies with ASTER imagery in the Brazilian Cerrados. International Journal of Remote Sensing, 28, 2723–2738.

    Zou, K.H., O'Malley, A.J. & Mauri, L. 2007. Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation, 115, 654–657.
