

Journal of Geochemical Exploration 125 (2013) 46–55

Contents lists available at SciVerse ScienceDirect

Journal of Geochemical Exploration

journal homepage: www.elsevier.com/locate/jgeoexp

Artificial neural network for acid sulfate soil mapping: Application to the Sirppujoki River catchment area, south-western Finland

Amélie Beucher a,⁎, Peter Österholm a, Annu Martinkauppi b, Peter Edén b, Sören Fröjdö a

a Åbo Akademi University, Geology and Mineralogy, Domkyrkotorget 1, 20500 Åbo, Finland
b Geological Survey of Finland, PO Box 97, 67101 Kokkola, Finland

⁎ Corresponding author. Tel.: +358 408682534. E-mail address: amelie.beucher@abo.fi (A. Beucher).

0375-6742/$ – see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.gexplo.2012.11.002

Article info

Article history: Received 11 June 2012; Accepted 4 November 2012; Available online 10 November 2012

Keywords: Acid sulfate soils; Artificial neural network; Radial basis functional link net; Probability map

Abstract

In Finland, acid sulfate (AS) soils constitute a major environmental issue. These soils leach considerable amounts of metals into watercourses, causing severe ecological damage. Because small hot spot areas affect large coastal waters, mapping is an essential step in managing the environmental risks of AS soils (i.e. in targeting strategic locations for mitigation). The primary aim of this study was to evaluate the predictive classification abilities of an Artificial Neural Network (ANN) for AS soil mapping. The Sirppujoki River catchment (460 km²), located in south-western Finland, was selected as the study area. An ANN method called Radial Basis Functional Link Nets (RBFLN) was applied to create probability maps for AS soil occurrences in the study area. This method required aerogeophysical, Quaternary geology and elevation data, as well as known AS soil and non-AS soil sites. Applying the RBFLN method, we generated several probability maps. For the most accurate probability map, the combined very high and high probability areas covered 23% of the study area and contained 94% of the validation points corresponding to AS soil occurrences. The combined low and very low probability areas occupied the remaining 77% of the study area and contained all the validation points corresponding to non-AS soil sites. These results are consistent with previous studies and were verified by expert assessment. The RBFLN method therefore demonstrated reliable and robust predictive classification abilities for AS soil mapping in the study area. This spatial modelling technique allows the creation of valid and comparable maps. It represents a powerful development in the AS soil mapping process, making it faster and more efficient. Consequently, we recommend RBFLN modelling, finalized by an expert assessment, for AS soil mapping.

© 2012 Elsevier B.V. All rights reserved.

1. Introduction

In Finland and Sweden, acid sulfate (AS) soils mainly originate from fine-grained sulfide-bearing sediments. These sediments were deposited under anoxic conditions at the bottom of the Baltic Sea during the Holocene, beginning in the Litorina Sea period. Since the recession of the Fennoscandian ice sheet, isostatic land uplift has raised these sediments up to 100 m above the current sea level, and the uplift still reaches up to 8 mm/yr today (Fig. 1; Donner, 1995). When the land is drained, sulfides in the upper 1–2 m are oxidised and sulfuric acid is released. Under the drastically lowered pH, metals contained in the soil are mobilised and transported into recipient watercourses, causing severe ecological damage. Finland has the largest AS soil deposits in Europe (more than 1000 km²; Edén et al., 2012a; Yli-Halla et al., 2012). These soils release considerably more metals into watercourses than the entire Finnish industry combined (Sundström et al., 2002). The toxic combination of metals and acidity coming from these soils is believed to affect more than one third of the Finnish coastal waters (data in Roos and Åström, 2005), with small hotspot areas notably affecting large water areas. Mapping of AS soil areas is therefore one of the most important steps in targeting strategic locations for mitigation. After large-scale fish kills in 2006 and the implementation of the European Union Water Framework Directive, the Geological Survey of Finland (GTK) initiated a cooperation network. Its purpose is to create a nationwide AS soil map and to mitigate the problems caused by these soils. The mapping program started in 2010 within the framework of the Climate Change Adaptation Tools for Environmental Risk Mitigation of Acid Sulfate Soils (CATERMASS) project, which is subsidized by Life+ (the EU's financial instrument for the environment).

In Australia, AS soil mapping has been carried out using different methods: traditional soil sampling, air photo interpretation and numerical modeling techniques. For instance, numerical classification approaches have been used to identify AS soils. Bierwirth and Brodie (2005) describe the use of airborne gamma-radiometric data within a stepwise classification model to create a GIS-based map of surface soil acidity. The model also drew on other data sets (Digital Elevation Model, satellite ASTER data and geological mapping). To generate a unified atlas of Australian AS soils, Fitzpatrick et al. (2008) developed a methodology that collates and assembles within a GIS all existing AS soil data together with various data sets (elevation, land system, marine habitat, tidal, estuarine, bathymetric, vegetation and remotely sensed data). In Finland, AS soil mapping has until now been conducted only by conventional soil sampling and subsequent soil-pH measurements (Palko, 1994). For our study, traditional methods were considered too laborious and time-consuming for mapping the potential area along the entire Finnish coast (Fig. 1). Spatial modelling techniques, on the other hand, can be applied to existing geostatistical data and provide a large set of tools in applied geosciences. They may overcome some limitations of conventional mapping, allowing the prediction of soil properties in areas with little or no information and indicating the uncertainty of the predictions.

Among the variety of spatial modelling techniques, Artificial Neural Networks (ANNs) are empirical, data-driven methods. They attempt to emulate features of biological neural networks (i.e. the human brain and nervous system) in order to address a range of difficult information processing, analysis and modeling problems (De Smith et al., 2009). They are effective pattern recognition and classification tools with the ability to generalize from imprecise input data (Porwal et al., 2003). Being non-linear, ANNs allow the predictive modelling of complex natural situations involving intricate chemical and physical processes which are not directly observable. They also deal with uncertainty related to input data. In this study, an ANN method called Radial Basis Functional Link Nets or RBFLN (Looney, 1997, 2002) was used to create probability maps for AS soil occurrences. To our knowledge, RBFLNs, and ANNs in general, have never been used for this specific purpose. Nevertheless, RBFLN has been used for several other applications:

- mineral-potential mapping (Behnia, 2007; Nykänen, 2008; Porwal et al., 2003);
- groundwater mapping (Corsini et al., 2009);
- landslide hazard mapping (Ermini et al., 2004);
- geological mapping (Barnett and Williams, 2009).

As described by Bonham-Carter (1994), an empirical or supervised method requires known examples of the features being modeled for training (in our case, AS soil occurrences). The RBFLN method also uses examples of areas not containing the target features for training (i.e. non-AS soil sites), as well as the available evidential data layers. In our case, these layers are Quaternary geology, multi-element low-altitude airborne geophysics and slope derived from a Digital Elevation Model.

Fig. 1. Location of the Sirppujoki River catchment area and extent of the former Litorina Sea (8000 BP).

Fig. 2. Sulfur and field pH in typical AS soil profiles in Finland corresponding to: (a) plough layer; (b) oxidized layer (pH < 4.5); (c) transition layer (4.5 ≤ pH ≤ 6.0); and (d) reduced layer (pH > 6.0).

The primary aim of this work is to evaluate the predictive classification abilities of the RBFLN method for mapping AS soils. The 460 km² Sirppujoki River catchment, located in south-western Finland, was selected as the study area on the basis of known AS soil occurrences, ease of access and sufficient data coverage.

2. Study area

The Sirppujoki River catchment is located in south-western Finland (Fig. 1). Upstream, the Proterozoic bedrock consists of rapakivi granites, while composite gneisses, quartz diorites and granodiorites occur downstream. The bedrock is frequently exposed. It is, however, mainly covered by a Quaternary bedrock-drift complex, with clay-filled depressions and some mire areas in the north-east (Perttunen et al., 1984). An esker complex, mainly comprising pebble-rich gravel and sand, trends northwest–southeast through the area. In areal extent, till is the dominant soil type (40.1%), followed by bedrock outcrops (28.5%). Fine-grained sediments are mainly Litorina clays (15.8%), which contain relatively large amounts of organic matter (Palko et al., 1985). Peat (6.6%), gyttja (6.3%), sand (2.6%) and silt (less than 0.1%) also occur. AS soil occurrences have previously been recorded in the Sirppujoki drainage area (Nyberg et al., 2011; Palko et al., 1985; Triipponen, 1997; Yli-Halla, 1997). Many of these records come from the Valkojärvi farmland area in the middle of the catchment. Cultivated land, often on clay soils, covers about 30% of the study area (National Board of Waters, 1977), and the remainder is mostly forest and swamp. The area contains only a few small lakes and ponds. In terms of acidity, the Sirppujoki River catchment is one of the most problematic in Finland. The river flows into the Uusikaupunki city freshwater basin, which is of major value for recreation and the fishing industry (Triipponen, 1997). The impact of AS soils is particularly critical in the poorly renewed basin waters, as well as in the lakes and ponds within the catchment.

3. Material

3.1. Soil profiles

Soil profiles were used as training points in the neural network modeling. Most of the profiles were collected during conventional soil sampling carried out in the summer of 2010. The remainder were taken from a previous study of the area (Triipponen, 1997). Using a portable auger, we sampled 49 profiles at vertical depth intervals of 20 cm down to 3 m. The sampling sites were chosen to be representative of the study area. Conventionally, an AS soil profile can be divided into four horizontal units according to soil characteristics and pH variations:

(a) a plough layer, mostly containing organic matter;
(b) an oxidized acidic layer with pH lower than 4.5, which may extend to depths below 1.5 m;
(c) a semi-oxic transition layer, where pH rapidly increases to 6.0;
(d) a reduced layer with pH higher than 6.0 (Fig. 2).

In Finland, such soils have typically been drained for several decades. A large portion of the sulfides has therefore been oxidized to sulfates and leached from the oxidized layers. The anoxic parent sediments, by contrast, generally still contain sulfides (including black monosulfides), with total sulfur contents typically between 0.2% and 1.0%.
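As a minimal sketch of the horizon scheme just described, the Python functions below label samples using the pH limits given for Fig. 2: oxidized below 4.5, transition from 4.5 to 6.0, and reduced above 6.0. In the field, the plough layer is recognised by its character rather than by pH. The fixed 0.2 m plough depth (`PLOUGH_DEPTH_M`) is therefore a hypothetical simplification, and the function names are illustrative.

```python
PLOUGH_DEPTH_M = 0.2  # hypothetical: samples at or above this depth count as plough layer


def horizon_from_ph(field_ph: float) -> str:
    """Label a sample below the plough layer from its field pH (limits of Fig. 2)."""
    if field_ph < 4.5:
        return "oxidized"    # (b) oxidized acidic layer
    if field_ph <= 6.0:
        return "transition"  # (c) semi-oxic transition layer, 4.5 <= pH <= 6.0
    return "reduced"         # (d) reduced layer, pH > 6.0


def label_profile(depths_m, field_phs, plough_depth_m=PLOUGH_DEPTH_M):
    """Return (depth, horizon) pairs for one auger profile."""
    return [
        (d, "plough" if d <= plough_depth_m else horizon_from_ph(ph))
        for d, ph in zip(depths_m, field_phs)
    ]


# Hypothetical profile: sampling depths (m) and their field pH values
print(label_profile([0.2, 0.4, 1.0, 2.0], [5.8, 4.0, 5.0, 6.8]))
```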

The field pH was measured for each sample within 24 h of sampling. The average depth of oxidation was 1.6 m, and the lowest field pH measured in the oxidized layer was 3.3. Every sample was also analyzed for sulfur by ICP-OES after digestion in aqua regia. In the reduced layers, total sulfur content ranged between 0.2% and 1.7%. Black monosulfides were, however, very rare in the area. For samples with a field pH higher than 4.0, pH was measured again after 8 weeks of incubation at room temperature. After incubation, pH most often fell below 3.0 in the sulfidic transition and reduced layers, with a minimum incubation pH of 2.2. Soil profiles were classified as follows (Edén et al., 2012b; Soil Survey Staff, 1999):

- actual AS soils: the profile comprised an oxidised acidic layer (pH lower than 4.5) with an underlying sulfidic horizon within 3 m depth;
- potential AS soils: the pH decreased by at least 0.5 units to a value of 4.0 or less after incubation.

Of the 49 soil profiles sampled in 2010, 36 were classified as AS soils: 24 actual and 12 potential. In addition, we used 13 actual AS soil sites previously studied by Triipponen (1997).
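The actual and potential AS soil criteria above can be expressed as a small decision rule. This sketch rests on three assumptions:

- The hypothetical `Sample` record holds depth, field pH and incubation pH.
- A sample counts as "sulfidic" only if it passes the incubation test, i.e. a pH drop of at least 0.5 units to 4.0 or below, following the Soil Survey Staff (1999) criterion. The study also had total sulfur data, which are ignored here.
- "Underlying" means deeper than the top of the acidic layer.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    depth_m: float                        # sampling depth below the surface (m)
    field_ph: float                       # pH measured within 24 h of sampling
    incubated_ph: Optional[float] = None  # pH after 8 weeks of incubation, if measured


def is_sulfidic(s: Sample) -> bool:
    """Incubation test: pH drops by at least 0.5 units to a value of 4.0 or less."""
    if s.incubated_ph is None:
        return False
    return s.field_ph - s.incubated_ph >= 0.5 and s.incubated_ph <= 4.0


def classify_profile(samples: Sequence[Sample], max_depth_m: float = 3.0) -> str:
    """Return 'actual', 'potential' or 'non-AS' for one soil profile."""
    within = [s for s in samples if s.depth_m <= max_depth_m]
    acidic = [s for s in within if s.field_ph < 4.5]
    if acidic:
        top_of_acidic = min(s.depth_m for s in acidic)
        # Actual AS soil: a sulfidic sample lies below the top of the acidic layer
        if any(is_sulfidic(s) and s.depth_m > top_of_acidic for s in within):
            return "actual"
    # Potential AS soil: sulfidic material within 3 m but no overlying acidic layer
    if any(is_sulfidic(s) for s in within):
        return "potential"
    return "non-AS"


profile = [Sample(0.2, 5.5), Sample(0.6, 4.1), Sample(1.0, 3.9),
           Sample(1.8, 5.2, 2.9), Sample(2.4, 6.5, 3.0)]
print(classify_profile(profile))
```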

3.2. Raster datasets

A 1:20 000-scale digital Quaternary geology map was provided by GTK. AS soils are generally found in very low-relief, low-lying areas such as plains, swamps and river valleys, unlike till formations and bedrock outcrops. To distinguish between these land cover types, a simple slope model was derived from a Digital Elevation Model (DEM; National Land Survey of Finland, MML). The DEM has an original cell size of 25×25 m and an elevation accuracy of 2 m.
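The study does not state which slope algorithm was applied to the DEM. As a simple stand-in, the sketch below derives slope angles from finite-difference gradients with NumPy. GIS slope tools typically use a 3×3 neighbourhood kernel and can give slightly different values. The function name is illustrative, and the 25 m default cell size is taken from the DEM resolution above.

```python
import numpy as np


def slope_degrees(dem: np.ndarray, cell_size_m: float = 25.0) -> np.ndarray:
    """Slope angle (degrees) of each DEM cell from finite-difference gradients.

    np.gradient uses central differences in the interior and one-sided
    differences at the edges.
    """
    dz_dy, dz_dx = np.gradient(dem.astype(float), cell_size_m)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


# Synthetic plane rising 1 m per 25 m cell eastwards: uniform slope of about 2.3 degrees
plane = np.tile(np.arange(5, dtype=float), (4, 1))
print(slope_degrees(plane).round(2))
```

Flat cells return 0°, so class 1 of the slope reclassification (roughly 0–1°) picks out plains and valley floors.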

GTK provided high-resolution, low-altitude airborne geophysical data (flight altitude 30–40 m, line spacing 200 m). The data comprise low and high electromagnetic frequencies (3 kHz and 14 kHz), each with real and imaginary components, which yields four components for interpreting ground electrical conductivity. The real component allows the detection of anomalies originating deep in the bedrock, for instance from black schists. The imaginary component mainly indicates shallow anomalies, mostly related to variations in topsoil thickness and/or electrical conductivity. Black schists are often associated with sulfide deposits and may cause high metal contents in soil or groundwater (Airo and Loukola-Ruskeeniemi, 2004). However, black schists are not known to occur in the study area. Nor do the aerogeophysical data indicate their presence in areas covered by topsoil (i.e. no thin, elongated, high electrical conductivity or magnetic anomalies were detected). Sulfide-bearing sediments yield strong electromagnetic anomalies due to their high contents of soluble salts (Suppala et al., 2005; Vanhala et al., 2004), but these anomalies are more diffuse and rounded. Since the sulfide-bearing sediments do not appear in the magnetic data, only the 3 kHz real and imaginary electromagnetic components were used for the RBFLN modelling.

4. Artificial neural networks

Artificial neural networks (ANNs) mimic biological nervous systems such as the human brain. They can be defined as simplified mathematical models which are trained to learn new associations, new functional dependencies and new patterns (Tsoukalas and Uhrig, 1997). Biological neurons are fundamental elements connected to each other. As nerve cells, they receive multiple signals through their synapses, combine and modify them, and then transmit the result to other neurons. In an ANN, the artificial neurons are identical and are usually organized in connected layers. Various neural network models have been defined over the course of ANN development. In this study, we use a neural network model called Radial Basis Functional Link Nets or RBFLN (Looney, 1997, 2002). This model was chosen because its training requires not only known examples of the features being modeled but also examples of areas that do not contain them. In our case, defining low probability areas is as important as defining high probability areas.

An RBFLN is a feed-forward network composed of three layers (Fig. 3; Looney, 1997; Looney and Yu, 2001):

(1) an input layer of N nodes, where each node receives one input signal (or variable) corresponding to a feature vector element;
(2) a hidden layer of M artificial neurons, each representing a radial basis function (RBF);
(3) an output layer of J artificial neurons.

A feature vector x is composed of N elements obtained by combining the N evidential data layers. It is fed to the input layer and transmitted to the hidden layer. There, each neuron compares the input x = (x1, x2, …, xN) with its own centre vector. The squared distance between them is fed into a Gaussian activation function (i.e. the RBF, a non-linear filter), which yields a single output y. The Gaussian function is one of the most common activation functions. It outputs a number between 0 and 1: close to 1 when x lies near the neuron's centre, and decreasing towards 0 as x moves away from it. The outputs y = (y1, y2, …, yM) are then transmitted to the output layer neurons, each value of y being multiplied by synaptic weights u_mj. In an RBFLN, the output layer neurons are also directly connected to the input nodes. They therefore additionally receive the inputs x, multiplied by a second set of synaptic weights w_nj. Finally, each output layer neuron returns a single output z.

Fig. 3. The general architecture of a radial basis functional link net (RBFLN; Looney, 2002).

In essence, the hidden neurons connect the input feature vectors to an output map made of target vectors t through a self-organizing structure. The probability value of each target vector is calculated non-linearly through successive training iterations. During these iterations, the synaptic weights u_mj and w_nj on the interneuron connections are repeatedly adjusted. The outputs z = (z1, z2, …, zJ) thereby approach the targets t = (t1, t2, …, tJ), until each feature vector is mapped correctly to its corresponding target vector. An RBFLN thus includes both a non-linear model (through the Gaussian activation function) and a linear model (the direct connections between input and output). According to Looney (2002), the advantage of the RBFLN is that fewer hidden neurons can be used.

5. Data pre-processing and integration

Data pre-processing and integration were performed using ArcGIS 10 (ESRI software) and an extension for multivariate analysis called Spatial Data Modeller (SDM) (Sawatzky et al., 2009). The neural network modeling also required a module called GeoXplorer (version 5.1), which is included in the SDM package. The 460 km² study area was modeled with 50 m×50 m cells, totalling roughly 184 000 cells.
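To make the RBFLN architecture of Section 4 concrete, the sketch below implements its forward pass: a Gaussian hidden layer plus the direct input-to-output links. It is not the GeoXplorer implementation. The original RBFLN adjusts its weights iteratively. Here, with the centres and width held fixed, the output is linear in the weights, so U and W are obtained in one least-squares solve. Several choices are made purely for illustration and are not taken from the study: one hidden neuron per training vector, a width of 2.0, and the four hypothetical feature vectors.

```python
import numpy as np


def hidden_outputs(X, centers, sigma):
    """Gaussian RBF layer: y_m = exp(-||x - c_m||^2 / (2 sigma^2)), one column per hidden neuron."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (P, M) squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))


def rbfln_forward(X, centers, sigma, U, W):
    """z = Y U + X W: hidden path (weights u_mj) plus direct input-to-output path (weights w_nj)."""
    return hidden_outputs(X, centers, sigma) @ U + X @ W


def fit_output_weights(X, T, centers, sigma):
    """With centres and width fixed, z is linear in U and W, so solve for them by least squares.

    This one-shot solve replaces the iterative weight updates used in RBFLN training.
    """
    H = np.hstack([hidden_outputs(X, centers, sigma), X])  # (P, M + N) design matrix
    coef, *_ = np.linalg.lstsq(H, T, rcond=None)
    m = centers.shape[0]
    return coef[:m], coef[m:]  # U: (M, J), W: (N, J)


# Hypothetical training feature vectors: (geology, slope, EM real, EM imaginary) classes
X = np.array([[1, 1, 12, 14], [7, 4, 2, 3], [1, 2, 10, 11], [6, 3, 4, 2]], dtype=float)
T = np.array([[1.0], [0.0], [1.0], [0.0]])  # targets: 1 = AS soil, 0 = non-AS soil
centers = X.copy()  # one hidden neuron per training vector, for illustration only
U, W = fit_output_weights(X, T, centers, sigma=2.0)
print(rbfln_forward(X, centers, 2.0, U, W).round(3))
```

Because the direct links add a linear term, the output is not confined to [0, 1]. This is consistent with the maximum probability values above 1 reported later for some trained networks.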


5.1. Pre-processing

5.1.1. Training points

The RBFLN method requires two different sets of training points: positive training points representing AS soil occurrences, and negative training points corresponding to non-AS soil sites.

- Positive training points: the 36 AS soil profiles identified in 2010, plus the 13 AS soil sites from Triipponen (1997).
- Negative training points (non-AS soil sites): the 13 remaining non-AS soil profiles from 2010, plus 36 sites extracted randomly from the lowest probability areas of maps previously generated with another technique, weights-of-evidence (WofE) analysis (Bonham-Carter, 1994).

This procedure is common because the RBFLN performs better with equal numbers of positive and negative training points (Nykänen, 2008; Porwal et al., 2003). From each of the two sets of 49 sites (positive and negative), 15 points (about 30%) were randomly selected as validation sites. The remaining 34 were used for training (Fig. 4).
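A minimal sketch of the random hold-out described above, using hypothetical site identifiers. In each class, 15 of the 49 sites are drawn for validation without replacement. This leaves 34 + 34 = 68 training points and 15 + 15 = 30 validation points. The fixed seeds only make the example reproducible; the study's actual random draw cannot be recovered from the text.

```python
import random


def split_sites(site_ids, n_validation=15, seed=None):
    """Randomly hold out n_validation sites; the rest are used for training."""
    rng = random.Random(seed)
    validation = rng.sample(list(site_ids), n_validation)
    held_out = set(validation)
    training = [s for s in site_ids if s not in held_out]
    return training, validation


# Hypothetical site identifiers: 49 positive (AS soil) and 49 negative (non-AS) sites
positives = [f"AS{i:02d}" for i in range(1, 50)]
negatives = [f"NON{i:02d}" for i in range(1, 50)]
pos_train, pos_val = split_sites(positives, seed=1)
neg_train, neg_val = split_sites(negatives, seed=2)
print(len(pos_train) + len(neg_train), len(pos_val) + len(neg_val))  # 68 training, 30 validation
```

Splitting each class separately keeps the balance between positive and negative points in both the training and validation sets.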

5.1.2. Raster datasets

The four evidential data layers were first pre-processed for the modelling. The same grid cell size (50×50 m) was used for all the raster datasets. Their cell values were reclassified using various methods (Table 1). These methods were selected using weights statistics calculated for every class of the evidential data layers: the area, the number of training points, the weights W+ and W−, and the contrast C. The weight W+ indicates the correlation between training points and a possible pattern within the data layer. W− corresponds to the correlation between training points and areas excluding the pattern. The contrast C is the difference between the two weights (C = W+ − W−) and measures the association between the training points and the evidential data layer (Bonham-Carter, 1994). The classification method with the highest contrast value therefore gives the best degree of correlation.
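The weights above follow the standard weights-of-evidence definitions (Bonham-Carter, 1994). The sketch below computes W+, W− and C for one binary pattern from cell and point counts; the counts in the example are hypothetical. To compare candidate classification methods as described, one would evaluate C for the classes produced by each method and keep the method with the highest contrast.

```python
import math


def wofe_weights(n_cells, n_pattern_cells, n_points, n_points_in_pattern):
    """Weights of evidence for one binary pattern B and training points D.

    W+ = ln[P(B|D) / P(B|not D)],  W- = ln[P(not B|D) / P(not B|not D)],  C = W+ - W-.
    Counts are in unit cells, and each training point is assumed to occupy one cell.
    A zero count makes a logarithm undefined (math.log then raises ValueError).
    """
    n_t, n_b, n_d, n_bd = n_cells, n_pattern_cells, n_points, n_points_in_pattern
    p_b_d = n_bd / n_d                                     # P(B | D)
    p_b_notd = (n_b - n_bd) / (n_t - n_d)                  # P(B | not D)
    p_notb_d = (n_d - n_bd) / n_d                          # P(not B | D)
    p_notb_notd = (n_t - n_b - n_d + n_bd) / (n_t - n_d)   # P(not B | not D)
    w_plus = math.log(p_b_d / p_b_notd)
    w_minus = math.log(p_notb_d / p_notb_notd)
    return w_plus, w_minus, w_plus - w_minus


# Hypothetical counts: 100-cell map, 20-cell pattern, 10 training points, 8 of them in the pattern
print(wofe_weights(100, 20, 10, 8))
```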

Fig. 4. Training and validation points in the Sirppujoki catchment area.

5.1.2.1. Quaternary geology. The Quaternary geology map originally comprised 15 categories. These were manually reclassified into 7 classes defined by expert knowledge (Table 2 and Fig. 5). Class 1 corresponds to fine-grained sediments, which are the most likely to constitute AS soils, and class 7 to soils which are the least likely.

5.1.2.2. Slope derived from Digital Elevation Model. In the slope model, angles ranged between 0 and 7°. They were manually reclassified into 4 classes (Fig. 5): class 1 corresponds to the interval [0–0.9], class 2 to [1–1.9], class 3 to [2–3.9] and class 4 to [4–7]. Class 1 covers 64% of the whole catchment area and corresponds to the flat areas where AS soils mainly occur.

5.1.2.3. Aerogeophysics. The 3 kHz real component was reclassified into 15 classes using the natural breaks method (Fig. 5), where class 1 corresponds to the lowest conductivity and class 15 to the highest. This method creates classes on the basis of clusters and gaps in the data, placing class limits where relatively large gaps occur. The low-frequency imaginary component was also reclassified into 15 classes, using the quantile method (Fig. 5). This method creates classes so that each class contains an equal number of items.

Table 1
Evidential data layers used in the neural network modeling of AS soils.

Dataset | Evidence | Original grid cell size (m) | Classification method | Number of classes
Aeroelectromagnetic geophysics: imaginary component | High-conductivity shallow areas | 50×50 | Quantile | 15
Aeroelectromagnetic geophysics: real component | High-conductivity deep areas | 50×50 | Natural breaks | 15
Slope from Digital Elevation Model | Low-relief areas | 50×50 | Manual (expert) | 4
Quaternary geology | Fine-grained sediment areas | 100×100 | Manual (expert) | 7

5.2. Integration

5.2.1. Feature vectors

Input feature vectors for the RBFLN modeling were created by combining the four input data layers into a single unique conditions grid. The layers are the reclassified rasters representing Quaternary geology, the 3 kHz real and imaginary components, and slope. Within the study area, each 50 m×50 m cell of the grid is thus represented by a feature vector x defined in four-dimensional space: x = (x1, x2, x3, x4). The generated grid comprised 1547 unique conditions (i.e. 1547 different input feature vectors). This grid was associated with the positive and negative training datasets (68 points in all). The feature vectors corresponding to a positive or a negative training point are called training feature vectors. The RBFLN method assumes that a feature vector can correspond to only one training point (Nykänen, 2008). A positive and a negative point may correspond to the same unique condition without sharing the same location. In that case, the positive point is picked. The final number of training feature vectors can therefore differ from the number of training sites. In our case, 50 training feature vectors were created from 68 training points.
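The reclassification of Section 5.1.2 and the unique-conditions overlay just described can be sketched with NumPy as follows. The geology lookup follows Table 2. The remaining elements rest on assumptions:

- The slope class limits are taken as 1, 2 and 4°, because the reported intervals leave small gaps (e.g. between 0.9 and 1°).
- The natural breaks (Jenks) method used for the 3 kHz real component is not implemented, so the example passes already-classified EM values.
- The function names and the six-cell grid are hypothetical.

```python
import numpy as np

# Table 2: original Quaternary geology category -> modeling class (1 = most likely AS soil)
GEOLOGY_CLASS = {1: 7, 2: 6, 3: 6, 4: 6, 5: 5, 6: 5, 7: 4, 8: 4,
                 9: 3, 10: 3, 11: 3, 12: 2, 13: 2, 14: 1, 15: 1}


def reclass_geology(codes):
    """Map original geology codes to modeling classes through a lookup array."""
    lut = np.zeros(max(GEOLOGY_CLASS) + 1, dtype=int)  # indexed by original code
    for original, modeling in GEOLOGY_CLASS.items():
        lut[original] = modeling
    return lut[np.asarray(codes)]


def reclass_slope(slope_deg):
    """Slope classes 1-4 with limits assumed at 1, 2 and 4 degrees."""
    return np.digitize(slope_deg, [1.0, 2.0, 4.0]) + 1


def reclass_quantile(values, n_classes=15):
    """Quantile classes 1..n_classes, each holding roughly the same number of cells."""
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_classes + 1)[1:-1])
    return np.digitize(values, edges) + 1


def unique_conditions(*layers):
    """Stack per-cell class values and give every cell the index of its unique condition."""
    vectors, condition_id = np.unique(np.column_stack(layers), axis=0, return_inverse=True)
    return vectors, np.ravel(condition_id)


def training_vectors(condition_id, pos_cells, neg_cells):
    """Map unique condition -> label (1 positive, 0 negative); positives override negatives."""
    labels = {int(condition_id[c]): 0 for c in neg_cells}
    labels.update({int(condition_id[c]): 1 for c in pos_cells})
    return labels


# Hypothetical 6-cell example; the EM layers are given as already-classified values
geology = reclass_geology([14, 14, 3, 1, 15, 3])
slope = reclass_slope(np.array([0.5, 0.5, 3.0, 6.0, 0.2, 3.0]))
em_real = np.array([12, 12, 3, 1, 10, 3])
em_imag = np.array([14, 14, 2, 1, 11, 2])
vectors, condition_id = unique_conditions(geology, slope, em_real, em_imag)
labels = training_vectors(condition_id, pos_cells=[0, 4], neg_cells=[1, 3])
print(len(vectors), labels)  # 4 unique conditions; 4 training points -> 3 training vectors
```

In the example, cells 0 and 1 share a unique condition, so the negative point in cell 1 is absorbed by the positive point in cell 0. This is how 68 training points can collapse to fewer training feature vectors.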

5.2.2. RBFLN training and predictive classification

During the training phase, the neural network learns to classify input feature vectors using the training feature vectors as examples. During the classification phase, the network classifies all the input feature vectors into class values (i.e. the RBFLN probability values), ideally ranging between 0 and 1. The closer the value is to 1, the higher the probability that the feature vector belongs to the positive category (i.e. AS soil occurrences). Conversely, the closer it is to 0, the higher the probability that it belongs to the negative category (i.e. non-AS soil sites).

The training process requires defining the number of hidden neurons (RBFs) and the number of iterations. A common way to select the optimal number of RBFs is to conduct several training runs with different numbers of hidden neurons (e.g. from 10 to 70). The number associated with the lowest sum of squared errors (SSE) is then selected. For the successive training runs (and their subsequent classification phases), several parameters were recorded: SSE, mean squared error (MSE) and the range of probability values (i.e. the minimum and maximum class values). At the lowest SSE, the range of probability values should also be close to the optimal range [0–1].
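The selection rule for the number of RBFs amounts to computing the SSE of each training run and keeping the run with the lowest value. The sketch below is illustrative only: the outputs and targets are hypothetical, and SSE is taken as the plain sum of squared differences (some implementations include a factor of 1/2).

```python
import numpy as np


def sse(z, t):
    """Sum of squared errors between network outputs z and targets t."""
    return float(np.sum((np.asarray(z, dtype=float) - np.asarray(t, dtype=float)) ** 2))


def mse(z, t):
    """Mean squared error: SSE divided by the number of target values."""
    return sse(z, t) / np.asarray(t).size


def best_rbf_count(runs):
    """runs maps a number of RBFs to (outputs, targets); return the count with the lowest SSE."""
    return min(runs, key=lambda m: sse(*runs[m]))


# Hypothetical outputs of three training runs on the same four training feature vectors
t = np.array([1, 1, 0, 0])
runs = {
    10: (np.array([0.6, 0.7, 0.4, 0.3]), t),
    50: (np.array([0.9, 0.95, 0.1, 0.05]), t),
    70: (np.array([0.8, 0.9, 0.2, 0.1]), t),
}
print(best_rbf_count(runs))  # 50
```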

Table 2
Soil type reclassification for the digital Quaternary geology map.

Original class | Soil type | Modeling class
1 | Bedrock | 7
2 | Gravelled till | 6
3 | Till | 6
4 | Fine-material till | 6
5 | Soil filling | 5
6 | Sand | 5
7 | Water | 4
8 | Very fine sand | 4
9 | Peat production area | 3
10 | Sphagnum peat | 3
11 | Sedge peat | 3
12 | Fine silt | 2
13 | Very fine silt | 2
14 | Clay | 1
15 | Gyttja | 1

Correspondingly, the optimal number of iterations is sought through a series of runs with an increasing number of iterations (e.g. from 40 to 320) until the SSE stabilizes. However, the SSE should not reach zero. A zero SSE would indicate that the number of iterations is too high, resulting in over-training of the neural network. Over-training (or over-learning) means that the network classifies the training feature vectors almost perfectly but cannot generalize. It therefore fails to classify unknown feature vectors equally well (Porwal et al., 2003).

In this study, the lowest SSE values were reached with 50 RBFs (Table 3). However, the SSE did not stabilize even after 1000 iterations. Networks using the optimal number of RBFs with any high number of iterations produced maximum probability values far above 1 (up to 2.4). A different approach was therefore adopted: neural networks producing maximum probability values larger than 1.2 were excluded. Table 4 shows the training and classification parameters of the 6 remaining RBFLNs, which used 40 to 60 RBFs and 80 to 120 iterations.
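The iteration search and the fallback filter described in the two preceding paragraphs can be sketched as follows. Several elements are hypothetical choices: the relative tolerance used to decide that the SSE has "stabilized", the near-zero threshold used to flag over-training, and the model records. Only the 1.2 limit on the maximum probability comes from the text.

```python
def sse_stabilized(sse_by_iterations, rel_tol=0.01, zero_tol=1e-6):
    """Return the first iteration count whose SSE differs from the previous run by <= rel_tol.

    sse_by_iterations maps iteration counts (e.g. 40, 80, 160, 320) to the final SSE.
    Returns None if the SSE never stabilizes; raises ValueError if the SSE collapses
    to about zero, which signals over-training.
    """
    counts = sorted(sse_by_iterations)
    for prev, cur in zip(counts, counts[1:]):
        s_prev, s_cur = sse_by_iterations[prev], sse_by_iterations[cur]
        if s_cur < zero_tol:
            raise ValueError(f"SSE is ~0 after {cur} iterations: likely over-training")
        if abs(s_prev - s_cur) <= rel_tol * s_prev:
            return cur
    return None


def keep_plausible(models, max_prob_limit=1.2):
    """Drop networks whose maximum output probability exceeds the limit."""
    return [m for m in models if m["max_prob"] <= max_prob_limit]


# Hypothetical SSE histories and model summaries
print(sse_stabilized({40: 3.1, 80: 2.2, 160: 1.6, 320: 1.59}))  # 320
print(sse_stabilized({40: 3.0, 80: 2.0, 160: 1.0}))              # None
models = [{"name": "A", "max_prob": 2.4}, {"name": "B", "max_prob": 1.1},
          {"name": "C", "max_prob": 1.2}]
print([m["name"] for m in keep_plausible(models)])              # ['B', 'C']
```

Returning None rather than the last iteration count makes the "never stabilized" case explicit. That is the situation in which the text falls back on the maximum-probability filter.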

5.2.3. Probability maps and validation

Once the RBFLN training and classification were carried out, the probability values of the newly classified feature vectors were associated back to the unique conditions grid. To obtain a common scale for the 6 different models, the RBFLN probability values were rescaled between 0 and 1. They were also reclassified with equal intervals into four classes ([0–0.25], [0.25–0.5], [0.5–0.75] and [0.75–1]) to create probability maps for AS soil occurrence.
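The rescaling and equal-interval classification can be sketched as follows. This assumes min-max rescaling and assigns the class boundaries 0.25, 0.5 and 0.75 to the upper class, a convention the bracket notation in the text leaves open; the end points of the example are RBN3's classification minimum and maximum (Table 4), while the two middle values are hypothetical:

```python
CLASS_NAMES = ["Very low", "Low", "High", "Very high"]


def rescale(values):
    """Min-max rescale raw RBFLN outputs to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        raise ValueError("cannot rescale a constant set of values")
    return [(v - lo) / span for v in values]


def probability_class(p):
    """Equal-interval class of a rescaled value; 0.25, 0.5 and 0.75 go to the upper class."""
    return CLASS_NAMES[min(int(p / 0.25), 3)]  # p == 1.0 stays in "Very high"


raw = [0.0007, 0.3, 0.6, 1.0063]
print([probability_class(p) for p in rescale(raw)])
# ['Very low', 'Low', 'High', 'Very high']
```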

The RBFLN probability values corresponding to each randomly selected positive or negative validation point (30 points in all) were then checked. Table 5 shows the results for the different RBFLN models. As they all correctly classified the negative validation points in the low and very low probability areas (classes [0.25–0.5] and [0–0.25], respectively), we focused on the positive validation points to determine which model achieved the most accurate classification. First, we took into account the percentages of positive validation points correctly classified into the high or very high probability areas (classes [0.5–0.75] and [0.75–1], respectively; Table 6). As two models, RBN1 and RBN3, reached the same score (93%), we then considered their respective percentages of positive validation points classified in the very high probability area. With 47% of the points in this class, RBN3 achieved the best classification performance. For this model, the very high and high probability zones together cover 23% of the study area, while the very low and low probability areas occupy the remaining 77% (Table 5). Fig. 6 shows the probability map created from RBN3 (the RBFLN model with 50 RBF trained with 80 iterations; Table 6).
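The two-step selection rule (highest share of positive points in the high and very high classes, ties broken by the share in the very high class) is a lexicographic comparison. A minimal Python sketch using the percentages of Table 6; the dictionary layout and the name `select_model` are illustrative:

```python
# (% of positive validation points in high + very high, % in very high), from Table 6.
# All six models classified 100% of the negative points correctly, so that column is omitted.
POSITIVE_SCORES = {
    "RBN1": (93, 33),
    "RBN2": (60, 7),
    "RBN3": (93, 47),
    "RBN4": (67, 20),
    "RBN5": (87, 40),
    "RBN6": (87, 53),
}


def select_model(scores):
    """Pick the highest primary score; ties are broken by the very-high-class share."""
    return max(scores, key=lambda name: scores[name])  # tuples compare element by element


print(select_model(POSITIVE_SCORES))  # RBN3
```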

RBN3 was further validated by plotting the validation points on the curve of RBFLN probability value versus cumulative percentage of the study area (Fig. 7). Fig. 7 clearly displays the distribution of the validation points: all the negative validation points (squares) are located in the low and very low probability areas, 14 positive validation points (dots) are in the high or very high probability areas and only 1 positive validation point is in the low probability area.
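Fig. 7 is plotted from the full probability surface, but a coarser, class-level version of the same curve can be computed directly from the RBN3 rows of Table 5. A Python sketch under that simplification (the helper `cumulative_curve` is illustrative only):

```python
# RBN3 rows of Table 5, from the highest to the lowest probability class:
# (class, area in km2, positive validation points, negative validation points)
RBN3_ZONES = [
    ("Very high", 41.4, 7, 0),
    ("High", 64.4, 7, 0),
    ("Low", 151.8, 1, 5),
    ("Very low", 202.4, 0, 10),
]


def cumulative_curve(zones):
    """Cumulative % of study area and % of positive/negative points, high to low."""
    total_area = sum(z[1] for z in zones)
    total_pos = sum(z[2] for z in zones)
    total_neg = sum(z[3] for z in zones)
    area = pos = neg = 0.0
    rows = []
    for name, a, p, n in zones:
        area, pos, neg = area + a, pos + p, neg + n
        rows.append((name,
                     round(100 * area / total_area),
                     round(100 * pos / total_pos),
                     round(100 * neg / total_neg)))
    return rows


for row in cumulative_curve(RBN3_ZONES):
    print(row)
```

Reading down the rows, the top two classes cover 23% of the area but capture 93% of the positive validation points and none of the negative ones.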

6. Discussion

In this study, the predictive classification abilities of the RBFLN method were tested for mapping AS soils. Using this method, we generated different probability maps. For the most accurate probability map (created from RBN3; Fig. 6), the combined very high and high probability areas cover 23% of the study area and contain 93% (14 of 15) of the validation points corresponding to AS soil occurrences (Table 5). The combined low and very low probability areas cover the remaining 77% of the study area and contain all the validation points corresponding to non-AS soil sites (Table 5). Only one positive validation point is located in the low probability area, which is acceptable, as a low probability does not completely exclude the possibility of an AS soil occurrence. In this application, the results show that unknown sites are properly classified by an RBFLN.

Fig. 5. Reclassified evidential data layers (Quaternary geology, multi-element low altitude airborne geophysics and slope derived from a Digital Elevation Model).

52 A. Beucher et al. / Journal of Geochemical Exploration 125 (2013) 46–55

Table 3 Sum of squared errors (SSE) for a series of training tests with different numbers of hidden neurons (RBF) and iterations.

Number of RBF   Number of iterations
                40      80      120     160     200     240     280     320     360
20              5.706   4.773   5.109   4.547   4.308   4.099   4.007   3.948   3.884
30              5.872   4.777   4.169   4.054   3.924   3.768   3.62    3.516   3.394
40              5.652   4.296   3.435   3.269   3.126   2.99    2.88    2.777   2.698
50              5.29    3.097   1.885   1.528   1.387   1.255   1.143   1.068   1.004
60              5.972   4.49    3.511   3.284   3.244   3.233   3.224   3.214   3.206
70              6.271   4.81    3.644   3.291   3.229   3.214   3.207   3.195   3.186

Table 4 Training and classification parameters for six RBFLN models with different combinations of hidden neuron and iteration numbers.

                             Training                               Classification
Model  RBF  Iterations   Min class  Max class  MSE     SSE      Min class  Max class  MSE     SSE
RBN1   40   80           0.0006     1.0379     0.086   4.296    0.0013     0.9208     0.1166  180.313
RBN2   40   120          0.0008     1.1794     0.0687  3.435    0.0018     1.0396     0.1047  162.0326
RBN3   50   80           0.0047     1.005      0.062   3.097    0.0007     1.0063     0.1218  188.4639
RBN4   50   120          0.0015     1.1431     0.0377  1.885    0.0007     1.1782     0.1188  183.8469
RBN5   60   80           0.003      1.0094     0.0899  4.496    0.0006     0.9045     0.1207  186.7339
RBN6   60   120          0.0019     1.05       0.0702  3.511    0.0008     1.06       0.1134  175.4325

In order to calculate the total extent of actual AS soils in the study area, we consider the proportion of positive training points corresponding to actual AS soils, on the one hand, in the high and very high probability areas and, on the other hand, in the low and very low probability areas. To reduce bias due to a potential over-representation of fine-grained sediment areas, we then consider: in the high and very high probability areas, the proportion of positive training points located in fine-grained sediment areas (half of the points, in 98.3% of 23% of the area; Table 7) and the proportion located in the remaining areas (half of the points, in 1.7% of 23% of the area; Table 7); in the low and very low probability areas, the proportion of positive training points located in fine-grained sediment areas (one third of the points, in 1.1% of 77% of the study area; Table 7) and the proportion located in the remaining areas (no points, in 98.9% of 77% of the area; Table 7). Thus, the total extent of actual AS soils can be roughly estimated, under ideal conditions, as 0.5 × 0.226 + 0.5 × 0.004 + 0.33 × 0.008 + 0 × 0.762. The training points extracted from Triipponen (1997) were excluded from this calculation to avoid any other bias. According to this estimate, the actual AS soil extent is 12%. According to previous studies (Nyberg et al., 2011; Nystrand et al., 2012), the recipient streams contain very high amounts of metals and sulfate, which is clearly related to a correspondingly high proportion of AS soils. Nevertheless, metal and sulfate levels are not extreme in comparison with hot spot areas in Midwestern Finland (e.g. Vörå and Solf rivers' water data in Roos and Åström, 2005). While we do not yet have exact figures on the extent of AS soils in the catchments of the latter rivers, the current estimate for the Sirppujoki River catchment may be too high and should thus be considered an upper limit. In comparison, the extent of potential AS soils (calculated with the same equation, using the positive training points corresponding to potential AS soils) is 5.7%, giving an actual/potential AS soil ratio of about 2. Potential AS soils occur in smaller quantities but must be handled cautiously, as they retain their complete oxidation reservoir. Nonetheless, although actual AS soils are oxidized, they still retain a certain oxidation and acidification potential (Nordmyr et al., 2006), and if drainage is enhanced or the climate changes, their underlying sulfidic horizons can be further oxidized. The modelling results are, however, consistent with the studies previously carried out in the Sirppujoki River catchment. The occurrence of AS soils was recorded by soil studies (Palko et al., 1985; Triipponen, 1997), and corresponding spatial chemical patterns were found in water studies (Nyberg et al., 2011; Nystrand et al., 2012). Considering all of the above, the RBFLN method demonstrates good predictive classification abilities for mapping AS soils. From the land use perspective, a probability map created with the RBFLN method can be used to target strategic places for mitigation, for example within future agricultural subsidy and funding programs for the management of AS soil environmental risks. We recommend that complementary soil sampling be carried out in high probability areas to verify and delimit the extent of AS soils before any drainage work is implemented. Drainage in AS soil areas should always be done carefully to minimize the oxidation of sulfides, and well-planned, well-managed controlled drainage may have beneficial effects.
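The extent estimate above is a weighted sum over the four combinations of probability zone and sediment type. A Python sketch reproducing the arithmetic with the shares quoted in the text (using 1/3 rather than the rounded 0.33); the names `TERMS` and `weighted_extent` are illustrative:

```python
# Each term: (proportion of training points corresponding to actual AS soils, as quoted
# in the text; share of the probability zone in that sediment category; share of the
# study area covered by the probability zone).
TERMS = [
    (1 / 2, 0.983, 0.23),  # high + very high, fine-grained sediment areas
    (1 / 2, 0.017, 0.23),  # high + very high, remaining areas
    (1 / 3, 0.011, 0.77),  # low + very low, fine-grained sediment areas
    (0.0, 0.989, 0.77),    # low + very low, remaining areas
]


def weighted_extent(terms):
    """Sum of point proportion x sediment share x zone share over all terms."""
    return sum(point_share * sediment_share * zone_share
               for point_share, sediment_share, zone_share in terms)


print(f"{weighted_extent(TERMS):.1%}")  # 11.8%
```

The rounded products used in the text (0.226, 0.004, 0.008 and 0.762, with 0.33) give 0.1176, so both versions round to the reported 12%.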

A very important step in implementing neural network modelling is the selection of training points. As previously described, the proportions of positive and negative points should be equal (Porwal et al., 2003). In our case, the number of training and validation points is adequate and the points cover the study area well. The critical issue of over-learning was avoided, as our RBFLN model classified the validation points accurately. In the future, the training points could be given weights representing one or more attributes. For instance, the points could be weighted according to whether they correspond to actual AS soils (field pH lower than 4.5) or potential AS soils (field pH higher than 4.0, but incubation pH lower). One could also focus on the initial depth at which sulfides appear at each training point. This might result in a probability map offering a better representation for land use, as this depth information is crucial for planning mitigation (in case of field drainage, road or building construction). Comparing applications of RBFLN using training points with and without weights would be interesting, as it would show what such additional attribute information (e.g. AS soil type or sulfide depth) contributes to the resulting probability maps.

Table 5 Validation of RBFLN models.

Model  Probability zone      Area (km²)  % of study area  Validation points
                                                          Positive  Negative
RBN1   Very high [0.75–1]    46          10               10        0
       High [0.5–0.75]       73.6        16               9         0
       Low [0.25–0.5]        147.2       32               1         6
       Very low [0–0.25]     193.2       42               0         9
RBN2   Very high [0.75–1]    23          5                1         0
       High [0.5–0.75]       59.8        13               8         0
       Low [0.25–0.5]        138         30               5         5
       Very low [0–0.25]     239.2       52               1         10
RBN3   Very high [0.75–1]    41.4        9                7         0
       High [0.5–0.75]       64.4        14               7         0
       Low [0.25–0.5]        151.8       33               1         5
       Very low [0–0.25]     202.4       44               0         10
RBN4   Very high [0.75–1]    23          5                3         0
       High [0.5–0.75]       55.2        12               7         0
       Low [0.25–0.5]        142.6       31               4         3
       Very low [0–0.25]     239.2       52               1         12
RBN5   Very high [0.75–1]    41.4        9                6         0
       High [0.5–0.75]       69          15               7         0
       Low [0.25–0.5]        142.6       31               2         5
       Very low [0–0.25]     207         45               0         10
RBN6   Very high [0.75–1]    55.2        12               8         0
       High [0.5–0.75]       59.8        13               5         0
       Low [0.25–0.5]        128.8       28               2         4
       Very low [0–0.25]     216.2       47               0         11

While applying this RBFLN method, one limitation was noticed: the number of evidential data layers. In mineral prospectivity mapping, for instance, neural network applications generally benefit from a larger number of evidential data layers. In AS soil mapping, the number of available raster datasets can appear limited, even though each data layer contributes to the modelling, as shown previously. In the future, refinement will be sought by using more raster data layers within the neural network applications. We see possibilities in the use of geochemical data such as water quality, and of spatial data on vegetation or land use.

As the RBFLN method requires known AS soil and non-AS soil sites, some preliminary conventional mapping tasks (i.e. collating new data from soil sampling, as well as existing data from the literature) are needed to gather training and validation points. This pre-modelling data-gathering stage is, however, considerably shorter than a complete conventional mapping survey of the whole study area. For this study, 49 soil profiles were sampled in the field over 10 days and 13 soil profiles were extracted from the literature (Triipponen, 1997). Conventional mapping alone would have required more soil surveying, as well as preliminary aerial photo and map interpretation, all of which is time- and resource-consuming. The RBFLN method therefore constitutes a significant advance, making the AS soil mapping process considerably faster and more efficient. Moreover, as a data-driven spatial modelling technique, RBFLN is an objective approach, whereas conventional mapping depends heavily on the person carrying it out. Nevertheless, a final expert evaluation of the modelling results is necessary to avoid potential flaws related, for instance, to software bugs or data handling. Consequently, we recommend RBFLN modelling, finalized by an expert assessment, to create reliable and comparable AS soil maps over large areas.

Table 6 Validation performance of the RBFLN models.

Model  Number of RBF  Number of iterations  SSE (training)  % of correctly classified validation vectors
                                                            Positive, in [0.5–1] (in [0.75–1])   Negative
RBN1   40             80                    4.296           93 (33)                              100
RBN2   40             120                   3.435           60 (7)                               100
RBN3   50             80                    3.097           93 (47)                              100
RBN4   50             120                   1.885           67 (20)                              100
RBN5   60             80                    4.496           87 (40)                              100
RBN6   60             120                   3.511           87 (53)                              100

Fig. 6. Probability map created from RBN3.


7. Conclusion

The application of an artificial neural network (ANN) method called Radial Basis Functional Link Nets (RBFLN) constitutes an important development in acid sulfate (AS) soil mapping. The RBFLN method proved suitable, as it uses both known AS soil and non-AS soil sites, allowing a good definition of both high and low probability areas, as well as the available evidential data layers (in our case, Quaternary geology, multi-element low altitude airborne geophysics and slope derived from a Digital Elevation Model). For the most accurate probability map created with RBFLN, the very high and high probability areas occupy 23% of the study area and contain 93% of the known AS soil occurrences used as validation points. Moreover, the low and very low probability areas covering the remaining 77% of the study area contain all the known non-AS soil sites. These results, which are in line with the studies previously carried out in the study area, indicate the good predictive classification abilities of an RBFLN for mapping AS soils. The modelling constitutes a powerful, objective approach, which is completed by a final expert knowledge evaluation, making the whole AS soil mapping process faster and more efficient. Therefore, we recommend the application of an RBFLN, finalized by an expert assessment, to create reliable and comparable AS soil maps over large areas.

Fig. 7. Variation of RBFLN probability values with cumulative percentage of study area for RBN3: positive validation points are white dots (n = 15) and negative ones black squares (n = 15).

Acknowledgements

The authors gratefully acknowledge the financial support from the VALUE Graduate School and Maa- ja Vesitekniikan Tuki ry. The authors would also like to thank Richard Siemssen and Anton Grindgärds for their assistance with field work and sample preparation.

Table 7 Distribution of the training points originating from conventional soil sampling across the RBN3 probability areas (points extracted from Triipponen (1997) excluded).

Number of points corresponding to:                              Very low   Low          High         Very high
                                                                [0–0.25]   [0.25–0.5]   [0.5–0.75]   [0.75–1]
Positive  Actual AS soils     In fine-grained sediment areas    0          3            4            10
                              In remaining areas                0          0            1            0
          Potential AS soils  In fine-grained sediment areas    0          0            2            5
                              In remaining areas                0          0            0            0
Negative  Non-AS soils        In fine-grained sediment areas    0          3            6            1
                              In remaining areas                21         2            1            0


References

Airo, M.-L., Loukola-Ruskeeniemi, K., 2004. Characterization of sulfide deposits by airborne magnetic and gamma-ray responses in eastern Finland. Ore Geology Reviews 24, 67–84.

Barnett, C.T., Williams, P.M., 2009. Using geochemistry and neural networks to map geology under glacial cover. Geoscience BC, Report 2009-003 (26 pp.).

Behnia, P., 2007. Application of radial basis functional link networks to exploration for Proterozoic mineral deposits in Central Iran. Natural Resources Research 16 (2), 147–155.

Bierwirth, P.N., Brodie, R.S., 2005. Identifying acid sulfate soil hotspots from airborne gamma-radiometric data and GIS analysis. Australian Bureau of Rural Sciences. http://www.daff.gov.au/brs/publications.

Bonham-Carter, G.F., 1994. Geographic Information Systems for Geoscientists: Modelling with GIS. Computer Methods in the Geosciences 13. Pergamon, Oxford (398 pp.).

Corsini, A., Cervi, F., Ronchetti, F., 2009. Weight of evidence and artificial neural networks for potential groundwater spring mapping: an application to the Mt. Modino area (Northern Apennines, Italy). Geomorphology 111 (1–2), 79–87.

De Smith, M.J., Goodchild, M.F., Longley, P.A., 2009. Geospatial analysis: a comprehensive guide to principles, techniques and software tools, third edition. www.spatialanalysisonline.com.

Donner, J., 1995. The Quaternary History of Scandinavia. World and Regional Geology 7. Cambridge University Press, Cambridge, United Kingdom (199 pp.).

Edén, P., Auri, J., Rankonen, E., Martinkauppi, A., Österholm, P., Beucher, A., Yli-Halla, M., 2012a. Mapping acid sulfate soils in Finland: methods and results. 7th IASSC abstract, Vaasa, Finland.

Edén, P., Rankonen, E., Auri, J., Yli-Halla, M., Österholm, P., Beucher, A., Rosendahl, R., 2012b. Definition and classification of Finnish Acid Sulfate Soils. 7th IASSC abstract, Vaasa, Finland.

Ermini, L., Catani, F., Casagli, N., 2004. Artificial Neural Networks applied to landslide susceptibility assessment. Geomorphology 66 (1–4), 327–343.

Fitzpatrick, R., Powell, B., Marvanek, S., 2008. Atlas of Australian Acid Sulfate Soils. In: Fitzpatrick, R., Shand, P. (Eds.), Inland Acid Sulfate Soil Systems Across Australia. CRC LEME Open File Report No. 249 (Thematic Volume). CRC LEME, Perth, Australia, pp. 75–89.

Looney, C.G., 1997. Pattern Recognition Using Neural Networks: Theory and Algorithms for Engineers and Scientists. Oxford University Press, New York (458 pp.).

Looney, C.G., 2002. Radial basis functional link nets and fuzzy reasoning. Neurocomputing 48, 489–509.

Looney, C.G., Yu, H., 2001. Special software development for neural network and fuzzy clustering analysis in geological information system. http://ntserv.gis.nrcan.gc.ca/sdm/.

National Board of Waters, 1977. Lounais-Suomen vesienkäytön kokonaissuunnitelma (in Finnish). Vesihallitus (National Board of Waters), Finland, Report 126.

Nordmyr, L., Boman, A., Åström, M., Österholm, P., 2006. Estimation of leakage of chemical elements from boreal acid sulphate soils. Boreal Environment Research 11, 261–273.

Nyberg, M.E., Österholm, P., Nystrand, M., 2011. Impact of acid sulfate soils on the geochemistry of rivers in south-western Finland. Environmental Earth Sciences. http://dx.doi.org/10.1007/s12665-011-1216-4.

Nykänen, V., 2008. Radial basis functional link nets used as a prospectivity mapping tool for orogenic gold deposits within the Central Lapland Greenstone Belt, Northern Fennoscandian Shield. Natural Resources Research 17 (1), 29–48.

Nystrand, M.I., Österholm, P., Nyberg, M.E., Gustafsson, J.P., 2012. Metal speciation in rivers affected by enhanced soil erosion and acidity. Applied Geochemistry. http://dx.doi.org/10.1016/j.apgeochem.2012.01.009.

Palko, J., 1994. Acid sulphate soils and their agricultural and environmental problems in Finland. Acta University Oulu, C 75. PhD thesis, University of Oulu.

Palko, J., Räsänen, M., Alasaarela, E., 1985. Happamien sulfaattimaiden esiintyminen ja vaikutus veden laatuun Sirppujoen vesistöalueella (in Finnish). Vesihallitus (National Board of Waters), Finland, Report 260.

Perttunen, M., Lappalainen, E., Taka, M., Herola, E., 1984. Vehmaan, Mynämäen, Uudenkaupungin ja Yläneen kartta-alueiden maaperä. Summary: Quaternary deposits in the Vehmaa, Mynämäki, Uusikaupunki and Yläne map-sheet areas, 1:100 000 (51 pp.).

Porwal, A., Carranza, E.J.M., Hale, M., 2003. Artificial neural networks for mineral-potential mapping: a case study from Aravalli Province, Western India. Natural Resources Research 12 (3), 155–171.

Roos, M., Åström, M., 2005. Hydrochemistry of rivers in an acid sulphate soil hotspot area in western Finland. Agricultural and Food Science 14, 24–33.

Sawatzky, D.L., Raines, G.L., Bonham-Carter, G.F., Looney, C.G., 2009. Spatial Data Modeller (SDM): ArcMAP 9.3 geoprocessing tools for spatial data modelling using weights of evidence, logistic regression, fuzzy logic and neural networks. http://arcscripts.esri.com/details.asp?dbid=15341.

Soil Survey Staff, 1999. Soil Taxonomy, A basic system of soil classification for making and interpreting soil surveys, 2nd ed. Agriculture Handbook 436.

Sundström, R., Åström, M., Österholm, P., 2002. Comparison of the metal content in acid sulphate soil runoff and industrial effluents in Finland. Environmental Science and Technology 36, 4269–4272.

Suppala, I., Lintinen, P., Vanhala, H., 2005. Geophysical characterising of sulphide rich fine-grained sediments in Seinäjoki area, western Finland. Geological Survey of Finland Special Paper 38, 61–71.

Triipponen, J.-P., 1997. Sirppujoen valuma-alueen happamuustutkimus. Lounais-Suomen ympäristökeskus. (43 pp.).

Tsoukalas, L.H., Uhrig, R.E. (Eds.), 1997. Fuzzy and Neural Approaches in Engineering. John Wiley & Sons, Inc., New York (587 pp.).

Vanhala, H., Suppala, I., Lintinen, P., 2004. Integrated Geophysical Study of Acid Sulphate Soil Area Near Seinäjoki, Southern Finland. Sharing the Earth: EAGE 66th Conference & Exhibition, Paris, France, 7–10 June 2004: extended abstracts. EAGE, Houten (4 pp., optical disc (CD-ROM)).

Yli-Halla, M., 1997. Classification of acid sulphate soils of Finland according to Soil Taxonomy and the FAO/UNESCO legend. Agricultural and Food Science 6, 247–258.

Yli-Halla, M., Räty, M., Puustinen, M., 2012. Varying depth of sulfidic materials: challenge to sustainable management. 7th IASSC abstract, Vaasa, Finland.