24
Machine Learning With GA Optimization to Model the Agricultural Soil-Landscape of Germany: An Approach Involving Soil Functional Types With Their Multivariate Parameter Distributions Along the Depth Prole Mareike Ließ 1 *, Anika Gebauer 1 and Axel Don 2 1 Department of Soil System Science, Helmholtz Centre for Environmental Research UFZ, Halle (Saale), Germany, 2 Thünen Institute of Climate-Smart Agriculture, Braunschweig, Germany Societal demands on soil functionality in agricultural soil-landscapes are confronted with yield losses and environmental impact. Soil functional information at national scale is required to address these challenges. On behalf of the well-known theory that soils and their site-specic characteristics are the product of the interaction of the soil-forming factors, pedometricians seek to model the soil-landscape relationship using machine learning. Following the rationale that similarity in soils is reected by similarity in landscape characteristics, we dened soil functional types (SFTs) which were projected into space by machine learning. Each SFT is described by a multivariate soil parameter distribution along its depth prole. SFTs were derived by employing multivariate similarity analysis on the dataset of the Agricultural Soil Inventory. Soil proles were compared on behalf of differing sets of soil properties considering the top 100 and 200 cm, respectively. Various depth weighting coefcients were tested to attribute topsoil properties higher importance. Support vector machine (SVM) models were then trained employing optimization with a distributed multiple-population hybrid Genetic algorithm for parameter tuning. Model training, tuning, and evaluation were implemented in a nested k-fold cross-validation approach to avoid overtting. With regards to the SFTs, organic soils were differentiated from mineral soils of various particle size distributions being partly inuenced by waterlogging and groundwater. Further SFTs reect soils with a depth limitation within the top 100 cm and high stone content. Altogether, with SVM predictive model accuracies between 0.7 and 0.9, the agricultural soil-landscape of Germany was represented with eight SFTs. Soil functionality with regards to the soils capacity to store plant-available water and soil organic carbon is well characterized. Four additional soil functions are described to a certain extent. An extension of the approach to fully cover soil functions such as nutrient cycling, agricultural biomass production, ltering of contaminants, and soil as a habitat for soil biota is possible with the inclusion of additional soil properties. Edited by: Asim Biswas, University of Guelph, Canada Reviewed by: Chongchong Qi, Central South University, China Songchao Chen, Institut National de recherche pour lagriculture, lalimentation et lenvironnement (INRAE), France *Correspondence: Mareike Ließ [email protected] Specialty section: This article was submitted to Environmental Informatics and Remote Sensing, a section of the journal Frontiers in Environmental Science Received: 09 April 2021 Accepted: 31 May 2021 Published: 25 June 2021 Citation: Ließ M, Gebauer A and Don A (2021) Machine Learning With GA Optimization to Model the Agricultural Soil-Landscape of Germany: An Approach Involving Soil Functional Types With Their Multivariate Parameter Distributions Along the Depth Prole. Front. Environ. Sci. 9:692959. doi: 10.3389/fenvs.2021.692959 Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 692959 1 ORIGINAL RESEARCH published: 25 June 2021 doi: 10.3389/fenvs.2021.692959

Machine learning with GA optimization to model the

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Machine learning with GA optimization to model the

Machine Learning With GAOptimization to Model the AgriculturalSoil-Landscape of Germany: AnApproach Involving Soil FunctionalTypes With Their MultivariateParameter Distributions Along theDepth ProfileMareike Ließ1*, Anika Gebauer1 and Axel Don2

1Department of Soil System Science, Helmholtz Centre for Environmental Research – UFZ, Halle (Saale), Germany, 2ThünenInstitute of Climate-Smart Agriculture, Braunschweig, Germany

Societal demands on soil functionality in agricultural soil-landscapes are confronted withyield losses and environmental impact. Soil functional information at national scale isrequired to address these challenges. On behalf of the well-known theory that soils andtheir site-specific characteristics are the product of the interaction of the soil-formingfactors, pedometricians seek to model the soil-landscape relationship using machinelearning. Following the rationale that similarity in soils is reflected by similarity in landscapecharacteristics, we defined soil functional types (SFTs) which were projected into space bymachine learning. Each SFT is described by a multivariate soil parameter distribution alongits depth profile. SFTs were derived by employing multivariate similarity analysis on thedataset of the Agricultural Soil Inventory. Soil profiles were compared on behalf of differingsets of soil properties considering the top 100 and 200 cm, respectively. Various depthweighting coefficients were tested to attribute topsoil properties higher importance.Support vector machine (SVM) models were then trained employing optimization witha distributed multiple-population hybrid Genetic algorithm for parameter tuning. Modeltraining, tuning, and evaluation were implemented in a nested k-fold cross-validationapproach to avoid overfitting. With regards to the SFTs, organic soils were differentiatedfrom mineral soils of various particle size distributions being partly influenced bywaterlogging and groundwater. Further SFTs reflect soils with a depth limitation withinthe top 100 cm and high stone content. Altogether, with SVM predictive model accuraciesbetween 0.7 and 0.9, the agricultural soil-landscape of Germany was represented witheight SFTs. Soil functionality with regards to the soil’s capacity to store plant-availablewater and soil organic carbon is well characterized. Four additional soil functions aredescribed to a certain extent. An extension of the approach to fully cover soil functionssuch as nutrient cycling, agricultural biomass production, filtering of contaminants, and soilas a habitat for soil biota is possible with the inclusion of additional soil properties.

Edited by:Asim Biswas,

University of Guelph, Canada

Reviewed by:Chongchong Qi,

Central South University, ChinaSongchao Chen,

Institut National de recherche pourl’agriculture, l’alimentation et

l’environnement (INRAE), France

*Correspondence:Mareike Ließ

[email protected]

Specialty section:This article was submitted to

Environmental Informatics andRemote Sensing,

a section of the journalFrontiers in Environmental Science

Received: 09 April 2021Accepted: 31 May 2021Published: 25 June 2021

Citation:Ließ M, Gebauer A and Don A (2021)

Machine Learning With GAOptimization to Model the Agricultural

Soil-Landscape of Germany: AnApproach Involving Soil Functional

Types With Their MultivariateParameter Distributions Along the

Depth Profile.Front. Environ. Sci. 9:692959.

doi: 10.3389/fenvs.2021.692959

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 6929591

ORIGINAL RESEARCHpublished: 25 June 2021

doi: 10.3389/fenvs.2021.692959

Page 2: Machine learning with GA optimization to model the

Altogether, the developed data product represents the 3D multivariate soil parameterspace. Its agglomerated simplicity into a limited number of spatially allocated process unitsprovides the basis to run agricultural process models at national scale (Germany).

Keywords: pedometrics, soil functional types, soil parameter space, machine learning, optimization

1 INTRODUCTION

Soils are at the center of the agricultural ecosystem. On the onehand, their capacity to cycle and store nutrients and to provideplant-available water determines agricultural production with theultimate goal to feed mankind. On the other hand, the interplaybetween their storage and filter capacity determines how much ofthe applied fertilizer percolates to the groundwater andpotentially contaminates our drinking water. In today’sagricultural landscapes in Central Europe and other parts ofthe world, we face multiple challenges for soil functionalityimpacting ecosystem services. In the last decades, droughtevents that empty the soils’ water stores causing yield lossesare becoming more frequent in Europe (van Hateren et al., 2020;Markonis et al., 2021). Inadequate agricultural management maylead to losses in soil organic carbon to the atmosphere, whereasthe opposite can enhance carbon sequestration and, thereby, helpto mitigate climate change (Liu et al., 2006; Wiesmeier et al.,2019). Excessive groundwater nitrate pollution with values of50–380 mg L−1 is found in areas with intensive agriculture inGermany (Sundermann et al., 2020). More than 25% of therespective measurement sites report average values above thethreshold of 50 mg L−1 (Jakobs et al., 2020). Overall, economicand environmental risk is site-specific and depends on soilcharacteristics (Bönecke et al., 2020; Webber et al., 2020).Hence, the estimation of environmental impact andvulnerability of the farmer’s income as well as thedevelopment of adequate agricultural management strategies,policies, and farmers’ subsidies require site-specific soilinformation at national scale.

In pedometrics, continuous, site-specific soil information isgenerated by pedometric modeling approaches. Pedometrics is aninterdisciplinary science integrating soil science, appliedmathematics, statistics, and geoinformatics. The object ofinvestigation is the spatial-temporal soil variability at multiplescales. Empirical modeling approaches are used along withmultiple aspects of soil sensing and geodata analysis. Pleasecompare Minasny et al. (2013), Rossiter (2018), and Scullet al. (2003) for a review. Any modeling approach relies on aconceptual model on how traits and objects presuppose oneanother. Pedometric modeling to understand spatial soildistribution at the landscape scale follows the conceptualmodel of pedogenesis (Jenny, 1941), with soils and their site-specific characteristics being the product of the interaction of thesoil-forming factors through long periods of time. The functionalapproach was extended by McBratney et al. (2003) to includegeographic location and proxies to soil itself. The so-calledSCORPAN factors include S (proxies to soil), C (climate), O(organisms including land use, agricultural management etc.), R(relief), P (parent material), A (age), and N (geographic location).

Empirical modeling approaches heavily rely on the availabledata and how well these data capture or approximate the object ofinterest, its causes and drivers, or any functional relation betweenthem. Limitations in data availability in pedometric modelingconcern 1) the pedosphere and its characteristics, and 2) the soil-forming factors. In Germany and many other countries, access tosoil profile data is still limited or cumbersome. There is some lightat the horizon with the soil profile database of the AgriculturalSoil Inventory, which was recently published open access(Poeplau et al., 2020). The same applies to the LUCASEuropean topsoil database (Tóth et al., 2013). Still, access tothe large amount of soil profile data that was collected by theregional and national soil survey institutions requires tediousnegotiation with multiple parties. On the contrary, nationwidespatially continuous geodata to approximate the soil-formingfactors, are freely available from multiple sources. These includedata products derived from remote sensing, products obtained byinterpolating local point measurements, andmap products. Thereare of course restrictions. The landscape’s geomorphology,climate, vegetation and land use have changed during the longperiod of pedogenesis. Whereas the available data to approximatethe soil-forming factors only cover the last decades.

Machine learning algorithms are good at deriving knowledgefrom highly complex data. They are, therefore, often applied inpedometric modeling to extract the functional soil-landscaperelation and to project soil information into space. Thecomplexity of the task ranges from single variable values atgeographic point locations that are projected into thecontinuous two-dimensional univariate soil parameter spaceup to multivariate auto-correlated transect data (soil profiles)that need to be projected into the continuous three-dimensionalmultivariate space. Recent applications addressing individualtopsoil properties are presented by e.g. Møller et al. (2020), orZeraatpisheh et al. (2020). Approaches to model the three-dimensional soil parameter space can be summarized asfollows: The ‘2.5D approach’ builds individual models forsingle soil properties at selected soil depths and combines thespatial predictions (e.g. Taghizadeh-Mehrjardi et al., 2020; Maet al., 2021). The ‘depth function approach’ fits a continuousmathematical function through the available horizon data andthen projects the function’s parameters into space (e.g. Bishopet al., 1999; Veronesi et al., 2012). “3D regression kriging”modelsthe spatial trend and spatial autocorrelation (e.g. Poggio andGimona, 2014; Poggio and Gimona, 2017). Furthermore,convolutional neural networks are becoming increasinglypopular for multi-target machine learning advancing the 2.5Dapproach and the depth function approach (e.g. Behrens et al.,2018a; Padarian et al., 2019). A somewhat different path to modelthe multivariate 3D soil parameter space is the spatial predictionof soil systematic units (SUs) and their associated soil

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 6929592

Ließ et al. Multivariate 3D Soil Parameter Space

Page 3: Machine learning with GA optimization to model the

characteristics. It has the benefit that soil profile information isnot disassembled. Horizontation and property characteristics inthe predictions resemble true pedons. Recent studies followingthis approach are by Esfandiarpour-Boroujeni et al. (2020) andSharififar et al. (2019). However, in many soil classificationsystems, important soil properties guiding soil functionalityare only distinguished at a low systematic level and rathersimilar soils concerning their properties and functionality areassigned to different upper-level SUs. The problem concerns thedifferentiation betweenmineral and organic soils, soil particle sizedistribution, the occurrence of stagnic properties, groundwaterinfluence and many more. Accordingly, this approach wouldlargely benefit from the definition of soil functional types (SFTs).

To bring about their full potential, machine learningalgorithms usually require tuning, i.e. searching for the bestcombination of the algorithm’s parameters. “Best” means thatthe model algorithm is adapted to provide the prediction with thelowest error on independent data. The approach generallyfollowed in pedometric modeling, is testing a set of predefinedparameter combinations (e.g. Emadi et al., 2020; Zhang et al.,2020). While this works fine for algorithms with discreteparameters (e.g. Random Forest) allowing for an exhaustivesearch, it most likely will not find the optimal solution in caseof continuous parameters with an infinite number of possiblevalues (e.g. boosted regression trees or support vector machines).Consequently, the flexibility of most machine learning algorithmscan only be exploited if they are combined with optimizationalgorithms for parameter tuning. A promising group ofoptimization algorithms are the so-called genetic orevolutionary algorithms, developed by Holland (1975). Theysimulate biological processes to optimize highly complexobjective functions. The algorithms optimize parameters withextremely complex cost surfaces, provide a list of optimalsolutions, and are well suited for parallel computing (Hauptand Haupt, 1998). Pedometric applications using optimizationfor parameter tuning are scarce (e.g. Gebauer et al., 2019;Wadouxet al., 2019; Gebauer et al., 2020). Further applications related tosoil science include e.g. Ardakani and Kordnaeij (2017), Mazaheriand Jafarian (2019), and Nguyen et al. (2020).

The currently available spatially continuous soil informationfor Germany consists of a conventional digital polygon mapproduct (BÜK) at map scales 1:1,000,000 and 1:250,000 (BGR,2013; BGR, 2018). Unfortunately, these valuable map productswith high information content have limitations when it comes totheir usage for soil parametrization in spatially explicitagricultural process models. This is not surprising as they wereneither intended for this purpose nor to provide site-specificinformation. Their spatial map units (SMUs) each define aparagenesis of SUs with highly differing properties. The spatialallocation of these SUs within the SMUs is unknown. FurtherBÜK derived map products of topsoil, and pedon agglomeratedvalues of soil properties and functions are commonly generatedby assigning the properties of the SMU’s dominating soil type tothe whole SMU. Further SUs with sometimes high areal coveragesthat amount to more than 50% of the SMU are often neglected.Site-specific data products covering entire Germany weredeveloped by pedometric modeling approaches at European

and global scale. Nonetheless, the European products onlycover the top 20 soil centimetres (de Brogniez et al., 2015;Ballabio et al., 2016). The global map products (SoilGrids,Hengl et al., 2017) are unreliable for German subsoils due tothe previously discussed problem in soil profile data access (Tifafiet al., 2018).

Soils are complex systems characterised by their horizontationwith the respective physical and chemical properties. It is thesesoil characteristics that determine major soil functions (SFs): Thespecification of the properties shape the soil as a habitat foradapted communities of soil organisms (SF1), who subsequentlycontrol nutrient cycling (SF2). SF2 and the soil’s capacity to storeplant-available water (SF3) enable agricultural biomassproduction (SF4). Furthermore, in the context ofenvironmental protection, soils act as a filter for contaminants(SF5) and contribute to the mitigation of climate change due totheir carbon storage capacity (SF6). The nationwide site-specificevaluation of the state and potential of our soils with regards tothese functions (e.g. Terribile et al., 2011; Greiner et al., 2018;Vogel et al., 2019) requires site-specific information targeting soilfunctionality. The same applies to the modeling of soil functionaldynamics due to agricultural management (Vogel et al., 2018), theevaluation and modeling of climate change on agricultural yields(e.g. White et al., 2011; Webber et al., 2020), and theenvironmental impact of agricultural production on ecosystemservices such as clean drinking water (e.g. Knoll et al., 2020;Sundermann et al., 2020). We ultimately seek to generate therequired nationwide representation of the respective 3D soilparameter space of the agricultural soil-landscape (Germany).The approach we follow comprises two parts: First, we distinguishSFTs, each being represented by the corresponding multivariatedistribution of soil properties along its depth profile. Then wederive the SFTs’ functional relation with the soil-forming factorsby machine learning, to understand their embedding in theparticular landscape context. Finally, the trained machinelearning models are used to project the SFTs to thecontinuous space.

2 MATERIAL AND METHODS

2.1 Landscape SettingGermany covers an area of 357.6 km2 of which 51% are currentlyunder agricultural land use (Statistisches Bundesamt, 2020).Figure 1A provides an overview of the spatial pattern onbehalf of the CORINE Land Cover Inventory (Büttner et al.,2017). Germany comprises four morphologic regions from northto south: The North German Lowland, the Central GermanUplands, the Alpine Foreland, and the Alps. Figure 1B showsthe altitudinal specification, Figure 1E provides details oftopographical landforms. Most of the North German Lowlandhas an altitude below 100 m above sea level (a.s.l.). While there arewetlands, peatlands and marshy terrain along the North SeaCoast, the North-Eastern part of Germany shows glacialinfluence with many lakes, and moraines. The mountains ofthe central upland region are moderate in height, rarely above1,000 m a.s.l. They were influenced by various phases of upheaval

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 6929593

Ließ et al. Multivariate 3D Soil Parameter Space

Page 4: Machine learning with GA optimization to model the

and subsidence, developed basin structures with sedimentarydeposits, and comprise alluvial glacial loess deposits inbetween them. Besides, many of the mountain ranges displaysigns of ancient volcanism altogether leading to a complexgeological pattern. Ranging between c 400 and 750 m a.s.l., theAlpine Foothills region was shaped under glacial influence and

displays a high variety of geomorphological landforms, being amolasses basin with sedimentary deposits from Alpine erosion,morainic hills and aprons. The German part of the Alps belongsto the Northern Calcareous Alps. U-shaped valleys remind of iceage influences. Figure 1F shows the soil parent material at mapscale 1:5,000,000. Please compare (Asch et al., 2003; Küster and

FIGURE 1 | Landscape setting, maps of selected parameters. (A) CORINE Land Cover Inventory 2018 (Büttner et al., 2017), (B) EU Digital Elevation Model (©European Union, Copernicus Land Monitoring Service 2017, European Environment Agency (EEA)), (C) Average air temperature of the summer months (DWD, 2018a),(D) Sum of precipitation of the summer months (DWD, 2018b), (E) Geomorphographic units (BGR, 2007) [Legend], (F) Parent material (BGR, 2008b) [Legend], and (G)Soil scapes of Germany (BGR, 2008a) [Legend].

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 6929594

Ließ et al. Multivariate 3D Soil Parameter Space

Page 5: Machine learning with GA optimization to model the

Stöckhert, 2003; Liedtke and Mäusbacher, 2003) for furtherdetails.

German climate is guided by a decrease in a maritime and anincrease in a continental climatic character from west to eastexpressed by an eastward higher yearly temperature range.Likewise, in the north German lowlands, there is a cleardecrease in the amount of precipitation to the east (Figures1C,D). This continental effect is less pronounced in centraland southern Germany since the precipitation-differentiatingeffect of the central mountain ranges superimposes the west-east decrease and ensures a more varied, local precipitationregime. The mountain climate of the low mountain rangesand the German Alps stands out due to the mean verticaldecrease in air temperature of about 0.6–0.7 K (spring andsummer) and 0.4–0.5 K (autumn and winter) per 100 mincrease in altitude. Mean annual precipitation varies spatiallybetween c. 400 and 3,200 mm (period 1961–1990). Seasonally,precipitation is lower in the hydrological winter half-year than inthe summer half-year. Concerning air temperature, there is apredominant temperature gradient from south to north and fromeast to west. The nationwide regional mean value of the airtemperature is 8.2°C (1961–1990). The recorded extremes are amaximum of 42.6°C and a minimum of −37.8°C. Please refer toAlexander (2003), Endlicher and Hendl (2003), Klein and Menz(2003) for further details.

Soil distribution in Germany is primarily determined byparent material and topography. Figure 1G displays the soilscapes of Germany (BGR, 2008a). The map distinguishes soils ofthe coast region and peatlands, soils of river valleys, soils ofslightly hilly landscapes, soils of the loess region, soils of theCentral German Uplands, and soils of the alpine region. Forfurther details please refer to Adler et al. (2003) and BGR (2018).

2.2 Data2.2.1 Soil Profile DatabaseSoil profile data was collected on an 8 × 8 km raster at 3,104 sitesin the context of the first national Agricultural Soil Inventory(Jacobs et al., 2018). The dataset available for this study comprisesthe soil profile location (coordinates), horizontation and soilprofile description according to the German soil survey system(Boden, 2005) to a maximum depth of 200 cm, as well as labmeasurements concerning the particle size distribution, bulkdensity, stone content, total organic carbon content (TOC),total inorganic carbon content (TIC), electrical conductivity(EC) and the pH value in depth increments of 0–10, 10–30,30–50, 50–70, 70–100, 100–150, and 150–200 cm. Samples weretaken per depth increment while taking into account thehorizontation, i.e. including multiple samples per depthincrement for each corresponding soil horizon presentwith ≥5 cm.

Sampling comprised disturbed samples and undisturbedsamples (steel cores). Depending on the present stone contentand the size of the fragments, steel cores of varying sizes (250, 100,or 5 cm³) were taken to determine the bulk density of the fineearth fraction. The bag samples were dried to constant weight at40°C, with samples high in TOC (≥ 87 g kg-1) being dried at ahigher temperature of 60°C. Steel core samples were dried at

105°C to then determine bulk density. Further samplepreparation included sieving to 2 cm to separate the fine earthfraction from the coarser material. Soil texture determinationwith seven particle size separates — sand [2.0–0.63, 0.63–0.2,0.2–0.063 mm], silt [0.063–0.02, 0.02–0.0063, 0.0063–0.002 mm],and clay [<0.002 mm] was conducted according to DIN ISO11277. The total carbon content was determined using drycombustion. TOC and TIC were differentiated with theremoval of TOC via thermo gradient dry combustion. pH andEC were determined in H2O. Details of the agricultural soilinventory soil survey, lab methodology, data overview andsummary statistics can be obtained from Jacobs et al. (2018),and Poeplau et al. (2020).

2.2.2 Gridded Geo-InformationProxies of the SCORPAN factors were derived from multiplesources. Table 1 provides an overview. Concerning SCORPANC,30 years’ seasonal averages (1961–1990) of air temperature andthe sum of precipitation of the winter (DJF) and the summer(JJA) months were derived from the German Weather Service(DWD, 2018a; DWD, 2018b). Seasonal averages of the summerand winter drought index (DWD, 2018c) were calculated asP/(T + 10) using the air temperature T (in degree Celsius)from DWD temperature grids and P (in mm) from DWDprecipitation grids.

To approximate SCORPAN O, four data products wereincluded. The 2016 and 2018 yearly average composites of twovegetation indices were derived from the European Data Portal.The indices were calculated from Sentinel-2 Level 2A data. TheNormalized Difference Vegetation Index (NDVI) combines thevegetation specific reflection characteristics of the wavelengthranges 600–700 nm (RED) and 700–1,300 nm (NIR) and,thereby, provides insight into plant vitality. The NormalizedDifference Red Edge (NDRE) is similar to the NDVI but usesthe NIR range and the red edge inflexion point (Barnes et al.,2000). Data on dry matter productivity (DMP) and theVegetation Productivity Index (VPI) of the time slot June11th-20th of the years 2016 and 2018 were derived from theCopernicus Global Land Service. DMP reflects the overall growthrate of the vegetation with units adapted for agro-statisticalpurposes (Swinnen and Van Hoolst, 2019). The VPI assessesthe condition of the vegetation. It is a percentile ranking of thecurrent NDVI against its historical range of variability. A value of100% indicates the best, a value of 50% the median vegetationstate (Swinnen and Toté, 2015). Differences between the dry year2018 and the rather wet year 2016 of all four indices wereadditionally included to relate crop phenology affected bydrought to soil properties such as the root-zone plant availablewater capacity. They, therefore, also refer to SCORPAN S.

SCORPAN R is represented by a map product of terrainclassification. The geomorphographic map of Germany in mapscale 1:1,000,000 (Figure 1E) comprises 25 discrete unitsdifferentiating between the four major German landscapetypes. In addition, terrain parameters relating to surfacetopography were calculated by the SAGA — System forAutomated Geoscientific Analyses (Conrad et al., 2015) onbehalf of the EU–DEM v1.0 digital elevation model. An

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 6929595

Ließ et al. Multivariate 3D Soil Parameter Space

Page 6: Machine learning with GA optimization to model the

overview of the terrain parameters and the respective modules tocalculate them is given in Table 2. The circular variable “aspect”was decomposed into northness and eastness. Hydrologicalterrain parameters were calculated on behalf of the pre-processed DEM: Sinks were filled and the stream network wasburned into the DEM. Major streams and further river segmentswere derived from the CCM River and Catchment Database(Vogt and Foisneau, 2007). The German coastline was added tothe river network previous to calculating the overland flowdistance, to take the inclination towards the sea into account.

The map of the “Groups of soil parent material” in Germany(BAG, Figure 1F) was included to approximate SCORPAN P.Lithology and stratigraphy according to the hydrogeological mapof Germany (HÜK, map scale 1:250.000, BGR and SDG, 2019)were additionally incorporated. While the geological informationon lithology, stratigraphy, and genesis of the geological map ofGermany provided the basic data, the information was replaced

and completed by other regional geological and hydrogeologicalmaps and data where necessary.

Proxies to soil itself (SCORPAN S) can generally be included inthe form of conventional soil polygon maps, and remote sensingdata products relating to soil properties (e.g. Castaldi et al., 2019;Safanelli et al., 2020; Vaudour et al., 2021).We included the map ofthe German soil scapes (Figure 1G). It subdivides Germany into 12soil regions comprising clearly defined soil scapes that followtopography and geology. As previously mentioned, differencesin vegetation indices between the dry year 2018 and the ratherwet year 2016 relate to crop phenology affected by drought.Accordingly, they may relate to soil properties such as root-zone plant available water capacity.

All obtained covariates were resampled to the INSPIRE —Infrastructure for Spatial Information in Europe— grid topologyat 100 m resolution (JRC, 2013). The nearest-neighbor methodwas used for categorical predictors, B-spline interpolation was

TABLE 1 | Geo-information data source.

Soil forming factor Abbreviation Description Data source

Climate PRESU Average seasonal precipitation (summer) [raster, 1,000 m] DWD (2018b)PREWI Average seasonal precipitation (winter) [raster, 1,000 m]

TEMSU Average seasonal temperature (summer) [raster, 1,000 m] DWD (2018a)TEMWI Average seasonal temperature (winter) [raster, 1,000 m]

DINSU Average seasonal drought index (summer) [raster, 1,000 m] DWD (2018c)DINWI Average seasonal drought index (winter) [raster, 1,000 m]

Organisms/ Soil NDV16 Normalized difference vegetation index, June 2016 [raster, 10 m] https://www.europeandataportal.euNDV18 Normalized difference vegetation index, June 2018 [raster, 10 m]NDV86 Normalized difference vegetation index, NDV18–NDV16 [raster, 10 m]NDR16 Normalized difference red edge, June 2016 [raster, 10 m]NDR18 Normalized difference red edge, June 2018 [raster, 10 m]NDR86 Normalized difference red edge, NDR18–NDR16 [raster, 10 m]

DMP16 Dry matter productivity, June 2016 [raster, 300 m] Swinnen and Van Hoolst (2019)DMP18 Dry matter productivity, June 2018 [raster, 300 m]DMP86 Dry matter productivity, DMP18–DMP16 [raster, 300 m]

VPI16 Vegetation productivity index, June 2016 [raster, 300 m] Swinnen and Toté (2015)VPI18 Vegetation productivity index, June 2018 [raster, 300 m]VPI86 Vegetation productivity index, VPI18–VPI16 [raster, 300 m]

Topography GMK Geomorphographic map of Germany [raster, 250 m resolution, map scale 1:1,000,000]

BGR (2007)

DEM Digital elevation model [raster, 25 m resolution], and derived products(Table 2)

© European Union, Copernicus Land MonitoringService 2017, European EnvironmentAgency (EEA)

Parent material/Soil LIT Lithology, Hydrogeological map of Germany [polygon shapefile, map scale 1:250,000]

BGR and SDG (2019)

STR Stratigraphy, Hydrogeological map of Germany [polygon shapefile, map scale1:250,000]

BGR and SDG (2019)

BAG Groups of soil parent material in Germany [polygon shapefile, map scale 1:5,000,000]

BGR (2008b)

BGL Soil scapes in Germany [map scale 1:5,000,000] BGR (2008a)

Geographic location LAT00 INSPIRE Latitude JRC, 2013LON00 INSPIRE Longitude

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 6929596

Ließ et al. Multivariate 3D Soil Parameter Space

Page 7: Machine learning with GA optimization to model the

applied for numeric predictors. INSPIRE latitude and longitudewere additionally included to represent geographic location(SCORPAN N), and particularly to represent spatial patterns notcaptured in the other data sources. The national border and coastlineof Germany were derived on behalf of the digital land model at mapscale 1:250,000 (version 2.0) provided by the Federal Agency forCartography and Geodesy (© GeoBasis-DE / BKG, 2020).

2.3 Procedure to Derive Soil FunctionalTypesThe aforementioned major soil functions are determined by thesoil’s physical and chemical properties. The soil’s habitat function(SF1) is characterised by most if not all of the properties. Particlesize distribution, bulk density, organic matter content andcomposition, redox conditions, pH, and salinity shape thecomposition of the biological community. Nutrient cycling (SF2)depends on the biological community and the aforementionedcharacteristics. The storage of plant available water (SF3)depends on the corresponding soil volume’s particle sizedistribution, the organic matter content, bulk density, and thedepth to a root-impenetrable layer or bedrock. Agriculturalbiomass production (SF4) depends on SF1, SF2, and SF3.Furthermore, prolonged times of water logging negatively impactplant roots. The particle size distribution and the amount of coarsefragments shape the soil’s pore space and hence determine waterpercolation to the groundwater. Together with the soil’s bufferingcapacity through soil mineralogy and organic compounds these

properties determine the soil’s filter capacity for contaminants(SF5). Last but not least, the soil’s storage capacity for TOC(SF6) in mineral soils is determined by the particle sizedistribution. In organic soils, it largely depends on the thicknessof the peat layer. Decomposition processes of the organic matterchange during prolonged periods of water logging. However, long-term stabilization of SOC depends on multiple aspects which shallnot be elaborated in the context of this study.

The variables available from the soil profile database were usedto approximate these soil characteristics. The variables particle sizedistribution, bulk density, stone content, TOC, and pH wereincluded. Furthermore, horizon symbols according to theGerman soil survey system were considered to a certain extent.The occurrence, depth and thickness of horizons with symbol H(peat horizon) was included to differentiate organic from mineralsoil horizons. The occurrence, depth and thickness of symbol S(stagnic horizon) were included to acknowledge zones of frequentwater logging. Likewise, symbol G (gleyic horizon) was included todefer to the zone of groundwater influence. The occurrence, depthand thickness of the C horizon were included to attribute to layerslittle affected by pedogenetic processes and, hence, the absence ofpedogenic oxides, organic matter, and soil structure. Likewise,symbol mC (a subcategory of C) was included to acknowledgedepth to bedrock (Boden, 2005). Though, it has to be mentionedthat it sometimes occurred in only part of the horizon.Additionally, EC was included to refer to soil salinity. TIC wasconsidered to differentiate soils originating from calcareous parentmaterial.

TABLE 2 | Computation of DEM-derived terrain parameters.

Abbreviation Variable Library Module Searchradii

DEM00 Elevation

SLO01, SLO05,SLO10

Slope Terrain analysis/morphometry

Morphometric features 1, 5, 10 cells

NOR01, NOR05,NOR10

Northness Morphometric features & Gridcalculator

1, 5, 10 cells

EAS01, EAS05, EAS10 Eastness 1, 5, 10 cellsTST01, TST05, TST10 Terrain surface texture Terrain Surface Texture 1, 5, 10 cellsTSR01, TSR05, TSR10 Terrain surface ruggedness Terrain Ruggedness Index 1, 5, 10 cellsCON01, CON05,CON10

Convergence Index Convergence Index (Search Radius) 1, 5, 10 cells

SLH00 Slope Height Relative Heights and Slope Positions 1 cellVAD00 Valley depth 1 cellNOH00 Normalised Height 1 cellWIN00 Wind Exposure Wind Effect 1 cell

NOP00 Negative openness Terrain analysis/Lighting,Visibility

Topographic Openness 1 cellPOP00 Positive openness 1 cell

VOF0S Vertical overland flowdistance (VOF)

xxx0M � majorrivers

Terrain analysis/Channels Overland Flow distance to ChannelNetwork

1 cellVOF0M

xxx0S � allsegments

1 cell

HOF0S Horizontal overland flowdistance (HOF)

1 cellHOFOM 1 cell

SWI00 SAGA wetness index Terrain analysis/Hydrology SAGA Wetness Index 1 cell

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 6929597

Ließ et al. Multivariate 3D Soil Parameter Space

Page 8: Machine learning with GA optimization to model the

SFTs were then derived by grouping soil profiles similar intheir properties. This was done in the form of a pair-wisecomparison by 1 cm depth slices using R-package “AQP”(Beaudette et al., 2013). Slice-wise depth weighting wasimplemented by an exponential decay function:

Wi � e−k·i (1)

The weight of each slice i is determined according to depthweighting coefficient k. Gower’s generalized dissimilaritymetric (Gower, 1971) was used since it accounts for anycombination of binary, categorical, or continuous variables.For the horizon symbol information, each depth slice wasassigned a 1 for the occurrence and a 0 for the non-occurrence.As each additionally considered soil property reduces theimpact of the other soil properties on the dissimilaritymetric, dissimilarity matrices with a varying number of soilproperties were calculated. All of the 50 variables sets includedthe particle size distribution. They differed 1) in the number ofparticle size separates (2, 3, or 7), 2) in the inclusion of horizoninformation (none| H, S, G and C | H, S, G, C, and mC), and 3)the additionally considered physical properties (stone content,bulk density), and 4) the considered chemical properties(TOC, TIC, pH, and EC). All variable sets were testedalongside with two soil depths (0–100 or 0–200 cm) andthree depth weighting coefficients (k � 0.00, 0.01, and 0.1).Increasing values of the latter indicate that topsoil informationwas assigned a higher weight compared to subsoil information.A value of 0 indicates that no depth weighting was applied, avalue of 0.1 indicates that between-profile differences weredominated by topsoil properties. To account for variable soildepth in the dissimilarity calculations, undefineddissimilarities were replaced by the maximum between-slicedissimilarity. Undefined dissimilarities were only preserved,when both depth slices represented non-soil material, i.e.bedrock. For each of the computed 300 dissimilaritymatrices (50 variable sets × 2 soil depths × 3 coefficients), acluster analysis with algorithm Ward (Ward, 1963) wasconducted to test cluster solutions with 2–50 clusters. Rpackage “NbClust” (Charrad et al., 2014) was used for thispurpose. The overall best cluster solution per dissimilaritymatrix was then selected according to the Silhouette Index(Kaufman and Rousseeuw, 1990). Of the, thereby, resulting300 cluster solutions only those with a reasonable high numberof clusters (≥8) were kept, assuming that solutions with veryfew clusters would be way too simple to represent thevariability of German soils under agricultural use.Furthermore, cluster solutions with <50 soil profiles in anyof their clusters were excluded.

The remaining cluster solutions were then compared accordingto their specification with respect to the soil properties. Among thesoil properties, a good definition with regards to particle sizedistribution, symbol H, symbol S, and symbol G was givenpriority over the other soil properties due to their importancefor all six soil functions. The final aim was to select one overall bestcluster solution. Each cluster of this best cluster solution wouldthen define an SFT with a multivariate distribution of soilproperties along its depth profile.

2.4 Modeling2.4.1 Model AlgorithmMachine learning algorithms are often applied for supervisedclassification problems. In this case, landscape positions wereclassified according to the presence or absence of a particular SFT.Each landscape position was described by an n-dimensionalvector of predictor values extracted from nationwide griddedgeodata, i.e. proxies of the soil-forming factors. The machinelearning algorithm was then used to build a model that learns onbehalf of a training dataset — landscape positions of knownpresence or absence — to then spatially apply the modelthroughout the respective landscape.

There is a high variety of machine learning algorithms appliedin pedometric modeling. And while random forest is becomingincreasingly popular (Padarian et al., 2020), due to its simplicityin structure and parameter tuning, the high potential ofalgorithms such as support vector machines (SVMs) is not yetwell exploited. Nonetheless, SVMs are a “hot topic” in the broadermachine learning community due to their high flexibility andpotential to perform complex learning tasks (e.g. Bennett andCampbell, 2000; Meyer, 2019). SVMs were developed by Cortesand Vapnik (1995). In binary classification tasks, they search forthe hyperplane that maximizes the margin between the twoclasses’ closest points. The properties of this decision surfaceensure the SVM’s high generalization ability (Cortes and Vapnik,1995). Points along the boundary are called support vectors. Thedata are projected to the higher dimensional space via kerneltechniques to allow for separation in case of nonlinearity. Theradial basis function (RBF) kernel is commonly applied for thispurpose. It helps to build complex decision boundaries andincludes two parameters: C and c. Their choice is crucial forobtaining good results. The c parameter can be interpreted as theinverse of the radius of influence of the support vectors. C is oftenreferred to as the cost or penalty parameter. With a small C, thepenalty for misclassified points is low; high values increase therisk of overfitting. Finally, it balances the misclassification oftraining samples against the simplicity of the hyperplane. Rpackage “e1071” provides the R interface to the LIBSVMlibrary for Support Vector Machines (Chang and Lin, 2011;Meyer, 2019).

2.4.2 Optimization ApproachThe search for optimal SVM parameters was conducted byoptimization employing a genetic algorithm (GA). Figure 2provides an overview of the procedure that was implementedwith R package “GA” (Scrucca, 2013; Scrucca, 2017). The GAs’operational structure is inspired by the general principles ofbiological evolution involving mutation, crossover, selection,and elitism. The parameter space to be searched for theoptimal combination of tuning parameter values has to bepredefined by providing a minimum and maximum value foreach parameter. Then, a random population of n vectors of SVMparameters is evaluated by a problem-specific fitness function.Weights are assigned to each individual of the population (eachvector) according to its fitness function value. Then “selection” ofpopulation individuals is done with a selection probabilityaccording to the assigned weight by sampling with

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 6929598

Ließ et al. Multivariate 3D Soil Parameter Space

Page 9: Machine learning with GA optimization to model the

replacement. “Elitism” allows the survival of the best individualsin case they were not selected. The resulting individuals from thereproducing population are then altered by two further geneticoperators: “mutation” and “crossover.”Mutation randomly altersindividual tuning parameter values. Crossover forms new vectorsfrom two existing vectors by combining values from both. Thisprocess is iterated until an initially defined fitness value isachieved by any of the vectors, until a maximum number ofiterations (maxiter) is reached, or until the fitness values do notimprove for a certain number of consecutive iterations (run).Please refer to Affenzeller et al. (2009) for further information ongenetic algorithms.

GAs can balance between the exploration of new areas of thegiven parameter space and the exploitation of good solutions.Exploitation (Exploitation 1, Figure 2) is usually controlled by thetwo operators, selection and elitism, whereas exploration(Exploration 1, Figure 2) is conducted by mutation andcrossover. The trade-off between exploitation and explorationis controlled by aspects such as the population size, and theprobability for mutation (pmutation) and crossover (pcrossover). Inthis particular case, we used a hybrid GA which was run inparallel. “Hybrid” means that the GA was combined with a localoptimizer. The latter extends exploitation (Exploitation 2,Figure 2) by starting a local search from one of the currentbest solutions after a predefined iteration interval (determined bypoptim). The approach was parallelized by subdividing the originalpopulation into several subpopulations, with each being assignedto an individual island. Each of these subpopulations was thenundergoing a separate optimization process. The islands are

connected by a ring topology allowing for a unidirectionalscarce exchange of individuals between the islands. Thisexchange is controlled by the migrationRate (proportion ofindividuals migrating) and the epoch, which defines thenumber of iterations i after which the migration takes place.The top individuals of a certain island, thereby, replace randomindividuals (excluding the elite ones) of the subsequent island.This approach is known as distributed multiple-population GAor island parallel GA (ISLPGA). Hybrid GAs can find a globalsolution more efficiently than conventional evolutionaryalgorithms. The ISLPGA introduces diversity into thesubpopulations, and is, thereby, extending exploration(Exploration 2, Figure 2), and preventing the search fromgetting stuck in local optima.

A couple of test runs were conducted to choose the GA settingsstarting from recommendations given by Scrucca (2017). Finally,GA search was conducted in the two-dimensional spaceconsidering the SVM parameters c and C to range between0.01 and 10. The population size was set to 125, and thenumber of islands for parallel search to 5. The interval for themigration between islands was set to 20, the migration rate to 0.1.Single-point crossover between parameter vectors was conductedwith a probability of 0.8, uniform random mutation with aprobability of 0.1. Linear rank selection was applied allowingfor the best five individuals to survive at each generation (elitism).The probability of applying local search was set to 0.1, and theselection pressure to 0.7. The overall maximum number ofiterations was set to 500, the number of consecutivegenerations without any improvement in the best fitness value

FIGURE 2 | Parallelized optimization with a hybrid genetic algorithm.

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 6929599

Ließ et al. Multivariate 3D Soil Parameter Space

Page 10: Machine learning with GA optimization to model the

before the GA was stopped was set to 20. Please refer to Scrucca(2017) and Scrucca (2013) for the details and available options.

2.4.3 Model Training and EvaluationFor SVM, the n-dimensional vectors of predictor data can onlyinclude real numbers. Accordingly, all categorical predictors wererecoded into dummy variables. All numerical data were scaled tothe range 0, 1 to avoid misbalance and numerical problems. Thepredictor-response dataset was then compiled by extracting thepredictor values at the soil profile sites and assigning each soilprofile to the respective SFT. SVMmodels were trained separatelyfor each SFT. Accordingly, the response data was coded intopresence-absence data. The misbalance between SFT occurrenceand absence was taking into account through the assignment ofweights during model building.

A nested approach of 5-fold stratified cross-validation (CV)was applied for model training, tuning and evaluation to obtainrobust models (Figure 3). For model evaluation, the outer CVcycle was repeated five times. Overall, in both CV cycles, theavailable predictor-response dataset (X-Y) was subdivided intofive folds of equal size using the response variable forstratification. Of these five folds, then always one fold waskept out as a test set while the other four were combined toform the model training set, leading to five separate test setevaluations (one per data instance). Each of the outer CV’straining sets was again subdivided to provide the datasets forparameter tuning in the inner CV cycle. Accordingly, the innerCV cycle reflects the fitness function for the GA optimizationprocedure. To combat computation time, the optimization wasconducted for only 1 out of 25 training sets. Predictions from thefive test sets were combined to compute model performance onbehalf of the confusion matrix as accuracy accounting forsensitivity and specificity, i.e. true positives and true negatives.The value ranges between 0 and 1 with a value of 0.8 indicatingthat 80% of the data instances were classified correctly.

The coding into dummy variables led to a dominance of thecategorical information (258 predictors) over the numerical (50)predictors. The original data sources differ in their degree ofreproducibility and the amount of included expert knowledge.This will affect the obtained modeling result and, therefore,requires careful consideration. The numerical predictors weremostly derived from measured data, i.e. the topographicalpredictors derived from digital terrain analysis (SCORPAN R:30 predictors), vegetation proxies derived from remote sensing(SCORPAN O: 12), and latitude, longitude (SCORPAN N: 2).The climate proxies are slightly different, for they wereinterpolated from point data on behalf of the DEM(SCORPAN C: 6). The categorical predictors were derivedfrom spatial vector information (polygons), i.e. SCORPAN P:195, R: 25, and S: 38. All of these data include expert knowledge.However, while GMK provides a classification on behalf ofnumerical data (DEM), the others have used geological andsoil data at point locations to derive map products with spatialmap units separated by strict boundaries drawn according toexpert judgment. The corresponding map products HÜK, BAG,and BGL are of high value, particularly due to their highinformation content with regards to SCORPAN P.Nonetheless, while using these data to train spatial predictionmodels, these map units’ boundaries are taken for granted sincetheir uncertainty is unknown. As the modeling approach isempirical, it heavily relies on the quality of the used data. Still,excluding data sources of unknown uncertainty is no optioneither as we would neglect valuable information. However, wemay consider balancing between numerical and categorical data.And there is yet another aspect to consider in model trainingconcerning the usage of categorical predictor data. Categories thatare not well represented in the data used for model training,tuning and evaluation, will hamper model performance. So finally,the number of dummy variables was reduced for two reasons: 1) toaddress the misbalance between categorical and numerical

FIGURE 3 | Resampling approach for model tuning and evaluation. (A) Nested 5-fold cross-validation with repeating the outer cycle. (B) Zoom in on 5-fold cross-validation.

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 69295910

Ließ et al. Multivariate 3D Soil Parameter Space

Page 11: Machine learning with GA optimization to model the

predictors, and 2) to build robust models, i.e. guarantee adequatepredictor representation in training, tuning, and test datasets. Datasubdivision according to the nested CVmade it a reasonable choiceto delete all categories with less than 100 occurrences. In this way,each data subset for testing in the inner CV cycle would still include16 instances on average. After data subdivision, it was verifiedwhether each data subset included at least 10 instances. Finally, 89predictors were included: 50 numerical and 39 categoricalpredictors. The following indicates the original number ofcategories per predictor (x/_/_), the number of categories afterextraction at point locations (_/x/_), and the remaining numberafter excluding all categories with non-sufficient instances (_/_/x):GMK 25/24/12, BAG 18/17/8, LIT 80/47/5, STR 97/73/7, and BGL38/36/7.

2.4.4 Model InterpretationIt is a common perception that machine learning models areblack-box models which hampers their interpretation. Whilethere are restrictions concerning the visualization of thecomplex model structure — particularly in the case of a highnumber of interacting predictors— there are, nonetheless, usefultools to understand the importance of individual predictors andtheir functional relation with the target variable. Variableimportance (VI) plots display the importance of individualpredictors. A general approach to compute them independentof the particular machine learning algorithm relies on predictorpermutation. Individual predictors are permuted in the test setbefore model application to eliminate any predictor-responserelationship present with regards to that predictor. The resultingrelative loss in model performance can then be attributed as VIvalue to the respective predictor. According to the five timesrepeated 5-fold CV approach (outer CV cycle), our VI plotsdisplay boxplots of 25 VI values for each predictor. Values of 5permutations were averaged.

Partial dependence plots (PDPs) are helpful to visualize therelationship between the predictors and the response. They are

low-dimensional graphical renderings of the prediction functionaccounting for the average effect of the other predictors in themodel (Friedman, 2001; Greenwell, 2017). But they may bemisleading in the case of strong predictor interaction. Anapproach to address this issue are individual conditionalexpectation (ICE) plots (Goldstein et al., 2015). They displaythe estimated relationship between a selected predictor and theresponse for each observation. The PDP can then be obtained byaveraging the corresponding ICE curves across all observations.PDPs according to the latter approach were computed with Rpackage “pdp” (Greenwell, 2018). The five times repeated 5-foldCV approach resulted in 25 PDP realisations per SFT model. Themedian of these 25 realisations was used to analyze the functionalrelationships.

3 RESULTS AND DISCUSSION

3.1 Comparison Between Cluster SolutionsThe original 300 best cluster solutions (best solution perdissimilarity matrix) were reduced to 22 solutions due to thecriteria 1) number of clusters ≥8 and 2) minimum number ofprofiles ≥50 in any of their clusters. The remaining clustersolutions (Supplementary Table S1), were sorted according toa decreasing number of clusters (12–8) and a decreasing numberof considered input variables (14–7), and, hence, from higher tolower complexity. From these 22 solutions, only three correspondto a soil depth of 0–200 cm, the others to a soil depth of 0–100 cm.Most of the solutions (16 out of 22) were derived while no depth-weighting was applied. None of the solutions was obtained with adepth weighting coefficient of 0.1, which would indicate adominance of topsoil information for the calculation of thebetween-profile dissimilarity. This gives a first hint on theimportance of subsoil dissimilarity when it comes toagricultural soils, whose topsoils are strongly managed to serveagricultural production, and are, hence, less diverse. Concerning

FIGURE 4 | 11-digits barcode assignment scheme to allow for the comparison of cluster solutions.

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 69295911

Ließ et al. Multivariate 3D Soil Parameter Space

Page 12: Machine learning with GA optimization to model the

the considered input variables, the picture is more diverse. Eachof the considered soil properties was included in a couple ofcluster solutions. Though, it has to be mentioned that particle sizedistribution with a number of 2, 3, or 7 particle size classes hadbeen included in all 300 dissimilarity matrices due to its highimportance for soil functionality.

In General, each cluster solution defines its clusters differentlyby grouping soil profiles according to their similarity on behalf ofa different set of variables. Accordingly, cluster IDs do not match.To still allow for their comparison with regards to the definitionof their multivariate distributions along the depth profile,attributes were assigned to each of the clusters, reflecting soil

FIGURE 5 | Number of soil profiles per cluster solution and barcode. (A) S1, (B) S2, (C) S3, (D) S4, (E) S5.

FIGURE 6 | Boxplots of predictive model performance corresponding to the eight clusters similar among the five solutions S1–S5. (A) organic, (B) skeletic, (C)sandy, (D) sandy-gleyic, (E) sandy-stagnic, (F) silty, (G) silty-gleyic, and (H) silty-stagnic.

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 69295912

Ließ et al. Multivariate 3D Soil Parameter Space

Page 13: Machine learning with GA optimization to model the

traits. These attributes were assigned on behalf of each cluster’smean of the median depth profile for the respective soil property.An 11-digits barcode was then assigned to each cluster accordingto the scheme depicted in Figure 4. The 22 cluster solutions werefirst grouped according to their number of clusters. Then, fromeach group, the cluster solution was kept that would bestdifferentiate its clusters according to the multivariatedistributions along the depth profile. Exceptionally, the twocluster solutions with 12 clusters were both kept as there wasno clear prevalence of one over the other. Finally, 5 solutionsremained, labeled S1–S5 as indicated in Supplementary TableS1. These remaining five solutions consider a rather differingnumber of clusters 8–12 and a similar number of variables 10–11.Still, the variables themselves also differ. They all comprise thesame basic set of the following variables: particle size distribution(three classes), stone content, and the information on theoccurrence, depth, and thickness of the horizons H, S, G, andC. Information concerning the mC horizon was included in allbut S3. S4 was the only solution that additionally includes bulkdensity and no further properties. S1, S2, S3, and S5 differ in theadditional chemical properties they consider.

Figure 5 provides a summary indicating the number ofprofiles per solution and barcode. Similar colors reflect similarsoils. Clusters with organic soils are displayed in dark browncolors, clusters of soils with a high stone content with red colors,clusters with gleyic soils by blue colors, clusters with stagnic soilsby turquoise colors, clusters with sandy soils in yellow, and

clusters with silty soils in light brown. Please be aware that thetwo to three shades for the clusters with gleyic and stagnic soilsalso reflect a difference in their particle size distribution. Whilethere is a high percentage of soils defined by a sandy or siltytexture and soils being influenced by ground water or displayingstagnic properties, there are much fewer profiles attributed to theclusters which are predominantly organic, with a high stonecontent, or with a depth limitation in the top 100 cm.

Finally, due to the different definition of clusters, i.e. SFTs, it wasexpected that the SFTs of certain cluster solutions could be betterrelated to the soil-forming factors than those of others, and would,therefore, result in a better performance of the models for theirspatial prediction. Accordingly, the best overall model performancewould then determine the final definition of SFTs. SVMmodels weretrained for all five cluster solutions. Comparison with regards tomodel performance between the solutions was based on those eightclusters similar among them. As all of the solutions only include onecluster with the first barcode digit corresponding to organic, theseclusters’ spatial models were compared. The same applies for onecluster with a high stone content (bar code digit “skeletic” � 3 ifavailable, otherwise � 2), three clusters with sandy properties(barcode digit “sandy” � 1), and three clusters with acomparatively high silt content (barcode digit “silty” � 1). Of thethree sandy and silty clusters, there was one cluster with additionalgleyic properties and one with additional stagnic properties,respectively. For S1 and S3, from the two clusters with gleyicproperties, one fulfilled the criterion of sandy, the other had a

FIGURE 7 |Multivariate soil parameter distribution along the depth profiles of the SFTs, part 1: horizon occurrence probability. (A) organic horizon (symbol H), (B)horizon with stagnic properties (symbol S), (C) groundwater influence (symbol G), (D)C horizon, and (E) depth limitation by bedrock (symbol mC). Percentages along theright figure margin indicate the contributing fraction of soil profiles per 20 cm depth increment.

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 69295913

Ließ et al. Multivariate 3D Soil Parameter Space

Page 14: Machine learning with GA optimization to model the

somewhat too little silt content to classify as silty. Both were stillincluded in this comparison.

Figure 6 displays the predictive model performance of the eightclusters similar in all five cluster solutions. The boxplots reflect 5values (repeated outer CV cycle). S4 achieved the best predictivemodel performance. It had the best median accuracy for four out ofeight, and the highest average accuracy among the eight consideredclusters. S3 was superior in model performance concerning the threeclusters corresponding to organic, sandy, and sandy-gleyic soils(Figures 6A, C, D). The slightly different assignment of soilprofiles to these clusters led to a better spatial prediction. Overall,there is very little difference in model performance for certainclusters, namely those with organic, sandy-gleyic, and sandy-stagnic soils (Figures 6A, D, E), likely due to a large overlap insoil profile assignment.

3.2 Soil Functional Types With MultivariateSoil Parameter Distributions Along theDepth ProfileS4 is the cluster solution that considered the particle sizedistribution (three classes), the stone content, the bulk density,

and the occurrence, depth and thickness of the horizons H, S, G,C, and mC to compute soil profile similarity to differentiate theSFTs (Supplementary Table S1). Figures 7, 8 display thecorresponding multivariate distribution along the depth profile(0–100 cm) of the nine SFTs corresponding to S4. Soil profile datawere aggregated per 1 cm depth slice and displayed as horizonoccurrence probability (Figure 7), and median values with theinterquartile range in the case of continuous data (Figure 8).Figure 5D indicates the barcodes corresponding to the respectiveSFTs. For simplicity, these barcodes were now renamed to refer tothe SFTs more easily. The following naming convention wasapplied. SFTs with similar properties were assigned to the samemain identifier: this applies for two SFTs with stone contents≥10% and ≥30% (SFT2.1 and 2.2), three SFTs with sandyproperties (SFT3.1, SFT3.2, and SFT3.3), and three SFTs withsilty properties (SFT4.1, SFT4.2, and SFT4.3). Of the sandy andsilty soils, there is one SFT with additional gleyic and one withadditional stagnic properties each. Altogether, this results in thefollowing naming convention: SFT1 (organic), SFT2.1 (skeletic),SFT2.2 (skeletic with depth limitation), SFT3.1 (sandy), SFT3.2(sandy-gleyic), SFT3.3 (sandy-stagnic), SFT4.1 (silty), SFT4.2(silty-gleyic), and SFT4.3 (silty-stagnic).

FIGURE 8 | Multivariate soil parameter distribution along the depth profile of the SFTs, part 2: median of soil properties with interquartile range. (A) particle sizedistribution, displayed as sand, silt, and clay content [mass-%], (B) stone content [vol-%], and (C) bulk density [g cm−3]. Percentages along the depth profiles indicate thecontributing fraction of soil profiles per 20 cm depth increment.

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 69295914

Ließ et al. Multivariate 3D Soil Parameter Space

Page 15: Machine learning with GA optimization to model the

Figure 7 reflects the occurrence probability of the horizons H,S, G, C, and mC from top to bottom. Figure 7A indicates thatsoils that include an organic horizon in some part throughouttheir profile (SFT1) were well separated from soils that do notinclude an organic horizon. However, SFT1 still includes a highvariety of soils: Soils that are organic throughout the top 100 cmwere combined with soils that include mineral soil horizons intheir top 20 cm (10%) and/or below c. 50 cm (10–30%).Furthermore, 10–28% of the soils have groundwater influencein some part of their profile (Figure 7C). For the mineral soils,most soils with groundwater influence (SFT3.2 and SFT4.2) werewell separated from soils with no groundwater influence. Thesame applies to soils with stagnic properties in the top 100 cm(SFT3.3 and SFT4.3, Figure 7B). SFT2.1 and SFT2.2 includemany soils with a C horizon starting at low soil depths indicatingrather initial stages of soil development (Figure 7D). 20% of thesesoils have a C horizon starting at 20 cm (SFT2.2) or 25 cm(SFT2.1) soil depth. Further down the pedon, this percentageaugments to 80 and 75%, respectively. The decrease in C horizonprobability below these depths is because many soil profiles inthese SFTs have a soil depth <100 cm as indicated by thedecreasing contributing fraction: Only 49% of the profiles ofSFT2.1 are deeper than 80 cm, SFT2.2 soil depth does not reachbelow 80 cm at all. SFT2.1 and SFT2.2 are also the two SFTs withprofiles including bedrock (Figure 7E). From the SFTscorresponding to sandy soils (SFT3.1, SFT3.2, and SFT3.3), itis the exclusively sandy SFT3.1 which includes a rather highpercentage of profiles with C horizon occurrence within the top100 cm: 40% at 52 cm depth augmenting to 70% at 72 cm depth.The same applies to the silty SFT (SFT4.1). Although, the silty-stagnic soils (SFT4.3) also partly include a C horizon within theirtop 100 cm (Figure 7D).

Figure 8 displays the second part of the multivariate soilparameter distribution along the depth profile of the SFTs,corresponding to measured soil properties. This includesparticle size distribution (Figure 8A), stone content(Figure 8B), and bulk density (Figure 8C). The multivariatedistributions along the depth profile of each SFT are representedby the following quantiles: Q5, Q25, Q50, Q75, and Q95 (Ließ,2021). SFT1 is described by sandy soil material in the top 20 cmand below 80 cm depth. In between, most soils have organichorizons as observed from H horizon probability (Figure 7A)and the contributing fraction (Figure 8A). As expected from thebarcodes and reflected in the naming convention, SFT3.1,SFT3.2, and SFT3.3 have a particle size distributiondominated by the sand content, whereas SFT4.1, SFT4.2, andSFT4.3 have a particle size distribution dominated by the siltcontent. Among the former, SFT3.2 displays the highest mediansand content throughout the top 100 cm. Among the latter,SFT4.1 has the lowest sand and highest silt content. Thedifference in texture between SFT2.1 and 2.2 is not sopronounced. Still, SFT2.2 differs from SFT2.1 by itsincreasing sand and decreasing clay content with depth. Thestone content of SFT2.1 and 2.2, the two skeletic SFTs, is muchhigher as compared to the other SFTs (Figure 8B). In both SFTsthe stone content is increasing with depth, reaching itsmaximum somewhere around 60 cm. Figure 8C displays the

SFT’s distribution of the depth profile with regards to bulkdensity. As expected, the organic SFT1 has much lower valuescompared to the others. The rather high interquartile rangebelow 70 cm corresponds to the high amount of soils withmineral horizons starting at this depth. The other SFTs’distributions along the depth profile display a clear steparound 10 cm depth reflecting the loosely settled soil aftertillage operations by cultivators (croplands) and/or the crumpstructure caused by an active soil fauna on grassland or non-tilled soils. SFT2.2 differs from the other mineral SFTs due to aslightly lower bulk density with a higher interquartile rangethroughout its depth and a step of decreasing bulk densityaround 70 cm depth. Further soil properties were notincluded in the multivariate distributions along the depthprofiles of the SFTs as they were not part of the variable setto compute the dissimilarity matrix of S4.

Altogether, organic soils were differentiated frommineral soils ofvarious particle size distributions partly influenced by waterloggingand groundwater within their top 100 cm. Further SFTs reflect soilswith a depth limitation and a high stone content. In the evaluationof the various cluster solutions, higher importance was given toparticle size distribution and characteristics distinguishing organichorizons, horizons of water logging and horizons of groundwaterinfluence due to their uttermost importance for all previously listedsoil functions. As a consequence, the SFTs of the selected solution S4do not differentiate well concerning the soil properties TOC andEC. Accordingly, we decided to reduce the parameter space of theSFTs to those soil properties included in the computation of thedissimilarity matrix of S4. Organic soils differ from mineral soils intheir pedogenesis, composition and structure. Soil particle sizedistribution largely determines the available water capacity in theroot zone and, hence, the soil’s capability to cope with drought andprevent yield losses. Furthermore, it determines the soil’s storagepotential for SOC. Accordingly, it largely determines soil fertilityand the soils’ production function. Periodic water logging in theroot zone affects the soil microbial community, decompositionprocesses, and nutrient turnover. The closeness of the soil horizonwith changing groundwater level determines — after fertilizerapplication and percolation through the profile — how muchnitrate increases groundwater contamination. And while theproperties that are currently included in the definition of theSFTs’ multivariate distributions along the depth profile arereflecting the most important properties, we are well aware thatprocesses such as nutrient cycling and filtering of contaminantsrequire the inclusion of further soil properties particularlystabilizing and buffering agents related to soil mineralogy andcation exchange capacity. Currently, only C horizon occurrence,thickness and depth and the assumed absence of pedogenic oxides,structure and TOC give some hint to approximate this aspect. Andwhile other soil properties such as pH and TOC are manipulatedthrough fertilization and liming, the soils have different states oforigin with regards to these properties which should also show asthey provide important information for agricultural managementplanning and intitial values for agricultural process modeling. Still,the current definition of soil functional types provides a decent basisto represent soil functionality with regards to the storage of plantavailable water (SF3), and the soil’s storage capacity for TOC

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 69295915

Ließ et al. Multivariate 3D Soil Parameter Space

Page 16: Machine learning with GA optimization to model the

(SF6). The other four soil functions are represented to a certainextent. Vogel et al. (2019) argue that it is the inherent soilproperties (properties that do not change within decades) that

allow for the evaluation of the soil’s potential to fulfill the soilfunctions. Whereas it is those properties changed underagricultural management that allow for the evaluation of the

FIGURE 9 | Predicted median occurrence probability of the SFTs at national scale. (A) SFT1, (B) SFT2.1, (C) SFT2.2, (D) SFT3.1, (E) SFT3.2, (F) SFT3.3, (G)SFT4.1, (H) SFT4.2, (I) SFT4.3. Non-agricultural areas were masked out on behalf of the CORINE Land Cover Inventory 2018 (Figure 1A). The corresponding predictivemodel accuracy is indicated in the bottom right corner.

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 69295916

Ließ et al. Multivariate 3D Soil Parameter Space

Page 17: Machine learning with GA optimization to model the

current state and fulfilment of the soil functions with regards tothese potentials.

Pedon similarity with regards to the multivariate soilprofile information has previously been considered in thecontext of numerical soil classification as opposed toconceptual soil classification (e.g. Rayner, 1966; Mooreet al., 1972). However, to our knowledge, it has not beenused to derive soil functional types for agriculturallandscapes. The data-driven approach we follow may furtherbenefit by considering property-specific depth weighting todifferentiate between soil properties affected by agriculturalmanagement and those that don’t. The consideration of soilproperties such as TOC, pH, EC and bulk density to defineSFTs in agricultural landscapes is tricky as they are heavilymanipulated by fertilizer application, liming, and tillageoperations. Nonetheless, past agricultural management mayalso have impacted the occurrence, depth and thickness ofmineral and organic soil horizons. Particularly in northwestGermany, nutrient-poor, sandy topsoil was often improved bymixing it with grass or heather plagues and, thereby, alteringtheir particle size distribution and SOC content. This praxiswas still common at the beginning of the 20th century butstopped with upcoming mineral fertilisers. In the same area,

the deep ploughing of peatlands and Podsols for ameliorationuntil a depth of 1.50 m took place. The result was a mixture oforganic with mineral — mostly sandy — underlying soilmaterial, or alternating inclined layers of mineral andorganic soil material.

3.3 Spatial Prediction, Model Interpretation,and EvaluationFigure 9 shows the median occurrence probability of the SFTsthroughout Germany. Median predictive model performance isindicated by the respective accuracy in the bottom right cornerof each map. Please be aware that Figure 8B has a different maplegend due to the predicted low probabilities. Median predictivemodel performance decreases in the following order: SFT1 –SFT4.2 – SFT3.2 – SFT2.2 – SFT4.1 – SFT3.3 – SFT4.3 – SFT3.1– SFT2.1 from 0.87 to 0.67 and the exceptional low predictiveperformance of 0.54 for SFT2.1. Model performance wasparticularly good for SFT1 (organic) and the two SFTsrepresenting gleyic soils (SFT3.2, SFT4.2). Sandy-gleyic(SFT3.2) and sandy-stagnic (SFT3.3) soils could be betterpredicted than sandy soils without gleyic or stagnicproperties (SFT3.1). The same applies to silty-gleyic (SFT4.2)

FIGURE 10 | VI plots. The Subfigures display the VI values for the individual models to predict the occurrence probability of the SFTs. The boxplots reflect thevariance in VI due to the CV approach. (A) SFT1, (B) SFT2.1, (C) SFT2.2, (D) SFT3.1, (E) SFT3.2, (F) SFT3.3, (G) SFT4.1, (H) SFT4.2, and (I) SFT4.3.

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 69295917

Ließ et al. Multivariate 3D Soil Parameter Space

Page 18: Machine learning with GA optimization to model the

as compared to silty soils (SFT4.1), but not for silty-stagnic soils(SFT4.3).

Figure 10 shows the variable importance (VI) for the ninemodels corresponding to S4. High relative VI values thatamount to a multiple of 100% display a high degree ofpredictor interaction within the models. Most dummypredictors have values above 30 or even 40%. Somepredictors have very low VI values close to zero. Still, theexclusion of predictors below VI values of 2 led to a reducedmodel performance (results not shown). In the first predictorgroup (SORPAN C), the drought index of the winter andsummer months was of low importance for all SFTs,probably due to its calculation from temperature andprecipitation, which were also included as predictors. Still,the exclusion of highly correlated predictors led to adecreased model performance (results not shown).Temperature (TEMSU, TEMWI) and precipitation (PREWI,PRESU) of the summer and winter months have VI values above10% for all SFTs. For most SFTs they are around 20%. For SFT1(Figure 10A) and SFT2.1 (Figure 10B), they are above 30 oreven 40%. Among the SCORPAN O proxies, there is an increasein VI values in the order DMP – NDR – NDV – VPI, with VPIbeing of comparatively much higher importance for all SFTs.The selected time slot in June displays the percentile ranking ofthe current NDVI against its historical range of variability andis, therefore, a good indicator of comparatively dry (2018) or wet

(2016) conditions. Vegetation on sandy soils will likely havebeen more impacted by drought than vegetation on silty soils,although the opposite may apply in case the soils are covered bya narrow sandy topsoil horizon. The NDVI and NDRE values ofyearly composites of single years will not show this effect asstrongly, even though the spatial resolution of the data is higher.Again the vegetation indices (particularly VPI) are ofcomparatively higher importance for SFT1 and SFT2.1. Thismight be due to a prevailing usage as grasslands of the soilsassigned to these SFTs. A similar effect is observable in the nextpredictor group concerning these two SFTs (R1). For all SFTs,elevation (DEM00) and easting (EAS) have rather lowimportance. The low importance of elevation is not thatsurprising as it is often included in pedometric models toreflect the climatic gradient, which is in this case alreadycaptured by predictor group C. The remaining predictors ofthe group have particularly high importance for SFT1 andSFT2.1, but even for the other SFTs, they reach VI valuesabove 10 or 20%. Among the hydrological terrain parameters(R2), VOF0S and VOF0M display the highest importancefollowed by SWI00. The high information content containedin the geomorphographic map units (R3) displayed byparticularly high VI values of 30–50%, likely reduces theimportance of the numerical predictors (R1, R2). Theparticularly high importance of the dummy predictors is alsoobservable for predictor groups P and S. Latitude and longitude

FIGURE 11 |Median PDPs of selected continuous predictors. (A) PRESU, (B) TEMSU, (C) NDV18, (D)NDV86, (E)NOP00, (F) POP00, (G) SLH00, (H) SLO10, (I)TST10, (J) TRI10, (K) VAD00, (L)WIN00, (M) SWI00, (N) VOF0M, and (O) VOF0S. Please refer to Tables 1,2 for the predictor abbreviations. The predictor values alongthe X-axis correspond to the values extracted from the normalized nationwide gridded geodata.

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 69295918

Ließ et al. Multivariate 3D Soil Parameter Space

Page 19: Machine learning with GA optimization to model the

display VI values around 10% for SFT2.1 indicating that thereare likely some spatial patterns not captured by the otherpredictors. Consequently, a less restrictive approach in theconsideration of the categorical predictors could evenimprove model performance. Though the benefit of includingadditional predictors may differ among the SFTs. Thetransmission of polygon map boundaries to the pedometricmodeling result is a well-known problem particularly observedwhile applying recursive partitioning algorithms (e.g. Behrenset al., 2018b; Nussbaum et al., 2018).

The spatial patterns of the sandy SFT3.1, the sandy-gleyicSFT3.2, and the sandy-stagnic SFT3.3 (Figures 9D–F), displaytheir development from sandy parent material (yellow color inFigure 1F). Soils of the sandy SFT3.1 additionally developed fromsandstones, acid magmatites and metamorphites in the mountainsof the Central German Upland region and loose material in theAlpine Foreland. Likewise, the spatial patterns of the silty SFT4.1(Figure 9G) and the silty-stagnic SFT4.3 (Figure 9I) display theirdevelopment from loess (light orange color, Figure 1F). On theother hand, soils that developed from clay stones (mud color,Figure 1F) and are, therefore, likely to have a clayey texture, areseldom used for agriculture (Figure 1A), which explains the lack ofa corresponding SFT. The spatial prediction of the silty-gleyicSFT4.2 (Figure 9H) is clearly showing the strong influence of

certain dummy predictors derived from the BAG (Figure 1F). Thesoil of the silty-gleyic SFT4.2 developed from floodplain sedimentsand sediments in the tidal range. Their distinction is also visiblefrom terrain morphology (Figure 1E). The spatial patterns ofSFT2.1 (Figure 9B) and SFT2.2 (Figure 9C) are similar. Theymainly occur in the mountains of the Central German Uplands(Figure 1B), which explains their high stone contents.

The direct influence of selected numerical predictors on therespective SFTs is displayed in the PDPs in Figure 11. The shownprobability values for the individual SFTs are rather low due tothe averaging over all ICE plots. The plots are probably good tosee general trends but not in reflecting a comparison concerningthese average probabilities between SFTs. Still, it is interesting tosee that the two skeletic SFTs SFT2.1 and SFT2.2 show verysimilar trends with SFT2.2 always displaying higher probabilities.Both SFTs are likely assigned to the same raster cells and canprobably not be well separated in space. The two gleyic SFTs(SFT3.2 and SFT4.2) and the two stagnic SFTs (SFT3.3 andSFT4.3) show a remarkable similarity in sudden changes inthe slope for a couple of SCORPAN R proxies such asNOP00, SLH00, SLO10, TRI10, VOF0M, and VOF0S. Havingdeveloped from different parent material, these soils likely occurin similar topographical landscape positions. Though, not alltrends can be well explained. Overall, the predictors are proxies

FIGURE 12 | Map displaying SFT distribution.

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 69295919

Ließ et al. Multivariate 3D Soil Parameter Space

Page 20: Machine learning with GA optimization to model the

describing today’s landscape while the changing interaction of thesoil-forming factors throughout time is not captured by theavailable data.

Model results concerning the predicted SFT probability werecombined to generate the final spatial allocation of the SFTs(Figure 12) in the following way: For each SFT, the medianprobability of the 25 predicted values (CV approach) wascalculated per raster cell (Figure 8). Then for each raster cell,the SFT with the highest median occurrence probability wasassigned (Figure 11). Due to its low probabilities (Figure 8B),SFT2.1 was not assigned to any raster cell, as we also assumed onbehalf of the PDPs. As can be seen from the corresponding modelaccuracy of 0.54, this was also the SFT whose spatial distributioncould not be well predicted.

Together the SFT spatial allocation (Figure 12) and the SFT-specific multivariate soil parameter distributions along the depthprofiles provide a representation of the multivariate 3D soilparameter space of the agricultural soil-landscape (Germany)which can be used to evaluate certain soil functions, and as inputdata to agricultural process models. While being operated atnational scale, the latter require a representation of thepedosphere with regards to its spatially varying properties. Thepedosphere constitutes a spatial continuum. Its properties vary inthe magnitude of micrometres, centimetres or meters. But, at thishigh spatial resolution, the provision of soil information atnational scale would result in inconceivable amounts ofmultivariate data in three dimensions. However, this is hardlynecessary to answer current agricultural challenges. Themodeling of the impact of drought on agricultural yield, thestorage of soil organic carbon, or excess fertilizer percolation tothe groundwater, would merely require a spatial resolution torepresent individual agricultural fields. And while we maintainthis spatial resolution, we may, still, agglomerate soils accordingto their similarity in their properties with the ultimate fin to

provide a limited set of spatially assigned process units ofspecified characteristics to run agricultural process models,and, thereby, reduce the required computing power. Forexample, the agricultural soil-landscape of Germany at aspatial resolution of 100 m augments to 19.5 million rastercells. And this amount is multiplied while we acknowledge thepedon behind each raster cell by its multivariate parameterdistribution along the depth profile, and the correspondingsite-, property-, and depth-specific uncertainty resulting frompedometric modeling. Process models considering this highamount of data would likely run into problems concerning therequired computing power.

To spatially represent the soil continuum in maps, soilscientists have since long started to agglomerate soils into SUs.It is these SUs that may then depict SMUs in soil maps of a largemap scale. At small map scales, the SUs are often combined withother SUs to form larger SMUs still visible at that scale. However,the SUs defined in soil classification systems have the mainpurpose to facilitate communication about the complex systemof soil. They are nowadays often reflecting pedogenesis, and are,therefore, not necessarily well suited to represent soilfunctionality (Mueller et al., 2010). In many soil classificationsystems, important soil properties guiding soil functionality areonly distinguished at a low systematic level and rather similarsoils concerning their properties and functionality are assigned todifferent upper-level SUs. This partly also hampers their spatialdifferentiation in pedometric modeling. Promising to derivespatial units to represent soil functionality are also approachesthat spatially predict characteristic soil horizons relating tostagnic properties, or organic carbon accumulation and soildepth instead of SUs (e.g. Gessler et al., 1995; Gastaldi et al.,2012; Ließ et al., 2012). Overall, the domain of application-oriented pedometric mapping follows the same rationale: Ließet al. (2011) and Ließ and Huwe (2012) estimate natural landslide

FIGURE 13 | Soil property-wise model performance for selected soil depths. Boxplots of 5 RMSE values per soil property and depth. (A) sand content, (B) siltcontent, (C) clay content, (D) stone content, and (E) bulk density. The horizontal line below each boxplot refers to the RMSE of the original SFT assignment of the soilprofiles. Concerning the first line of figures, residuals for the RMSE calculation refer to the difference to the SFTs’ 5–95% interquantile range for the respective property ofthe multivariate distribution of the depth profile. Concerning the second line of figures, the interquartile range was considered for computing the RMSE.

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 69295920

Ließ et al. Multivariate 3D Soil Parameter Space

Page 21: Machine learning with GA optimization to model the

risk in a tropical mountain environment. Jeong et al. (2017) assessland potentials in forest soils under monsoon climate. Greineret al. (2018) thematise uncertainty indication in soilfunction maps.

To adequately simulate and evaluate the impact of climatechange on crop production, it is important to assess the impactof local to regional soil variability with regards to soil waterconditions at critical crop development states (e.g. Challinoret al., 2009; White et al., 2011; Kasampalis et al., 2018).Accordingly, approaches that estimate the impact ofdrought on agricultural yields throughout Germany couldlargely benefit from the developed data product. Currently,they rely on the initially described soil maps with spatially-unresolved SMUs and can, therefore, not provide site-specificpredictions (e.g. Zink et al., 2016; Webber et al., 2020). Besidesknowledge on nitrogen surplus on soils under agricultural useand lateral transport in aquifers, studies to investigate nitratepathways to the groundwater would also clearly benefit fromcontinuous, site-specific soil information (e.g. Knoll et al.,2020).

The soil-property-wise evaluation of the final product for selecteddepths (5, 20, 40, 60, and 80 cm) shows a good predictiveperformance (Figure 13). The increase of the RMSE with depthcorresponding to the original SFT assignment while considering theinterquantile ranges (Q5–Q95 and Q25–Q75) of the respectivedistribution along the depth profile, indicates that the propertiesof the subsoil need to be weighted more heavily in the similarityanalysis to identify the SFTs. For the RMSE values of the modelpredictions (boxplots), this increase with depth is even stronger. Thisis because the SFTs differ more in their subsoil compared to theirtopsoil. Correspondingly, misclassifications show a stronger effect.Overall, the good model performance indicates successful modeltraining and tuning due to the implemented SVMwith optimization.The corresponding RMSE values of predictive model performancefor topsoil texture from the European (Ballabio et al., 2016) andglobal (Hengl et al., 2017) predictions, evaluated on behalf of thesame test data sets, are 17.3 and 18.8% for sand, 13.6 and 15.3% forsilt, and 9.1 and 9.5% for clay.

4 CONCLUSION

The presented approach to derive a realisation of the multivariate3D soil parameter space of the agricultural soil-landscape(Germany) consists of 1) the differentiation of SFTs withmultivariate distributions along their depth profiles, and 2) theprojection of these SFTs to continuous space by machinelearning. It simplifies the agricultural soil-landscape byagglomerating similar soils according to their properties. Thishas two benefits: On the one hand, it well addresses the generalconcept behind pedometric modeling at the landscape scale,which seeks to explain similarity in soils by similarity inlandscape characteristics related to the soil-forming factors. Inthis sense, it provides a valuable approach to model multiple soilproperties with their respective 3D distribution simultaneously.On the other hand, the obtained data product reduces themultivariate complexity of the spatial soil continuum to a

limited number of spatial process units to run agriculturalprocess models at national scale. The required computingpower is reduced to a large extent.

Overall, the current product published alongside thismanuscript (Ließ, 2021) has to be understood as a firstversion in the iterative process of pedometric modeling. Thecurrent definition of SFTs provides a decent basis to representsoil functionality with regards to the storage of plant availablewater (SF3), and the soil’s storage capacity for TOC (SF6). Theother four soil functions are represented to a certain extent.Perspectively, it shall be further enhanced by extending themultivariate distributions along the depth profiles by furthersoil properties and adapting the definition of the SFTs throughthe implementation of property-specific depth weighting. Thiswill likely increase the number of SFTs and reduce therespective ranges of their multivariate distribution along thedepth profiles. Moreover, any pedometric model can be furtherimproved by including an ever-increasing amount, andinformation content of input data, and by enhancing themodeling approach to derive an ever-increasing predictiveaccuracy. We are positive that the implemented machinelearning approach resulted in robust models that extractedthe highest possible information content concerning thelandscape allocation of SFTs due to the implementedresampling and SVM with optimization. A furtherimprovement would, therefore, mainly rely on additional soilprofile data, and the inclusion of further gridded geodata toapproximate the soil-forming factors. And the latter will neverbe sufficient as we strive to model the result of 10,000 years ofpedogenesis by data covering the last few decades.

DATA AVAILABILITY STATEMENT

The dataset of the multivariate 3D soil parameter space isavailable from DOI 10.17605/OSF.IO/GQBMD (Ließ, 2021).

AUTHOR CONTRIBUTIONS

Conceptual approach, programming, modeling, scientificembedding, and preparation of figures (ML), predictorpreparation (ML and AG), manuscript writing (ML, AD, and AG).

ACKNOWLEDGMENTS

This work is part of the SoilSpace3D-DE project and contributesto the BonaRes Centre — Soil as a Sustainable Resource for theBioeconomy — modeling framework.

SUPPLEMENTARY MATERIAL

The SupplementaryMaterial for this article can be found online at:https://www.frontiersin.org/articles/10.3389/fenvs.2021.692959/full#supplementary-material

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 69295921

Ließ et al. Multivariate 3D Soil Parameter Space

Page 22: Machine learning with GA optimization to model the

REFERENCES

Adler, G., Behrens, J., Eckelmann,W., Hartwich, R., and Richter, A. (2003). “Bödenim Überblick,” in ifL: Nationalatlas Bundesrepublik Deutschland, Band 2 –

Natur und Umwelt I: Relief, Boden und Wasser (Leipzig: Leibniz-Institut fürLänderkunde), 100–103.

Affenzeller, M., Winkler, S., Wagner, S., and Beham, A. (2009). Genetic Algorithmsand Genetic Programming. Boca Raton, FL: Taylor and Francis Group.

Alexander, J. (2003). “Die heißesten und kältesten Gebiete,” in ifL: NationalatlasBundesrepublik Deutschland, Band 3 - Natur und Umwelt II: Klima, Pflanzen-und Tierwelt (Leipzig: Leibniz-Institut für Länderkunde), 36–37.

Ardakani, A., and Kordnaeij, A. (2017). Soil Compaction Parameters PredictionUsing GMDH-Type Neural Network and Genetic Algorithm. Eur. J. Environ.Civ. Eng. 23, 449–462. doi:10.1080/19648189.2017.1304269

Asch, K., Lahner, L., and Zitzmann, A. (2003). “Die Geologie von Deutschland –ein Flickenteppich,” in ifL: Nationalatlas Bundesrepublik Deutschland - Relief,Boden und Wasser (Leipzig: Leibniz-Institut für Länderkunde), 32–35.

Ballabio, C., Panagos, P., and Monatanarella, L. (2016). Mapping Topsoil PhysicalProperties at European Scale Using the LUCAS Database. Geoderma 261,110–123. doi:10.1016/j.geoderma.2015.07.006

Barnes, E. M., Clarke, T. R., and Richards, S. E. (2000). “Coincident Detection ofCrop Water Stress, Nitrogen Status and Canopy Density Using Ground-BasedMultispectral Data,” in Proceedings of the Fifth International Conference onPrecision Agriculture. Bloomington, MN, United States

Beaudette, D. E., Roudier, P., and O’Geen, A. T. (2013). Algorithms forQuantitative Pedology: A Toolkit for Soil Scientists. Comput. Geosci. 52,258–268. doi:10.1016/j.cageo.2012.10.020

Behrens, T., Schmidt, K., MacMillan, R. A., and Viscarra Rossel, R. A. (2018a).Multi-Scale Digital Soil Mapping With Deep Learning. Sci. Rep. 8, 2–10. doi:10.1038/s41598-018-33516-6

Behrens, T., Schmidt, K., Viscarra Rossel, R. A., Gries, P., Scholten, T., andMacMillan, R. A. (2018b). Spatial Modelling with Euclidean Distance Fieldsand Machine Learning. Eur. J. Soil Sci. 69, 757–770. doi:10.1111/ejss.12687

Bennett, K., and Campbell, C. (2000). Support Vector Machines: Hype orHallelujah. SIGKDD Explor. 2, 1–13. doi:10.1145/380995.380999

BGR (2008a). Soil Scapes in Germany 1:5,000,000. BGL5000. Hanover: FederalInstitute for Geosciences and Natural Resources.

BGR (2008b). Groups of soil parent material in Germany 1:5,000,000. BAG5000,Version 3.0. Hanover: Federal Institute for Geosciences and Natural Resources.

BGR (2007). Geomorphographic Map of Germany, GMK1000. Hanover: FederalInstitute for Geosciences and Natural Resources.

BGR (2013). Soil Map of Germany 1:1,000,000. BÜK1000. Hanover: FederalInstitute for Geosciences and Natural Resources.

BGR (2018). Soil Map of Germany 1:250,000. Hanover: Federal Institute forGeosciences and Natural Resources.

BGR and SDG (2019). Hydrogeological Map of Germany 1:250,000 (HÜK250).Hanover: Federal Institute for Geosciences and Natural Resources (BGR) andGerman State Geological Surveys (SGD).

Bishop, T. F. A., McBratney, A. B., and Laslett, G. M. (1999). Modelling SoilAttribute Depth Functions With Equal-Area Quadratic Smoothing Splines.Geoderma 91, 27–45. doi:10.1016/S0016-7061(99)00003-8

BKG (2020). Digital Land Model at Map Scale 1:250,000 (version 2.0). FederalAgency for Cartography and Geodesy

Boden, Ad-hoc-A. G. (2005). Bodenkundliche Kartieranleitung. 5th Edn.Hannover, Germany: E.Schweizerbart’sche Verlagsbuchhandlung.

Bönecke, E., Breitsameter, L., Brüggemann, N., Chen, T. W., Feike, T., Kage, H.,et al. (2020). Decoupling of Impact Factors Reveals the Response of GermanWinter Wheat Yields to Climatic Changes. Glob. Change Biol. 26, 3601–3626.doi:10.1111/gcb.15073

Büttner, G., Kostztra, B., Soukup, T., Sousa, A., and Langanke, T. (2017). CLC2018Technical Guidelines. Vienna: European Environment Agency. Available at:https://land.copernicus.eu/user-corner/technical-library/clc2018technicalguidelines_final.pdf (Accessed June 1, 2020)

Castaldi, F., Chabrillat, S., Don, A., and van Wesemael, B. (2019). Soil OrganicCarbon Mapping Using LUCAS Topsoil Database and Sentinel-2 Data: AnApproach to Reduce Soil Moisture and Crop Residue Effects. Remote Sens. 11,2121. doi:10.3390/rs11182121

Challinor, A. J., Ewert, F., Arnold, S., Simelton, E., and Fraser, E. (2009). Crops andClimate Change: Progress, Trends, and Challenges in Simulating Impacts andInforming Adaptation. J. Exp. Bot. 60, 2775–2789. doi:10.1093/jxb/erp062

Chang, C.-C., and Lin, C.-J. (2011). Libsvm. ACM Trans. Intell. Syst. Technol. 2,1–39. doi:10.1145/1961189.1961199

Charrad, M., Ghazzali, N., Boiteau, V., and Niknafs, A. (2014). NbClust:AnRPackage for Determining the Relevant Number of Clusters in a DataSet. J. Stat. Soft. 61, 1–36. doi:10.18637/jss.v061.i06

Conrad, O., Bechtel, B., Bock, M., Dietrich, H., Fischer, E., Gerlitz, L., et al. (2015).System for Automated Geoscientific Analyses (SAGA) v. 2.1.4. Geosci. ModelDev. 8, 1991–2007. doi:10.5194/gmd-8-1991-2015

Cortes, C., and Vapnik, V. (1995). Support-Vector Networks. Mach. Learn. 20,273–297. doi:10.1007/BF00994018

de Brogniez, D., Ballabio, C., Stevens, A., Jones, R. J. A., Montanarella, L., and vanWesemael, B. (2015). A Map of the Topsoil Organic Carbon Content of EuropeGenerated by a Generalized Additive Model. Eur. J. Soil Sci. 66, 121–134. doi:10.1111/ejss.12193

DWD (2018a). Seasonal Grids of Monthly Averaged Daily Air Temperature (2m)Over Germany. version v1.0. Offenbach, Deutscher Wetterdienst.

DWD (2018b). Seasonal Grids of Sum of Precipitation over Germany. version v1.0.Offenbach, Deutscher Wetterdienst.

DWD (2018c). Seasonal Grids of Sum of Drought Index (de Martonne) OverGermany. version v1.0. Offenbach: Deutscher Wetterdienst.

Emadi, M., Taghizadeh-Mehrjardi, R., Cherati, A., Danesh, M., Mosavi, A., andScholten, T. (2020). Predicting and Mapping of Soil Organic Carbon UsingMachine Learning Algorithms in Northern Iran. Remote Sens. 12 (14), 2234.doi:10.3390/rs12142234

Endlicher, W., and Hendl, M. (2003). “Klimaspektrum zwischen Zugspitze undRügen,” in ifL: Nationalatlas Bundesrepublik Deutschland, Band 3 - Natur undUmwelt II: Klima, Pflanzen- und Tierwelt (Leipzig: Leibniz-Institut fürLänderkunde), 32–33.

Esfandiarpour-Boroujeni, I., Shahini-Shamsabadi, M., Shirani, H., Mosleh, Z.,Bagheri-Bodaghabadi, M., and Salehi, M. H. (2020). Assessment of DifferentDigital Soil Mapping Methods for Prediction of Soil Classes in the ShahrekordPlain, Central Iran. Catena 193, 104648. doi:10.1016/j.catena.2020.104648

Friedman, J. H. (2001). Greedy Function Approximation: A Gradient BoostingMachine. Ann. Statist. 29, 1189–1232. doi:10.1214/aos/1013203451

Gastaldi, G., Minasny, B., and Mcbratney, A. B. (2012). “Mapping the Occurrenceand Thickness of Soil Horizons within Soil Profiles,” in Digital Soil Assessmentsand beyond. Editors B. Minasny, B. P. Malone, and A. McBratney (London,United Kingdom: Taylor & Francis Group), 145–148.

Gebauer, A., Brito Gómez, V. M., and Ließ, M. (2019). Optimisation in MachineLearning: An Application to Topsoil Organic Stocks Prediction in a Dry forestEcosystem. Geoderma 354, 113846. doi:10.1016/j.geoderma.2019.07.004

Gebauer, A., Ellinger, M., Brito Gómez, V. M., and Ließ, M. (2020). Developmentof Pedotransfer Functions for Water Retention in Tropical Mountain SoilLandscapes: Spotlight on Parameter Tuning in Machine Learning. Soil 6,215–229. doi:10.5194/soil-6-215-2020

Gessler, P. E., Moore, I. D., McKenzie, N. J., and Ryan, P. J. (1995). Soil-LandscapeModelling and Spatial Prediction of Soil Attributes. Int. J. Geogr. Inf. Syst. 9,421–432. doi:10.1080/02693799508902047

Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2015). Peeking inside theBlack Box: Visualizing Statistical Learning with Plots of Individual ConditionalExpectation. J. Comput. Graph. Stat. 24, 44–65. doi:10.1080/10618600.2014.907095

Gower, J. C. (1971). A General Coefficient of Similarity and Some of its Properties.Biometrics 27, 857–874. doi:10.1109/ultsym.1987.199076

Greenwell, B. M. (2017). pdp: An R Package for Constructing Partial DependencePlots. R. J. 9, 421–436. doi:10.32614/rj-2017-016

Greenwell, B. (2018). Package “pdp” - Partial Dependence Plots. CRAN Repos.Greiner, L., Nussbaum, M., Papritz, A., Zimmermann, S., Gubler, A., Gret-

Regamey, A., et al. (2018). Uncertainty Indication in Soil Function Maps -Transparent and Easy-to-Use Information to Support Sustainable Use of SoilResources. Soil 4, 123–139. doi:10.5194/soil-4-123-2018

Haupt, R. L., and Haupt, S. E. (1998). Practical Genetic Algorithms. New York, NY:John Wiley & Sons, Inc.

Hengl, T., Mendes De Jesus, J., Heuvelink, G. B. M., Ruiperez Gonzalez, M.,Kilibarda, M., Blagotic, A., et al. (2017). SoilGrids250m: Global Gridded Soil

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 69295922

Ließ et al. Multivariate 3D Soil Parameter Space

Page 23: Machine learning with GA optimization to model the

Information Based on Machine Learning. PLoS One 12, e0169748. doi:10.1371/journal.pone.0169748

Holland, J. H. (1975).Adaptation in Natural and Artificial Systems. Ann Arbor, MI:The University of Michigan Press.

Jacobs, A., Flessa, H., Don, A., Heidkamp, A., Prietz, R., Gensior, A., et al. (2018).Landwirtschaftlich genutzte Böden in Deutschland - Ergebnisse derBodenzustandserhebung, Thünen Report 64. Braunschweig, Germany: JohannHeinrich von Thünen-Institut.

Jakobs, I., Grimm, F., Keppner, L., and Hilliges, F. (2020). Nitratbericht 2020.Hannover: Bundesministerium für Ernährung und Landwirtschaft (BMEL) undBundesministerium für Umwelt, Naturschutz und nukleare Sicherheit (BMU).

Jenny, H. (1941). Factors of Soil Formation: A System of Quantitative Pedology.New York, NY: Dover Publications, Inc.

Jeong, G., Oeverdieck, H., Park, S. J., Huwe, B., and Ließ, M. (2017). Spatial SoilNutrients Prediction Using Three Supervised Learning Methods forAssessment of Land Potentials in Complex Terrain. Catena 154, 73–84.doi:10.1016/j.catena.2017.02.006

JRC (2013). INSPIRE Metadata Implementing Rules: Technical Guidelines basedon EN ISO 19115 and EN ISO 19119 - V.1.3. 99. Available at: http://inspire.ec.europa.eu/ (Accessed June 1, 2020).

Kasampalis, D. A., Alexandridis, T. K., Deva, C., Challinor, A., Moshou, D., andZalidis, G. (2018). Contribution of Remote Sensing on Crop Models: A Review.J. Imaging 4. doi:10.3390/jimaging4040052

Kaufman, L., and Rousseeuw, P. J. (1990). Finding Groups in Data: An Introductionto Cluster Analysis. New York, NY: John Wiley & Sons.

Klein, D., and Menz, G. (2003). “Der Niederschlag im Jahresverlauf,” in ifL:Nationalatlas Bundesrepublik Deutschland, Band 3 – Natur und Umwelt IIKlima, Pflanzen- und Tierwelt (Leipzig: Leibniz-Institut für Länderkunde),44–47.

Knoll, L., Breuer, L., and Bach, M. (2020). Nation-wide Estimation of GroundwaterRedox Conditions and Nitrate Concentrations through Machine Learning.Environ. Res. Lett. 15, 064004. doi:10.1088/1748-9326/ab7d5c

Küster, M., and Stöckhert, B. (2003). “Der tektonische Bau Deutschlands,” in ifL:Nationalatlas Bundesrepublik Deutschland, Band 2 – Natur und Umwelt I:Relief, Boden und Wasser (Leipzig: Leibniz-Institut für Länderkunde), 36–37.

Liedtke, H., and Mäusbacher, R. (2003). “Grundzüge der Reliefgliederung,” in ifL:Nationalatlas Bundesrepublik Deutschland, Band 2 – Natur und Umwelt I:Relief, Boden und Wasser (Leipzig: Leibniz-Institut für Länderkunde), 58–59.

Ließ, M. (2021). Multivariate 3D Soil Parameter Space — Germany [AgriculturalSoil-Landscape, Version 1.0]. doi:10.17605/OSF.IO/GQBMD

Ließ, M., Glaser, B., and Huwe, B. (2011). Functional Soil-Landscape Modelling toEstimate Slope Stability in a Steep Andean Mountain Forest Region.Geomorphology 132, 287–299. doi:10.1016/j.geomorph.2011.05.015

Ließ, M., Glaser, B., and Huwe, B. (2012). Making Use of theWorld Reference BaseDiagnostic Horizons for the Systematic Description of the Soil Continuum -Application to the Tropical Mountain Soil-Landscape of Southern Ecuador.Catena 97, 20–30. doi:10.1016/j.catena.2012.05.002

Ließ, M., and Huwe, B. (2012). “Uncertainty in Soil Regionalisation and itsInfluence on Slope Stability Estimation,” in Large Slow Active SlopeMovements with a Section on Landslide Hydrology – Hillslope HydrologicalModelling for a Landslides Prediction. Editors U. G. L. Picarelli and R. Greco(Napels: Book of Proceedings: Italian Workshop on Landslides 2011), 171–177.

Liu, X., Herbert, S. J., Hashemi, A. M., Zhang, X., and Ding, G. (2011). Effects ofAgricultural Management on Soil Organic Matter and Carbon Transformation– a Review. Plant Soil Environ. 52, 531–543. doi:10.17221/3544-pse

Ma, Y., Minasny, B., McBratney, A., Poggio, L., and Fajardo, M. (2021). PredictingSoil Properties in 3D: Should Depth be a Covariate? Geoderma 383, 114794.doi:10.1016/j.geoderma.2020.114794

Markonis, Y., Kumar, R., Hanel, M., Rakovec, O., Máca, P., and AghaKouchak, A.(2021). The Rise of Compound Warm-Season Droughts in Europe. Sci. Adv. 7,eabb9668. doi:10.1126/sciadv.abb9668

Mazaheri, A. R., and Jafarian, F. (2019). Artificial Neural Network andOptimization Algorithm to Improve Soil Resistance by Means ofAggregation Size Variation. 3, 179–186. doi:10.22060/ajce.2018.14988.5512

McBratney, A. B., Mendonca Santos, M. L., andMinasny, B. (2003). On Digital SoilMapping. Geoderma 117, 3–52. doi:10.1016/S0016-7061(03)00223-4

Meyer, D. (2019). Support Vector Machines - The Interface to Libsvm in Package.FH Technikum Wien.

Minasny, B., Whelan, B. M., Triantafilis, J., and McBratney, A. B. (2013).Pedometrics Research in the Vadose Zone-Review and Perspectives. VadoseZone J. 12, 1–20. doi:10.2136/vzj2012.0141

Møller, A. B., Beucher, A. M., Pouladi, N., and Greve, M. H. (2020). ObliqueGeographic Coordinates as Covariates for Digital Soil Mapping. SOIL 6,269–289. doi:10.5194/soil-6-269-2020

Moore, A.W., Russell, J. S., andWard, W. T. (1972). Numerical Analysis of Soils: AComparison of Three Soil Profile Models With Field Classification. J. Soil Sci.23, 193–209. doi:10.1111/j.1365-2389.1972.tb01653.x

Mueller, L., Schindler, U., Mirschel, W., Shepherd, T. G., Ball, B. C., Helming, K.,et al. (2010). Assessing the Productivity Function of Soils. A Review. Agron.Sustain. Dev. 30, 601–614. doi:10.1051/agro/2009057

Nguyen, H., Choi, Y., Bui, X.-N., and Nguyen-Thoi, T. (2020). Predicting Blast-Induced Ground Vibration in Open-Pit Mines Using Vibration Sensors andSupport Vector Regression-Based Optimization Algorithms. Sensors 20, 132.doi:10.3390/s20010132

Nussbaum, M., Spiess, K., Baltensweiler, A., Grob, U., Keller, A., Greiner, L., et al.(2018). Evaluation of Digital Soil Mapping Approaches With Large Sets ofEnvironmental Covariates. Soil 4, 1–22. doi:10.5194/soil-4-1-2018

Padarian, J., Minasny, B., and McBratney, A. B. (2020). Machine Learning and SoilSciences: A Review Aided by Machine Learning Tools. Soil 6, 35–52. doi:10.5194/soil-6-35-2020

Padarian, J., Minasny, B., and McBratney, A. B. (2019). Using Deep Learning forDigital Soil Mapping. Soil 5, 79–89. doi:10.5194/soil-5-79-2019

Poeplau, C., Don, A., Flessa, H., Heidkamp, A., Jacobs, A., and Prietz, R. (2020).First German Agricultural Soil Inventory – Core DatasetGöttingen: Open AgrarRepositorium. doi:10.3220/DATA20200203151139

Poggio, L., and Gimona, A. (2017). 3D Mapping of Soil Texture in Scotland.Geoderma Reg. 9, 5–16. doi:10.1016/j.geodrs.2016.11.003

Poggio, L., and Gimona, A. (2014). National Scale 3D Modelling of Soil OrganicCarbon Stocks with Uncertainty Propagation - An Example From Scotland.Geoderma 232–234, 284–299. doi:10.1016/j.geoderma.2014.05.004

Rayner, J. H. (1966). Classification of Soils by Numerical Methods. J. Soil Sci. 17,79–92. doi:10.1111/j.1365-2389.1966.tb01454.x

Rossiter, D. G. (2018). Past, Present & Future of Information Technology inPedometrics. Geoderma 324, 131–137. doi:10.1016/j.geoderma.2018.03.009

Safanelli, J. L., Chabrillat, S., Ben-Dor, E., and Dematte, J. A. M. (2020).Multispectral Models From Bare Soil Composites for Mapping TopsoilProperties Over Europe. Remote Sens. 12, 1369. doi:10.3390/RS12091369

Scrucca, L. (2013). GA: A Package for Genetic Algorithms inR. J. Stat. Soft. 53,1–37. doi:10.18637/jss.v053.i04

Scrucca, L. (2017). On Some Extensions to GA Package: Hybrid Optimisation,Parallelisation and Islands Evolution on Some Extensions to GA Package:Hybrid Optimisation, Parallelisation and Islands Evolution. R. J. 9, 187–206.doi:10.32614/RJ-2017-008

Scull, P., Franklin, J., Chadwick, O. A., and McArthur, D. (2003). Predictive SoilMapping: A Review. Prog. Phys. Geogr. Earth Environ. 27, 171–197. doi:10.1191/0309133303pp366ra

Sharififar, A., Sarmadian, F., Malone, B. P., and Minasny, B. (2019). Addressing theIssue of Digital Mapping of Soil Classes With Imbalanced Class Observations.Geoderma 350, 84–92. doi:10.1016/j.geoderma.2019.05.016

Statistisches Bundesamt (2020). Land Use - Agriculture and Forestry,Fisheries. Available at: https://www.destatis.de/EN/Themes/Economic-Sectors-Enterprises/Agriculture-Forestry-Fisheries/Land-Use/_node.html;jsessionid�D40446229B48722918EED994173D1868.internet8732 (AccessedAugust 14, 2020).

Sundermann, G., Wägner, N., Cullmann, A., von Hirschhausen, C. R., and Kemfert,C. (2020). Nitrate Pollution of Groundwater Long Exceeding Trigger Value:Fertilization Practices Require More Transparency and Oversight. DIW Weekly.Berlin, Germany: Deutsches Institut für Wirtschaftsforschung (DIW).

Swinnen, E., and Toté, C. (2015). Gio Global Land Component - Lot I ”Operationof the Global Land Component Framework Service Contract N° 388533, JRC,Normalized Difference Vegetation Index (NDVI) V2, Vegetation ConditionIndex, Vegetation Productivity Index. Algorithm Theor. Basis Doc. (I2.11).Available at: https://land.copernicus.eu/global/sites/cgls.vito.be/files/products/GIOGL1_ATBD_NDVI-VCI-VPI_I2.11.pdf (Accessed June 1, 2020)

Swinnen, E., and Van Hoolst, R. (2019). Copernicus Global Land Operations“Vegetation and Energy”. Version 1. Available at: https://land.copernicus.eu/

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 69295923

Ließ et al. Multivariate 3D Soil Parameter Space

Page 24: Machine learning with GA optimization to model the

global/sites/cgls.vito.be/files/products/CGLOPS1_ATBD_DMP300m-V1_I1.12.pdf (Accessed June 1, 2020).

Taghizadeh-Mehrjardi, R., Schmidt, K., Amirian-Chakan, A., Rentschler, T.,Zeraatpisheh, M., Sarmadian, F., et al. (2020). Improving the SpatialPrediction of Soil Organic Carbon Content in Two Contrasting ClimaticRegions by Stacking Machine Learning Models and Rescanning CovariateSpace. Remote Sens. 12, 1095. doi:10.3390/rs12071095

Terribile, F., Coppola, A., Langella, G., Martina, M., and Basile, A. (2011). Potential andLimitations of Using Soil Mapping Information to Understand LandscapeHydrology. Hydrol. Earth Syst. Sci. 15, 3895–3933. doi:10.5194/hess-15-3895-2011

Tifafi, M., Guenet, B., and Hatté, C. (2018). Large Differences in Global and RegionalTotal Soil Carbon Stock Estimates Based on SoilGrids, HWSD, and NCSCD:Intercomparison and EvaluationBased on Field Data FromUSA, England,Wales,and France. Glob. Biogeochem. Cycles 32, 42–56. doi:10.1002/2017GB005678

Tóth, G., Jones, A., andMontanarella, L. (2013). The LUCAS Topsoil Database andDerived Information on the Regional Variability of Cropland TopsoilProperties in the European Union. Environ. Monit. Assess. 185, 7409–7425.doi:10.1007/s10661-013-3109-3

van Hateren, T. C., Chini, M., Matgen, P., and Teuling, A. J. (2020). AmbiguousAgricultural Drought: Characterising Soil Moisture and Vegetation Droughts inEurope From Earth Observation. Hydrol. Earth Syst. Sci. Discuss. [Epub aheadof print]. doi:10.5194/hess-2020-583

Vaudour, E., Gomez, C., Lagacherie, P., Loiseau, T., Baghdadi, N., Urbina-Salazar,D., et al. (2021). Temporal Mosaicking Approaches of Sentinel-2 Images forExtending Topsoil Organic Carbon Content Mapping in Croplands. Int. J. Appl.Earth Obs. Geoinf. 96, 102277. doi:10.1016/j.jag.2020.102277

Veronesi, F., Corstanje, R., and Mayr, T. (2012). Mapping Soil Compaction in 3DWith Depth Functions. Soil Tillage Res. 124, 111–118. doi:10.1016/j.still.2012.05.009

Vogel, H.-J., Bartke, S., Daedlow, K., Helming, K., Kögel-Knabner, I., Lang, B., et al.(2018). A Systemic Approach for Modeling Soil Functions. Soil 4, 83–92. doi:10.5194/soil-4-83-2018

Vogel, H.-J., Eberhardt, E., Franko, U., Lang, B., Ließ, M., Weller, U., et al. (2019).Quantitative Evaluation of Soil Functions: Potential and State. Front. Environ.Sci. 7. 164. doi:10.3389/fenvs.2019.00164

Vogt, J., and Foisneau, S. (2007). CCM River and Catchment Database — Version2.0 Analysis Tools. EUR 22649 EN JRC36122.

Wadoux, A. M. J.-C., Padarian, J., and Minasny, B. (2019). Multi-Source DataIntegration for Soil Mapping Using Deep Learning. SOIL 5, 107–119. doi:10.5194/soil-5-107-2019

Ward, J. H. (1963). Hierarchical Grouping to Optimize an Objective Function.J. Am. Stat. Assoc. 58, 236–244. doi:10.1080/01621459.1963.10500845

Webber, H., Lischeid, G., Sommer, M., Finger, R., Nendel, C., Gaiser, T., et al.(2020). No Perfect Storm for Crop Yield Failure in Germany. Environ. Res. Lett.15, 104012. doi:10.1088/1748-9326/aba2a4

White, J. W., Hoogenboom, G., Kimball, B. A., and Wall, G. W. (2011).Methodologies for Simulating Impacts of Climate Change on CropProduction. Field Crops Res. 124, 357–368. doi:10.1016/j.fcr.2011.07.001

Wiesmeier, M., Urbanski, L., Hobley, E., Lang, B., von Lützow, M., Marin-Spiotta,E., et al. (2019). Soil Organic Carbon Storage as a Key Function of Soils - AReview of Drivers and Indicators at Various Scales. Geoderma 333, 149–162.doi:10.1016/j.geoderma.2018.07.026

Zeraatpisheh, M., Bakhshandeh, E., Hosseini, M., and Alavi, S. M. (2020).Assessing the Effects of Deforestation and Intensive Agriculture on the SoilQuality Through Digital Soil Mapping. Geoderma 363, 114139. doi:10.1016/j.geoderma.2019.114139

Zhang, M., Shi, W., and Xu, Z. (2020). Systematic Comparison of Five Machine-Learning Models in Classification and Interpolation of Soil Particle SizeFractions Using Different Transformed Data. Hydrol. Earth Syst. Sci. 24,2505–2526. doi:10.5194/hess-24-2505-2020

Zink, M., Samaniego, L., Kumar, R., Thober, S., Mai, J., Schäfer, D., et al. (2016).The German Drought Monitor. Environ. Res. Lett. 11, 074002. doi:10.1088/1748-9326/11/7/074002

Conflict of Interest: The authors declare that the research was conducted in theabsence of any commercial or financial relationships that could be construed as apotential conflict of interest.

Copyright © 2021 Ließ, Gebauer and Don. This is an open-access article distributedunder the terms of the Creative Commons Attribution License (CC BY). The use,distribution or reproduction in other forums is permitted, provided the originalauthor(s) and the copyright owner(s) are credited and that the original publicationin this journal is cited, in accordance with accepted academic practice. No use,distribution or reproduction is permitted which does not comply with these terms.

Frontiers in Environmental Science | www.frontiersin.org June 2021 | Volume 9 | Article 69295924

Ließ et al. Multivariate 3D Soil Parameter Space