
Ecological Modelling 154 (2002) 135–150

Illuminating the ‘‘black box’’: a randomization approach for understanding variable contributions in artificial neural networks

Julian D. Olden *, Donald A. Jackson

Department of Zoology, University of Toronto, 25 Harbord Street, Toronto, Ontario, Canada M5S 3G5

Received 2 May 2000; received in revised form 15 February 2002; accepted 4 March 2002

Abstract

With the growth of statistical modeling in the ecological sciences, researchers are using more complex methods, such as artificial neural networks (ANNs), to address problems associated with pattern recognition and prediction. Although in many studies ANNs have been shown to exhibit superior predictive power compared to traditional approaches, they have also been labeled a ‘‘black box’’ because they provide little explanatory insight into the relative influence of the independent variables in the prediction process. This lack of explanatory power is a major concern to ecologists since the interpretation of statistical models is desirable for gaining knowledge of the causal relationships driving ecological phenomena. In this study, we describe a number of methods for understanding the mechanics of ANNs (e.g. Neural Interpretation Diagram, Garson’s algorithm, sensitivity analysis). Next, we propose and demonstrate a randomization approach for statistically assessing the importance of axon connection weights and the contribution of input variables in the neural network. This approach provides researchers with the ability to eliminate null-connections between neurons whose weights do not significantly influence the network output (i.e. predicted response variable), thus facilitating the interpretation of individual and interacting contributions of the input variables in the network. Furthermore, the randomization approach can identify variables that significantly contribute to network predictions, thereby providing a variable selection method for ANNs. We show that by extending randomization approaches to ANNs, the ‘‘black box’’ mechanics of ANNs can be greatly illuminated. Thus, by coupling this new explanatory power of neural networks with its strong predictive abilities, ANNs promise to be a valuable quantitative tool to evaluate, understand, and predict ecological phenomena. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Connection weights; Sensitivity analysis; Neural Interpretation Diagram; Garson’s algorithm; Statistical models

www.elsevier.com/locate/ecolmodel

* Corresponding author. Present address: Department of Biology, Graduate Degree Program in Ecology, Colorado State University, Fort Collins, CO 80523-1878, USA. Tel.: +1-970-491-5432; fax: +1-970-491-0649.

E-mail addresses: [email protected] (J.D. Olden), [email protected] (D.A. Jackson).

0304-3800/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.

PII: S0304-3800(02)00064-9


1. Introduction

Artificial neural networks (ANNs) are receiving increased attention in the ecological sciences as a powerful, flexible, statistical modeling technique for uncovering patterns in data (Colasanti, 1991; Edwards and Morse, 1995; Lek et al., 1996a; Lek and Guegan, 2000); a fact recently demonstrated during the first international workshop on the applications of neural networks to ecological modeling (conference papers are published in Ecological Modelling: Volume 120, Issue 2–3). The utility of ANNs for solving complex pattern recognition problems has been demonstrated in many terrestrial (e.g. Paruelo and Tomasel, 1997; Özesmi and Özesmi, 1999; Manel et al., 1999; Spitz and Lek, 1999) and aquatic studies (e.g. Lek et al., 1996a,b; Bastarache et al., 1997; Mastrorillo et al., 1998; Chen and Ware, 1999; Gozlan et al., 1999; Olden and Jackson, 2001; Scardi, 2001), and has led many researchers to advocate ANNs as an attractive, non-linear alternative to traditional statistical methods.

The primary application of ANNs involves the development of predictive models to forecast future values of a particular response variable from a given set of independent variables. Although the predictive value of ANNs appeals greatly to many ecologists, researchers have often criticized the explanatory value of ANNs, calling it a ‘‘black box’’ approach to modeling ecological phenomena (e.g. Paruelo and Tomasel, 1997; Lek and Guegan, 1999; Özesmi and Özesmi, 1999). This view stems from the fact that the contribution of the input variables in predicting the value of the output is difficult to disentangle within the network. Consequently, input variables are often entered into the network and an output value is generated without gaining any understanding of the inter-relationships between the variables, and therefore, providing no explanatory insight into the underlying mechanisms being modeled by the network (Anderson, 1995; Bishop, 1995; Ripley, 1996). The ‘‘black box’’ nature of ANNs is a major weakness compared to traditional statistical approaches that can readily quantify the influence of the independent variables in the modeling process, as well as provide a measure of the degree of confidence regarding their contribution. Currently, there is a lack of theoretical or practical ways to partition the contributions of the independent variables in ANNs (Smith, 1994); thus presenting a substantial drawback in the ecological sciences where the interpretation of statistical models is desirable for gaining insight into causal relationships driving ecological phenomena.

Recently, a number of methods have been proposed for selecting the best network architecture (i.e. number of neurons and topology of connections) among a set of candidate networks, e.g. asymptotic comparison techniques, approximate Bayesian analysis, and cross validation (Dimopoulos et al., 1995; see Bishop, 1995 for review). In contrast, methods for quantifying the independent variable contributions within networks are more complicated, and as a result are rarely used in ecological studies. For example, intensive computational approaches such as growing and pruning algorithms (Bishop, 1995), partial derivatives (e.g. Dimopoulos et al., 1995, 1999) and asymptotic t-tests are often not used in favour of simpler techniques that use network connection weights (e.g. Garson’s algorithm: Garson, 1991; Lek’s algorithm: Lek et al., 1996a). Although these simpler approaches provide a means of determining the overall influence of each predictor variable, interactions among the variables are more difficult to interpret since the strength and direction of individual axon connection weights within the network must be examined directly. Bishop (1995) discusses the use of pruning algorithms to remove connection weights that do not contribute to the predictive performance of the neural network. In brief, a pruning approach begins with a highly connected network (i.e. large number of connections among neurons), and then successively removes weak connections (i.e. small absolute weights) or connections that cause a minimal change in the network error function when removed. A logical question then arises: at what threshold value (i.e. absolute connection weight or change in network error) should weights be removed or retained in the network? In the present study, we propose a randomization test for ANNs to address this question. This randomization approach provides a statistical pruning technique for eliminating null connection weights that minimally influence the predicted output, as well as provides a selection method for identifying independent variables that significantly contribute to network predictions. By using randomization protocols to partition the importance of connection weights (in terms of their magnitude and direction), researchers will be able to quantitatively assess both the individual and interactive effects of the input variables in the network prediction process, as well as evaluate the overall contributions of the variables. Using an empirical example describing the relationship between fish species richness and habitat characteristics of north-temperate lakes, we illustrate the utility of the ANN randomization test, and compare its results to two commonly used approaches: Garson’s algorithm and sensitivity analysis.

2. Empirical example: fish species richness–habitat relationships in lakes

Throughout this paper we use an empirical example relating fish species richness to habitat conditions of 286 freshwater lakes located in Algonquin Provincial Park, south-central Ontario, Canada (45°50′ N, 78°20′ W). We tabulated species presence for each lake to examine relationships between fish species richness (ranging from 1 to 23) and a suite of habitat-related variables (8 in total). Predictor variables were chosen to include factors that have been shown to be related to critical habitat requirements of fish in this geographic region (Minns, 1989), including: surface area, lake volume, and total shoreline perimeter, which are correlated with habitat diversity; maximum depth, which is negatively correlated with winter dissolved-oxygen concentrations and related to thermal stratification; surface measurements (taken at depths ≤2.0 m) of pH and total dissolved solids, to provide an estimate of nutrient status and lake productivity; lake elevation, which is related to both habitat heterogeneity and colonization/extinction features of the lake; and growing degree-days, which is a surrogate for productivity.

3. Interpreting neural-network connection weights: an important caveat

We refrain from detailing the specifics of neural network optimization and design, and instead refer the reader to the extensive coverage provided in the texts by Smith (1994), Bishop (1995) and Ripley (1996), as well as articles by Ripley (1994) and Cheng and Titterington (1994). It is sufficient to say that the methods described in this paper refer to the classic family of one-hidden-layer, feed-forward neural networks trained by the backpropagation algorithm (Rumelhart et al., 1986). These neural networks are commonly used in ecological studies because they are suggested to be universal approximators of any continuous function (Hornik et al., 1989). N-fold cross validation was used to determine optimal network design as it provides a nearly unbiased estimate of prediction success (Olden and Jackson, 2000). We found that a neural network with four hidden neurons exhibited good predictive power (r = 0.72 between observed and predicted species richness).
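As a rough illustration of this model class (one hidden layer of four neurons, sigmoid transfer functions, gradient descent with a variable learning rate and momentum), the sketch below uses scikit-learn; this is an assumed modern substitute, not the authors' original MatLab implementation, and the placeholder arrays merely stand in for the lake data described in Sections 2 and 4.1. The cross-validation shown is ordinary k-fold rather than the exact N-fold procedure of Olden and Jackson (2000).

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
X = rng.normal(size=(286, 8))   # placeholder for the 8 standardized habitat variables
r = rng.uniform(size=286)       # placeholder for richness rescaled to [0, 1] (see Section 4.1)

net = MLPRegressor(hidden_layer_sizes=(4,),     # one hidden layer with four neurons
                   activation='logistic',       # sigmoid transfer function
                   solver='sgd', momentum=0.9,  # gradient descent with momentum
                   learning_rate='adaptive',    # learning rate adjusted during training
                   max_iter=5000, random_state=1)

pred = cross_val_predict(net, X, r, cv=10)      # cross-validated predictions
print(np.corrcoef(r, pred)[0, 1])               # correlation between observed and predicted

net.fit(X, r)
w_ih, w_ho = net.coefs_[0], net.coefs_[1].ravel()  # input-hidden and hidden-output weights
```

The fitted weight matrices `w_ih` (inputs × hidden) and `w_ho` (hidden-output) are the quantities examined in the remainder of the paper.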

In the neural network, the connection weights between neurons are the links between the inputs and the outputs, and therefore are the links between the problem and the solution. The relative contributions of the independent variables to the predictive output of the neural network depend primarily on the magnitude and direction of the connection weights. Input variables with larger connection weights represent greater intensities of signal transfer, and therefore are more important in the prediction process compared to variables with smaller weights. Negative connection weights represent inhibitory effects on neurons (reducing the intensity of the incoming signal) and decrease the value of the predicted response, whereas positive connection weights represent excitatory effects on neurons (increasing the intensity of the incoming signal) and increase the value of the predicted response.

Given the obvious importance of connection weights in assessing the relative contributions of the independent variables, there is one topic that we believe warrants additional attention. During the optimization process, it is necessary that the network converges to the global minimum of the fitting criterion (e.g. prediction error) rather than one of the many local minima. Connection weights in networks that have converged to a local minimum will differ from networks that have globally converged, thus resulting in the misinterpretation of variable contributions. Two approaches can be employed to ensure the greatest probability of network convergence to the global minimum. The first approach involves combining different local minima rather than choosing between them, for example, by averaging the outputs of networks using the connection weights corresponding to different local minima (e.g. Wolpert, 1992; Perrone and Cooper, 1993; Ripley, 1995). This approach would also presumably involve averaging the contributions of input variables across the networks representing each of the local minima. The second approach employs global optimization procedures where parameters such as learning rate, momentum or regularization are used during network training (e.g. White, 1989; Gelfand and Mitter, 1991; Ripley, 1994). The addition of learning rate and momentum parameters during optimization has been used in the ecological literature (e.g. Lek et al., 1996a; Mastrorillo et al., 1997a; Gozlan et al., 1999; Spitz and Lek, 1999; Olden and Jackson, 2001) because, in addition to reducing the problem of convergence to local minima, it also accelerates the optimization process. The learning rate regulates the magnitude of changes in the weights and biases during optimization, and the momentum mediates the contribution of the weight change in the previous iteration to the weight change in the current iteration. The values of the learning rate and momentum can be set constant or can vary during network optimization, although there are a number of disadvantages to holding them constant (see Bishop, 1995). Consequently, the values of both parameters are commonly modified by either increasing or decreasing them according to whether the error decreased or increased, respectively, during the previous iteration of network optimization (e.g. Hagan et al., 1996; Mastrorillo et al., 1998; Özesmi and Özesmi, 1999). In our study, we included learning rate and momentum parameters during the optimization process (defining them as a function of the error), and started the network optimization with random connection weights between −0.3 and 0.3. The variable learning rate and momentum parameters, and the small interval of initial random weights, ensured a high probability of global network convergence and thus provided greater confidence regarding the validity of the connection weights and their interpretation.

4. Illuminating the ‘‘black box’’

4.1. Preparation of the data

Prior to neural network optimization, the data set must be transformed so that the dependent and independent variables exhibit particular distributional characteristics. The dependent variable must be converted to the range [0…1] so that it conforms to the demands of the transfer function (the sigmoid function) used in building the neural network. This is accomplished by using the formula:

r_n = (y_n − min(Y)) / (max(Y) − min(Y))    (1)

where r_n is the converted response value for observation n, y_n is the original response value for observation n, and min(Y) and max(Y) represent the minimum and maximum values, respectively, of the response variable Y. Note that the dependent variable does not have to be converted when modeling a binary response variable (e.g. species presence/absence) because its values already fall within this range.

To standardize the measurement scales of the network inputs, the independent variables are converted to z-scores (i.e. mean = 0, standard deviation = 1) using the formula:

z_n = (x_n − X̄) / σ_X    (2)

where z_n is the standardized value of observation n, x_n is the original value of observation n, and X̄ and σ_X are the mean and standard deviation of the variable X. It is essential to standardize the input variables so that the same percentage change in the weighted sum of the inputs causes a similar percentage change in the unit output. Both the dependent and independent variables of the richness–habitat data set were modified using the above formulas.
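The two transformations above can be written compactly; the following is a minimal sketch in Python/NumPy (an assumed environment, not the authors' code), with `y` the raw response vector and `X` the matrix of raw predictor columns.

```python
import numpy as np

def scale_response(y):
    """Eq. (1): rescale the response to the range [0, 1] for a sigmoid output unit."""
    y = np.asarray(y, dtype=float)
    return (y - y.min()) / (y.max() - y.min())

def standardize_predictors(X):
    """Eq. (2): convert each predictor column to z-scores (mean 0, standard deviation 1)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Example with arbitrary values (not the Algonquin lake data):
richness = np.array([1, 5, 12, 23, 7])
habitat = np.random.default_rng(0).normal(size=(5, 8))  # 8 habitat variables
r = scale_response(richness)         # values in [0, 1]
Z = standardize_predictors(habitat)  # column means ~0, standard deviations ~1
```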

4.2. Methods for quantifying input variable contributions in ANNs

In the following section, we detail a series of methods that are available to aid in the interpretation of connection weights and variable contributions in neural networks. These approaches have been used by ecologists and represent a set of techniques for understanding neuron connections in networks. Next, we extend a randomization approach to these methods, illustrating how connection weights and the overall influence of the input variables in the network can be assessed statistically.

4.2.1. Neural Interpretation Diagram (NID)

Recently, a number of investigators have advocated using axon connection weights to interpret predictor variable contributions in neural networks (e.g. Aoki and Komatsu, 1999; Chen and Ware, 1999). Özesmi and Özesmi (1999) proposed the Neural Interpretation Diagram (NID) for providing a visual interpretation of the connection weights among neurons, where the relative magnitude of each connection weight is represented by line thickness (i.e. thicker lines representing greater weights) and line shading represents the direction of the weight (i.e. black lines representing positive, excitatory signals and gray lines representing negative, inhibitory signals). Tracking the magnitude and direction of weights between neurons enables researchers to identify individual and interacting effects of the input variables on the output. Fig. 1 illustrates the NID for the empirical example and shows the relative influence of each habitat factor in predicting fish species richness. The relationship between the inputs and outputs is determined in two steps since there are input-hidden layer connections and hidden-output layer connections. Positive effects of input variables are depicted by positive input-hidden and positive hidden-output connection weights, or negative input-hidden and negative hidden-output connection weights. Negative effects of input variables are depicted by positive input-hidden and negative hidden-output connection weights, or by negative input-hidden and positive hidden-output connection weights. Therefore, the multiplication of the two connection weight directions (positive or negative) indicates the effect that each input variable has on the output variable. Interactions among predictor variables can be identified as input variables with opposing connection weights entering the same hidden neuron.

Fig. 1. NID for the neural network modeling fish species richness as a function of eight habitat variables. The thickness of the lines joining neurons is proportional to the magnitude of the connection weight, and the shade of the line indicates the direction of the interaction between neurons: black connections are positive (excitatory) and gray connections are negative (inhibitory).

Fig. 2. Bar plots showing the percentage relative importance of each habitat variable for predicting fish species richness based on Garson’s algorithm (Garson, 1991). See Box 1 for sample calculations.
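The sign logic just described (the product of the input-hidden and hidden-output weight directions) can be read off the weight matrices directly. The sketch below is a minimal illustration, assuming `w_ih` is the (inputs × hidden) weight matrix and `w_ho` the vector of hidden-output weights; the function and variable names are ours, not from the original article.

```python
import numpy as np

def effect_directions(w_ih, w_ho):
    """Sign of each input's effect on the output through each hidden neuron.

    w_ih : array (n_inputs, n_hidden) of input-hidden weights
    w_ho : array (n_hidden,) of hidden-output weights
    Returns an (n_inputs, n_hidden) array of +1/-1 values: the product of the
    two weight directions, as read off a Neural Interpretation Diagram.
    """
    return np.sign(w_ih) * np.sign(np.asarray(w_ho).reshape(1, -1))

def opposing_sign_neurons(directions):
    """Hidden neurons receiving inputs with opposing effect directions,
    i.e. candidate interactions among those input variables."""
    return [j for j in range(directions.shape[1])
            if np.unique(directions[:, j]).size > 1]
```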

The interpretation of connection weights, and more specifically NIDs, is not an easy task because of the complexity of connections among the neurons (Fig. 1). Additional hidden neurons would only make this interpretation more difficult. Furthermore, a subjective choice must be made regarding the magnitude at which connection weights should be interpreted. These considerations make the direct examination of connection weights challenging at best and virtually impossible in data sets with large numbers of variables. We show later that a randomization approach can aid in the interpretation of NIDs by identifying non-significant connection weights that can be removed.

4.2.2. Garson’s algorithm

Garson (1991) proposed a method, later modified by Goh (1995), for partitioning the neural network connection weights in order to determine the relative importance of each input variable in the network (see Box 1 for a summary of the protocol). This approach has been used in a number of ecological studies, including Mastrorillo et al. (1997b, 1998), Gozlan et al. (1999), Aurelle et al. (1999), and Brosse et al. (1999, 2001). It is important to note that Garson’s algorithm uses the absolute values of the connection weights when calculating variable contributions, and therefore does not provide the direction of the relationship between the input and output variables. Fig. 2 illustrates the results from our empirical example, highlighting the relative importance of the habitat variables. Predictor contributions ranged from 6 to 18%, with lake area and elevation exhibiting the strongest relationships with predicted species richness, and lake volume and pH showing the weakest relationships.
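Box 1 of the article gives worked sample calculations; the sketch below is a compact rendering of Garson’s algorithm (with Goh’s 1995 modification) under the same `w_ih`/`w_ho` conventions as above. Because only absolute weights enter the calculation, the result measures the magnitude, not the direction, of each variable’s contribution.

```python
import numpy as np

def garson_importance(w_ih, w_ho):
    """Relative importance (%) of each input variable (Garson, 1991; Goh, 1995).

    w_ih : (n_inputs, n_hidden) input-hidden connection weights
    w_ho : (n_hidden,) hidden-output connection weights
    """
    w_ih = np.abs(w_ih)
    w_ho = np.abs(np.asarray(w_ho)).reshape(1, -1)
    c = w_ih * w_ho                       # |input-hidden| x |hidden-output| per pathway
    q = c / c.sum(axis=0, keepdims=True)  # share of each hidden neuron taken by each input
    ri = q.sum(axis=1)                    # sum of shares over hidden neurons
    return 100.0 * ri / ri.sum()          # percentage relative importance
```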


4.2.3. Sensitivity analysis

A number of investigators have used sensitivity analysis to determine the spectrum of input variable contributions in neural networks. Recently, a number of alternative types of sensitivity analysis have been proposed in the ecological literature. For example, the Senso-nets approach includes an additional weight in the network for each input variable representing the variable’s sensitivity (Schleiter et al., 1999). Scardi and Harding (1999) added white noise to each input variable and examined the resulting changes in the mean square error of the output. Traditional sensitivity analysis involves varying each input variable across its entire range while holding all other input variables constant, so that the individual contributions of each variable are assessed. This approach is somewhat cumbersome, however, because there may be an overwhelming number of variable combinations to examine. As a result, it is common first to calculate a series of summary measures for each of the input variables (e.g. minimum, maximum, quartiles, percentiles), and then vary each input variable from its minimum to maximum value, in turn, while all other variables are held constant at each of these measures (e.g. Özesmi and Özesmi, 1999). Relationships between each input variable and the response can be examined for each summary measure, or the calculated response can be averaged across the summary measures. Holding the input variables constant at a small number of values provides a more manageable sensitivity analysis, yet still requires a great deal of time because each value of the input variable must be examined. Consequently, Lek et al. (1995, 1996a,b) suggested examining only 12 data values delimiting 11 equal intervals over the variable range rather than examining its entire range (this has been termed Lek’s algorithm). Contribution plots can be constructed by averaging the response value across all summary statistics for each of the 12 values of the input variable of interest. Many studies have employed Lek’s algorithm, including Lek et al. (1995, 1996a), Mastrorillo et al. (1997a, 1998), Guegan et al. (1998), Lae et al. (1999), Lek-Ang et al. (1999), and Spitz and Lek (1999). In this study, we constructed contribution plots for each of the eight predictor variables in the neural network by varying each input variable across its entire range and holding all other variables constant at their 20th, 40th, 60th and 80th percentiles (Fig. 3); a sketch of this procedure in code follows the list below. It is evident from the contribution plots that the influence of the habitat variables on predicted species richness in the study lakes varies greatly depending on the summary value at which the other input variables are held. Although the predicted output may exhibit any number of relationships with the independent variables, below is a summary of the response curves observed in our empirical example.

• Gaussian response curve: the input variable contributes greatest at intermediate values, and exhibits decreasing influence at low and high values, e.g. the influence of pH and growing-degree days on species richness.

• Bimodal response curve: the input variable contributes greatest at low and high values, and exhibits minimal influence at intermediate values, e.g. the influence of surface area, maximum depth, shoreline perimeter and total dissolved solids on species richness when all other variables are low in value.

• Left-skewed response curve: the input variable contributes greatest at high values, and exhibits minimal influence at low and intermediate values, e.g. the influence of lake elevation on species richness.

• Right-skewed response curve: the input variable contributes greatest at low values, and exhibits minimal influence at intermediate and high values, e.g. the influence of total dissolved solids on species richness, and the influence of overall lake size (i.e. surface area, maximum depth, volume and shoreline perimeter) on species richness when all other variables are intermediate in value.

• Decreasing response curve: the input variable contributes decreasingly at increasing values, e.g. the influence of surface area on species richness when all other variables are high in value.

• Flat response curve: the input variable contributes minimally across its entire range, e.g. the influence of growing-degree days on species richness when all other variables are high in value.
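A minimal sketch of the percentile-based sensitivity analysis used to build Fig. 3 is given below; `predict` stands for any trained network’s prediction function (for example, `net.predict` from the earlier scikit-learn sketch), and the 12 evaluation points follow Lek’s algorithm. The helper name and arguments are ours.

```python
import numpy as np

def contribution_profile(predict, X, var_idx, percentiles=(20, 40, 60, 80), n_points=12):
    """Vary one input over its range while holding the others at fixed percentiles.

    predict : callable mapping an (n, p) array of standardized inputs to predictions
    X       : (n, p) array of (standardized) input data
    var_idx : column index of the input variable to profile
    Returns (grid, curves) where curves[k] is the predicted response curve with all
    other variables held at the k-th percentile (Lek et al., 1995, 1996a).
    """
    grid = np.linspace(X[:, var_idx].min(), X[:, var_idx].max(), n_points)
    curves = []
    for p in percentiles:
        base = np.percentile(X, p, axis=0)   # hold every variable at this percentile...
        probe = np.tile(base, (n_points, 1))
        probe[:, var_idx] = grid             # ...except the variable of interest
        curves.append(predict(probe))
    return grid, np.vstack(curves)
```

Plotting each row of `curves` against `grid` reproduces the style of contribution plot shown in Fig. 3, one line per percentile.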


Fig. 3. Contribution plots from the sensitivity analysis illustrating the neural network response curves to changes in each habitat variable with all other variables held at their 20th (· · · ·), 40th (-----), 60th (- - -) and 80th (—) percentile.


4.2.4. Randomization test for artificial neural networks

We propose a randomization test for input-hidden-output connection weight selection in neural networks. By eliminating null-connection weights that do not differ significantly from random, we can simplify the interpretation of neural networks by reducing the number of axon pathways that have to be examined for direct and indirect (i.e. interaction) effects on the response variable, for instance when using NIDs. This objective is similar to statistical pruning techniques (e.g. asymptotic t-tests), yet does not have to conform to the assumptions of parametric and non-parametric methods because the randomization approach empirically constructs the distribution of expected values under the null hypothesis for the test statistic (i.e. the connection weight) from the data at hand. Moreover, the randomization approach can be used as a variable selection method for ANNs by summing across input-hidden-output connection weights or calculating the relative importance (i.e. Garson’s algorithm) for each input variable. This approach provides a quantitative tool for selecting statistically significant input variables for inclusion into the network, again reducing network complexity and assisting in the network interpretation. The following is the randomization protocol for testing the statistical significance of connection weights and input variables:

1. construct a number of neural networks using the original data with different initial random weights;

2. select the neural network with the best predictive performance, record the initial random connection weights used in constructing this network, and calculate and record:
(a) input-hidden-output connection weights: the product of input-hidden and hidden-output connection weights for each input and hidden neuron (e.g. observed c_A1: step 2, Box 1);
(b) overall connection weight: the sum of the input-hidden-output connection weights for each input variable (e.g. observed c_1 = c_A1 + c_B1);
(c) relative importance (%) for each input variable based on Garson’s algorithm (e.g. observed RI_1: step 4, Box 1);

3. randomly permute the original response variable (y_random);

4. construct a neural network using y_random and the initial random connection weights; and

5. repeat steps (3) and (4) a large number of times (i.e. 999 times in this study), each time recording 2(a), (b) and (c), e.g. randomized c_A1, randomized c_1, and randomized RI_1.

The statistical significance of each input-hidden-output connection weight, overall connection weight and relative importance of each input variable (e.g. observed c_A1, observed c_1 and observed RI_1) can be calculated as the proportion of randomized values (e.g. randomized c_A1, randomized c_1 and randomized RI_1), including the observed value itself, that are equal to or more extreme than the observed value. Fig. 4 illustrates the distribution of randomized input-hidden-output connection weights (for hidden neuron B), overall connection weight and relative importance of surface area for predicting species richness of lakes.
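The protocol and the p-value calculation can be sketched as follows. `train_network` is an assumed helper that refits the network on the permuted data from the same initial random weights (with the scikit-learn sketch above, fixing `random_state` has this effect), `garson_importance` is the function from the Section 4.2.2 sketch, and ‘‘more extreme’’ is interpreted here by absolute value; none of these choices is prescribed by the original article beyond what the protocol states.

```python
import numpy as np

def randomization_test(X, y, train_network, n_perm=999, seed=0):
    """Randomization p-values for connection-weight products, overall connection
    weights, and Garson relative importance (steps 2-5 of the protocol).

    train_network : callable (X, y) -> (w_ih, w_ho), refitting the network from the
                    SAME initial random weights each time.
    """
    rng = np.random.default_rng(seed)

    def statistics(X_, y_):
        w_ih, w_ho = train_network(X_, y_)
        products = w_ih * np.asarray(w_ho).reshape(1, -1)  # step 2(a)
        overall = products.sum(axis=1)                     # step 2(b)
        ri = garson_importance(w_ih, w_ho)                 # step 2(c), from Section 4.2.2 sketch
        return products, overall, ri

    obs = statistics(X, y)
    counts = [np.ones_like(s, dtype=float) for s in obs]   # the observed value counts once
    for _ in range(n_perm):                                # steps 3-5: permute response, refit
        rand = statistics(X, rng.permutation(y))
        for c, o, r in zip(counts, obs, rand):
            c += np.abs(r) >= np.abs(o)                    # at least as extreme (by absolute value)
    return [c / (n_perm + 1) for c in counts]              # p-values for 2(a), 2(b), 2(c)
```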

Table 1 contains the connection weight structure for the neural network and the associated p-values from the randomization tests. The results show that only a fraction of the total 32 input-hidden-output connections (i.e. 8 inputs × 4 hidden neurons) are statistically different from what would be expected based on chance alone. For instance, only six input-hidden-output connections are significant at α = 0.05. The results also show that when all connection weights are accounted for (i.e. the overall connection weight), lake size (i.e. surface area, maximum depth, volume and shoreline perimeter) and pH are positively associated with species richness, while elevation, total dissolved solids and growing-degree days are negatively associated with species richness. However, only the influences of maximum depth and shoreline perimeter are statistically significant (Table 1). Interestingly, the results from the randomization test using relative importance (derived from Garson’s algorithm) differ from the results using overall connection weights. Using Garson’s algorithm, surface area was the only significant factor correlated with species richness, and elevation was marginally nonsignificant (Table 1). The discrepancy between the two approaches results from the different ways that the methods use the network connection weights.


Table 1
Axon connection weights for the neural network modeling fish species richness as a function of eight habitat variables

[Table layout: one row per predictor variable (Area (ha), Max. depth (m), Volume (m³), Sh. per. (km), Elevation (m), TDS (mg/l), pH, GDD); for each row, the columns give the input-hidden-output connection weight and randomization P value for hidden neurons A–D (W_Ai, W_Bi, W_Ci, W_Di), the overall connection weight (ΣW_Ai…Di) with its P value, and Garson’s relative importance (%) with its P value. Numeric entries are not reproduced here.]

W_ij represents the input-hidden-output connection weight for hidden neuron i (where i = A–D) and input variable j (where j = 1–8). P values for input-hidden-output connection weights (W_Ai, W_Bi, W_Ci, and W_Di), overall connection weights (ΣW_Ai…Di), and Garson’s relative importance (%) are based on 999 randomizations. Italics indicate statistical significance based on α = 0.05. Abbreviations Sh. per., TDS and GDD refer to shoreline perimeter, total dissolved solids, and growing-degree days, respectively.


Garson’s algorithm uses absolute connection weights to calculate the influence of each input variable on the response (see Box 1), whereas the overall connection weight is calculated using the original values. Examining Fig. 5 (α = 0.05), we can show that Garson’s algorithm can be potentially misleading for the interpretation of input variable contributions. It is evident that lake elevation shows a strong, positive association with species richness through hidden neuron A, but a strong, negative relationship with richness through hidden neuron D. Based on absolute weights, Garson’s algorithm indicates a large relative importance of that variable because both connection weights have large magnitudes (e.g. for all hidden neurons RI = 17.94%: Table 1). However, in such a case the influence of the input variable on the response is actually negligible since the positive influence through hidden neuron A is counteracted by the negative influence through hidden neuron D (e.g. for all hidden neurons ΣW_A1…D1 = −1.17, P = 0.217: Table 1). For this reason, we believe caution should be employed when making inferences from the results generated by Garson’s algorithm since the direction of the input–output interaction is not taken into account.
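This cancellation effect can be reproduced with purely hypothetical weights (not the fitted lake network): an input whose products through two hidden neurons are large but opposite in sign receives a large Garson importance even though its overall connection weight is near zero, whereas a weaker but consistent input does not. The sketch below reuses `garson_importance` from the Section 4.2.2 sketch.

```python
import numpy as np

# Hypothetical weights for two inputs and two hidden neurons (illustration only).
w_ih = np.array([[ 4.0, -4.0],   # input 1: strong + through neuron A, strong - through neuron B
                 [ 1.0,  1.0]])  # input 2: modest + through both neurons
w_ho = np.array([1.0, 1.0])      # hidden-output weights

products = w_ih * w_ho
print(products.sum(axis=1))              # overall connection weights -> [0.0, 2.0]
print(garson_importance(w_ih, w_ho))     # absolute-value importance -> [80.0, 20.0] (%)
```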

Using the results of the randomization test, we removed non-significant connection weights from the NID (originally shown in Fig. 1), resulting in Fig. 5, which illustrates only connection weights that were statistically significantly different from random at α = 0.05 and α = 0.10. Focusing on hidden neuron C in Fig. 5 (α = 0.10), it is apparent that as maximum depth and shoreline perimeter increase, and growing-degree days decreases, species richness increases in the study lakes. Furthermore, interactions among habitat factors can be identified as input variables with contrasting connection weights (i.e. opposite directions) entering the same hidden neuron. For example, in examining hidden neuron D it is evident that lake shoreline perimeter interacts with lake elevation. An increase in lake elevation decreases predicted species richness; however, this negative effect weakens as shoreline perimeter increases. Therefore, there is an interaction between lake elevation and shoreline perimeter in that high elevation lakes with convoluted shorelines have greater species richness compared to high elevation lakes with simple shorelines. The NID also identifies input variables that do not interact, for example lake volume, because this variable does not exhibit significant weights with contrasting effects at any single hidden neuron with any of the other variables.

Fig. 4. Distributions of random input-hidden-output connection weights for hidden neuron B, overall connection weight, and input relative importance (%) for the influence of surface area on lake species richness. Arrows represent the observed input-hidden-output connection weight for hidden neuron B (7.81), overall connection weight (7.43) and relative importance (18.27%).

Fig. 5. NID after non-significant input-hidden-output connection weights are eliminated using the randomization test (i.e. connection weights statistically different from zero based on α = 0.05 and α = 0.10). The thickness of the lines joining neurons is proportional to the magnitude of the connection weight, and the shade of the line indicates the direction of the interaction between neurons: black connections are positive (excitatory) and gray connections are negative (inhibitory). Black input neurons indicate habitat variables that have an overall positive influence on species richness, and gray input neurons indicate an overall negative influence on species richness (based on overall connection weights).

Box 1. Garson’s algorithm for partitioning and quantifying neural network connection weights. Sample calculations shown for three input neurons (1, 2 and 3), two hidden neurons (A and B), and one output neuron.

The randomization test can also be used as a variable selection method for removing input and hidden neurons for which incoming or outgoing connection weights are not significantly different from random. For example, no significant weights originate from the maximum depth, volume and shoreline perimeter input neurons in Fig. 5 (α = 0.05), indicating that these variables do not contribute significantly to predicted values of species richness. These neurons (i.e. predictor variables) could be removed from the analysis with little loss of predictive power. Similarly, hidden neurons lacking connections to significant weights could also be removed (this does not occur in our empirical example, but see Olden and Jackson, 2001). In summary, a randomization approach to neural networks can aid greatly in the identification and interpretation of direct and indirect (i.e. interaction between input variables) contributions of input variables in ANNs. We refer the reader to Olden (2000) and Olden and Jackson (2001) for additional studies using the randomization test.

Two important components of the randomization test involved the optimization of the neural network. First, we conducted the randomization test for the product of input-hidden and hidden-output weights, rather than for each input-hidden and hidden-output connection weight separately, because the direction of the connection weights (i.e. positive or negative) can switch between different networks optimized with the same data (i.e. symmetric interchanges of weights: Ripley, 1994). For instance, the input-hidden and hidden-output weights might both be positive in one network and both negative in another, but in both cases the input variable exerts a positive influence on the response variable. To remove this problem, we examined the product of the input-hidden-output weights because the true direction of the relationship between the input and output will be conserved. Second, optimizing the neural network several times (i.e. constructing a number of networks with the same data but different initial random weights) can result in neural networks with identical predictive performance, but quite different connection weights. Therefore, if different initial random weights are used for each randomization, dissimilarities between the observed and random connection weights cannot be differentiated from the differences arising solely from different initial connection weights. Using the same initial random weights for each randomized network alleviates this problem.

5. Conclusion

We reiterate the concern raised by a number of ecologists and explicit to our paper: are ANNs a black box approach for modeling ecological phenomena? In light of the synthesis provided here, we argue the answer is unequivocally no. We have reviewed a series of methods, ranging from qualitative (i.e. NIDs) to quantitative (i.e. Garson’s algorithm and sensitivity analysis), for interpreting neural-network connection weights, and have demonstrated the utility of these methods for shedding light on the inner workings of neural networks. These methods provide a means for partitioning and interpreting the contribution of input variables in the neural network modeling process. In addition, we described a randomization procedure for testing the statistical significance of these contributions in terms of individual connection weights and the overall influence of each input variable. The former case facilitates the interpretation of direct and interacting effects of input variables on the response by removing connection weights that do not contribute significantly to the performance of the neural network. In the latter case, the randomization test assesses whether the contribution of a particular input variable on the response differs from what would be expected by chance. The randomization procedure enables the removal of null neural pathways and non-significant input variables, thereby aiding in the interpretation of the neural network by reducing its complexity. In conclusion, by coupling the explanatory insight of neural networks with their powerful predictive abilities, ANNs have great promise in ecology as a tool to evaluate, understand, and predict ecological phenomena.

Acknowledgements

We thank Sovan Lek for his insightful comments regarding the finer points of artificial neural networks, and for providing some of the original MatLab code for this study. This manuscript was greatly improved by the comments of Nick Collins, Brian Shuter, Keith Somers and an anonymous reviewer. Funding for this research was provided by a Graduate Scholarship from the Natural Sciences and Engineering Research Council of Canada (NSERC) to J.D. Olden, and an NSERC Research Grant to D.A. Jackson.


References

Anderson, J.A., 1995. An Introduction to Neural Networks. MIT, Cambridge, MA, 650 pp.

Aoki, I., Komatsu, T., 1999. Analysis and prediction of the fluctuation of sardine abundance using a neural network. Oceanol. Acta 20, 81–88.

Aurelle, D., Lek, S., Giraudel, J.-L., Berrebi, P., 1999. Microsatellites and artificial neural networks: tools for the discrimination between natural and hatchery brown trout (Salmo trutta, L.) in Atlantic populations. Ecol. Model. 120, 313–324.

Bastarache, D., El-Jabi, N., Turkkan, N., Clair, T.A., 1997. Predicting conductivity and acidity for small streams using neural networks. Can. J. Civ. Eng. 24, 1030–1039.

Bishop, C.M., 1995. Neural Networks for Pattern Recognition. Clarendon Press, Oxford.

Brosse, S., Guegan, J.F., Tourenq, J.N., Lek, S., 1999. The use of neural networks to assess fish abundance and spatial occupancy in the littoral zone of a mesotrophic lake. Ecol. Model. 120, 299–311.

Brosse, S., Lek, S., Townsend, C.R., 2001. Abundance, diversity, and structure of freshwater invertebrates and fish communities: an artificial neural network approach. New Zealand J. Mar. Fresh. Res. 35, 135–145.

Chen, D.G., Ware, D.M., 1999. A neural network model for forecasting fish stock recruitment. Can. J. Fish. Aquat. Sci. 56, 2385–2396.

Cheng, B., Titterington, D.M., 1994. Neural networks: a review from a statistical perspective (with discussion). Stat. Sci. 9, 2–54.

Colasanti, R.L., 1991. Discussions of the possible use of neural network algorithms in ecological modeling. Binary 3, 13–15.

Dimopoulos, Y., Bourret, P., Lek, S., 1995. Use of some sensitivity criteria for choosing networks with good generalization. Neural Process. Lett. 2, 1–4.

Dimopoulos, I., Chronopoulos, J., Chronopoulos-Sereli, A., Lek, S., 1999. Neural network models to study relationships between lead concentration in grasses and permanent urban descriptors in Athens city (Greece). Ecol. Model. 120, 157–165.

Edwards, M., Morse, D.R., 1995. The potential for computer-aided identification in biodiversity research. Trends Ecol. Evol. 10, 153–158.

Garson, G.D., 1991. Interpreting neural-network connection weights. Artif. Intell. Expert 6, 47–51.

Gelfand, S.B., Mitter, S.K., 1991. Recursive stochastic algorithms for global optimization in R^d. SIAM J. Control Optimization 29, 999–1018.

Goh, A.T.C., 1995. Back-propagation neural networks for modeling complex systems. Artif. Intell. Eng. 9, 143–151.

Gozlan, R.E., Mastrorillo, S., Copp, G.H., Lek, S., 1999. Predicting the structure and diversity of young-of-the-year fish assemblages in large rivers. Fresh. Biol. 41, 809–820.

Guegan, J.F., Lek, S., Oberdorff, T., 1998. Energy availability and habitat heterogeneity predict global riverine fish diversity. Nature 391, 382–384.

Hagan, M.T., Demuth, H.B., Beale, M.H., 1996. Neural Network Design. PWS Publishing, Boston, MA.

Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2, 359–366.

Lae, R., Lek, S., Moreau, J., 1999. Predicting fish yield of African lakes using neural networks. Ecol. Model. 120, 325–335.

Lek, S., Beland, A., Dimopoulos, I., Lauga, J., Moreau, J., 1995. Improved estimation, using neural networks, of the food consumption of fish populations. Mar. Freshwater Res. 46, 1229–1236.

Lek, S., Belaud, A., Baran, P., Dimopoulos, I., Delacoste, M., 1996b. Role of some environmental variables in trout abundance models using neural networks. Aquat. Living Resour. 9, 23–29.

Lek, S., Delacoste, M., Baran, P., Dimopoulos, I., Lauga, J., Aulagnier, S., 1996a. Application of neural networks to modeling nonlinear relationships in ecology. Ecol. Model. 90, 39–52.

Lek, S., Guegan, J.F., 1999. Artificial neural networks as a tool in ecological modeling, an introduction. Ecol. Model. 120, 65–73.

Lek, S., Guegan, J.F. (Eds.), 2000. Artificial Neuronal Networks, Applications to Ecology and Evolution. Springer-Verlag, New York, USA.

Lek-Ang, S., Deharveng, L., Lek, S., 1999. Predictive models of collembolan diversity and abundance in a riparian habitat. Ecol. Model. 120, 247–260.

Manel, S., Dias, J.-M., Ormerod, S.J., 1999. Comparing discriminant analysis, neural networks and logistic regression for predicting species distributions: a case study with a Himalayan river bird. Ecol. Model. 120, 337–347.

Mastrorillo, S., Lek, S., Dauba, F., 1997a. Predicting the abundance of minnow Phoxinus phoxinus (Cyprinidae) in the River Ariege (France) using artificial neural networks. Aquat. Living Resour. 10, 169–176.

Mastrorillo, S., Lek, S., Dauba, F., Beland, A., 1997b. The use of artificial neural networks to predict the presence of small-bodied fish in a river. Fresh. Biol. 38, 237–246.

Mastrorillo, S., Dauba, F., Oberdorff, T., Guegan, J.F., Lek, S., 1998. Predicting local fish species richness in the Garonne River basin. C.R. Acad. Sci. Paris, Sciences de la vie 321, 423–428.

Minns, C.K., 1989. Factors affecting fish species richness in Ontario lakes. Trans. Am. Fish. Soc. 118, 533–545.

Olden, J.D., 2000. An artificial neural network approach for studying phytoplankton succession. Hydrobiologia 436, 131–143.

Olden, J.D., Jackson, D.A., 2000. Torturing data for the sake of generality: how valid are our regression models? Ecoscience 7, 501–510.

Olden, J.D., Jackson, D.A., 2001. Fish-habitat relationships in lakes: gaining predictive and explanatory insight by using artificial neural networks. Trans. Am. Fish. Soc. 130, 878–897.

Özesmi, S.L., Özesmi, U., 1999. An artificial neural network approach to spatial habitat modeling with interspecific interaction. Ecol. Model. 116, 15–31.

Paruelo, J.M., Tomasel, F., 1997. Prediction of functional characteristics of ecosystems: a comparison of artificial neural networks and regression models. Ecol. Model. 98, 173–186.

Perrone, M.P., Cooper, L.N., 1993. When networks disagree: ensemble methods for hybrid neural networks. In: Mammone, R.J. (Ed.), Artificial Neural Networks for Speech and Vision. Chapman and Hall, London, pp. 126–147.

Ripley, B.D., 1994. Neural networks and related methods for classification. J. R. Stat. Soc. B 56, 409–456.

Ripley, B.D., 1995. Statistical ideas for selecting network architectures. In: Kappen, B., Gielen, S. (Eds.), Neural Networks: Artificial Intelligence and Industrial Applications. Springer, London, pp. 183–190.

Ripley, B.D., 1996. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge.

Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning representations by back-propagating errors. Nature 323, 533–536.

Scardi, M., 2001. Advances in neural network modeling of phytoplankton primary production. Ecol. Model. 146, 33–46.

Scardi, M., Harding, L.W., 1999. Developing an empirical model of phytoplankton primary production: a neural network case study. Ecol. Model. 120, 213–223.

Schleiter, I.M., Borchardt, D., Wagner, R., Dapper, T., Schmidt, K.-D., Schmidt, H.-H., Werner, H., 1999. Modelling water quality, bioindication and population dynamics in lotic ecosystems using neural networks. Ecol. Model. 120, 271–286.

Smith, M., 1994. Neural Networks for Statistical Modeling. Van Nostrand Reinhold, New York, USA.

Spitz, F., Lek, S., 1999. Environmental impact prediction using neural network modeling. An example in wildlife damage. J. Appl. Ecol. 36, 317–326.

White, H., 1989. Learning in artificial neural networks: a statistical perspective. Neural Computation 1, 425–464.

Wolpert, D.H., 1992. Stacked generalization. Neural Networks 5, 241–259.