

Biologically Inspired Cognitive Architectures (2013) 6, 126–130

Available at www.sciencedirect.com

Journal homepage: www.elsevier.com/locate/bica

RESEARCH ARTICLE

Variable structure dynamic artificial neural networks

2212-683X/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.bica.2013.05.001

* Corresponding author: Address: U. Tennessee, Dept. of EECS, Laboratory for Information Technologies, Knoxville, TN 37996-2250, USA. Tel.: +1 865 974 9187.

E-mail address: [email protected] (C.D. Schuman).

Catherine D. Schuman*, J. Douglas Birdwell

U. Tennessee, Dept. of EECS, Knoxville, TN 37996-2250, USA

Received 14 March 2013; received in revised form 7 May 2013; accepted 15 May 2013

KEYWORDS: Discrete-event systems; Neural networks; Evolutionary algorithms; Cyber-physical systems; Cyber security

Abstract

We introduce a discrete-event artificial neural network structure inspired by biological neural networks. It includes dynamic components and has variable structure. The network's topology and its dynamic components are modifiable and trainable for different applications. Such adaptation of the network's parameters, structure, and dynamic components makes it easier than with other types of networks to adapt to varying behaviors arising from the problem's structure. We demonstrate that this type of network structure can detect random changes in packet arrival rates in computer network traffic, with possible applications in cyber security.
© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Since their introduction in McCulloch and Pitts (1943), artificial neural networks (ANNs) have diverged from their biological counterparts. The topologies of most traditional ANNs (the neurons and how they are connected) and the dynamic components of the system (such as integration elements) are fixed before training occurs. In this class of structures, training affects only the values of the connection weights. Training algorithms for these ANNs include back-propagation and evolutionary algorithms (Yao, 1999). Deep learning architectures also fall into this category,

though those are often trained using a combination of unsupervised and supervised learning (Bengio, 2009).

A second class of ANNs is variable-structure but otherwise static, in the sense that the dynamic components are fixed prior to training. Both the topology and the parameters of the topology (weights of synapses) are trained. These networks are typically trained using either evolutionary algorithms (Stanley & Miikkulainen, 2002; Floreano, Dürr, & Mattiussi, 2008; Siebel, Bötel, & Sommer, 2009), or a combination of evolutionary algorithms and back-propagation (White & Ligomenides, 1993; Liu & Yao, 1996; Alba & Chicano, 2004).

Biological neural networks incorporate dynamic behaviors in their neurons. Our goal is to move ANNs closer to their biological counterparts by incorporating similar dynamic characteristics. We define a third class of ANN structure: variable structure dynamic ANNs, in which the dynamic components are part of the network and are not placed a priori.


Dynamic networks without variable structure have been implemented in hardware to mimic the brain's architecture, but they are fixed-structure dynamic ANNs (Merolla et al., 2011; Rajendran et al., 2013). Our training algorithm (an evolutionary algorithm) for this class of networks determines the topology of the network, the parameters of that topology, and the dynamics of the network needed to solve a specified problem. We believe this structure is attractive because it allows an integrated learning approach that optimizes simultaneously over both structure and parameters. Compared to other approaches, it also eliminates the need to pre-define large numbers of pathways in the network that may not be used. The structure is driven by discrete events, and all information flows within the network are event-based, more accurately replicating the brain's neural functions. The structure can learn to solve complex problems and offers a natural way to design discrete-event systems.

2. Technical approach

Our proposed model of a neuron is based on the Hodgkin–Huxley model (Trappenberg, 2010). This model is also similar to work on spiking neurons (Maass, 1997; Izhikevich, 2003; Izhikevich, 2004). However, our neurons are used in a computational neural network that is trained to complete a task, rather than attempting to simulate biological behavior. Each neuron in our network structure has an associated firing threshold and refractory period, and is located at a fixed point in a bounded three-dimensional region. Charge accumulates at a neuron until the threshold is reached; then the neuron fires. The neuron enters a refractory period after it fires, during which it can accumulate charge but cannot fire. Neurons can be input neurons, output neurons, or hidden neurons. Input neurons receive information from outside the network, and output neurons deliver information to external processes.

Synapses are directed from a source neuron to a destination neuron and transfer charge. The length of the synapse determines the elapsed time from the firing of a source neuron to the arrival of charge at a destination neuron. A synapse weight determines the amount of charge received by the destination neuron. The weights of the synapses are increased during operation by long term potentiation (LTP) and decreased by long term depression (LTD), which are forms of Hebbian learning that occur in the brain (Dayan & Abbott, 2001). Synapses have LTP and LTD refractory periods that restrict how often weights can be adjusted. Activity in the network is simulated over a defined time period. During this time period, charges are applied to the network at the input neurons, and the activities that result propagate through the network. The event propagation velocity between neurons is one unit of distance per unit time.
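For concreteness, the following Python sketch simulates the event-driven behavior described above: charge arrivals are processed in time order, a neuron fires when its accumulated charge reaches its threshold and it is not refractory, and each firing schedules delayed, weighted charge arrivals at downstream neurons. It is a minimal illustration under stated assumptions (the class names, the reset-to-zero rule on firing, and the omission of LTP/LTD are ours), not the authors' implementation.

```python
import heapq
import math
from itertools import count

class Neuron:
    def __init__(self, pos, threshold, refractory):
        self.pos = pos                    # fixed location in a bounded 3-D region
        self.threshold = threshold        # charge level at which the neuron fires
        self.refractory = refractory      # time after firing during which it cannot fire again
        self.charge = 0.0
        self.last_fired = -math.inf
        self.out_synapses = []

class Synapse:
    def __init__(self, src, dst, weight):
        self.src, self.dst, self.weight = src, dst, weight
        self.delay = math.dist(src.pos, dst.pos)   # one unit of distance per unit time

def connect(src, dst, weight):
    syn = Synapse(src, dst, weight)
    src.out_synapses.append(syn)
    return syn

def simulate(input_events, output_neuron, horizon):
    """input_events: iterable of (time, neuron, charge) applied at input neurons."""
    tiebreak = count()
    queue = [(t, next(tiebreak), n, c) for t, n, c in input_events]
    heapq.heapify(queue)
    output_firings = []
    while queue:
        t, _, neuron, charge = heapq.heappop(queue)
        if t > horizon:
            break
        neuron.charge += charge           # charge accumulates even while refractory
        if neuron.charge >= neuron.threshold and t - neuron.last_fired >= neuron.refractory:
            neuron.charge = 0.0           # simple reset on firing (an assumption of this sketch)
            neuron.last_fired = t
            if neuron is output_neuron:
                output_firings.append(t)
            for syn in neuron.out_synapses:
                # weighted charge arrives after the distance-based delay
                heapq.heappush(queue, (t + syn.delay, next(tiebreak), syn.dst, syn.weight))
    return output_firings

# Example: a two-neuron chain driven by input pulses at t = 0, 1, 2
a = Neuron(pos=(0.0, 0.0, 0.0), threshold=1.5, refractory=2.0)
b = Neuron(pos=(3.0, 0.0, 0.0), threshold=0.5, refractory=1.0)
connect(a, b, weight=1.0)
print(simulate([(t, a, 1.0) for t in (0.0, 1.0, 2.0)], output_neuron=b, horizon=20.0))
```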

The training algorithm used for these networks is evolutionary. A population of networks is maintained, and a fitness function specific to the application is applied to each network in the population. Networks with higher fitness are preferentially selected for reproduction. With each selection, two networks with relatively high fitness are selected, and crossover and mutation operations are chosen with some probability. Both crossover and mutation operations affect not only the weights of the synapses, but also the

number of neurons and the number of synapses. The structure needed to solve a problem can be evolved with this approach. No information about the complexity of the network is assumed a priori.
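A generic outline of such an evolutionary loop is sketched below. It is illustrative only: the tournament selection scheme, the probability values, and the use of a single child per pairing are assumptions of this sketch, and the fitness, crossover, and mutation operators are supplied by the caller.

```python
import random

def tournament_select(scored, k=3):
    """Pick the fittest of k randomly sampled (fitness, network) pairs (assumes population >= k)."""
    return max(random.sample(scored, k), key=lambda pair: pair[0])[1]

def evolve(population, fitness_fn, crossover_fn, mutate_fn, generations,
           p_crossover=0.9, p_mutation=0.9):
    """Evolutionary loop over whole networks: structure, parameters, and dynamics are all
    encoded in the individuals, so all three are subject to the search."""
    for _ in range(generations):
        scored = [(fitness_fn(net), net) for net in population]
        next_gen = []
        while len(next_gen) < len(population):
            parent_a = tournament_select(scored)
            parent_b = tournament_select(scored)
            child = crossover_fn(parent_a, parent_b) if random.random() < p_crossover else parent_a
            if random.random() < p_mutation:
                child = mutate_fn(child)     # may alter weights, synapses, or neurons
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness_fn)
```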

In our evolutionary algorithm, the networks are represented directly. A random mutation is chosen from a list of possible mutations, where different mutations are selected with varying likelihoods. A single mutation to the network is one of the following, listed from highest to lowest likelihood of occurrence: a change in sign of a randomly selected (RS) synapse, a change in weight value of an RS synapse, a new synapse between two RS neurons, the deletion of an RS synapse, a new neuron (as well as connections to and from that neuron to other neurons in the network), the deletion of an RS hidden neuron, and a change in threshold of an RS neuron.
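A weighted random choice over this list might look like the following sketch; the relative weights are illustrative, since the paper does not give the numeric likelihoods.

```python
import random

# Mutation operations from the text, ordered from highest to lowest likelihood.
# The numeric weights are illustrative placeholders, not values from the paper.
MUTATIONS = [
    ("flip_synapse_sign",      8),
    ("change_synapse_weight",  6),
    ("add_synapse",            4),
    ("delete_synapse",         3),
    ("add_neuron",             2),   # also creates synapses to and from existing neurons
    ("delete_hidden_neuron",   2),
    ("change_threshold",       1),
]

def choose_mutation():
    names, weights = zip(*MUTATIONS)
    return random.choices(names, weights=weights, k=1)[0]
```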

There are many potential crossover operations for these types of networks. We have chosen a crossover method we call "plane crossover"; other crossover operations could be used. With the plane crossover operation, two parent networks are selected, and a random plane is chosen through the cube that contains each network. This plane splits both parent networks into two parts, and one part from each parent network is given to each child. All of the synaptic connections within a part are maintained in the children. Synaptic connections that cross between parts are more difficult to handle, because either endpoint neuron may not exist in the child network. Each parent synapse that crosses the plane is recreated in the child by connecting the retained neuron to the child neuron that is spatially closest to the location of the missing endpoint in the parent network.
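The following sketch outlines one possible realization of plane crossover under these rules. The container interface (make_network, neurons, synapses, add_synapse), the neuron attributes, and the choice of a plane through a random point with a random normal are assumptions of this illustration, not the authors' implementation.

```python
import math
import random

def plane_crossover(parent_a, parent_b, make_network):
    """Split both parents with one random plane; each child takes the neurons from one side of
    one parent and the other side of the other parent. Cross-plane synapses are re-attached to
    the spatially closest retained neuron."""
    normal = [random.uniform(-1.0, 1.0) for _ in range(3)]
    point = [random.uniform(0.0, 1.0) for _ in range(3)]    # plane passes through a random point
    offset = sum(n * p for n, p in zip(normal, point))

    def above(neuron):
        return sum(n * x for n, x in zip(normal, neuron.pos)) - offset >= 0.0

    def closest(neurons, pos):
        return min(neurons, key=lambda nr: math.dist(nr.pos, pos))

    def build_child(upper_parent, lower_parent):
        kept = [nr for nr in upper_parent.neurons if above(nr)] + \
               [nr for nr in lower_parent.neurons if not above(nr)]
        child = make_network(kept)
        for parent in (upper_parent, lower_parent):
            for syn in parent.synapses:
                src_kept, dst_kept = syn.src in kept, syn.dst in kept
                if src_kept and dst_kept:                    # both endpoints survive: copy synapse
                    child.add_synapse(syn.src, syn.dst, syn.weight)
                elif src_kept or dst_kept:
                    # cross-plane synapse: reconnect to the spatially closest retained neuron
                    src = syn.src if src_kept else closest(kept, syn.src.pos)
                    dst = syn.dst if dst_kept else closest(kept, syn.dst.pos)
                    child.add_synapse(src, dst, syn.weight)
        return child

    return build_child(parent_a, parent_b), build_child(parent_b, parent_a)
```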

3. Experiments and results

The problem of interest relates to an application in cyber security: the observation of packet arrivals at a node in a packet-switched communication network and detection of changes in the statistics of the arrival rate. These statistics are typically not known a priori and, in a conventional detection scheme, would require estimation and assumptions about the underlying process that governs them. It is realistic to assume the statistics are piecewise constant but unknown, with a distribution on a finite interval of real-valued arrival rates, with jump discontinuities. The distribution of time intervals between jumps is typically not known. This corresponds to a mix of software applications that each generate network traffic at a more or less constant rate destined for a monitored network address. Some applications may be malware, in which case an objective is detection of the start and end of the packet streams produced by the malware.

This scenario can be modeled as an observation of a discrete-event process characterized by its arrival times $\{t_k \mid t_0 = 0,\ t_{k+1} > t_k,\ k \in I^+\}$. This process can be represented as a discrete-time, real-valued random process $\{x_k = t_k - t_{k-1} \mid k \in I^+ \setminus \{0\}\}$, where the $x_k$ are in $\mathbb{R}^+$ and $x_k \neq 0$. The $x_k$ are the time intervals between event arrivals and, with the additional knowledge of the time of the first event, $t_0$, fully characterize the discrete-event process. A well-known statistical detection problem assumes that $x_k$


is a random process sampled from one of two known distributions, characterized by probability spaces $(X_i, K_i, P_i)$, for $i = 0, 1$. Optimal detectors are known that minimize a linear combination of (i) the probability of detection $p_0$, (ii) the false alarm probability $p_1$, and (iii) the expected time of detection (decision) $E\{T\}$. The optimal algorithm processes received events sequentially, and after each receipt decides (i) not to make a decision until additional information is received, or (ii) that the inputs correspond to process 0 or 1. In the second case, the algorithm outputs the determined process type and stops. A slightly more challenging problem assumes that $x_k$ is a random process whose statistics can change from sample time to sample time between the two probability spaces. In both cases, the problem is well defined in the field and has an optimal solution when the parameters of both distributions are completely specified (Poor & Hadjiliadis, 2009). There are also algorithms for this problem when the second distribution has some unknown parameters (Li, Dai, & Li, 2009).
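For experimentation, an input event sequence of the kind described above can be generated as a Poisson process whose rate is piecewise constant with random jump times. The following sketch does this; the rate range, mean segment length, and function name are illustrative values and assumptions, not the parameters used in the paper.

```python
import random

def generate_arrivals(horizon, rate_low=0.1, rate_high=1.0, mean_segment=200.0, seed=None):
    """Generate arrival times t_k from a Poisson process whose mean rate is piecewise constant,
    jumping to a new value drawn uniformly from [rate_low, rate_high] at exponentially
    distributed intervals. Returns (arrival_times, change_times)."""
    rng = random.Random(seed)
    arrivals, change_times = [], []
    segment_start = 0.0
    while segment_start < horizon:
        rate = rng.uniform(rate_low, rate_high)               # constant rate within the segment
        segment_end = min(segment_start + rng.expovariate(1.0 / mean_segment), horizon)
        t = segment_start
        while True:
            t += rng.expovariate(rate)                        # inter-arrival times x_k = t_k - t_{k-1}
            if t >= segment_end:
                break
            arrivals.append(t)
        if segment_end < horizon:
            change_times.append(segment_end)                  # jump discontinuity in the rate
        segment_start = segment_end
    return arrivals, change_times
```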

In our setup, the network to be designed has one input node and one output node. The network receives a pulse each time a packet arrives, and firing of the output node corresponds to a change in behavior. We allow for a window of 100 time steps after the change in behavior. We also define a threshold value, $s$, that determines how many output firings constitute a detection. If the mean arrival rate changes at time $t$, then $s$ firings of the output neuron at any point between $t$ and $t + 100$ are considered a true positive. If the output node fires $s$ times in a 100-time-step window at any time other than the 100 time steps following a change in mean arrival rate, it is categorized as a false alarm. For training, $s = 1$ is used, and the fitness of the network is a function of the number of correct detections and the number of false alarms.
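The scoring rule just described can be written down as the following sketch; the bookkeeping details (in particular how stray firings would be grouped when s > 1) are our assumptions, and only the s = 1 case mirrors the training setup directly.

```python
def score_run(output_firings, change_times, window=100, s=1):
    """Illustrative scoring of one simulation run: s or more output firings within `window`
    time steps after a rate change count as a true positive; firings that fall outside every
    such window count toward false alarms."""
    detections = 0
    for c in change_times:
        if sum(1 for t in output_firings if c <= t <= c + window) >= s:
            detections += 1
    false_alarms = sum(1 for t in output_firings
                       if not any(c <= t <= c + window for c in change_times))
    # With s = 1 (the value used for training), every stray firing is a false alarm; for s > 1
    # the false-alarm count would instead be taken over 100-step windows of stray firings.
    return detections, false_alarms

def fitness(output_firings, change_times):
    detections, false_alarms = score_run(output_firings, change_times)
    return detections - false_alarms      # detections minus false alarms, as used for training
```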

We ran our training algorithm for 10,000 epochs. The results shown below were produced by two networks, N+ (65 neurons and 187 synapses) and N− (47 neurons and 148 synapses), that were able to detect, respectively, increases (+) and decreases (−) in the mean arrival rate. We ran three types of tests: tests with large changes in the mean arrival rate (changes of at least 0.38), medium changes in the mean arrival rate (changes of at least 0.2), and small changes in the mean arrival rate (changes of at least 0.1). We estimated the probability of detection (Pd) of increases (decreases), the probability of false alarm (Pfa) for increases (decreases), and the probability of missed detection (Pm) of increases (decreases) by the frequency of detection and missed detection events over 100 test runs for both N+

and N−. All runs (training and evaluation) utilized independently generated random input event sequences. The results are shown in Table 1. Fig. 1 shows the changes in mean arrival rate of one example test run, as well as when N+ and N− fired in that test run.

Our networks performed well when detecting large changes in the mean arrival rate, but performance decreased as the size of the change in the arrival rate decreased. This is expected and consistent with the behavior of optimal detectors for problems where they are known: performance decreases as the region of overlap increases between the probability functions of the event observables conditioned upon the hypotheses. A simple fitness function was used: the difference between the number of correct detections and the number of false alarms. The fitness function favored

networks that could detect any change in mean activity rate, but all of our training runs produced networks that detected either positive or negative changes, but not both. The results are preliminary; other fitness functions or more extensive training may produce better results. The networks that were produced had many recurrent connections, as would be expected for this type of problem.

The performance characteristics of the ANN can be compared against the performance of an optimal detector for a simplified problem where the solution is known. We consider a classic example where the input event process has a constant mean arrival rate $\lambda$ that is one of two values $\{\lambda_0, \lambda_1\}$ and is observed over a time interval $\Delta T$. We assume $\lambda_1 > \lambda_0$. The optimal probabilities of detection and error, suitably defined, can be computed without regard to the observed sequence of events in this case.

The number of received events $n$ is a Poisson random variable with distribution

$$p(n) = \frac{(\lambda \Delta T)^n}{n!}\, e^{-\lambda \Delta T} \qquad (1)$$

where $\lambda \Delta T$ is the mean number of observed events in the time interval. The problem is to decide which of two hypotheses is correct: $\{H_0: \lambda = \lambda_0\}$ or $\{H_1: \lambda = \lambda_1\}$. Assuming the a priori probability of hypothesis $H_0$ is 0.5 and the costs assigned to correct (detection) and incorrect (false alarm or failure to detect) identification of the true hypothesis are equal, the optimal decision rule is (Van Trees, 1968), given an observed number of events $n$ in the time interval and defining the function

$$f(\lambda_0, \lambda_1) = \frac{(\lambda_1 - \lambda_0)\,\Delta T}{\ln \lambda_1 - \ln \lambda_0}, \qquad (2)$$

$$h = \begin{cases} H_1 & \text{if } n > f(\lambda_0, \lambda_1) \\ H_0 & \text{if } n < f(\lambda_0, \lambda_1) \end{cases} \qquad (3)$$

with no (or random) choice in the case of equality. The probability of detection (correct classification) is the sum of the probabilities that $H_0$ is true and the number of observed events is less than $f(\lambda_0, \lambda_1)$, and that $H_1$ is true and the number of observed events is greater than $f(\lambda_0, \lambda_1)$ (assuming the function's value is not an integer):

$$p_d = \sum_{\substack{n \ge 0 \\ n < f(\lambda_0, \lambda_1)}} \frac{(\lambda_0 \Delta T)^n}{n!}\, e^{-\lambda_0 \Delta T} \;+\; \sum_{n > f(\lambda_0, \lambda_1)} \frac{(\lambda_1 \Delta T)^n}{n!}\, e^{-\lambda_1 \Delta T} \qquad (4)$$

The probability of error is expressed similarly:

$$p_e = \sum_{\substack{n \ge 0 \\ n < f(\lambda_0, \lambda_1)}} \frac{(\lambda_1 \Delta T)^n}{n!}\, e^{-\lambda_1 \Delta T} \;+\; \sum_{n > f(\lambda_0, \lambda_1)} \frac{(\lambda_0 \Delta T)^n}{n!}\, e^{-\lambda_0 \Delta T} \qquad (5)$$
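Equations (1)-(5) are straightforward to evaluate numerically. The sketch below computes the decision boundary f(λ0, λ1) and the two Poisson sums; the truncation point n_max and the example rate pair are our choices for illustration.

```python
from math import exp, log, floor

def poisson_pmf(n, mean):
    """Eq. (1) with mean = lambda * dT: p(n) = mean**n / n! * exp(-mean), computed iteratively."""
    p = exp(-mean)
    for k in range(1, n + 1):
        p *= mean / k
    return p

def boundary(lam0, lam1, dT):
    """Eq. (2): f(lambda0, lambda1) = (lambda1 - lambda0) * dT / (ln lambda1 - ln lambda0)."""
    return (lam1 - lam0) * dT / (log(lam1) - log(lam0))

def detection_and_error(lam0, lam1, dT, n_max=2000):
    """Eqs. (4) and (5): Poisson sums on either side of the decision boundary of eq. (3)."""
    f = boundary(lam0, lam1, dT)
    below = range(0, floor(f) + 1)            # n < f (assuming f is not an integer)
    above = range(floor(f) + 1, n_max)        # n > f, truncated at n_max
    pd = sum(poisson_pmf(n, lam0 * dT) for n in below) + \
         sum(poisson_pmf(n, lam1 * dT) for n in above)
    pe = sum(poisson_pmf(n, lam1 * dT) for n in below) + \
         sum(poisson_pmf(n, lam0 * dT) for n in above)
    # With the equal a priori probabilities of 0.5 assumed in the text, the Bayes-averaged
    # probabilities of correct classification and error are 0.5 * pd and 0.5 * pe.
    return pd, pe

# Hypothetical example in the range used for Fig. 2: lambda0 = 0.35, lambda1 = 0.65, dT = 100
print(detection_and_error(0.35, 0.65, 100))
```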

This classic detector is predicated upon the assumptions that either hypothesis $H_0$ or $H_1$ is valid for the duration of the time interval, that a priori statistics are known, and

Table 1 Estimated probability of detection (Pd), probability of false alarm (Pfa), and probability of missed detection (Pm) of increases and decreases for N+ and N−, respectively, for three different test types (large, medium, and small changes in mean arrival rate).

            Large                Medium               Small
Network     Pd    Pfa   Pm       Pd    Pfa   Pm       Pd    Pfa   Pm
N+          0.90  0.05  0.02     0.82  0.18  0.08     0.78  0.22  0.35
N−          0.87  0.13  0.01     0.87  0.13  0.14     0.78  0.22  0.45

Fig. 1 Simulation results with each detection network. (The figure plots event sequences: the input (IN), the detected increases (+) and decreases (−), and a zoomed-in region showing events; example events detecting rate increases and rate decreases are labeled.) The mean arrival rate of the test run is shown as the blue line. The output firing events for N+ and N− are shown in red (+) and green (−). The events are discrete, and only their times are meaningful (see the zoomed-in region showing both inputs and outputs); their vertical placement is for convenience. Example false alarm events (FA) are indicated.


that costs can be assigned. When one of the hypotheses is not valid for the entire interval, as is the case for the application of interest, the mathematics become more challenging. One approach is to assume a Markov process that generates the arrival statistic as a function of time, and the methods of quickest detection, discussed previously, can then be applied in some cases.

An alternative approach using Neyman–Pearson detectors (Van Trees, 1968), which compare a computed likelihood ratio against a threshold, is used here to explore how the probability of detection changes with a constraint

Fig. 2 ROC curves for Neyman–Pearson optimal detectors (left panel: ROCs for λ0 = 0.35–0.45 by 0.05 and λ1 = 1 − λ0, with Δt = 100) and an ANN that detects increases in mean arrival rate (right panel: ROCs for λ0 = 0.20–0.40 by 0.10 and λ1 = 1 − λ0, with Δt = 100); both panels plot probability of detection against probability of error. Note that the optimal detectors assume equal a priori hypotheses and arriving events are from one of two processes having the indicated mean arrival rates, while the ANN detector was trained to detect any increase in arrival rate within a specified range and made no assumptions about a priori statistics.

on the maximum allowed probability of error, expressed graphically as receiver operating characteristic (ROC) curves. If the probabilities of observation of a signal $S$ given hypotheses $H_0$ and $H_1$, $p(S \mid H_0)$ and $p(S \mid H_1)$, are known, the likelihood ratio (LR)

$$\Lambda(S) = \frac{p(S \mid H_1)}{p(S \mid H_0)} \qquad (6)$$

can be compared against a threshold $\eta$ determined by the solution of a constrained optimization problem, yielding a decision that $H_1$ is true if the LR $\Lambda(S) > \eta$ and that $H_0$ is true



if it is less. Fig. 2 shows representative ROC curves for Neyman–Pearson detectors and an ANN detector for different values of $\lambda_0$ and $\lambda_1$. The Neyman–Pearson optimal detector, which is a function of the maximum allowed probability of error, is used to generate the curves on the left for each pair of mean arrival rates. In contrast, the same ANN detector structure is used for all pairs on the right. In order to evaluate the ANN detector in a like manner, the detector's output events within intervals $[t - \Delta t, t]$ are counted and compared against a threshold. A detection at time $t$ corresponds to the count exceeding the threshold at that time, and frequencies of detection and error are computed for a range of thresholds and graphed. The salient point here is that the ANN detector provides a (probably suboptimal) solution to a much more challenging detection problem than can be solved mathematically, one in which the statistics of the underlying processes are not known and must be learned (along with the solution) by observing the input event sequence. The learning problem is supervised, since an oracle is assumed that allows evaluation of the fitness function during training, but this is not sufficient to drive an optimal detector. It is sufficient at this point to recognize that the ANN detector's performance behaves like that of an (over-simplified) optimal detector, exhibiting increasing detection probability with increasing allowable probability of error, and increasing probability of detection with an increasing difference in the mean arrival rates of the events. The two types of detectors are qualitatively, but not quantitatively, comparable.
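The windowed-count evaluation described above can be sketched as follows; the helper names, the way ground truth is supplied, and the threshold sweep range are assumptions of this illustration, not the authors' code.

```python
from bisect import bisect_left, bisect_right

def windowed_counts(firing_times, eval_times, dt=100):
    """Number of detector output events in [t - dt, t] for each evaluation time t."""
    firing_times = sorted(firing_times)
    return [bisect_right(firing_times, t) - bisect_left(firing_times, t - dt)
            for t in eval_times]

def empirical_roc(firing_times, eval_times, is_true_change, dt=100, max_threshold=20):
    """Sweep the count threshold and estimate (probability of error, probability of detection)
    pairs, in the spirit of the ANN evaluation described in the text. `is_true_change[i]` marks
    whether a detection declared at eval_times[i] would be correct."""
    counts = windowed_counts(firing_times, eval_times, dt)
    positives = sum(is_true_change) or 1
    negatives = (len(is_true_change) - sum(is_true_change)) or 1
    roc = []
    for threshold in range(1, max_threshold + 1):
        detected = [c > threshold for c in counts]        # count exceeding the threshold
        pd = sum(1 for d, ok in zip(detected, is_true_change) if d and ok) / positives
        pe = sum(1 for d, ok in zip(detected, is_true_change) if d and not ok) / negatives
        roc.append((pe, pd))
    return roc
```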

4. Conclusions

We have demonstrated the utility of our network on a problem that, to our knowledge, does not have a known optimal solution, and we have compared the behavior of our network to that of a class of optimal detectors for a simplified problem. With very little human interaction, beyond specifying one input node and one output node, the algorithm produced networks that can detect changes in the statistics of the arrival rate of packets in a network security system. Dynamic components are absolutely necessary for these problems. Our ANN can evolve to include the structural elements and dynamic elements that each separate problem requires, rather than relying on hand-tuning of the structure or dynamic components, as is often required for other types of neural networks. We have also found that our network structure can be trained to perform well on the exclusive-or problem and the cart-and-pole (inverted pendulum) problem; these results will be published at a later time.

Acknowledgement

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-0929298. Any opinions, findings, and

conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References

Alba, E., & Chicano, J. (2004). Training neural networks with GA hybrid algorithms. In K. Deb (Ed.), Genetic and evolutionary computation – GECCO 2004. Lecture notes in computer science (Vol. 3102, pp. 852–863). Berlin/Heidelberg: Springer.

Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1), 1–127.

Dayan, P., & Abbott, L. (2001). Theoretical neuroscience: Computational and mathematical modeling of neural systems. Cambridge, Massachusetts: The MIT Press.

Floreano, D., Dürr, P., & Mattiussi, C. (2008). Neuroevolution: From architectures to learning. Evolutionary Intelligence, 1(1), 47–62.

Izhikevich, E. (2003). Simple model of spiking neurons. IEEE Transactions on Neural Networks, 14(6), 1569–1572.

Izhikevich, E. (2004). Which model to use for cortical spiking neurons? IEEE Transactions on Neural Networks, 15(5), 1063–1070.

Li, C., Dai, H., & Li, H. (2009). Adaptive quickest change detection with unknown parameter. In ICASSP 2009. IEEE international conference on acoustics, speech and signal processing, 2009 (pp. 3241–3244).

Liu, Y., & Yao, X. (1996). A population-based learning algorithm which learns both architectures and weights of neural networks. Chinese Journal of Advanced Software Research (Allerton), 10011, 54–65.

Maass, W. (1997). Networks of spiking neurons: The third generation of neural network models. Neural Networks, 10(9), 1659–1671.

McCulloch, W., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biology, 5, 115–133.

Merolla, P., Arthur, J., Akopyan, F., Imam, N., Manohar, R., & Modha, D. (2011). A digital neurosynaptic core using embedded crossbar memory with 45 pJ per spike in 45 nm. In 2011 IEEE custom integrated circuits conference (CICC) (pp. 1–4).

Poor, H. V., & Hadjiliadis, O. (2009). Quickest detection. Cambridge: Cambridge University Press.

Rajendran, B., Liu, Y., Seo, J.-s., Gopalakrishnan, K., Chang, L., Friedman, D., et al. (2013). Specifications of nanoscale devices and circuits for neuromorphic computational systems. IEEE Transactions on Electron Devices, 60(1), 246–253.

Siebel, N., Bötel, J., & Sommer, G. (2009). Efficient neural network pruning during neuro-evolution. In IJCNN 2009. International joint conference on neural networks, 2009 (pp. 2920–2927).

Stanley, K. O., & Miikkulainen, R. (2002). Evolving neural networks through augmenting topologies. Evolutionary Computation, 10(2), 99–127.

Trappenberg, T. P. (2010). Fundamentals of computational neuroscience (second ed.). New York: Oxford University Press.

Van Trees, H. L. (1968). Detection, estimation, and modulation theory, Part I: Detection, estimation, and linear modulation theory. New York: Wiley.

White, D., & Ligomenides, P. (1993). GANNet: A genetic algorithm for optimizing topology and weights in neural network design. In J. Mira, J. Cabestany, & A. Prieto (Eds.), New trends in neural computation. Lecture notes in computer science (Vol. 686, pp. 322–327). Berlin/Heidelberg: Springer.

Yao, X. (1999). Evolving artificial neural networks. Proceedings of the IEEE, 87(9), 1423–1447.