
Hybrid Artificial Bee Colony Algorithm for Neural Network Training

Celal Ozturk and Dervis Karaboga
Computer Engineering Department, Erciyes University, Kayseri, Turkiye

[email protected], [email protected]

Abstract—A hybrid algorithm combining the Artificial Bee Colony (ABC) algorithm with the Levenberg-Marquardt (LM) algorithm is introduced to train artificial neural networks (ANNs). Training an ANN is an optimization task whose goal is to find the optimal weight set of the network. Traditional training algorithms may get stuck in local minima, while global search techniques may approach the global minimum only very slowly. Therefore, hybrid models combining global search algorithms with conventional techniques are employed to train neural networks. In this work, the ABC algorithm is hybridized with the LM algorithm for neural network training.

Keywords: Neural network training, Levenberg-Marquardt algorithm, Artificial bee colony algorithm, Hybrid algorithms

I. INTRODUCTION

Artificial Neural Networks (ANNs) offer key characteristics such as adaptability, the capability of learning by example, and the ability to generalize, which make them applicable to problems in pattern classification, function approximation, optimization, pattern matching and associative memories [1], [2]. Among the many different neural network models, multilayer feed-forward neural networks (MLPs) have been used most widely due to their well-known universal approximation capabilities [3].

The success of a neural network largely depends on its architecture, its training algorithm, and the choice of features used in training. All of these make the design of artificial neural networks a difficult optimization problem [4]. In many approaches, the topology and transfer functions are held fixed, and the space of possible networks is spanned by all possible values of the weights and biases [5]. Ant colony optimization [6], tabu search [7], and simulated annealing and genetic algorithms [8] have been used for training neural networks with fixed topology. The neural network learning optimization process then amounts to finding the weight configuration associated with the minimum output error.

For MLP training, the most commonly used algorithms are the back-propagation (BP) algorithm and the Levenberg-Marquardt (LM) algorithm, both of which are gradient-based methods. While the BP algorithm relies on first-order derivatives, the LM algorithm uses second-order derivative information. Among the conventional methods, researchers prefer LM because of its convergence speed and performance. On the other hand, derivative-based algorithms run the risk of getting trapped in local minima. To deal with this problem, global search techniques, which have the ability to avoid local minima, are used to adjust the weights of MLPs; examples include evolutionary algorithms (EA), simulated annealing (SA), tabu search (TS), ant colony optimization (ACO), particle swarm optimization (PSO) and the artificial bee colony (ABC) algorithm [6], [9], [10], [11].

Evolutionary and other population-based algorithms have shown consistent performance in training MLPs. Moreover, to benefit from the advantages of both global search and conventional techniques, hybrid models have recently been applied to train neural networks. A genetic algorithm is hybridized with the local gradient search methods BP and LM in [12]; simulated annealing is combined with the local gradient search algorithm Rprop in [13] and with tabu search in [14]; particle swarm optimization is hybridized with a local gradient search algorithm in [15] and with back-propagation in [16]; and adaptive PSO and BP are used together in [17] for MLP training.

The Artificial Bee Colony (ABC) algorithm has been studied for training neural networks on test problems in [18] and for classification purposes in [11]. The ABC algorithm has a strong ability to locate promising solutions globally, while the LM algorithm has a strong ability to refine solutions locally. The motivation of this paper is to combine ABC with LM into a new hybrid algorithm, referred to as the ABC-LM algorithm, and to apply it to neural network training. Training an artificial neural network is described in Section 2. The ABC algorithm and its hybrid version ABC-LM are introduced in Section 3. In Section 4, experiments and results are presented and discussed. The paper is concluded in Section 5 by summarizing the observations and remarking on future work.

II. TRAINING FEED-FORWARD ARTIFICIAL NEURAL NETWORKS

An ANN consists of a set of processing elements (Fig. 1), also known as neurons or nodes, which are interconnected with each other [4]. In feed-forward neural network models, shown in Fig. 2, each node receives a signal from the nodes in the previous layer, and each of those signals is multiplied by a separate weight value. The weighted inputs are summed and passed through a limiting function which scales the output to a fixed range of values. The output of the limiter is then broadcast to all of the nodes in the next layer. Values applied to the inputs of the first layer propagate through the network, and the output values are read at the final layer, where the output of the $i$th node is described by Eq. 1.


Fig. 1. Processing unit of an ANN (neuron): inputs $x_1, \ldots, x_i, \ldots, x_n$ are weighted by $\omega_1, \ldots, \omega_i, \ldots, \omega_n$, summed, and passed through the transfer function $f(net)$ to produce the output $y$.

$$y_i = f_i\Big(\sum_{j=1}^{n} w_{ij} x_j + b_i\Big) \qquad (1)$$

where $y_i$ is the output of the node, $x_j$ is the $j$th input to the node, $w_{ij}$ is the connection weight between the node and input $x_j$, $b_i$ is the threshold (or bias) of the node, and $f_i$ is the node transfer function. Usually, the node transfer function is a nonlinear function such as a Heaviside function, a sigmoid function, or a Gaussian function. In this paper, the logarithmic sigmoid transfer function (Eq. 2) was employed for the hidden and output layer neurons.

$$y = f(net) = \frac{1}{1 + e^{-net}} \qquad (2)$$

The optimization goal is to minimize the objective function by optimizing the network weights. In evolutionary algorithms, the main idea is to interpret the weight matrices of the ANN as an individual and to evolve these weights. In this paper, the mean squared error ($MSE$), given by Eq. 3, is chosen as the network error function, and adaptation is carried out by minimizing the MSE.

$$E(\vec{w}(t)) = \frac{1}{N} \sum_{j=1}^{N} \sum_{k=1}^{K} (d_k - o_k)^2 \qquad (3)$$

where $E(\vec{w}(t))$ is the error at the $t$th iteration; $\vec{w}(t)$ is the vector of connection weights at the $t$th iteration; $d_k$ and $o_k$ represent, respectively, the desired and actual values of the $k$th output node; $K$ is the number of output nodes; and $N$ is the number of training patterns.
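To make Eqs. (1)-(3) concrete, the sketch below evaluates an MLP and its MSE from a single flat parameter vector, which is the representation a population-based trainer such as ABC operates on. The layer sizes, the weight-vector layout (W1, b1, W2, b2) and the function names are illustrative assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def sigmoid(net):
    # logarithmic sigmoid transfer function, Eq. (2)
    return 1.0 / (1.0 + np.exp(-net))

def unpack(theta, n_in, n_hid, n_out):
    # split a flat parameter vector into layer weights and biases
    # (assumed layout: W1, b1, W2, b2)
    i = 0
    W1 = theta[i:i + n_in * n_hid].reshape(n_hid, n_in); i += n_in * n_hid
    b1 = theta[i:i + n_hid]; i += n_hid
    W2 = theta[i:i + n_hid * n_out].reshape(n_out, n_hid); i += n_hid * n_out
    b2 = theta[i:i + n_out]
    return W1, b1, W2, b2

def forward(theta, X, n_in, n_hid, n_out):
    # node output y_i = f(sum_j w_ij x_j + b_i), Eq. (1), applied layer by layer
    W1, b1, W2, b2 = unpack(theta, n_in, n_hid, n_out)
    hidden = sigmoid(X @ W1.T + b1)
    return sigmoid(hidden @ W2.T + b2)

def mse(theta, X, D, n_in, n_hid, n_out):
    # network error E(w) = (1/N) sum_j sum_k (d_k - o_k)^2, Eq. (3)
    O = forward(theta, X, n_in, n_hid, n_out)
    return np.sum((D - O) ** 2) / X.shape[0]

# example: XOR patterns with a 2-2-1 network (9 parameters: 6 weights + 3 biases)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)
theta = np.random.uniform(-10, 10, size=9)
print(mse(theta, X, D, 2, 2, 1))
```

For a 2-2-1 network with biases the vector has 9 entries, which matches the XOR9 setup described in Section IV.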

Fig. 2. Multilayer feed-forward neural network (MLP) model: inputs $x_1, x_2$, hidden nodes $s_1, s_2$, output node $O_1$, connection weights $w_{ij}$, and bias terms $b_{ij}$.

III. TRAINING ALGORITHMS

A. Artificial Bee Colony Algorithm

The Artificial Bee Colony algorithm, which simulates the intelligent foraging behavior of honey bee swarms, was proposed by Karaboga for optimizing numerical problems in [19]. In the ABC algorithm, the position of a food source represents a possible solution to the optimization problem, and the nectar amount of a food source corresponds to the quality (fitness) of the associated solution. The colony of artificial bees contains three groups of bees: employed bees, onlookers and scouts. A bee waiting on the dance area to decide which food source to choose is called an onlooker, and a bee returning to a food source it has visited before is called an employed bee. The third kind of bee is the scout, which carries out a random search to discover new sources.

The detailed pseudo-code of the ABC algorithm is as follows [20]:

1: Load training samples
2: Generate the initial population $x_i$, $i = 1 \ldots SN$
3: Evaluate the fitness ($f_i$) of the population
4: Set cycle to 1
5: repeat
6:   FOR each employed bee {
       Produce a new solution $\upsilon_i$ by using (5)
       Calculate its value $f_i$
       Apply the greedy selection process }
7:   Calculate the probability values $p_i$ for the solutions ($x_i$) by (4)
8:   FOR each onlooker bee {
       Select a solution $x_i$ depending on $p_i$
       Produce a new solution $\upsilon_i$
       Calculate its value $f_i$
       Apply the greedy selection process }
9:   If there is an abandoned solution for the scout, then replace it with a new solution produced randomly by (6)
10:  Memorize the best solution achieved so far
11:  cycle = cycle + 1
12: until cycle = MCN

where $x_i$ represents a solution, $f_i$ is the fitness value of $x_i$, $\upsilon_i$ indicates a neighbor solution of $x_i$, $p_i$ is the probability value of $x_i$, and $MCN$ is the maximum number of cycles of the algorithm.

In the algorithm, the first half of the colony consists of employed artificial bees and the second half constitutes the onlookers. The number of employed bees is equal to the number of food sources (the number of solutions in the population). The employed bee whose food source has been exhausted becomes a scout bee. At the first step, the ABC generates a randomly distributed initial population $P(C = 0)$ of $SN$ solutions (food source positions), where $SN$ denotes the size of the population. Each solution $x_i$ ($i = 1, 2, \ldots, SN$) is a $D$-dimensional vector, where $D$ is the number of optimization parameters. After initialization, the population of positions (solutions) is subjected to repeated cycles, $C = 1, 2, \ldots, MCN$, of the search processes of the employed bees, the onlooker bees and the scout bees. An employed bee produces a modification of the position (solution) in its memory depending on local information (visual information) and tests the nectar amount (fitness value) of the new source (new solution). Provided that the nectar amount of the new source is higher than that of the previous one, the bee memorizes the new position and forgets the old one; otherwise, it keeps the position of the previous one in its memory. After all employed bees complete the search process, they share the nectar information of the food sources and their position information with the onlooker bees on the dance area. An onlooker bee evaluates the nectar information taken from all employed bees and chooses a food source with a probability related to its nectar amount. As in the case of the employed bee, it produces a modification of the position in its memory and checks the nectar amount of the candidate source. Provided that its nectar is higher than that of the previous one, the bee memorizes the new position and forgets the old one.

An artificial onlooker bee chooses a food source depending on the probability value associated with that food source, $p_i$, calculated by the following expression (4):

$$p_i = \frac{fit_i}{\sum_{n=1}^{SN} fit_n} \qquad (4)$$

where $fit_i$ is the fitness value of solution $i$, which is proportional to the nectar amount of the food source at position $i$, and $SN$ is the number of food sources, which is equal to the number of employed bees.

In order to produce a candidate food position from the old one in memory, the ABC uses the following expression (5):

$$v_{ij} = x_{ij} + \phi_{ij}(x_{ij} - x_{kj}) \qquad (5)$$

where $k \in \{1, 2, \ldots, SN\}$ and $j \in \{1, 2, \ldots, D\}$ are randomly chosen indexes. Although $k$ is determined randomly, it has to be different from $i$. $\phi_{ij}$ is a random number in the range $[-1, 1]$. It controls the production of neighboring food sources around $x_{ij}$ and represents a bee's visual comparison of two food positions. As can be seen from (5), as the difference between the parameters $x_{ij}$ and $x_{kj}$ decreases, the perturbation of the position $x_{ij}$ decreases as well. Thus, as the search approaches the optimum solution in the search space, the step length is adaptively reduced.

After each candidate source position $v_{ij}$ is produced and evaluated by the artificial bee, its performance is compared with that of the old position. If the new food source has equal or better nectar than the old source, it replaces the old one in the memory; otherwise, the old one is retained. In other words, a greedy selection mechanism is employed as the selection operation between the old and the candidate source.

The food source whose nectar is abandoned is replaced with a new food source by the scouts. In ABC, this is simulated by producing a position randomly and replacing the abandoned one with it. If a position cannot be improved further through a predetermined number of cycles, that food source is assumed to be abandoned. The value of this predetermined number of cycles is an important control parameter of the ABC algorithm, called the "limit" for abandonment. Assume that the abandoned source is $x_i$ and $j \in \{1, 2, \ldots, D\}$; then the scout discovers a new food source to replace $x_i$. This operation is defined as in (6):

$$x_i^j = x_{\min}^j + \mathrm{rand}(0, 1)\,(x_{\max}^j - x_{\min}^j) \qquad (6)$$
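The following is a minimal, self-contained sketch of one way to implement the ABC loop above in Python/NumPy. The fitness mapping $fit_i = 1/(1 + E_i)$ for a non-negative error, the clipping of candidates to the search range, and the parameter defaults are assumptions of this sketch rather than details given in the paper.

```python
import numpy as np

def abc_minimize(objective, dim, lower, upper, sn=20, mcn=1000, limit=100, seed=0):
    rng = np.random.default_rng(seed)
    foods = rng.uniform(lower, upper, size=(sn, dim))   # food sources = candidate solutions
    errors = np.array([objective(x) for x in foods])
    trials = np.zeros(sn, dtype=int)                     # cycles without improvement

    def try_neighbor(i):
        # candidate v_ij = x_ij + phi_ij (x_ij - x_kj), Eq. (5), changing one dimension
        k = rng.choice([s for s in range(sn) if s != i])
        j = rng.integers(dim)
        v = foods[i].copy()
        v[j] = np.clip(v[j] + rng.uniform(-1.0, 1.0) * (foods[i, j] - foods[k, j]),
                       lower, upper)
        e = objective(v)
        if e <= errors[i]:                               # greedy selection
            foods[i], errors[i], trials[i] = v, e, 0
        else:
            trials[i] += 1

    best_x = foods[np.argmin(errors)].copy()
    best_e = float(errors.min())
    for _ in range(mcn):
        for i in range(sn):                              # employed bee phase
            try_neighbor(i)
        fit = 1.0 / (1.0 + errors)                       # fitness of each source (assumed mapping)
        probs = fit / fit.sum()                          # selection probabilities, Eq. (4)
        for _ in range(sn):                              # onlooker bee phase
            try_neighbor(int(rng.choice(sn, p=probs)))
        worst = int(np.argmax(trials))                   # scout bee phase
        if trials[worst] > limit:
            foods[worst] = rng.uniform(lower, upper, size=dim)   # random replacement, Eq. (6)
            errors[worst] = objective(foods[worst])
            trials[worst] = 0
        if errors.min() < best_e:                        # memorize the best solution so far
            best_e = float(errors.min())
            best_x = foods[np.argmin(errors)].copy()
    return best_x, best_e
```

Minimizing the MSE function from the earlier sketch with abc_minimize corresponds to the first stage of the hybrid model described in the next subsection.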

B. Hybrid Artificial Bee Colony Algorithm

The ABC algorithm has a strong ability to locate promising regions of the search space globally, and the LM algorithm [21] has a strong ability to refine a solution locally. Combining ABC with LM, a new hybrid algorithm (ABC-LM) is proposed in this paper. The main idea of this hybrid algorithm is that ABC is used at the beginning stage of the search for the optimum; the training process is then continued with the LM algorithm. The LM algorithm interpolates between Newton's method and gradient descent by approximating the error of the network with a second-order expression.
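The paper does not restate the LM update rule; for reference, a standard form of the Levenberg-Marquardt step for an error (residual) vector $\vec{e}$ with Jacobian $J$ and damping factor $\mu$ is

$$\Delta \vec{w} = -\left(J^{T} J + \mu I\right)^{-1} J^{T} \vec{e}$$

where a large $\mu$ makes the step behave like gradient descent and a small $\mu$ makes it behave like the Gauss-Newton (second-order) step.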

The flow diagram of the ABC-LM model is shown in Fig. 3. In the hybrid ABC-LM, ABC works as in [18]. In the first stage, the ABC algorithm completes its training; then the LM algorithm starts training from the weights found by ABC and trains the network for 100 more epochs.

Fig. 3. The flow diagram of the ABC-LM hybrid algorithm: set the network with initial weights; train the network with the ABC algorithm until ABC training is finished; store the best weight set; train the network with the LM algorithm until LM training is finished; store the network.
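A rough sketch of the two-stage flow in Fig. 3 is given below, reusing the forward, mse and abc_minimize helpers from the earlier sketches. The finite-difference Jacobian, the simple damping schedule and the cycle/epoch counts are assumptions for illustration; the paper's experiments follow the exact ABC setup of [18] and then run 100 LM epochs.

```python
import numpy as np

def lm_refine(theta, residual_fn, epochs=100, mu=1e-2):
    # Minimal Levenberg-Marquardt refinement with a finite-difference Jacobian.
    # residual_fn(theta) returns the vector of errors (d_k - o_k) over all
    # patterns and outputs; the damping schedule is an assumption of this sketch.
    theta = theta.copy()
    for _ in range(epochs):
        r = residual_fn(theta)
        eps = 1e-6
        J = np.empty((r.size, theta.size))
        for j in range(theta.size):                      # numerical Jacobian dr/dtheta
            step = np.zeros_like(theta)
            step[j] = eps
            J[:, j] = (residual_fn(theta + step) - r) / eps
        A = J.T @ J + mu * np.eye(theta.size)            # damped Gauss-Newton system
        delta = np.linalg.solve(A, -J.T @ r)             # LM step
        if np.sum(residual_fn(theta + delta) ** 2) < np.sum(r ** 2):
            theta = theta + delta
            mu *= 0.5                                    # success: behave more like Gauss-Newton
        else:
            mu *= 2.0                                    # failure: behave more like gradient descent
    return theta

# Two-stage flow of Fig. 3: ABC global search, then LM local refinement.
# Network size, search range and cycle counts are illustrative only.
def train_xor9():
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    D = np.array([[0], [1], [1], [0]], dtype=float)
    error = lambda th: mse(th, X, D, 2, 2, 1)
    best, _ = abc_minimize(error, dim=9, lower=-10, upper=10, mcn=500)
    residuals = lambda th: (D - forward(th, X, 2, 2, 1)).ravel()
    return lm_refine(best, residuals, epochs=100)
```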

IV. EXPERIMENTS AND RESULTS

The performance of the ABC-LM algorithm in training neural networks is tested on the XOR, 4-bit Encoder-Decoder and 3-bit Parity problems. The parameter ranges, the dimensions of the problems, the network structures and the results of the ABC and LM algorithms are taken from reference [18].

A three-layer feed-forward neural network is used for each problem, i.e. one hidden layer plus the input and output layers.


TABLE I. BINARY XOR TABLE.

Input1  Input2  Output
0       0       0
0       1       1
1       0       1
1       1       0

TABLE II. 3-BIT PARITY TABLE.

Input1  Input2  Input3  Output
0       0       0       0
0       0       1       1
0       1       0       1
0       1       1       0
1       0       0       1
1       0       1       0
1       1       0       0
1       1       1       1

The hidden layer contains six neurons. Bias nodes are also used in the network structures, and the sigmoid function is used as the activation function of the nodes. The experiments with the hybrid model are repeated 30 times for each case, each run starting from a new ABC population.

A. The Exclusive-OR Problem

The first test problem is the exclusive-OR (XOR) Boolean function, a difficult classification problem that maps two binary inputs to a single binary output: (0 0; 0 1; 1 0; 1 1) → (0; 1; 1; 0). In the simulations, we used a 2-2-1 feed-forward neural network with six connection weights and no biases (six parameters, XOR6), a 2-2-1 feed-forward neural network with six connection weights and three biases (nine parameters, XOR9), and a 2-3-1 feed-forward neural network with nine connection weights and four biases (thirteen parameters in total, XOR13). For the XOR6, XOR9 and XOR13 problems, the parameter ranges [-100, 100], [-10, 10] and [-10, 10] are used, respectively. The two inputs to the XOR function produce an output according to the values in Table I.
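As a quick check of the problem dimensions listed above, the number of parameters seen by the optimizer (the dimension D of the ABC search space) follows directly from the layer sizes; the helper below is illustrative only.

```python
def mlp_param_count(n_in, n_hid, n_out, with_biases=True):
    # connection weights plus (optionally) one bias per hidden and output node
    weights = n_in * n_hid + n_hid * n_out
    biases = (n_hid + n_out) if with_biases else 0
    return weights + biases

print(mlp_param_count(2, 2, 1, with_biases=False))  # XOR6  -> 6
print(mlp_param_count(2, 2, 1))                     # XOR9  -> 9
print(mlp_param_count(2, 3, 1))                     # XOR13 -> 13
```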

B. 3-Bit Parity Problem

The second test problem is the three-bit parity problem, which computes the modulo-2 sum of three binary inputs: if the number of inputs equal to 1 is odd, the output is 1, otherwise it is 0: (0 0 0; 0 0 1; 0 1 0; 0 1 1; 1 0 0; 1 0 1; 1 1 0; 1 1 1) → (0; 1; 1; 0; 1; 0; 0; 1). We use a 3-3-1 feed-forward neural network structure for the 3-bit parity problem. It has twelve connection weights and four biases, sixteen parameters in total. The parameter range is [-10, 10] for this problem. The three inputs to the 3-bit parity function produce an output according to the values in Table II.
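For illustration, the parity training set can be generated directly from the modulo-2 definition; this small snippet is an assumed example that reproduces Table II.

```python
import numpy as np
from itertools import product

# 3-bit parity: the target is the sum of the three binary inputs modulo 2
X = np.array(list(product([0, 1], repeat=3)), dtype=float)
D = (X.sum(axis=1) % 2).reshape(-1, 1)
print(np.hstack([X, D]))
```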

TABLE III. 4-BIT ENCODER-DECODER TABLE.

Inp1  Inp2  Inp3  Inp4  Out1  Out2  Out3  Out4
0     0     0     1     0     0     0     1
0     0     1     0     0     0     1     0
0     1     0     0     0     1     0     0
1     0     0     0     1     0     0     0

C. 4-Bit Encoder-Decoder Problem

The third problem is the 4-bit encoder/decoder problem. The network is presented with four distinct input patterns, each having only one bit turned on. The output is a duplication of the inputs: (0 0 0 1; 0 0 1 0; 0 1 0 0; 1 0 0 0) → (0 0 0 1; 0 0 1 0; 0 1 0 0; 1 0 0 0), as shown in Table III. This is quite close to real-world pattern classification tasks, where small changes in the input pattern cause small changes in the output pattern [22]. A 4-2-4 feed-forward neural network structure is used for this problem; it has 22 parameters in total, comprising sixteen connection weights and six biases. For this problem, the parameter range is [-10, 10].

D. Results

Statistical results of the algorithms for the XOR6, XOR9, XOR13, 3-bit Parity and 4-bit Encoder-Decoder problems are given in Table IV. From Table IV, it is clear that the hybrid ABC-LM algorithm obtains better results than either the artificial bee colony or the Levenberg-Marquardt algorithm alone. This result was expected, since the hybrid model continues training the network from the best weight set found by the ABC algorithm. As reported in [18], the results of ABC are better than those of LM, so the results of ABC-LM should be at least as good as those of the ABC algorithm. Moreover, the gain of the hybrid model in terms of error value is substantial for every problem: for XOR6 the gain is about a factor of 10, while for the other problems it is at least a factor of 1000, i.e. the mean MSE values obtained by the hybrid model are smaller than those of the ABC and LM algorithms for each problem. The ABC-LM is also very robust, since the standard deviation of the hybrid model is very low.

V. CONCLUSION

In this work, a hybrid algorithm based on the artificial bee colony algorithm, which is a simple and robust optimization algorithm, is used to train feed-forward artificial neural networks on the XOR, 3-bit Parity and 4-bit Encoder-Decoder benchmark problems. The ABC algorithm is hybridized with the LM algorithm: first ABC trains the network, then LM continues training from the best weight set found by ABC and further minimizes the training error. The results of the experiments show that the hybrid ABC-LM algorithm performs better than either algorithm used on its own. As future work, it is planned to study the hybrid ABC-LM model for training neural networks on high-dimensional classification benchmark problems.


TABLE IV. MEAN AND STANDARD DEVIATION OF MSE FOR ALGORITHMS AND PROBLEMS.

Problem           ABC [18]    ABC-LM       LM [18]
XOR6       MSE    0.007051    0.000752     0.110700
           sd     0.00223     0.000980     0.063700
XOR9       MSE    0.006956    2.1246E-09   0.049100
           sd     0.002402    1.9579E-10   0.064600
XOR13      MSE    0.006079    2.6111E-09   0.007800
           sd     0.003182    1.2586E-09   0.022300
3-Bit Par. MSE    0.006679    6.3156E-07   0.020900
           sd     0.002820    3.3189E-06   0.043000
Enc. Dec.  MSE    0.008191    1.3007E-06   0.024300
           sd     0.001864    8.8443E-07   0.042400

ACKNOWLEDGMENT

This project is supported as a Graduate Research Project with Project ID FBD-09-1004 by the Scientific Research Project Foundation of Erciyes University.

REFERENCES

[1] J. Dayhoff, Neural-Network Architectures: An Introduction. New York: Van Nostrand Reinhold, 1990.
[2] K. Mehrotra, C. K. Mohan, and S. Ranka, Elements of Artificial Neural Networks. Cambridge, MA: MIT Press, 1997.
[3] S. Haykin, Neural Networks: A Comprehensive Foundation. New Jersey: Prentice Hall, 1999.
[4] X. Yao, "Evolving artificial neural networks," Proceedings of the IEEE, vol. 87, no. 9, pp. 1423–1447, 1999.

[5] D. Rumelhart, G. Hinton, and R. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533–536, 1986.

[6] C. Blum and K. Socha, "Training feed-forward neural networks with ant colony optimization: An application to pattern classification," 2005, pp. 233–238.
[7] R. Sexton, B. Alidaee, R. Dorsey, and J. Johnson, "Global optimization for artificial neural networks: a tabu search application," European Journal of Operational Research, vol. 106, pp. 570–584, 1998.
[8] R. Sexton, R. Dorsey, and J. Johnson, "Optimization of neural networks: A comparative analysis of the genetic algorithm and simulated annealing," European Journal of Operational Research, vol. 114, pp. 589–601, 1999.
[9] T. Back and H. P. Schwefel, "An overview of evolutionary algorithms for parameter optimization," Evolutionary Computation, vol. 1, no. 1, pp. 1–23, 1993.
[10] B. Verma and R. Ghosh, "A novel evolutionary neural learning algorithm," in Proceedings of CEC'02, May 12-17, 2002, pp. 1884–1889.
[11] D. Karaboga and C. Ozturk, "Neural networks training by artificial bee colony algorithm on pattern classification," Neural Network World, vol. 19, no. 3, pp. 279–292, 2009.
[12] E. Alba and J. Chicano, "Training neural networks with GA hybrid algorithms," in Proc. of GECCO, ser. LNCS. Springer-Verlag, pp. 852–863.
[13] N. Treadgold and T. Gedeon, "Simulated annealing and weight decay in adaptive learning: the SARPROP algorithm," IEEE Transactions on Neural Networks, vol. 9, pp. 662–668, 1998.
[14] T. Ludermir, A. Yamazaki, and C. Zanchetin, "An optimization methodology for neural network weights and architectures," IEEE Transactions on Neural Networks, vol. 17, no. 5, pp. 1452–1460, 2006.
[15] M. Carvalho and T. Ludermir, "Hybrid training of feed-forward neural networks with particle swarm optimization," ser. LNCS, vol. 4233. Springer-Verlag, 2007, pp. 1061–1070.
[16] L. Wang, Y. Zeng, C. Gui, and H. Wang, "Application of artificial neural network supported by BP and particle swarm optimization algorithm for evaluating the criticality class of spare parts," in Third International Conference on Natural Computation (ICNC 2007), Haikou, China, August 24-27, 2007.
[17] J. Zhang, J. Zhang, T. Lok, and M. Lyu, "A hybrid particle swarm optimization back-propagation algorithm for feedforward neural network training," Applied Mathematics and Computation, vol. 185, pp. 1026–1037, 2007.
[18] D. Karaboga, B. Akay, and C. Ozturk, "Artificial Bee Colony (ABC) optimization algorithm for training feed-forward neural networks," in Modeling Decisions for Artificial Intelligence, ser. LNCS, vol. 4617. Springer-Verlag, 2007, pp. 318–329.
[19] D. Karaboga, "An idea based on honey bee swarm for numerical optimization," Erciyes University, Engineering Faculty, Computer Engineering Department, Tech. Rep. TR06, 2005.
[20] D. Karaboga and C. Ozturk, "A novel clustering approach: Artificial Bee Colony (ABC) algorithm," Applied Soft Computing, vol. 11, no. 1, pp. 652–657, 2011.
[21] M. Hagan and M. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Transactions on Neural Networks, vol. 6, p. 989, 1994.
[22] S. Fahlman, "An empirical study of learning speed in back-propagation networks," Carnegie Mellon University, Pittsburgh, PA, Tech. Rep. CMU-CS-88-162, 1988.
