
AN IMPROVED BAT ALGORITHM WITH

ARTIFICIAL NEURAL NETWORKS FOR

CLASSIFICATION PROBLEMS

SYED MUHAMMAD ZUBAIR REHMAN GILLANI

UNIVERSITI TUN HUSSEIN ONN MALAYSIA


AN IMPROVED BAT ALGORITHM WITH ARTIFICIAL NEURAL NETWORKS

FOR CLASSIFICATION PROBLEMS

SYED MUHAMMAD ZUBAIR REHMAN GILLANI

A thesis submitted in

fulfillment of the requirement for the award of the

Doctor of Philosophy of Information Technology

Faculty of Computer Science and Information Technology

Universiti Tun Hussein Onn Malaysia

MAY 2016


DEDICATION

With Love and Trillion Thanks,

To my Mother and Father who never stopped encouraging me to study further and

my success is their only dream and joy


ACKNOWLEDGEMENT

I am greatly indebted to all those people who have paved my path for me and helped

me rise up to claim my contributions to the field of Science.

My first and foremost thanks go to the One and Only Allah ﷻ and His last

Messenger Mohammad ﷺ. This project would have remained a mere dream if

the spiritual help of my Creator and His Messenger was not there for me. No thanks

are enough to repay my Creator but there is one request, just stay with me forever and

never leave me alone.

Furthermore, I would like to extend my heartfelt thanks to the prestigious

Universiti Tun Hussein Onn Malaysia (UTHM) for giving me an opportunity to study

here and accomplish my dreams of becoming a researcher. Also, I am very grateful to

ORICC of UTHM for supporting this research under the Fundamental Research Grants

Scheme (FRGS) Vote No. 1236.

In particular, I would like to express my sincere gratitude to my supervisor,

Associate Prof. Dr. Nazri Mohd. Nawi for his continuous support, technical guidance

and assistance in finishing this project.

I would also like to extend my thanks to Prof. Imran Ghazali and Dr. Musli

Nizam Yahaya for motivating me and supporting me to continue my research journey.

I am also thankful to my parents and my siblings for believing in me and

supporting me in all my endeavours.

In the end, I am thankful to all my friends who have encouraged me and helped

me in my research.


ABSTRACT

Metaheuristic search algorithms have been used for quite a while to optimally solve

complex searching problems with ease. Nowadays, nature inspired swarm intelligent

algorithms have become quite popular due to their propensity for finding optimal

solutions with agility. Moreover, several algorithms belonging to the stochastic and deterministic classes are available (e.g. ABC, HS, CS, WS, BPNN, LM, and ERNN). Recently, a new metaheuristic search method, the Bat algorithm, has become quite popular due to its tendency to converge to optimal points in the search trajectory by using the echo-location behavior of bats as its random walk. However, Bat suffers from large step lengths that sometimes make it converge to sub-optimal solutions.

Therefore, in order to improve the exploration and exploitation behavior of bats, this

research proposed an improved Bat with Gaussian Distribution (BAGD) algorithm that

takes small step lengths and ensures convergence to global optima. Then, the proposed

BAGD algorithm is further hybridized with Simulated Annealing (SA) and Genetic

Algorithm (GA) to perform two-stage optimization, in which the former algorithm finds an optimal solution and the latter algorithm starts from where the first one has converged. This multi-stage optimization ensures that the optimal solution is always reached. The proposed BAGD, SABa, and GBa are tested on several benchmark

functions and improvements in convergence to global optima were detected. Finally

in this research, the proposed BAGD, SABa, and GBa are used to enhance the

convergence properties of BPNN, LM, and ERNN with proper estimation of the initial

weights. The proposed Bat variants with ANN such as; Bat-BP, BALM, BAGD-LM,

BAGD-RNN, GBa-LM, GBa-RNN, SABa-RNN, and SABa-LM are evaluated and

compared with ABC-BP, and ABC-LM algorithms on seven benchmark datasets.

From the simulation results, it can be seen that the proposed Bat algorithms with ANN outperform the other algorithms in terms of CPU time, Mean Squared Error

(MSE), and accuracy during convergence to global minima.


ABSTRAK

Algoritma carian Metaheuristic telah mula dikenali oleh penyelidik dan digunakan

secara optimum untuk menyelesaikan masalah pencarian yang kompleks dengan lebih

mudah. Pada masa kini, algoritma pintar yang diilhamkan dari sifat semulajadi swarm

telah menjadi sangat popular kerana kecenderungan mereka untuk mencari

penyelesaian optimum dengan pantas. Lebih-lebih lagi beberapa algoritma yang

tergolong di dalam kelas stochastic dan deterministic senang diperolehi (seperti ABC,

HS, CS, WS, BPNN, LM, ERNN dan lain-lain). Baru-baru ini, satu algoritma

metaheuristic iaitu algoritma carian kelawar telah menjadi agak popular kerana

kecenderungan algoritma tersebut ke arah penumpuan yang lebih tepat kepada

optimum dalam trajektori carian dengan menggunakan tingkah laku echo-lokasi

kelawar sebagai perjalanan rawak itu. Walau bagaimanapun, carian kelawar ini

mempunyai kelemahan iaitu mengambil langkah yang panjang yang mana kadang-

kadang ia menyebabkan penyelesaian buntu dan terbantut di penyelesaian sub-

optimum. Oleh itu, untuk memperbaiki tingkah laku penerokaan dan eksploitasi

kelawar, kajian ini mencadangkan satu algoritma yang lebih baik terhadap algorima

kelawar melalui Pengagihan algoritma Gaussian (BAGD) yang akan memndekan

langkah dan memastikan penumpuan kepada optima global. Kemudian, algoritma

BAGD yang dicadangkan selanjutnya digabungkan dengan Simulated Annealing (SA)

dan algoritma genetik (GA) untuk melaksanakan dua fasa pengoptimuman di mana

algoritma yang pertama akan mencari penyelesaian optimum dan algoritma yang

kedua bermula dari di mana algoritma yang pertama selesai. Kaedah dua fasa

pengoptimuman ini akan memastikan bahawa penyelesaian optimum sentiasa dapat

dicapai. Algoritma yang dicadangkan seperti BAGD, SABa, dan GBa telah diuji pada

beberapa fungsi penanda aras dan peningkatan kepada penumpuan optima global telah

dapat ditunjukan. Akhirnya dalam kajian ini, algoritma yang dicadangkan seperti

BAGD, SABa, dan GBa digunakan untuk meningkatkan sifat-sifat penumpuan BPNN,


LM, dan ERNN dengan memberikan anggaran pemberat yang lebih tepat. Algoritma

kelawar yang dicadangkan dengan variasi bersama ANN seperti; Bat-BP, BALM,

BAGD-LM, BAGD-RNN, GBa-LM, GBa-RNN, SABa-RNN, dan SABa-LM dinilai

dan dibandingkan pencapaian dengan ABC-BP, dan algoritma ABC-LM terhadap

tujuh set data penanda aras. Dari hasil simulasi, keputusan menunjukan bahawa

algoritma kelawar yang dicadangkan dengan ANN menunjukan pencapaian yang lebih

baik dari segi masa CPU, Mean Squared Error (MSE), dan ketepatan semasa

penumpuan kepada minima global.


TABLE OF CONTENTS

TITLE i

DECLARATION ii

DEDICATION iii

ACKNOWLEDGEMENT iv

ABSTRACT v

ABSTRAK vi

LIST OF TABLES xi

LIST OF FIGURES xiii

LIST OF SYMBOLS AND ABBREVIATIONS xvi

LIST OF APPENDICES xx

LIST OF PUBLICATIONS xxi

LIST OF AWARDS xxiii

CHAPTER 1 INTRODUCTION 1

1.1 Background of the Research 1

1.2 Problem Statement 3

1.3 Aims of the Research 4

1.4 Objectives of the Research 4

1.5 Scope of the Research 5

1.6 Significance of the Research 5

1.7 Outline of the Thesis 6

CHAPTER 2 LITERATURE REVIEW 7

2.1 Introduction 7

2.2 Numerical Optimization 7

2.3 Deterministic Algorithms 9

2.3.1 Back-propagation Neural Network 9

2.3.2 Recurrent Neural Network (RNN) 15


2.3.3 Levenberg-Marquardt (LM) Algorithm 18

2.4 Swarm Intelligent Metaheuristics 21

2.5 Bat Algorithm 26

2.5.1 Improvements on Bat Algorithm 29

2.5.1.1 Improving Exploration in Bat 29

2.5.1.2 Improving Exploitation in Bat 32

2.6 Gaussian Distribution Random Walk 33

2.7 Research Gap Analysis on Bat algorithm 35

2.8 Chapter Summary 39

CHAPTER 3 RESEARCH METHODOLOGY 40

3.1 Introduction 40

3.2 The Proposed BAGD Algorithm 42

3.3 The Proposed GBa Algorithm 44

3.4 The Proposed SABa Algorithm 47

3.5 The Proposed Improved Artificial Neural Networks 49

3.6 The Proposed BAGD-LM Algorithm 50

3.7 The Proposed BAGD-RNN Algorithm 55

3.8 The Proposed GBa-LM Algorithm 60

3.9 The Proposed GBa-RNN Algorithm 64

3.10 The SABa-LM Algorithm 66

3.11 The Proposed SABa-RNN Algorithm 70

3.12 The Proposed Bat-BP Algorithm 73

3.13 The Proposed BALM Algorithm 76

3.14 The Proposed BARNN Algorithm 79

3.15 Data Collection 82

3.15.1 Benchmark Functions 82

3.15.2 Classification Datasets 83

3.16 Data Pre-Processing 83

3.17 Data Partitioning 84

3.18 Improved Artificial Neural Network Topology 85

3.19 Training the Network 86

3.20 Performance Comparison and Model Selection 86

3.21 Performance Comparison 87


3.22 AUROC Analysis 88

3.23 Chapter Summary 88

CHAPTER 4 SIMULATION ON BENCHMARK FUNCTIONS 90

4.1 Introduction 90

4.2 Preliminaries for Benchmark Functions 91

4.3 Ackley Benchmark Function 92

4.4 Bohachevsky Benchmark Function 93

4.5 Easom Benchmark Function 95

4.6 Griewank Benchmark Function 96

4.7 Rastrigin Benchmark Function 97

4.8 Rosenbrock Benchmark Function 98

4.9 Schaffer Benchmark Function 100

4.10 Schwefel 1.2 Benchmark Function 101

4.11 Sphere Benchmark Function 102

4.12 Step Benchmark Function 103

4.13 Conclusions 104

CHAPTER 5 IANN FOR CLASSIFICATION PROBLEMS 105

5.1 Introduction 105

5.2 Preliminaries for Classification Problems 106

5.3 Breast Cancer Dataset 107

5.4 Australian Credit Card Approval Dataset 121

5.5 Thyroid Dataset 136

5.6 Pima Indian Diabetes 151

5.7 Glass Identification Dataset 166

5.8 Iris Dataset 181

5.9 Seven Bit Parity Dataset 195

5.10 Conclusions 210

CHAPTER 6 CONCLUSIONS AND FUTURE WORKS 212

6.1 Introduction 212

6.2 Summary of Research Findings 213

6.3 Contributions of the Research 214

6.4 Future Works 216


LIST OF TABLES

3.1 Mathematical Formulae of Benchmark Functions 82

3.2 Properties of Benchmark Functions 83

3.3 Classification Datasets from UCIMLR 83

3.4 Data partitioning of the datasets 85

3.5 Network Topology for all datasets 86

4.1 Average Optimization Results for Ackley Function 92

4.2 Average Optimization for Bohachevsky Function 94

4.3 Average Optimization Results for Easom Function 95

4.4 Average Optimization Results for Griewank Function 96

4.5 Average Optimization Results for Rastrigin Function 97

4.6 Average Optimization Results for Rosenbrock Function 99

4.7 Average Optimization Results for Schaffer Function 100

4.8 Average Optimization for Schwefel 1.2 Function 101

4.9 Average Optimization Results for Sphere Function 102

4.10 Average Optimization Results for Step Function 103

5.1 Perform. of Proposed Algos. on Breast Cancer (60:40) 107

5.2 Perform. of Proposed Algos. on Breast Cancer (70:30) 112

5.3 Perform. of Proposed Algos. on Breast Cancer (80:20) 117

5.4 Performance of Proposed Algos. on ACCA (60:40) 122

5.5 Performance of Proposed Algos. on ACCA (70:30) 127

5.6 Performance of Proposed Algos. on ACCA (80:20) 132

5.7 Performance of Proposed Algos. on Thyroid (60:40) 137

5.8 Performance of Proposed Algos. on Thyroid (70:30) 142

5.9 Performance of Proposed Algos. on Thyroid (80:20) 147

5.10 Performance of Proposed Algos. on Diabetes (60:40) 152

5.11 Performance of Proposed Algos. on Diabetes (70:30) 157

5.12 Performance of Proposed Algos. on Diabetes (80:20) 162


5.13 Performance of Proposed Algos. on Glass (60:40) 167

5.14 Performance of Proposed Algos. on Glass (70:30) 172

5.15 Performance of Proposed Algos. on Glass (80:20) 177

5.16 Performance of Proposed Algos. on Iris (60:40) 182

5.17 Performance of Proposed Algos. on Iris (70:30) 186

5.18 Performance of Proposed Algos. on Iris (80:20) 191

5.19 Perform. of Proposed Algos. on 7 Bit Parity (60:40) 196

5.20 Perform. of Proposed Algos. on 7 Bit Parity (70-30) 201

5.21 Perform. of Proposed Algos. on 7 Bit Parity (80:20) 206


LIST OF FIGURES

2.1 Simple Back Propagation Neural Network Architecture 10

2.2 Schematic error func.for a single parameter w 11

2.3 An Elman Recurrent Neural Network (Boden, 2001) 17

2.4 Original Bat Algorithm (Yang, 2010a) 28

2.5 Gaussian distribution curves for different SD values 34

2.6 Research Gap Analysis on Bat algorithm 38

3.1 Research Methodology 41

3.2 The Proposed BAGD algorithm 44

3.3 Block Diagram of the Proposed GBa algorithm 45

3.4 The Proposed GBa algorithm 46

3.5 Block Diagram of the Proposed SABa algorithm 47

3.6 The Proposed SABa algorithm 48

3.7 Flowchart of the Proposed BAGD-LM algorithm 51

3.8 The Standard Levenberg-Marquardt algorithm 53

3.9 Pseudo code of the BAGD-LM algorithm 55

3.10 Flowchart of the Proposed BAGD-RNN algorithm 56

3.11 Pseudo code of the BAGD-RNN algorithm 60

3.12 Flowchart of the Proposed GBa-LM algorithm 61

3.13 Pseudo code of the GBa-LM algorithm 63

3.14 Flowchart of the Proposed GBa-RNN algorithm 64

3.15 Pseudo code of the GBa-RNN algorithm 66

3.16 Flowchart of the Proposed SABa-LM algorithm 67

3.17 Pseudo code of the SABa-LM algorithm 69

3.18 Flowchart of the Proposed SABa-RNN algorithm 70

3.19 Pseudo code of the SABa-RNN algorithm 72

3.20 Flowchart of the Proposed Bat-BP algorithm 73


3.21 Proposed Bat-BP algorithm 75

3.22 Proposed BALM algorithm 77

3.23 Pseudo code of the BALM algorithm 78

3.24 Proposed BARNN algorithm 80

3.25 Pseudo code of the BARNN algorithm 81

4.1 Performance of the proposed algorithms on Ackley 93

4.2 Performance of the proposed algos on Bohachevsky 94

4.3 Performance of the proposed algorithms on Easom 95

4.4 Performance of the proposed algorithms on Griewank 97

4.5 Performance of the proposed algorithms on Rastrigin 98

4.6 Performance of the proposed algo on Rosenbrock 99

4.7 Performance of the proposed algorithms on Schaffer 100

4.8 Performance of the proposed algos on Schwefel 1.2 101

4.9 Performance of the proposed algorithms on Sphere 103

5.1 Performance of the Algos. on Breast Cancer (60:40) 109

5.2 Performance Eval. of Algos. on Breast Cancer (60:40) 112

5.3 Performance of the Algos. on Breast Cancer (70:30) 114

5.4 Performance Eval. of Algos. on Breast Cancer (70:30) 116

5.5 Performance of the Algos. on Breast Cancer (80:20) 119

5.6 Performance Eval. of Algos. on Breast Cancer (80:20) 121

5.7 Performance of the Algorithms on ACCA (60:40) 124

5.8 Performance Eval. of Algorithms on ACCA (60:40) 126

5.9 Performance of the Algorithms on ACCA (70:30) 129

5.10 Performance Eval. of Algorithms on ACCA (70:30) 131

5.11 Performance of the Algorithms on ACCA (80:20) 134

5.12 Performance Eval. of Algos. on ACCA (80:20) 136

5.13 Performance of the Algorithms on Thyroid (60:40) 139

5.14 Performance Eval. of Algos. on Thyroid (60:40) 141

5.15 Performance of the Algorithms on Thyroid (70:30) 144

5.16 Performance Eval. of Algos. on Thyroid (70:30) 146

5.17 Performance of the Algorithms on Thyroid (80:20) 149

5.18 Performance Eval. of Algos. on Thyroid (80:20) 151

5.19 Performance of the Algorithms on Diabetes (60:40) 154


5.20 Performance Eval. of Algos. on Diabetes (60:40) 156

5.21 Performance of the Algorithms on Diabetes (70:30) 159

5.22 Performance Eval. of Algos. on Diabetes (70:30) 161

5.23 Performance of the Algorithms on Diabetes (80:20) 164

5.24 Performance Eval. of the Algos. on Diabetes (80:20) 166

5.25 Performance of the Algorithms on Glass (60:40) 169

5.26 Performance Eval. of Algos. on Glass (60:40) 171

5.27 Performance of the Algorithms on Glass (70:30) 174

5.28 Performance Eval. of the Algos. on Glass (70:30) 176

5.29 Performance of the Algorithms on Glass (80:20) 178

5.30 Performance Eval. of Algos. on Glass (80:20) 181

5.31 Performance of the Algorithms on Iris (60:40) 183

5.32 Performance Eval. of Algos. on Iris (60:40) 186

5.33 Performance of the Algorithms on Iris (70:30) 188

5.34 Performance Eval. of Algos. on Iris (70:30) 190

5.35 Performance of the Algos. on Iris (80:20) 193

5.36 Performance Eval. of Algos. on Iris (80:20) 195

5.37 Performance of the Algos. on 7 Bit Parity (60:40) 198

5.38 Performance Eval. of Algos. on 7 Bit Parity (60:40) 200

5.39 Performance of the Algos. on 7 Bit Parity (70:30) 203

5.40 Performance Eval. of Algos. on 7 Bit Parity (70:30) 205

5.41 Performance of the Algos. on 7 Bit Parity (80:20) 208

5.42 Performance Eval. of Algos. on 7 Bit Parity (80:20) 210


LIST OF SYMBOLS AND ABBREVIATIONS

e - Exponent

𝜎2 - Variance

𝜎 - Standard Deviation

x - Normally distributed variable

µ - Mean

cs - Chaotic Sequence

⨂ - Hadamard Product Operator for Step-wise

Multiplication

Ti - Desired output of the 𝑖𝑡ℎ output unit

Yi - Network output of the 𝑖𝑡ℎ output unit

δk - Is the error for the output layer at kth node

δj - Is the error for the hidden layer at jth node

hj - Output of the jth hidden node

Oi - Output of the ith input node

η - Learning rate

i, j - Subscripts i, and j, corresponding to input and hidden

nodes

k - Subscript 𝑘, corresponding to output nodes

wjk - Weight on the link from hidden node j to output node

wij - Weight on the link from input node i to hidden node j

v_i^{t+1} - velocity vector

x_i^t - position vector

α - learning parameter or acceleration constant

εn - random vector drawn from N (0, 1)

𝑥∗ - Global best

𝑥𝑛𝑒𝑤 - New value obtained


𝑥𝑜𝑙𝑑 - Stored Old values

𝑥𝑚𝑎𝑥 - Maximum of the old data range

𝑥𝑚𝑖𝑛 - Minimum of the old data range

U - The Upper normalization bound

L - The Lower normalization bound

Ti - Predicted data

Ai - Actual data

A - Loudness

f - Frequency

r - pulse rate

v - velocity

n - Total number of inputs patterns

Xi - The observed value

X̅i - Mean value of the observed value

ANN - Artificial Neural Network

ALM - Adaptive Learning Rate and Momentum

AF - Activation Function

ACO - Ant Colony Optimization

ABC - Artificial Bee Colony

ABC-BP - Artificial Bee Colony with Back Propagation

ABC-LM - Artificial Bee Colony with Levenberg-Marquardt

ABCNN - Artificial Bee Colony Neural Network

APSO - Accelerated Particle Swarm Optimization

AUROC - Area under the Receiver Operating Characteristic

BADE - Bat with Differential Evolution

BAGD - Bat with Gaussian Distribution

BAGD-LM - Bat with Gaussian Distribution Levenberg-

Marquardt

BAGD-RNN - Bat with Gaussian Distribution Recurrent Neural

Network

BALM - Bat with Levenberg-Marquardt

BARNN - Bat with Recurrent Neural Network

Bat-BP - Bat with Back Propagation


BBA - Binary Bat Algorithm

BP/BPNN - Back Propagation Neural Network

CS - Cuckoo Search

CBSO - Chaotic Bat Swarm Optimization

CSBP - Cuckoo Search with Back Propagation

CLT - Central Limit Theorem

DE - Differential Evolution

DE-BP - Differential Evolution with Back Propagation

DLBA - Differential Levy Flight Bat Algorithm

ERN/ERNN - Elman Recurrent Neural Network

ERNPSO - Elman Recurrent Network with Particle Swarm

Optimization

FFNN - Feed Forward Neural Network

FLANN - Functional Link Artificial Neural Networks

GA - Genetic Algorithm

GBa - Genetic Bat algorithm

GBa-BP - Genetic Bat with back propagation

GBa-LM - Genetic Bat with Levenberg-Marquardt

GBa-RNN - Genetic Bat with Recurrent Neural Network

GDAM - Gradient Descent with Adaptive Momentum

GLM - Genetic Levenberg-Marquardt

HBA - Hybrid Bat Algorithm

HS - Harmony Search

HSABA - Harmony Search with Adaptive Bat Algorithm

HSBA - Harmony Search with Bat Algorithm

IANN - Improved Artificial Neural Networks

IBA - Improved Bat Algorithm

LM - Levenberg-Marquardt

MBDE - Modified Bat with Differential Evolution

MSE - Mean Squared Error

PSO - Particle Swarm Optimization

PSO-FLANN - Particle Swarm Optimization with Functional Link

Artificial Neural Networks


PSOGSA - Particle Swarm Optimization with Gravitational

Search Algorithm

RNN - Recurrent Neural Network

ROC - Receiver Operating Characteristics

SA - Simulated Annealing

SABa - Simulated Annealing Bat Algorithm

SABa-BP - Simulated Annealing Bat with Back Propagation

SABa-LM - Simulated Annealing Bat with Levenberg-Marquardt

SABa-RNN - Simulated Annealing Bat with Recurrent Neural

Network

SAGBA - Simulated Annealing Gaussian Bat Algorithm

WSA - Wolf Search Algorithm


LIST OF APPENDICES

APPENDIX TITLE PAGE

A.1: Gantt Chart of Research Activities 232


LIST OF PUBLICATIONS

Journals:

1. N. M. Nawi, M. Z. Rehman, Abdullah Khan, Haruna Chiroma, Tutut

Herawan (2015). An Improved Bat with Gaussian Distribution

Algorithm. Journal of Computational and Theoretical Nano Science

(CTN). ISI IF: 1.343.

2. N. M. Nawi, M. Z. Rehman, Abdullah Khan, Arslan Kiyani, Haruna

Chiroma, Tutut Herawan (2015). Hybrid Bat and Levenberg-Marquardt

Algorithms for Artificial Neural Networks Learning. Journal of

Information Science and Engineering. ISI IF: 0.414.

3. N. M. Nawi, M. Z. Rehman, Abdullah Khan, Haruna Chiroma, Tutut

Herawan (2015). Weight Optimization in Recurrent Neural Networks

with Hybrid Metaheuristic Cuckoo Search Techniques for Data

Classification. Mathematical Problems in Engineering (MPE). ISI IF:

0.762.

4. N. M. Nawi, M. Z. Rehman, M. I .Ghazali, M. N. Yahya, Abdullah

Khan (2014). Hybrid Bat-BP: A New Intelligent tool for Diagnosing

Noise-Induced Hearing Loss (NIHL) in Malaysian Industrial Workers, J.

Applied Mechanics and Materials, Trans Tech Publications, Switzerland,

vol. 465-466, pp. 652--656, 2014.

Conference Proceedings:

1. N. M. Nawi, M. Z. Rehman, Abdullah Khan, Nurfarain Hafifie, Insaf

Ali Siming (2015). Bat-BP: A New Bat Based Back-Propagation

Algorithm for Efficient Data Classification. International Integrated

Engineering Summit 2014.


2. N. M. Nawi, M. Z. Rehman, Abdullah Khan (2014). WS-BP: A New

Wolf Search based Back-propagation Algorithm. AIP Proceedings:

International Conference on Mathematics, Engineering & Industrial

Applications 2014 (ICoMEIA 2014) on 28th ~ 30th May, ICoMEIA

2014, Penang.

3. N. M. Nawi, M. Z. Rehman, Abdullah Khan (2014). Advanced Data

Classification With Hybrid Accelerated Cuckoo Particle Swarm

Optimization Based Levenberg Marquardt Algorithm. CEET-14.

4. N. M. Nawi, M. Z. Rehman, Abdullah Khan (2014). An Accelerated

Particle Swarm Optimized Intelligent Weight Update in Back

Propagation Algorithm. CEET-14.

5. M. Z. Rehman, N. M. Nawi, Abdullah Khan (2013). Countering the

problem of oscillations in Bat-BP gradient trajectory by using

momentum. The First International Conference on Advanced Data and

Information Engineering (DaEng-2013). 16-18 Dec, Kuala Lumpur,

Malaysia.

6. M. Z. Rehman, N. M. Nawi, Abdullah Khan (2013). The Effect of Bat

Population in Bat-BP Algorithm. 8th International Conference on

Robotics, Vision, Signal Processing & Power Applications (ROVISP

2013) Penang, Malaysia 10-12 NOVEMBER 2013.

7. M. Z. Rehman, N. M. Nawi, Abdullah Khan (2013). A New Bat Based

Back-Propagation (BAT-BP) Algorithm. International Conference on

Systems Science 2013 (ICSS 2013)


LIST OF AWARDS

(i) First Best paper Award – International Conference on Man Machine

Systems (ICoMMS) [2015]

An Accelerated Particle Swarm Optimized Intelligent Weight Update in Back

Propagation Algorithm

(ii) Third Best paper Award – Malaysian Universities Technical Conference

on Engineering and Technology (MUCET) [2015]

Enhancing The Cuckoo Search With Levy Flight Through Population

Estimation

(iii) Bronze Medal - Research and Innovation Festival 2014, Universiti Tun

Hussein Onn Malaysia [2014]

An Efficient Hybrid Accelerated Cuckoo Particle Swarm Optimization

(HACPSO) Learning Algorithm


CHAPTER 1

INTRODUCTION

1.1 Background of the Research

Optimization is a daily part of human life, involving many interrelated variables that must be engineered together. Throughout history, optimization has been applied everywhere, from needle design to rocket science. Optimization is required wherever robust and reliable solutions must be provided for the masses within limited resources, budget, time, and quality (Yang, 2008; Yang, 2010).

Usually the process of optimization involves finding an optimal solution out of

all the potential ones (Wang and Guo, 2013). Based on the searching styles,

optimization algorithms are classified into two categories, i.e. deterministic and

stochastic algorithms. Figuratively speaking, deterministic technique is a quite

rigorous technique when it comes to finding the optimal solution. By using its gradient

descent system, it will always generate the same optimal solution between the highest

and lowest extremes of that specific gradient. One of the most popular gradient descent

technique used is back propagation neural network (BPNN) algorithm (Rumelhart,

Hinton, and Williams, 1986). On the other hand, stochastic algorithms select random

points in a terrain and finds different optimal solutions converging to the global

minima more efficiently than the deterministic algorithms. Recently, nature inspired

metaheuristic algorithms which inherit the working principle of stochastic approach

have become popular in solving many real world non-linear problems (Beni and Wang,

1993; Blum and Roli, 2003; Yang, 2010; Yang, 2008).


A metaheuristic optimization method is a heuristic strategy for probing the search space for an ultimately global optimum in a more or less intelligent way (Gilli and Winker, 2008). Metaheuristic optimization is grounded in the belief that a stochastic, high-quality approximation of a global optimum obtained with best effort will probably be more valuable than a deterministic, poor-quality local minimum provided by a classical method, or no solution at all (Tang et al., 2012). Incrementally,

it optimizes a problem by attempting to improve the candidate solution with respect to

a given measure of quality defined by a fitness function. As such, metaheuristic

optimization algorithms are often based on local search methods in which the solution

space is not explored systematically or exhaustively, but rather a particular heuristic is

characterized by the manner in which the exploration through the solution space is

organized.

Some current examples of metaheuristics are Particle Swarm Optimization (PSO), which has been successfully applied to problems in antenna design (Jin and Rahmat-Samii, 2007) and electromagnetics (Robinson and Rahmat-Samii, 2004). Ant

Colony Optimization (ACO) algorithms are also used in many areas of optimization,

such as data mining and project scheduling (Merkle et al., 2002; Parpinelli et al.,

2002). Proposed by Karaboga and Akay (2009), Artificial bee colony (ABC) showed

good performance in numerical optimization, especially on large-scale global

optimization (Fister et al., 2012), and also in combinatorial optimization problems

(Fister Jr et al., 2012; Neri and Tirronen, 2009; Pan et al., 2011; Parpinelli and Lopes,

2011). Lately, new set of metaheuristic have been added to the family of age long

swarm intelligent algorithms. These bio-inspired algorithms include Firefly (Yang,

2013), Cuckoo (Yang and Deb, 2009), APSO (Yang et al., 2012), Wolf (Tang et al.,

2012), and Bat (Yang, 2010a). These metaheuristic optimization algorithms have

search methods both in breath and in depth that are largely based on the swarm

movement patterns of animals and insects found in nature. Their performance in

metaheuristic optimizations have proven superior to that of many classical heuristics

such as Genetic Algorithm (GA) (Goldberg, 1989) and Simulated Annealing (SA)

(Kirkpatrick et al., 1983).

Developed by Yang (2010a), Bat algorithm uses echolocation with varying

pulse rates of emission and loudness to find and converge to the optimal solution.

Initially, the Bat algorithm was found beneficial, but it was later realized that while it obtains good results on lower-dimensional optimization problems, it may become problematic on higher-dimensional problems because it is inclined to converge very fast initially (Jr and Yang, 2013). Also, the Bat algorithm has been found to take longer step lengths in its random walk, which can cause it to skip optimal solutions in the region. Therefore, to solve higher-dimensional problems and to decrease the step lengths, this research utilizes a Gaussian distribution as the random walk, which provides shorter step lengths during search and helps the algorithm converge to the global minima efficiently (Wang and Guo, 2013; Zheng and Yongquan, 2012).
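
To make the role of the random walk concrete, the following sketch (an illustration only, not code from this thesis; the function names, the sigma value and the use of the average loudness are assumptions) contrasts the uniform local-search step of the original Bat algorithm with a Gaussian-distributed step of the kind adopted in this research. Because most samples drawn from a normal distribution with a small standard deviation lie close to zero, the Gaussian walk tends to produce shorter steps around the current best solution.

import numpy as np

rng = np.random.default_rng(0)

def uniform_walk(x_best, loudness_avg):
    # Local search step of the original Bat algorithm (Yang, 2010a):
    # a uniform perturbation scaled by the average loudness, which can be long.
    eps = rng.uniform(-1.0, 1.0, size=x_best.shape)
    return x_best + eps * loudness_avg

def gaussian_walk(x_best, loudness_avg, sigma=0.1):
    # Illustrative Gaussian-distributed walk: samples of N(0, sigma) concentrate
    # near zero, so the resulting step lengths are shorter on average.
    eps = rng.normal(0.0, sigma, size=x_best.shape)
    return x_best + eps * loudness_avg

x_best = np.zeros(10)   # current best position in a 10-dimensional search space
print(np.linalg.norm(uniform_walk(x_best, 0.5) - x_best))   # typically the larger step
print(np.linalg.norm(gaussian_walk(x_best, 0.5) - x_best))  # typically the shorter step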

Although deterministic techniques such as BPNN, Recurrent Neural Networks (RNN) or Levenberg-Marquardt (LM) have been used extensively in many optimization problems, these methods face slow convergence or convergence to local minima due to poor approximation of initial weight values (Kolen and Pollack, 1990; Ghosh and Chakraborty, 2012; Sarangi et al., 2013). In order to overcome these downsides of weight initialization, several hybrid algorithms have recently emerged

from the amalgamation of deterministic and stochastic algorithms which are; Genetic

Levenberg-Marquardt (GLM) (Kermani et al., 2005), Artificial Bee Colony Neural

Network (ABCNN) (Karaboga, et al., 2007), Elman Recurrent Network with Particle

Swarm Optimization (ERNPSO) (Ab Aziz et al., 2009), Particle Swarm Optimization

with Gravitation Search Algorithm (PSO-GSA) (Mirjalili, et al., 2012), Differential

Evolution Back Propagation (DE-BP) (Sarangi et al., 2013), and Cuckoo Search with

Back Propagation (CSBP) (Yi, et al., 2014). Despite providing a method for

approximate initial weights, these methods are slow in convergence. Therefore, in this

research, the proposed Gaussian distribution based Bat (BAGD) algorithm is

hybridized with BPNN, RNN, and LM which avoids slow convergence and provides

high accuracy during convergence on classification datasets.
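
To illustrate how a metaheuristic's best position can supply initial weights for an ANN, the hedged sketch below (the helper name and layer sizes are hypothetical, not the implementation used in this thesis) unpacks a flat position vector, such as the best bat found by BAGD, into the weight matrices and biases of a one-hidden-layer network before gradient-based training takes over.

import numpy as np

def unpack_weights(position, n_in, n_hidden, n_out):
    # Interpret a flat position vector as the parameters of a 1-hidden-layer MLP.
    sizes = [n_in * n_hidden, n_hidden, n_hidden * n_out, n_out]
    assert position.size == sum(sizes), "position length must match the network topology"
    w_ih, b_h, w_ho, b_o = np.split(position, np.cumsum(sizes)[:-1])
    return (w_ih.reshape(n_in, n_hidden), b_h,
            w_ho.reshape(n_hidden, n_out), b_o)

# Example: a network with 5 inputs, 4 hidden nodes and 1 output has 29 parameters,
# so each bat would explore a 29-dimensional search space.
best_position = np.random.default_rng(1).normal(size=5 * 4 + 4 + 4 * 1 + 1)
w_ih, b_h, w_ho, b_o = unpack_weights(best_position, 5, 4, 1)
print(w_ih.shape, w_ho.shape)   # (5, 4) (4, 1)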

1.2 Problem Statement

From the previous studies on the Bat algorithm, it was realized that while the bat obtains good results on lower-dimensional optimization problems, it may become problematic on higher-dimensional problems because it is inclined to converge very fast initially (Jr and Yang, 2013). Also, the Bat algorithm uses its own echolocation random walk, which takes longer steps and thus converges to less optimal solutions in the trajectory. Therefore, to solve higher-dimensional problems and to decrease the step lengths, this research proposes using a Gaussian distribution, which provides shorter step lengths during search. The proposed Bat with Gaussian distribution is further hybridized with deterministic methods such as BPNN, ERNN, and LM to solve their problems of local minima and slow convergence by introducing intelligent approximation of weights.

1.3 Aims of the Research

This research aims to improve the Bat algorithm's convergence behavior during the exploration and exploitation process by introducing a Gaussian distribution random walk. The research then advances by introducing Improved Artificial Neural Networks (IANN), which emerge from the combination of the Bat with Gaussian distribution (BAGD) with different Multi-layer Perceptron (MLP) architectures. With optimal weights obtained from bat prey searching, the learning in IANN structures can be greatly enhanced. Moreover, this research pursues a suitable network architecture which retains good performance on classification datasets with lower CPU overheads and training and testing errors. The proposed IANN algorithms will try to reduce the

training and testing error in standard BPNN, ABC-BP, ABC-LM, Bat-BP, BALM,

BAGD-LM, GBa-LM, SABa-LM, BARNN, BAGD-RNN, GBa-RNN, and SABa-

RNN on benchmarked and real classification datasets.

1.4 Objectives of the Research

This study encompasses the following three objectives:

a. To propose an improved Bat algorithm with a Gaussian random walk that exploits the search space and, by reducing large step lengths, leads the Bat towards convergence to the global optima.

b. To propose Simulated Annealing with Bat algorithm (SABa) and Genetic Bat

algorithm (GBa) that improves exploration and exploitation behaviour in Bat

algorithm during convergence to global optima.

c. To propose and compare the performances of the Improved Artificial Neural

Networks (IANN) (i.e. BPNN, LM, ERNN with weights initialized from Improved


BAGD, GBa, and SABa) with BPNN, ABC-BP, and ABC-LM, on selected

benchmarked classification datasets in terms of Accuracy, MSE and standard

deviation (SD).

1.5 Scope of the Research

This study will focus on the use of Gaussian distribution random walk in conventional

Bat algorithm to solve the problem of large step lengths that leads it towards early

convergence and makes Bat more prone to less optimal solutions. Also, the proposed

Bat with Gaussian distribution (BAGD) algorithm will be hybridized with Artificial

Neural Networks (ANN). The performance of BAGD and its variants will be verified

on benchmark functions and classification datasets.

1.6 Significance of the Research

This research provides the following contributions in the field of swarm intelligent

metaheuristics as well as the emerging field of heuristics, i.e. Improved Artificial

Neural Networks (IANN);

a. The proposed BAGD algorithm uses a Gaussian distribution random walk and solves the problem of the large step lengths taken by the original Bat, as well as its slow convergence on high-dimensional problems.

b. The proposed GBa and SABa algorithms help Bat increase the exploration and exploitation process through the introduction of intensive local and global search techniques provided by the Genetic and Simulated Annealing algorithms (a two-stage sketch follows this list).

c. The proposed IANN algorithms such as; Bat-BP, BALM, BAGD-LM, GBa-LM,

SABa-LM, BARNN, BAGD-RNN, GBa-RNN, and SABa-RNN etc. provided

optimal weight values that helped in obtaining outstanding performance on

classification datasets.
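
The two-stage idea behind GBa and SABa can be sketched as follows (an illustrative outline under simplifying assumptions, not the thesis implementation): a first global search such as BAGD returns its best position, and a second algorithm, here Simulated Annealing, continues the search from where the first one converged.

import numpy as np

rng = np.random.default_rng(0)

def two_stage_optimize(objective, stage_one, x0, iters=500, temp0=1.0, step=0.1):
    # Stage 1: any global search routine (e.g. BAGD) returns a candidate solution.
    x = stage_one(objective, x0)
    fx, temp = objective(x), temp0
    # Stage 2: Simulated Annealing refines the search from that candidate.
    for _ in range(iters):
        cand = x + rng.normal(0.0, step, size=x.shape)
        f_cand = objective(cand)
        # Accept better moves always; accept worse moves with Boltzmann probability.
        if f_cand < fx or rng.random() < np.exp(-(f_cand - fx) / temp):
            x, fx = cand, f_cand
        temp *= 0.99   # geometric cooling schedule
    return x, fx

sphere = lambda v: float(np.sum(v ** 2))
stage_one_stub = lambda f, x0: x0 + rng.normal(0.0, 0.5, size=x0.shape)  # stand-in for BAGD
print(two_stage_optimize(sphere, stage_one_stub, np.ones(5))[1])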


1.7 Outline of the Thesis

This Thesis is subdivided into Six Chapters including the introduction and conclusion

ones. The following is the outline of each Chapter.

Besides providing an outline of the thesis, Chapter 1 contains the overview on

the background of the study, scope of the study, aims, objectives and significance of

the research undertaken.

Chapter 2 reviews the previous studies on optimization, with a detailed overview of the use of swarm intelligent techniques. Within swarm intelligence, the Bat algorithm's problems and previous improvements are highlighted after a deep review, and the need for further improvements is indicated. After the detailed discussion on Bat algorithms, the Chapter reviews the hybrid metaheuristic algorithms emerging from the combination of stochastic and deterministic techniques.

Finally, Chapter 2 comes to a close while discussing the pros and cons associated with

the hybrid metaheuristics.

On the foundations of Chapter 2, Chapter 3 presents improved Bat algorithms such as BAGD, GBa, and SABa to improve the step length in searching as well as to converge to the global optima efficiently. This Chapter also introduces the

efficient proposed IANN algorithms, i.e. Bat-BP, BALM, BAGD-LM, GBa-LM,

SABa-LM, WSLM, BARNN, BAGD-RNN, GBa-RNN, SABa-RNN, and WRNN to

reduce the training and testing error during IANN learning process. Finally, the

Chapter concludes elaborating on the data collection, data partitioning, pre-processing,

post-processing, network architecture and performance comparison of the proposed

algorithms with standard BPNN, ABC-BP, and ABC-LM algorithms.

In Chapter 4, the proposed BAGD, GBa, and SABa are tested for convergence

on the benchmark functions. Meanwhile, in Chapter 5, the proposed IANN algorithms,

such as; Bat-BP, BALM, BAGD-LM, GBa-LM, SABa-LM, BARNN, BAGD-RNN,

GBa-RNN, and SABa-RNN etc. are programmed into MATLAB and tested for their

accuracy on selected classification problems.

In Chapter 6, the research contributions are summarized and several

recommendations for applying the proposed algorithms in engineering fields are

suggested. Future works are also discussed in this Chapter.


CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

This Chapter begins by explaining mathematical optimization and its role in improving the search direction. Then stochastic search optimization algorithms are

discussed in detail. In stochastic optimization, some famous methods such as

evolutionary Genetic Algorithm (GA), Simulated Annealing (SA) based on heat

control in metallurgy, and Harmony Search are discussed. In the same section, Swarm

intelligent metaheuristics such as; Artificial Bee Colony (ABC), Particle Swarm

Optimization (PSO), Cuckoo Search (CS), and Bat algorithms are brought into

the limelight. Then, further down in the sections, the merits and demerits of recently

introduced techniques on Bat algorithm are taken into account and Gaussian

distribution is discussed as a way of enhancing the exploration and exploitation

capability of the Bat algorithm. The transition, and the need for a transition, from swarm optimization to hybrid swarms is discussed. Finally, the Chapter concludes with details on the current and possibly new hybrids inspired by merging the currently available metaheuristic architectures with hybrid Bat algorithms.

2.2 Numerical Optimization

Numerical optimization is the process of adjusting a set of interrelated input parameters, used to describe a physical occurrence observed in nature (presented in the form of a mathematical process), in order to find the minimum or maximum of the appropriate output quantities. Mathematically speaking, optimization can be formulated as:

\[
\underset{x \in \mathbb{R}^n}{\text{minimize}} \quad F_i(x), \qquad (i = 1, 2, \ldots, M) \tag{2.1}
\]

subject to the constraints

\[
\phi_j(x) = 0, \qquad (j = 1, 2, \ldots, M) \tag{2.2}
\]

\[
\varphi_k(x) \le 0, \qquad (k = 1, 2, \ldots, M) \tag{2.3}
\]

where F_i(x), φ_j(x), and φ_k(x) are functions of the design vector

\[
x = (x_1, x_2, \ldots, x_n)^T \tag{2.4}
\]

Here the components x_i of x are called design or decision variables, and they can be real continuous, discrete, or a mixture of both. The function F_i(x), where i = 1, 2, ..., M, is called the objective or cost function. When M = 1 there is only a single objective, but since real-world problems are mostly multi-objective and non-linear, there can be more than one objective function. The space spanned by the decision variables is called the design space or search space R^n, while the space formed by the objective function values is called the solution space or response space. The equalities on φ_j(x) and the inequalities on φ_k(x) are called constraints (Yang, 2008, 2010a, 2010b).
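
As a concrete, purely illustrative instance of the formulation in Equations (2.1)-(2.4), the sketch below encodes a single objective (M = 1) together with one equality and one inequality constraint; the function names are assumptions made for illustration and do not appear in this thesis.

import numpy as np

def objective(x):
    # F_1(x): the sphere function, a common single-objective test case.
    return float(np.sum(x ** 2))

def equality_constraint(x):
    # phi_1(x) = 0: the components of the design vector must sum to 1.
    return float(np.sum(x) - 1.0)

def inequality_constraint(x):
    # varphi_1(x) <= 0: every component must stay within [-5, 5].
    return float(np.max(np.abs(x)) - 5.0)

def is_feasible(x, tol=1e-9):
    # A design vector is feasible when all constraints are satisfied.
    return abs(equality_constraint(x)) <= tol and inequality_constraint(x) <= 0.0

x = np.array([0.25, 0.25, 0.5])          # a design vector in R^3
print(objective(x), is_feasible(x))      # value of F_1(x) and the feasibility check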

After the formulation of the optimization function, the next task is to find the

best optimal solutions using the right mathematical formulae. On the basis of searching

styles, optimization algorithms are usually classified into two major categories;

a) Deterministic algorithms

b) Swarm Intelligent metaheuristics


2.3 Deterministic Algorithms

Metaphorically speaking, searching for an optimal solution is like treasure hunting. Suppose we have to find a treasure in hilly terrain while being blindfolded; in this case the search will be purely random and, in most cases, such a technique is rendered useless. In another scenario, the treasure is searched for on the highest peak, or between the highest and lowest extremes of the hill terrain. This situation corresponds to the gradient ascent or descent technique. In this search the whole hill terrain is searched rigorously, and the search proceeds the same way every time the technique is repeated. Therefore, the results will always be the same (Yang, 2008).

One of the most popular gradient descent techniques is the back propagation neural network (BPNN) algorithm.

2.3.1 Back-propagation Neural Network

Back-propagation Neural Network (BPNN) is an optimization algorithm applied to Artificial Neural Networks (ANN) to speed up the network's convergence to the global optima during the training process (Rumelhart et al., 1986; Wang and Guo, 2013). BPNN follows the elementary principles of ANN, which emulate the learning skills of human reasoning. Like an ANN, BPNN comprises an input layer, one or more hidden layers and an output layer of neurons. BPNN has a fully connected architecture where every node in one layer is connected to every node in the adjacent layer, as shown in Figure 2.1.


Figure 2.1 Simple Back Propagation Neural Network Architecture

Unlike other ANN architectures, BPNN learns by calculating the errors of the

output layer to find the errors in the hidden layers. This makes BPNN highly suitable

for problems in which no relationship is established between the output and the inputs.

Due to its high degree of flexibility and learning ability, it has been successfully applied in a wide assortment of applications (Nawi et al., 2013). The main objective of the

learning process is to minimize the difference between the actual output Ok and the

desired output tk by adjusting the weights w* in the network optimally. The Error

function is defined as (Gong, 2009);

\[
E = \frac{1}{2} \sum_{k=1}^{n} (t_k - O_k)^2 \tag{2.5}
\]

where;

n : number of output nodes in the output layer

t_k : desired output of the kth output unit

O_k : network output of the kth output unit

The Error function visualized in three-dimensional weight space is given in

Figure 2.2.


Figure 2.2 Schematic error functions for a single parameter w with stationary points

For networks with more than one layer, the error function is a non-linear function of the weights and may have several minima, each of which satisfies the following equation (a 3D visualization is given in Figure 2.2):

\[
\nabla E(w) = 0 \tag{2.6}
\]

where ∇E(w) denotes the gradient of E with respect to the weights in Equation (2.6). In Figure 2.2, the point at which the value of the error function is smallest is called the global minimum, at point (a), while all other minima are called local minima. There may also be other points which satisfy the condition in Equation (2.6), for instance the global maximum at point (b) and the saddle point at (c). The error is calculated by comparing the network output with the desired output using Equation (2.5). The error signal E is then propagated backwards through the network and used to adjust the weights. This process continues until the maximum epoch or the target error is achieved by the network (Rehman and Nawi, 2011).
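
To make the weight-adjustment step concrete, the following sketch (an illustrative outline under simplifying assumptions, not the thesis code; biases are omitted and the layer sizes are arbitrary) performs one back-propagation update for a single-hidden-layer network with logistic sigmoid activations, using the error of Equation (2.5) and a learning rate eta.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bp_step(x, t, w_ih, w_ho, eta=0.1):
    # One gradient-descent update for a 1-hidden-layer network.
    # x: input vector, t: desired outputs t_k, eta: learning rate.
    h = sigmoid(x @ w_ih)                 # hidden-layer outputs
    o = sigmoid(h @ w_ho)                 # network outputs O_k
    error = 0.5 * np.sum((t - o) ** 2)    # Eq. (2.5)

    # Backward pass: delta_k for the output layer, delta_j for the hidden layer.
    delta_k = (o - t) * o * (1.0 - o)
    delta_j = (delta_k @ w_ho.T) * h * (1.0 - h)

    # Weight updates by gradient descent.
    w_ho -= eta * np.outer(h, delta_k)
    w_ih -= eta * np.outer(x, delta_j)
    return error, w_ih, w_ho

rng = np.random.default_rng(0)
w_ih = rng.normal(scale=0.1, size=(4, 3))     # input-to-hidden weights w_ij
w_ho = rng.normal(scale=0.1, size=(3, 2))     # hidden-to-output weights w_jk
err, w_ih, w_ho = bp_step(rng.normal(size=4), np.array([0.0, 1.0]), w_ih, w_ho)
print(err)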

Since BPNN uses local learning in gradient descent, it faces many limitations such as slow learning or even network stagnancy. Despite providing successful solutions, BPNN requires careful selection of the initial parameters, such as the network topology, weights and biases, learning rate, momentum coefficient, activation

function, and value for the gain in the activation function (Nawi, Ransing, and


Ransing, 2006; Kolen and Pollack, 1990; Lahmiri, 2011; Rehman and Nawi, 2011;

Zhang and Pu, 2011). An improper use of these parameters can lead to slow network

convergence or even network fiasco. Therefore, several modifications have been

suggested to stop network stagnancy and to speed-up the network convergence to

global minima.

In 1989, Lari-Najafi indicated the use of large initial weights for increasing the

learning rate of the BPNN network. Later, it was found that if the initial weight range

is increased beyond the problem-dependent limit, the network's performance

deteriorates (Lari-Najafi et al., 1989). In 1990, Kolen and Pollack proved the

sensitivity of BPNN to initial weights and suggested the use of weights initialized with

small random values (Kolen and Pollack, 1990). Therefore to make BPNN perform

better, the selection of initial weights is vital and helps speed-up the network

convergence to global minima (Abdul Hamid, 2012; Hyder et al., 2009).

Another BPNN parameter known as momentum coefficient is used to suppress

oscillations in the trajectory by adding a fraction of the previous weight change (Fkirin

et al., 2009). The addition of the momentum coefficient helps to smooth-out the

descent path by avoiding extreme changes in the gradient due to local irregularities

(Rehman and Nawi, 2011b; Sun et al., 2007). Hence, it is vital to suppress any

oscillations that result from the changes in the error surface (Abdul Hamid, 2012). In

the early 90’s, back-propagation with Fixed Momentum (BPFM) showed its prowess

in convergence to the global minima, but it was later found that BPFM performs well only when the error gradient and the last change in weights are in parallel. When the current

gradient is in an opposing direction to the previous update, BPFM will cause the

weight direction to be updated in the upward direction which leads towards the

network stagnancy or even failure. Hence, it is necessary that the momentum coefficient should be adjusted adaptively (Hongmei and Gaofeng, 2009). To overcome the static momentum problem, various methods for adaptive momentum have been developed by researchers, such as the momentum step and a scheme for dynamically

selecting the momentum rate proposed by (Qiu et al., 1992). Yu (1993) rejected the

idea of using one-dimensional error minimization technique affirming that the error

gradient is a very complex non-linear function with respect to the learning rate but it

can be proved that optimal gradient vectors in two successive iteration steps are

orthogonal. This results in the automatic update of momentum in each successive


iteration and oscillations are greatly suppressed with reduced error at the end of the

final convergence. In 1994, Swanston, Bishop, & Mitchell proposed Simple Adaptive

Momentum (SAM) for further improving the performance of BPNN (Swanston,

Bishop, and Mitchell, 1994). In SAM, if the change in the weights is in the similar

‘direction’ then the momentum term is increased to accelerate the convergence

otherwise it is decreased. SAM has been found to have lower computational overheads

than the conventional BPNN algorithm and it converged in considerably less iterations.

Later in 2008, Mitchell updated SAM by scaling the momentum after

considering all the weights in each part of the Multi-layer Perceptrons (MLP). This

technique is found helpful in improving convergence speed to the global minima

(Mitchell, 2008). Shao & Zheng (2009) introduced a new Back Propagation

momentum Algorithm (BPAM) with dynamic momentum coefficient. In BPAM,

momentum coefficient was adjusted by combining the information about the current

gradient and the weight change in the earlier phase. When the angle between the

present negative gradient and the last weight change is less than 90°, the momentum

coefficient is defined as a positive value to speed up learning. Otherwise, momentum

is kept zero to guarantee the descent of the error gradient. The new algorithm was

found better than previous algorithms by reducing oscillations in the trajectory (Shao

and Zheng, 2009).
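
For reference, the generic momentum-based weight update that these schemes adjust can be written in the following textbook form (not an equation reproduced from this thesis; here α denotes the momentum coefficient and η the learning rate):

\[
\Delta w_{jk}(t) = -\eta \frac{\partial E}{\partial w_{jk}} + \alpha \, \Delta w_{jk}(t-1), \qquad
w_{jk}(t+1) = w_{jk}(t) + \Delta w_{jk}(t)
\]

The term α Δw_{jk}(t-1) adds a fraction of the previous weight change, smoothing the descent path; adaptive schemes such as SAM and BPAM vary α during training instead of keeping it fixed.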

Besides momentum, another parameter that greatly affects the performance of

BPNN is learning rate. A great level of debate has happened on the selection of

learning rate since the inception of BPNN. In the earlier studies, the usual value of

learning rate was kept constant. In 2001, Ye claimed that a constant learning rate is unable to guide the search for the optimal weights, resulting in a blind search (Ye, 2001). To avoid more trial and error in the network training, Yu & Liu (2002)

introduced back propagation and acceleration learning method (BPALM) with

adaptive momentum and learning rate to answer the problem of fixed learning rate.

Their method was tested on the Parity problem, Optical Character Recognition (OCR) and the 2-Spirals problem, and the results were found to be far superior to previous improvements on BPNN.

More recently, Abdul Hamid (2012) introduced an adaptive learning rate and momentum to speed up the convergence rate of the conventional BPNN algorithm. After the experimentation process, it was concluded that too small a learning rate can slow down the network convergence, while too large a learning rate can lead the network towards less optimal solutions. Therefore, the learning rate should be selected very carefully to make the network perform efficiently.

Besides other factors affecting the performance of BPNN, the activation function determines whether an output node fires or not. Its basic role is to limit the amplitude of the output of a neuron. It generates an output value for a node in a predefined range, such as the closed unit interval [0,1] or alternatively [-1,1], and can be a linear or non-linear function (Nawi, Ransing, and Ransing, 2006; Rumelhart, Hinton, and Williams, 1986). In this study, the logistic sigmoid activation function is used, which limits the amplitude of the output to the range [0,1]. The activation function for the jth node is given in Equation (2.7):

\[
o_j = \frac{1}{1 + e^{-c_j \, a_{net,j}}} \tag{2.7}
\]

where,

\[
a_{net,j} = \sum_{i=1}^{l} w_{ij} \, o_i + \theta_j \tag{2.8}
\]

where,

o_j : output of the jth unit.

o_i : output of the ith unit.

w_ij : weight of the link from unit i to unit j.

a_net,j : net input activation for the jth unit.

θ_j : bias for the jth unit.

c_j : gain of the activation function.
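
A small, illustrative sketch of Equations (2.7) and (2.8) (the function names are hypothetical) shows how the gain c_j changes the slope of the logistic sigmoid while leaving its [0, 1] output range unchanged:

import numpy as np

def net_input(o_prev, w_j, theta_j):
    # Eq. (2.8): weighted sum of the previous layer's outputs plus the bias.
    return float(np.dot(w_j, o_prev) + theta_j)

def logistic(a_net, gain=1.0):
    # Eq. (2.7): logistic sigmoid with gain c_j; a larger gain gives a steeper slope.
    return 1.0 / (1.0 + np.exp(-gain * a_net))

a = net_input(np.array([0.2, 0.7, 0.1]), np.array([0.5, -0.3, 0.8]), theta_j=0.1)
print(logistic(a, gain=1.0), logistic(a, gain=4.0))   # same input, different slopes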

In earlier studies the value for gain parameter in the activation function was

kept fixed. But later on, it was realized that the gain parameter can greatly influence

the slope of the activation function. In 1996, a relationship between learning rate,

momentum, and activation function was mapped by Thimm, Moerland, and Fiesler

(1996). In their findings, it was indicted that learning rate and the gain of the activation

function are exchangeable and better results can be obtained with the variable gain

parameter. Thimm’s theory of changing the gain of the activation is equivalent to

Page 38: AN IMPROVED BAT ALGORITHM WITH ARTIFICIAL NEURAL …eprints.uthm.edu.my/id/eprint/9127/1/Syed_Muhammad... · complex searching problems with ease. Nowadays, nature inspired swarm

15

learning rate, and momentum is further verified by Eom, Jung, and Sirisena (2003)

when they automatically tuned gain parameter with the fuzzy logic. Nawi (2007) used

the adaptive gain parameter in back propagation with conjugate gradient method.

Abdul Hamid (2012) further extended the work of Nawi (2007) and proposed an adaptive gain parameter together with adaptive momentum and an adaptive learning rate. The proposed Back Propagation Gradient Descent with Adaptive Gain, Adaptive Momentum, and Adaptive Learning Rate (BPGD-AGAMAL) algorithm showed significant

enhancement in the performance of BPNN on classification datasets.

Despite inheriting the most stable multi-layered architecture, the BPNN algorithm is not suitable for dealing with temporal datasets due to its static mapping routine (Güler, Übeyli, and Güler, 2005). In order to use a temporal dataset with BPNN, all dimensions of the pattern vectors must be equal; otherwise BPNN is rendered

useless. However, an alternative approach known as the Recurrent Neural Network (RNN) is available, which can map both temporal and spatial datasets and has a short-term memory that remembers past events and thereby strongly influences the output vectors (Gupta, McAvoy, and Phegley, 2000; Gupta and McAvoy, 2000; Übeyli, 2008). RNN are discussed in more detail in the next section.

2.3.2 Recurrent Neural Network (RNN)

Unlike the directed acyclic graph structure of Multilayer Perceptrons (MLP) trained with the back propagation algorithm, Recurrent Neural Networks (RNN) form a directed graph (digraph). RNN possess the capability to store a previous change made to any node in the network and utilize it in the future, which makes RNN flexible enough to understand temporal datasets. Due to this learning elasticity, RNN have been

deployed in several fields such as; simple sequence recognition, Turing machine

learning, pattern recognition, forecasting, optimization, image processing, and

language parsing etc. (Pearlmutter, 1995; Übeyli, 2008; Williams and Zipser, 1989;

Gregor et al., 2014).

Usually RNN are classified as fully recurrent or partially recurrent based on

the functionalities they offer. In the earlier years of ANN’s inception, fully recurrent

neural networks were quite popular. Some of the examples are back propagation

through time (BPTT) and Recurrent back propagation (RBP). The basic principle of


BPTT is that of unfolding (Boden, 2001): it is a training method for fully recurrent networks in which back propagation is used to train an unfolded, feed-forward, non-recurrent version of the original network. Once trained, the weights from any layer of the unfolded network are passed onto the recurrent network for temporal training (Gupta et al., 2000; Rumelhart et al., 1986). BPTT is quite inefficient in training long sequences (Gupta and McAvoy, 2000). Moreover, the error deltas for every weight must be stored as the network is folded back, which imposes a large memory requirement. If a larger time step is used, the error effect diminishes, a problem known as the vanishing gradient, which makes BPTT infeasible to apply to such datasets (Boden, 2001; Kolen and Pollack, 1990).

Unlike BPTT, Recurrent Back Propagation (RBP) bears a resemblance to the

master or slave network of Lapedes and Farber, but it is architecturally simple (Pineda,

1987). In an RBP network, back propagation is extended directly to train a fully recurrent neural network. In this method, all the units are assumed to have continually evolving states (Gupta et al., 2000). Pineda (1987) used RBP on a temporal XOR problem with 200 patterns and found it to be very time consuming. Moreover, both BPTT and RBP are offline training methods and are not suitable for long sequences because of their high time consumption.

In 1989, Williams used online training of RNN in which the weights are

updated while the network is running and the error is minimized at the end of each

time step instead of at the end of the sequence. This method allows recurrent networks

to learn tasks that require retention of information over time periods having fixed or

indefinite duration (Williams and Zipser, 1989).

In partially recurrent neural networks, recurrence is produced in a feed-forward network by feeding back the network outputs as additional input units (Jordan et al., 1991) or delayed hidden unit outputs (Elman, 1990). Also known as the Simple Recurrent Neural Network (SRNN), the Elman Recurrent Neural Network (ERNN) is one of the most popular forms of partial RNN. The ERNN is a relatively simple structure proposed by Elman to train a network whose connections are largely feed-forward, with a carefully selected set of feedback connections from the context layer units to the hidden units. The context layer nodes store the previous outputs of the hidden layer nodes, and these context values are used as extra inputs to the hidden layer, so the ERNN ends up with an explicit memory of one time delay (Elman, 1990; Güler et al., 2005).


A three-layered ERNN is used in this research, as shown in Figure 2.3. In the ERNN, each layer has its own index variable: 𝑘 for output nodes, 𝑗 and ℎ for hidden nodes,

and 𝑖 for input nodes. In a feed-forward network, the input vector 𝑥 is propagated

through a weight layer 𝑉.

Figure 2.3 An Elman Recurrent Neural Network (Boden, 2001)

$net_j(t) = \sum_{i}^{n} x_i(t) V_{ji} + \theta_j$ (2.9)

where 𝑛 is the number of inputs and 𝜃𝑗 is the bias. In an ERNN, the input vector is propagated through the weight layer in the same manner as in a feed-forward network; however, it is also combined with the previous state activation through an additional recurrent weight layer, 𝑈:

$y_j(t) = f(net_j(t))$ (2.10)

$net_j(t) = \sum_{i}^{n} x_i(t) V_{ji} + \sum_{h}^{m} y_h(t-1) U_{jh} + \theta_j$ (2.11)

where 𝑓 is an output function and 𝑚 is the number of states. The output of the network is computed from the current state and the output weights, 𝑊:


$y_k(t) = g(net_k(t))$ (2.12)

$net_k(t) = \sum_{j}^{m} y_j(t-1) W_{kj} + \theta_k$ (2.13)

where 𝑔 is an output function similar to 𝑓, and 𝑊𝑘𝑗 represents the weights from the hidden layer to the output layer.
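For illustration, the forward pass of Equations (2.9) to (2.13) can be sketched as follows. This is a minimal Python example with assumed array shapes, using a logistic sigmoid for both f and g and feeding the freshly computed hidden state to the output layer; it is a sketch, not the exact implementation used in this research.

import numpy as np

def ernn_forward(x, y_context, V, U, W, theta_h, theta_o):
    # x         : input vector at time t                 (n,)
    # y_context : hidden activations from time t-1       (m,)
    # V         : input-to-hidden weights                 (m, n)
    # U         : context-to-hidden recurrent weights     (m, m)
    # W         : hidden-to-output weights                (k, m)
    # theta_h   : hidden biases (m,);  theta_o : output biases (k,)
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    net_h = V @ x + U @ y_context + theta_h    # Equation (2.11)
    y_hidden = sigmoid(net_h)                  # Equation (2.10)
    net_o = W @ y_hidden + theta_o             # Equation (2.13)
    y_out = sigmoid(net_o)                     # Equation (2.12)
    return y_out, y_hidden                     # y_hidden becomes the next context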

In the early 1990s, the ERNN was found to have sufficient generalization capability and successfully predicted stock price points on the Tokyo stock exchange (Kamijo and Tanigawa, 1990). The ERNN also takes advantage of parallel hardware architectures, and it has shown the ability to learn complex patterns quickly in tasks such as natural language processing (Elman, 1991) and time series data classification (Husken and Stagge, 2003). In the medical field, it has proven beneficial in the dynamic mapping of electroencephalographic (EEG) signals, achieving high classification accuracy during clinical trials (Güler et al., 2005).

Later, a similar ERNN technique was used for Doppler ultrasound signal

classification using Lyapunov exponents and again high accuracy was achieved

(Übeyli, 2008). Based on the optimization provided by ERNN, He et al. (2015) recently applied the ERNN to solve real-time price estimation problems in the power grid with great success (He et al., 2015). Despite all these achievements, ERNN algorithms still face the initial weight dilemma and get stuck in local minima or suffer from slow convergence. In order to avoid local minima and slow convergence in ANN, the second-order derivative based Levenberg-Marquardt (LM) algorithm was introduced (Levenberg, 1944; Marquardt, 1963).

2.3.3 Levenberg-Marquardt (LM) Algorithm

The steepest descent method, also known as the BPNN algorithm, has its merits but suffers from slow convergence. The convergence of BPNN can be significantly accelerated by the Gauss-Newton (GN) algorithm. Using adequate step sizes in each direction, the Gauss-Newton algorithm can converge to the global minimum efficiently; if the error function has a quadratic surface, it can converge in a single epoch. However, this can only occur when the quadratic approximation of the error surface is realistic; otherwise, the GN algorithm is mostly divergent (Nawi et al., 2011).

Therefore, an intermediary algorithm that combines the gradient descent and GN methods was introduced. This algorithm, known as Levenberg-Marquardt (LM), is more robust than the GN method because in many cases it can converge even when the error surface is more complex than the quadratic case (Levenberg, 1944; Marquardt, 1963). The basic idea of Levenberg-Marquardt is that it behaves like the steepest descent algorithm until the local curvature is suitable for a quadratic approximation; it then approximately becomes the Gauss-Newton algorithm, which can speed up convergence significantly (Yu and Wilamowski, 2012). LM uses an approximation of the Hessian matrix of the error surface. Assume the error function is:

$E(t) = \frac{1}{2} \sum_{i=1}^{N} e_i^2(t)$ (2.14)

Where,

𝑒(𝑡): is the error, and

𝑁: is the number of vector elements, then;

∇𝐸(𝑡) = 𝐽𝑇(𝑡)𝑒(𝑡) (2.15)

∇2𝐸(𝑡) = 𝐽𝑇(𝑡)𝐽(𝑡) (2.16)

Where,

∇𝐸(𝑡): is the gradient descent,

∇2𝐸(𝑡): is the Hessian matrix of E (t), and

𝐽 (𝑡): is Jacobian matrix

𝐽 (𝑡) =

[ 𝜕𝑣1(𝑡)

𝜕𝑡1 𝜕𝑣1(𝑡)

𝜕𝑡2… . .

𝜕𝑣1(𝑡)

𝜕𝑡𝑛

𝜕𝑣2(𝑡)

𝜕𝑡1 𝜕𝑣2(𝑡)

𝜕𝑡2… . .

𝜕𝑣2(𝑡)

𝜕𝑡𝑛...

𝜕𝑣𝑛(𝑡)

𝜕𝑡1 𝜕𝑣𝑛(𝑡)

𝜕𝑡2… . .

𝜕𝑣𝑛(𝑡)

𝜕𝑡𝑛 ]

(2.17)

For Gauss-Newton (GN) method;


$\nabla w = -[J^T(t) J(t)]^{-1} J^T(t)\, e(t)$ (2.18)

For the Levenberg-Marquardt algorithm, as a variation of the Gauss-Newton method;

$w(k+1) = w(k) - [J^T(t) J(t) + \mu I]^{-1} J^T(t)\, e(t)$ (2.19)

where 𝜇 > 0 is a constant (the damping parameter) and 𝐼 is the identity matrix. It should be kept in mind that when the parameter 𝜇 is large, Equation (2.19) approaches gradient descent (with learning rate 1/𝜇), while for a small 𝜇 the algorithm approaches the Gauss-Newton method, which ought to deliver rapid convergence towards the global minimum.
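A minimal sketch of the update in Equation (2.19) is given below; it assumes the Jacobian and error vector have already been computed for the current weights, and it omits the adaptive adjustment of the damping parameter used in practical LM implementations.

import numpy as np

def lm_update(w, J, e, mu=0.01):
    # w  : current weight vector                     (N,)
    # J  : Jacobian of the errors w.r.t. the weights (P*M, N)
    # e  : error vector                              (P*M,)
    # mu : damping parameter; a large mu gives gradient-descent-like steps,
    #      a small mu gives Gauss-Newton-like steps.
    H_approx = J.T @ J                                    # approximated Hessian
    g = J.T @ e                                           # gradient term
    step = np.linalg.solve(H_approx + mu * np.eye(w.size), g)
    return w - step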

Although LM possesses both the speed of the Gauss-Newton method and the stability of the BPNN method, it has its limitations. One limitation is that the inverse of the approximated Hessian matrix needs to be calculated for every weight update, and this inversion may be repeated many times in a single epoch; LM computation is therefore efficient only for small-sized datasets. For large datasets, such as image recognition datasets, LM may render itself useless as the Hessian inversion becomes a CPU overhead. Another problem is that the Jacobian matrix has to be stored for computation, and its size is P × M × N, where P is the number of patterns, M is the number of outputs, and N is the number of weights. For large sets of training patterns, the memory cost of storing the Jacobian matrix may be too large to be practical. Also, for well-behaved functions and reasonable starting parameters, LM tends to be a bit slower than GN and has a high tendency to converge to local minima (Wilamowski et al., 2007).

In 1994, the Marquardt algorithm for nonlinear least squares was presented and incorporated into back propagation for training feed-forward neural networks. The algorithm was tested on function approximation problems and benchmarked against the conjugate gradient algorithm and a variable learning rate algorithm. The simulations showed that the Marquardt algorithm was more efficient than the other techniques when the network has no more than a few hundred weights (Hagan and Menhaj, 1994).


In 2002, Ampazis and Perantonis presented two second-order algorithms for

the training of feed-forward neural networks. The Levenberg Marquardt (LM) method

used for nonlinear least squares problems incorporated an additional adaptive

momentum term. The simulation results on large-scale datasets showed that their implementations had a better success rate than the conventional LM and other gradient descent methods (Ampazis and Perantonis, 2002). Later, in 2005, Kermani implemented the LM algorithm to determine the sensation of smell through the use of an electronic nose. Their research showed that the LM algorithm is a suitable choice for odor classification and that it performs better than the old BP algorithm (Kermani et al., 2005).

Wilamowski et al. (2007) optimized the LM algorithm by calculating the

Quasi-Hessian matrix and gradient vector directly, thus eliminating the need for

storing the Jacobian matrix, which was replaced with a vector operation. The removal of the Jacobian matrix reduced the memory overhead during simulations on large datasets. The simulation results showed that this unconventional LM algorithm can perform better than the simple LM with lower memory and CPU overheads (Wilamowski et al., 2007; Yu and Wilamowski, 2012). In recent years, several new LM modifications have been proposed, which are discussed in more detail in Section 2.7.

Recently, metaheuristics belonging to the class of Swarm Intelligence have

become quite popular due to their flexibility in providing derivative free solutions to

complex problems. The Swarm Intelligent Metaheuristic algorithms are discussed in

the next section.

2.4 Swarm Intelligent Metaheuristics

Swarm Intelligence is the collective behaviour of decentralized, self-organized

systems, either natural or artificial. In 1989, Beni coined the term swarm intelligence (Beni and Wang, 1989). Since then, swarm intelligence has become the basis of many

nature inspired metaheuristic search algorithms. Meta means ‘to look beyond’ or

‘higher level’ and heuristic means ‘to find’ or ‘to discover by trial and error’. In short,

swarm intelligent metaheuristics can be described as high level approaches for

exploring search spaces by using different methods (Blum et al., 2008).


A metaheuristic optimization method is a heuristic strategy for probing the search space for an ultimately global optimum in a more or less intelligent way (Gilli and Winker, 2008). This is also known as stochastic optimization. Stochastic optimization is grounded in the belief that a stochastic, high-quality, best-effort approximation of the global optimum will probably be more valuable than a deterministic, poor-quality local minimum provided by a classical method, or than no solution at all. It optimizes a problem incrementally by attempting to improve a candidate solution with respect to a given measure of quality defined by a fitness function. It first generates a candidate solution 𝑥𝑐𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒 and, as long as the stopping criteria are not met, it checks neighbours against the current solution (𝑆𝑒𝑙𝑒𝑐𝑡 𝑥𝑛𝑒𝑖𝑔ℎ𝑏𝑜𝑟 ∈ ℕ(𝑥𝑐𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒)). The candidate solution is replaced by its neighbour if the neighbour is better (𝐼𝐹 𝑓(𝑥𝑛𝑒𝑖𝑔ℎ𝑏𝑜𝑟) < 𝑓(𝑥𝑐𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒) 𝑇𝐻𝐸𝑁 𝑥𝑐𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒 = 𝑥𝑛𝑒𝑖𝑔ℎ𝑏𝑜𝑟), such that the global optimum at the end is 𝑥𝑜𝑝𝑡 = 𝑥𝑐𝑎𝑛𝑑𝑖𝑑𝑎𝑡𝑒 (Tang et al., 2012). As such, metaheuristic optimization algorithms are often based on local search methods in which the solution space is not explored systematically or exhaustively; rather, a particular heuristic is characterized by the manner in which the exploration of the solution space is organized.
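The incremental improvement loop described above can be illustrated with a simple stochastic local search sketch in Python; the quadratic fitness function, Gaussian neighbourhood, and fixed iteration budget below are illustrative assumptions only.

import random

def local_search(f, x0, neighbour, max_iters=1000):
    # Keep a candidate solution and replace it whenever a sampled
    # neighbour has a lower fitness value (minimization).
    x_candidate = x0
    for _ in range(max_iters):               # stopping criterion
        x_neighbor = neighbour(x_candidate)  # select a neighbour
        if f(x_neighbor) < f(x_candidate):
            x_candidate = x_neighbor         # accept the improvement
    return x_candidate

# Example: minimise a one-dimensional quadratic with Gaussian perturbations.
best = local_search(lambda x: (x - 3.0) ** 2, x0=0.0,
                    neighbour=lambda x: x + random.gauss(0.0, 0.1))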

Some current examples of metaheuristics are Particle Swarm Optimization

(PSO), which has been successfully applied to problems in electromagnetics

(Robinson and Rahmat-Samii, 2004) and antenna design (Jin and Rahmat-Samii,

2007). Ant Colony Optimization (ACO) algorithms are also used in many areas of

optimization (Merkle et al., 2002; Parpinelli and Lopes, 2011). Artificial Bee Colony

(ABC) showed good performance in numerical optimization (Karaboga and Basturk,

2007; Karaboga and Basturk, 2008), in large-scale global optimization (Fister, Jr, and

Zumer 2012), and also in combinatorial optimization (Fister Jr, Fister, and Brest, 2012;

Neri and Tirronen, 2009; Pan et al., 2011; Parpinelli and Lopes, 2011). Recently, a new set of metaheuristics has been added to the family of age-old swarm intelligent algorithms. These bio-inspired algorithms include Firefly (Yang, 2009), Cuckoo

Search (Yang and Deb, 2009), Wolf Search (Tang et al., 2012) and Bat (Yang, 2010a).

These metaheuristic optimization algorithms have search methods, both in breadth and in depth, that are largely based on the swarm movement patterns of animals and insects found in nature. Their performance in metaheuristic optimization has proven superior to that of many classical metaheuristics such as genetic algorithms (Goldberg,

1989) and particle swarm optimization (PSO) (Kennedy and Eberhart, 1995).

The main components of any metaheuristic search algorithm are exploration

and exploitation. Exploration in a metaheuristic algorithm is accomplished through randomization, provided by random walks, to search a much larger portion of the search space in the hope of finding more promising solutions. Exploration provides diversification, which helps an algorithm to search globally and avoid local optima. On the other hand, the exploitation process provides intensification, in which new neighbourhood solutions

are traversed locally to find a better solution than the already found optimal one (Neri

and Tirronen, 2009; Yang et al., 2014). A review of the working process of the algorithms used in this research, in terms of exploration and exploitation, is given in the remainder of this section.

Genetic Algorithm (GA) is a metaheuristic optimization algorithm that imitates

the natural selection process while searching for the optimal solution (Holland, 1973;

Goldberg, 1989). It is one of the oldest evolutionary search algorithms, inspired by natural evolutionary operators such as mutation, selection, and crossover. In GA, a number of candidate solutions are represented as genomes or chromosomes. At each time step, the GA typically performs mutation and crossover on the parent solutions to explore the solution space and ultimately find the most optimal chromosome. Meanwhile, the selection process finds the fittest individuals to transfer their information to the next generation in the evolutionary process, thereby increasing the exploitation in GA (Hansheng and Lishan, 1999).
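As an illustration, one GA generation with tournament selection, one-point crossover, and Gaussian mutation on real-valued chromosomes can be sketched as follows; the operators and parameter values are assumptions made for this example, not those of any specific study cited here.

import random

def ga_generation(population, fitness, mutation_rate=0.1, sigma=0.1):
    # population : list of chromosomes (each a list of at least two real genes)
    # fitness    : function to minimise
    def select():                                    # binary tournament selection
        a, b = random.sample(population, 2)
        return a if fitness(a) < fitness(b) else b

    next_gen = []
    while len(next_gen) < len(population):
        p1, p2 = select(), select()
        cut = random.randrange(1, len(p1))           # one-point crossover
        child = p1[:cut] + p2[cut:]
        child = [g + random.gauss(0.0, sigma) if random.random() < mutation_rate
                 else g for g in child]              # Gaussian mutation
        next_gen.append(child)
    return next_gen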

Simulated Annealing (SA) is a metaheuristic algorithm for finding an optimal solution to a stochastic problem. Proposed by Kirkpatrick, Gelatt, and Vecchi (1983) and improved by Černý (1985), this algorithm is inspired by the metallurgical process in which metal is heated and then cooled in a controlled manner to increase the durability of metal castings in the foundry (Černý, 1985; Kirkpatrick et al., 1983). Only the slow cooling process of metallurgy is implemented in SA, with temperature as the main component controlling exploration and exploitation, so that SA can move through worse solutions towards a final optimal one on the basis of the probability of reaching states with a minimum energy configuration (Bertsimas and Tsitsiklis, 1993).
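The temperature-controlled acceptance rule at the heart of SA can be sketched as follows; the geometric cooling schedule, initial temperature, and iteration budget are illustrative assumptions.

import math
import random

def simulated_annealing(f, x0, neighbour, t0=1.0, cooling=0.95, max_iters=1000):
    # Always accept improvements; accept worse neighbours with probability
    # exp(-delta / T), where the temperature T is cooled slowly.
    x, temperature = x0, t0
    for _ in range(max_iters):
        candidate = neighbour(x)
        delta = f(candidate) - f(x)
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            x = candidate                  # accept the move
        temperature *= cooling             # slow (geometric) cooling
    return x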

Particle Swarm Optimization (PSO) is a population-based stochastic optimization algorithm. Proposed by Kennedy and Eberhart in 1995, it is based on the social behaviour of bird flocking or fish schooling, where each fish or bird

is considered a particle (Kennedy and Eberhart, 1995). Like other evolutionary

algorithms, these particles fly with a certain velocity to find the global best 𝑔𝑏𝑒𝑠𝑡

solution after traversing through several local best solutions in each iteration. It has

been found highly efficient in solving several optimization problems, such as electromagnetics (Ciuprina et al., 2002), unsupervised robotic learning (Pugh, Martinoli, and Zhang, 2005), optimization of the tile manufacturing process (Navalertporn and Afzulpurkar, 2011), and wireless sensor networks (Kulkarni and Venayagamoorthy, 2011).
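The velocity and position updates that drive PSO can be sketched as follows; the inertia weight and acceleration coefficients are typical illustrative values rather than those used in the studies cited above.

import random

def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    # Each particle is pulled towards its personal best (pbest) and the
    # global best (gbest) with random strengths, then moved by its velocity.
    for i in range(len(positions)):
        for d in range(len(positions[i])):
            r1, r2 = random.random(), random.random()
            velocities[i][d] = (w * velocities[i][d]
                                + c1 * r1 * (pbest[i][d] - positions[i][d])
                                + c2 * r2 * (gbest[d] - positions[i][d]))
            positions[i][d] += velocities[i][d]
    return positions, velocities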

Since its origin in 2001, Harmony Search (HS) has been used extensively to

solve many optimization problems such as vehicle routing (Geem, Lee, and Park,

2005), water distribution networks (Geem, 2006), numerical optimization (Karaboga

and Akay, 2009), and university course timetabling (Al-Betar and Khader, 2012). Proposed by Zong Woo Geem in 2001, the HS algorithm is a metaheuristic based on the harmony of sounds or melodies that human ears find pleasant to hear. The algorithm's basic goal is to find an optimal solution, just as a musician produces a musical note with perfect harmony (Geem et al., 2001). Harmony search utilizes three idealized rules based on the improvisation process of a musician, which are harmony memory, pitch adjustment, and randomization (Yang, 2009). These rules

are explained as follows;

a) HS memory is similar to the best fit individuals in the GA and the best harmony

memory is carried over to the next harmony memory. Harmony memory is

assigned a parameter known as 𝑟𝑎𝑐𝑐𝑒𝑝𝑡 ∈ [0,1], called acceptance rate.

The acceptance rate is kept neither too low nor too high, as either extreme might lead to potentially less optimal solutions during the exploitation process.

b) The second component of HS is the pitch adjustment rate controlled by pitch

bandwidth 𝑏𝑟𝑎𝑛𝑔𝑒 and the pitch adjusting rate 𝑟𝑝𝑎. In music, pitch adjustment is done to change frequencies, but in HS it is used to generate a change in the solution. Usually it is adjusted linearly to get;

𝑥𝑛𝑒𝑤 = 𝑥𝑜𝑙𝑑 + 𝑏𝑟𝑎𝑛𝑔𝑒 ∗ 𝜀 (2.20)
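Equation (2.20) corresponds to the following small sketch of the pitch-adjustment step; the probability 𝑟𝑝𝑎, the bandwidth value, and the range of 𝜀 are illustrative assumptions.

import random

def pitch_adjust(x_old, b_range=0.05, r_pa=0.3):
    # With probability r_pa, perturb the chosen value within the
    # pitch bandwidth b_range (Equation 2.20); otherwise keep it.
    if random.random() < r_pa:
        epsilon = random.uniform(-1.0, 1.0)   # random number in [-1, 1]
        return x_old + b_range * epsilon
    return x_old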


REFERENCES

Ab Aziz, M. F., Shamsuddin, S. M. and Alwee, R. (2009). Enhancement of Particle

Swarm Optimization in Elman Recurrent Network with Bounded Vmax

Function. 3rd Asia International Conference on Modelling and Simulation, AMS

2009, 125–30.

Abdul Hamid, N. (2012). The Effect of Adaptive Parameters on the Performance of Back Propagation. Universiti Tun Hussein Onn

Malaysia. Master Thesis.

Ackley, D. H. (1987). An Empirical Study of Bit Vector Function Optimization.

Genetic algorithms and simulated annealing, vol. 1, 170–204.

Afrabandpey, H., Ghaffari, M., Mirzaei, A., & Safayani, M. (2014). A Novel Bat

Algorithm Based on Chaos for Optimization Tasks. 2014 Iranian Conference on

Intelligent Systems (ICIS), Iran, IEEE. 2–7.

Al-Betar, M. A., and Khader, A. H. (2012). A Harmony Search Algorithm for

University Course Timetabling. Annals of Operations Research, vol. 194(1), 3–

31.

Alihodzic, A., and Tuba, M. (2014a). Improved Bat Algorithm Applied to Multilevel

Image Thresholding. The Scientific World Journal, vol. 2014.

Alihodzic, A., and Tuba, M. (2014b). Improved Hybridized Bat Algorithm for Global

Numerical Optimization. 2014 UKSim-AMSS 16th International Conference on

Computer Modelling and Simulation, IEEE, 57–62.

Ampazis, N., and Perantonis, S. J. (2002). Two Highly Efficient Second-Order

Algorithms for Training Feedforward Networks. IEEE Transactions on Neural

Networks, vol. 13(5), 1064–74.

Bahmani-Firouzi, B, and Azizipanah-Abarghooee, R. (2014). Optimal Sizing of


Battery Energy Storage for Micro-Grid Operation Management Using a New

Improved Bat Algorithm. International Journal of Electrical Power & Energy

Systems, vol. 56, 42–54.

Beni, G., and Wang, J. (1993). Swarm Intelligence in Cellular Robotic Systems. In

Robots and Biological Systems: Towards a New Bionics? NATO ASI Series, vol.

102, 703–12.

Berman, S. M. (1971). Mathematical Statistics: An Introduction Based on the Normal

Distribution. Intext Educational Publishers.

Bertsimas, D., and John, T. (1993). Simulated Annealing. Statistical Science vol. 8(1),

10–15.

Biswal, S., Barisal, A. K., Behera, A. and Prakash, T. (2013). Optimal Power Dispatch

Using BAT Algorithm. 2013 International Conference on Energy Efficient

Technologies for Sustainability, ICEETS 2013, 1018–23.

Blum, C. (2008). Hybrid Metaheuristics: An Emerging Approach to Optimization.

Springer Berlin Heidelberg.

Blum, C., and Roli, A. (2003). Metaheuristics in Combinatorial Optimization:

Overview and Conceptual Comparison. ACM Computing Surveys, vol. 35, 268–

308.

Boden, M. (2001). A Guide to Recurrent Neural Networks and Backpropagation.

Electrical Engineering, (2), 1–10.

Bohachevsky Function (2015). www-optima.amp.i.kyoto-

u.ac.jp/member/student/hedar/Hedar_files/TestGO_files/Page595.htm (July 4,

2015).

Brahim-Belhouari, S., and Bermak, A. (2004). Gaussian Process for Nonstationary

Time Series Prediction. Computational Statistics & Data Analysis, vol. 47(4),

705–12.

Černý, V. (1985). Thermodynamical Approach to the Traveling Salesman Problem:

An Efficient Simulation Algorithm. Journal of Optimization Theory and

Applications, vol. 45(1), 41–51.

Chou, P., and Chen, J. (2011). Enforced Mutation to Enhancing the Capability of

Particle Swarm Optimization Algorithms. Lecture Notes in Computer Science,


Springer, vol. 6728, 28–37.

Ciuprina, G., Ioan, D. and Munteanu, I. (2002). Use of Intelligent-Particle Swarm

Optimization in Electromagnetics. IEEE Transactions on Magnetics, vol. 38 (21),

1037–1040.

Collignan, A., Pailhes, J., and Sebastian, P. (2011). Design Optimization: Management

of Large Solution Spaces and Optimization Algorithm Selection. In IMProVe,

Venice.

Davis, R. A. (2007). Gaussian Processes. Encyclopedia of Environmetrics, vol. 3, 1–13.

Dwinell, W. (2007). AUC. http://matlabdatamining.blogspot.my/2007/06/roc-curves-

and-auc.html (April 15, 2016).

Elman, J L. (1990). Finding Structure in Time. Cognitive science, vol. 14(2), 179–211.

Elman, J. L. (1991). Distributed Representations, Simple Recurrent Networks, and

Grammatical Structure. Machine Learning, vol. 7(2-3), 195–225.

Eom, K., Jung, K. and Sirisena, H. (2003). Performance Improvement of

Backpropagation Algorithm by Automatic Activation Function Gain Tuning

Using Fuzzy Logic. Neurocomputing, vol. 50: 439–60.

Evett, I. W., and Spiehler, E. J. (1987). Rule Induction in Forensic Science. KBS in

Government, 107–118.

Fawcett, T. (2004). ROC Graphs : Notes and Practical Considerations for Researchers.

ReCALL, vol. 31(HPL-2003-4), 1–38.

Fawcett, T. (2006). An Introduction to ROC Analysis. Pattern Recognition Letters,

vol. 27(2006), 861–74.

Fisher, R. A. (1936). The Use of Multiple Measurements in Taxonomic Problems.

Annals of Eugenics, vol. 7(2), 179–88.

Fister Jr, I., Fister, I., and Brest, J. (2012). A Hybrid Artificial Bee Colony Algorithm

for Graph 3-Coloring. Swarm and Evolutionary Computation, 66–74.

Fister, I., Fister, D., and Yang, X. S. (2013). A Hybrid Bat Algorithm. Elektrotehniski

Vestnik/Electrotechnical Review, vol. 80, 1–7.

Fister, I., Fong, S., and Brest, J. (2014). A Novel Hybrid Self-Adaptive Bat Algorithm.

The Scientific World Journal, vol. 2014(i).


Fister, I., Fister Jr, I. and Zumer, J. (2012). Memetic Artificial Bee Colony Algorithm

for Large-Scale Global Optimization. IEEE Congress on Evolutionary

Computation (CEC),.

Fkirin, M. A., Badwai, S. M., and Mohamed, S. A. (2009). Change Detection Using

Neural Network in Toshka Area. 2009 National Radio Science Conference.

Gandomi, A. H., and Yang, X. S. (2014). Chaotic Bat Algorithm. Journal of

Computational Science, vol. 5, 224–32.

Geem, Z. W. , Kim, J. H. and Loganathan, G. V. (2001). A New Heuristic Optimization

Algorithm: Harmony Search. Simulation, vol. 76, 60–68.

Geem, Z. W. (2006). Optimal Cost Design of Water Distribution Networks Using

Harmony Search. Engineering Optimization, vol. 38(3), 259–77.

Geem, Z. W., Lee, K. S., and Park, Y. (2005). Application of Harmony Search to

Vehicle Routing. American Journal of Applied Sciences, vol. 2(12), 1552–1557.

Ghosh, A., and Chakraborty, M. (2012). Hybrid Optimized Back Propagation

Learning Algorithm For Multi-Layer Perceptron. International Journal of

Computer Applications, vol. 57(December), 1–6.

Gilli, M., and Winker, P. (2008). A Review of Heuristic Optimization Methods in

Econometrics. Swiss Finance Institute Research, 08–12.

Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine

Learning. Addison Wesley.

Gong, B. (2009). A Novel Learning Algorithm of Back-Propagation Neural Network.

IITA International Conference on Control, Automation and Systems Engineering,

(CASE 2009), 411–414.

Gonzalez, R. C., and Woods, R. E. (2008). Digital Image Processing. Prentice Hall.

Gregor, K., Danihelka, I., Graves, A., and Wierstra, D. (2014). DRAW: A Recurrent

Neural Network For Image Generation.

Griewank, A. O. (1981). Generalized Descent for Global Optimization. Journal of

Optimization Theory and Applications, vol. 34, 11–39.

Güler, N. F., Übeyli, E. D. and Güler, I. (2005). Recurrent Neural Networks

Employing Lyapunov Exponents for EEG Signals Classification. Expert Systems

with Applications, vol. 29(3), 506–14.


Gupta, L., and McAvoy, M. (2000). Investigating the Prediction Capabilities of the

Simple Recurrent Neural Network on Real Temporal Sequences. Pattern

Recognition, vol. 33(12), 2075–2081.

Gupta, L., McAvoy, M. and Phegley, J. (2000). Classification of Temporal Sequences

via Prediction Using the Simple Recurrent Neural Network. Pattern Recognition,

vol. 33(10), 1759–1770.

Hagan, M. T., and Menhaj, M. B. (1994). Training Feedforward Networks with the

Marquardt Algorithm. IEEE Transactions on Neural Networks, vol. 5(6), 989–

93.

Hale, D. (2006). Recursive Gaussian Filters. Proceedings of XVII IMEKO World

Congress.

Han, J., and Kamber, M. (2006). Data Mining: Concepts and Techniques. Soft

Computing, vol. 54.

Hansheng, L., and Lishan, K. (1999). Balance between Exploration and Exploitation

in Genetic Search. Wuhan University Journal of Natural Sciences, vo. 4(1), 28–

32.

Hasançebi, O., and Carbas, S. (2014). Bat Inspired Algorithm for Discrete Size

Optimization of Steel Frames. Advances in Engineering Software, vol. 67, 173–

85.

Hasançebi, O., Teke, T. and Pekcan, O. (2013). A Bat-Inspired Algorithm for

Structural Optimization. Computers and Structures, vol. 128, 77–90.

He, X. (2015). A Recurrent Neural Network for Optimal Real-Time Price in Smart

Grid. Neurocomputing, vol. 149, 608–612.

He, X. S., Ding, W. J., and Yang, X. S. (2013). Bat Algorithm Based on Simulated

Annealing and Gaussian Perturbations. Neural Computing and Applications, vol.

25(2), 1–10.

Ho, Y., Bryson, A. and Baron, S. (1965). Differential Games and Optimal Pursuit-

Evasion Strategies. IEEE Transactions on Automatic Control, vol. 10(4).

Holland, J. H. (1973). Genetic Algorithms and the Optimal Allocation of Trials. SIAM

Journal on Computing, vol. 2(2), 88–105.

Husken, M., and Stagge, P. (2003). Recurrent Neural Networks for Time Series


Classification. Neurocomputing, vol. 50, 223–35.

Hyder, M. M., Shahid, M. I., Kashem, M. A., and Islam, M. S. (2009). Initial Weight

Determination of a MLP for Faster Convergence. Journal of Electronics and

Computer Science, vol. 10.

Jacobs, R. A., Jordan, M. I., Nowlan, S. J., and Hinton, G. E. (1991). Adaptive

Mixtures of Local Experts. Neural Computation, vol. 3(1), 79–87.

Jin, N., and Rahmat-Samii, Y. (2007). Advances in Particle Swarm Optimization for

Antenna Designs: Real-Number, Binary, Single-Objective and Multiobjective

Implementations. IEEE Transactions on Antennas and Propagation, vol. 55,

556–567.

Jong, D., and Alan, K. (1975). Analysis of the Behavior of a Class of Genetic Adaptive

Systems. University of Michigan.

Fister Jr, I., and Yang, X. S. (2013). A Hybrid Bat Algorithm. Elektrotehniski

Vestnik/Electrotechnical Review, vol. 80(2), 1–7.

Kabir, W., Sakib, N. Chowdhury, S. M. R, and Alam, M. S. (2014). A Novel Adaptive

Bat Algorithm to Control Explorations and Exploitations for Continuous

Optimization Problems. International Journal of Computer Applications, vol.

94(13), 15–20.

Kamijo, K., and Tanigawa, T. (1990). Stock Price Pattern Recognition- A Recurrent

Neural Network Approach. International Joint Conference on Neural Networks,

215–221.

Karaboga, D. (2005). An Idea Based on Honey Bee Swarm for Numerical

Optimization. Technical Report TR06, Erciyes University (TR06).

Karaboga, D, and Akay, B. (2009). Artificial Bee Colony (ABC), Harmony Search

and Bees Algorithms on Numerical Optimization. Proceedings of Innovative

Production Machines and Systems Virtual Conference, IPROMS, 1–6.

Karaboga, D., and Basturk, B. (2008). On the Performance of Artificial Bee Colony

(ABC) Algorithm. Applied Soft Computing Journal, vol. 8, 687–97.

Karaboga, D., and Akay, B. (2009). A Comparative Study of Artificial Bee Colony

Algorithm. Applied Mathematics and Computation, vol. 214, 108–132.

Karaboga, D., Akay, B. and Ozturk, C. (2007). Artificial Bee Colony (ABC)


Optimization Algorithm for Training Feed-Forward Neural Networks. Modeling

Decisions for Artificial Intelligence, Springer Berlin Heidelberg, 318–29.

Karaboga, D., and Basturk, B. (2007). A Powerful and Efficient Algorithm for

Numerical Function Optimization: Artificial Bee Colony (ABC) Algorithm.

Journal of Global Optimization, vol. 39, 459–71.

Kennedy, J., and Eberhart, R. (1995a). Particle Swarm Optimization. IEEE

International Conference on Neural Networks, 1942–1948.

Kennedy, J., and Eberhart, R. (1995b). Particle Swarm Optimization. Proceedings of

ICNN’95 - International Conference on Neural Networks 4.

Kermani, B. G., Schiffman, S. S. and Nagle, H. T. (2005). Performance of the

Levenberg–Marquardt Neural Network Training Method in Electronic Nose

Applications. Sensors and Actuators B: Chemical, vol. 110(1), 13–22.

Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983). Optimization by Simulated

Annealing. Science (New York, N.Y.), vol. 220, 671–80.

Kolen, J. F., and Pollack, J. B. (1990). Back Propagation Is Sensitive to Initial

Conditions. Complex Systems, vol. 4(3), 269–80.

Kotsiantis, S. B., and Kanellopoulos, D. (2006). Data Preprocessing for Supervised

Leaning. International Journal of Computer Science, vol. 1(2), 1–7.

Kulkarni, R. V., and Venayagamoorthy, G. K. (2011). Particle Swarm Optimization in

Wireless-Sensor Networks: A Brief Survey. IEEE Transactions on Systems, Man

and Cybernetics Part C: Applications and Reviews, vol. 41(2), 262–267.

Lahmiri, S. (2011). A Comparative Study of Back Propagation Algorithms in Financial

Prediction. International Journal of Computer Science, Engineering and

Applications (IJCSEA), vol. 1(4).

Lari-Najafi, H., Nasiruddin, M., and Samad, T. (1989). Effect of Initial Weights on

Back-Propagation and Its Variations. IEEE International Conference on Systems,

Man and Cybernetics.

Lawler, G. F., and Limic, V. (2010). Random Walk : A Modern Introduction. Science,

vol.123 , 1–289.

Levenberg, K. (1944). A Method for the Solution of Certain Problems in Least Squares.

Quart. Appl. Math, vol. 2, 164–68.


Lin, J. H., Chao-Wei, C., Chorng-Horng, Y., and Hsien-Leing, T. (2012). A Chaotic

Levy Flight Bat Algorithm for Parameter Estimation in Nonlinear Dynamic

Biological Systems. Journal of Computer and Information Technology, vol. 2(2),

56–63.

Ling, C. X., Huang, J., and Zhang, H. (2003). AUC: A Statistically Consistent and

More Discriminating Measure than Accuracy. In IJCAI International Joint

Conference on Artificial Intelligence, 519–24.

Lyon, A. (2014). Why Are Normal Distributions Normal? British Journal for the

Philosophy of Science, vol. 65(3), 621–49.

Malioutov, D. M., Johnson, J. K., and Willsky, A. S. (2006). Walk-Sums and Belief

Propagation in Gaussian Graphical Models. Journal of Machine Learning

Research, vol. 7(7), 2031–64.

Marquardt, D. W. (1963). An Algorithm for Least-Squares Estimation of Nonlinear

Parameters. Journal of the Society for Industrial and Applied Mathematics, vol.

11, 431–41.

Meng, X. B., Gao, X. Z., and Liu, Y. (2015). A Novel Hybrid Bat Algorithm with

Differential Evolution. International Journal of Hybrid Information Technology,

vol. 8(1), 383–396.

Merkle, D., Middendorf, M., and Schmeck, H. (2002). Ant Colony Optimization for

Resource-Constrained Project Scheduling. IEEE Transactions on Evolutionary

Computation, vol.6 (4), 333-346.

Mirjalili, S., Mirjalili, S. M., and Yang, X. S. (2013). Binary Bat Algorithm. Neural

Computing and Applications, vol. 2013, 1–19.

Mirjalili, S., Hashim, S. Z. M., and Sardroudi, H. M. (2012). Training Feedforward

Neural Networks Using Hybrid Particle Swarm Optimization and Gravitational

Search Algorithm. Applied Mathematics and Computation, 218(22), 11125–

11137.

Mishra, S., Shaw, K., and Mishra, D. (2012). A New Meta-Heuristic Bat Inspired

Classification Approach for Microarray Data. Procedia Technology, vol. 4, 802–

816.

Mitchell, R. I. (2008). On Simple Adaptive Momentum. 7th IEEE International


Conference on Cybernetic Intelligent Systems, CIS 2008.

Navalertporn, T., and Afzulpurkar, N. V. (2011). Optimization of Tile Manufacturing

Process Using Particle Swarm Optimization. Swarm and Evolutionary

Computation, vol. 1(2), 97–109.

Nawi, N. M., Khan, A. and Rehman, M. Z. (2013). A New Optimized Cuckoo Search

Recurrent Neural Network (CSRNN) Algorithm. The 8th International

Conference on Robotic, Vision, Signal Processing & Power Applications,

Penang: Springer Singapore, 335–341.

Nawi, N. M., Ransing, R. S., and Ransing, M. R. (2008). A New Method to Improve

the Gradient Based Search Direction to Enhance the Computational Efficiency of

Back Propagation Based Neural Network Algorithms. 2nd Asia International

Conference on Modelling and Simulation, AMS 2008, 546–52.

Nawi, N. M., Rehman, M. Z. and Ghazali, M. I. (2011). Noise-Induced Hearing Loss

Prediction in Malaysian Industrial Workers Using Gradient Descent with

Adaptive Momentum Algorithm. International Review on Computers and

Software (IRECOS), vol. 6(5).

Nawi, N. M., Ghazali, R. and Mohd Salleh. M. N. (2011). Predicting Patients with

Heart Disease by Using an Improved Back-Propagation Algorithm. JOURNAL

OF COMPUTING, vol. 3, 53–58.

Nawi, N. M., Ransing, M. R. and Ransing, R. S. (2006). An Improved Learning

Algorithm Based on the Broyden-Fletcher-GoldfarbShanno (BFGS) Method for

Back Propagation Neural Networks. Sixth International Conference on Intelligent

Systems Design and Applications, 152–157.

Nawi, N. M. (2007). Computational Issues in Process Optimsation Using Historical

Data. Swansea University, PhD Thesis.

Nawi, N. M., Khan, A., and Rehman M. Z. (2013). A New Cuckoo Search Based

Levenberg-Marquardt (CSLM) Algorithm. Computational Science and Its

Applications – ICCSA 2013, Springer Berlin Heidelberg, 438–451.

Nawi, N. M., Rehman, M. Z. and Khan, A. (2014). A New Bat Based Back-

Propagation (BAT-BP) Algorithm. Advances in Systems Science, Springer

International Publishing, 395–404.


Neri, F., and Tirronen, V. (2009). Recent Advances in Differential Evolution: A

Survey and Experimental Analysis. Artificial Intelligence Review, vol. 33, 61–

106.

Ozturk, C., and Karaboga, D. (2011). Hybrid Artificial Bee Colony Algorithm for

Neural Network Training. 2011 IEEE Congress of Evolutionary Computation

(CEC), 84–88.

Pan, Q. , Tasgetiren, M. F., Suganthan, P. N., and Chua, T. J. (2011). A Discrete

Artificial Bee Colony Algorithm for the Lot-Streaming Flow Shop Scheduling

Problem. Information Sciences, vol. 181, 2455–2468.

Parpinelli, R. S., and Lopes, H. S. (2011). New Inspirations in Swarm Intelligence: A

Survey. International Journal of Bio-Inspired Computation, vol. 3(1).

Parpinelli, R. S, Lopes, H. S., and Freitas, A. A. (2002). Data Mining with an Ant

Colony Optimization Algorithm. IEEE Transactions on Evolutionary

Computation, vol. 6, 321–332.

Pearlmutter, B. (1995). Gradient Calculation for Dynamic Recurrent Neural Networks:

A Survey. IEEE Trans on Neural Networks, vol. 6(5), 1212.

Pineda, F. J. (1987). Generalization of Back-Propagation to Recurrent Neural

Networks. Physical Review Letters, vol. 59(19), 2229–2232.

Pugh, J., Martinoli, A., and Zhang, Y. (2005). Particle Swarm Optimization for

Unsupervised Robotic Learning. 2005 IEEE Swarm Intelligence Symposium, SIS

2005, 95–102.

Pyle, D., and Cerra, D. D. (1999). Data Preparation for Data Mining. Order A Journal

On The Theory Of Ordered Sets And Its Applications, vol. 17, 375–81.

Qiu, G., Varley, M., and Terrell, T. (1992). Accelerated Training of Backpropagation

Networks Using Adaptive Momentum Step. IEEE Electronics Letters, vol. 28(4),

377–779.

Quinlan, J. R. (1987). Simplifying Decision Trees. International Journal of Man-

Machine Studies, vol. 27(3), 221–234.

Quinlan, J. R., Compton, P. J., Horn, K. A., and Lazurus, L. (1986). Inductive

Knowledge Acquisition: A Case Study. The Second Australian Conference on

Applications of Expert Systems.


Rasmussen, C. E., and Williams, C. K. I. (2006). Gaussian Processes for Machine

Learning, International Journal of Neural Systems, vol. 14 (2), 69-106.

Rastrigin, L. A. (1963). Convergence of Random Search Method in Extremal Control

of Multi-Parameter Systems. Avtomatika i Telemekhanika, vol. 24, 1467–1473.

Rehman, M. Z., and Nawi, N. M. (2011a). The Effect of Adaptive Momentum in

Improving the Accuracy of Gradient Descent Back Propagation Algorithm on

Classification Problems. Communications in Computer and Information Science,

380–390.

Rehman, M. Z., and Nawi, N. M. (2011b). The Effect of Adaptive Momentum in

Improving the Accuracy of Gradient Descent Back Propagation Algorithm on

Classification Problems Software Engineering and Computer Systems, vol.

179(6), 380–90.

Rezaee, J. A. (2015). Chaotic Bat Swarm Optimisation (CBSO). Applied Soft

Computing, vol. 26, 523–530.

Robinson, J., and Rahmat-Samii, Y. (2004). Particle Swarm Optimization in

Electromagnetics. IEEE Transactions on Antennas and Propagation, vol. 52(2),

397-407.

Rodrigues, D. (2014). A Wrapper Approach for Feature Selection Based on Bat

Algorithm and Optimum-Path Forest. Expert Systems with Applications, vol.

41(5), 2250–2258.

Rosenbrock, H. H. (1960). An Automatic Method for Finding the Greatest or Least

Value of a Function. The Computer Journal, vol. 3(3), 175–184.

Rui, T., Fong S., Yang, X. S., and Deb, S. (2012). Wolf Search Algorithm with

Ephemeral Memory. In Seventh International Conference on Digital Information

Management (ICDIM 2012), 165–172.

Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning Internal

Representations by Error Propagation. Parallel Distributed Processing:

Explorations in the Microstructure of Cognition, vol. 1, 318–362.

Sarangi, P., Sahu, A., and Panda, M. (2013). A Hybrid Differential Evolution and

Back-Propagation Algorithm for Feedforward Neural Network Training.

International Journal of Computer Applications, vol. 84(14), 1–9.


Schroedl, S. (2008). Receiver Operating Characteristics (ROC) Explained on

Mathworks. http://www.mathworks.com/matlabcentral/fileexchange/19468-

auroc-area-under-receiver-operating-characteristic (April 16, 2016).

Schwefel, H. P. (1995). Evolution and Optimum Seeking. John Wiley & Sons.

Shao, H., and Zheng, G. (2009). A New BP Algorithm with Adaptive Momentum for

FNNs Training. Proceedings of the 2009 WRI Global Congress on Intelligent

Systems, GCIS 2009, 16–20.

Shapiro, L., and Stockman, G. (2001). Computer Vision.

http://www.amazon.com/Computer-Vision-Linda-G-Shapiro/dp/0130307963.

Shiffman, D. (2012). The Nature of Code: Simulating Natural Systems with

Processing. 1st ed. The Nature of Code.

Smith, J. W. (1988). Using the ADAP Learning Algorithm to Forecast the Onset of

Diabetes Mellitus. Proceedings of the Annual Symposium on Computer

Application in Medical Care, 261–265.

Stamey, J. (2008). Modern Mathematical Statistics with Applications. The American

Statistician, vol. 62(4), 358–358.

Step Function. (2014). Wikipedia. http://en.wikipedia.org/wiki/Step_function

(November 22, 2014).

SUN, Y., Zhang, S., Miao, C., and Li, J. M. (2007). Improved BP Neural Network for

Transformer Fault Diagnosis. Journal of China University of Mining and

Technology, vol. 17(1), 138–42.

Swanston, D. J., Bishop, J. M., and Mitchell, R. J., (1994). Simple Adaptive

Momentum: New Algorithm for Training Multilayer Perceptrons. Electronics

Letters, vol. 30(18), 1498–1500.

Tang, R., Fong, S., Yang, X. S., and Deb, S. (2012). Wolf Search Algorithm with

Ephemeral Memory. Seventh International Conference on Digital Information

Management (ICDIM 2012), 165–172.

Thimm, G., Moerland, P., and Fiesler, E. (1996). The Interchangeability of Learning

Rate and Gain in Backpropagation Neural Networks. Neural Computing, vol.

8(2), 451–60.

Übeyli, E. D. (2008). Recurrent Neural Networks Employing Lyapunov Exponents for


Analysis of Doppler Ultrasound Signals. Expert Systems with Applications, vol.

34, 2538–2544.

Wang, G., and Guo, L. (2013). A Novel Hybrid Bat Algorithm with Harmony Search

for Global Numerical Optimization. Journal of Applied Mathematics, vol. 2013.

Wilamowski, B. M., Cotton, N., Hewlett, J., and Kaynak, O. (2007). Neural Network

Trainer with Second Order Learning Algorithms. 11th International Conference

on Intelligent Engineering Systems, Proceedings, 127–132.

Williams, R. J., and Zipser, D. (1989). A Learning Algorithm for Continually Running

Fully Recurrent Neural Networks. Neural Computation, vol. 1(2), 270–80.

Witten, I. H., Frank, E., and Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques. Google E-Book.

Wolberg, W. H., and Mangasarian, O. L. (1990). Multisurface Method of Pattern

Separation for Medical Diagnosis Applied to Breast Cytology. Proceedings of the

National Academy of Sciences of the United States of America, vol. 87(23), 9193–

9196.

Xie, J., Zhou, Y., and Chen, H. (2013). A Novel Bat Algorithm Based on Differential

Operator and Levy Flights Trajectory. Computational Intelligence and

Neuroscience, vol. 2013.

Xie, J., Zhou, Y., and Zheng, H. (2013). A Hybrid Metaheuristic for Multiple Runways

Aircraft Landing Problem Based on Bat Algorithm. Journal of Applied

Mathematics, vol. 2013.

Yang, X. S. (2009a). Firefly Algorithms for Multimodal Optimization. Lecture Notes

in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence

and Lecture Notes in Bioinformatics), 169–178.

Yang, X. S. (2009b). Harmony Search as a Metaheuristic Algorithm. Studies in

Computational Intelligence, vol. 191, 1–14.

Yang, X. S. (2010a). A New Metaheuristic Bat-Inspired Algorithm. Studies in

Computational Intelligence, vol. 284, 65–74.

Yang, X. S. (2010b). Engineering Optimization: An Introduction with Metaheuristic Applications. John Wiley & Sons.


Yang, X. S., and Deb, S. (2009). Cuckoo Search via Lévy Flights. 2009 World

Congress on Nature and Biologically Inspired Computing, NABIC 2009 -

Proceedings, 210–214.

Yang, X. S. (2008). Introduction to Mathematical Optimization Introduction to

Mathematical Optimization. Cambridge International Science Publishing,

Cambridge, UK.

Yang, X. S. (2010a). Engineering Optimization: An Introduction with Metaheuristic

Applications. Hoboken, NJ, USA: John Wiley & Sons, Inc.

Yang, X. S. (2010b). Nature-Inspired Metaheuristic Algorithms. Second Edi. Luniver

Press, Cambridge , UK.

Yang, X. S. (2011). Bat Algorithm for Multiobjective Optimization. International

Journal of Bio-Inspired Computation, vol. 3, 267–274.

Yang, X. S. (2013a). Bat Algorithm: Literature Review and Applications.

International Journal of Bio-Inspired Computation, vol. 5.

Yang X. S. (2013b). Bat Algorithm: Literature Review and Applications. International

Journal of Bio-Inspired Computation, vol. 5(3).

Yang, X. S. (2013). Cuckoo Search and Firefly Algorithm. Studies in Computational

Intelligence, vol. 516. Springer.

Yang, X. S., Deb, S., and Fong, S. (2012). Accelerated Particle Swarm Optimization

and Support Vector Machine for Business Optimization and Applications.

Networked digital technologies, vol. 12.

Yang, X. S., Deb, S., and Fong, S. (2014). Bat Algorithm Is Better Than Intermittent

Search Strategy. Multiple-Valued Logic and Soft Computing, vol. 22(3), 223–

237.

Ye, Y. C. (2001). Application and Practice of the Neural Networks. Taiwan: Scholars

Publication.

Yi, J. H., Xu, W. H., and Chen, Y. T. (2014). Novel Back Propagation Optimization

by Cuckoo Search Algorithm. The Scientific World Journal, vol. 2014.

Yilmaz, S., and Kucuksille, E. U. (2013). Improved Bat Algorithm (IBA) on

Continuous Optimization Problems. Lecture Notes on Software Engineering, vol.

1(3), 279–283.


Yu, C. C., Liu, B. D. (2002). A Backpropagation Algorithm with Adaptive Learning

Rate andMomentum Coefficient. Proceedings of the 2002 International Joint

Conference on Neural Networks. IJCNN’02.

Yu, H., and Wilamowski, B. M. (2012). Neural Network Training with Second Order

Algorithms. Human–Computer Systems Interaction: Backgrounds and

Applications, vol. 2, 463–76.

Yu, Hao, and Bogdan, M. (2010). Levenberg–Marquardt Training. The Industrial

Electronics Handbook 5.

Yu, X. H. (1993). Acceleration of Backpropagation Learning Using Optimised

Learning Rate and Momentum. Electronics Letters, vol. 29, 1288.

Zhang, J., Lok, T., and Lyu, M. R. (2007). A Hybrid Particle Swarm Optimization-

Back-Propagation Algorithm for Feedforward Neural Network Training. Applied

Mathematics and Computation, vol. 185, 1026–1037.

Zhang, Lei, and Jiexin P. (2011). An Improved Back Propagation Neural Network in

Objects Recognition. IEEE International Conference on Automation and

Logistics, ICAL, 507–511.

Zheng, H., and Yongquan, Z. (2012). A Novel Cuckoo Search Optimization Algorithm

Based on Gauss Distribution. Journal of Computational Information Systems,

vol. 8(12), 4193–4200.