Artificial Neural Networks Applied To Plasma Spray Manufacturing
Thesis
Submitted in Fulfilment of the Requirements for the Degree of
Doctor of Philosophy
By
Tanveer Ahmed Choudhury
Faculty of Engineering and Industrial Sciences (FEIS)
Swinburne University of Technology
Hawthorn, Victoria – 3122
Australia
2013
Declaration
The author hereby declares that this thesis, submitted in fulfilment of the requirements
for the Degree of Doctor of Philosophy, contains no material which has been accepted
for the award of any other degree or diploma, except where due reference is made in
the text. To the best of the author’s knowledge, this thesis contains no material
previously published or written by another person except where due reference is made
in the text. In places where the work is based on joint research or publications, this
thesis discloses the relative contributions of the respective workers or authors.
Tanveer Ahmed Choudhury
October, 2013
Abstract
Thermal spray is a general term for a group of coating processes that are used
to apply metal or non-metallic coatings to protect a functional surface or to improve its
performance. There are some 40 processing parameters that define the overall coating
quality and these must be selected in an optimized fashion to manufacture a coating
that exhibits desirable properties. The proper combination of processing variables is
critical since these influence the cost as well as the coating characteristics.
Atmospheric plasma spray is the thermal spray process that involves the largest
number of such processing parameters. Because of this high number, a major
challenge is to gain full control over the system and to understand parameter
interdependencies, correlations and their individual effects on the in-flight particle
characteristics, which in turn have a significant influence on the in-service coating
properties. A robust methodology is, thus, required to study these interrelated effects.
An approach based on the artificial neural network method is proposed in this
study to model the atmospheric plasma spray process by predicting the in-flight particle
characteristics from the input processing power and injection parameters. The
predicted values capture the existing correlations with the input processing parameters
and do not depend on mathematical fitting procedures. The approach thus helps in
understanding the parameter relationships for setting up an on-line thermal spray
control system, along with a diagnostic tool, to allow the automated system to achieve
the desired process stability. The study illustrates the model’s design, the network
optimization procedures, the database handling and expansion steps, and the analysis
of the predicted values with respect to the experimental ones in order to evaluate the
model’s performance.
A function-approximating artificial neural network is implemented in this study,
where the network is trained to model complex input-output relationships so that it
can generalize and predict outputs from unseen inputs. One of the major problems for
such function-approximating neural networks is over-fitting, which reduces the
generalization capability of a trained network and its ability to work with sufficient
accuracy in a new environment. Two methods are used to analyse the improvement
in the network’s generalization ability: (i) cross-validation and early stopping, and
(ii) Bayesian regularization. Simulations are performed on both the original and
expanded databases with different training conditions to obtain the variations in
performance of the trained networks under various environments. The
predicted in-flight particle characteristics are analysed to evaluate the network
performance and generalization ability. Compared with the use of cross-validation
and early stopping during network training, the simulation results show an improvement
in the generalization performance of the networks when the regularization technique is
implemented, thereby preventing the phenomena associated with over-fitting.
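For illustration, a minimal Python sketch of the early-stopping procedure referred to above is given below; the `net` object and its methods are hypothetical placeholders for any gradient-trained feed-forward network, and do not reproduce the implementation used in this work.

```python
import numpy as np

def train_with_early_stopping(net, train_set, val_set, max_epochs=1000, patience=20):
    """Stop training once the validation error has not improved for
    `patience` consecutive epochs, then roll back to the best weights."""
    best_val_error = np.inf
    best_weights = net.get_weights()
    stalled = 0
    for epoch in range(max_epochs):
        net.train_one_epoch(train_set)             # one pass of gradient-based training
        val_error = net.mean_squared_error(val_set)
        if val_error < best_val_error:             # validation error still decreasing
            best_val_error = val_error
            best_weights = net.get_weights()
            stalled = 0
        else:                                      # rising validation error signals over-fitting
            stalled += 1
            if stalled >= patience:
                break
    net.set_weights(best_weights)                  # restore the best generalizing state
    return net
```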
The default multi-layer feed forward network structure, previously used to model
the atmospheric plasma spray process, presents a major technical challenge of
optimizing the number of hidden layer neurons and smoothing the training error curve.
In order to overcome the associated difficulties, a modified version of the network
structure is proposed to model the atmospheric plasma spray process. The default
multi-layer feed forward network structure is retained, but the matrix defining the
connections from the input layer to the hidden layers is altered to obtain a robust
trained network capable of handling the versatility and non-linearity associated with the
plasma spray process. The resulting network demonstrates higher and more stable
correlation coefficient values across various combinations of the number of neurons in
the hidden layers. The corresponding generalization error values are also stable and
lower, and the fluctuations of the network parameters decrease. The training curves
are smoother, with fewer fluctuations, and the training time required to reach a lower
error value decreases.
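A minimal sketch of the forward pass implied by this modification is shown below (the ‘111’ structure of Figure 4-2, which adds direct connections from the input layer to the second hidden layer and to the output layer); the weight matrix names and the tanh-type activations are assumptions made for illustration only.

```python
import numpy as np

def forward_111(x, W1, W2, W3, V2, V3, b1, b2, b3):
    """Forward pass of the modified '111' structure. Besides the default
    input -> hidden1 -> hidden2 -> output path (structure '100'), the input
    vector x also feeds hidden layer 2 (via V2) and the output layer (via V3)."""
    h1 = np.tanh(W1 @ x + b1)              # hidden layer 1
    h2 = np.tanh(W2 @ h1 + V2 @ x + b2)    # hidden layer 2, with a direct input connection
    return W3 @ h2 + V3 @ x + b3           # linear output, with a direct input connection
```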
Modular implementation of an artificial neural network is presented later in
this study to model the atmospheric plasma spray process in predicting the in-flight
particle characteristics from the input processing parameters. The modular
implementation allows simplification of the optimized model structure with an
enhanced ability to generalize. In addition, the underlying relationship between each
output in-flight particle characteristic and the input processing parameters is explored.
Smaller networks are constructed that achieve better, or in some cases similar,
results. The training process is found to be more robust and stable, with fewer
fluctuations in the values of the network parameters. The networks also respond to
variations in the number of hidden layer neurons with a definite trend. This predictable
trend enhances the reliability of applying artificial neural networks to model the
atmospheric plasma spray process and to overcome the variability and non-linearity
associated with the process.
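The modular idea can be sketched as follows: one small single-hidden-layer network per in-flight particle characteristic, trained independently and combined afterwards in a co-operative fashion. The sketch below uses scikit-learn’s MLPRegressor and random stand-in data purely for illustration; the networks and database of this work are not reproduced here.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 6))                   # 6 input processing parameters
Y = {"velocity": rng.uniform(size=50),          # one target per in-flight characteristic
     "temperature": rng.uniform(size=50),
     "diameter": rng.uniform(size=50)}

# Train one independent module per output characteristic.
modules = {name: MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000,
                              random_state=0).fit(X, y)
           for name, y in Y.items()}

# Co-operative combination: the joint prediction is the collection of module outputs.
x_new = rng.uniform(size=(1, 6))
prediction = {name: float(net.predict(x_new)[0]) for name, net in modules.items()}
```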
A robust single hidden layer feed forward neural network (SLFN) is further used
in this study to model the in-flight particle characteristics of the plasma spray process
with regard to the input processing parameters. The training times of traditional back
propagation algorithms, mostly used to model such processes, are far slower than
desired for the implementation of an on-line control system. The use of slow gradient
based learning methods and the iterative tuning of all network parameters during the
learning process are the two major causes of the slow learning speed. An extreme
learning machine algorithm, which randomly selects the input weights and biases and
analytically determines the output weights, is used in this work to train the SLFNs in
modelling the plasma spray process. In comparison to the performance of the networks
trained with the error back-propagation algorithm, the networks trained with the extreme
learning machine algorithm show better generalization performance, much shorter
training times and stable performance with regard to the number of hidden layer
neurons. These trends reflect the robustness of the trained networks and enhance the
reliability of applying artificial neural networks to model the plasma spray process.
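The core of the extreme learning machine algorithm can be stated in a few lines; the following is a generic sketch of the published algorithm (random input weights, output weights obtained analytically via the Moore-Penrose pseudo-inverse), with the weight ranges and tanh activation chosen for illustration rather than taken from this work.

```python
import numpy as np

def elm_train(X, T, n_hidden, seed=0):
    """Train a single-hidden-layer feed-forward network with the extreme
    learning machine: input weights and hidden biases are drawn at random
    and never tuned; only the output weights are solved for analytically."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                                   # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                             # Moore-Penrose solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because no iterative gradient descent is involved, training reduces to a single matrix decomposition, which is the source of the speed advantage noted above.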
In a real life spraying scenario, the plasma spray input processing parameters
vary, within limits, during the spraying process. These variations affect the output in-
flight particle characteristics. The sensitivity of the trained networks’ outputs to such
variations of the input processing parameters is therefore computed, with a uniform
noise generator used to simulate the variations. Both multi-layer and single layer feed
forward network structures are tested with various back propagation algorithms and
the extreme learning machine algorithm. This analysis provides a thorough
understanding of the trained neural networks’ response to input parameter
fluctuations. It thus gives a better understanding of the modelled network in terms of
robustness and makes it suitable for incorporation into an on-line thermal spray control
system along with a suitable diagnostic tool.
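A hedged sketch of such a sensitivity computation is given below; the proportional uniform-noise model and the 5% amplitude are illustrative assumptions only (the actual noise limits used in this work are parameter-specific, as listed in Table 5-4).

```python
import numpy as np

def sensitivity_to_input_noise(predict, X, noise_fraction=0.05, n_trials=100, seed=0):
    """Perturb each input processing parameter with uniform noise and report
    the mean absolute drift of the predicted in-flight particle characteristics
    relative to the noise-free predictions."""
    rng = np.random.default_rng(seed)
    baseline = predict(X)
    drifts = []
    for _ in range(n_trials):
        noise = rng.uniform(-noise_fraction, noise_fraction, size=X.shape)
        drifts.append(np.mean(np.abs(predict(X * (1.0 + noise)) - baseline)))
    return float(np.mean(drifts))                  # smaller value = more robust network
```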
The different artificial neural network models, proposed and used in the course
of the work, were trained and optimized using a database from the literature. The
networks were able to learn the input/output parameter relationships and correlate in-
flight particle characteristics with each of the input processing parameters. It is,
however, important to validate that the applicability of the developed models is not
limited to a single case. The network models can be re-trained and optimized for use
in a range of different cases and environments. An experiment is, thus, carried out in
relation to the atmospheric plasma spray process. The obtained experimental
database is used to train selected artificial neural network structures and models. A
good generalization performance of the developed networks is obtained. This validates
the proposed artificial neural network models because the resultant networks are found
to work with both the experimental data and a database from the literature.
Acknowledgments
I would like to thank and express my deepest gratitude to Almighty Allah, the most
merciful and most benevolent, for giving me the patience and helping me through to the
completion of my study at Swinburne University of Technology.
Foremost, I owe my deepest gratitude and appreciation to my principal
coordinating supervisor Prof. Christopher C. Berndt for his continuous support and
guidance, which made my journey enjoyable. I consider it an honour to work with Prof.
Berndt and my path to completion of this thesis would have been difficult without his
support. I would especially like to thank him for guiding me through the hard times. I am
thankful to Prof. Berndt for his patience in going through my thesis and various
manuscripts.
It gives me great pleasure to acknowledge Dr. Nasser Hosseinzadeh, who was
my principal coordinating supervisor for the first half of my PhD candidature, before he
left Swinburne. Dr. Nasser, along with Prof. Berndt, allowed me, in the first instance, to
embark on this exciting journey of research. I am grateful to Dr. Nasser for helping me
settle into my PhD studies initially and for guiding me through at various times.
I would like to mention Prof. Zhihong Man here and thank him for his
sincere help and contribution in this work, especially in the field of neural networks and
machine learning. I am indebted to Prof. Man for all his brilliant suggestions and
advice. He has always supported me whenever I needed any help. I would also like to
thank my coordinating supervisor Dr. Yat Choy Wong for accepting to be in my
supervisory panel during the end of my PhD period.
I am grateful to Swinburne University of Technology for providing me with the
Swinburne University Postgraduate Research Award (SUPRA) to facilitate the research
and support me financially during my PhD.
The contribution of the thermal spray group should be acknowledged. Deep
appreciation goes to Dr. Andrew Ang, from the thermal spray group. He has been
extremely kind to help me acquire the desired experimental data for my thesis. I thank
him for his advice and help during my thesis writing time. I would like to thank United
Surface Technologies Pty. Ltd., Australia, for providing the opportunity to carry out the
required experimental work. I am grateful to all my Swinburne colleagues and friends
for being immensely helpful and supportive at all times.
I would like to thank my parents and younger sister for their constant motivation
and encouragement. They have always supported me in all the right things and without
them I would not be in this position.
Special thanks and deep appreciation go to my wife, Saima Sharmin Dana. I
am grateful and indebted to her for the constant encouragement she has given me
during both good and bad times. She has always been supportive of my thoughts and
ideas, and I deeply appreciate her patience in understanding and tolerating me
throughout the study period.
Table of Contents
Declaration ................................................................................................................... i
Abstract ....................................................................................................................... ii
Acknowledgments ..................................................................................................... vi
Table of Contents .................................................................................................... viii
List of Figures ............................................................................................................ xi
List of Tables ......................................................................................................... xviii
List of Notations ...................................................................................................... xxi
List of Acronyms .................................................................................................... xxii
Chapter 1 Introduction ........................................................................................... 1
1.1 Background ............................................................................................. 1
1.2 Literature search ...................................................................................... 2
1.3 Research objective .................................................................................. 4
1.4 Thesis structure and overview ................................................................. 6
Chapter 2 Background Study ............................................................................... 11
2.1 Atmospheric plasma spray .................................................................... 11
2.2 Artificial neural network .......................................................................... 15
2.2.1 Network structure ............................................................................... 21
2.2.1.1 Artificial neuron model .................................................................. 21
2.2.1.2 Multi-layer feed-forward neural network structure ......................... 24
2.2.2 Network learning ................................................................................ 26
2.2.2.1 Back propagation algorithm .......................................................... 27
2.2.2.2 Levenberg-Marquardt algorithm ................................................... 37
2.2.2.3 Bayesian regularization algorithm ................................................ 39
2.2.2.4 Resilient back propagation algorithm............................................ 41
2.3 Multi-Net system .................................................................................... 41
2.3.1 Ensemble combination ....................................................................... 43
2.3.1.1 Creating ensembles ..................................................................... 43
2.3.1.2 Combining Ensemble Nets ........................................................... 45
2.3.2 Modular combination .......................................................................... 46
2.3.2.1 Creating modular components ..................................................... 46
2.3.2.2 Combining modular components .................................................. 47
Chapter 3 Artificial Neural Network Modelling .................................................... 51
3.1 Background ........................................................................................... 51
3.2 Data collection and pre-processing ........................................................ 53
3.3 Database expansion .............................................................................. 56
3.4 Network architecture .............................................................................. 59
3.5 Network training and optimization .......................................................... 61
3.6 Simulation result analysis and discussion .............................................. 74
3.7 Summary ............................................................................................... 87
Chapter 4 Network Structure Modification and Multi-Net System ..................... 90
4.1 Network Structure Modification .............................................................. 90
4.1.1 Background ........................................................................................ 90
4.1.2 Proposed network architecture ........................................................... 91
4.1.3 Database handling ............................................................................. 92
4.1.4 Network training and optimization ...................................................... 93
4.1.5 Simulation result analysis and discussion .......................................... 95
4.1.5.1 Results for new structure.............................................................. 95
4.1.5.2 Results obtained for additional networks ...................................... 97
4.1.5.3 Comparison of results and discussion ........................................ 102
4.1.6 Summary ......................................................................................... 112
4.2 Multi-Net System and Modular Combination ........................................ 113
4.2.1 Background ...................................................................................... 113
4.2.2 Modular Combination ....................................................................... 116
4.2.3 Database processing ....................................................................... 118
4.2.4 Network training and optimization .................................................... 120
4.2.5 Construction of additional networks .................................................. 121
4.2.6 Simulation result analysis, comparison and discussion .................... 122
4.2.6.1 Results for modular neural networks .......................................... 122
4.2.6.2 Results obtained for additional networks .................................... 127
4.2.6.3 Result comparison and analysis ................................................. 131
4.2.7 Summary ......................................................................................... 142
Chapter 5 Extreme Learning Machine and Sensitivity Analysis ...................... 145
5.1 Extreme learning machine ................................................................... 145
5.1.1 Background ...................................................................................... 145
5.1.2 Artificial neural network modelling .................................................... 148
5.1.2.1 Outline of the extreme learning machine algorithm..................... 149
5.1.2.2 Network training conditions ........................................................ 153
5.1.2.3 Construction of additional networks ............................................ 153
5.1.3 Simulation results and performance comparisons ............................ 154
5.1.3.1 Extreme learning machine algorithm performance ..................... 154
5.1.3.2 Standard artificial neural networks performance ......................... 156
5.1.3.3 Network performance comparisons ............................................ 162
5.1.4 Result analysis and discussion ........................................................ 167
5.1.5 Summary ......................................................................................... 179
5.2 Sensitivity analysis of neural networks ................................................. 179
5.2.1 Background ...................................................................................... 179
5.2.2 Database processing and noise addition .......................................... 181
5.2.3 Artificial neural network models ........................................................ 183
5.2.4 Simulation result analysis and discussion ........................................ 185
5.2.5 Summary ......................................................................................... 194
Chapter 6 Experimental Work and Network Modelling..................................... 197
6.1 Experiment design and plasma spray process set-up .......................... 198
6.2 Artificial neural network modelling........................................................ 201
6.3 Network training and optimization ........................................................ 208
6.4 Simulation result .................................................................................. 211
6.4.1 Proposed network models ................................................................ 211
6.4.2 Performance comparison and result analysis ................................... 218
6.5 Summary ............................................................................................. 230
Chapter 7 Conclusion and Future Work ............................................................ 233
7.1 Conclusion ........................................................................................... 233
7.2 Future work ......................................................................................... 237
References: ............................................................................................................. 240
Appendix A: List of Publications ........................................................................... 259
Appendix B: Expanded Database, DSE .................................................................. 260
List of Figures
Figure 1-1: A mind map of the research thoughts in this thesis. ................................. 7
Figure 1-2: Flowchart outlining the research work carried out in this thesis. .............. 8
Figure 2-1: Schematic of an atmospheric plasma spray process [50]. ..................... 11
Figure 2-2: Thermal spray coating parameters involved in splat formation [53]. ....... 13
Figure 2-3: Demonstration of over-fitting for a function approximating artificial
neural network. ...................................................................................... 19
Figure 2-4: A Non-linear model of an artificial neuron k . ........................................ 22
Figure 2-5: Fully connected multi-layer feed-forward artificial neural network
architecture with two hidden layers. ....................................................... 24
Figure 2-6: Block diagram of the supervised learning process. ................................ 26
Figure 2-7: Block diagram of the unsupervised learning process. ............................ 27
Figure 2-8: Signal flow graph of the output layer neuron j. ....................................... 28
Figure 2-9: Signal flow graph of the hidden layer neuron j connected to the
output layer neuron k. ............................................................................ 33
Figure 2-10: Classifications of a multi-net artificial neural network system. ................ 42
Figure 2-11: Four different modes of combining artificial neural network
modular components (a) cooperative combination, (b) sequential
combination, (c) competitive combination, and (d) supervisory
combination. .......................................................................................... 48
Figure 3-1: Research methodology for artificial neural network modelling of
the atmospheric plasma spray process. ................................................ 52
Figure 3-2: Block diagram of the designed multi-layer artificial neural network. ....... 59
Figure 3-3: Network performances with different algorithms and number of
hidden layers. ........................................................................................ 63
Figure 3-4: Difference in standard deviations of the training and validation
sets for DSOTR. ....................................................................................... 65
Figure 3-5: Difference in standard deviations of the training and validation
sets for DSETR. ....................................................................................... 66
Figure 3-6: Correlation coefficient (R) variations with various artificial neural
network structures on the test set. ......................................................... 68
Figure 3-7: Correlation coefficient (R) variations with various artificial neural
network structures on the test set. ......................................................... 70
Figure 3-8: Generalization error variations with various artificial neural
network structures on the test set. ......................................................... 71
Figure 3-9: Network performance on test sets for various artificial neural
network structures trained with Bayesian Regularization algorithm........ 72
Figure 3-10: Number of network parameter variations with various artificial
neural network structures. ..................................................................... 73
Figure 3-11: Standard deviations of the network parameters for different neural
network structures trained with both Levenberg-Marquardt and
Bayesian Regularization algorithms. ...................................................... 74
Figure 3-12: Variations of in-flight particle characteristics with the changes in
current intensity. .................................................................................... 79
Figure 3-13: Variations of in-flight particle characteristics with the changes in
hydrogen plasma gas flow rate. ............................................................. 80
Figure 3-14: Variations of in-flight particle characteristics with the changes in
total plasma gas flow rate. ..................................................................... 82
Figure 3-15: Variations of in-flight particle characteristics with the changes in
carrier gas flow rate. .............................................................................. 83
Figure 3-16: Variations of in-flight particle characteristics with the changes in
injector stand-off distance. ..................................................................... 85
Figure 3-17: Variations of in-flight particle characteristics with the changes in
injector diameter. ................................................................................... 86
Figure 4-1: Block diagram of the default multi-layer artificial neural network
structure ‘100’. ....................................................................................... 91
Figure 4-2: Proposed modified artificial neural network structure ‘111’ with
additional connection from the input layer to hidden layer 2 and
the output layer. ..................................................................................... 92
Figure 4-3: Generalization performances of the artificial neural networks with
proposed structure ‘111’ and various combinations of the hidden
layer neurons. ....................................................................................... 96
Figure 4-4: Generalization performance of networks ‘100-LM’ with various
combinations of the hidden layer neurons. ............................................ 99
Figure 4-5: Generalization performance of networks ‘100-BR’ with various
combinations of the hidden layer neurons. .......................................... 100
Figure 4-6: Generalization performance of networks ‘100-RP’ with various
combinations of the hidden layer neurons. .......................................... 101
Figure 4-7: Average generalization performance for four different artificial
neural networks. .................................................................................. 103
Figure 4-8: Standard deviations of the generalization performances of four
different artificial neural networks. ....................................................... 104
Figure 4-9: Maximum correlation coefficient (R) values of four different
artificial neural networks along with their corresponding total
number of hidden layer neurons. ......................................................... 105
Figure 4-10: Average standard deviations of the network parameters for four
different artificial neural networks. ....................................................... 107
Figure 4-11: Generalization performance of the four different artificial neural
networks with 8 and 7 neurons in the 1st and 2nd hidden layers............ 108
Figure 4-12: Training error responses (for the first 30 epochs (iterations)) of the
four different artificial neural networks. ................................................ 110
Figure 4-13: Research methodology for modular implementation of artificial
neural network in modelling the atmospheric plasma spray
process................................................................................................ 115
Figure 4-14: An updated co-operative combination of artificial neural network
modular components. .......................................................................... 116
Figure 4-15: Flowchart for modular artificial neural network implementation of
the atmospheric plasma spray process. .............................................. 117
Figure 4-16: Single hidden layer multi-layer artificial neural network
architecture. ........................................................................................ 118
Figure 4-17: Data split process for modular implementation of artificial neural
networks in modelling the atmospheric plasma spray process. .......... 119
Figure 4-18: Generalization performance of NET1 over various number of
hidden layer neurons. .......................................................................... 123
Figure 4-19: Generalization performance of NET2 over various number of
hidden layer neurons. .......................................................................... 125
Figure 4-20: Generalization performance of NET3 over various number of
hidden layer neurons. .......................................................................... 126
Figure 4-21: Generalization performance of COMP1 over various combinations
of the hidden layer neurons. ................................................................ 128
Figure 4-22: Generalization performance of COMP2 over various combinations
of the hidden layer neurons. ................................................................ 129
Figure 4-23: Generalization performance of COMP3 over various combinations
of the hidden layer neurons. ................................................................ 130
Figure 4-24: Performance comparison of modular networks with general
artificial neural networks in predicting the individual in-flight
particle characteristics. ........................................................................ 134
Figure 4-25: Correlation coefficient (R) and total number of hidden layer
neurons comparison of the combined modular network output
model, NET-C, with general artificial neural network............................ 136
Figure 5-1: Proposed single layer feed forward network (SLFN) artificial
neural network architecture. ................................................................ 148
Figure 5-2: Generalization performance variations of the networks trained with
the extreme learning machine algorithm with respect to the
number of hidden layer neurons. ......................................................... 155
Figure 5-3: Variations of training times of the networks trained with the
extreme learning machine algorithm with respect to the number of
hidden layer neurons. .......................................................................... 156
Figure 5-4: Generalization performance and training times of the networks
trained with the Levenberg-Marquardt (LM) algorithm with respect
to the number of hidden layer neurons. ............................................... 157
Figure 5-5: Generalization performance and training times of the networks
trained with resilient back-propagation (RP) algorithm with respect
to the number of hidden layer neurons. ............................................... 159
Figure 5-6: Generalization performance and training times of the networks
trained with Bayesian regularization (BR) algorithm with respect to
the number of hidden layer neurons. ................................................... 161
Figure 5-7: Average generalization performance comparison of the extreme
learning machine algorithm with standard back-propagation
algorithms. ........................................................................................... 164
Figure 5-8: Generalization performance comparisons of the selected networks
trained with extreme learning machine and standard back-
propagation algorithm. ......................................................................... 166
Figure 5-9: Variations of in-flight particle characteristics with the changes in
current intensity. .................................................................................. 171
Figure 5-10: Variations of in-flight particle characteristics with the changes in
hydrogen plasma gas flow rate. ........................................................... 173
Figure 5-11: Variations of in-flight particle characteristics with the changes in
total plasma gas flow rate. ................................................................... 174
Figure 5-12: Variations of in-flight particle characteristics with the changes in
carrier gas flow rate. ............................................................................ 176
Figure 5-13: Variations of in-flight particle characteristics with the changes in
injector stand-off distance. ................................................................... 177
Figure 5-14: Variations of in-flight particle characteristics with the changes in
injector diameter. ................................................................................. 178
Figure 5-15: Flowchart of the sensitivity analysis of designed artificial neural
network models to the fluctuations of the atmospheric plasma
spray input processing parameters. ..................................................... 181
Figure 5-16: Variations of correlation coefficient (R) values of the selected
network NN1 output in-flight particle characteristics with the
gradual addition of noise to the atmospheric plasma spray
specified input processing parameters. ............................................... 186
Figure 5-17: Variations of correlation coefficient (R) values of the selected
network NN2 output in-flight particle characteristics with the
gradual addition of noise to the atmospheric plasma spray
specified input processing parameters. ............................................... 187
Figure 5-18: Variations of correlation coefficient (R) values of the selected
network 111-M output in-flight particle characteristics with the
gradual addition of noise to the atmospheric plasma spray
specified input processing parameters. ............................................... 188
Figure 5-19: Variations of correlation coefficient (R) values of the selected
network NET-C output in-flight particle characteristics with the
gradual addition of noise to the atmospheric plasma spray
specified input processing parameters. ............................................... 189
Figure 5-20: Variations of correlation coefficient (R) values of the selected
network ELM-1 output in-flight particle characteristics with the
gradual addition of noise to the atmospheric plasma spray
specified input processing parameters. ............................................... 190
Figure 5-21: Combined graph to represent variations of correlation coefficient
(R) values of all the selected networks output in-flight particle
characteristics with the gradual addition of noise to the
atmospheric plasma spray specified input processing parameters. ..... 191
Figure 5-22: Drop ratios for selected artificial neural networks. ................................ 193
Figure 6-1: Research methodology for artificial neural network modelling of an
atmospheric plasma spray process with experimental dataset. ........... 197
Figure 6-2: Block diagram of the designed multi-layer artificial neural network
(ANN) structure. .................................................................................. 202
Figure 6-3: Flowchart for modular artificial neural network implementation of
the atmospheric plasma spray process. .............................................. 205
Figure 6-4: Single layer multi-layer perceptron (MLP) artificial neural network
(ANN) architecture. .............................................................................. 206
Figure 6-5: Flowchart representing the data split process for training of
developed modular artificial neural network models. ........................... 207
Figure 6-6: Research methodology for artificial neural network implementation
of the atmospheric plasma spray process to predict the output
average of in-flight particle characteristics using different artificial
neural network models and structures. ................................................ 208
Figure 6-7: Data division process of the experimental database of the
atmospheric plasma spray process for training and testing of the
different designed artificial neural network models. ............................. 210
Figure 6-8: Generalization performances of all the artificial neural networks
N1 with different combination of the number of hidden layer
neurons. .............................................................................................. 213
Figure 6-9: Generalization performances of all the artificial neural networks
N2 with different combination of the number of hidden layer
neurons. .............................................................................................. 214
Figure 6-10: Generalization performances of the modular artificial neural
network N3-V with different combination of the number of hidden
layer neurons. ..................................................................................... 215
Figure 6-11: Generalization performances of the modular artificial neural
network N3-T with different combination of the number of hidden
layer neurons. ..................................................................................... 216
Figure 6-12: Generalization performances of the modular artificial neural
network N3-D with different combination of the number of hidden
layer neurons. ..................................................................................... 217
Figure 6-13: Average generalization performance comparison of different
artificial neural network models. .......................................................... 219
Figure 6-14: Generalization performance comparison of the various selected
best performing artificial neural network models. ................................. 220
Figure 6-15: Generalization performance of the selected artificial neural
network models on the entire experimental database EDSO. ............... 225
Figure 6-16: Absolute average relative percentage errors of different selected
artificial neural network models in predicting the in-flight particle
characteristics of an atmospheric plasma spray process from the
input processing parameters. .............................................................. 230
List of Tables
Table 3-1: Experimental database (DSO) from literature consisting of the
atmospheric plasma spray input processing parameters and the
output in-flight particle characteristics [40]. ............................................ 54
Table 3-2: Physical limits of the atmospheric plasma spray input processing
parameters and the output in-flight particle characteristics along
with the input parameters reference values [40]. ................................... 56
Table 3-3: Data point values to represent classifications of the following input
processing parameters. ......................................................................... 61
Table 3-4: Generalization errors generated by the networks trained by
Levenberg-Marquardt algorithm with datasets DSOTR and DSETR. .......... 67
Table 3-5: Experimental and predicted in-flight particle characteristics values
for the selected networks NN1 and NN2 along with the absolute
relative error percentage. ...................................................................... 76
Table 3-6: Absolute average relative error percentage of the predicted in-
flight particle characteristics with the variations of each input
processing parameters. ......................................................................... 77
Table 4-1: Number of network parameters used during training of different
artificial neural networks. ..................................................................... 106
Table 4-2: Number of epochs required to minimize the artificial neural
network training error........................................................................... 109
Table 4-3: Performance comparison summary of the proposed structure ‘111’
with the default artificial neural network structure ‘100’. Note:
“MAE” refers to mean absolute error and for each performance
parameter, the best performing values are typed in bold. .................... 112
Table 4-4: Standard deviations of correlation coefficient (R) for the modular
and general artificial neural networks. ................................................. 133
Table 4-5: Network parameter statistics for different networks. ............................ 137
Table 4-6: Correlation coefficient (R) value comparisons of the selected
networks. ............................................................................................. 139
Table 4-7: The predicted values and absolute relative error percentages for
both modular and the general artificial neural networks. ...................... 140
Table 4-8: Absolute average relative error percentage of the predicted
average in-flight particle characteristics with the variations of each
input processing parameters. .............................................................. 141
Table 5-1: Summary of the training performances of extreme learning
machine (ELM) and back propagation (BP) algorithms in training
the artificial neural networks with variations of hidden layer
neurons from 1 to 300. ........................................................................ 163
Table 5-2: Summary of the generalization performances of different selected
artificial neural networks ...................................................................... 166
Table 5-3: Input processing parameters along with the corresponding
experimental and predicted in-flight particle characteristics values.
The individual and average absolute relative error percentage is
also mentioned. Note: the variations of each of the input
processing parameters are highlighted in bold. The other
parameter values were held constant at their reference values. .......... 169
Table 5-4: Upper and lower limits of the uniform distributed noise values
generated for each of the input atmospheric plasma spray input
processing parameters. ....................................................................... 182
Table 5-5: Performance values for the sensitivity analysis of the different
selected networks with the fluctuations of the neural network input
parameters. ......................................................................................... 194
Table 6-1: Experimental database (EDSO) consisting of the atmospheric
plasma spray input processing parameters and the output in-flight
particle characteristics. ........................................................................ 200
Table 6-2: Atmospheric plasma spray process experiment parameters. The
standard deviations of the measured in-flight particle
characteristics are indicated. ............................................................... 201
Table 6-3: The experimental in-flight particle characteristics values from the
experimental database EDSO with the corresponding predicted
values from the developed artificial neural network models. ................ 222
Table 6-4: Standard deviations of the experimental in-flight particle
characteristics of an atmospheric plasma spray process along
with prediction error by the selected artificial neural network N1-M. ..... 226
Table 6-5: Standard deviations of the experimental in-flight particle
characteristics of an atmospheric plasma spray process along
with prediction error by the selected artificial neural network N2-M. ..... 227
Table 6-6: Standard deviations of the experimental in-flight particle
characteristics of an atmospheric plasma spray process along
with prediction error by the selected artificial neural network N3-C. ..... 228
Table 6-7: Absolute average relative error percentage of the predicted in-
flight particle characteristics by different artificial neural network
models with the variations of atmospheric plasma spray input
processing parameters. ....................................................................... 229
List of Notations
No.  Symbol      Unit   Description
1    I           A      Arc current intensity
2    V_Ar        SLPM   Argon primary plasma gas flow rate
3    V_H2        SLPM   Hydrogen primary plasma gas flow rate
4    V_CG        SLPM   Carrier gas flow rate
5    D_I         mm     Injector diameter
6    D_inj       mm     Injector stand-off distance
7    V           m/s    Average in-flight particle velocity
8    T           °C     Average in-flight particle temperature
9    D           μm     Average in-flight particle diameter
10   i, j, k     -      Indices referring to different neurons
11   n           -      Iteration / training pattern
12   E(n)        -      Instantaneous sum of error squares at iteration n
13   E_av        -      Average of the instantaneous sum of error squares over all iterations n
14   e_j(n)      -      Error signal at the output of neuron j at iteration n
15   t_j(n)      -      Target response of neuron j
16   y_j(n)      -      Output of neuron j at iteration n
17   w_ji(n)     -      Synaptic weight connecting the output of neuron i to the input of neuron j at iteration n
18   Δw_ji(n)    -      Correction applied to the synaptic weight w_ji(n) at iteration n
19   v_j(n)      -      Net internal activity level of neuron j at iteration n
20   φ_j(·)      -      Activation function associated with neuron j
21   b_j         -      Bias value applied to neuron j
22   x_i(n)      -      ith element of the input vector
23   o_k(n)      -      kth element of the output vector
24   η           -      Learning rate parameter
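Restating entries 12 to 23 in equation form (this follows the standard back-propagation convention, including the customary factor of 1/2; the symbol N for the number of training patterns is introduced here for compactness):

```latex
e_j(n) = t_j(n) - y_j(n), \qquad
E(n) = \frac{1}{2}\sum_{j} e_j^2(n), \qquad
E_{av} = \frac{1}{N}\sum_{n=1}^{N} E(n)

v_j(n) = \sum_{i} w_{ji}(n)\, x_i(n) + b_j, \qquad
y_j(n) = \varphi_j\!\left(v_j(n)\right)
```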
List of Acronyms
No.  Acronym  Description
1    ANN      Artificial neural network
2    APS      Atmospheric plasma spray
3    BP       Back propagation algorithm
4    BR       Bayesian regularization algorithm
5    ELM      Extreme learning machine algorithm
6    LM       Levenberg-Marquardt algorithm
7    MLP      Multi-layer perceptron
8    RP       Resilient back propagation algorithm
9    SLFN     Single hidden layer feed forward neural network
Chapter 1 Introduction
This chapter presents the research background, motivation and objectives
based on the literature search. The chapter ends with a brief overview of the thesis
structure.
1.1 Background
Atmospheric plasma spray (APS) is a thermal spray process used for the
application of metal or non-metallic coatings on a variety of candidate materials; e.g.,
metals, ceramics, composites and polymers [1-3]. This helps in protecting a functional
surface or to improve its performance by solving numerous problems of wear, corrosion
and thermal degradation. A list of some common coating applications includes
corrosion prevention [4, 5]; wear and oxidation resistance; dimensional restoration and
repair; thermal control and insulation; abrasive activity; biomedical compatibility;
electromagnetic shielding and many more. A greater degree of particle melting and
relatively high particle velocity of the plasma spray results in higher deposition density
and bond strengths compared to most electric and arc spray coatings [6].
An important parameter in defining the performance and durability of a coating
is its bond strength with the underlying substrate. Plasma spray commercial coating
and proprietary nanostructured coating bond strengths typically are 35 MPa and
80 MPa, respectively [7]. A high droplet/substrate adhesion is achieved from the high
particle velocity and deformation that occur on impact. The inert gas plasma jet
generates lower oxide content than other thermal spray processes. APS has, thus,
become popular in industrial applications.
The plasma spray operating conditions [8, 9] and coating properties, such as
size and distribution of porosity, oxide content, residual stress, macro and micro
cracks, are strongly affected by the in-flight particle characteristics; for example, in-
flight particle velocity, surface temperature and diameter [8, 10, 11]. A recent study by
Cizek et al. [12] illustrates the influence of in-flight particle characteristics on a specific
example of plasma sprayed hydroxyapatite coating characteristics.
The in-flight characteristics are strongly influenced by the input spray
parameters [12], which are closed loop controlled and set to nominally constant values.
However, these parameters vary during the APS process and calibration and
adjustments of the variable levels are necessary. These parameter variations influence
the in-flight particle characteristics. Although it is the particle surface temperature that
is actually measured at all times, for simplicity in this work and others [13, 14] it is
referred to as ‘particle temperature’; i.e., it is implied that the surface temperature is
being measured.
The variations in in-flight particle characteristics are considered to be indicators
of process control [15]. Due to the involvement of a large number of input processing
parameters in APS, it is difficult to set up the process control. There is an associated
cost to optimize the thermal spray parameters for new coating materials. Therefore,
there is a need to reduce the variables to manageable numbers. The in-flight particle
characteristics are sensitive to the input processing parameters [16, 17], especially to
the following power and injection parameters: arc current intensity, argon gas flow rate,
hydrogen flow rate, argon carrier gas flow rate, injector stand-off distance and the
injector diameter. Accurate control and appropriate combination of the spray
parameters are important since these influence the performance and durability of the
coatings [2, 3]. Control of the parameters will, at the same time, assist engineers in
reducing the time and complexities related to the spray tuning and parameter setting.
In-flight particle optical sensors are used for real-time monitoring of the
coating manufacturing process [18, 19]. These sensors are, however, unable to tune
the parameters to the proper and optimum operating values when the jet reveals any
fluctuations, which leaves the process control incomplete. It would be desirable to have
a feedback system coupled to the sensor that can predict the in-flight particle
characteristics, involving the average particle velocity, temperature and diameter, with
respect to the variations of each input processing parameter. The input parameters
could, thus, be adjusted beforehand to achieve the desired particle characteristics.
However, this task becomes difficult due to the non-linearity and many permutations of
the thermal spray process [20].
1.2 Literature search
The initial idea for the neural network implementation of the thermal spray
process was presented by Einerson et al. [21]. The studies [22, 23] described the
relative simplicity of the neural networks required to model the spray process.
In the past literature, artificial neural networks (ANNs) have been used in
modelling APS from various perspectives. In a follow through to the initial study [24],
Fauchais et al. [25] provides a review on the monitoring and control of the plasma
spray process, including on-line control of the spray process using the ANN technique.
Kanta et al. in [26] used ANN to model the APS process for predicting the processing
parameters from the coating structural attribute; that is, the deposition yield of grey
alumina (Al2O3-TiO2 – 13% by wt.) coatings.
Guessasma et al. used ANN to model the APS process in correlating the
process parameters with coating properties [27] and further predict the porosity level
[28], microstructure features [29] and adhesion properties [30] of similar APS alumina-
titania coatings. The authors in [31] also used an ANN methodology to derive
correlations between selected processing parameters and heat flux transmitted to a
workspace from a torch during the pre-heating of an APS process.
Jean et al. in [32] applied ANN to model an APS zirconia coating process, while
Wang et al. [33] used ANN modelling to predict the porosity and hardness of an APS
WC-12% Co powder coating from the spray parameters. Zhang et al. [34] evaluated
the effect of in-flight particle characteristics on the porosity and gas specific
permeability of APS 8 mol% yttria stabilized zirconia electrolyte coatings. In addition to
this literature, the studies in references [35-37] demonstrated the use of ANN in an
APS process from various perspectives.
The research work in this study focuses on an approach, based on the ANN
method to model the APS process in predicting the in-flight particle characteristics from
the input processing power and injection parameters. There has been some work by
past researchers in using the ANN technique to predict the in-flight particle
characteristics of an APS process.
A robust non-linear dynamic system based on ANN was used in the studies [14,
38-40] to implement APS process control by coupling the diagnostic sensor with a
predictive system to separate the effect of each processing parameter on the in-flight
particle characteristics. A simple multilayer perceptron (MLP) feed forward network
structure, with two hidden layers and quick propagation algorithm [41, 42], was used to
build-up and train the ANNs. The literature studied the interrelated effects of the
parameter interdependencies, correlations and individual effects on coating properties
and characteristics.
The authors in reference [43] studied the use of ANN in the complex APS
process. In the second part of their work [44] the authors described an example linking
processing parameters with the in-flight particle characteristics. A similar multilayer
perceptron structure was used with error back propagation algorithm in designing the
ANN models.
Kanta et al. [45, 46] showed the applicability of both ANN and an additional
artificial intelligence technique of fuzzy logic, to correlate and predict the coating
properties and in-flight particle characteristics from the input processing parameters.
Another study [47] used a combined artificial intelligence methodology, i.e., the use of
both ANN and fuzzy logic, to predict the in-flight particle characteristics. The particle
characteristics were controlled in real time by adjusting the input processing
parameters; including arc current intensity, the total plasma gas flow rate and hydrogen
content in the plasma gas. The authors [48] also implemented ANN methodology to
establish relationships between in-flight particle average diameter and process
parameters to calculate the in-flight particle average velocity and surface temperature.
All the mentioned studies used two hidden layers of MLP ANN architecture with back
propagation algorithms to model the ANN.
1.3 Research objective
Past work in this field of ANN modelling of the APS process has used a two
hidden layer, multi-layer perceptron ANN structure with the same quick propagation
algorithm, which is based on the error back propagation algorithm.
There were variations concerning the choice of APS parameters and the manner in which the results for the output APS parameters were discussed and analysed. However, there was little variation in the ANN modelling aspects across the available literature, in terms of ANN structure, number of hidden layers and training algorithms. Training times of the networks were not reported and sensitivity analyses of the designed models were not computed; these are two important factors in establishing the applicability of such ANN models in an on-line process control system. No work was performed on simplifying the ANN structures to reduce the size and complexity of the designed ANNs.
With the above motivation, the current research aims at using the ANN method
to model the APS process and predict the in-flight particle characteristics from the
variations of the input power and injection parameters. This approach develops and improves the proposed ANN models. Different training methods, based on error back propagation, are implemented to improve the generalization ability of the neural network. Simulations are carried out to justify the optimum number of hidden layers required for the ANN to learn the process dynamics and generalize the underlying input-output parameter relationships. With proper training and good generalization ability, the designed neural network overcomes the variability and non-linearity associated with the APS process.
This work further aims at overcoming the technical difficulties associated with modelling an APS process and establishing process control with a default multi-layer perceptron ANN structure. An optimized MLP ANN structure is proposed and used in this work to overcome the associated difficulties. The proposed structure provides the network with additional parameters to learn and generalize the process relationships without increasing the number of hidden layer neurons.
The study also works at reducing model complexity and constructing simple ANN structures. A modular combination of a multi-net system is, thus, used to model the APS process and predict the in-flight particle characteristics from the input processing parameters. The proposed modular combination method achieves good correlations between each of the in-flight particle characteristics and the input parameters using single-hidden-layer ANN structures. This segmented approach to ANN allows simplification of the task at hand and a better understanding of the relationships that the model establishes between each of the in-flight particle characteristics and the input processing parameters. The system reliability is enhanced along with an improvement in the overall generalization ability of the designed model.
The learning speed of feed-forward neural networks with back propagation algorithms is far slower than desired. This makes them unsuitable for incorporation into any real-time system, or into an on-line thermal spray control system with a diagnostic tool that allows the automated system to achieve the desired process stability. One of the research objectives in this study is to improve the learning speed of the designed model; thus, a single-hidden-layer feed-forward neural network with an extreme learning machine algorithm is proposed and used to model the APS process. The extreme learning machine algorithm generates relatively good generalization performance along with faster network learning times than traditional back propagation algorithms.
The study provides a sensitivity analysis of the constructed ANN models to
observe the variations of the output in-flight particle characteristics with the fluctuations
of the input processing parameters. Sensitivity of the trained network’s output to the
variations of the input processing parameters is computed to achieve the research
objective. The applicability and validity of the different ANN models developed
throughout the thesis are not limited to a specific case. An experiment, in relation to the
APS process, is carried out and a validation of the models is presented. This provides justification for the use of the developed models in an on-line control system.

The correlations between processing parameters, particle characteristics and coating properties are of similar complexity; however, these are not covered in the current work. The work has the potential to benefit the thermal spray industry by improving control of the spraying process.
1.4 Thesis structure and overview
All the simulations in this work are performed with MATLAB (R2012a: The
MathWorks Inc., Natick, MA, USA). The specification of the personal computer used is:
Intel (R) Core (TM) 2 Duo CPU E8400 @ 3.00 GHz with 4 GB RAM.
A mind map of the research thoughts is presented in Figure 1-1. It presents the various aspects of ANN considered in modelling the APS process for predicting the in-flight particle characteristics from the input processing parameters. In addition, an outline of the research work in this thesis is presented in Figure 1-2.
Based on a database from the literature, various ANN models are developed and the
performances are analysed. Sensitivity analysis is performed on the developed
networks. Selected ANN models are later tested and validated with a database
obtained experimentally by observing the variations of the in-flight particle
characteristics with the changes of selected input processing parameters.
Figure 1-1: A mind map of the research thoughts in this thesis.
Figure 1-2: Flowchart outlining the research work carried out in this thesis.
The work done in this thesis is organized into seven chapters. A brief summary of the contents of each chapter is provided below, to give the reader an overview of the work before going through each chapter in detail.
Chapter 1 presented the research background, motivation and objectives based
on the literature search.
Chapter 2 presents background studies on different areas covered in this work.
This includes a theoretical introduction to the plasma spray process and artificial neural
networks. Relevant past work by other researchers is reviewed. The chapter also provides an introduction to multi-net neural networks. This information aids the reader in better understanding and relating the work presented in later chapters.
Chapter 3 illustrates different stages of the ANN modelling of the APS process
in predicting the in-flight particle characteristics from the input processing parameters.
It describes the database collection and handling processes along with the database
expansion procedures. ANN training and optimization steps are also illustrated. Results
obtained from the simulations are described, compared and analysed.
Chapter 4 starts by discussing the use of a modified ANN structure to model the
APS process for predicting the in-flight particle characteristics from the input power and
injection processing parameters. Modification is achieved through the neural network
structure optimization. The later part of the chapter discusses the use of a multi-net
artificial neural network structure to model the plasma spray process. A modular implementation is used to predict the in-flight particle characteristics; it allows simplification of the optimized model structure with an enhanced ability to generalize, and achieves better correlations between each of the in-flight particle characteristics and the input processing parameters.
Chapter 5 introduces the use of the extreme learning machine (ELM) algorithm
in modelling the APS process. It discusses the use of the ELM algorithm to predict the
in-flight particle characteristics from the input processing parameters. The simulation
results obtained are analysed, discussed and a comparison in performance is
presented with other standard neural network algorithms and structures. The chapter
concludes by providing a sensitivity analysis of the different trained ANNs, in which the sensitivity of the trained networks' output in-flight particle characteristics was computed with respect to variations of the input processing parameters.
Chapter 6 presents an experimental work carried out in relation to the APS
process. It provides a discussion on the experimental set up, process parameters
selection and the data collection. Network testing and analysis is performed with
selected networks developed in earlier chapters and discussions of the obtained
simulation results are presented. The results are analysed to validate the proposed models' applicability to a range of different cases and environments.
Chapter 7 presents a conclusion to the thesis; as well as recommendations to
future work.
Chapter 2 Background Study
This chapter provides background studies and the general description of the
atmospheric plasma spray (APS) process and artificial neural network (ANN) structure
and modelling in Sections 2.1 and 2.2, respectively. Section 2.3 outlines background
information on the multi-net ANN system. This chapter provides grounding for the work
presented in Chapters 3, 4, 5 and 6.
2.1 Atmospheric plasma spray
APS is a highly versatile thermal spray process that combines a high number of processing parameters, which ultimately define the coating characteristics. The versatility of the process allows it to operate over a broad range of atmospheric conditions, velocities and temperatures. The presence of inert gases, high gas velocities and extremely high temperatures makes APS the most flexible thermal spray process with respect to the materials that can be sprayed. The plasma spray process differs from other coating processes in that it deposits large particles on the surface, in the form of liquid droplets or semi-molten or solid particles, rather than depositing the material as individual ions, atoms or molecules. The coating feedstock materials generally take the shape of powders, wires or rods [49]. The high enthalpy of the thermal spray process gives it high coating rates, of the order of 50 to 300 g/min, compared to other coating processes.
A schematic diagram of a plasma spray process is given in Figure 2-1.
Figure 2-1: Schematic of an atmospheric plasma spray process [50].
In APS, a typical spray gun consists of a cylindrical water-cooled cathode. The cathode emits electrons thermionically as a high intensity direct current arc (between 300 A and 700 A) is produced between the tip of the cathode and the cylindrical anode at about 40 – 80 V [51]. A non-oxidising plasma gas mixture, generally a mixture of argon (primary plasma forming gas) and hydrogen (secondary plasma forming gas), is injected inside the anode through the rear of the gun. A high enthalpy zone of partially dissociated and ionised gases operates as the process zone for the feedstock.
The feedstock material, generally a powder that is transported with the carrier
gas, is injected into the process zone of the plasma jet where it is heated above its
melting point. The powder injection point can be located inside the nozzle of the
plasma torch (internal injection) or at a very short distance downstream of the plasma
torch exit (external injection, Figure 2-1). The outcome is that the powder particles are
simultaneously heated and accelerated towards the substrate.
Plasma jets, confined by water-cooled anodes, are largely heterogeneous systems incorporating substantial radial and longitudinal variations of temperature and velocity. Over a radial distance of 30 mm (at atmospheric pressure in air), the temperature may drop sharply from 15,000 K to almost room temperature, and the velocity may drop from 1,500 m/s by several orders of magnitude [52]. A major reason for such considerable variations of velocity and temperature is the difference in temperature between the hot plasma jet core and the relatively cold surrounding environment. The feedstock particles pass through the core of the plasma jet, which is the hottest portion, to provide maximum exposure for complete melting and acceleration of the particles.
The inertia of the incoming powder particles defines their path in the jet. On striking the substrate, the mostly spherical particles flatten and solidify within a few microseconds to form thin lamellae, often called splats. Typical solidification rates for metals vary from 10⁵ to 10⁸ °C/s. The rapid cooling generates a wide range of material states, from amorphous to metastable phases.
Splats are the fundamental structural building blocks of an APS coating. A large number of parameters affect splat formation (Figure 2-2) [53]. The coating [54] is generated as a layered structure formed by the successive stacking of splats into 20 to 100 layers. The material is added to the original substrate surface with little or no mixing or dilution between the coating and the substrate, preserving the composition of the base material.
Figure 2-2: Thermal spray coating parameters involved in splat formation [53].
The coatings generated are typically characterized in terms of bond strength,
hardness, corrosion resistance, machinability for finish, electrical properties such as
conductivity, resistivity and dielectric strength; and finally the magneto-optical
properties, such as absorptivity and reflectivity. The coating characteristics of porosity,
oxide content and splat cohesion have significant influence on the coating properties.
In-flight optical sensors are used for real time monitoring of the coating
manufacturing process [19, 34]. A recent study by Mauer et al. [55] compared
measurements of the in-flight particle characteristics by a dichromatic sensor (DPV-
2000 from TECNAR Automation Limited, St-Bruno, QC, Canada J3V 6B5) and laser
Doppler anemometry systems. The DPV-2000 is shown in Figure 2-1 at the centre of the particle flow stream, where it measures the in-flight dynamic behaviour of the particles.
The sensor is based on high-speed two-colour pyrometry, used especially for the spray forming process, and can be broken down into three main components [56, 57]: the sensor heads, the detection box and the signal analysis. The sensor head module collects the image formed through a two-slit photo mask as a hot in-flight particle passes through the sensor measurement volume. The particle radiation image is transmitted to the detection box through an optical fibre. The detection box contains two photo-detectors, a dichroic mirror and two band-pass filters used to separate and filter the particle radiation image. Finally, the signals are analysed by a computer equipped with adaptive algorithms.
The particle velocity, V, is calculated with Equation 2-1 using the two peak signals obtained at the output of the photo detector. The two signals are separated in time by Δt. In Equation 2-1, d represents the distance between the two photo-mask slits and M represents the magnification of the detection optics.

$$V = \frac{d}{M\,\Delta t}$$  Equation 2-1
The particle temperature, T, is computed following Planck's law and assuming grey body radiation (Equation 2-2). A typical range of temperature values is from 1,000 to 4,000 °C. In Equation 2-2, c2 represents the second radiation constant in Planck's law, with a value of 1.4388 cm·K. R is defined as the ratio of the signal time integrals from the two photo detectors, while λ1 and λ2 are the centre wavelengths of the two band-pass filters.

$$T = c_2\left(\frac{1}{\lambda_2}-\frac{1}{\lambda_1}\right)\left[\ln R + 5\ln\frac{\lambda_1}{\lambda_2}\right]^{-1}$$  Equation 2-2
For the calculation of the particle diameter, D (Equation 2-3), Planck's law was used, assuming the particles to be spheres. Typical diameter values range from 10 to 300 μm. In Equation 2-3, α represents a coefficient including the thermal emissivity and I_{λ1} represents the radiation intensity at λ1.

$$D = \sqrt{\frac{I_{\lambda_1}}{\alpha}}$$  Equation 2-3
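For illustration, these three measurement relations can be sketched in MATLAB as follows. All numerical values (slit spacing, magnification, wavelengths, signal ratio and intensity coefficient) are arbitrary assumptions rather than DPV-2000 specifications, and Equation 2-3 is applied in the reconstructed form given above.

```matlab
% Illustrative computation of in-flight particle characteristics from
% DPV-2000 signals (Equations 2-1 to 2-3). All values below are assumed
% example numbers, not measured data or sensor specifications.
d       = 200e-6;        % distance between the two photo-mask slits (m), assumed
M       = 2;             % magnification of the detection optics, assumed
dt      = 0.5e-6;        % time separation of the two peak signals (s), assumed
V       = d / (M*dt);    % particle velocity (m/s), Equation 2-1

c2      = 1.4388e-2;     % second radiation constant in Planck's law (m.K)
lambda1 = 787e-9;        % centre wavelength of band-pass filter 1 (m), assumed
lambda2 = 995e-9;        % centre wavelength of band-pass filter 2 (m), assumed
R       = 1.08;          % ratio of the two signal time integrals, assumed
T = c2*(1/lambda2 - 1/lambda1) / (log(R) + 5*log(lambda1/lambda2)); % Eq. 2-2 (K)

alpha   = 5e4;           % coefficient including the thermal emissivity, assumed
I1      = 2.5e-5;        % radiation intensity at lambda1, assumed units
D = sqrt(I1/alpha);      % particle diameter (m), Equation 2-3 as reconstructed

fprintf('V = %.0f m/s, T = %.0f K, D = %.0f um\n', V, T, 1e6*D);
```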
There are several power and injection processing parameters that influence the in-flight particle characteristics. The arc current intensity is one such important factor in the plasma spray process: it modifies the plasma net power by varying the power of the spray gun, and it has a direct, positive influence on the in-flight particle characteristics, notably the particle velocity and temperature [58]. The arc current intensity also indirectly influences the microstructural and mechanical properties of the coating, improving the hardness and adhesion strength and reducing the porosity level [59-61]. The plasma gas mixture of argon and helium influences the plasma net enthalpy and is directly correlated to the in-flight particle characteristics and the mechanical and microstructural properties [59]. The powder injection parameters, including the injector diameter, the injector stand-off distance and the carrier gas flow rate, directly influence the in-flight particle characteristics [62].
2.2 Artificial neural network
An ANN is a non-linear data modelling tool used to model complex relationships
between a set of inputs and outputs without any prior assumptions or any available
mathematical relations between the inputs and outputs. It is inspired by the structural,
functional and computational aspects of a biological neural network. Aleksander and
Morton [63] put forward a definition of the ANN summarizing its functionality as
follows: “A neural network is a massively parallel distributed processor that has a
natural propensity for storing experiential knowledge and making it available for use. It
resembles the brain in two respects: (1) knowledge is acquired by the network through
a learning process; (2) interneuron connection strengths known as synaptic weights are
used to store the knowledge”.
An ANN has high computational rates, facilitated by the massive parallelism of a large number of non-linear computational elements. It accommodates the variability and fluctuations present in the data sets and comprises a group of interconnected artificial neurons. Artificial neurons in an ANN are the simple and fundamental processing units.
Each neuron is basically a ‘computing processor’. The output of each neuron is a
function of the weighted sum of the inputs [64]. The weights provide a basic concept in
evaluating the parameter relationships and represent the strength of each connection
between the neurons and the inputs.
The model takes an approach to computation in which the strength of each connection between the neurons is represented by the term 'weight' [65]. In order to obtain the desired complex input-output relationship, proper optimization of the weight matrix is essential. The optimization involves modifying the synaptic weights of the network in an orderly fashion to attain the design objective and generate minimum error between the predicted and actual output. The procedure is termed a learning algorithm, and the most powerful and widely used algorithm, which is also used in this work, is the back propagation algorithm [42]. Such an approach aligns
closely with the established linear adaptive filter theory, which has been successfully
applied to the fields of communications, control, radar, sonar, seismology and
biomedical engineering [66, 67].
The parallel distributed structure and the ability of the network to learn and generalize a process make the ANN highly popular for solving problems that are currently intractable. In addition, ANN provides the following benefits and capabilities [64]:
1) Nonlinearity: An artificial neuron is a non-linear computational element,
which makes the ANN, as a whole, non-linear. This feature is useful in
modelling non-linear problems that are difficult to model with existing mathematical techniques.
2) Input-output mapping: From a given dataset containing input and output samples, the network demonstrates the ability to learn from examples by constructing an input-output mapping. This is achieved by weight optimization
using a learning paradigm during the network training.
3) Adaptivity: An ANN has the capability to adapt its weights to changes in the surrounding environment. It can easily be retrained to deal with minor changes in the operating conditions. This feature is particularly useful in an on-line control system, where the ANN can be designed to change its weights in real time.
4) Evidential response: For a pattern classification problem, an ANN can be
designed to generate information on both the choice and reason behind the
selection of a particular pattern. This helps in improving the classification
performance through the rejection of ambiguous patterns.
5) Contextual information: The parallel distributed structure allows every
neuron in the network to hold some knowledge of the problem and be
influenced by the global activity of the other neurons in the model.
6) Fault tolerance: An ANN is inherently fault tolerant due to its massive parallelism. In the case of a fault, the network's performance degrades gradually instead of failing catastrophically [68].
The ANNs can be broadly classified into categories of recognition and function
approximation [69]. In the recognition category, the network is trained to reproduce one
of the previously seen inputs. However, in the case of function approximation, the
network is trained to model complex input-output relationships for generalizing and
predicting outputs from unseen inputs.
Neural networks have been implemented in a wide variety of practical applications, especially where it is difficult to apply conventional mathematical techniques or where there are no direct mathematical relationships between the input and output parameters. To name a few, the fields of application include, but are not limited to, engineering, computer science, materials science, environmental science, agriculture and biological science, physics, astronomy, chemistry and medicine. Some selected work from past researchers, in various
fields of application of ANN, is mentioned below. These case studies are mostly based on function-approximating neural networks.
San et al. [70] showed the applicability of neural networks in the field of medicine and health technology by introducing the method to a hypoglycaemia monitoring system. The authors used a hybrid particle swarm optimization based on a
neural network algorithm for the modelling purpose. Related studies in similar fields are
discussed in references [71-73].
Shu et al. [74] used an artificial neural network, based on a back-propagation algorithm, for damage detection on a simplified bridge model subjected to train-induced vibrations. The study revealed the success of the ANN, together with other statistical models, in correctly estimating the location of damage. Other studies of such applications of ANN in the field of engineering and technology are covered in references [75-77].
Sideratos et al. [78] used an artificial neural network model for probabilistic wind power forecasting. The authors used two radial basis function neural networks. Different input variables were used to predict the wind power obtained from the forecasted wind speed and the wind farm manufacturer's power curve. There have been several other recent works [79-81] on such applications of ANN in power system engineering.
There has also been wide application of ANN in various fields of pattern recognition and feature extraction. Selected literature is presented in references [82-86].
The work in this thesis concentrates on function approximation networks, where the term 'generalization' indicates the ability of the network to learn the underlying input-output relationships and interpolate the training samples intelligently. Generalization is the ability that makes the ANN stand out from other approximation methods, as the trained network remains responsive in an unseen environment.
The trained ANN must be protected from over-fitting, a conspicuous problem for a function-approximating neural network that results in poor generalization. In such cases, the network fails to respond well when tested and simulated with an unseen data set: the network memorizes the samples it is trained with, instead of learning to generalize the process and respond to unknown conditions. A training dataset that is small in comparison to the total number of network parameters is one cause of poor generalization. A small network is unable to over-fit the data, whereas a large network creates more complex functions. Thus, one way of improving the generalization ability of the network is to use a network that is just large enough to provide an adequate fit.
Figure 2-3 shows an example of a typical over-fitting problem. The blue boxes
represent the data of a noisy sine function, which is fed to an ANN during its training.
The red dashed line represents the response of a trained ANN. The result indicates
that the network has over-fitted the data and, thus, the network would not generalize
well in an unknown environment. The network actually memorizes each data point
instead of trying to map the input-output relationship. The trained network, without
over-fitting, should be able to ignore the noise and learn the underlying function, which
is the sine function for this case. The black line represents the ideal output of such a
type of network without over-fitting.
Figure 2-3: Demonstration of over-fitting for a function approximating artificial neural
network.
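A minimal sketch reproducing this behaviour is given below. Low-order and high-order polynomial fits stand in for a small and an over-sized function approximator (the principle is the same for an ANN), and all numerical values are arbitrary.

```matlab
% Sketch of the over-fitting behaviour of Figure 2-3. A low-order and a
% high-order polynomial stand in for an adequately sized and an over-sized
% function approximator; the underlying function is a noisy sine.
x = linspace(0, 2*pi, 21)';          % training inputs
t = sin(x) + 0.2*randn(size(x));     % noisy sine targets (cf. Figure 2-3)

p_small = polyfit(x, t, 3);          % adequate capacity: smooth fit
p_large = polyfit(x, t, 18);         % excess capacity: memorizes the noise
                                     % (polyfit warns this fit is ill-conditioned)
xq = linspace(0, 2*pi, 400)';
plot(x, t, 'bs', xq, polyval(p_small, xq), 'k-', ...
     xq, polyval(p_large, xq), 'r--');
legend('noisy training data', 'adequate fit', 'over-fitted response');
```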
Several authors [87-89] have assessed the effect of injecting noise into the inputs on the generalization ability of a trained ANN. There has also been work on using expanded training samples generated randomly [90] and in accordance with the probability distribution function (PDF) given by the Parzen-Rosenblatt estimate [91]. The intent of these procedures was to overcome the problem of over-fitting and improve the generalization ability of the ANN. The generalization performance of the trained network was evaluated from the error generated by the network on data outside the training set, known as the 'generalization error'.
Cross-validation [92, 93] and early stopping [94, 95] are other statistical techniques used to overcome the problem of over-fitting. These reduce the generalization error and improve the performance of the ANN. In such cases, the database is divided into training and validation sets. The training set is used during the network learning stage to compute and minimise the error gradient and update the network's weights and biases. The network's error on the validation set is calculated and monitored during the training process; this set is not used to update the network's weights and biases. As the network's training starts, the validation error generally decreases along with the network's error on the training set. A rise in the validation error for a certain number of iterations (also described as 'epochs') indicates over-fitting of the network. The network training is stopped in such instances and the weights and bias values at the minimum validation error are stored and saved. A separate test set is used to evaluate the performance of the trained network by calculating the generalization error. Several independent data splits are performed, followed by lengthy training, to achieve statistically significant results.
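A minimal sketch of the early stopping procedure is given below, using plain gradient descent on a small polynomial-basis regression model as a stand-in for network training. The split sizes, learning rate and patience value are arbitrary illustrative choices.

```matlab
% Early stopping sketch: monitor the validation error during iterative
% training and keep the parameters from the epoch with minimum validation
% error. A polynomial-basis regression stands in for the network.
x = linspace(0, 1, 60)';  t = sin(2*pi*x) + 0.1*randn(60, 1);
idx = randperm(60);
tr = idx(1:40);  va = idx(41:60);           % training / validation split
P  = [ones(60,1) x x.^2 x.^3 x.^4 x.^5];    % simple basis expansion
w  = zeros(6, 1);  eta = 0.1;
best_err = inf;  best_w = w;  patience = 50;  wait = 0;

for epoch = 1:5000
    g = P(tr,:)' * (P(tr,:)*w - t(tr)) / numel(tr);  % training-set gradient
    w = w - eta*g;                                    % weight update
    val_err = mean((P(va,:)*w - t(va)).^2);           % validation error
    if val_err < best_err                             % validation error falling:
        best_err = val_err;  best_w = w;  wait = 0;   % store these weights
    else
        wait = wait + 1;                              % validation error rising
        if wait >= patience, break; end               % stop after sustained rise
    end
end
w = best_w;   % restore the weights from the minimum validation error
```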
In cases where cross-validation and early stopping are used, it is important to have a large database to achieve significant results in terms of good generalization performance. Since the validation and test sets are never used for training purposes, this can be considered an inefficient use of the data available for training the network. A large database ensures a large training dataset, which generates a trained network with good generalization ability.
Regularization [96] (Section 2.2.2.3) is another statistical technique to combat the problem of over-fitting. It involves modifying the performance function and optimizes the number of network parameters the ANN uses during training. This technique does not require a separate validation set and uses the entire available dataset for network training purposes, which improves the network's training performance and prevents any data from being left unused. Regularization also keeps the network size optimal, which eliminates the pre-training otherwise required to determine the minimum network size that avoids over-fitting.
2.2.1 Network structure
2.2.1.1 Artificial neuron model
An artificial neuron is the fundamental non-linear information processing unit.
Figure 2-4 presents a non-linear model of an artificial neuron k. There are three components of a neuron:

1) Weights: Weights represent the strength, or value, assigned to each of the connecting links or synapses. An input signal x_p to the neuron k is multiplied by the synaptic weight w_kp, which defines the strength of the connection between the input x_p and the neuron k.

2) Adder: A linear adder, or linear combiner, is used for summing the weighted input signals.

3) Activation function: The summed output of the weighted input signals, v_k, is limited to a finite value within the permissible amplitude range of the output signal set for the model. This is done by the activation function φ(·). The function defines the neuron output y_k in relation to the activity level at the function's input, v_k.
Figure 2-4: A non-linear model of an artificial neuron k.

The output of the linear combiner, v_k, and the output of the neuron, y_k, are represented by Equation 2-4 and Equation 2-5, respectively.

$$v_k = \sum_{j=0}^{p} w_{kj}\,x_j$$  Equation 2-4

$$y_k = \varphi(v_k)$$  Equation 2-5

The basic model of a neuron includes a bias added at the activation function, represented by the red dotted arrow in Figure 2-4. The neuron model in Figure 2-4 is reformulated, for mathematical simplification, with an additional fixed input x_0 (Equation 2-6) and weight w_k0 (Equation 2-7) to represent the effect of the bias, b_k.

$$x_0 = +1$$  Equation 2-6

$$w_{k0} = b_k$$  Equation 2-7
Chapter 2: Background Study
Tanveer Ahmed Choudhury Page 23
There are three types of activation function φ(·): the threshold function, the piecewise-linear function and the sigmoid function.

Equation 2-8 provides the threshold activation function. The output y_k of a neuron k using the threshold activation function is represented by Equation 2-9.

$$\varphi(v) = \begin{cases} 1 & \text{if } v \ge 0 \\ 0 & \text{if } v < 0 \end{cases}$$  Equation 2-8

$$y_k = \begin{cases} 1 & \text{if } v_k \ge 0 \\ 0 & \text{if } v_k < 0 \end{cases}$$  Equation 2-9
The piecewise-linear activation function is represented by Equation 2-10.

$$\varphi(v) = \begin{cases} 1 & \text{if } v \ge \tfrac{1}{2} \\ v & \text{if } \tfrac{1}{2} > v > -\tfrac{1}{2} \\ 0 & \text{if } v \le -\tfrac{1}{2} \end{cases}$$  Equation 2-10
The sigmoid activation functions are the most common type of activation function and exhibit both smoothness and asymptotic properties. Unlike the threshold activation function, which takes values of only 0 and 1, the sigmoid activation function assumes a continuous range of values. Sigmoid functions are thus differentiable, which is an important desired feature when designing ANNs and a reason for their popularity. Examples of sigmoid functions include the logistic function (Equation 2-11) and the hyperbolic tangent function (Equation 2-12). The parameter a in Equation 2-11 is the slope parameter and, as the name suggests, it defines the slope of the function.

$$\varphi(v) = \frac{1}{1 + \exp(-av)}$$  Equation 2-11

$$\varphi(v) = \tanh\!\left(\frac{v}{2}\right) = \frac{1 - \exp(-v)}{1 + \exp(-v)}$$  Equation 2-12
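As a simple illustration, the neuron model of Figure 2-4 with a logistic activation can be written in a few lines of MATLAB. The input and weight values are arbitrary examples, with the bias folded in through the fixed input x_0 = +1 as in Equations 2-6 and 2-7.

```matlab
% Minimal sketch of an artificial neuron (Equations 2-4 to 2-7) with a
% logistic activation (Equation 2-11). All values are arbitrary examples.
x   = [1; 0.5; -0.3; 0.8];        % inputs x0..x3, with x0 = +1 (bias input)
w_k = [0.2; 0.7; -0.1; 0.4];      % weights wk0..wk3, with wk0 = bk (the bias)

a   = 1;                          % slope parameter of the logistic function
phi = @(v) 1 ./ (1 + exp(-a*v));  % Equation 2-11

v_k = w_k' * x;                   % linear combiner output, Equation 2-4
y_k = phi(v_k);                   % neuron output, Equation 2-5
```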
2.2.1.2 Multi-layer feed-forward neural network structure
A schematic of a multi-layer fully connected feed-forward ANN architecture with
two hidden layers is provided in Figure 2-5.
Figure 2-5: Fully connected multi-layer feed-forward artificial neural network
architecture with two hidden layers.
The structure comprises three main sections: the input layer, the output layer and the layers in between, termed the hidden layers. In order to compute relationships between the inputs and outputs, neurons in the layers between the input and output are required to perform the 'intermediate' computations. Since an observer only views the input and output layer parameters, and does not see the inputs and outputs of the intermediate layers of neurons, these layers are termed the 'hidden layers'.
The hidden layers are conventionally described as all layers other than the input and output layers, consisting of artificial neurons that connect and establish correlations between the input and output layers. They represent mathematical functions that are empirically and stochastically derived between the input and output parameters. The number of hidden layers depends on the type of problem the network is tasked to solve. The addition of hidden layers allows the network to extract higher-order statistics [97]. This ability is particularly helpful when the size of the input layer is large.
The input signal, fed to the network, propagates in the forward direction through
the network on a layer-by-layer basis. This refers to the term “feed-forward” in the
network description. These networks are generally referred to as multi-layer perceptrons (MLPs) and have been applied to solve difficult and diverse problems. There are three distinctive properties of a multi-layer perceptron:
1) The model of each neuron in the network contains a smooth (i.e., differentiable) nonlinearity at the output end. The presence of the nonlinearity is important because it prevents the input-output relation of the network from being reduced to that of a single-layer perceptron.
2) The network contains one or more layers of hidden neurons that help the network learn complex tasks by extracting progressively more useful features from the input patterns.
3) The network exhibits a high level of connectivity defined by the weights. A
change in connectivity of the network results in a change in the total weight
population.
The multi-layer perceptron derives its superior computing ability from the combination of these characteristics, together with its ability to learn from experience through training. These characteristics, however, also bring drawbacks for the MLP. First, the presence of a distributed form of nonlinearity and the high network interconnectivity make the theoretical analysis of an MLP difficult. Secondly, the hidden layer neurons make visualization of the network difficult.
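To make the layered computation concrete, a minimal MATLAB sketch of one forward pass through such a fully connected two-hidden-layer network is given below. The layer sizes, the random weights and the choice of the hyperbolic tangent activation of Equation 2-12 are illustrative assumptions, not the configuration used later in this work.

```matlab
% One forward pass through the two-hidden-layer MLP of Figure 2-5. A +1
% bias input is appended to each layer, as in Equations 2-6 and 2-7.
phi = @(v) tanh(v/2);        % sigmoid activation, Equation 2-12

x  = [0.5; -0.2; 0.9];       % input layer (3 processing parameters, example)
W1 = 0.5*randn(6, 4);        % hidden layer 1: 6 neurons, 3+1 inputs each
W2 = 0.5*randn(5, 7);        % hidden layer 2: 5 neurons, 6+1 inputs each
W3 = 0.5*randn(2, 6);        % output layer:   2 neurons, 5+1 inputs each

h1 = phi(W1 * [1; x]);       % first hidden layer activations
h2 = phi(W2 * [1; h1]);      % second hidden layer activations
y  = phi(W3 * [1; h2]);      % network outputs
```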
2.2.2 Network learning
The process of tuning the weights and other network parameters is referred to as learning, and a learning paradigm refers to a model of the environment in which the neural network operates. There are three classes of learning paradigms: supervised learning, reinforcement learning and self-organized or unsupervised learning.
The block diagram of a supervised learning process is presented in Figure 2-6.
As the name suggests, in supervised learning, the learning process must take place
under the supervision of an external ‘teacher’. The external teacher is the database of
the process containing the input-output examples. The network generates the output
that is then compared with the target output from the database. The difference in
network generated output and target output constitutes the error signal. The network
parameters are then adjusted based on the training vectors along with the influence of
the error signal. The process is repeated until the network emulates the database. A
back propagation algorithm is the most successful, popular and widely used example of
a supervised learning algorithm.
Figure 2-6: Block diagram of the supervised learning process.
Reinforcement learning is an on-line learning technique of the input-output
parameter relationships through a trial and error method. The trial and error process is
designed to maximize the scalar performance index called a reinforcement signal. The
paradigm can be non-associative or associative in nature [98].
In an unsupervised learning environment, there are no external teachers
present to guide the learning process. Figure 2-7 presents a block diagram of such a
process, which, at times, is also referred to as “self-organizing learning”. The network is
tuned to the statistical regularities of the input data. This develops the network’s ability
to form internal representations for encoding features of the input and create new
classes automatically [99]. As the number of computational layers in the network or the
number of incoming links to a neuron increases, the supervised learning tends to
become unacceptably slow. For such cases, an unsupervised learning process
produces better results.
Figure 2-7: Block diagram of the unsupervised learning process.
2.2.2.1 Back propagation algorithm
A set of well-defined rules for the solution of a learning problem is called the
learning algorithm. The MLP structure (Figure 2-5) has been applied in process
modelling and the most successful supervised learning algorithm used for training such
networks is the back propagation (BP) algorithm [42]. The algorithm is based on the
error-correction learning rule.
The BP algorithm consists of two passes through the different layers of the
network. The first is the forward pass, in which an input vector is applied to the network. Its effect propagates through the network layer by layer and, finally, a set of outputs is generated as the network's actual response. The synaptic weights of the network remain fixed during the forward pass. The second and final pass is the backward pass, when the synaptic weights of the network are changed and tuned according to the network's response during the forward pass and the error-correction rule.
The actual network response is subtracted from the target response and the resulting
error signal is then propagated backward through the network against the direction of
the synaptic connections. The synaptic weights are, thus, adjusted to move the
network’s response closer to the desired output.
A brief mathematical description of the BP algorithm, in relation to a specific
neuron j, is provided in the following paragraphs. The description is broken down into two sections: the first considers the neuron j to be an output layer neuron, while the second considers the neuron j to be a hidden layer neuron.

For the first scenario, where the neuron j is considered as the output layer neuron, a signal flow graph of the neuron is presented in Figure 2-8.
Figure 2-8: Signal flow graph of the output layer neuron j.
The error signal value, e_j(n), at the output of neuron j, at a specific iteration n, is computed using Equation 2-13, where t_j(n) and y_j(n) represent the target and actual response of the neuron j, respectively, at that iteration.

$$e_j(n) = t_j(n) - y_j(n)$$  Equation 2-13
The instantaneous sum of squared errors is given in Equation 2-14, where C represents the set of all output layer neurons. Equation 2-15 provides the average of the sum of squared errors, with N representing the total number of training samples. The average squared error value (Equation 2-15) is used as the cost function measuring the training performance of the network. The aim of the training is to minimize the value of the cost function by adjusting and optimizing the free parameters of the network, namely the synaptic weights.

$$E(n) = \frac{1}{2}\sum_{j \in C} e_j^2(n)$$  Equation 2-14

$$E_{av} = \frac{1}{N}\sum_{n=1}^{N} E(n)$$  Equation 2-15
The net internal activity of the neuron j is given by v_j(n) in Equation 2-16, where p represents the total number of inputs (excluding the fixed value y_0 = +1 that represents the effect of the bias) applied to the neuron. The synaptic weight w_j0, connected to the fixed input y_0 = +1, takes the value of the bias b_j applied to the neuron. The net internal activity of the neuron j is finally passed through an activation function, φ(·), to limit the output signal within the permissible range set for the model. The final output of neuron j at iteration n is given by y_j(n) (Equation 2-17).

$$v_j(n) = \sum_{i=0}^{p} w_{ji}(n)\,y_i(n)$$  Equation 2-16

$$y_j(n) = \varphi_j\big(v_j(n)\big)$$  Equation 2-17
For any specific iteration n, the back-propagation algorithm evaluates the weight correction, Δw_ji(n), and applies it to the synaptic weight w_ji(n). The weight correction value is proportional to the partial derivative of the instantaneous sum of squared errors with respect to the weight value. This provides the instantaneous gradient, ∂E(n)/∂w_ji(n), or sensitivity factor, which determines the direction of the weight update. For evaluating the gradient, the chain rule is used to break down the partial derivative, as represented in Equation 2-18.

$$\frac{\partial E(n)}{\partial w_{ji}(n)} = \frac{\partial E(n)}{\partial e_j(n)}\,\frac{\partial e_j(n)}{\partial y_j(n)}\,\frac{\partial y_j(n)}{\partial v_j(n)}\,\frac{\partial v_j(n)}{\partial w_{ji}(n)}$$  Equation 2-18
Differentiating both sides of Equation 2-14 with respect to e_j(n) gives the value of ∂E(n)/∂e_j(n).

$$\frac{\partial E(n)}{\partial e_j(n)} = e_j(n)$$  Equation 2-19

Differentiating both sides of Equation 2-13 with respect to y_j(n) gives Equation 2-20.

$$\frac{\partial e_j(n)}{\partial y_j(n)} = -1$$  Equation 2-20

Differentiating both sides of Equation 2-17 with respect to v_j(n) yields the value of ∂y_j(n)/∂v_j(n), presented in Equation 2-21.

$$\frac{\partial y_j(n)}{\partial v_j(n)} = \varphi_j'\big(v_j(n)\big)$$  Equation 2-21

Finally, the value of ∂v_j(n)/∂w_ji(n) (Equation 2-22) is obtained by differentiating both sides of Equation 2-16 with respect to w_ji(n).

$$\frac{\partial v_j(n)}{\partial w_{ji}(n)} = y_i(n)$$  Equation 2-22
Substituting the solved partial derivatives from Equation 2-19 to Equation 2-22 into Equation 2-18 gives the instantaneous error gradient (Equation 2-23).

$$\frac{\partial E(n)}{\partial w_{ji}(n)} = -e_j(n)\,\varphi_j'\big(v_j(n)\big)\,y_i(n)$$  Equation 2-23
Using the delta rule, the value of the weight correction, Δw_ji(n), to be applied to the synaptic weight w_ji(n) is computed in Equation 2-24. The negative sign accounts for gradient descent in the weight space. The value of η is a constant and gives the rate of learning.

$$\Delta w_{ji}(n) = -\eta\,\frac{\partial E(n)}{\partial w_{ji}(n)}$$  Equation 2-24
A higher value of η results in a steep gradient descent with large changes in the synaptic weights. Although the speed of the training process is improved in such cases, a higher learning rate may result in an unstable, oscillating network. It may also cause the network to miss the global minimum of the cost function. In both cases, the generalization performance of the network is reduced considerably. Setting this value too low reduces the steepness of the gradient descent in the weight space, with smaller changes to the synaptic weights from one iteration to the next. The cost of such a setting is poor training speed: the training process may become extremely slow, making the network impractical for process modelling. Adjustment and optimization of the value of η is, thus, critical for the network's ability to learn the input-output parameter relationships with sufficient accuracy.
Substituting the value of ∂E(n)/∂w_ji(n) (Equation 2-23) into Equation 2-24 gives a simplified, re-structured form of the weight correction value Δw_ji(n) (Equation 2-25). The value δ_j(n) in Equation 2-25 represents the local gradient defined in Equation 2-26.

$$\Delta w_{ji}(n) = \eta\,\delta_j(n)\,y_i(n)$$  Equation 2-25

$$\delta_j(n) = -\frac{\partial E(n)}{\partial e_j(n)}\,\frac{\partial e_j(n)}{\partial y_j(n)}\,\frac{\partial y_j(n)}{\partial v_j(n)} = e_j(n)\,\varphi_j'\big(v_j(n)\big)$$  Equation 2-26
With the neuron j in the output layer, the computations for e_j(n), δ_j(n) and thus the weight change Δw_ji(n) are straightforward, as shown above. With the neuron j moved to the hidden layer, the values of e_j(n), δ_j(n) and Δw_ji(n) need to be re-calculated, taking into consideration that there is no specified target response for the neuron. The value of e_j(n) cannot be computed using Equation 2-13, which in turn affects the calculations of δ_j(n) and Δw_ji(n) in Equation 2-26 and Equation 2-25, respectively. The following paragraphs illustrate the steps for re-computing the weight correction factor Δw_ji(n). Figure 2-9 represents an updated version of the signal flow graph in Figure 2-8, with neuron j used as the hidden layer neuron and an additional neuron k introduced as the output layer neuron.
Figure 2-9: Signal flow graph of the hidden layer neuron j connected to the output layer
neuron k.
Making use of Equation 2-26, the value of the local gradient δ_j(n) can be redefined as in Equation 2-27.

$$\delta_j(n) = -\frac{\partial E(n)}{\partial y_j(n)}\,\frac{\partial y_j(n)}{\partial v_j(n)} = -\frac{\partial E(n)}{\partial y_j(n)}\,\varphi_j'\big(v_j(n)\big)$$  Equation 2-27
The value of the partial derivative ∂E(n)/∂y_j(n) in Equation 2-27 is unknown and needs to be computed. The steps are illustrated below.

From Figure 2-9, the error signal e_k(n) at the output of neuron k, at a specific iteration n, can be re-written as Equation 2-28. This equation is similar to Equation 2-13, except that the output layer neuron is now represented by k, as neuron j has shifted to the hidden layer. t_k(n) and y_k(n) represent the target and actual response of the neuron k, respectively.

$$e_k(n) = t_k(n) - y_k(n)$$  Equation 2-28
In Equation 2-29, v_k(n) (Figure 2-9) represents the net internal activity of the neuron k, where q represents the total number of inputs (excluding the fixed value y_0 = +1 placed to represent the effect of the bias) applied to the neuron. The synaptic weight w_k0, connected to the fixed input y_0 = +1, takes the value of the bias b_k applied to the neuron. The v_k(n) signal is passed through an activation function φ_k(·) to generate the final output of neuron k at iteration n, given by y_k(n) (Equation 2-30).

$$v_k(n) = \sum_{j=0}^{q} w_{kj}(n)\,y_j(n)$$  Equation 2-29

$$y_k(n) = \varphi_k\big(v_k(n)\big)$$  Equation 2-30
Substituting the value of y_k(n), obtained from Equation 2-30, into Equation 2-28, we obtain the updated e_k(n) value presented in Equation 2-31.

$$e_k(n) = t_k(n) - \varphi_k\big(v_k(n)\big)$$  Equation 2-31
For the signal flow graph in Figure 2-9, the instantaneous sum of squared errors from Equation 2-14 can be re-written by substituting the neuron index j with k. The updated value is presented in Equation 2-32, where e_k(n) represents the error signal at the output of neuron k (Equation 2-31). Differentiation of E(n) with respect to y_j(n) gives Equation 2-33.

$$E(n) = \frac{1}{2}\sum_{k \in C} e_k^2(n)$$  Equation 2-32

$$\frac{\partial E(n)}{\partial y_j(n)} = \sum_{k} e_k(n)\,\frac{\partial e_k(n)}{\partial y_j(n)}$$  Equation 2-33
Applying the chain rule to the partial derivative ∂e_k(n)/∂y_j(n), Equation 2-33 can be re-written as:

$$\frac{\partial E(n)}{\partial y_j(n)} = \sum_{k} e_k(n)\,\frac{\partial e_k(n)}{\partial v_k(n)}\,\frac{\partial v_k(n)}{\partial y_j(n)}$$  Equation 2-34
Differentiating both sides of Equation 2-31 with respect to v_k(n) gives ∂e_k(n)/∂v_k(n), presented in Equation 2-35.

$$\frac{\partial e_k(n)}{\partial v_k(n)} = -\varphi_k'\big(v_k(n)\big)$$  Equation 2-35
Differentiating both sides of Equation 2-29 with respect to y_j(n) gives

$$\frac{\partial v_k(n)}{\partial y_j(n)} = w_{kj}(n)$$  Equation 2-36
Substituting the partial derivatives ∂e_k(n)/∂v_k(n) (Equation 2-35) and ∂v_k(n)/∂y_j(n) (Equation 2-36) into Equation 2-34 gives the value of ∂E(n)/∂y_j(n) (Equation 2-37).

$$\frac{\partial E(n)}{\partial y_j(n)} = -\sum_{k} e_k(n)\,\varphi_k'\big(v_k(n)\big)\,w_{kj}(n)$$  Equation 2-37
From the definition of the local gradient δ_k(n) in Equation 2-26, we obtain Equation 2-38.

$$\delta_k(n) = e_k(n)\,\varphi_k'\big(v_k(n)\big)$$  Equation 2-38
Placing the value of δ_k(n) (Equation 2-38) into Equation 2-37, we obtain the desired partial derivative (Equation 2-39).

$$\frac{\partial E(n)}{\partial y_j(n)} = -\sum_{k} \delta_k(n)\,w_{kj}(n)$$  Equation 2-39
Finally, substituting the partial derivative ∂E(n)/∂y_j(n) (Equation 2-39) into Equation 2-27, we obtain the local gradient δ_j(n) for the hidden layer neuron j (Equation 2-40).

$$\delta_j(n) = \varphi_j'\big(v_j(n)\big)\sum_{k}\delta_k(n)\,w_{kj}(n)$$  Equation 2-40
In Equation 2-40, the function φ'_j(v_j(n)) depends solely on the activation function associated with the hidden neuron j. Computation of δ_k(n) requires knowledge of the error signal e_k(n) for all neurons immediately to the right of the hidden neuron j and directly connected to it (Figure 2-9). The w_kj(n) term represents the synaptic weights associated with these connections.

Summarizing the back propagation algorithm, the weight correction factor Δw_ji(n), applied to the synaptic weight connecting neuron i to neuron j, is defined by the delta rule presented in Equation 2-41. The computation of the local gradient δ_j(n) depends on whether neuron j is an output or a hidden layer neuron. For j being an output layer neuron, δ_j(n) equals the product of φ'_j(v_j(n)) and the error signal e_j(n) associated with neuron j (Equation 2-26). For j being a hidden layer neuron, δ_j(n) is the product of φ'_j(v_j(n)) and the weighted sum of the δ values computed for the neurons in the next layer connected to neuron j (Equation 2-40).
$$\underbrace{\Delta w_{ji}(n)}_{\substack{\text{weight}\\\text{correction}}} \;=\; \underbrace{\eta}_{\substack{\text{learning-rate}\\\text{parameter}}} \;\times\; \underbrace{\delta_j(n)}_{\substack{\text{local}\\\text{gradient}}} \;\times\; \underbrace{y_i(n)}_{\substack{\text{input signal}\\\text{of neuron } j}}$$  Equation 2-41
During the training process of a network, one complete presentation of the entire training set is termed an epoch. The learning process is carried out iteratively, epoch by epoch, until the cost function of the average squared error over the entire training set converges to its global minimum. In back-propagation, there are two ways for the learning process to proceed: pattern mode and batch mode. In pattern mode, the weight update is performed after the presentation of each training sample (input-output pattern). In batch mode, the weight update is performed after the presentation of the entire set of training samples in an epoch.

For a neural network used in an on-line operation, pattern mode learning is preferred because, with weights updated after each pattern presentation, the demand for local storage for each synaptic weight is small. In addition, in pattern mode the input-output examples are generally presented in random order; this, along with the pattern-by-pattern updating, reduces the chance of the algorithm being trapped in a local minimum. In batch mode, a more accurate estimate of the gradient vector is achievable; however, there is a greater demand for local memory for each synaptic weight, as the weight update is performed at the end of each epoch after the network has been presented with all the training samples. The use of either pattern or batch learning mode depends on the type of problem the network is required to solve.
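A minimal pattern-mode back propagation sketch for a single-hidden-layer network is given below. The one-dimensional toy target function, the linear output neuron, the network size and the learning rate are illustrative assumptions; the sketch only demonstrates the delta rule of Equation 2-25 with the local gradients of Equations 2-26 and 2-40.

```matlab
% Pattern-mode back propagation for a single-hidden-layer network on a toy
% 1-D function. Weight updates follow the delta rule (Equation 2-25) with
% the output and hidden local gradients of Equations 2-26 and 2-40.
phi  = @(v) tanh(v/2);                       % activation, Equation 2-12
dphi = @(v) 0.5*(1 - tanh(v/2).^2);          % its derivative

X = linspace(-1, 1, 40);  T = sin(pi*X);     % toy training set
nh = 8;  eta = 0.05;                         % hidden neurons, learning rate
W1 = 0.5*randn(nh, 2);                       % hidden weights (bias in column 1)
W2 = 0.5*randn(1, nh+1);                     % output weights (bias in column 1)

for epoch = 1:2000
    for n = randperm(numel(X))               % pattern mode: random order
        xv = [1; X(n)];                      % input with fixed +1 bias term
        v1 = W1*xv;        y1 = phi(v1);     % hidden layer, Eqs 2-16, 2-17
        y2 = W2*[1; y1];                     % linear output neuron (assumed)
        e  = T(n) - y2;                      % error signal, Equation 2-13
        d2 = e;                              % output gradient (linear phi' = 1)
        d1 = dphi(v1) .* (W2(2:end)' * d2);  % hidden gradients, Equation 2-40
        W2 = W2 + eta * d2 * [1; y1]';       % delta rule, Equation 2-25
        W1 = W1 + eta * d1 * xv';            % delta rule, Equation 2-25
    end
end
```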
2.2.2.2 Levenberg-Marquardt algorithm
The Levenberg-Marquardt (LM) algorithm is an approximation to Newton's method and is designed to reach second-order training speed without computing the Hessian matrix. The approximations of the Hessian matrix (H) and error gradient (g) are computed as per Equation 2-42 and Equation 2-43.
$$H = J^T J$$  Equation 2-42

$$g = J^T e$$  Equation 2-43
J represents the Jacobian matrix formed with the first derivatives of the network
errors, e, on the training set with respect to the network’s weights and biases and can
be calculated using the standard back propagation technique [100]. J^T is the transpose of the Jacobian matrix, J.
The LM algorithm uses this approximation to the Hessian matrix to update and tune the parameters. If z_k represents the old parameter vector, then the new parameter vector after calculation of the network errors is given by z_{k+1} (Equation 2-44).

$$z_{k+1} = z_k - \left[J^T J + \mu I\right]^{-1} J^T e$$  Equation 2-44
The parameter μ is set to a specific value at the start of the training. After each epoch, the performance function is computed. If the performance function is found to be less than that of the previous epoch, the value of μ is decreased by a specific factor; if the performance function increases, the value of μ is increased by a specific factor. Setting μ equal to zero turns Equation 2-44 into Newton's method. The aim is to revert to Newton's method rapidly, since it is faster and more accurate near an error minimum.

A maximum value of μ is set before the training. If μ reaches its maximum value, the training stops, indicating that the network has failed to converge. The training is also stopped when the error gradient (Equation 2-43) falls below a specific set value or when the goal set for the performance function is met.
The network training steps using the LM algorithm are as follows:
1) All the inputs are presented to the network. The corresponding network outputs, errors and the sum of squared errors over all inputs are computed.
2) The Jacobian matrix, J, is computed.
3) Equation 2-44 is computed to obtain the new parameter values.
Chapter 2: Background Study
Tanveer Ahmed Choudhury Page 39
4) The sum of squares of errors is recomputed with the updated parameter
values.
5) If the new sum of squares is smaller than the previous value, μ is reduced
by a specific factor β and the process is re-started from step 1.
6) If the new sum of squares is increased, the value of μ is increased by α and
the process is re-started from step 3.
The network is assumed to have converged when the error gradient is less than
some predetermined value or when the sum of squares has been reduced to some
specific error goal.
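A single LM update step can be sketched as below. The Jacobian, error vector and damping value are random illustrative stand-ins; in practice, J and e are computed from the network via back propagation, and μ is then adjusted as described in steps 5 and 6 above.

```matlab
% One Levenberg-Marquardt update (Equation 2-44) for a generic parameter
% vector z. Random values stand in for the Jacobian and errors that back
% propagation would supply.
J  = randn(30, 10);            % Jacobian: 30 error terms, 10 parameters
e  = randn(30, 1);             % network error vector
z  = randn(10, 1);             % current parameter vector
mu = 0.01;                     % LM damping parameter

H = J' * J;                    % approximate Hessian, Equation 2-42
g = J' * e;                    % error gradient, Equation 2-43
z = z - (H + mu*eye(10)) \ g;  % Equation 2-44 (solve, rather than invert)
```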
2.2.2.3 Bayesian regularization algorithm
The Bayesian regularization (BR) algorithm works within the framework of the
LM algorithm and involves modification of the typical performance function during the
training of the feed-forward neural network. The term regularization refers to the
method of improving the generalization ability of a network by constraining the size of
the network weights [101].
Typically, the performance function, F (Equation 2-45), used during network training is the average sum of squares of the network errors (Equation 2-15).

$$F = mse = E_{av} = \frac{1}{N}\sum_{n=1}^{N} E(n)$$  Equation 2-45

N represents the total number of samples in the training set. E(n) represents the instantaneous sum of squared errors of the network and is defined in Equation 2-14.
In regularization, a term consisting of the average of the sum of squares of the
network weights, msw (Equation 2-46), is added to the performance function F
(Equation 2-45) to obtain the new performance function, msereg (Equation 2-47). The parameter w_j in Equation 2-46 represents the network weights.
$$msw = \frac{1}{n}\sum_{j=1}^{n} w_j^2$$  Equation 2-46

$$msereg = \beta\,(mse) + \alpha\,(msw)$$  Equation 2-47
In Equation 2-47, α and β are the objective function parameters. If the value of
α is much smaller than that of β, the training algorithm will drive the errors to be
smaller. For the opposite condition, the training will emphasize a smoother network
response at the expense of network errors that are generated. The critical challenge in
regularization is the selection of appropriate values of α and β. Bayes' rule is applied [101] to neural network training to optimize the regularization with reference to selecting the optimal objective function parameters.
The steps in network training and Bayesian optimization of the regularization
parameters [102], within the LM algorithm framework, are presented below; an illustrative sketch follows the list:
1) The values of α and β are initialized to 0 and 1, respectively. The weights are initialized using the Nguyen-Widrow method [103].
2) The LM algorithm is used to minimize the performance function in Equation
2-47 and update the weight matrix accordingly.
3) The effective number of parameters is computed using the Gauss-Newton approximation to the Hessian matrix available in the LM training
algorithm.
4) New estimates of the objective function parameters α and β are computed.
5) Steps 2 to 4 are iterated until the error converges.
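A minimal Python sketch of the regularized objective (Equations 2-45 to 2-47) and one re-estimation of α and β is given below. It assumes the Gauss-Newton approximation commonly used with LM training, with J the error Jacobian; the MacKay-style estimator of the effective number of parameters is an assumption made for illustration, not a transcription of the cited implementation.

```python
import numpy as np

def msereg(errors, weights, alpha, beta):
    """Regularized performance function of Equation 2-47:
    msereg = beta*mse + alpha*msw (Equations 2-45 and 2-46)."""
    mse = np.mean(errors ** 2)          # Equation 2-45
    msw = np.mean(weights ** 2)         # Equation 2-46
    return beta * mse + alpha * msw

def reestimate(errors, weights, J, alpha, beta):
    """One Bayesian update of alpha and beta (steps 3 and 4), using the
    Gauss-Newton Hessian of the regularized objective from LM training."""
    n, N = weights.size, errors.size
    H = 2.0 * beta * (J.T @ J) + 2.0 * alpha * np.eye(n)
    gamma = n - 2.0 * alpha * np.trace(np.linalg.inv(H))  # effective no. of parameters
    alpha_new = gamma / (2.0 * np.sum(weights ** 2))
    beta_new = (N - gamma) / (2.0 * np.sum(errors ** 2))
    return alpha_new, beta_new, gamma
```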
The BR algorithm uses a regularization technique to combat the problem of
over-fitting. The algorithm, thus, uses the whole available dataset for training purposes
without any need for a separate validation set. This method prevents data from being
discarded and maximizes the volume of training data. The algorithm particularly suits cases where there is a relatively small dataset available for network training.
Furthermore, BR preserves an optimal network size and reduces the pre-training work
required to determine the minimum network size to avoid over-fitting.
2.2.2.4 Resilient back propagation algorithm
The Resilient Back Propagation (RBP) algorithm eliminates the harmful effects
of the magnitudes of the partial derivatives. Only the sign of the derivative determines
the direction of weight update. The size of the weight change is determined by a
separate update parameter. The update parameter value for each weight and bias is
increased by a specific factor when the derivative of the performance function, with respect to that weight, has the same sign for two successive iterations. The update value is decreased by a factor when the derivative, with respect to that weight, changes sign from the previous iteration. If the derivative is zero, the update value
remains the same. Whenever the weights fluctuate, the weight change is reduced. If
the weight continues to change in the same direction for several iterations, the
magnitude of the weight change increases. A detailed study of the algorithm is provided in reference [104].
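A sketch of the per-weight update just described is shown below, in a simplified Rprop variant; the growth and shrink factors (1.2 and 0.5) and the step-size bounds are conventional defaults assumed here for illustration.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, delta,
               eta_plus=1.2, eta_minus=0.5, delta_min=1e-6, delta_max=50.0):
    """One resilient back propagation update. Only the sign of the gradient
    sets the direction; the per-weight step `delta` grows when the sign is
    stable over two successive iterations and shrinks when it flips."""
    sign_change = grad * prev_grad
    delta = np.where(sign_change > 0, np.minimum(delta * eta_plus, delta_max), delta)
    delta = np.where(sign_change < 0, np.maximum(delta * eta_minus, delta_min), delta)
    # zero derivative: delta is left unchanged
    return w - np.sign(grad) * delta, delta
```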
2.3 Multi-Net system
A modular combination of the multi-net system is used in Section 4.2 of Chapter
4 to model the APS process and predict the in-flight particle characteristics from the
input processing parameters. Modular implementation allows simplification of the
optimized model structure with enhanced ability to generalise the network. It achieves
better correlations between each of the in-flight particle characteristics with the input
processing parameters. The modular combination concept is new to the field of thermal
spray processes. Applications of any multi-net system in modelling the APS process
have not been reported; therefore, background information is presented as follows.
A multi-net system [105] is a group of ANNs where each network, depending on
the type of combination, is assigned to solve a part of the problem or the total problem.
The original problem is decomposed into sub-problems and each network focuses on
solving a sub-problem. Decomposition of the task makes it easier to understand and
modify. The use of a multi-net system improves the generalization ability of the ANN,
i.e., the ability to respond correctly to inputs that were not used to adapt the network's weights. A
multi-net system generally provides solutions to problems that either cannot be solved
by a single net, or which can be solved more effectively by a system of modular neural
net components. Similarly, better performance is achieved by the introduction of
redundancy in the number of networks.
There are two combinations of ANNs in a multi-net system [106]: (i) ensemble
combination, and (ii) modular combination. Figure 2-10 gives a pictorial view of the
different classifications of multi-net systems at both task and sub-task levels.
‘Ensemble’ is the commonly used term for combining a set of redundant nets
[107]. In an ensemble combination, each component net provides a complete solution to the same task or task components. These solutions might differ from one another. A final solution is obtained by decision fusion of the solutions of the redundant component nets. However, in the modular approach no redundancies exist for
the component nets. The task is decomposed into subtasks. Each module is provided
with a subtask, for which it provides a solution. The complete task solution is obtained
by combining the outputs of all the modules. In this study, a module is referred to as a self-contained or autonomous processing unit [108].
Figure 2-10: Classifications of a multi-net artificial neural network system.
Both the ensemble and modular combinations can exist at the task or sub-task
level. In the task level, an ensemble can consist of a number of different solutions to an
entire task. Similarly, the task could be split into different sub-tasks and fed into
different modules. The task solution is then constructed from the combination of the
decomposed modules. In the sub-task level, when a task is broken down into
component modules, each component can itself consist of an ensemble of nets with
each generating a solution for the same modular component. Alternatively, each
module can be further subdivided into more specific modules. At both levels, the
difference between an ensemble and modular combination is the presence or absence
of redundancy in the number of component networks. Redundancy is always present in ensemble combinations and absent in modular combinations.
Modular and ensemble combinations are not mutually exclusive and a multi-net
system could consist of a mixture of ensemble and modular combinations at different
levels. Both approaches work with the aim of improving the generalization performance
of the networks and both can involve linear combination of their components. However,
the modular combination assumes that each data point is assigned to only one expert
(i.e., the assignment is mutually exclusive), whereas the ensemble combination makes no such assumption
and each data point is likely to be dealt with by all the component nets in an ensemble.
2.3.1 Ensemble combination
Combining networks in redundant ensembles helps to improve the
generalization ability. Combination of a set of imperfect estimators is a way to manage
limitations of the individual estimators. The effect of the errors made by each
component net can be reduced by proper combinations. A forecasting study in
reference [109] showed that better results can be achieved by combining forecasts
instead of choosing the best one. Introduction of redundancy and multiple versions is a
standard method in software engineering to increase reliability [110].
2.3.1.1 Creating ensembles
There is no advantage in combining nets in ensembles that are composed of a
set of identical nets, since identical nets generalize in the same way. The main emphasis is placed not on structural similarity but on the pattern of generalization. In principle, the same solution can be obtained from a set of networks that vary in terms of their weights, convergence time and architecture (e.g., number of hidden layers); this is primarily because such networks can generate the same pattern of errors when tested on a
test set. The aim is to obtain networks that generalize differently. A number of training
parameters can be varied and manipulated to make the network generalize differently.
The parameters include: initial conditions, the topology of the nets, the training
algorithm and the training data. A brief overview of each is provided below; an illustrative sketch follows the list:
1) Initial weights: Keeping the training data constant, a set of networks can be
created, with each having different initial random weights.
2) Topology: A set of networks can be created having different network
topology or architecture. The networks are trained with a different number of
hidden layers and the same training data. Errors generated by two networks
with different network topology are likely to be uncorrelated.
3) Training Algorithm: A set of networks can be created and trained with a
different training algorithm and the same data.
4) Training Data: A set of networks can be created by altering the training data.
This can be done in a number of different ways outlined below. Ensembles
could be created using one of these factors individually or combining two or
more of these techniques.
a. Sampling Data: A common approach in the generation of a group of
networks for an ensemble is to train each net with a different
subsample of the training data. Cross-validation [111] and
bootstrapping [112] are resampling methods used for this purpose.
b. Disjoint Training Set: The method includes training each network in
the ensemble set with disjoint or mutually exclusive training sets
[113]. There is no overlap in the different training data sets; however,
the size of the set might reduce and result in poor generalization
performance [114].
c. Boosting and Adaptive Resampling: A boosting algorithm is an
effective method in varying the training data for various ensemble
nets. However, the main disadvantage of the algorithm is that it
requires a large amount of data [115]. The Freund and Schapire
algorithm [116] overcomes the difficulty by adaptively resampling the
training sets such that the weights in the resampling are increased
for those cases that are most often misclassified.
d. Different Data Source: The method includes having the training data
from different input sources. A similar example is related in study [113].
e. Pre-processing: Various pre-processing techniques, such as pruning
[114] or the use of non-linear transformations [117], can be used to
vary the training data of the networks.
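As an illustration of these options, the sketch below combines two of them, different initial random weights and bootstrap-resampled training data, using scikit-learn's MLPRegressor as a stand-in model; the model class, layer sizes and iteration budget are illustrative assumptions, not the tooling used in this work.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.utils import resample

def make_ensemble(X, y, n_nets=5):
    """Build nets that generalize differently by varying (i) the initial
    random weights and (ii) the training data via bootstrap resampling."""
    nets = []
    for seed in range(n_nets):
        Xb, yb = resample(X, y, random_state=seed)   # bootstrapped subsample
        net = MLPRegressor(hidden_layer_sizes=(9, 8),
                           max_iter=5000, random_state=seed)
        nets.append(net.fit(Xb, yb))
    return nets
```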
2.3.1.2 Combining Ensemble Nets
There are various methods of combining the created ensembles effectively
[118-121]. A brief outline of the methods is provided below; a sketch of simple and weighted averaging follows the list.
1) Averaging and Weighted Averaging: A single network output can be created
from a set of ensemble network outputs by simple averaging [122] or
weighted averaging, which takes into account the relative accuracies of the
nets to be combined [122-124].
2) Non-Linear Combining Methods: The non-linear methods used for
combination of ensemble networks include Dempster-Shafer belief based
methods [125], combination using rank based information [126], voting [107]
and order statistics [127].
3) Supra Bayesian: In the supra Bayesian approach, the opinions of the expert
networks are considered themselves as data; thereby allowing the
probability distribution of the experts to be combined with its own prior
distribution.
4) Stacked Generalization: The non-linear network combines the networks with
weights that vary over the feature space [128]. This term is used to refer
both to this method of stacking classifiers and also to the method of creating
a set of ensemble networks trained on different partitions of the data. A
detailed study on stacking is reported in [129].
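A sketch of the first of these methods, simple and weighted averaging, is given below; the array shapes are an assumed convention for illustration.

```python
import numpy as np

def combine_ensemble(predictions, weights=None):
    """Fuse ensemble outputs by simple or weighted averaging.
    predictions : shape (n_nets, n_samples, n_outputs)
    weights     : optional per-net weights, e.g. proportional to each net's
                  validation accuracy; None means a plain mean."""
    predictions = np.asarray(predictions, dtype=float)
    if weights is None:
        return predictions.mean(axis=0)               # simple averaging
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalize weights
    return np.tensordot(w, predictions, axes=(0, 0))  # weighted averaging
```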
2.3.2 Modular combination
Modular approaches are used for improving the performance of a task. The task can be accomplished with a monolithic network; however, breaking the task down into specialist modules provides better performance. This approach is also used for tasks that might not be possible to accomplish unless the problem is simplified by decomposition.
of a single net. The task is divided into a number of sub-problems, where each sub-
problem is then solved with a different ANN architecture and algorithm. This process
allows the system to exploit specialist capabilities. The study in [130] reports a problem
in robotics, whose solution was only obtained as a result of decomposing the task into
three separate components. A similar problem is laid out in study [131]. The solution to
a language parsing task; i.e., mapping from syntactically ambiguous sentences to a
disambiguated syntactic tree was only obtained by decomposing the task into different
modules.
Each module in a modular system can be in the form of an ANN. However, in
principle, some of these components can use non-neural computing techniques. A
study in [132] provides details of a hybrid combination of a knowledge based system
and an ANN. In the field of speech recognition, use of hybrid system architectures is
common [133]. The pre-processing of ANN input before training can also be
considered as a form of modular decomposition for the purpose of simplification of the
problem.
Apart from improving the performance of the ANNs, a reason behind adopting a
modular approach is to reduce model complexity. It helps in making the overall system
easier to understand, modify and extend [134]. Training times can also be reduced
[135] and previous knowledge can be incorporated in terms of suggesting an
appropriate decomposition of a task [134].
2.3.2.1 Creating modular components
In modular combination, decomposition of the task into modular components
can be achieved automatically, explicitly or by means of class decomposition [136].
Explicit decomposition of the modules is performed when there is a strong
understanding of the problem. The division is performed before the training [137] and
an improved learning performance can be achieved [138]. In a similar manner,
specialist modules might be developed for particular purposes. At times, the modules
provide specialist solution to the same task such that the best performance on the task
will be obtained when the most appropriate module is selected. In the study [139], the
neural net modules were separately optimized to either reduce the number of false
positive errors or the number of false negative errors. Explicit decomposition was
carried out for complex structures with the aim of improving performance or
accomplishing tasks that either could not be accomplished using a monolithic network at all or could not be accomplished as easily or naturally.
Class decomposition splits a problem into sub-problems based on the class
relationships. The method [140] involves dividing a k-class classification problem into k
two-class classification problems. The division is performed using the same number of
training data for each two class classification as the original k-class problem. The study
[136] reports a further refinement of the problem.
The automatic decomposition of the task is characterized by the blind
application of a data partitioning technique. It is generally carried out in order to
improve performance of the networks. Complex problems are automatically
decomposed into set of similar problems under the divide and conquer approach of
Jacobs and Jordan [141-143].
2.3.2.2 Combining modular components
There are at least four different modes of combining component nets; namely,
co-operative, competitive, sequential and supervisory. Figure 2-11 shows the four
modes of combination.
Figure 2-11: Four different modes of combining artificial neural network modular
components (a) cooperative combination, (b) sequential combination, (c) competitive
combination, and (d) supervisory combination.
The study [106] classed the co-operative and competitive combinations
together. However, there is a difference in the way they combine.
combination, some contribution to the decision is assumed to be made by each of the
elements combined. Whereas, in competitive combination, for each of the inputs, the
most appropriate element is assumed to be selected. There are two mechanisms
governing the selection under competitive combination.
1) Rule 1: Gating: Expert modules are combined by means of a gating net
[141, 142]. The system is trained to allocate examples to the most
appropriate module.
2) Rule 2: Rule-based Switching: Switching between modules is triggered by
means of explicit mechanisms on the basis of the input. An example of a
form of rule-based switching is provided in the study [139] where, depending
on one of the model’s output, the control is switched between modules.
There are two networks separately optimized in a way that one makes
fewest possible false positive errors and the other generates fewest
possible false negative errors. The output of the first network is used by default; the output of the second is used only when the first output exceeds an empirically defined threshold [144].
Sequential combination entails successive processing and the computation of
one module depends on the output of the preceding module. Under the supervisory
relationship, one module is used to supervise the performance of another module.
McCormack [145] describes a system where the parameters of the second network are
selected based on the observations of various parameter values on the performance of
the first network. In another example of a supervisory relationship [146], the input features and the output of the main ANN module are used to train a supplementary network to predict the error of the main network.
Chapter 3 Artificial Neural Network Modelling
Work presented in this chapter has been published in the following journal and
conference papers:
T. A. Choudhury, N. Hosseinzadeh, and C. C. Berndt, "Improving the
Generalization Ability of an Artificial Neural Network in Predicting In-Flight Particle
Characteristics of an Atmospheric Plasma Spray Process," Journal of Thermal Spray
Technology, vol. 21, pp. 935-949, 2012.
T. A. Choudhury, N. Hosseinzadeh, and C. C. Berndt, "Artificial Neural
Network application for predicting in-flight particle characteristics of an atmospheric
plasma spray process," Surface and Coatings Technology, vol. 205, pp. 4886-4895,
2011.
T. A. Choudhury, N. Hosseinzadeh, and C. C. Berndt, "Using Artificial Neural
Network to predict the particle characteristics of an Atmospheric Plasma Spray
process," in International Conference on Electrical and Computer Engineering (ICECE),
Dhaka, 2010, pp. 726-729.
3.1 Background
This chapter illustrates the different stages of the artificial neural network (ANN)
modelling of the atmospheric plasma spray (APS) process in predicting the in-flight
particle characteristics from the input processing parameters. One of the major
problems for such a function-approximating neural network is over-fitting, which
reduces the generalization capability of the trained network and its ability to work with
sufficient accuracy under a new environment. Two methods are used to analyse the
improvement in the network’s generalization ability: (i) cross-validation and early
stopping, and (ii) Bayesian regularization. Figure 3-1 presents a flow chart of the overall research methodology.
The chapter starts by describing the data collection and processing steps, followed by the database expansion procedure. The ANN model build-up, training and optimization steps come next. Simulations are performed on both the original and expanded databases under different training conditions to obtain the variations in performance of the trained networks under various environments. The simulation results
obtained from the various network models are discussed and analysed. The chapter
further studies the predicted values, with respect to the experimental ones, to evaluate
the performance and generalization ability of the network. It helps in analysing the
parameter relationships and correlations. The simulation results show that the
performance of the trained networks with regularization is improved over that with
cross-validation and early stopping. Furthermore, the generalization capability of the
networks is improved, thus preventing the phenomena associated with over-fitting. A
summary of the work is presented at the end of the chapter.
Figure 3-1: Research methodology for artificial neural network modelling of the
atmospheric plasma spray process.
3.2 Data collection and pre-processing
An important step in artificial neural network modelling involves the database
collection, pre-processing the data and assessing the outputs. A robust and sufficiently
large database is essential for the construction of a network that generalizes well. A
database from the open literature (DSO in Table 3-1) [40] was used in this work. This
database was built experimentally by observing the effect of the relevant APS input
processing parameters on the in-flight Al2O3-13 wt. % TiO2 particle characteristics.
The authors of Ref. [40] measured the in-flight dynamic behaviour of the
particles from the centre of the particle flow stream using a dichromatic sensor (DPV-2000 from TECNAR Automation Limited, St-Bruno, QC, Canada J3V 6B5). This data
was processed to create a database (DSO) of the average particle velocity, temperature
and diameter for each of the input conditions.
The APS input processing parameters include the following six power and
injection parameters: (i) arc current intensity, (ii) argon primary plasma gas flow rate,
(iii) hydrogen secondary plasma gas flow rate, (iv) argon carrier gas flow rate, (v)
injector stand-off distance, and the (vi) injector diameter. The output parameters
consist of the following in-flight particle characteristics: (i) the average particle velocity,
(ii) temperature, and (iii) diameter.
The in-flight particle characteristics of the plasma jet govern the type, nature
and characteristics of the coatings and are particularly sensitive to the selected six
input processing parameters. Other input parameters of the plasma spray process,
such as those parameters related to powder injection, the type of torch, the spray
distance and the torch movement, which do not have a significant influence on the in-
flight particle characteristics [16, 17], were kept constant to their reference values. This
limits the validity of the model to the investigated cases only.
The database, DSO, contains 16 data values. The variations of the input
processing parameters in Table 3-1 are presented as bold numbers. A single input
processing parameter was varied at any time. The remaining parameters were fixed at
their reference values. This allows the effect of varying each input processing parameter on the in-flight particle characteristics to be understood. The
reference values of the input processing parameters are noted later in Table 3-2.
Table 3-1: Experimental database (DSO) from literature consisting of the atmospheric
plasma spray input processing parameters and the output in-flight particle
characteristics [40].
Run | I [A] | V_Ar [SLPM] | V_H2 [SLPM] | V_CG [SLPM] | D_inj [mm] | ID [mm] | V [m/s] | T [°C] | D [μm]
1 | 350 | 40 | 14 | 3.2 | 6 | 1.8 | 242 | 2,262 | 43
2 | 530 | 40 | 14 | 3.2 | 6 | 1.8 | 270 | 2,399 | 51
3 | 750 | 40 | 14 | 3.2 | 6 | 1.8 | 278 | 2,428 | 50
4 | 530 | 40 | 0 | 3.2 | 6 | 1.8 | 205 | 1,675 | 30
5 | 530 | 40 | 4 | 3.2 | 6 | 1.8 | 241 | 2,170 | 38
6 | 530 | 40 | 8 | 3.2 | 6 | 1.8 | 260 | 2,351 | 45
7 | 530 | 40 | 10 | 3.2 | 6 | 1.8 | 264 | 2,373 | 47
8 | 530 | 45 | 15 | 3.2 | 6 | 1.8 | 176 | 2,403 | 51
9 | 530 | 22.5 | 7.5 | 3.2 | 6 | 1.8 | 179 | 2,456 | 49
10 | 530 | 37.5 | 12.5 | 3.2 | 6 | 1.8 | 263 | 2,393 | 50
11 | 530 | 40 | 14 | 2.2 | 6 | 1.8 | 252 | 2,352 | 48
12 | 530 | 40 | 14 | 4.4 | 6 | 1.8 | 277 | 2,440 | 54
13 | 530 | 40 | 14 | 3.2 | 7 | 1.8 | 270 | 2,434 | 47
14 | 530 | 40 | 14 | 3.2 | 8 | 1.8 | 278 | 2,451 | 52
15 | 530 | 40 | 14 | 3.2 | 6 | 1.5 | 265 | 2,498 | 54
16 | 530 | 40 | 14 | 3.2 | 6 | 2 | 278 | 2,363 | 43

I: current intensity; V_Ar: argon primary plasma gas flow rate; V_H2: hydrogen secondary plasma gas flow rate; V_CG: argon carrier gas flow rate; D_inj: injector stand-off distance; ID: injector diameter; V: average in-flight particle velocity; T: average in-flight particle temperature; D: average in-flight particle diameter.
The values of the input processing parameters and the output in-flight particle
characteristics require a linear transformation before being used for any ANN training
and testing purposes. This normalization procedure ensures that the ANN treats all parameters equally when handling and processing the data. It also prevents any calculation error related to
different parameter magnitudes. The values are normalized using Equation 3-1 [38]
$X_{NORM} = \frac{X - X_{MIN}}{X_{MAX} - X_{MIN}}$    Equation 3-1

$X_{NORM}$ stands for the normalized parameter value; $X_{MAX}$ and $X_{MIN}$ are the maximum and minimum possible values of the parameter, based upon the physical limitations of the process, not on the experimental sets. The physical limits of each
input and output variable are given in Table 3-2 [40].
Table 3-2: Physical limits of the atmospheric plasma spray input processing
parameters and the output in-flight particle characteristics along with the input
parameters reference values [40].
Variable | Lower limit | Upper limit | Reference value
Current intensity, I [A] | 303 | 840 | 530
Argon plasma gas flow rate, V_Ar [SLPM] | 20 | 44 | 40
Hydrogen plasma gas flow rate, V_H2 [SLPM] | 0 | 17 | 14
Carrier gas flow rate, V_CG [SLPM] | 2 | 5 | 3.2
Injector diameter, ID [mm] | 1.5 | 2 | 1.8
Injector stand-off distance, D_inj [mm] | 6 | 8 | 6
Average in-flight particle velocity, V [m/s] | 122 | 408 | …
Average in-flight particle temperature, T [°C] | 1,236 | 3,240 | …
Average in-flight particle diameter, D [μm] | 14 | 101 | …
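A minimal sketch of Equation 3-1 applied to the six input parameters, with the physical limits taken from Table 3-2; the column ordering of the vectors is an assumption made for illustration.

```python
import numpy as np

# Physical limits from Table 3-2, ordered: I, V_Ar, V_H2, V_CG, ID, D_inj
X_MIN = np.array([303.0, 20.0, 0.0, 2.0, 1.5, 6.0])
X_MAX = np.array([840.0, 44.0, 17.0, 5.0, 2.0, 8.0])

def normalize(X):
    """Linear transformation of Equation 3-1 onto [0, 1]."""
    return (X - X_MIN) / (X_MAX - X_MIN)

def denormalize(Xn):
    """Inverse mapping from normalized values back to physical units."""
    return Xn * (X_MAX - X_MIN) + X_MIN
```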
3.3 Database expansion
An artificial sequence of input-output vectors was created with the view of
expanding the database. The expanded database, when used for network training, is
expected to combat the problem of over-fitting and produce a network with good
generalization ability. In this study, kernel regression [147] is used for the expansion of
the database. Additive white Gaussian noise is added to each of the input-output
training vectors based on the Parzen-Rosenblatt estimate [148-150] of the true input –
output vector density.
The kernel estimate, also called the Parzen-Rosenblatt estimate, is a non-
parametric way of estimating the probability density function of a random variable. As
an illustration, from some sample data of a population, kernel estimation allows
extrapolation of the data to the entire population. Let $x$ be a random vector in the Euclidean space $R^d$ whose distribution is described by the probability density function $f$. Let $K$ be the kernel in the d-dimensional space and $h_n$ a smoothing parameter, called the bandwidth, taking values greater than 0. If $X_1, \ldots, X_n$ are samples of $n$ independent observations of $X$, then the n-point kernel estimate of $f$ corresponding to $K$ and $h$ is:

$f_{n,h}(x) = \frac{1}{n h_n^d} \sum_{i=1}^{n} K\left( \frac{x - X_i}{h_n} \right), \quad x \in R^d$    Equation 3-2
The characteristics of the additive noise are controlled by the parameters K
and h . For kernel density approximation, the Gaussian probability density function
(PDF) is used. With mean of 0 and variance 1, the Gaussian kernel becomes,
$K\left( \frac{x - X_i}{h} \right) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{(x - X_i)^2}{2h^2}}$    Equation 3-3

$f_{n,h}(x) = \frac{1}{n h} \sum_{i=1}^{n} \frac{1}{\sqrt{2\pi}} \, e^{-\frac{(x - X_i)^2}{2h^2}}$    Equation 3-4
From Equation 3-4, we find that the value of the bandwidth h indirectly controls
the variance of the Gaussian PDF along each dimension.
Kernel regression is a non-parametric technique in statistics to estimate the
conditional expectation of a random variable without assuming any underlying
distribution for the regression function [151]. The idea is to centre an identical kernel, the Gaussian kernel (Equation 3-5) in this case, on each observed data point. The kernel assigns a weight, $w$, to each location based on its distance from the data point, and the function depends only on the distance from the local data point to a set of neighbouring locations. By inserting Gaussian kernels at the original data points $X_i$, the original data are extended over a fine grid of domain values separated by a small step $dx$.

$K_h(x, X_i) = e^{-\frac{(x - X_i)^2}{2h^2}}$    Equation 3-5
The kernel values are computed for each data point $X_i$. Then, the estimated value $y_j$ at domain value $x_j$ is computed according to the kernel regression formula (Equation 3-6), also called the Nadaraya-Watson kernel weighted average.

$y_j = \frac{\sum_{i=1}^{n} w_i \, K_h(x_j, X_i)}{\sum_{i=1}^{n} K_h(x_j, X_i)}$    Equation 3-6
The numerator of the kernel regression formula (Equation 3-6) is the sum of the products of the Gaussian kernels (Equation 3-5) and the weights. The denominator is the sum of the kernel values at domain point $x_j$ over all data points $X_i$. With the computed values of $y$, the sum of squared errors (SSE) is calculated by comparison with the original $Y_i$ data. The associated weights are optimized for each kernel to minimize the SSE. The regression is solved by computing the estimated values of $y$ with the new array of weights. Tabulating all the values of the domain $x$ and the corresponding estimated values of $y$ gives rise to the expanded database DSE, tabulated in Appendix B. The expanded database, DSE, is approximately nineteen times the size of the original database (DSO).
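A one-dimensional sketch of Equations 3-5 and 3-6 is given below; for brevity the kernel weights w_i are left at unity rather than optimized against the SSE, and the grid density and bandwidth are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, Xi, h):
    """Gaussian kernel of Equation 3-5 centred on a data point Xi."""
    return np.exp(-((x - Xi) ** 2) / (2.0 * h ** 2))

def nadaraya_watson(x_grid, X, Y, h):
    """Kernel regression estimate of Equation 3-6 over a fine grid."""
    # K[j, i] = kernel value at grid point x_j for data point X_i
    K = gaussian_kernel(x_grid[:, None], X[None, :], h)
    return (K @ Y) / K.sum(axis=1)   # numerator / denominator of Equation 3-6

# Densify a small 1-D input-output relationship
X = np.array([0.0, 1.0, 2.0, 3.0])
Y = np.array([0.0, 0.8, 0.9, 0.1])
x_grid = np.linspace(0.0, 3.0, 76)   # many more domain points, as for DSE
y_grid = nadaraya_watson(x_grid, X, Y, h=0.5)
```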
For the purpose of testing the generalization performance of the trained
networks, 20% of the expanded database, DSE, is selected by an interleaving process
as the test set. This is called "DSET". Interleaving ensures that the test set is an overall statistical representation of the whole database. The
remaining 80% of the data are set aside as a training set (called “DSETR”) used for
network training and optimization purposes. Similarly, 20% of the original database
DSO is selected by interleaving as the test set (DSOT) and the remaining 80% of the
data as the training set (DSOTR).
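A sketch of the interleaved 80/20 split described above; selecting every fifth sample as test data is the assumed realization of the interleaving.

```python
import numpy as np

def interleave_split(X, y, test_every=5):
    """Pick every 5th sample (about 20%) as the test set by interleaving,
    so the test set statistically mirrors the whole database."""
    idx = np.arange(len(X))
    test = idx % test_every == 0
    return X[~test], y[~test], X[test], y[test]   # train X/y, test X/y
```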
3.4 Network architecture
A simple ANN model considering the multi-layer perceptron (MLP) approach
and based on a back propagation algorithm is used in this section. This permits the
prediction of a complex non-linear relationship between the input processing
parameters and the output in-flight particle characteristics [42] present in the database
generated experimentally from the APS process.
The block diagram of the designed ANN is presented in Figure 3-2.
Figure 3-2: Block diagram of the designed multi-layer artificial neural network.
In Figure 3-2, $w_{ji}$ (where $i = 1 \ldots 8$ and $j = 1 \ldots N_1$) represents the input layer weights. The terms $\alpha_{ji}$ (where $i = 1 \ldots N_1$ and $j = 1 \ldots N_2$) and $\beta_{ji}$ (where $i = 1 \ldots N_2$ and $j = 1 \ldots 3$) represent the hidden layer weights and output layer weights, respectively. $N_1$ and $N_2$ represent the number of linear nodes, or neurons, in hidden layer 1 and hidden layer 2.
No generalized rules exist to specify the exact values of $N_1$ and $N_2$. The number depends on the nature of the problem that the network encounters and on the network optimization process. A large number of hidden layer neurons gives the network the flexibility to optimize many parameters and reach an improved solution. However, increasing the size of the hidden layers beyond a certain limit leaves the network under-characterized: the network is then forced to optimize more parameters than there are data vectors available to define them. Too few neurons in the hidden layers leads to under-fitting. The performance of a trained ANN
is sensitive to the size of the hidden layers and the optimum number and combination
of neurons in the hidden layers are determined from the network training and
optimization process.
The multi-layer architecture comprises three parts: the input layer, the hidden
layers and the output layer (Figure 3-2). The number of data points required to define
each of the input parameters depends on the nature of the parameter. One data point
is required to represent a real-valued parameter, and x data points are required to describe x + 1 classifications or categories [39].
In this study, only the parameters of (i) injector diameter (ID) and (ii) injector stand-off distance (D_inj) represented classifications, each with three distinct values, and were thus described by two data points each (Table 3-3). All the other input variables,
which includes current intensity, argon primary plasma gas flow rate, hydrogen
secondary plasma gas flow rate and argon carrier gas flow rate, are continuous real
valued parameters and were represented by one data point each. The input layer thus
consisted of 8 data points. The same rule was applied to define the number of neurons
for each output parameters. The output layer had 3 neurons as all the output
parameters were real valued parameters and were represented by one neuron each.
The number of hidden layers depends on the type of problem the network
addresses. For the study in this section, two hidden layers were required to handle the
non-linearity of the process and generalize the input / output parameter relationship.
Justification for the use of the number of hidden layers is provided in later paragraphs.
Table 3-3: Data point values to represent classifications of the following input
processing parameters.
Injector Stand-off Distance (D_inj)
Value [mm] Data Point 1 Data Point 2
6 0 0
7 1 0
8 1 1
Injector Diameter (ID)
Value [mm] Data Point 1 Data Point 2
1.5 0 0
1.8 1 0
2.0 1 1
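The coding of Table 3-3 can be sketched as below; the cumulative, thermometer-style interpretation of the two data points follows directly from the table, while the function name and tuple layout are illustrative assumptions.

```python
# Two data points encode the three category values of each parameter (Table 3-3)
DINJ_CODE = {6: (0, 0), 7: (1, 0), 8: (1, 1)}        # stand-off distance [mm]
ID_CODE = {1.5: (0, 0), 1.8: (1, 0), 2.0: (1, 1)}    # injector diameter [mm]

def encode_inputs(I, V_Ar, V_H2, V_CG, D_inj, ID):
    """Build the 8-element input vector: four real-valued parameters plus
    two data points for each of the two categorical parameters."""
    return (I, V_Ar, V_H2, V_CG) + DINJ_CODE[D_inj] + ID_CODE[ID]
```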
3.5 Network training and optimization
The most popular approach of the BP algorithm (Section 2.2.2.1) is the
conjugate gradient or quasi Newton (secant) method, which uses standard numerical
optimization techniques [152-154]. However, for the quasi Newton method, the storage
and computational requirements grow quadratically with the size of the network. With
similar storage and computational requirements, a non-linear least squares numerical
optimization method [155], such as the Levenberg-Marquardt algorithm [156], is more
efficient than the conjugate gradient method or the variable learning rate algorithm for
networks of a few hundred weights [100]. Other standard back propagation algorithms are slow and require a lot of off-line training. They also suffer from temporal instability and tend to become trapped in local minima [157].
Thus, taking into consideration that the network size dealt with in this work is within
a few hundred weights, the initial back propagation paradigm selected for the ANN
training and optimization purpose was the Levenberg-Marquardt algorithm (Section
2.2.2.2). With the Levenberg-Marquardt (LM) algorithm, cross-validation and early
stopping statistical techniques were applied to train the neural network. The Bayesian
regularization (BR) algorithm (Section 2.2.2.3) was later used to replace the LM
algorithm using cross-validation and early stopping to view further changes in the
generalization ability of the neural network.
Initially it was important to determine the optimal number of hidden layers.
Simulation was started with one hidden layer. The maximum number of allowed epochs
for each training cycle was set to 10,000. This ensured that the network was allowed to
train for sufficient time until the error gradient converges completely or any pre-defined
stopping criteria were reached. The transfer function used in all layers is the log-sigmoid function, and the error performance function was set to the mean absolute error (MAE) (Equation 3-7).

$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| \text{Predicted Value}_i - \text{True Value}_i \right|$    Equation 3-7
The number of neurons in the hidden layer was varied from four to twenty with
increment of one neuron. For each case, the network was trained several times with
the database DSETR and the network generating maximum correlation coefficient, R,
value on the test set, DSET, was stored and saved. Details regarding the correlation
coefficient (R) are discussed later in this chapter. The average MAE value was also
computed for all the networks.
The number of hidden layers was increased to two and the number of neurons
in each layer was varied from four and three (4-3) to twenty and nineteen (20-19),
respectively. The network training and performance measurement was repeated as
above. Similar simulations were performed for three and four hidden layer networks.
For the networks with three hidden layers, the number of neurons in each layer was
varied from four, three and two (4-3-2) to twenty, nineteen and eighteen (20-19-18),
respectively. Finally for the networks with four hidden layers, the number of neurons in
each hidden layer was varied from four, three, two and one (4-3-2-1) to twenty, nineteen,
eighteen and seventeen (20-19-18-17), respectively. In all the cases with two, three
and four hidden layers, the number of neurons in each hidden layer was incremented by one.
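The search over layer sizes can be sketched as below, again with scikit-learn's MLPRegressor standing in for the actual training setup; the retraining count and the stopping configuration are assumptions made for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def search_hidden_sizes(X_tr, y_tr, X_te, y_te, n_layers=2, repeats=5):
    """For two hidden layers, sweep sizes from (4, 3) to (20, 19); retrain
    each architecture several times and keep the network with the highest
    correlation coefficient R on the test set."""
    best_net, best_R = None, -np.inf
    for n1 in range(4, 21):
        sizes = tuple(n1 - k for k in range(n_layers))        # e.g. (9, 8)
        for rep in range(repeats):
            net = MLPRegressor(hidden_layer_sizes=sizes, activation='logistic',
                               max_iter=10000, random_state=rep)
            net.fit(X_tr, y_tr)
            R = np.corrcoef(net.predict(X_te).ravel(), y_te.ravel())[0, 1]
            if R > best_R:
                best_net, best_R = net, R
    return best_net, best_R
```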
Figure 3-3 provides a summary of the performance comparison of the networks
having a different number of hidden layers and trained with two algorithms. The
network with two hidden layers generated the minimum error when trained with both
algorithms. Considering the results obtained from Figure 3-3, along with the non-
linearity associated with the process under consideration in this work, the number of
hidden layers in the designed ANN is set to two (Figure 3-2).
Figure 3-3: Network performances with different algorithms and number of hidden
layers.
Before starting the ANN training, all the network weights and parameters were
initialized to random values. The error performance function was re-set to mean square
error (MSE) (Equation 2-45). The number of neurons in the first and second hidden
layer was initially set to four and three neurons, respectively. The ANN was first trained
with the Levenberg-Marquardt algorithm along with cross-validation and early stopping.
It was first presented with the dataset DSOTR.
A large amount of training data is essential to enhance the level of accuracy of
a trained network. However, at the same time, it is also important to have a sufficiently
large validation set to investigate the generalization ability of the designed model.
Thus, the number of samples to be assigned in each of the subsets is an important
consideration. The dataset, DSOTR, is divided by interleaving into two subsets: the training
set and the validation set. The data division ratio (training set: validation set) is set to
0.90:0.10, 0.85:0.15, 0.80:0.20, 0.75:0.25 and 0.70:0.30. The standard deviations of
the training and the validation set were computed. The bar chart depicting absolute
differences between the standard deviations of the training set and the validation set is
shown in Figure 3-4. From the analysis, a data division ratio of 0.85:0.15 was chosen,
since the difference in standard deviations was the least. This indicates that the training and validation sets are statistically most similar to each other in terms of data variations and fluctuations, which provides a strong basis for training a network with good generalization ability.
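A sketch of this selection criterion follows; the interleaving step derived from each ratio is an assumed realization of the split.

```python
import numpy as np

def pick_division_ratio(X, ratios=(0.90, 0.85, 0.80, 0.75, 0.70)):
    """Choose the training:validation ratio whose two interleaved subsets
    have the most similar standard deviations (the Figure 3-4 analysis)."""
    best_ratio, best_diff = None, np.inf
    idx = np.arange(len(X))
    for r in ratios:
        step = max(int(round(1.0 / (1.0 - r))), 2)   # e.g. 0.85 -> every ~7th sample
        val = idx % step == 0                        # interleaved validation picks
        diff = abs(X[~val].std() - X[val].std())
        if diff < best_diff:
            best_ratio, best_diff = r, diff
    return best_ratio
```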
The network was trained several times. After each of the training cycles, the
trained network was simulated with the test set DSOT. The network producing maximum
correlation coefficient (R) values on the test set was stored and saved along with the
MSE values. The combination of the number of neurons in each of the hidden layers
was varied several times and the whole process repeated.
Figure 3-4: Difference in standard deviations of the training and validation sets for
DSOTR.
The database was then replaced with DSETR. The data was again interleaved
into training and validation set and the data division ratio was set to 0.90:0.10,
0.85:0.15, 0.80:0.20, 0.75:0.25 and 0.70:0.30. The standard deviations of the training
and the validation set are computed and their absolute difference is presented below in
Figure 3-5. The data division ratio of 0.80:0.20 was chosen as it produced the lowest difference between the standard deviations of the sets. The network was once again
trained and validated with the same combination of the number of hidden layer neurons
used previously for training the network with database DSOTR. For each combination of
the number of neurons in the hidden layer, the network training procedure was
repeated several times as before. The network generating the maximum R-value on
the test set DSET was stored and saved with their respective MSE values.
Figure 3-5: Difference in standard deviations of the training and validation sets for
DSETR.
The MSE generated by the network when simulated with an unseen data set
provides a measure of the ‘generalization error’ or the performance of the trained
network. For this study the unseen set is the test set, whose values were not presented
to the network during the training process. The lower this error, the better the
network’s performance and ability to generalize the process and predict with sufficient
accuracy under unseen environments.
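The two performance measures used throughout this section, the generalization error and the correlation coefficient, can be sketched as below; the function name and the fitted-model interface are illustrative assumptions.

```python
import numpy as np

def generalization_metrics(net, X_test, y_test):
    """Generalization error (MSE on the unseen test set) and correlation
    coefficient R between predicted and actual outputs."""
    pred = net.predict(X_test)
    mse = float(np.mean((pred - y_test) ** 2))
    R = float(np.corrcoef(pred.ravel(), np.asarray(y_test).ravel())[0, 1])
    return mse, R
```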
Table 3-4 lists the generalization errors generated by the
networks. When compared with the performance of the networks trained with DSOTR,
the performance of the ANN trained with DSETR, in terms of its generalization ability,
shows improvement with a smaller average generalization error value of 9.39x10-5 in
comparison to that of 5.12x10-2 produced by networks trained with DSOTR. These
values are presented in bold at the end of Table 3-4.
Table 3-4: Generalization errors generated by the networks trained by Levenberg-
Marquardt algorithm with datasets DSOTR and DSETR.
Neurons in first and second hidden layers | Generalization error (MSE), network trained with DSOTR | Generalization error (MSE), network trained with DSETR
04-03 1.25 x 10-3 1.60 x 10-4
05-04 3.70 x 10-4 7.00 x 10-5
06-05 6.52 x 10-2 7.00 x 10-5
07-06 1.93 x 10-3 6.00 x 10-5
08-07 1.01 x 10-3 1.20 x 10-4
09-08 * 1.94 x 10-3 2.00 x 10-5
10-09 3.90 x 10-4 8.00 x 10-5
11-10 6.26 x 10-2 9.00 x 10-5
12-11 1.67 x 10-3 9.00 x 10-5
13-12 6.78 x 10-2 4.00 x 10-5
14-13 1.19 x 10-1 9.00 x 10-5
15-14 6.51 x 10-2 1.80 x 10-4
16-15 1.19 x 10-1 9.00 x 10-5
17-16 1.19 x 10-1 1.10 x 10-4
18-17 6.11 x 10-2 7.00 x 10-5
19-18 6.49 x 10-2 1.80 x 10-4
20-19 1.18 x 10-1 6.00 x 10-5
Average generalization error (MSE) 5.12 x 10-2 9.39 x 10-5
* Referenced as “NN1” within the text
The computed correlation coefficient (R) values on the test set provide an understanding of how well the trained network's response to the unseen input fits the respective actual outputs. The larger the average R-value, the better the correlation between the predicted and actual values. Figure 3-6 provides a comparison of the R values and depicts, as found previously, improvement of the network's generalization
ability. The average R-value for the ANN trained with DSOTR is 0.9485, whereas the
average R-value of the ANN trained with DSETR has a higher value of 0.9946.
Figure 3-6: Correlation coefficient (R) variations with various artificial neural network
structures on the test set.
The simulation training time is expressed as the number of epochs required by
the network, during its training, to reach the minimum error. The average number of
epochs for the network trained with DSOTR was 6; whereas, the average number of
epochs for the network trained with DSETR was 61. The longer training time arises from
the greater volume of data presented to the network during its training, even after data
division. In spite of the longer average training cycle, the generalization capability of an
artificial neural network, using the Levenberg-Marquardt algorithm as the training algorithm, was improved, allowing the network to better learn the process represented by the database.
Considering Figure 3-6 and Table 3-4, the network with a combination of nine
and eight neurons in the first and second hidden layer respectively generates the
lowest generalization error of 2.00x10-5 with a corresponding R-value of 0.9988. For
further referencing, this network is referred to as NN1.
The training algorithm was then changed to Bayesian regularization from the
Levenberg-Marquardt algorithm. The network was presented with DSETR and the initial
number of neurons in the first and second hidden layers was set to four and three,
respectively. The network was trained several times and as before, the trained network,
each time, was tested with the test set DSET. The network generating the highest R-
value on DSET was stored and saved along with the generalization error values. The
training was repeated for the same combinations of neurons in the hidden layers, as
used previously on training with the Levenberg-Marquardt algorithm, and the same
procedure repeated.
A bar chart combining R-values generated by the network trained with Bayesian
regularization algorithm on the test set DSET, with corresponding R-values generated
by the networks trained with the Levenberg-Marquardt algorithm on the same test set
DSET, is presented in Figure 3-7. The response of the networks, trained with Bayesian
regularization, to the test set demonstrates a better match to the actual test set outputs
(average R-value of 0.9992) than for the networks trained with the Levenberg-
Marquardt algorithm (average R-value of 0.9946).
Figure 3-7: Correlation coefficient (R) variations with various artificial neural network
structures on the test set.
Figure 3-8 presents a bar chart comparison of the generalization errors
generated by the two networks. The generalization errors for the networks trained with
the Bayesian regularization algorithm are much smaller (with an average value of
1.44x10-5) than those for the networks trained with the Levenberg-Marquardt algorithm
(with an average value of 9.39x10-5). Both from the correlation coefficient and
generalization error measurements, it is found that, with the same database, the
Bayesian regularization algorithm was successful in training the networks with better
generalization ability than the Levenberg-Marquardt algorithm.
Figure 3-8: Generalization error variations with various artificial neural network
structures on the test set.
The average training time for the networks trained with the Bayesian
regularization algorithm was 6,889 epochs, in contrast to the average training time of
61 epochs for the networks trained with the Levenberg-Marquardt algorithm. The
average time was greatly increased with the implementation of regularization.
However, since the training was performed off-line, this increase would not be a
problem when compared to the advantage of having an ANN with better generalization
performance.
The results for all the networks trained with the Bayesian regularization
algorithm are accumulated in Figure 3-9. The network with a combination of eight and
seven neurons in the first and second hidden layers was found to generate the
maximum R-value of 0.9996 with a corresponding minimum generalization error of
7.79x10-6. For further use in this work, this network is referred to as NN2.
Figure 3-9: Network performance on test sets for various artificial neural network
structures trained with Bayesian Regularization algorithm.
The above ANN training and optimization results demonstrate that no specific
rules or trends exist that indicate the precise number of neurons in the hidden layer.
The optimized number of neurons in the hidden layer needs to be found through the
network training and optimization process.
The use of the regularized performance function during BR algorithm training resulted in a smooth network response and improved the generalization ability of the trained network compared to the ones trained with the LM algorithm. One of the important features of the Bayesian regularization algorithm is that it measures the number of network weights and biases that are effectively used by the network during training. This
algorithm uses an optimum number of parameters during training, unlike the LM
algorithm, which uses all the available parameters during network training.
Figure 3-10 shows the total number of network parameters (number of weights
and biases) against the optimum number of parameters used during training of
networks with various combinations of the number of neurons in the hidden layers. For
For the particular case of a network with twenty and nineteen neurons in the first and second
hidden layers, the LM algorithm used all of the 639 available network parameters
during the training process. On the other hand, the BR algorithm optimized the number
of parameters to 172.
Figure 3-10: Number of network parameter variations with various artificial neural
network structures.
The use of such an optimum number of parameters removes the chance of the network response over-fitting the actual response. However, it increases the fluctuations and variations
associated with the parameter values. Figure 3-11 presents the standard deviations of
all the network parameters for networks trained with the Levenberg-Marquardt and
Bayesian Regularization algorithms. The average standard deviation for the networks
trained with the Bayesian Regularization algorithm is 22.27. On the contrary, the
average standard deviation for the networks trained with the Levenberg-Marquardt
algorithm, which uses all the parameters during network training, calculates to be a
lower value of 3.19. This results from the use of all parameters and allowing the
weights to be more evenly distributed with lower fluctuations.
Figure 3-11: Standard deviations of the network parameters for different neural network
structures trained with both Levenberg-Marquardt and Bayesian Regularization
algorithms.
3.6 Simulation result analysis and discussion
The networks NN1 and NN2 are individually used to simulate the original
database (DSO in Table 3-1). The predicted values obtained are compared with the
experimental ones and the corresponding MSE values are computed for each of them.
Regression analysis is also performed and the correlation coefficients, R-values, were
calculated. The values of MSE along with R provide a measure of the performance of
the two networks, trained and optimized by two different algorithms with the expanded
dataset DSETR, on the database DSO. It also allows the correlations drawn by the ANN between each of the input processing parameters and the output in-flight particle characteristics to be examined.
The MSE generated by the network NN1 is 0.015 with a corresponding R-value
of 0.9154. On the other hand, the MSE generated by the network NN2 is smaller, at 9.74x10-4, with a higher R-value of 0.9996. In accordance with the results
obtained from network training and optimization, the network trained with the Bayesian
Regularization algorithm provides better performance on the database in comparison to
the network trained with the Levenberg-Marquardt algorithm. These results represent
the overall performance of the networks. However, further analysis is performed, as
below, to view the generalization performance in predicting each of the three output
parameters and the correlation drawn by the ANN between each of the input
processing parameters and the output in-flight particle characteristics.
The predicted output in-flight particle characteristic values from both the
networks NN1 and NN2 were compared with their respective experimental values and
the absolute value of the relative error percentage, with respect to the experimental
values, was calculated; Table 3-5. The absolute average relative error percentages for
in-flight particle velocity, temperature and diameter generated by NN1 are 4.68%,
4.19% and 2.84%, respectively. For NN2, the values are 0.24%, 0.10% and 0.53%,
respectively. These values are highlighted as bold numbers at the end of Table 3-5.
The predicted velocity, temperature and diameter values by the network NN2
demonstrate better coherence and correlation with the experimental values than those of
the network NN1. This is represented by lower individual and average relative error
percentage values by network NN2. The order of magnitude of the errors obtained is within the experimental error of these physical measurements, implying that the methods
adopted in this work are acceptable. All the predicted values were obtained from the
analysis of the complete database and represent the existing correlations, not any
standard fitting procedures.
Table 3-5: Experimental and predicted in-flight particle characteristics values for the
selected networks NN1 and NN2 along with the absolute relative error percentage.
Run | Quantity | NN1: V [m/s] | NN1: T [°C] | NN1: D [μm] | NN2: V [m/s] | NN2: T [°C] | NN2: D [μm]
1 | Predicted | 241.81 | 2,260.72 | 42.95 | 241.74 | 2,260.61 | 42.91
1 | Rel. error % | 0.08 | 0.06 | 0.11 | 0.11 | 0.06 | 0.20
2 | Predicted | 264.26 | 2,388.87 | 50.56 | 263.20 | 2,391.43 | 50.69
2 | Rel. error % | 2.13 | 0.42 | 0.86 | 2.51 | 0.31 | 0.61
3 | Predicted | 277.60 | 2,427.72 | 49.89 | 277.96 | 2,427.78 | 49.96
3 | Rel. error % | 0.14 | 0.01 | 0.21 | 0.01 | 0.01 | 0.07
4 | Predicted | 205.12 | 1,693.91 | 30.14 | 205.03 | 1,677.44 | 29.97
4 | Rel. error % | 0.06 | 1.13 | 0.47 | 0.01 | 0.15 | 0.09
5 | Predicted | 240.71 | 2,164.44 | 37.93 | 240.49 | 2,162.54 | 37.96
5 | Rel. error % | 0.12 | 0.26 | 0.18 | 0.21 | 0.34 | 0.11
6 | Predicted | 259.62 | 2,350.78 | 44.87 | 260.06 | 2,353.80 | 44.84
6 | Rel. error % | 0.15 | 0.01 | 0.29 | 0.02 | 0.12 | 0.37
7 | Predicted | 263.75 | 2,371.19 | 47.00 | 264.01 | 2,371.85 | 47.07
7 | Rel. error % | 0.09 | 0.08 | 0.00 | 0.00 | 0.05 | 0.16
8 | Predicted | 249.13 | 2,128.36 | 40.73 | 175.99 | 2,402.97 | 51.00
8 | Rel. error % | 41.55 | 11.43 | 20.13 | 0.01 | 0.00 | 0.00
9 | Predicted | 179.12 | 2,455.35 | 48.96 | 179.00 | 2,457.35 | 49.01
9 | Rel. error % | 0.07 | 0.03 | 0.07 | 0.00 | 0.06 | 0.01
10 | Predicted | 263.11 | 2,392.45 | 49.98 | 263.03 | 2,393.03 | 50.01
10 | Rel. error % | 0.04 | 0.02 | 0.03 | 0.01 | 0.00 | 0.01
11 | Predicted | 251.91 | 2,352.74 | 48.03 | 251.89 | 2,351.66 | 47.97
11 | Rel. error % | 0.04 | 0.03 | 0.07 | 0.04 | 0.01 | 0.05
12 | Predicted | 276.68 | 2,441.44 | 54.05 | 277.01 | 2,439.99 | 54.00
12 | Rel. error % | 0.12 | 0.06 | 0.09 | 0.00 | 0.00 | 0.00
13 | Predicted | 260.75 | 2,345.14 | 46.99 | 272.50 | 2,421.86 | 50.18
13 | Rel. error % | 3.42 | 3.65 | 0.02 | 0.93 | 0.50 | 6.76
14 | Predicted | 278.02 | 2,450.93 | 52.00 | 278.00 | 2,451.00 | 52.00
14 | Rel. error % | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
15 | Predicted | 193.73 | 1,251.50 | 41.64 | 265.00 | 2,498.00 | 54.00
15 | Rel. error % | 26.89 | 49.90 | 22.90 | 0.00 | 0.00 | 0.00
16 | Predicted | 277.97 | 2,362.73 | 43.01 | 278.00 | 2,363.00 | 43.00
16 | Rel. error % | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00
Average | Rel. error % | 4.68 | 4.19 | 2.84 | 0.24 | 0.10 | 0.53
The absolute average relative error percentages of the predicted in-flight
particle characteristics, for each of the input processing parameters, are presented in
Table 3-6. This provides an understanding of how well the networks were able to
correlate the in-flight particle characteristics with the individual input processing
parameters. The better performing networks, under each case, are highlighted in bold.
Supporting the finding in Table 3-5, NN2 was found to be the better performing network in predicting the in-flight particle characteristics from the individual input processing parameters. Only in the following two cases did NN1 predict the particle
characteristics with higher accuracy: (i) predicting in-flight particle velocity from
variations of current intensity, and (ii) predicting in-flight particle diameter from the
variations of injector stand-off distance.
Table 3-6: Absolute average relative error percentage of the predicted in-flight particle characteristics with the variations of each input processing parameter.
Input processing parameter       Absolute average relative error percentage (%) *
                                 Velocity, V        Temperature, T      Diameter, D
                                 NN1      NN2       NN1      NN2        NN1      NN2
Current intensity 0.78 0.88 0.16 0.13 0.39 0.29
Hydrogen content 0.10 0.06 0.37 0.16 0.23 0.18
Total plasma gas flow rate 13.89 0.01 3.83 0.02 6.75 0.01
Argon carrier gas flow rate 0.08 0.02 0.05 0.01 0.08 0.03
Injector stand-off distance 1.72 0.46 1.83 0.25 0.01 3.38
Injector diameter 13.45 0.00 24.96 0.00 11.46 0.00
* Absolute average relative error percentage of the predicted values with respect to the
experimental values.
* The bold values indicate the better performing network.
For both networks NN1 and NN2, the predicted and experimental average in-flight particle characteristics were plotted against each of the six input processing parameters; i.e., the current intensity, hydrogen flow rate, total plasma gas flow rate, argon carrier gas flow rate, injector stand-off distance and injector diameter (Figure 3-12 to Figure 3-17). These plots allow comparison of the predicted values with the experimental data and provide insights into the parameter relationships and correlations of the APS process.
Figure 3-12 presents the average in-flight particle velocity, temperature and diameter plotted against the arc current intensity. The predicted velocity and temperature values, for both networks, increase with increasing arc current intensity. The predicted diameter values show a similar trend except for a slight decrease at the highest current value, which could result from particle vaporization at higher power levels. Both trends agree with the experimental values of the in-flight particle characteristics. Furthermore, the improvement of the in-flight particle characteristics with an increase in power level has been reported for different materials [15, 16, 19].
Hydrogen content in the plasma gas increases the velocity, temperature and enthalpy of the plasma jet [158], along with the heat and momentum transfer to the particles [159]. This improves the overall in-flight particle characteristics [160, 161]. With reference to Figure 3-13, this trend is reproduced by the in-flight particle characteristics predicted by both networks NN1 and NN2.
Figure 3-12: Variations of in-flight particle characteristics with the changes in current
intensity.
Figure 3-13: Variations of in-flight particle characteristics with the changes in hydrogen
plasma gas flow rate.
From Figure 3-14, the predicted in-flight particle velocity increases with an increase in the total plasma gas flow rate. The predicted particle temperature, on the other hand, drops initially and then rises rapidly. These results agree with the experimental values. However, they partially contradict the findings reported in the literature [161], which indicate an increase in both velocity and temperature with an increase of total plasma gas flow rate. From 30 SLPM (Run 9: Table 3-1) to 40 SLPM (Run 4: Table 3-1), the hydrogen secondary plasma gas flow rate is nearly doubled, while the argon primary plasma gas flow rate is reduced to zero. This directly increases the momentum transmitted from the plasma jet to the particles, which decreases the particle residence time in the plasma jet and could result in a drop of particle temperature. The predicted diameter values follow the trend of the experimental values. This trend, although consistent with the experiment, is difficult to fully understand.
The V_Ar and V_H2 values of 45 and 15 SLPM (Run 8: Table 3-1) were not considered because the V_Ar value was greater than its highest individual limit (Table 3-2). Error would therefore be introduced into the experimental values, and the observations drawn from this result should be considered inconclusive.
An increase in the carrier gas flow rate enhances particle penetration into the core of the plasma jet [1, 62], which in turn improves the in-flight particle characteristics. The in-flight particle characteristics predicted by both networks NN1 and NN2 follow this trend; Figure 3-15.
Figure 3-14: Variations of in-flight particle characteristics with the changes in total
plasma gas flow rate.
Figure 3-15: Variations of in-flight particle characteristics with the changes in carrier
gas flow rate.
Variations of the injector stand-off distance and injector diameter influence particle penetration into the plasma jet [62]. An increase in the injector stand-off distance, up to a limiting boundary value, should improve the particle characteristics. On the other hand, an increase in the injector diameter should lower the in-flight particle characteristic values and act opposite to the effects of the carrier gas flow rate.
Figure 3-16 shows improvement of all the predicted in-flight particle characteristics with an increase of injector stand-off distance. This finding correlates with the experimental values as well as with the literature.
Figure 3-17 shows the predicted in-flight particle values, along with the experimental values, against the change in injector diameter. The experimental and simulation results are difficult to interpret. The experimental velocity and diameter values increase with the injector diameter, whereas the temperature decreases. The values predicted by network NN2 are in complete coherence, in terms of values and trends, with the experimental values. The velocity and diameter values predicted by network NN1 show a trend similar to the experimental values, but the relative error percentages of the predicted values are high. The predicted temperature values show an opposite trend to the experimental values as well as a high relative error percentage.
Figure 3-16: Variations of in-flight particle characteristics with the changes in injector
stand-off distance.
Figure 3-17: Variations of in-flight particle characteristics with the changes in injector
diameter.
3.7 Summary
The APS is a highly variable and versatile process in terms of its input-output relationships. The in-flight particle characteristics define and control the coating and its structure. Accurate prediction of such parameters is important and assists thermal spray engineers in reducing the time and complexity of pre-spray tuning and parameter setting. The ANN method has been employed to study and model the process, predicting the output in-flight particle characteristics from the input power and injection parameters. This facilitates the experimental design and data manipulation of the APS process and helps in understanding the correlations between the output and input parameters. The trained ANN models are sensitive to the training data set, and the validity of the output is limited to the power and injection parameters considered in this study. The chapter further addressed the over-fitting problem in ANNs, working to overcome such problems and to improve the generalization ability of the trained ANN in predicting the in-flight particle characteristics.
There was a considerable amount of scatter in the experimental values of particle velocity, temperature and diameter in the obtained database. However, the predicted outputs were found to be in agreement with the experimental database on which the networks were trained and optimized. The proposed ANN structures successfully handled the non-linearity and versatility associated with the plasma spray process.
The error back-propagation algorithms used in this study successfully trained and optimized the multi-layer neural network structure with the optimal number of hidden layer neurons. The trained networks were able to correlate the effect of each processing parameter with each of the in-flight particle characteristics. This provides the in-flight particle characteristics required for the desired coating properties.
Database expansion using kernel regression, together with cross-validation and early stopping, improved the networks' generalization capability and performance. The use of regularization in training resulted in fewer network parameters being used, although it increased the level of network parameter scattering. The generalization performance, however, greatly improved in comparison to cross-validation and early stopping.
The ANN based model, within the limits of its training data and the input
processing parameters considered, is suitable to be incorporated into an on-line
plasma spray control system, allowing the automated system to achieve the desired process stability.
Chapter 4 Network Structure Modification and Multi-Net System
This chapter starts by discussing the use of a modified artificial neural network
(ANN) structure to model the atmospheric plasma spray (APS) process for predicting
the in-flight particle characteristics from the input power and injection processing
parameters. Modification is achieved through the neural network structure optimization.
The later part of the chapter discusses the use of a multi-net ANN structure to model the plasma spray process. A modular implementation is used to predict the in-flight particle characteristics. Modular implementation simplifies the optimized model structure while enhancing the network's ability to generalize, and achieves better correlations between each of the in-flight particle characteristics and the input processing parameters.
4.1 Network Structure Modification
4.1.1 Background
In all the past studies of neural network implementation of the APS process [38, 162], the default multi-layer perceptron (MLP) ANN structure, with an error back-propagation (BP) algorithm, was used to construct the network for predicting the in-flight particle characteristics.
The MLP architecture consists of three distinct parts: the input layer, the hidden
layers and the output layer. A diagram of the designed model with the MLP structure is
presented in Figure 3-2. The input layer was fed with the power and injector process
parameters. The output layer generated the in-flight particle characteristics. The
number of hidden layers depends on the nature of the problem to be studied. Both in
the referred literature and in this study, two hidden layers were used. This proved to be
sufficient in overcoming the non-linearity and variability associated with the plasma
spray process [163].
A block diagram of the default structure is presented in Figure 4-1. This
structure is termed as ‘100’ within this study. The default structure, with two hidden
layers, consists of the input layer connected to the 1st hidden layer, which is connected
to the 2nd hidden layer. The 2nd hidden layer is then connected to the output layer.
Figure 4-1: Block diagram of the default multi-layer artificial neural network structure
‘100’.
The major technical challenge with the '100' structure is optimizing the number of neurons in the hidden layers. The number of neurons needs to be increased to provide the network with additional parameters that enhance the optimization computations. However, increasing the neurons beyond a limit under-characterizes the network, forcing it to handle more parameters than the available data can constrain. This creates a more complex network that leads to over-fitting. Furthermore, the training performance curve, in most cases, does not demonstrate an exponential decay: the training error drops to a lower value and remains at this level for some iterations before dropping again.
The selection of the training parameters becomes important since, with
improper selection, the network is liable to stop at a local minimum rather than
converging to the global minimum. The initial training performance value is also found
to be high. The performance values of the trained network, on the test set, for various
combinations of neurons in the hidden layers also fluctuate along with the values of the
network parameters of the trained networks.
4.1.2 Proposed network architecture
A new network structure is proposed in this work to overcome the technical difficulties associated with the '100' ANN structure. This network is provided with additional parameters to learn and generalize the process relationships without increasing the number of hidden layer neurons. This is facilitated by modification of the layer connection matrix: additional connections are made from the input layer to the 2nd hidden layer and to the output layer. A block diagram of the new structure is presented in Figure 4-2. This network is referred to as '111'.
Figure 4-2: Proposed modified artificial neural network structure ‘111’ with additional
connection from the input layer to hidden layer 2 and the output layer.
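As an illustration, one way to realize the '111' topology in the MATLAB Neural Network Toolbox used in this study is to switch on the extra entries of the network's input connection matrix. This is a minimal sketch, not the thesis implementation itself; x and t are assumed input/target matrices:

    % Build the '111' structure from the default '100' topology by adding
    % input connections to the 2nd hidden layer and the output layer.
    net = feedforwardnet([8 7], 'trainlm');  % '100': input -> HL1 -> HL2 -> output
    net.inputConnect(2) = 1;                 % extra connection: input -> hidden layer 2
    net.inputConnect(3) = 1;                 % extra connection: input -> output layer
    net = configure(net, x, t);              % size the new weight matrices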
The network produces an improved generalization performance over the '100' structure and a smoother training performance curve with a lower initial training performance value. The network also achieves the performance goal more quickly. The performance of the trained network over various combinations of neurons in the hidden layers is more stable, and the fluctuations in the network parameters are reduced. A robust ANN-based model is thus developed, which more efficiently models and generalizes the non-linearity and variability associated with the APS process.
Section 4.1.3 describes the database collection, expansion and organization
steps. Section 4.1.4 introduces the model training and optimization process. It also
describes the construction of additional ANNs, with a default network structure. These
additional ANNs are used to compare performance of networks trained with the new
network structure. The comparison provided a validation of the effectiveness of the
proposed structure.
Section 4.1.5 is split up into three sub-sections. The first one discusses the
results obtained for networks with the new ANN structure. The second one provides a
discussion of the simulated results of additional constructed ANNs with the default
network structure. The third sub-section compares, analyses and discusses the results of the two different ANN structures. A summary of the work is presented in Section
4.1.6.
4.1.3 Database handling
The database, DSO (Table 3-1), from the open literature [40] is used in this
work. The database collection and pre-processing steps are elaborated in Section 3.2
of Chapter 3.
Over-fitting is a major problem for a function approximating neural network. It
reduces the generalization ability of the trained networks. Generalization indicates the
ability of the networks to interpolate the training samples intelligently and predict output
from unseen inputs. In the case of over-fitting, the network fails to respond well when
tested and simulated with an unseen data set. A small training data sample, in
comparison to the total number of network parameters, is one of the reasons for poor
generalization. The data set was, thus, expanded using standard mathematical
techniques to avoid over-fitting and improve the generalization ability of the trained
networks.
Kernel regression was used for the expansion of the dataset DSO. Kernel regression is a non-parametric statistical technique that estimates the conditional expectation of a random variable without assuming any underlying distribution for the regression function. The concept is to centre an identical kernel, here a Gaussian, on each observed data point. The resulting data were tabulated to form the expanded database, DSE, which was approximately nineteen times the size of the original. Details of the database expansion steps are described in Section 3.3.
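A minimal one-dimensional sketch of this Nadaraya-Watson style estimator is given below; the bandwidth h is an assumption for illustration only and is not the value used to build DSE:

    % Kernel regression sketch: estimate y at query points xq by a
    % Gaussian-kernel weighted average of the observations (x, y).
    function yq = kernelRegression(x, y, xq, h)
        yq = zeros(size(xq));
        for k = 1:numel(xq)
            w = exp(-(xq(k) - x).^2 / (2*h^2));  % Gaussian kernel weights
            yq(k) = sum(w .* y) / sum(w);        % local weighted average
        end
    end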
Error generated by the network on the test set provides a measure of the
generalization error. The trained network’s ability to generalize the process is better
when this error is lower. Twenty per cent of DSE was selected as the test set, DSET, to
test the generalization performance of the trained networks. This test set was unseen
to the network during its training process. The remaining 80% of the expanded
database was used for network training purposes. This set is referred to as DSETP. Data
division was performed by the process of interleaving, which ensured that both DSET
and DSETP represented an overall view and statistical representation of the whole
database.
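A sketch of the interleaved division, assuming DSE is stored one sample per row (every fifth sample forms the 20% test set):

    % Interleaved 80:20 split of the expanded database DSE.
    N     = size(DSE, 1);
    testIdx = 1:5:N;                          % every 5th row -> 20% test set
    DSET  = DSE(testIdx, :);                  % test set, unseen in training
    DSETP = DSE(setdiff(1:N, testIdx), :);    % remaining 80% for training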
4.1.4 Network training and optimization
The study considered supervised learning based on BP algorithms. The network size in this study is within a few hundred weights; therefore, the Levenberg-Marquardt (LM) algorithm (Section 2.2.2.2), a non-linear least squares numerical optimization method, was used for training the '111' network. The LM algorithm is considered more efficient in training than the conjugate gradient method or the variable
learning rate algorithm for a network with a few hundred weights [100]. Other standard back-propagation algorithms are slow and require excessive off-line training. They also suffer from temporal instability and tend to become trapped in local minima [157].
The maximum number of training epochs was fixed at 10,000 and the maximum number of validation failures was set to one hundred. These numbers were set high to ensure that the network had sufficient time to train for the error to converge to its global minimum. The training parameters of the LM algorithm were adjusted for relatively slower convergence, to reduce the chance of the network missing the global minimum. The scalar parameter µ (Equation 2-44) was set to a relatively large value of 1, with decrement and increment factors of 0.8 and 1.5, respectively. The transfer function for all layers was set to a log-sigmoid characteristic and the error performance function was set to mean absolute error (MAE) (Equation 3-7).
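These settings map directly onto MATLAB Neural Network Toolbox training properties; a sketch, assuming the network object net has already been created:

    % LM training configuration described above.
    net.trainFcn            = 'trainlm';
    net.trainParam.epochs   = 10000;  % maximum training epochs
    net.trainParam.max_fail = 100;    % maximum validation failures
    net.trainParam.mu       = 1;      % initial scalar mu (Equation 2-44)
    net.trainParam.mu_dec   = 0.8;    % mu decrement factor
    net.trainParam.mu_inc   = 1.5;    % mu increment factor
    for i = 1:net.numLayers
        net.layers{i}.transferFcn = 'logsig';  % log-sigmoid in all layers
    end
    net.performFcn = 'mae';           % mean absolute error performance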
In addition to expanding the training data, the standard statistical technique of cross-validation and early stopping was used to combat over-fitting during training of the neural network. The technique further divided the training dataset into training and validation sets. Training was stopped as soon as the network's error on the validation set had risen for a specific number of epochs; a rise in validation error indicated that the network was being over-fitted. The network with the lowest validation set error was returned and saved. The dataset DSETP, available for training, was divided by interleaving in the ratio 0.80:0.20 to obtain the training (DSETR) and validation (DSEV) sets.
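Wiring this division into the toolbox's built-in early stopping can be sketched as follows (the indices are illustrative; samples are assumed to be stored one per column, as the toolbox expects):

    % Early stopping: interleaved 80:20 split of DSETP into training
    % (DSETR) and validation (DSEV) indices.
    M      = size(DSETP, 2);               % number of training samples
    valIdx = 1:5:M;                        % every 5th sample -> validation
    net.divideFcn            = 'divideind';
    net.divideParam.trainInd = setdiff(1:M, valIdx);
    net.divideParam.valInd   = valIdx;
    net.divideParam.testInd  = [];         % test set DSET is held out separately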
The network parameters were initialized to random values between 0 and 1
before training. The trained networks were simulated with the test set, DSET, to obtain
the generalization error and the correlation coefficient, R. The computed correlation
coefficient (R) values on the test set indicated how well the trained network’s response
to the unseen input fits the actual outputs. It provided a measure of the network’s
generalization ability. The larger the average R-value, the better the correlation between the predicted and actual values. Each network was trained one hundred times.
The network generating a maximum R-value on DSET was stored and saved. The
training process was repeated as the neurons were varied from 2 to 20 and 1 to 19 in
the 1st and 2nd hidden layers, respectively.
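The repeated-training procedure can be sketched as a loop that keeps the best-generalizing network; xTrain, tTrain, xTest and tTest are assumed matrices drawn from DSETP and DSET:

    % Train 100 times from fresh initial weights; keep the network with
    % the highest correlation coefficient R on the unseen test set.
    bestR = -Inf;
    for run = 1:100
        net = init(net);                  % re-initialize the parameters
        net = train(net, xTrain, tTrain);
        yTest = net(xTest);               % simulate on the test set
        C = corrcoef(yTest(:), tTest(:));
        if C(1,2) > bestR
            bestR   = C(1,2);
            bestNet = net;                % store and save the best network
        end
    end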
Three additional ANNs were constructed with the default ANN structure '100' to allow a performance comparison with network '111'. The three networks considered
supervised learning and a fully connected two-hidden-layer MLP model based on an error back-propagation algorithm. The network training conditions and parameters were kept similar to those of network '111', mentioned in the earlier paragraphs. The networks were trained one hundred times as the number of neurons in the 1st and 2nd hidden layers was varied from 2 to 20 and 1 to 19, respectively. In each case, the network generating the maximum R-value on the test set, DSET, was stored and saved. The analysis of these three additional ANNs followed the same procedure as for network '111'.
The error back-propagation algorithm was varied for each network. The first network used the LM algorithm and was labelled '100-LM'. The second network used the Bayesian regularization (BR) algorithm (Section 2.2.2.3) and is referred to as '100-BR' in the remainder of this study. The last network was trained with a resilient back-propagation algorithm (Section 2.2.2.4) and was named '100-RP'. Networks '100-LM' and '100-RP' used cross-validation and early stopping to combat over-fitting; network '100-BR' used regularization for the same purpose. All three networks used DSETR for network training, with '100-LM' and '100-RP' using the additional dataset DSEV as a validation set. The training parameters for '100-LM' were the same as those used for network '111'. Networks '100-BR' and '100-RP' used the default network training parameters of MATLAB (R2012a: MathWorks Inc., Natick, MA-USA).
4.1.5 Simulation result analysis and discussion
4.1.5.1 Results for new structure
This sub-section elaborates the generalization performance of the proposed
structure ‘111’. The generalization performance includes the correlation coefficient, R,
and generalization error values. Both these values were obtained on testing the trained
networks, from Section 4.1.4, with the test set DSET.
Figure 4-3 presents a bar chart comparison of R-values and generalization
errors of all the networks with structure ‘111’ having different combinations of the
number of neurons in the hidden layers. The average R-value was found to be 0.9943
with a maximum value of 0.9996 for a total of only 15 neurons in the two hidden layers
(8 and 7 neurons in the 1st and 2nd hidden layer respectively). The values of R, over all
combinations of the number of neurons in the hidden layers, fluctuated less with a low
standard deviation of 0.0101. The network required a total of only 7 hidden
layer neurons (4 and 3 neurons in the 1st and 2nd hidden layers, respectively) to reach an R-value of 0.9900. The average generalization error of all the networks was of the order of 0.0020, with a standard deviation of 0.0023.
Figure 4-3: Generalization performances of the artificial neural networks with proposed
structure ‘111’ and various combinations of the hidden layer neurons.
The network with 8 and 7 neurons, in the 1st and 2nd hidden layers respectively,
is marked as ‘111-M’. This network was found to generate the maximum correlation (an
R-value of 0.9996) between the predicted and actual outputs, when simulated with the
test set.
The average standard deviation of all the network parameters, for all the
networks in ‘111’ having different combinations of the number of neurons in the hidden
layers, was 1.5062. The maximum and minimum parameter standard deviations were
computed to be 2.4064 and 0.9959, respectively. The values of the correlation coefficients, generalization errors and standard deviations varied within a small range for some of the different networks trained in this work. The results were, thus, presented to four decimal places to represent the changes in values clearly.
4.1.5.2 Results obtained for additional networks
The generalization performance of networks ‘100-LM’, ‘100-BR’ and ‘100-RP’
are discussed in this sub-section. The R and the generalization error values were
obtained from Section 4.1.4 by testing the trained networks with the test set DSET. The
results are graphically presented in Figure 4-4 to Figure 4-6.
The average R-value of '100-LM', over all combinations of the number of neurons in the hidden layers, was 0.9870 with a standard deviation of 0.0112. The maximum R-value of 0.9998 was achieved with a total of 37 hidden layer neurons (19 and 18 in the 1st and 2nd hidden layers, respectively), Figure 4-4. The corresponding minimum generalization error was 0.0006. The average generalization error for all the networks, Figure 4-4, was 0.0034 with a standard deviation of 0.0035. Training with the Levenberg-Marquardt algorithm and the '100' structure required a total of 21 neurons in the hidden layers to reach the marked R-value of 0.9900. In addition, the network with 8 and 7 neurons in the 1st and 2nd hidden layers, respectively, generated an R-value of 0.9830 and a generalization error of 0.0029. This network is referred to as '100-LM-M'.
The average R-values for all the networks in '100-BR' and '100-RP' were 0.9888 and 0.9692, respectively. The fluctuations in the R-values were represented by standard deviations of 0.0125 and 0.0225 for '100-BR' and '100-RP', respectively. The average generalization error and corresponding standard deviation for '100-BR' were 0.0025 and 0.0037, respectively. For the networks in '100-RP', the
values were 0.0110 and 0.0042. Among all the networks in '100-BR', the maximum R-value of 0.9987 was achieved with a total of 33 hidden layer neurons (17 and 16 in the 1st and 2nd hidden layers, respectively). The corresponding minimum generalization error was 0.0005, Figure 4-5. For '100-RP', the network with a total of 39 hidden layer neurons (20 and 19 neurons in the 1st and 2nd hidden layers, respectively) generated the maximum R-value of 0.9895, with a corresponding generalization error of 0.0087, Figure 4-6. Network '100-BR' required a total of 11 hidden layer neurons to reach an R-value of 0.9900; '100-RP' did not reach this value.
In addition, the network with 8 and 7 neurons (in the 1st and 2nd hidden layer,
respectively) generated an R-value of 0.9949 and a generalization error of 0.0012 for
‘100-BR’. In the case of ‘100-RP’, the R-value was 0.9792 with a generalization error of
0.0105. For further use in this study, these networks are named as ‘100-BR-M’ and
‘100-RP-M’.
The average standard deviations of all the network parameters for '100-LM', '100-BR' and '100-RP' were 1.6537, 6.2072 and 2.0229, respectively, considering all the networks with different combinations of the number of neurons in the hidden layers.
Figure 4-4: Generalization performance of networks ‘100-LM’ with various
combinations of the hidden layer neurons.
Figure 4-5: Generalization performance of networks ‘100-BR’ with various
combinations of the hidden layer neurons.
Figure 4-6: Generalization performance of networks ‘100-RP’ with various
combinations of the hidden layer neurons.
4.1.5.3 Comparison of results and discussion
A comparison and discussion of the performance results of the networks ‘111’,
‘100-LM’, ‘100-BR’ and ‘100-RP’, obtained from Section 4.1.4, are presented in this
sub-section. This analysis provided an understanding of the performance of the new
proposed structure ‘111’ in comparison to the standard artificial neural network
structures.
A bar-chart of average R-values and corresponding generalization errors for the
four networks, over all combinations of the number of neurons in the hidden layers, is
presented in Figure 4-7. In comparison to the other three networks, ‘111’ exhibited the
highest R-value of 0.9943 and the lowest generalization error of 0.0020. The
generalization performance of the proposed network ‘111’ was superior to that of
networks ‘100-LM’, ‘100-BR’ and ‘100-RP’.
Figure 4-7: Average generalization performance for four different artificial neural
networks.
Figure 4-8 presents the standard deviations of the R-values and generalization errors generated by the four networks over similar variations of the number of hidden layer neurons. In both cases, the fluctuations of the performance parameters were lowest for the new network structure '111', demonstrating the stability of the generalization performance of the newly proposed structure under different conditions.
Figure 4-8: Standard deviations of the generalization performances of four different
artificial neural networks.
From Section 4.1.5.1 and 4.1.5.2, the maximum R-values generated by the
networks ‘111’, ‘100-LM’, ‘100-BR’ and ‘100-RP’, over various combinations of the
number of neurons in the hidden layers, were obtained, Figure 4-9. A bar chart
comparison of the total number of hidden layer neurons, required by each of the four
networks, to obtain their respective maximum R-value is also presented alongside in
Figure 4-9. Network '100-LM' achieved a slightly higher R-value of 0.9998 in comparison to the 0.9996 achieved by network '111'. Network '100-LM', however, required
a total of 37 hidden layer neurons to achieve its maximum R-value, a much higher number than the 15 hidden layer neurons used by network '111'. In comparison to network '111', networks '100-BR' and '100-RP' generated lower R-values of 0.9987 and 0.9895, respectively, and required higher total numbers of hidden layer neurons of 33 and 39, respectively. This shows that the proposed structure is able to generate better generalization performance with a smaller total number of hidden layer neurons.
Figure 4-9: Maximum correlation coefficient (R) values of four different artificial neural
networks along with their corresponding total number of hidden layer neurons.
Table 4-1 lists the number of network parameters used by the different ANNs with different combinations of neurons in their hidden layers. Correlating the findings above, network '111' required 8 and 7 hidden layer neurons to generate its maximum R-value, corresponding to 239 network parameters. Network '100-LM' required 19 and 18 neurons in the 1st and 2nd hidden layers, corresponding to 588 network parameters. Similarly, networks '100-BR' and '100-RP' used 475 and 639 network parameters, respectively, during network training. These values are typed in bold for reference and show that the proposed '111' structure generated better generalization performance with a lower number of network parameters.
Table 4-1: Number of network parameters used during training of different artificial
neural networks.
Neurons in the 1st and       Number of network parameters
2nd hidden layers        '111'    '100-LM'    '100-BR'    '100-RP'
2-1                         59         27          25          27
3-2                         84         44          41          44
4-3                        111         63          59          63
5-4                        140         84          79          84
6-5                        171        107         101         107
7-6                        204        132         125         132
8-7                        239        159         151         159
9-8                        276        188         179         188
10-9                       315        219         209         219
11-10                      356        252         241         252
12-11                      399        287         275         287
13-12                      444        324         311         324
14-13                      491        363         349         363
15-14                      540        404         389         404
16-15                      591        447         431         447
17-16                      644        492         475         492
18-17                      699        539         521         539
19-18                      756        588         569         588
20-19                      815        639         619         639
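As a check, the entries in Table 4-1 can be reproduced by summing the weights and biases of each layer; the figures are consistent with eight inputs and three outputs (matching the i = 1…8 indexing of the input-layer weights in Figure 4-16). For the bold 8-7 row, for example:

    '100': (8×8 + 8) + (8×7 + 7) + (7×3 + 3) = 72 + 63 + 24 = 159
    '111': 159 + (8×7) + (8×3)               = 159 + 56 + 24 = 239

where each bracketed term is the weights plus biases of one layer, and the two extra terms for '111' are the added input-to-2nd-hidden-layer and input-to-output weights.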
Figure 4-10 presents a bar chart comparison of the average standard deviations
of the network parameters for all the networks in ‘111’, ‘100-LM’, ‘100-BR’ and ‘100-
RP’. The fluctuations of the parameters in ‘111’ were the lowest in comparison to the
other three networks. Network ‘100-BR’ revealed the highest fluctuation of 6.2072.
Figure 4-10: Average standard deviations of the network parameters for four different
artificial neural networks.
The network '111-M' generated the maximum generalization performance among all the networks in '111' (Section 4.1.5.1). In order to further compare the performance of '111' under similar conditions, Section 4.1.5.2 marked the corresponding networks, with the same number of neurons in the hidden layers, as '100-LM-M', '100-BR-M' and '100-RP-M'. Figure 4-11 presents bar chart comparisons of the generalization performances of these networks. The graphs illustrate the better generalization performance of '111-M' in comparison to '100-LM-M', '100-BR-M' and '100-RP-M': network '111-M' generated the highest R-value and the lowest generalization error among all the networks.
Figure 4-11: Generalization performance of the four different artificial neural networks
with 8 and 7 neurons in the 1st and 2nd hidden layers.
Table 4-2 lists the number of epochs required by each of the four networks ('111-M', '100-LM', '100-BR' and '100-RP') during training to minimize the training set error (expressed as MAE). Network '111-M' required 133 epochs, the lowest among the four networks, whereas network '100-BR' required 5,874 epochs to obtain its lowest training set error. This confirms that, despite having additional network parameters to work with, the training time for '111-M' was shorter.
Table 4-2: Number of epochs required to minimize the artificial neural network training
error.
Network     Epochs
111-M          133
100-LM         171
100-BR       5,874
100-RP         194
A graph showing the training errors of the four networks for the first 30 epochs is presented in Figure 4-12. The initial 30 epochs provide an overview of how the network training progressed and a clear view of the starting point. Network '111-M' revealed the lowest starting error of 0.3142, while '100-LM-M' was the next lowest with 0.7250. Networks '100-BR-M' and '100-RP-M' demonstrated large starting errors of 130 and 515, respectively. In comparison to the other networks, the error decrement curve was more monotonic for network '111-M'. Within 5 epochs, '111-M' reached a low error value of 0.0251, whereas networks '100-LM-M', '100-BR-M' and '100-RP-M' were at 0.2233, 2.2912 and 56.5679, respectively.
Figure 4-12: Training error responses (for the first 30 epochs (iterations)) of the four
different artificial neural networks.
Section 4.1.5 analysed and compared the training results of ANNs with the newly proposed structure '111' and with the default structure '100'. Table 4-3 provides a summary of the performance comparisons for '111'. For each performance parameter, the best performing values are typed in bold.
In comparison to the networks with structure ‘100’, the average generalization
performance, both in terms of correlation coefficients and generalization errors, was
superior for ‘111’. The fluctuations of average R and generalization error values, over
various combinations of the number of hidden layer neurons, were the least for ‘111’.
Network '111' reached a higher maximum correlation coefficient (R) value than networks '100-BR' and '100-RP'. Network '100-LM' achieved a slightly higher R-value of 0.9998 in comparison to the 0.9996 achieved by network '111'. However, network '111' required the least number of hidden layer neurons and
network parameters to achieve its maximum R-value. The average standard deviation
for all the network parameters, with various combinations of the number of hidden layer
neurons, was also the least for network ‘111’. The minimum fluctuations of the network
parameters, along with the generalization performance parameters, indicated stability
and robustness of the trained networks.
The generalization performance of '111-M' was better than that of the corresponding selected networks '100-LM-M', '100-BR-M' and '100-RP-M'. Furthermore, the training of '111-M' required fewer epochs to achieve the training performance goals, with a smaller initial training error. The training error response curve for the network '111-M'
was also smoother in comparison to other selected networks with the ‘100’ structure
(Figure 4-12).
Table 4-3: Performance comparison summary of the proposed structure ‘111’ with the
default artificial neural network structure ‘100’. Note: “MAE” refers to mean absolute
error and for each performance parameter, the best performing values are typed in
bold.
Network     Average correlation coefficient (R)     Average generalization error (MAE)
            Value       Standard deviation          Value       Standard deviation
111 0.9943 0.0101 0.0020 0.0023
100-LM 0.9870 0.0112 0.0034 0.0035
100-BR 0.9888 0.0125 0.0025 0.0037
100-RP 0.9692 0.0225 0.0110 0.0042
Network     Maximum correlation coefficient (R)     Total number of hidden layer neurons     Number of network parameters     Average standard deviation of network parameters
111 0.9996 15 239 1.5062
100-LM 0.9998 37 588 1.6537
100-BR 0.9987 33 475 6.2073
100-RP 0.9895 39 639 2.0229
Network     Correlation coefficient (R)     Generalization error (MAE)     Epochs (Iterations)
111-M 0.9996 0.0009 133
100-LM-M 0.9830 0.0029 171
100-BR-M 0.9949 0.0012 5874
100-RP-M 0.9792 0.0105 194
4.1.6 Summary
An ANN was employed to predict the in-flight particle characteristics of the APS process from the power and injection parameters. The typical two-hidden-layer MLP ANN structure handled the versatility and non-linearity associated with the APS
process. However, the existing structure poses technical challenges in terms of optimizing the number of hidden layer neurons.
A new and optimized ANN structure was therefore proposed: a modified two-hidden-layer MLP architecture with additional network parameters, allowing the network to learn and generalize the process relationships without increasing the number of hidden layer neurons. This was facilitated by modification of the layer connection matrix, with additional connections made from the input layer to the 2nd hidden layer and to the output layer.
The simulation results and analysis illustrated that networks with the proposed structure were successful in modelling the APS process to predict the in-flight particle characteristics from the input processing parameters. The networks also achieved the following research objectives: (i) improved training performance, (ii) a training curve that moves monotonically towards the global minimum, and (iii) decreased fluctuation of the training performance curve.
4.2 Multi-Net System and Modular Combination
4.2.1 Background
This section focuses on simplifying the designed ANN model for predicting the in-flight particle characteristics from the input processing parameters of an APS process. The study also aims to improve the generalization ability of the ANNs. These objectives are achieved by a modular combination of ANNs to model the APS process.
Modular approaches are used to improve performance on a task. The task can be accomplished with a monolithic network; however, breaking the task down into a number of specialist modules provides better performance. Modular implementation simplifies the optimized model structure while enhancing the network's ability to generalize, and it achieves better correlations between each of the in-flight particle characteristics and the input processing parameters.
In the modular approach, APS is first decomposed into sub-processes to
simplify the model structure. Each sub-process is a part of the whole APS process and
is assigned a different ANN. Thus, each designed ANN focuses on solving only a sub-
process. The final solution is then obtained by re-combining the individual network solutions.
Decomposition of the task allowed simpler ANNs to be built and, at the same time, helped the networks learn the process more efficiently. The segmented approach allows the user to understand the relationships the model established between each of the in-flight particle characteristics and the input processing parameters, and the generalization ability of the overall ANN model improved. Furthermore, system reliability is enhanced by splitting up the problem so that each network is trained to solve one part of the whole problem: a fault or error in the prediction of one sub-problem does not affect the entire solution. The fluctuations of the predicted outputs are also greatly reduced, resulting in a more robust and reliable network.
Figure 4-13 represents a flowchart outlining the overall research methodology in
this section.
Figure 4-13: Research methodology for modular implementation of artificial neural
network in modelling the atmospheric plasma spray process.
Section 4.2.2 provides a description of the ANN architecture used in this work. Section 4.2.3 illustrates the database handling steps. The network training and optimization processes are presented in Section 4.2.4, followed by a description of the construction of additional networks (Section 4.2.5), which are used to compare the performance of the modular ANNs. Section 4.2.6 is split into three sub-sections: the first discusses the simulation results of the modular ANNs, the second discusses the simulation results of the additional constructed ANNs, and the third compares the two types of ANNs. A brief summary of the work is presented in Section 4.2.7.
4.2.2 Modular Combination
In modular combination, decomposition of the task into modular components
can be achieved automatically, explicitly or by means of class decomposition (Section
2.3.2.1). With the existing knowledge and understanding of the APS process, explicit
decomposition was chosen. The overall task in this work concentrated on predicting the
three in-flight particle characteristics (i.e., in-flight particle velocity, temperature and
diameter) from the input processing parameters of the APS process. The task was
decomposed into three sub-tasks, each considering the effects of input processing
parameters on one of the in-flight particle characteristics. Each of the sub-tasks was
then assigned a different ANN.
There are at least four different modes of combining component nets, namely: co-operative, competitive, sequential and supervisory (Section 2.3.2.2). Generally, ensemble combination uses co-operative combination, while modular combination uses competitive, sequential or supervisory combination. In this study, however, all three output parameters were of equal importance, so co-operative combination was used. The three outputs from the three networks, each providing a solution to its assigned sub-task, were combined with equal weighting to generate the final solution. The co-operative combination flowchart in Figure 2-11 was thus updated; the updated flowchart, provided in Figure 4-14, replaces the task solutions with sub-task solutions, which are combined to obtain the final task solution.
Figure 4-14: An updated co-operative combination of artificial neural network modular
components.
Figure 4-15 provides a flowchart for the modular implementation of the APS
process. All the APS input processing parameters (power and injection parameters)
were fed into the input layer of the networks. The first network (NET1) generated the in-
flight particle velocity at the output layer, the second network (NET2) generated the in-
flight particle temperature and the third network (NET3) generated the in-flight particle
diameter as the output.
Figure 4-15: Flowchart for modular artificial neural network implementation of the
atmospheric plasma spray process.
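A sketch of the co-operative combination, with NET1, NET2 and NET3 the trained sub-networks and x the assumed matrix of input processing parameters (one column per spray run):

    % Each sub-network solves one sub-task; the equally weighted task
    % solution simply assembles the three predictions.
    v = NET1(x);              % in-flight particle velocity
    t = NET2(x);              % in-flight particle temperature
    d = NET3(x);              % in-flight particle diameter
    prediction = [v; t; d];   % combined final task solution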
The networks NET1, NET2 and NET3 were based on a fully connected MLP
model with supervised error back propagation algorithms. Figure 4-16 provides the
MLP architecture, consisting of three distinct parts: the input layer, the hidden layers
and the output layer. A single hidden layer was used in this study. It proved sufficient
for the networks to learn the function defining the sub-tasks assigned.
In Figure 4-16, w_ji (where i = 1…8 and j = 1…N1) represents the input layer weights and β_ji (where i = 1…N1 and j = 1) the output layer weights. N1 represents the number of nodes, or neurons, in the hidden layer. No specific rule exists to define the optimum number of neurons in the hidden layers; the number depends on the nature of the problem the network is encountering and on the network optimization process.
A large number of neurons in the hidden layer gives the network the flexibility to optimize
many parameters and reach an improved solution. However, there is a limit beyond which the network becomes under-characterized, because it is forced to handle more parameters than the available data set can constrain. The optimum number of hidden layer neurons was established in this study by network training and optimization techniques.
Figure 4-16: Single hidden layer multi-layer artificial neural network architecture.
4.2.3 Database processing
The database, DSO (Table 3-1), was split into three sub-sets based on the three
output parameters. Figure 4-17 provides a flow chart of the data split process. The first
subset, DSO1, contained the input processing parameters and the average in-flight
particle velocity. The second subset, DSO2, contained the input processing parameters
and the average in-flight particle temperature and the third and last subset, DSO3,
contained the input processing parameters and the average in-flight particle diameter.
All the data sets were linearly transformed using Equation 3-1 to ensure equal treatment by the networks during training and to prevent calculation errors related to different parameter magnitudes.
Figure 4-17: Data split process for modular implementation of artificial neural networks in modelling the atmospheric plasma spray process.
The datasets DSO1, DSO2 and DSO3 were each divided, by interleaving, in the
ratio 0.85:0.15 to form the training and test sets. The training sets were used to train
the neural networks; i.e., to optimize the network parameters in learning the underlying
input-output relationships. The computed correlation coefficient (R) values along with
the error, generated by the network on the test set, provided a measure of the
generalization ability of the ANNs; i.e., the ability of the trained network to respond well to unseen data. The larger the average R-value, the better the correlation between the predicted and actual values, which indicates better network performance. Interleaving ensured that the test data points represented an overall view and statistical representation of the whole database.
4.2.4 Network training and optimization
Correct optimization of the weight matrix was essential for the network to learn
the desired complex input-output relationships. Optimization was achieved by a training
procedure, which taught the network to generalize input and output relationships from
the training set. A Bayesian regularization algorithm (Section 2.2.2.3) was used to train
the networks NET1, NET2 and NET3. The algorithm works within the framework of
Levenberg-Marquardt algorithm by modifying the typical performance function used in
feed-forward neural network training. The term regularization refers to the method of
improving generalization by constraining the size of the network weights.
The algorithm employs regularization to combat the problem of over-fitting. It can thus use the whole available dataset for training, without any need for a separate validation set, so that no data are discarded. The algorithm particularly suits cases, such as the one considered in this study, where a relatively small dataset is available for network training. Furthermore, Bayesian regularization preserved an optimal network size and reduced the pre-training work required to determine the minimum network size needed to avoid over-fitting.
The maximum number of training epochs was fixed at 300, giving the networks sufficient time and iterations to converge to the global error minimum. The transfer function in all layers of all three networks was set to tan-sigmoid. The networks NET1, NET2 and NET3 were provided with separate initial parameters to start the training process; the initial network parameters for each network were initialized separately with random values between 0 and 1. This procedure allowed each network to independently map the relationship between the APS input process parameters and each of the output in-flight particle characteristics.
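A sketch of this single-hidden-layer, Bayesian-regularized setup in the toolbox (N1, x and t are assumed placeholders for the neuron count and one sub-task's input/target data):

    % Bayesian regularization needs no separate validation set, so the
    % whole dataset can be assigned to training.
    net = feedforwardnet(N1, 'trainbr');   % one hidden layer of N1 neurons
    net.layers{1}.transferFcn = 'tansig';  % tan-sigmoid in all layers
    net.layers{2}.transferFcn = 'tansig';
    net.trainParam.epochs = 300;           % maximum training epochs
    net.divideFcn = 'dividetrain';         % use all samples for training
    net = train(net, x, t);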
The three networks were initialized with two neurons in the hidden layer.
Training was repeated one hundred times and the trained networks were simulated
with the test set each time. The networks generating the maximum R-value on the test set were stored and saved. The training process was repeated as the number of neurons was incremented by one, up to a maximum of 30 neurons in the hidden layer. Beyond 30 neurons, the variations in the overall performance of the networks were found to be insignificant.
4.2.5 Construction of additional networks
In order to compare the performance of the modular ANNs, three separate and different traditional artificial neural networks were constructed, trained and tested. The term 'traditional' indicates that the architecture from the prior literature was followed during the construction of these networks. The performance of these networks was compared with that obtained from the modular ANN implementation of the APS process, enabling an enhanced understanding of the advantages and disadvantages of the designed modular ANN.
A diagram of the designed model with MLP structure is presented in Figure 3-2.
The input layer was fed with the power and injector process parameters. The output
layer generated the in-flight particle characteristics. The past studies of ANN
implementation of APS [14, 38, 40] were carried out with 2 hidden layers. All three
additional networks for comparison were, thus, based on 2 hidden layers and a feed-
forward MLP structure with error back propagation as the training algorithm. The first
network was trained with a Bayesian regularization algorithm (Section 2.2.2.3) and was
named as COMP1. The second and third networks were trained with Levenberg-
Marquardt (Section 2.2.2.2) and resilient back-propagation (Section 2.2.2.4) algorithms
respectively and were labelled as COMP2 and COMP3. COMP1 used a regularization
technique to combat the problem of over-fitting while COMP2 and COMP3 employed
cross-validation and early stopping.
The whole database DSO (from Table 3-1) was used for network training and
testing of the three additional ANNs. In this case, DSO was divided by interleaving in the ratio 0.85:0.15 to form the training dataset DSOTR and the test dataset DSOT. DSOTR was used only for network training purposes. The test set, DSOT, was used to
measure and test the generalization performance of the trained networks. Since the
trained networks did not see these sets of input-output vectors, the network
performance on the test set provided a good measure of generalization ability of the
networks. COMP1 used the whole DSOTR for training purposes as the regularization
technique did not require an additional validation set for network training. COMP2 and
COMP3, which used cross-validation and early stopping, required a separate validation
set along with the training set. For COMP2 and COMP3, the training dataset DSOTR
was, therefore, divided further by interleaving in the ratio of 0.85:0.15 to form the final
training set and the validation set.
For each of the networks, the training was initiated with 2 and 1 neurons in the
first and second hidden layers respectively. The initial weights and biases were set to
random values between 0 and 1. The maximum number of epochs was set to 300 and
all the other training parameters were set to default values generated by MATLAB. The
transfer function in all layers was set to tan-sigmoid. The training was repeated 100
times and each time the trained network was simulated with the test set. The network
generating the maximum correlation coefficient, R, was saved. The number of neurons in each hidden layer was then increased by one and the training procedure repeated. A maximum of 30 and 29 neurons in the first and second hidden layers were tested.
4.2.6 Simulation result analysis, comparison and discussion
4.2.6.1 Results for modular neural networks
Figure 4-18 provides a bar chart comparison of the correlation coefficient (R) and generalization error values for NET1 trained with different numbers of neurons in the hidden layer. NET1 generated the highest R-value of 0.8665 with 23 neurons in the hidden layer; the corresponding generalization error was 0.2695. As the number of neurons in the hidden layer increased, the variation in the generalization performance of the networks decreased considerably. The average R-value, over all numbers of hidden layer neurons, was 0.8378 with a standard deviation of 0.0282; the corresponding average generalization error was 0.2701 with a standard deviation of 0.0008. These fluctuations in the performance parameters were the highest in comparison to the corresponding values for NET2 and NET3, which are discussed in the following paragraphs.
Figure 4-18: Generalization performance of NET1 over various numbers of hidden layer
neurons.
Figure 4-19 provides a bar chart comparison of the generalization performances
of NET2 trained with a different number of neurons in the hidden layer. The trends in the
variation of the correlation coefficient and generalization error with the number of
hidden layer neurons were, to some extent, in agreement with each other. The values
were stable, except for a few fluctuations, over various numbers of
neurons in the hidden layer. This stability was depicted by the small standard deviation
of 0.0006 computed for all the R and generalization error values obtained over various
neuron numbers in the hidden layer. This trend was unlike the generalization
performance response of NET1 shown in Figure 4-18, where the response was
sensitive to the number of neurons in the hidden layer.
In comparison to NET1, NET2 was found to generalize the relationship between
particle temperature, T, and the input processing parameters to a greater extent. The
average R value for NET2 was 0.9982, which was greater than the corresponding
average R-value of 0.8378 for NET1. The average generalization error for NET2 also
took a lower value of 0.0027, in comparison to that of 0.2701 for NET1.
For NET2, the network with 3 hidden layer neurons generated the best
performance, in terms of R-value, over all the neuron numbers. The maximum R-value
generated was 0.9999 with a corresponding generalization error of 0.0029.
Figure 4-19: Generalization performance of NET2 over various numbers of hidden layer
neurons.
Figure 4-20 provides a bar chart comparison of the correlation coefficient (R)
and generalization error values for NET3 trained with a different number of neurons in
the hidden layer. The trend was opposite to that of NET1 (Figure 4-18) since the
generalization performance deteriorated gradually with the increase in the number of
neurons in the hidden layer. The network, with just 2 neurons in the hidden layer,
generated the best generalization performance with a maximum R value of 0.9896 and
minimum generalization error of 0.0586. NET3 was also found to generalize the
relationship between the input processing parameters and the in-flight particle diameter
to a large extent. The average R-value and generalization error, over all the networks
trained, were 0.9895 and 0.0599, respectively. These values were higher in comparison
to the ones for NET1; however, they were slightly lower than those of NET2. The
performance parameter values of R and generalization error fluctuated the least in
comparison to those of NET1 and NET2, with standard deviations of 0.0001 and 0.0003,
respectively.
Figure 4-20: Generalization performance of NET3 over various numbers of hidden layer
neurons.
4.2.6.2 Results obtained for additional networks
Figure 4-21 presents the generalization performances of COMP1 with various
combinations of the number of neurons in the hidden layers. The average R-value and
generalization error value for COMP1 were 0.3981 and 0.1165, respectively, with
corresponding standard deviations of 0.0324 and 0.0263. The network with 3 and 2
neurons in the 1st and 2nd hidden layer, respectively, generated the highest R value of
0.5309 and corresponding minimum generalization error of 0.0690.
The average R-value for all the networks in COMP2 was 0.7431 with a
generalization error of 0.0612. In comparison to all the networks trained in COMP2, the
network with 19 and 18 neurons, in the 1st and 2nd hidden layer, generated the best
generalization performance, with a maximum R-value of 0.9179 and a corresponding
generalization error of 0.0576. The Levenberg-Marquardt algorithm worked better than
the Bayesian regularization in generalizing the overall relationship of all the in-flight
particle characteristics with the input processing parameters. However, variations of the
performance parameters, over different combinations of the number of neurons in the
hidden layers, did not follow any specific trend. All the values fluctuated from one
network to the other and the standard deviation of the predicted R-values was 0.0926.
Figure 4-22 compares the R-values and generalization errors generated by the
networks, with different combinations of neurons in the hidden layers.
Figure 4-23 presents the generalization performance of COMP3 over similar
variations of the number of neurons in the hidden layers. The overall generalization
performance was reduced in comparison to COMP2; however, the resilient back-
propagation algorithm was better than the counterpart Bayesian regularization
algorithm in generalizing the overall input-output relationship. COMP3 required 28
and 27 neurons, in the 1st and 2nd hidden layer, respectively, to achieve the highest R-
value of 0.8303 and generalization error of 0.0495. The average R-value and
generalization error was 0.6697 and 0.0637, respectively. In this case too, the network
response to the test set fluctuated over the combinations of the number of neurons in
the hidden layers. The fluctuations in R-values increased to a greater extent, in
comparison to that of COMP1 and COMP2, with a relatively high standard deviation of
0.1217.
Figure 4-21: Generalization performance of COMP1 over various combinations of the
hidden layer neurons.
Figure 4-22: Generalization performance of COMP2 over various combinations of the
hidden layer neurons.
Figure 4-23: Generalization performance of COMP3 over various combinations of the
hidden layer neurons.
4.2.6.3 Result comparison and analysis
From Section 4.2.6.1, it was observed that the training responses of NET1,
NET2 and NET3 were stable and the network responses to the test set, over various
numbers of hidden layer neurons, followed a trend. This was unlike that of COMP1,
COMP2 and COMP3 in Section 4.2.6.2. It can therefore be stated that the training
outputs of the modular ANNs were more robust and stable.
For comparison of the results of modular ANNs, the results obtained for
general ANNs in Section 4.2.6.2 were split up to obtain values of each of the in-flight
particle characteristics separately. Separate correlation coefficients, R values, were
computed for in-flight particle velocity, temperature and diameter for each network and
each combination of the number of neurons in the hidden layers. This method, firstly,
provided an awareness of how well the networks COMP1, COMP2 and COMP3, when
learning all the output parameters together, could generalize the relationship between
each of the output in-flight particle characteristics and the input processing parameters.
Secondly, this result could be easily compared with those provided by the modular
ANNs, which had three separate networks with each learning one of the output particle
characteristics only.
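As a minimal sketch of this split-up evaluation (variable names are assumptions), each row of a 3 x N prediction matrix can be correlated separately against the corresponding row of experimental values:

```matlab
% Hedged sketch: separate correlation coefficients per output characteristic.
% Ypred and Yexp are assumed 3 x N matrices whose rows hold the in-flight
% particle velocity, temperature and diameter, respectively.
labels = {'velocity', 'temperature', 'diameter'};
for k = 1:3
    Rmat = corrcoef(Ypred(k, :), Yexp(k, :));
    fprintf('R for in-flight particle %s: %.4f\n', labels{k}, Rmat(1, 2));
end
```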
For COMP1, the average R-value of only the predicted in-flight particle
velocity, over all the networks having different combinations of the number of neurons
in the hidden layers, was found to be 0.8607. COMP2 and COMP3 exhibited lower
average R-values of 0.5030 and 0.4887, respectively. Comparing these values to that
of NET1, only the performance of COMP1 was found to be slightly higher than that of
NET1. The values generated by COMP2 and COMP3 were much lower than those
generated by NET1. Figure 4-24 compares these values.
The fluctuations in R-values of only the predicted in-flight particle velocity
increased as the focus shifted from COMP1 to COMP2 and finally to COMP3. The
trend agreed with the values of standard deviations obtained in Section 4.2.6.2, where
the networks predicted the combined output parameters. COMP3 was found to
generate the most fluctuating results and COMP1 the least. NET1, on the other hand,
predicted the in-flight particle velocity with greater stability over different numbers of
neurons in the hidden layer; as indicated by its lowest standard deviation. Table 4-4
presents the standard deviations of all the R-values generated by all four networks.
In predicting the individual in-flight particle temperature and diameter, the
modular networks NET2 and NET3 outperformed COMP1, COMP2 and COMP3. The
average R-values in predicting the particle temperature and diameter, over a different
number of neurons in the hidden layer, were 0.9984 and 0.9895, respectively, for NET2
and NET3. These values were much higher than those of COMP1, COMP2 and
COMP3, which were 0.8254, 0.6456 and 0.2444, respectively, for predicting particle
temperature and values of 0.9265, 0.9328 and 0.9419, respectively, for predicting the
particle diameter. Figure 4-24 provides a bar chart for clarity.
Table 4-4 tabulates the standard deviations of R values for the networks in
predicting the particle velocity, temperature and diameter. The modular network NET1
generated the in-flight particle velocity as output only. The networks NET2 and NET3
generated in-flight particle temperature and diameter, respectively. The first row of
Table 4-4 represents the standard deviations of R-values for the networks in predicting
the in-flight particle velocity only. In this case, the values of standard deviations for
NET2 and NET3 are not applicable and are represented by ‘-’. For the standard
deviations of R-values of the network predicting in-flight particle temperature, the
values of NET1 and NET3 are not applicable and are represented by ‘-’. Similarly for in-
flight particle diameter, NET1 and NET2 are represented by ‘-’.
The modular networks NET1, NET2 and NET3 generated stable correlation
coefficient values in comparison to COMP1, COMP2 and COMP3. In predicting the in-
flight particle velocity, temperature and diameter, the modular networks generated the
lowest standard deviations.
In predicting the in-flight particle temperature, the network NET2 generated the
highest R-value in comparison to COMP1, COMP2 and COMP3 (Figure 4-24). The
fluctuations in R-values, however, increased with the drop in the network performance
(Table 4-4). NET2 generated the lowest standard deviation of 0.0006 among all the
networks.
The standard deviation of R-value of NET3, in predicting the in-flight particle
diameter, was the least in comparison to COMP1, COMP2 and COMP3. For COMP1,
COMP2 and COMP3, the fluctuations of R-values increased with the rise of R-value.
Table 4-4: Standard deviations of correlation coefficient (R) for the modular and
general artificial neural networks.

Standard deviation of correlation coefficient (R)
                                  NET1     NET2     NET3     COMP1    COMP2    COMP3
In-flight particle velocity       0.0282   -        -        0.0522   0.6456   0.7311
In-flight particle temperature    -        0.0006   -        0.0255   0.5708   0.8246
In-flight particle diameter       -        -        0.0001   0.0142   0.0725   0.0886
Figure 4-24: Performance comparison of modular networks with general artificial neural
networks in predicting the individual in-flight particle characteristics.
As found in Section 4.2.6.2, in COMP1, the network with 3 and 2 neurons in the
1st and 2nd hidden layers (a total of 5 hidden layer neurons) generated the best
performance on the test set with an R-value of 0.5309. For further use, this network was
marked as COMP1-M. For COMP2, the network with 19 and 18 neurons in the 1st and
2nd hidden layers (a total of 37 hidden layer neurons) provided the maximum R value of
0.9179. This network was saved as COMP2-M. In COMP3, the network with 28 and 27
neurons in the 1st and 2nd hidden layers (a total of 55 hidden layer neurons) generated
the maximum generalization performance, with an R-value of 0.8303. For referencing,
this network was referred to as COMP3-M.
From Section 4.2.6.1, NET1 achieved the highest R-value of 0.8665, for
predicting the in-flight particle velocity, with 23 hidden layer neurons. This network was
named as NET1-M. NET2 predicted the average in-flight particle temperature with a
maximum R-value of 0.9999 with only 3 hidden layer neurons. For further use, this
network was named as NET2-M. NET3 required 2 hidden layer neurons to achieve the
highest R value of 0.9896 in predicting the in-flight particle diameter. For further
referencing, this network was named as NET3-M.
Using the Figure 4-14 structure, the outputs of NET1-M, NET2-M and NET3-M
were combined to generate the final model outputs, labelled as NET-C. NET-C
generated an R-value of 0.8317. Figure 4-25 provides a bar chart of the R-value
comparisons of NET-C with selected general ANNs. The modular ANN NET-C
outperformed COMP1-M and COMP3-M in terms of the R-values. COMP1-M and
COMP3-M generated R-values of 0.5309 and 0.8303, respectively. COMP2-M
generated the maximum R-value of 0.9179 among all the four networks.
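The combination step itself is simple; a minimal sketch is shown below, assuming net1m, net2m and net3m hold the saved NET1-M, NET2-M and NET3-M networks and Xin holds the common input vectors (one sample per column). The names are illustrative assumptions.

```matlab
% Hedged sketch of the NET-C combination: each modular network predicts one
% characteristic and the rows are stacked to mirror a 3-output network.
V = sim(net1m, Xin);        % NET1-M -> in-flight particle velocity
T = sim(net2m, Xin);        % NET2-M -> in-flight particle temperature
D = sim(net3m, Xin);        % NET3-M -> in-flight particle diameter
Ycombined = [V; T; D];      % combined NET-C output
```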
NET1-M, NET2-M and NET3-M required a total number of 23, 3 and 2 hidden
layer neurons, respectively. The combined network output, NET-C, thus, required a
total of 28 hidden layer neurons. This number was lower in comparison to that of
networks COMP2-M and COMP3-M, which required a total of 37 and 55 hidden layer
neurons, respectively. COMP1-M, on the other hand, required the fewest hidden layer
neurons: only 5. Figure 4-25 provides a bar chart comparison of the total
number of hidden layer neurons required by each of the four networks to generate the
R-values stated above.
Figure 4-25: Correlation coefficient (R) and total number of hidden layer neurons
comparison of the combined modular network output model, NET-C, with the general
artificial neural networks.
Table 4-5 presents the network parameter statistics for different
selected networks. The networks chosen generated the best generalization
performance on their respective test sets. The second column in the table presents the
number of effective network parameters used by each of the networks to achieve the
generalization performance while the third column presents the number of parameters
available for the network to use. NET1-M, NET2-M, NET3-M and COMP1-M used the
Bayesian regularization algorithm for network training while COMP2-M and COMP3-M
used the Levenberg-Marquardt algorithm and resilient back-propagation algorithm
respectively.
With reference to the discussion in Section 2.2.2.3, the selection of objective
function parameters α and β constrained the number of parameters required by the
network to achieve the best generalization performance. As a result it was found, for
NET1-M, NET2-M, NET3-M and COMP1-M, that the number of effective network
parameters required to obtain the generalization performance was much less in
comparison to the number of parameters available. For COMP2-M and COMP3-M, the
number of effective parameters was the same as the total number of parameters
available. This arises because the Levenberg-Marquardt and resilient back-propagation
algorithms do not use the regularization technique.
The modular neural networks NET1-M, NET2-M and NET3-M required fewer
network parameters, in comparison to COMP1-M, COMP2-M and COMP3-M, to
achieve their best generalization performance. The standard deviations of the
network parameters, for the modular networks, were also lower and bounded within a
smaller range. Smaller networks with less fluctuating network parameters indicated
robustness of the trained networks.
Table 4-5: Network parameter statistics for different networks.

             Number of effective   Number of parameters   Range of values          Standard
Networks     parameters used       available              Minimum      Maximum     deviation
NET1-M       10                    231                    -0.1874      0.2053      0.0705
NET2-M       11                    31                     -0.8161      1.8848      0.5805
NET3-M       9                     21                     -0.5416      1.3702      0.4460
COMP1-M      24                    44                     -1.6758      2.6102      0.8442
COMP2-M      588                   588                    -4.9520      4.6031      1.0654
COMP3-M      1,119                 1,119                  -5.7411      5.0869      0.9385
The network COMP2-M was chosen alongside the modular networks NET1-M,
NET2-M and NET3-M to simulate the original database DSO in Table 3-1. The choice of
the network COMP2-M was based on the fact that the selected network generated a
maximum R-value of 0.9179 among all the four networks in Figure 4-25. A regression
analysis was performed to obtain the correlation coefficient (R) values. This provided
further insight into the generalization performance of the modular ANNs in comparison
to a general ANN.
Table 4-6 provides the correlation coefficient (R) values of all the networks on
the original dataset DSO. NET1-M generated an R-value of 0.6049 in predicting the in-
flight particle velocity only. For NET1-M, the R-values in predicting the combined in-
flight particle characteristics and the individual in-flight particle temperature and
diameter are not applicable. The spaces are, thus, represented by ‘-’.
The modular ANNs were found to have learned the relationships between input
processing parameters and the output in-flight particle temperature and diameter better
in comparison to the relationship of the input processing parameters with the output in-
flight particle velocity. This is represented by higher R-values computed for NET2-M
and NET3-M in predicting the in-flight particle temperature and diameter, respectively.
The R-values generated by NET2-M and NET3-M were 0.9988 and 0.9859,
respectively. The combined output, NET-C, of the three modular networks generated
an R-value of 0.7916. Similar to that for NET1-M, the R-values for NET2-M, NET3-M
and NET-C, which are not applicable, are represented by ‘-’ in Table 4-6.
In spite of outperforming the modular ANNs previously on the test set, the
network COMP2-M performed poorly on the original dataset, DSO. The predicted
velocity, temperature and diameter values of the modular networks, NET1-M, NET2-M
and NET3-M, demonstrated better coherence and correlation with the experimental
values than those of network COMP2-M. Both the combined outputs and the
individually predicted particle characteristics of COMP2-M generated much lower
correlation coefficients in comparison to those of the modular ANNs (Table 4-6). The
network COMP2-M failed to correctly learn the correlation between each of the input
processing parameters and the output in-flight particle characteristics, which resulted
in a poor generalization result.
The correlation coefficient comparisons represent the overall performance of
the networks. However, further analysis is performed, as below, to view the
generalization performance of both modular and general ANNs in predicting each of
the three output parameters.
Table 4-6: Correlation coefficient (R) value comparisons of the selected networks.

           Correlation coefficient (R)
           In-flight particle   In-flight particle   In-flight particle
           velocity             temperature          diameter             Combined
NET1-M     0.6049               -                    -                    -
NET2-M     -                    0.9988               -                    -
NET3-M     -                    -                    0.9859               -
NET-C      -                    -                    -                    0.7916
COMP2-M    0.4586               0.2452               0.7023               0.4482
The predicted output in-flight particle characteristics from both the modular
networks (NET1-M, NET2-M and NET3-M) and general ANN COMP2-M were
compared with their respective experimental values and the absolute value of the
relative error percentage, with respect to the experimental value, was calculated;
Table 4-7. The absolute average relative error percentages for in-flight particle velocity,
temperature and diameter, generated by the modular networks, are 11.37%, 0.31%
and 1.51%, respectively. For COMP2-M, the values are 10.38%, 8.76% and 26.75%,
respectively. These values are highlighted as bold numbers at the end of Table 4-7.
The performance of NET2-M and NET3-M was much better than COMP2-M in
correlating the output in-flight particle temperature and diameter with each of the
individual input processing parameters. This was depicted by much lower values of
average relative error percentage. The absolute average relative error percentage for
COMP2-M was slightly better than that of NET1-M in predicting the in-flight particle
velocity. However, in comparison to COMP2-M, the individual predicted values
of particle velocity showed less scattering for NET1-M. This is represented by the
higher R-value of 0.6049 generated by NET1-M in predicting the in-flight particle
velocity; Table 4-6.
Table 4-7: The predicted values and absolute relative error percentages for both
modular and the general artificial neural networks.

                          Modular networks                      COMP2-M
Run                       NET1-M    NET2-M     NET3-M
                          V [m/s]   T [°C]     D [μm]     V [m/s]   T [°C]     D [μm]
1   Predicted value       235.86    2,266.83   43.61      220.18    2,149.15   23.38
    Relative error %      2.54      0.21       1.42       9.02      4.99       45.64
2   Predicted value       240.42    2,405.41   49.83      258.50    2,259.24   40.96
    Relative error %      10.96     0.27       2.29       4.26      5.83       19.69
3   Predicted value       245.42    2,441.11   52.17      266.43    2,256.97   48.29
    Relative error %      11.72     0.54       4.34       4.16      7.04       3.42
4   Predicted value       218.34    1,678.37   30.11      227.84    1,937.00   15.00
    Relative error %      6.51      0.20       0.37       11.14     15.64      50.00
5   Predicted value       225.41    2,163.55   38.20      225.53    2,080.62   17.96
    Relative error %      6.47      0.30       0.53       6.42      4.12       52.75
6   Predicted value       231.88    2,346.62   44.56      237.33    2,205.51   25.16
    Relative error %      10.82     0.19       0.98       8.72      6.19       44.09
7   Predicted value       234.88    2,379.97   46.83      245.72    2,238.72   31.14
    Relative error %      11.03     0.29       0.36       6.92      5.66       33.75
8   Predicted value       244.31    2,388.57   49.64      262.78    2,297.79   38.63
    Relative error %      38.81     0.60       2.66       49.31     4.38       24.26
9   Predicted value       218.53    2,442.79   48.68      206.78    1,781.91   46.95
    Relative error %      22.08     0.54       0.66       15.52     27.45      4.18
10  Predicted value       236.95    2,410.18   49.34      250.11    2,215.97   40.06
    Relative error %      9.90      0.72       1.31       4.90      7.40       19.89
11  Predicted value       240.17    2,353.54   48.21      231.12    2,182.94   35.29
    Relative error %      4.69      0.07       0.45       8.29      7.19       26.49
12  Predicted value       240.72    2,447.63   51.12      277.41    2,384.86   50.35
    Relative error %      13.10     0.31       5.33       0.15      2.26       6.75
13  Predicted value       251.19    2,433.87   47.16      246.96    2,393.63   36.98
    Relative error %      6.97      0.01       0.33       8.53      1.66       21.32
14  Predicted value       255.55    2,450.40   51.62      265.16    2,335.82   43.70
    Relative error %      8.08      0.02       0.73       4.62      4.70       15.97
15  Predicted value       243.70    2,482.14   52.75      214.54    1,707.42   29.80
    Relative error %      8.04      0.63       2.31       19.04     31.65      44.82
16  Predicted value       249.71    2,363.36   43.06      263.95    2,267.94   49.41
    Relative error %      10.18     0.02       0.14       5.06      4.02       14.92
    Average relative
    error %               11.37     0.31       1.51       10.38     8.76       26.75
The following discussion considers the performance of the modular and standard
ANNs in correlating the in-flight particle characteristics with changes in each of
the input processing parameters. The absolute average relative error percentages of
the predicted in-flight particle characteristics, for each of the input processing
parameters, are presented in Table 4-8. The better performing networks, for each case,
are highlighted in bold. Supporting the findings in Table 4-7, the modular networks
NET2-M and NET3-M were found to be the better performing networks in predicting the
in-flight particle temperature and diameter. Apart from predicting the in-flight particle
velocity from variations of the injector diameter, COMP2-M performed better in
predicting the in-flight particle velocity from the remaining input processing parameters.
This result correlates with the previous discussion on network performance in Table
4-7.
Table 4-8: Absolute average relative error percentage of the predicted average in-flight
particle characteristics with the variations of each input processing parameter.

                              Absolute average relative error percentage (%)*
                              Velocity, V           Temperature, T        Diameter, D
Input processing parameter    NET1-M    COMP2-M     NET2-M    COMP2-M     NET3-M    COMP2-M
Current intensity             8.40      5.81        0.34      5.95        2.69      22.92
Hydrogen content              8.71      8.30        0.24      7.90        0.56      45.15
Total plasma gas flow rate    23.60     23.24       0.62      13.07       1.54      16.11
Argon carrier gas flow rate   8.90      4.22        0.19      4.72        2.89      16.62
Injector stand-off distance   7.52      6.58        0.01      3.18        0.53      18.65
Injector diameter             9.11      12.05       0.32      17.84       1.22      29.87

* Absolute average relative error percentage of the predicted values with respect to the
experimental values. The bold values indicate the better performing network.
The better performance of the modular ANNs on the original dataset, DSO,
provides a justification that the modular networks were able to successfully learn and
correlate the individual input-output parameter relationships better than the general
ANNs.
4.2.7 Summary
Modular ANN was used to predict the output in-flight particle characteristics of
the APS process from the power and injection parameters. The typical ANN structures
handled the versatility and non-linearity associated with APS in predicting the overall
in-flight particle characteristics. The introduction of the modular ANN in modelling
the APS process was, however, successful and performed better in terms of individually
correlating each of the output parameters with the input power and injection processing
parameters.
One of the objectives behind implementation of modular ANN was to reduce the
model complexity and build simple ANN structures. The use of single hidden layer
architectures in NET1, NET2 and NET3 was able to correlate the input-output
relationships and helped in the construction of simple ANNs. Breakdown of the task
into sub-tasks, with each network concentrating on a single sub-task, simplified the
problem and allowed each of the networks to comprehend the underlying input-output
parameter relationships with a relatively small number of hidden layer neurons. The
smaller hidden layers, in turn, reduced the number of network parameters, and the use
of regularization in the training algorithm further reduced the number of active network
parameters.
The reduced number of parameters, available for network training and
optimization, decreased the fluctuations of the network parameters. The optimum
training condition was achieved with a smaller range of values in the training
parameters. Furthermore, the training process was more stable than typical ANN
structures, with the response of the networks to the changes in the number of hidden
layer neurons following a definite trend. For NET1 and NET3, the changes in R and
generalization error values, with the variations of the number of hidden layer neurons,
presented exponential trends. For NET2, the generalization performance, over different
numbers of hidden layer neurons, was stable except for a few fluctuations. These results
relate to the overall stability and robustness of the trained networks.
Chapter 5 Extreme Learning Machine and Sensitivity Analysis
This chapter is divided into two sections.
Section 5.1 discusses the use of an extreme learning machine algorithm to
predict the in-flight particle characteristics of an atmospheric plasma spray process.
The networks trained with the extreme learning machine algorithm are found to have
good generalization performance, much shorter training times and stable performance
with regard to the changes in number of hidden layer neurons. The trends represent
robustness of the trained networks and enhance reliability of the application of the
artificial neural network in modelling the plasma spray process.
Section 5.2 presents a sensitivity analysis of the various trained artificial neural
networks. The sensitivities of the trained networks' output in-flight particle characteristics
were computed with respect to variations of the input processing parameters.
5.1 Extreme learning machine
Work illustrated in Section 5.1 has been published in the following journal:
T. A. Choudhury, C. C. Berndt, and Z. Man, "An Extreme Learning Machine
Algorithm to Predict the In-flight Particle Characteristics of an Atmospheric Plasma
Spray Process," Plasma Chemistry and Plasma Processing, vol. 33, pp. 993-1023,
2013.
5.1.1 Background
An extreme learning machine algorithm, based on a robust single hidden layer
feed forward neural network (SLFN) structure, is used in this section to model the
atmospheric plasma spray (APS) process in predicting the in-flight particle
characteristics from the input processing parameters. The in-flight particle
characteristics, as described in previous chapters, are considered important
parameters to comprehend the manufacturing process because they affect the in-
service coating properties. Therefore, proper and accurate prediction of in-flight particle
characteristics is essential.
Work on control and modelling the APS process has been performed by the
current author [163, 164] as well as by others [14, 38-40]. The prior studies
implemented a multi-layer perceptron (MLP) feed forward neural network structure with
back-propagation (BP) algorithms for network training. The BP algorithms worked quite
well in training the networks to learn the process dynamics and overcome the non-
linearity and versatility associated with the APS process. However, there are some
disadvantages associated with the training of such feed forward neural networks with
BP algorithms. The disadvantages are outlined below.
The first disadvantage is the network learning speed, which is far slower than
desired. This makes such networks unsuitable for incorporation into any real-time
system, or into an on-line thermal spray control system with a diagnostic tool that
allows the automated system to achieve the desired process stability. The extensively
used back-propagation
algorithms are gradient based learning algorithms, which generally have slow error
convergence speed due to improper learning steps. Furthermore, the entire network
parameters set are required to be trained iteratively, which increases the training times.
Many iterative learning steps may be required by such learning algorithms to obtain
good generalization performance. It is difficult to obtain an optimal value of the network
learning rate parameter η, which defines the speed of convergence. With a small value
of η, the network converges slowly. If η is made large, the algorithm becomes unstable
and diverges. Another peculiarity is the existence of local minima in the error surface
[64]. This causes the algorithm, at times, to stop at the local minima instead of
converging to the global error minimum. Additional validation sets or suitable stopping
criteria are required during training to prevent the networks from being over-trained.
Other research [165, 166] has shown that SLFNs with randomly chosen
input weights and hidden layer biases are capable of learning N distinct
observations with arbitrarily small error. Further work [167] has applied such methods
on artificial and real large applications to show fast learning times and good
generalization performance. Additional studies [168] have indicated that such feed
forward networks can universally approximate any continuous functions on any
compact input set. The concept is different to the general understanding of traditional
function approximation theories [169], where all the network parameters require
adjustment to achieve the best result.
In line with the above discussion, Huang et al. [170, 171] developed the
extreme learning machine (ELM) algorithm based on the following concepts: (i) input
weights and hidden layer biases of SLFNs are randomly assigned; provided that the
activation functions in the hidden layer are infinitely differentiable; (ii) SLFNs are
considered as a linear system; and (iii) output layer weights are determined analytically
through a generalized inverse operation of the hidden layer output matrices. The
evolution of the ELM algorithm has shed new light on the training of feed forward
networks, and the algorithm has been applied in many different areas [172].
The study focuses on the use of an SLFN in combination with both ELM and
standard BP algorithms to train the networks for modeling the APS process. The BP
algorithms are the most commonly used training algorithms and are designed for both
single and multi-layer models [173]. The SLFN structure is found to be successful in
modeling the APS process to correlate the relationship between the output in-flight
particle characteristics and the input power and injection parameters; as well as
handling the non-linearity and fluctuations associated with the APS process. The
combination of the SLFN structure and ELM algorithm, however, goes an extra step
and overcomes the aforementioned difficulties faced during training ANNs with BP
algorithms. No literature has been identified that employs SLFNs and the ELM algorithm
in modeling APS processes.
The learning speed of the ELM algorithm is faster than traditional BP
algorithms. The ELM algorithm also generates relatively good generalization
performance. Unlike standard BP algorithms, the ELM algorithm is easier to implement
and tends to reach the smallest training error and norm of weights. This indicates good
generalization performance according to Bartlett’s [174] theory on generalization
performance of feed forward neural networks. The theory states that, for feed forward
neural networks that reach a smaller training error, the generalization performance
is better when the norm of the weights is smaller.
Section 5.1.2 provides an introduction to the SLFN structure and the database
handling steps. The ELM algorithm is outlined along with different modelling aspects of
the artificial neural networks (ANNs). Construction of additional networks with the SLFN
structure and trained with different standard BP algorithms are also described. Section
5.1.3 presents the simulation results of the networks trained with both ELM and BP
algorithms. The section also provides performance comparison of the networks trained
with ELM and traditional BP algorithms. The performance features comparison enables
an enhanced understanding of the advantages and disadvantages of the ELM
algorithm in training the SLFNs to model the APS process. Section 5.1.4 presents
further analysis of the results and provides a detailed discussion of the research findings.
A summary of the work is presented in Section 5.1.5.
5.1.2 Artificial neural network modelling
The SLFN architecture for modelling the APS process that predicts the in-flight
particle characteristics is shown in Figure 5-1. The input layer consists of 8 data points
and the output layer has 3 neurons. The choices are explained in Section 3.4 of
Chapter 3. In Figure 5-1, $N_1$ represents the number of neurons in the hidden layer;
$w_{ji}$ (where $i = 1, \dots, 8$ and $j = 1, \dots, N_1$) represents the input layer weights;
and $\beta_{ji}$ (where $i = 1, \dots, N_1$ and $j = 1, \dots, 3$) represents the output layer weights.
Figure 5-1: Proposed single layer feed forward network (SLFN) artificial neural network
architecture.
A database, DSO, available from the open literature [40] (Table 3-1) was used in
this work. The database contained 16 data points. To ensure that the extreme learning
machine networks have sufficient data to be trained, DSO was expanded using kernel
regression. The resulting data were tabulated to generate the expanded database,
DSE. Details of the data collection, pre-processing and expansion steps are discussed
in Sections 3.2 and 3.3 of Chapter 3.
The expanded dataset DSE was divided into test and training sets. The test set
is unseen to the network since it is being trained with the training set. Error generated
by the networks on the test set provides a measure of the generalization error. The
trained network’s ability to generalize the process is better when this error is lower.
Twenty per cent of DSE was selected as the test set, DSET, and the remaining 80% as
the training set DSETR. Data division was performed by the process of interleaving,
which ensured that both DSET and DSETR represented an overall view and statistical
representation of the whole database, DSE. The data division ratio was selected such
that the absolute difference in fluctuations of the two data sets was the least. This
indicated that the training and test sets were statistically most similar to each other in
terms of data variations and fluctuations, and would provide a strong base to train a
network having good generalization ability. The work is similar to that reported in one
of the author's previous studies [163].
5.1.2.1 Outline of the extreme learning machine algorithm
This section provides an outline of the ELM algorithm. The algorithm uses a
batch learning technique for the network training process. In the batch learning mode,
the network weight and bias updates are performed after the presentation of all the
training samples, constituting an epoch [64]. The algorithm considers an SLFN with
$N_1$ hidden neurons, where $N_1 \le N$; $N$ being the number of training samples.
For a set of $N$ distinct arbitrary samples $(\mathbf{x}_i, \mathbf{t}_i)$, where
$\mathbf{x}_i = [x_{i1}, x_{i2}, \dots, x_{in}]^T \in \mathbb{R}^n$ are the input vectors and
$\mathbf{t}_i = [t_{i1}, t_{i2}, \dots, t_{im}]^T \in \mathbb{R}^m$ are the target vectors, the
output $\mathbf{y}_j$ of the SLFN with $N_1$ hidden neurons and activation function $g(x)$
can be computed as:

$$ \mathbf{y}_j = \sum_{i=1}^{N_1} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i), \qquad j = 1, \dots, N \qquad \text{(Equation 5-1)} $$

In Equation 5-1, $\mathbf{w}_i = [w_{i1}, w_{i2}, \dots, w_{in}]^T$ is the vector of weights
connecting the $i$th hidden neuron and the input neurons, and
$\boldsymbol{\beta}_i = [\beta_{i1}, \beta_{i2}, \dots, \beta_{im}]^T$ is the weight vector defining
the connection between the $i$th hidden neuron and the output neurons. The bias or
threshold of the $i$th hidden neuron is represented by $b_i$. The inner product of the
weight and input vectors is represented by $\mathbf{w}_i \cdot \mathbf{x}_j$, while the output
neurons are chosen to be linear.
It has been proven [168] that standard SLFNs, with $N_1$ hidden neurons (such that
$N_1 \le N$) and either a linear or a non-linear activation function $g(x)$, can
approximate the $N$ samples with zero error. This conclusion leads to Equation 5-2 and
Equation 5-3.

$$ \sum_{j=1}^{N} \left\| \mathbf{y}_j - \mathbf{t}_j \right\| = 0 \qquad \text{(Equation 5-2)} $$

$$ \sum_{i=1}^{N_1} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{t}_j, \qquad j = 1, \dots, N \qquad \text{(Equation 5-3)} $$

For convenience and easier understanding, Equation 5-3 is re-written in matrix form as
Equation 5-4.

$$ \mathbf{H} \boldsymbol{\beta} = \mathbf{T} \qquad \text{(Equation 5-4)} $$

Here $\mathbf{H}$ is called the hidden layer output matrix of the neural network [165,
175] and is defined by Equation 5-5. Equation 5-6 and Equation 5-7 represent the
matrix forms of $\boldsymbol{\beta}$ from Equation 5-1 and of the target vectors $\mathbf{T}$.
$$ \mathbf{H}(\mathbf{w}_1, \dots, \mathbf{w}_{N_1}, b_1, \dots, b_{N_1}, \mathbf{x}_1, \dots, \mathbf{x}_N) = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_{N_1} \cdot \mathbf{x}_1 + b_{N_1}) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_{N_1} \cdot \mathbf{x}_N + b_{N_1}) \end{bmatrix}_{N \times N_1} \qquad \text{(Equation 5-5)} $$

$$ \boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_1^T \\ \vdots \\ \boldsymbol{\beta}_{N_1}^T \end{bmatrix}_{N_1 \times m} \qquad \text{(Equation 5-6)} $$

$$ \mathbf{T} = \begin{bmatrix} \mathbf{t}_1^T \\ \vdots \\ \mathbf{t}_N^T \end{bmatrix}_{N \times m} \qquad \text{(Equation 5-7)} $$
It is found [165, 166, 175] that when the number of hidden neurons equals the
number of distinct training samples, $N_1 = N$, the matrix $\mathbf{H}$ is square and
invertible. Under these conditions the SLFNs can approximate the training samples with
zero error. However, in many cases the number of hidden layer neurons is much smaller
than the number of distinct training samples, i.e., $N_1 \ll N$. $\mathbf{H}$ then becomes
a non-square matrix and there may not exist $\mathbf{w}_i, b_i, \boldsymbol{\beta}_i$
($i = 1, \dots, N_1$) such that $\mathbf{H} \boldsymbol{\beta} = \mathbf{T}$. Thus, the
specific values $\hat{\mathbf{w}}_i, \hat{b}_i, \hat{\boldsymbol{\beta}}$
($i = 1, \dots, N_1$) must be computed, which leads to Equation 5-8. The equation is
equivalent to minimizing the cost function $E$ (Equation 5-9).

$$ \left\| \mathbf{H}(\hat{\mathbf{w}}_1, \dots, \hat{\mathbf{w}}_{N_1}, \hat{b}_1, \dots, \hat{b}_{N_1}) \hat{\boldsymbol{\beta}} - \mathbf{T} \right\| = \min_{\mathbf{w}_i, b_i, \boldsymbol{\beta}} \left\| \mathbf{H}(\mathbf{w}_1, \dots, \mathbf{w}_{N_1}, b_1, \dots, b_{N_1}) \boldsymbol{\beta} - \mathbf{T} \right\| \qquad \text{(Equation 5-8)} $$

$$ E = \sum_{j=1}^{N} \left\| \sum_{i=1}^{N_1} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) - \mathbf{t}_j \right\|^2 \qquad \text{(Equation 5-9)} $$
The input weights $\mathbf{w}_i$ and the hidden layer biases $b_i$ employed in the ELM
algorithm are randomly initialized and not tuned during the training process. The hidden
layer output matrix $\mathbf{H}$ remains unchanged once the random values have been
assigned to the network parameters at the commencement of the training. From
Equation 5-8 it is observed that, for fixed values of the network parameters
$\mathbf{w}_i$ and $b_i$, training the SLFN is equivalent to computing the least squares
solution $\hat{\boldsymbol{\beta}}$ of the linear system
$\mathbf{H} \boldsymbol{\beta} = \mathbf{T}$. Equation 5-8 can thus be written as
Equation 5-10.

$$ \left\| \mathbf{H}(\mathbf{w}_1, \dots, \mathbf{w}_{N_1}, b_1, \dots, b_{N_1}) \hat{\boldsymbol{\beta}} - \mathbf{T} \right\| = \min_{\boldsymbol{\beta}} \left\| \mathbf{H}(\mathbf{w}_1, \dots, \mathbf{w}_{N_1}, b_1, \dots, b_{N_1}) \boldsymbol{\beta} - \mathbf{T} \right\| \qquad \text{(Equation 5-10)} $$

The optimal output weights for the ELM can be computed using Equation 5-11, where
$\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse of the matrix
$\mathbf{H}$ [176, 177].

$$ \hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger} \mathbf{T} \qquad \text{(Equation 5-11)} $$
The ELM algorithm can be summarized as follows:
1. The input weights $\mathbf{w}_i$ and biases $b_i$, $i = 1, \dots, N_1$, are randomly assigned.
2. The hidden layer output matrix $\mathbf{H}$ is computed.
3. The output weights are computed as $\boldsymbol{\beta} = \mathbf{H}^{\dagger} \mathbf{T}$, where the parameters are
defined in Equation 5-5, Equation 5-6 and Equation 5-7.
In theory, the algorithm works for any infinitely differentiable activation function $g(x)$.
Such activation functions include the sigmoidal functions as well as the radial basis,
sine, cosine, exponential and other non-regular functions. Furthermore, the upper bound
on the number of hidden layer neurons is the number of distinct training samples,
$N_1 \le N$ [175].
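As an illustration of these three steps, a minimal MATLAB sketch is given below. The function names (elm_train, elm_predict), the sigmoid activation and the matrix orientation (one sample per row, matching Equations 5-5 to 5-7) are assumptions for illustration, not the exact code used in this study.

```matlab
function [W, b, beta] = elm_train(X, T, N1)
% Hedged ELM training sketch. X: N x n input matrix (one sample per row),
% T: N x m target matrix, N1: number of hidden neurons (N1 <= N).
[N, n] = size(X);
W = rand(N1, n) * 2 - 1;                            % step 1: random input weights (fixed thereafter)
b = rand(N1, 1);                                    % step 1: random hidden layer biases
H = 1 ./ (1 + exp(-(X * W' + repmat(b', N, 1))));   % step 2: N x N1 hidden layer output matrix
beta = pinv(H) * T;                                 % step 3: Moore-Penrose solution (Equation 5-11)
end

function Y = elm_predict(X, W, b, beta)
% Simulate the trained network: sigmoid hidden layer, linear output layer.
H = 1 ./ (1 + exp(-(X * W' + repmat(b', size(X, 1), 1))));
Y = H * beta;
end
```

Since no iterative weight updates are involved, the whole training step reduces to one matrix pseudoinverse, which is the source of the short training times reported later in this chapter.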
5.1.2.2 Network training conditions
The training dataset DSETR was used for training the networks with ELM
algorithm. The generalization performances of the trained networks were obtained by
simulating and testing the trained networks with the test dataset DSET.
The number of hidden layer neurons of the SLFNs was varied from 1 to 300 with
increments of 1. A sigmoidal activation function was used for the hidden layer while a
linear activation function was used for the output layer. For each neuron number, the
network training was repeated 100 times to observe the variations of performance
parameters over repeated training. The network generating the maximum correlation
coefficient (R) on the test set, DSET, was selected for comparison purposes. The
correlation coefficient (R) value indicated how well the simulated values matched the
actual test output. A greater match would be represented by a higher R-value. The
corresponding generalization error, measured in terms of mean absolute error (MAE),
(Equation 3-7), was also stored along with other network features and performance
parameters.
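Using the elm_train and elm_predict sketches from Section 5.1.2.1, the selection procedure could look roughly as follows; the variable names are assumptions.

```matlab
% Hedged sketch of the repeated ELM training and best-network selection.
% Xtr/Ttr and Xte/Tte are assumed row-wise sample matrices for DSETR and DSET.
bestR = -Inf;
for N1 = 1:300                                   % hidden layer sizes 1 to 300
    for rep = 1:100                              % 100 repetitions per size
        [W, b, beta] = elm_train(Xtr, Ttr, N1);
        Yp   = elm_predict(Xte, W, b, beta);
        Rmat = corrcoef(Yp(:), Tte(:));
        if Rmat(1, 2) > bestR                    % keep the maximum-R network
            bestR = Rmat(1, 2);
            best  = struct('W', W, 'b', b, 'beta', beta, 'N1', N1);
        end
    end
end
```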
5.1.2.3 Construction of additional networks
Three separate ANN sets were based on the SLFN architecture (Figure 5-1). All
three network sets were trained under different gradient descent based BP algorithms.
The first and second network sets were trained with Levenberg-Marquardt [156]
(Section 2.2.2.2) and resilient back propagation [104] (Section 2.2.2.4) algorithms,
respectively, and were labelled as BP-LM and BP-RP. The third network set was
trained with the Bayesian regularization [101] (Section 2.2.2.3) algorithm and was
labelled as BP-BR.
BP-LM and BP-RP used cross validation and an early stopping technique to
combat the problem of over-fitting and, thus, these methods required a separate
validation set. The validation set was not used for any network training purposes. The
data was only used to measure and monitor the error generated, on this particular set,
by the trained network during the training process. This error was termed as the
validation error. At any time during the training, if the validation error increased for a
specific number of epochs, the training was stopped and the network parameters at the
minimum validation error were stored and saved. An increase in the validation error for
a specific number of epochs indicated that the network had started to overfit the
training data. The network BP-BR used a regularization technique to combat over-
fitting and, thus, did not require any separate validation set.
The training dataset DSETR was used for all the three networks. Similarly, DSET
was used for testing the generalization performance of the trained networks. Twenty
per cent of DSETR was selected by interleaving to obtain the validation set for the
networks BP-LM and BP-RP. The remaining DSETR was used for network training. BP-
BR used the whole of the training set, DSETR, because it did not require any additional
validation set.
For each of the networks, the training was initiated with 1 neuron in the hidden
layer. The initial weights and biases were set to random values between 0 and 1. The
maximum number of epochs was set to 300. The transfer function in all layers was set
to tan-sigmoid. The training was repeated 100 times and the network generating
maximum correlation coefficient, R, on the test set, DSET, was stored and saved along
with the other performance measure parameters. The number of neurons in the hidden
layer was increased by unity each time and the training procedure was repeated. A
maximum of 300 neurons was used in the hidden layer.
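A hedged sketch of one such BP configuration is given below, using the Neural Network Toolbox's index-based data division for the early-stopping validation set; the exact API details of the toolbox version used in the study may differ, and the variable names are assumptions.

```matlab
% Illustrative BP-LM setup: single hidden layer SLFN with n1 neurons,
% tan-sigmoid transfer functions and a fixed validation split for early stopping.
net = newff(P, T, n1, {'tansig', 'tansig'}, 'trainlm');
net.trainParam.epochs = 300;               % maximum number of epochs
net.divideFcn = 'divideind';               % index-based data division
net.divideParam.trainInd = trainInd;       % final training samples from DSETR
net.divideParam.valInd   = valInd;         % ~20% of DSETR for early stopping
net.divideParam.testInd  = [];             % DSET is kept as an external test set
net = train(net, P, T);                    % training stops when validation error rises
```

For BP-RP the training function would be swapped to 'trainrp', and for BP-BR to 'trainbr' with the validation split removed, since the regularization technique does not require one.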
5.1.3 Simulation results and performance comparisons
This section provides the simulation results obtained from the ELM algorithm as
well as the other BP algorithms.
5.1.3.1 Extreme learning machine algorithm performance
Figure 5-2 presents the variations of correlation coefficient (R) and
generalization error (MAE) values, with the changes in the number of hidden layer
neurons.
The R-values rose with an increase in the number of hidden layer neurons
(Figure 5-2); however, the rise and the fluctuations were confined within a small range.
maximum R-value obtained was 0.9950 with 242 hidden layer neurons. This network
was named as ‘ELM-1’. The minimum R-value of 0.8959 was obtained with 1 hidden
layer neuron. The average R-value, over variations of 300 hidden layer neurons, was
0.9911. The fluctuation of all the R-values was computed in terms of the standard
deviation and the value was 0.0076.
The generalization error followed the expected trend of decreasing with an
increase of the number of hidden neurons (Figure 5-2). The average generalization
error for all the networks was 0.0081 with a standard deviation of 0.0021. The minimum
error of 0.0070 was found for the network ‘ELM-1’ with 242 hidden layer neurons. The
maximum generalization error of 0.0288 was found for the network with 1 hidden layer
neuron.
Figure 5-2: Generalization performance variations of the networks trained with the
extreme learning machine algorithm with respect to the number of hidden layer
neurons.
The average training time (CPU usage time) for ELM was 1.48 seconds, with a
standard deviation of 0.30 seconds, across all the networks with 1 to 300 hidden layer
neurons. The variations of training times, with the number of hidden layer neurons, are
presented in Figure 5-3. The number of network parameters available for optimization
increased with the increase in the number of hidden layer neurons, which increased the
training time. The network with 300 neurons took the maximum time of 2.15 seconds to
train, and the network with 1 hidden layer neuron required the lowest time of 1.10
seconds. The reference network ‘ELM-1’ took 1.78 seconds for its training.
Figure 5-3: Variations of training times of the networks trained with the extreme
learning machine algorithm with respect to the number of hidden layer neurons.
5.1.3.2 Standard artificial neural networks performance
The variations of generalization performance and training times of the networks
trained with the Levenberg-Marquardt algorithm, with respect to the changes in the
number of hidden layer neurons, are presented in Figure 5-4.
Figure 5-4: Generalization performance and training times of the networks trained with
the Levenberg-Marquardt (LM) algorithm with respect to the number of hidden layer
neurons.
Among the networks trained with the Levenberg-Marquardt algorithm, the
maximum R-value of 0.9956 was generated by the network with 291 hidden layer
neurons. The corresponding generalization error was 0.0058 with a network training
time of 39.84 seconds. The R and generalization error values fluctuated with the
changes in the number of hidden layer neurons. There were no particular trends found
in the performance values. The average R-value, over all the networks, was 0.9238
with standard deviation of 0.0505. The average generalization error was 0.0117 with a
standard deviation of 0.0051. The training times presented a rising and fluctuating
trend with the increment of the number of hidden layer neurons. The maximum and
minimum training times, over all the networks with different numbers of hidden layer
neurons, were 459.29 seconds and 0.24 seconds, respectively. The average training
time was 27.23 seconds with a standard deviation of 41.23 seconds.
The network corresponding to ‘ELM-1’ generated an R-value, generalization
error and training time of 0.9263, 0.0191 and 31.95 seconds, respectively. This network
is referred to as ‘LM-1’.
The resilient back-propagation algorithm required 74 hidden layer neurons to
generate the maximum R-value of 0.9881 with a generalization error of 0.0106 and
training time of 0.67 seconds. The variations of network performance, over a different
number of hidden layer neurons, are presented in Figure 5-5.
The R-values reduced considerably with an increase in the number of hidden
layer neurons. The R and generalization error values flattened beyond 150 hidden layer
neurons; the fluctuations, however, were still present. The average R-value of all the
networks trained with the resilient back-propagation algorithm was 0.7979 with a
standard deviation of 0.1192. The corresponding average generalization error was
0.1386 with a standard deviation of 0.1557. The fluctuations of training times increased
with the increase in the number of hidden layer neurons. The average training time for
all the networks was 1.15 seconds with a standard deviation of 0.92 seconds.
In reference to the network ‘ELM-1’, the corresponding network trained with the
resilient back-propagation algorithm generated an R-value and a generalization error of
0.8441 and 0.0205, respectively. The training time was 3.64 seconds. This network is
named as ‘RP-1’.
Figure 5-5: Generalization performance and training times of the networks trained with
resilient back-propagation (RP) algorithm with respect to the number of hidden layer
neurons.
In comparison to the Levenberg-Marquardt and the resilient back-propagation
algorithms, the Bayesian regularization algorithm generated the maximum R-value of
0.9986 and the lowest generalization error of 0.0024 with 179 hidden layer neurons.
However, the algorithm required a larger training time of 1,140.60 seconds.
The variations of training times and generalization performance, over a different
number of hidden layer neurons, are presented in Figure 5-6. The generalization
performance parameters presented a definite trend and fluctuated much less in
comparison to that of the networks trained with the Levenberg-Marquardt and resilient
back-propagation algorithms. The average R-value for all the networks, with a different
number of hidden layer neurons, was 0.9977 with a standard deviation of 0.0016. The
corresponding generalization error and its standard deviation was 0.0033 and 0.0011,
respectively. The average training time was 1,322.45 seconds. The training time
increased rapidly with an increase in the number of hidden layer neurons. The standard
deviation of the training times was 1,475.12 seconds.
Among all the networks trained with Bayesian regularization, the network
corresponding to ‘ELM-1’ showed an R-value of 0.9972, a generalization error of
0.0043 and a training time of 2,730.97 seconds. This network is termed as ‘BR-1’.
Figure 5-6: Generalization performance and training times of the networks trained with
Bayesian regularization (BR) algorithm with respect to the number of hidden layer
neurons.
5.1.3.3 Network performance comparisons
A summary of the performance of networks, trained with ELM, Levenberg-
Marquardt, resilient back-propagation and Bayesian regularization algorithms, is
provided in Table 5-1. The comparisons of the generalization and training
performances of the networks indicate superior performance of the ELM algorithm in
comparison to the back propagation algorithms, with respect to the training times.
The average generalization performances of the four different networks are
presented in Figure 5-7. The average R-value of all the networks trained with ELM was
computed to be 0.9911 (Table 5-1). This value was greater in comparison to that of
networks trained with the Levenberg-Marquardt and the resilient back-propagation
algorithms. The average R-value of the networks trained with the Bayesian
regularization algorithm was, however, slightly higher than that of the networks trained
with ELM, by 0.0066 (0.9977 − 0.9911); Table 5-1 and Figure 5-7.
Table 5-1: Summary of the training performances of the extreme learning machine (ELM)
and back-propagation (BP) algorithms in training the artificial neural networks, with
variations of the number of hidden layer neurons from 1 to 300.

Extreme Learning Machine (ELM) algorithm
                     Correlation       Generalization    Training time
                     Coefficient (R)   Error (MAE)       (seconds)
Maximum value        0.9950            0.0288            2.15
Minimum value        0.8959            0.0070            1.10
Average value        0.9911            0.0081            1.48
Standard deviation   0.0076            0.0021            0.30

Levenberg-Marquardt algorithm
Maximum value        0.9956            0.0402            459.29
Minimum value        0.7320            0.0040            0.24
Average value        0.9238            0.0117            27.23
Standard deviation   0.0505            0.0051            41.23

Resilient back-propagation algorithm
Maximum value        0.9881            0.7812            4.39
Minimum value        0.3208            0.0100            0.15
Average value        0.7979            0.1386            1.15
Standard deviation   0.1192            0.1557            0.92

Bayesian regularization algorithm
Maximum value        0.9986            0.0159            5,161.16
Minimum value        0.9745            0.0023            1.00
Average value        0.9977            0.0033            1,322.45
Standard deviation   0.0016            0.0011            1,475.12
The R-values generated by the networks trained with the ELM algorithm
fluctuated less in comparison to networks trained with the Levenberg-Marquardt and
resilient back-propagation algorithms. The R-values for the ELM algorithm exhibited a
standard deviation of 0.0076, while the Levenberg-Marquardt and resilient back-
propagation algorithms showed standard deviations of 0.0505 and 0.1192,
respectively. The Bayesian regularization algorithm generated R-values with slightly
lower fluctuation than the ELM algorithm; the standard deviation of all the R-values of
the networks trained with the Bayesian regularization algorithm was 0.0016 (Table
5-1).
Figure 5-7: Average generalization performance comparison of the extreme learning
machine algorithm with standard back-propagation algorithms.
The average generalization error of all the networks trained with the Bayesian
regularization algorithm and its standard deviation were 0.0033 and 0.0011,
respectively; these values were the lowest among all the networks (Table 5-1). The
average generalization error of the networks trained with the ELM algorithm was 0.0081,
with a standard deviation of 0.0021 (Table 5-1). In agreement with the results obtained
with the R-values, both the average generalization error and its standard deviation
were much smaller than those of the networks trained with the Levenberg-Marquardt and
resilient back-propagation algorithms (Figure 5-7).
The networks trained with the resilient back-propagation algorithm demonstrated
the lowest average training time of 1.15 seconds, followed by 1.48 seconds for the
networks trained with the ELM algorithm. The networks trained with the
Levenberg-Marquardt and Bayesian regularization algorithms required longer training
times of 27.23 and 1,322.45 seconds, respectively (Table 5-1).
The fluctuation in training times, over the entire set of networks with 1
to 300 hidden layer neurons, was the lowest for the networks trained with the ELM
algorithm, followed by the resilient back-propagation and Levenberg-Marquardt
algorithms. The training times of the networks trained with the Bayesian regularization
algorithm fluctuated the most, with a standard deviation of 1,475.12 seconds (Table 5-1).
Figure 5-8 provides bar chart comparisons of the generalization performances
of the selected networks ‘ELM-1’, ‘LM-1’, ‘RP-1’ and ‘BR-1’. Alongside the figure, Table
5-2 provides a detailed summary of the generalization performances and the training
times. The correlation coefficient (R) value generated by ‘ELM-1’ was the highest in
comparison to ‘LM-1’ and ‘RP-1’. The R-value of ‘BR-1’ was slightly higher than that of
‘ELM-1’ by 0.0022. In terms of the generalization error performance, the ‘ELM-1’
network again outperformed the networks ‘LM-1’ and ‘RP-1’. The generalization error of
‘BR-1’ was, however, slightly better than that of ‘ELM-1’, by 0.0027. In terms of
network training times, ‘ELM-1’ outperformed the other three networks, with the lowest
training time of 1.78 seconds compared with 31.95 seconds for ‘LM-1’, 3.64 seconds for
‘RP-1’ and 2,730.97 seconds for ‘BR-1’.
Figure 5-8: Generalization performance comparisons of the selected networks trained
with extreme learning machine and standard back-propagation algorithm.
Table 5-2: Summary of the generalization performances of the different selected
artificial neural networks.

                               ELM-1     LM-1      RP-1      BR-1
Correlation Coefficient (R)    0.9950    0.9263    0.8441    0.9972
Generalization Error (MAE)     0.0070    0.0191    0.0205    0.0043
Training time (seconds)        1.78      31.95     3.64      2,730.97
Summarizing the results, the network ‘ELM-1’ outperforms the networks ‘LM-1’ and
‘RP-1’ on the basis of correlation coefficient (R) values, generalization error and
training times. ‘BR-1’ performs slightly better than ‘ELM-1’ in terms of R-value (by
0.0022) and generalization error (by 0.0027). However, the training time for ‘BR-1’ is
over 1,500 times that of ‘ELM-1’.
5.1.4 Result analysis and discussion
The ELM algorithm outperformed the Levenberg-Marquardt algorithm and the
resilient back-propagation algorithm in training the ANNs to model the APS process in
predicting the in-flight particle characteristics from the input processing parameters.
The slight advantage of the Bayesian regularization algorithm, a small improvement in
generalization performance, was overshadowed by the disadvantage of large training
times. This makes the Bayesian regularization algorithm impractical and unsuitable for
training ANNs used in an on-line control system.
The ELM algorithm randomly assigns the input weights and optimizes only the
output layer weights during network training, which greatly reduces the training time.
Such small training times imply that the algorithm is well suited to an on-line APS
control system, where the network could be continuously trained and updated with new
spray data in a real-time spray environment. This result is specific to the plasma
spray process and derives from the nature of the experimental data.
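As an illustration of this training scheme, a minimal numpy sketch of ELM is given
below. It is not the thesis implementation: the sigmoid activation, the uniform weight
range and the pseudo-inverse solution are standard ELM choices assumed here.

```python
import numpy as np

def elm_train(X, Y, n_hidden, seed=0):
    """Fit a single-hidden-layer network with ELM: random hidden layer,
    least-squares output layer."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # sigmoid hidden outputs
    beta = np.linalg.pinv(H) @ Y                             # output weights in one solve
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because the only fitted quantity is `beta`, obtained from a single linear solve,
retraining on newly acquired spray data is cheap, which is what makes the approach
attractive for on-line use.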
The chosen network ‘ELM-1’ was further used to simulate the original database
DSO (Table 3-1). The predicted values obtained were compared with their respective
experimental ones and the corresponding correlation coefficient (R) and generalization
error (MAE) values computed. The R value was 0.9902 with a corresponding
generalization error of 0.0071. This result shows good performance of the ELM
algorithm in training the ANN and represents the overall performance of the network.
Further analysis was, thus, performed to view the generalization performance in
predicting each of the three output parameters and the correlations drawn by the ANN
between each of the input processing parameters and the output in-flight particle
characteristics.
The absolute relative error percentage (with respect to the experimental values)
was computed for each predicted in-flight particle characteristic value (Table 5-3).
The absolute average relative error percentages for the in-flight particle velocity,
temperature and diameter were 1.60%, 0.52% and 0.63%, respectively. The velocity,
temperature and diameter values predicted by the network ‘ELM-1’ demonstrate good
coherence and correlation with the experimental values. The order of magnitude of the
errors obtained is well within the experimental errors of these physical measurements,
implying that the methods adopted with the ELM algorithm are acceptable. All the
predicted values were obtained from analysis of the original database and represent the
existing correlations.
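As a check on how these figures are obtained, a small numpy sketch follows; the three
sample velocities are taken from the first rows of Table 5-3.

```python
import numpy as np

# Experimental and predicted in-flight particle velocities (first rows of Table 5-3).
experimental = np.array([242.0, 270.0, 278.0])
predicted    = np.array([241.58, 264.84, 277.71])

# Absolute relative error percentage, per point and averaged over the column.
re_pct = np.abs(predicted - experimental) / experimental * 100.0
print(re_pct)         # approximately [0.17, 1.91, 0.10], matching the RE rows
print(re_pct.mean())
```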
Table 5-3: Input processing parameters along with the corresponding experimental (E)
and predicted (P) in-flight particle characteristics values. The individual and average
absolute relative error percentages (RE) are also given. Note: the variations of each of
the input processing parameters are highlighted in bold. The other parameter values were
held constant at their reference values.

I [A]  VAr     VH2     VCG     Dinj  ID           V         T          D
       [SLPM]  [SLPM]  [SLPM]  [mm]  [mm]         [m/s]     [°C]       [μm]
350    40      14      3.2     6     1.8    E     242       2262       43
                                            P     241.58    2261.11    42.91
                                            RE    0.17      0.04       0.22
530    40      14      3.2     6     1.8    E     270       2399       51
                                            P     264.84    2378.00    49.96
                                            RE    1.91      0.88       2.04
750    40      14      3.2     6     1.8    E     278       2428       50
                                            P     277.71    2427.66    49.93
                                            RE    0.10      0.01       0.14
530    40      0       3.2     6     1.8    E     205       1675       30
                                            P     206.41    1721.15    30.16
                                            RE    0.69      2.76       0.54
530    40      4       3.2     6     1.8    E     241       2170       38
                                            P     239.66    2140.52    37.74
                                            RE    0.56      1.36       0.69
530    40      8       3.2     6     1.8    E     260       2351       45
                                            P     260.28    2356.58    45.16
                                            RE    0.11      0.24       0.36
530    40      10      3.2     6     1.8    E     264       2373       47
                                            P     265.30    2397.93    47.94
                                            RE    0.49      1.05       1.99
530    45      15      3.2     6     1.8    E     176       2403       51
                                            P     194.53    2390.26    50.58
                                            RE    10.53     0.53       0.83
530    22.5    7.5     3.2     6     1.8    E     179       2456       49
                                            P     180.49    2451.50    48.94
                                            RE    0.83      0.18       0.13
530    37.5    12.5    3.2     6     1.8    E     263       2393       50
                                            P     287.43    2395.75    49.51
                                            RE    9.29      0.11       0.97
530    40      14      2.2     6     1.8    E     252       2352       48
                                            P     251.61    2351.77    47.97
                                            RE    0.15      0.01       0.06
530    40      14      4.4     6     1.8    E     277       2440       54
                                            P     277.01    2438.78    53.90
                                            RE    0.00      0.05       0.18
530    40      14      3.2     7     1.8    E     270       2434       47
                                            P     271.76    2461.39    47.85
                                            RE    0.65      1.13       1.81
530    40      14      3.2     8     1.8    E     278       2451       52
                                            P     278.00    2451.22    51.99
                                            RE    0.00      0.01       0.01
530    40      14      3.2     6     1.5    E     265       2498       54
                                            P     264.86    2497.48    53.97
                                            RE    0.05      0.02       0.05
530    40      14      3.2     6     2.0    E     278       2363       43
                                            P     277.91    2362.71    43.00
                                            RE    0.03      0.01       0.01

Absolute Average Relative Error Percentage (%)    1.60      0.52       0.63

"E" represents the experimental value, "P" the predicted value and "RE" the absolute
relative error percentage (%).
Each of the predicted and experimental average in-flight particle
characteristics was plotted against the six input processing parameters; i.e., the
current intensity, hydrogen flow rate, total plasma gas flow rate, argon carrier gas
flow rate, the injector stand-off distance and the injector diameter (Figure 5-9 to
Figure 5-14). The plots compare the predicted values with the experimental data and
provide insights into the parameter relationships and correlations of the APS process.
Figure 5-9 presents the in-flight particle characteristics plotted against the arc
current intensity values. The predicted velocity and temperature values increase with
an increase of arc current intensity. The predicted diameter shows a similar effect,
except for a slight decrease at the highest current value, which could be a result of
particle vaporization at higher power levels. The results correlate with the
experimental values, and similar behaviour has been reported for different materials
[15, 16, 19].
Figure 5-9: Variations of in-flight particle characteristics with the changes in current
intensity.
The predicted values of the in-flight particle characteristics follow the
experimental values in presenting a rising trend with an increase of the hydrogen
secondary plasma gas flow rate; Figure 5-10. The hydrogen content in the plasma gas
improves the velocity, temperature and enthalpy of the plasma jet [158] along with the
heat and momentum transfer to the particles [159]. These conditions improve the
overall in-flight particle characteristics [160, 161].
From Figure 5-11, the predicted in-flight particle velocity increases with an
increase of the total plasma gas flow rate. The predicted particle temperature, on the
other hand, is found to drop initially and then rise rapidly. The results correlate
with the experimental values but partially contradict the findings reported in the
literature [161], which indicate an increase in both velocity and temperature with an
increase of the total plasma gas flow rate. From 30 SLPM (Run 9: Table 3-1) to 40 SLPM
(Run 4: Table 3-1), the argon primary plasma gas flow rate is nearly doubled, while the
hydrogen secondary plasma gas flow rate is reduced to zero. This is directly related to
the increase of the momentum transmitted from the plasma jet to the particles, which
decreases the particle residence time in the plasma jet and could result in a drop in
particle temperature. The predicted diameter values correlate with the trend presented
by the experimental values.
The VAr and VH2 values of 45 and 15 SLPM (Run 8: Table 3-1) were not
considered because the VAr value was greater than its highest individual limit.
Including it would introduce bias into the experimental values, since Run 8 is well
outside the range of conventional thermal spray processing parameters. This value was,
therefore, excluded from the analysis, as observations drawn from the whole data set
would otherwise be inconclusive.
Figure 5-10: Variations of in-flight particle characteristics with the changes in hydrogen
plasma gas flow rate.
Figure 5-11: Variations of in-flight particle characteristics with the changes in total
plasma gas flow rate.
An increase in the carrier gas flow improves particle penetration into the core of
the plasma jet [1, 62]. This results in an increase of the in-flight particle characteristics.
The predicted values correlate to both the experimental database and the findings in
the literature (Figure 5-12).
Variations of injector stand-off distance and injector diameter influence particle
penetration into the plasma jet [62]. An increase in the injector stand-off distance
should improve the particle characteristics. On the other hand, an increase in the
injector diameter should lower the in-flight particle characteristic value.
Figure 5-13 presents an improvement of all the predicted values of the in-flight
particle characteristics with the increase of injector stand-off distance. This finding
correlates with the experimental values as well as those from the literature. Figure 5-14
shows the predicted in-flight particle values, along with the experimental values,
against the change in injector diameter. The experimental and simulation results are,
however, difficult to interpret: the experimental velocity and diameter values increase
with the injector diameter, whereas the temperature decreases. The predicted values are
in complete coherence, both in terms of values and trends, with the experimental
values.
The above analysis helps in understanding the effects of variations of input
processing parameters on the output in-flight particle characteristics of the plasma
spray process. It further demonstrates the ability of the ELM algorithm to train an ANN
in modelling the APS process and learn the underlying relationships between the input
and output parameters. The ELM algorithm was used to train the networks using the
expanded database, and the trained networks performed well when tested with the
original database. This indicates that the information contained in the expanded
dataset is similar to that in the original dataset.
Figure 5-12: Variations of in-flight particle characteristics with the changes in carrier
gas flow rate.
Figure 5-13: Variations of in-flight particle characteristics with the changes in injector
stand-off distance.
Figure 5-14: Variations of in-flight particle characteristics with the changes in injector
diameter.
5.1.5 Summary
SLFNs were used in this work to model the APS process in predicting the in-
flight particle characteristics from the input processing parameters. The ELM algorithm
used to train the SLFNs was successful in modelling the process dynamics. The ELM
algorithm showed better performance than most of the standard back propagation
algorithms used to train multi-layer feed forward networks. Simulation results confirm
the better performance of ELM both in terms of good generalization ability and shorter
training times.
Furthermore, the generalization performance of the ELM algorithm, over networks
with different numbers of hidden layer neurons, was more stable than that of the
back-propagation algorithms. These features reflect the stability and robustness of the
network learning process. The stability, robustness and significantly reduced training
times of ELM make it a desirable candidate for incorporation into an on-line plasma
spray control system. Such a system would benefit the plasma spray manufacturing
process and assist spray engineers in reducing the time and complexity associated with
spray tuning and setting the crucial thermal spray parameters.
5.2 Sensitivity analysis of neural networks
5.2.1 Background
In a real-time spray process there are variations of the input processing
parameters over time. These variations affect the in-flight particle characteristics,
and it is important to know how the designed ANN model responds to such fluctuations. A
network with good generalization ability is not expected to respond to slight
variations of the input processing parameters; it should only respond if the variations
exceed a specified limit, which could be pre-determined before the start of the spray
process.
Most of the variations of input processing parameters result from mechanical
disturbances and can be considered as noise for ANN modelling purposes. The model
is expected to show a certain degree of robustness in compensating for noise. This
would enhance the reliability of the designed ANN model when incorporated into an
on-line plasma spray system.
Even if the input parameters remain constant in a real-time spray environment,
disturbances can occur in the network parameters, namely the weights and biases. These
disturbances are important and affect the network output in-flight particle
characteristics. However, the artificial neural network models proposed for an on-line
control system are generally in the form of a computer program without hardware
implementation. The probability of the network parameters fluctuating is, thus, low and
can be ignored. Hence, this study only considers the effects of fluctuations of the
input processing parameters on the designed ANN models in predicting the output
in-flight particle characteristics.
Uniformly distributed noise is generated in this study to simulate the effect of
input parameter disturbances. The noise is gradually added to the model inputs to
simulate its effect on the outputs of the trained ANNs. The response of the various
networks to different levels of noise was computed and compared with the original
output in-flight particle characteristics. The correlation coefficient (R) values were
computed and the results analysed to find the degree of fluctuation. Figure 5-15
provides a flowchart illustrating the work described in this section.
This study uses both MLP and SLFN structures to observe the effects of input
noise on the model output in-flight particle characteristics. Error BP algorithms are
used to train both MLPs and SLFNs, and the ELM algorithm is also used to train SLFNs.
The sensitivity and robustness of the different networks in handling input noise are
compared and analysed.
Section 5.2.2 introduces the database handling and the noise generation and
addition process. Section 5.2.3 introduces the different ANN models used to carry out
the noise sensitivity analysis. The simulation results, analysis and discussion are
presented in Section 5.2.4, with a summary in Section 5.2.5.
Figure 5-15: Flowchart of the sensitivity analysis of designed artificial neural network
models to the fluctuations of the atmospheric plasma spray input processing
parameters.
5.2.2 Database processing and noise addition
The unexpanded original database, DSO (Table 3-1), was used in this section
as the test set for simulating the effects of noise on different, already developed, ANNs.
The database contained 16 data points and was linearly transformed within values of 0
and 1 using Equation 3-1. The normalization ensures equal treatment from ANN in
handling and processing the data. It also prevents any calculation error related to
different parameter magnitudes.
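Equation 3-1 is not reproduced here; assuming it is the standard min-max transform, the
normalization and its inverse would take the following form.

```python
import numpy as np

def normalize(x, x_min, x_max):
    """Linearly map a parameter value into [0, 1] (assumed min-max form of Equation 3-1)."""
    return (x - x_min) / (x_max - x_min)

def denormalize(x_norm, x_min, x_max):
    """Inverse transform, back to real parameter units."""
    return x_norm * (x_max - x_min) + x_min
```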
The MATLAB Simulink uniform noise generator block was used to generate the
uniformly distributed noise. The disturbances to the APS input processing parameters
generally occur within a small permissible range, so the generated noise should also
lie within a small range of values to reflect such effects. The uniform noise generator
block was used because it allows both the upper and lower bounds of the noise values to
be pre-defined. The absolute upper limit was set to 0.25 and the absolute lower limit
to -0.25. The defined limits allowed generation of both positive and negative noise
values: positive noise simulated increases in the input processing parameters, while
negative noise represented decreases.
Out of the six APS input processing parameters, the injector diameter and
injector stand-off distance have discrete values and are highly unlikely to be affected by
any disturbances. Therefore, the noise was only added to the remaining four input
parameters, namely: (i) arc current intensity, (ii) argon gas flow rate, (iii) hydrogen gas
flow rate, and (iv) argon carrier gas flow rate. Separate sets of uniform noise were
generated for each of the input processing parameters.
The noise generated by MATLAB represents normalized values. Equation 3-1
was used to convert the normalized values to real parameter values. Taking the
example of current intensity, the normalized uniformly distributed noise generated by
the MATLAB Simulink block lay within the lower and upper limits of -0.2453 to 0.2396.
Using Equation 3-1, the corresponding real parameter values are -131.88 A to
128.82 A. Table 5-4 provides the maximum and minimum values of the range of
uniformly distributed noise generated for each input processing parameter. Both the
normalized and real parameter values are presented.
Table 5-4: Upper and lower limits of the uniformly distributed noise values generated
for each of the atmospheric plasma spray input processing parameters.

                                           Normalized Values     Real Parameter Values
Input Processing Parameter                 Upper      Lower      Upper      Lower
Current Intensity, I [A]                   0.2396     -0.2453    128.82     -131.88
Argon Gas Flow Rate, VAr [SLPM]            0.2255     -0.2412    5.41       -5.79
Hydrogen Gas Flow Rate, VH2 [SLPM]         0.2493     -0.2482    4.24       -4.22
Argon Carrier Gas Flow Rate, VCG [SLPM]    0.2127     -0.2393    0.64       -0.72
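A sketch of the noise generation step is given below. The actual work used the MATLAB
Simulink block; numpy's uniform generator stands in for it here, and the
current-intensity range is a hypothetical placeholder, not the thesis value. Note that
a noise increment in normalized space maps to real units through the span of the
parameter range alone, since the offset of the min-max transform cancels for
differences.

```python
import numpy as np

rng = np.random.default_rng(42)
noise_norm = rng.uniform(-0.25, 0.25, size=16)  # one noise value per data point

# Hypothetical current-intensity range [A]; placeholder only.
I_min, I_max = 300.0, 840.0

# Normalized noise increments scale by the range span (offset cancels).
noise_real = noise_norm * (I_max - I_min)
print(noise_real.min(), noise_real.max())
```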
5.2.3 Artificial neural network models
The effects of input parameter noise were tested on the ANN models proposed
and developed in Chapter 3, Chapter 4 and Section 5.1 of Chapter 5. The models
previously selected from each section, in terms of generating the maximum correlation
coefficient (R) values, were chosen for this purpose: NN1, NN2, 111-M, NET-C and
ELM-1. A brief review of how each of these networks was obtained is given in the
following paragraphs.
Chapter 3 used a general MLP ANN structure for modelling the APS to predict
the in-flight particle characteristics from the input power and injection parameters. The
Levenberg-Marquardt and Bayesian regularization back propagation algorithms were
used for network training purpose.
Figure 3-6 and Table 3-4 presented the generalization performance of the
networks trained with the Levenberg-Marquardt algorithm using both the unexpanded
training dataset, DSOTR, and the expanded training dataset, DSETR. Section 3.5
compared the generalization performances of the different network models. The
analysis highlighted the network with nine and eight neurons in the first and second
hidden layers, respectively, as generating the lowest generalization error of
2.00x10-5 and a corresponding R-value of 0.9988. This network was referred to as NN1.
Figure 3-9 presented the accumulated results for all the networks trained with
the expanded training dataset, DSETR, using the Bayesian regularization algorithm.
Section 3.5 presents the performance comparison of all such trained networks. The
network, with a combination of eight and seven neurons in the first and second hidden
layers, generated the maximum R-value of 0.9996 with a corresponding minimum
generalization error of 7.79x10-6. This network was referred to as NN2.
Section 4.1 proposed and used an optimized MLP structure ‘111’ to model the
APS process in predicting the output in-flight particle characteristics from the input
processing parameters. Figure 4-3 in Section 4.1.5.1 presented the bar chart
comparison of R-values and generalization errors of all the networks with structure
‘111’ and having different combinations of neurons in the hidden layers. All the
networks were trained with the Levenberg-Marquardt algorithm with the expanded
training dataset, DSETR. The network with 8 and 7 neurons, in the 1st and 2nd hidden
layers respectively, was marked as ‘111-M’. This network generated the maximum
correlation (an R-value of 0.9996) between the predicted and actual outputs, when
simulated with the test set.
Section 4.2 used a modular ANN method to model the APS process. The
method allowed ANN to individually correlate each of the three output in-flight particle
characteristics with the APS input processing parameters. The APS process was, thus,
split into three sub-processes and each sub-process was assigned a different ANN,
termed as NET1, NET2 and NET3. NET1 was used to model the in-flight particle
velocity with the APS input parameters. NET2 and NET3 were used for modelling the
in-flight temperature and diameter, respectively, with the selected APS parameters.
The process is illustrated in Section 4.2.2. The database obtained from the literature
and presented in Table 3-1 was split up accordingly and was used for network training
and testing purposes, Section 4.2.3. A Bayesian regularization algorithm was used for
network training, Section 4.2.4.
Section 4.2.6 presents the simulation results for modular ANN implementation
of the APS process along with detailed comparison, analysis and discussion. From
Section 4.2.6.3, it was found that NET1 achieved the highest R-value of 0.8665 for
predicting the in-flight particle velocity, with 23 hidden layer neurons; this network
was named NET1-M. NET2 predicted the average in-flight particle temperature with a
maximum R-value of 0.9999 using 3 hidden layer neurons and was named NET2-M. NET3
required 2 hidden layer neurons to achieve its highest R-value of 0.9896 in predicting
the in-flight particle diameter and was named NET3-M. The outputs of NET1-M, NET2-M
and NET3-M were combined, using the Figure 4-14 structure, to generate the final model
outputs, labelled NET-C. NET-C generated an R-value of 0.8317 (Section 4.2.6.3).
Section 5.1 used a single layer feed forward neural network (SLFN) structure to
model the APS process in predicting the output in-flight particle characteristics from the
input power and injection processing parameters. A fast extreme learning machine
(ELM) algorithm was used to train the ANNs. The expanded training dataset DSETR was
used for network training. The generalization performances of the trained networks
were obtained by testing the trained networks with the expanded test dataset DSET.
Section 5.1.2 illustrates the database handling steps, while Section 5.1.2.2 elaborates
the network training conditions.
Section 5.1.3.1 presented the performance of the ELM algorithm in training the
ANNs to correlate the input/output parameter relationships. Figure 5-2 presents the
variations of the correlation coefficient (R) and generalization error (MAE) values
with changes in the number of hidden layer neurons. The maximum R-value obtained was
0.9950 with 242 hidden layer neurons; this network is named ‘ELM-1’ and also produced
the minimum error of 0.0070.
5.2.4 Simulation result analysis and discussion
The generated noise was gradually added to the specified input parameters of
DSO. The amount of noise added was varied from zero to one hundred percent, in
increments of one percent. The selected networks NN1, NN2, 111-M, NET-C and ELM-1
were simulated with the noisy database at each incremental percentage of noise. The
correlation coefficient (R) value was computed for the resultant network outputs with
respect to the outputs in DSO. The R-value provides an understanding of the degree of
deviation of the network outputs from the expected values: a smaller R-value indicates
a greater deviation of the predicted values from the original outputs in DSO.
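The sweep itself has a simple form; the sketch below, continuing the numpy
illustrations above, assumes `predict` is the forward pass of any trained model and
that the three outputs are flattened before computing R.

```python
import numpy as np

def sensitivity_sweep(predict, X, noise, y_ref):
    """Return R between noisy and clean outputs for 0 to 100 percent noise."""
    r_values = []
    for pct in range(101):
        X_noisy = X + (pct / 100.0) * noise            # incremental noise addition
        y_noisy = np.asarray(predict(X_noisy)).ravel()
        r = np.corrcoef(y_noisy, np.asarray(y_ref).ravel())[0, 1]
        r_values.append(r)
    return np.array(r_values)
```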
The R-value generated by the selected networks with zero percent noise
represents the ideal condition, the condition under which the networks were modelled,
trained and tested; this value is termed R0. The variations of the R-values for each
of the networks with the gradual addition of noise were observed. The trends provide
a good understanding of how each of the artificial neural networks, trained with
different algorithms, network structures and training conditions, responds to
fluctuations of the input parameters. Figure 5-16 to Figure 5-20 provide graphs of the
change in R-value for each of the selected networks with the increase of the noise
percentage on the input test data.
For the network NN1 in Figure 5-16, the correlation coefficient dropped from its
maximum value of R0 = 0.9154 to 0.3149 with one hundred percent addition of input
noise. Apart from a slight flattening in the 30 to 50 percent noise range, the drop in
the correlation coefficient value was steady. The total drop in R-value was 0.6005
(ΔR = R0 - RMIN = 0.9154 - 0.3149 = 0.6005).
Figure 5-16: Variations of correlation coefficient (R) values of the selected network NN1
output in-flight particle characteristics with the gradual addition of noise to the
atmospheric plasma spray specified input processing parameters.
In comparison to the performance of NN1 in Figure 5-16, the performance of
NN2, shown in Figure 5-17, was irregular and noisy. With the addition of only one
percent noise, the R-value dropped from its maximum of 0.9996 to 0.5732, and then
rapidly to around 0.3. The R-value fluctuated around 0.3 until the input disturbance
reached around 25 percent, after which there was another sudden drop. The minimum
R-value achieved was -0.0690, at an input noise percentage of 52 percent.
Figure 5-17: Variations of correlation coefficient (R) values of the selected network NN2
output in-flight particle characteristics with the gradual addition of noise to the
atmospheric plasma spray specified input processing parameters.
For the network 111-M in Figure 5-18, there was an exponential decay of the R-
value, from R0 (0.9993) to the minimum value of -0.0158, with the gradual increase of
the input noise percentage.
Figure 5-18: Variations of correlation coefficient (R) values of the selected network 111-
M output in-flight particle characteristics with the gradual addition of noise to the
atmospheric plasma spray specified input processing parameters.
Correlating with Figure 5-18, the R-values for the networks NET-C and ELM-1, in
Figure 5-19 and Figure 5-20 respectively, also decrease with the increase of noise
percentage. However, the rate of decay is much lower and the decay trend is not
exponential.
Figure 5-19: Variations of correlation coefficient (R) values of the selected network
NET-C output in-flight particle characteristics with the gradual addition of noise to the
atmospheric plasma spray specified input processing parameters.
Figure 5-20: Variations of correlation coefficient (R) values of the selected network
ELM-1 output in-flight particle characteristics with the gradual addition of noise to the
atmospheric plasma spray specified input processing parameters.
Figure 5-21 plots the variations of R-values with the input noise percentage for
all the selected networks in a single graph, for better understanding and comparison.
In all cases, the R-values were reduced with the addition of noise, and the networks'
outputs became more scattered with the increase of input data noise. However, the
networks varied in how they responded: some were less responsive to the noise
addition, while others were sensitive to a small percentage of input disturbance.
Additional analyses were carried out, and the nature of the response of each network
is described in the following paragraphs.
Figure 5-21: Combined graph to represent variations of correlation coefficient (R)
values of all the selected networks output in-flight particle characteristics with the
gradual addition of noise to the atmospheric plasma spray specified input processing
parameters.
For all the plots in Figure 5-21, the minimum correlation coefficient value,
RMIN, was determined, as well as the corresponding noise percentage, Noise(%) at RMIN.
For each of the network curves, the drop ratio was computed using Equation 5-12. In
Equation 5-12, ΔR represents the amount of the drop from R0 to RMIN and is computed by
subtracting RMIN from R0. ΔNoise(%) stands for the corresponding change in the input
noise percentage and is computed by subtracting Noise(%) at R0, the input noise
percentage for R0, from Noise(%) at RMIN. Since Noise(%) at R0 is zero, ΔNoise(%) is
the same as Noise(%) at RMIN.
    Drop Ratio = ΔR / ΔNoise(%)
               = (R0 - RMIN) / (Noise(%) at RMIN - Noise(%) at R0)
               = (R0 - RMIN) / Noise(%) at RMIN                         Equation 5-12
The drop ratio provides an understanding of the ANN's sensitivity to variations
of the input parameters. A higher drop ratio indicates larger changes in the
correlation coefficient values relative to the changes in input noise percentage,
showing that the network is more sensitive to fluctuations of the input parameters.
Conversely, a lower drop ratio results from a small change in the correlation
coefficient values relative to the change in the input noise percentage, representing
a network more robust to input parameter fluctuations.
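Applied to the R-curve produced by the sweep sketched earlier, Equation 5-12 reduces
to a few lines; the sketch assumes `r_values[k]` holds R at k percent noise and that
the minimum occurs at a non-zero noise level.

```python
import numpy as np

def drop_ratio(r_values):
    """Equation 5-12: (R0 - RMIN) / Noise(%) at RMIN."""
    r0 = r_values[0]                   # R at zero percent noise
    k_min = int(np.argmin(r_values))   # noise percentage at the minimum R
    return (r0 - r_values[k_min]) / k_min
```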
The bar chart representing the drop ratios for all five selected networks is
presented in Figure 5-22. NET-C exhibited the lowest drop ratio, 0.0012, among all the
networks, indicating that the performance of the modular ANN is the least sensitive to
the addition of input parameter disturbances. The result is coherent with Figure 5-21:
the change in R-value from zero to one hundred percent noise addition is small, and
the plot, represented by the deep blue line, is flatter than the graphs for the other
networks.
The network NN2 revealed the highest drop ratio of 0.0206 (Figure 5-22). The
result coincides with the plot obtained in Figure 5-21: NN2 was sensitive to small
variations of the input parameters, and the R-value of the output from the simulated
noisy input dropped rapidly, represented by the red line. Next in line, in terms of
sensitivity to the input parameters, is the network 111-M. The magnitude of the drop
in R-value was close to that of NN2, but the trend was smoother and less rapid,
represented by a lower drop ratio of 0.0102.
The sensitivity of the networks ELM-1 and NN1 to variations of the input
parameters was moderate. ELM-1 and NN1 generated drop ratio values of 0.0031 and
0.0060, respectively (Figure 5-22). The drop in R-value with the increase of the input
noise percentage in Figure 5-21 is gradual for both networks but differs in the rate
of decay: NN1 shows a more exponential decay with a higher decay rate, while ELM-1 has
a lower decay rate.
Figure 5-22: Drop ratios for selected artificial neural networks.
Table 5-5 summarizes the values of R0, RMIN, Noise(%) at RMIN, ΔR and the drop
ratios for the five selected networks. In addition, the table provides the percentage
of noise required to drop each network's R-value to 95%, 90%, 85%, 80% and 75% of its
R0 value. The results are extracted from Figure 5-21 and are coherent with Figure
5-22. Network NN2 was the most sensitive of all the networks; its R-value dropped
below 95 percent of R0 with one percent of noise added to the input parameters. NET-C
was the least sensitive, requiring sixty percent of noise to be incorporated into the
input data to drop the network performance to 95 percent of its R0.
Table 5-5: Performance values for the sensitivity analysis of the different selected
networks with the fluctuations of the neural network input parameters.

                              NN1       NN2       111-M     NET-C     ELM-1
R0                            0.9154    0.9996    0.9993    0.7916    0.9902
Minimum R-value (RMIN)        0.3149    -0.0690   -0.0158   0.6705    0.6789
Noise(%) at RMIN              100       52        100       100       100
ΔR                            0.6005    1.0686    1.0151    0.1212    0.3115
Drop Ratio ΔR/ΔNoise(%)       0.0060    0.0206    0.0102    0.0012    0.0031

Percentage of noise (%) required to drop the R-value to:
                              NN1       NN2       111-M     NET-C     ELM-1
95% of R0                     7         1         3         60        35
90% of R0                     10        1         4         82        50
85% of R0                     12        1         4         99        62
80% of R0                     15        1         5         >100      74
75% of R0                     19        1         5         >100      85
5.2.5 Summary
The sensitivity of an artificial neural network (ANN) is an important parameter to
study before incorporating the models to any on-line atmospheric plasma spray (APS)
control system. It is important to understand how the designed network model would
respond when conditions stray away from being ideal as the spray process proceeds.
The disturbances for an ANN can occur due to slight fluctuations of the input
parameters presented to the network or due to fluctuations of the network parameters
themselves. As the hardware implementation of the ANN models is not considered, only
the disturbances to the input parameters were examined.
Different ANN models developed in the course of this research work were
considered for this analysis. The network models were trained and optimized under
different conditions, including different network structures, various training algorithms
and different training data sets.
The sensitivity of the selected networks to fluctuations of the APS input
processing parameters was considered. Uniformly distributed noise, generated by
MATLAB's Simulink tool, was used to simulate the effect of input parameter
disturbances. With the gradual addition of noise to the input, the networks were
simulated and the correlation coefficient (R) values were computed to show the changes
in network performance.
For all the considered networks, the correlation coefficient values were
reduced with the gradual addition of noise, and the networks' outputs became more
scattered with the increase of input data noise. However, the networks varied in how
they responded: some were less responsive to the noise addition, whereas others were
sensitive to a small percentage of input disturbance.
The network NN2, a MLP ANN structure trained with a Bayesian regularization
algorithm and an expanded training set, was the most sensitive to fluctuations of input
parameters. The modular network NET-C, a single hidden layer structure trained with
the original database and the Levenberg-Marquardt algorithm, was the least sensitive
to any changes in the input parameters.
Ranking the networks from the highest sensitivity to variations of the input
parameters to the lowest, we obtain NN2, 111-M, NN1, ELM-1 and NET-C. The ranking is
based on the results obtained in Figure 5-21, Figure 5-22 and Table 5-5. The study in
this section would assist thermal spray engineers in selecting appropriate artificial
neural network models for any specific on-line plasma spray control process, based on
individual system requirements.
Chapter 6 Experimental Work and Network Modelling
This chapter describes the experiments carried out on the atmospheric
plasma spray (APS) process. The input processing parameters were varied and the
changes in the dynamic behaviour of the in-flight particle characteristics were observed
using a dichromatic sensor. The processing parameters and corresponding in-flight
particle characteristic values were processed to form the experimental database. The
database was then used to train selected artificial neural network (ANN) structures and
models from previous chapters. The developed networks were found to successfully
model the APS process. The networks were able to learn the input / output parameter
relationships and correlate the in-flight particle characteristics with each of the input
processing parameters. The work, thus, provides validation of the proposed ANN
models and structures because the resultant ANNs were found to work both with the
new experimental data and a database from the literature. A flowchart outlining the
work done in this chapter is presented in Figure 6-1.
Figure 6-1: Research methodology for artificial neural network modelling of an
atmospheric plasma spray process with experimental dataset.
Section 6.1 describes the APS experiment set-up and process parameter
selection. It explains the database collection and processing steps. Section 6.2
introduces the ANN models used in this work to model the APS process. The network
training, optimization and testing steps are illustrated in Section 6.3. Section 6.4
presents the simulation results and provides an analysis and discussion of the
performance of the ANN models on the experimental dataset. A summary of the results
obtained and the work done in the section is presented in Section 6.5.
6.1 Experiment design and plasma spray process set-up
The experiment was set up to generate a thermal sprayed alumina-titania
(Al2O3-TiO2) coating, which is widely used in various industrial applications. In
comparison to alumina-titania coatings, pure alumina (Al2O3) coatings exhibit a higher
degree of hardness, erosion resistance and dielectric strength; however, they are
brittle. A small amount of titania, blended with the alumina feedstock, increases the
toughness of the thermal spray coating. Blending 13 wt.% or 40 wt.% of titania into an
alumina feedstock generates a coating with higher toughness but lower hardness,
chemical resistance and electrical resistivity. The Metco 131VF (Al2O3-40 wt.% TiO2,
-45+5 μm) powder was used in the current experiment to form the coating.
The APS samples were created with the help of an industrial partner (United
Surface Technologies Pty. Ltd., Victoria – 3018, Australia). A Plasmadyne SG-100
system (Plasmadyne Corporation, USA) was used. The SG-100 plasma torch uses a
single cathode and anode configuration. The plasma jet is generated by ionizing the
argon primary gas, and the enthalpy of the flame is increased using helium as the
secondary plasma gas. Helium is less effective than an identical volume of hydrogen at
improving the plasma jet enthalpy; however, it corrodes the electrodes less, thus
extending their lifetime.
The input processing parameters considered in the experiment were the (i) arc
current intensity, (ii) argon primary plasma gas flow rate, (iii) helium secondary plasma
flow rate, and (iv) argon carrier gas flow rate. The powder feed rate was set to 15 and
30 g/min with the substrate stand-off distance being 95 mm. The output in-flight particle
characteristics considered were the average in-flight particle (i) velocity, (ii)
temperature, and (iii) diameter. The input processing parameters were varied and the
corresponding output in-flight dynamic behaviour of the particles was measured using a
dichromatic sensor (DPV – 2000 from TECNAR Automation Limited, St-Bruno, QC,
Canada J3V 6B5) from the centre of the particle flow stream.
The experimental values appear in Table 6-1 and form the experimental
database, EDSO. The database consisted of 14 data points. The in-flight particle
characteristics represent the average value over a fixed period of measurement from
the DPV-2000 sensor. The standard deviations of each of the in-flight particle velocity,
temperature and diameter, for each run of experiment, are noted in Table 6-2. The
powder feed rate and the substrate stand-off distance for each run of the experiment
are also presented in Table 6-2.
The variations of the input processing parameters in Table 6-1 are presented as
bold numbers. A single input processing parameter was varied at any one time (Run 1 to
Run 13), with the remaining parameters fixed at their reference values. In Run 14, all
the input processing parameters were kept at their reference values. The reference
values of the input processing parameters are noted in the footnote of Table 6-1.
Chapter 3, Chapter 4 and Chapter 5 used the database DSO (Table 3-1) from the
literature [40] for ANN modelling of the APS process. A critical difference between
DSO and the experimental database, EDSO, is in the number of input parameters
considered. The database DSO from the literature used six input parameters: (i) arc
current intensity, (ii) primary plasma gas flow rate, (iii) secondary plasma gas flow
rate, (iv) carrier gas flow rate, (v) injector diameter, and (vi) injector stand-off
distance. The experimental database, EDSO, however, considered only the first four
input processing parameters for variation. The reasoning is as follows.
The work described in the literature [40] employed a Sulzer-Metco F4 gun
(Wohlen, Switzerland) for spraying. The experiment carried out in this study, however,
used a different SG-100 torch. A key difference between the F4 plasma torch and the
SG-100 torch lies in the location of the powder port and the angle of powder injection.
In the SG-100 plasma torch, the powder port is incorporated into the anode assembly.
The angle of powder feed injection can be selected using different models of anode.
The angle of powder feed helps in the flow of powder particles into the plasma plume.
The experiment in this study deploys an anode (model 175) with an internal powder
feed at a 90° injection from the horizontal axis. The use of the SG-100 torch and the
experimental setup fixed the parameters of injector diameter and injector stand-off
distance. The experiment, therefore, considered and varied only the other four input
processing parameters.
Table 6-1: Experimental database (EDSO) consisting of the atmospheric plasma spray
input processing parameters and the output in-flight particle characteristics.

Run   I [A]   VAr [l/min]   VHe [l/min]   VCG [l/min]   V [m/s]   T [°C]   D [μm]
1     550     47.2          27.9          5.7           182       2355     23
2     650     47.2          27.9          5.7           194       2412     23
3     750     47.2          27.9          5.7           191       2450     21
4     650     40.1          27.9          5.7           170       2394     23
5     650     54.3          27.9          5.7           201       2381     21
6     650     47.2          24.1          5.7           185       2400     23
7     650     47.2          31.6          5.7           192       2416     22
8     650     47.2          27.9          5.2           213       2215     25
9     650     47.2          27.9          6.1           187       2182     26
10    650     47.2          27.9          6.1           197       2219     26
11    650     47.2          27.9          7.1           185       2375     21
12    650     47.2          27.9          7.1           219       2270     26
13    650     47.2          27.9          8.5           184       2360     20
14    650     47.2          27.9          5.7           201       2207     26

I     Current intensity (reference value: 650 A)
VAr   Argon primary plasma gas flow rate (reference value: 47.2 l/min)
VHe   Helium secondary plasma gas flow rate (reference value: 27.9 l/min)
VCG   Argon carrier gas flow rate (reference value: 5.7 l/min)
V     Average in-flight particle velocity
T     Average in-flight particle temperature
D     Average in-flight particle diameter
Table 6-2: Atmospheric plasma spray process experiment parameters. The standard
deviations of the measured in-flight particle characteristics are indicated.

                                        Standard Deviations
Run   Feed Rate   Substrate Stand-off   Particle Velocity   Particle Temperature   Particle Diameter
      [g/min]     Distance SOD [mm]     V [m/s]             T [°C]                 D [μm]
1     15          95                    50                  270                    8
2     15          95                    57                  259                    9
3     15          95                    65                  292                    9
4     15          95                    53                  255                    9
5     15          95                    59                  271                    8
6     15          95                    55                  245                    9
7     15          95                    60                  279                    9
8     30          95                    50                  167                    9
9     30          95                    46                  162                    8
10    30          95                    46                  147                    8
11    15          95                    56                  244                    8
12    30          95                    48                  166                    9
13    15          95                    54                  246                    7
14    30          95                    46                  149                    8
6.2 Artificial neural network modelling
The three artificial neural network models proposed and used in Chapter 3 and
Chapter 4 were used in this section for modelling the atmospheric plasma spray
process to predict the in-flight particle characteristics from the input processing
parameters. The experimental database, EDSO, together with error back-propagation
algorithms, was used for training and testing all the networks. A description of the
network structures is given in the following paragraphs.
The first ANN model is the two hidden layer multi-layer perceptron (MLP)
structure used in Chapter 3. For reference, all the networks trained with this structure,
in this section, are referred to as N1.
The MLP structure used in Chapter 3 is presented in Figure 3-2. That chapter
used the database DSO (Table 3-1) from the literature [40] for ANN modelling. As
illustrated in Section 6.1, the experimental database, EDSO, used in this chapter
differs in the number of input processing parameters considered: EDSO contains only
four input processing parameters. Modification of the input layer of the MLP structure
in Figure 3-2 was, thus, necessary. The updated MLP architecture is presented in
Figure 6-2.
Figure 6-2: Block diagram of the designed multi-layer artificial neural network (ANN)
structure.
As presented in Figure 6-2, the MLP architecture consists of three types of
layers: the input layer, the hidden layers and the output layer. The input layer is
connected to the input processing parameters, and the output layer generates the
network outputs, the in-flight particle characteristics consisting of the average
particle velocity, temperature and diameter. The layers between the input and output
layers are termed the hidden layers and contain neurons that help the network learn
the input-output parameter relationships. The number of hidden layers is a variable
parameter; in keeping with the work in Chapter 3, it was fixed at two in this study
(Figure 6-2).
In Figure 6-2, w_ji (where i = 1…4 and j = 1…N1) represents the input layer
weights. The terms α_ji (where i = 1…N1 and j = 1…N2) and β_ji (where i = 1…N2 and
j = 1…3) represent the hidden layer weights and output layer weights, respectively. N1
and N2 represent the number of neurons in hidden layer 1 and hidden layer 2,
respectively.
No generalized rule exists to specify the exact values of N1 and N2. A large
number of hidden layer neurons gives the network the flexibility to optimize many
parameters and reach an improved solution. However, increasing the size of the hidden
layers beyond a certain limit makes the network under-characterized: the network is
then forced to optimize more parameters than the data vectors available to define
those parameters. Too few neurons in the hidden layers lead to under-fitting. The
performance of a trained ANN is sensitive to the number of hidden layer neurons, and
the optimum number and combination of neurons in the hidden layers are determined from
the network training and optimization process.
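For illustration, a minimal numpy sketch of the forward pass through the Figure 6-2
structure is shown below; the tanh activations, the linear output layer and the
omission of bias terms are assumptions, as the text does not specify them.

```python
import numpy as np

def mlp_forward(x, w, alpha, beta):
    """x: 4 input parameters; w: (N1, 4); alpha: (N2, N1); beta: (3, N2)."""
    h1 = np.tanh(w @ x)        # hidden layer 1
    h2 = np.tanh(alpha @ h1)   # hidden layer 2
    return beta @ h2           # output layer: (V, T, D)
```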
The second model, used in this work, is based on the modified MLP ANN
architecture ‘111’ proposed and used in Section 4.1 of Chapter 4. These trained and
optimized networks are referred to as N2 in this section.
The default MLP ANN structure, presented in Figure 6-2, with two hidden
layers, consists of the input layer connected to the 1st hidden layer, which is connected
to the 2nd hidden layer. The 2nd hidden layer is then connected to the output layer. A
block diagram illustrating the structure is presented in Figure 4-1.
The network, with the proposed modified structure, is provided with additional
parameters to learn and generalize the process relationships without increasing the
number of hidden layer neurons. This is facilitated by modification of the layer
connection matrix. Additional connections were made from the input layer to the 2nd
hidden layer and also to the output layer. The MLP structure presented in Figure 6-2 is,
thus, modified as per the block diagram of the proposed structure in Figure 4-2.
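A sketch of how those extra connections alter the forward pass is given below; the
weight matrices `u` and `v` for the direct input paths are illustrative names, not
thesis notation, and the activations are the same assumptions as before.

```python
import numpy as np

def mlp111_forward(x, w, alpha, beta, u, v):
    """'111'-style pass; u: (N2, 4) and v: (3, 4) carry the direct input links."""
    h1 = np.tanh(w @ x)                # input to hidden layer 1
    h2 = np.tanh(alpha @ h1 + u @ x)   # hidden 1 plus direct input to hidden layer 2
    return beta @ h2 + v @ x           # hidden 2 plus direct input to the output
```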
The third and final network model used the modular ANN method implemented
in Section 4.2 of Chapter 4. The modular implementation allows simplification of the
designed ANN model in predicting the in-flight particle characteristics from the input
processing parameters of the APS process. The APS is first decomposed into sub-
processes to simplify the problem the network is required to learn. Each sub-process is
a part of the whole APS process and is assigned a different ANN. Thus, each designed
ANN focuses on solving only a sub-process. The final solution is obtained by re-
combining the individual network solutions.
Decomposition of the task allowed simpler ANNs to be built and at the same
time helped the networks to learn the process better. The segmented approach allows
the user to understand the relationships that the model established between each of
the in-flight particle characteristics and the input processing parameters. The system
reliability is enhanced by splitting up the problem so that each network is trained to
solve a part of the whole problem. Any fault or error in prediction of one of the sub-
problems does not affect the entire solution to the problem.
With the existing knowledge and understanding of the APS process, explicit
decomposition was chosen to divide the task into modular components.
The overall task in this work concentrated on predicting the three in-flight particle
characteristics (i.e., in-flight particle velocity, temperature and diameter) from the input
processing parameters of the APS process. The task was decomposed into three sub-
tasks, each considering the effects of input processing parameters on one of the in-
flight particle characteristics.
In this study, all three output parameters were of equal importance.
Co-operative combination was used for the three modular components. The three outputs
from three modular components, each providing a solution to the sub-task assigned,
were combined with equal weighting to generate the final solution. A flowchart
illustrating the co-operative combination process, used in this study, is presented in
Figure 4-14.
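As a minimal sketch of this co-operative combination, the snippet below assembles the final output vector from the three sub-network solutions with equal weight; the three lambda functions are hypothetical stand-ins for the trained networks N3-V, N3-T and N3-D, not their actual response surfaces.

```python
import numpy as np

def cooperative_combine(predict_v, predict_t, predict_d, x):
    """Co-operative combination: each modular network solves one sub-task
    from the same input x, and the solutions are combined with equal
    weight into the final in-flight particle characteristic vector."""
    return np.array([predict_v(x), predict_t(x), predict_d(x)])

# Hypothetical stand-ins for N3-V, N3-T and N3-D:
n3_v = lambda x: 190.0 + 2.0 * x.sum()    # velocity [m/s]
n3_t = lambda x: 2350.0 + 5.0 * x.sum()   # temperature [deg C]
n3_d = lambda x: 22.0 + 0.1 * x.sum()     # diameter [um]
print(cooperative_combine(n3_v, n3_t, n3_d, np.ones(4)))
```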
All the APS input processing parameters in EDSO were fed into the input layer
of the networks. The first network (N3-V) generated the in-flight particle velocity at the
output layer, the second network (N3-T) generated the in-flight particle temperature
and the third network (N3-D) generated the in-flight particle diameter as the output.
Figure 6-3 provides a flowchart for the modular implementation of the APS process
used in this section.
Figure 6-3: Flowchart for modular artificial neural network implementation of the
atmospheric plasma spray process.
The networks N3-V, N3-T and N3-D are based on a single hidden layer, fully
connected MLP model. The network architecture is presented in Figure 6-4. The single
hidden layer proved sufficient for the networks to learn the functions defining the
assigned sub-tasks. The parameter $w_{ji}$ (where i = 1…4 and j = 1…N1) represents the
input layer weights, and $\beta_{ji}$ (where i = 1…N1 and j = 1) the output layer weights. The
parameter N1 defines the number of hidden layer neurons; its value is obtained
through the network training and optimization process.
Figure 6-4: Single hidden layer multi-layer perceptron (MLP) artificial neural network
(ANN) architecture.
For the purpose of training and testing the developed modular ANNs, the
database EDSO (Table 6-1) was split into three subsets. The segmentation was based
on the three output parameters. Figure 6-5 provides a flow chart of the data split
process. The first subset, EDSO1, contained the input processing parameters and the
average in-flight particle velocity. The second subset, EDSO2, contained the input
processing parameters and the average in-flight particle temperature and the third
subset, EDSO3, contained the input processing parameters and the average in-flight
particle diameter.
Figure 6-5: Flowchart representing the data split process for training of developed
modular artificial neural network models.
The flowchart depicting the overall research methodology presented in this
section is provided in Figure 6-6. The database, EDSO, was used for network training
and optimization of the networks, N1 and N2. The EDSO was split into EDSO1, EDSO2
and EDSO3 for training and testing of the modular ANNs N3-V, N3-T and N3-D. The
network outputs and performance of all the different networks were compared and
analysed.
Figure 6-6: Research methodology for artificial neural network implementation of the
atmospheric plasma spray process to predict the output average of in-flight particle
characteristics using different artificial neural network models and structures.
6.3 Network training and optimization
The study considered supervised learning based on the BP algorithm. The
network size in this study is within a few hundred weights; therefore the
Levenberg-Marquardt (LM) algorithm, a non-linear least squares numerical optimization method
(Section 2.2.2.2) was used for training all the considered networks N1, N2, N3-V, N3-T
and N3-D. The LM algorithm is considered more efficient in training than the conjugate
gradient method or the variable learning rate algorithm for a network with a few
hundred weights [100]. Other standard back-propagation algorithms are slow and
require excessive off-line training. They also suffer from temporal instability and tend to
become trapped in local minima [157].
The LM training procedure uses the standard statistical technique of cross-validation
with early stopping to combat the problem of over-fitting. The technique requires a separate
validation set to test whether the network has started to over-fit during the training;
this set remains unseen by the network while it is trained. The error generated by the
networks on the validation set provides a measure of over-fitting. The network
training is stopped once the validation error starts to increase, since the network is then most likely
to have started over-fitting.
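A minimal sketch of this stopping rule is given below, assuming step performs one training epoch and val_error returns the current validation-set error (both hypothetical placeholders); the default limits mirror the values adopted later in this section (300 epochs, 100 permitted validation failures).

```python
def train_with_early_stopping(step, val_error, max_epochs=300, max_fail=100):
    """Stop training once the validation error has failed to improve for
    max_fail consecutive epochs, i.e. once over-fitting has likely begun."""
    best, fails = float("inf"), 0
    for epoch in range(max_epochs):
        step()                        # one training epoch (updates weights)
        err = val_error()             # error on the unseen validation set
        if err < best:
            best, fails = err, 0      # validation error still improving
        else:
            fails += 1                # one more validation failure
            if fails >= max_fail:
                break                 # assume over-fitting has started
    return best

# Toy trace: the validation error bottoms out at 0.35 and then rises.
errs = iter([0.50, 0.40, 0.35, 0.36, 0.37, 0.38])
print(train_with_early_stopping(lambda: None, lambda: next(errs),
                                max_epochs=6, max_fail=2))   # -> 0.35
```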
For the networks N1 and N2, EDSO was split by interleaving in the ratio 0.80:0.20
to generate the training/validation dataset, EDSOTRV, and the test dataset, EDSOT. EDSO
contains 14 data values; EDSOTRV and EDSOT, thus, contained 11 and 3 data
values, respectively. EDSOTRV was used for network training and validation
purposes, while EDSOT was used only for network testing. The dataset
EDSOTRV was further interleaved in the ratio 0.80:0.20 to generate the training dataset
EDSOTR (9 data values) and the validation dataset EDSOTV (2 data values).
For the modular networks N3-V, N3-T and N3-D, the split databases EDSO1,
EDSO2 and EDSO3 were also interleaved in the ratio 0.80:0.20 to generate the
corresponding training/validation datasets (EDSO1TRV, EDSO2TRV and EDSO3TRV) and test
datasets (EDSO1T, EDSO2T and EDSO3T). The training/validation datasets were
interleaved in the same ratio of 0.80:0.20 to produce the individual training datasets
(EDSO1TR, EDSO2TR and EDSO3TR) and validation datasets (EDSO1V, EDSO2V and
EDSO3V). The entire data division process is depicted in Figure 6-7, where the number of
data values in each dataset is also noted.
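The sketch below shows one way such an interleaved split can be realized, spreading the held-out records evenly through the dataset so that both subsets span the full range of the processing parameters; the exact interleaving pattern used in this work may differ, but the subset sizes match those quoted above.

```python
import numpy as np

def interleave_split(data, ratio=0.80):
    """Split a dataset in a given ratio (here 0.80:0.20) by interleaving:
    the held-out records are spread evenly through the dataset."""
    n = len(data)
    n_out = max(1, round(n * (1.0 - ratio)))
    out_idx = set(np.linspace(0, n - 1, n_out, dtype=int).tolist())
    kept = [d for i, d in enumerate(data) if i not in out_idx]
    held = [d for i, d in enumerate(data) if i in out_idx]
    return kept, held

edso = list(range(1, 15))                      # 14 records
edso_trv, edso_t = interleave_split(edso)      # 11 train/validation, 3 test
edso_tr, edso_v = interleave_split(edso_trv)   # 9 training, 2 validation
print(len(edso_trv), len(edso_t), len(edso_tr), len(edso_v))
```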
The selected ratio of 0.80:0.20 provides the optimal size of the training dataset for
the available database. A smaller training dataset would have reduced the networks'
ability to learn the process dynamics. Increasing the number of training data points
would have reduced the size of the validation and test sets. A reduction in the validation
set would have reduced the ability to detect any over-fitting occurring
during network training, while a reduction in the size of the test set would have weakened
the test of the trained networks' generalization performance.
Figure 6-7: Data division process of the experimental database of the atmospheric
plasma spray process for training and testing of the different designed artificial neural
network models.
The LM is a fast algorithm in terms of training speed. When used together with
early stopping, the training parameters of the algorithm should be adjusted to reduce the
convergence speed, so that over-fitting can be detected before the network converges.
The parameter μ (Equation 2-44) was, thus, set to a relatively large value of 1. The μ
increment parameter was set to 1.5, while the μ decrement factor was set to 0.8. The
number of training epochs was set to 300 and the maximum number of permitted
validation error fails to 100. These limits ensured that the network was allowed sufficient
iterations to converge to the function's global error minimum. The transfer function used
in all layers was the log-sigmoid function and the error performance function was set to
the mean absolute error (MAE) (Equation 3-7).
These training conditions were applied in training all the selected networks N1,
N2, N3-V, N3-T and N3-D with the Levenberg-Marquardt algorithm. In each case, the
network parameters were initialized to random values between 0 and 1.
The networks N1 and N2 were trained with the training dataset EDSOTR. The
trained networks were then simulated with the test dataset, EDSOT. The R-values
computed on the test set indicated how well the trained networks' responses to the
unseen inputs fit their respective outputs, providing a measure of the networks'
generalization ability. The larger the average R-value, the better the generalization
performance of the network and the correlation between the predicted and actual
values. The network training was repeated one hundred times and the network
generating the maximum R-value on EDSOT was stored and saved. The training
process was repeated as the combination of the number of neurons in the 1st and 2nd
hidden layers was varied from 2 and 1 to 20 and 19, respectively, with increments of
one neuron in each hidden layer.
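The selection procedure amounts to a grid search with random restarts, sketched below; train_once and r_on_test are hypothetical placeholders for training a fresh network (weights initialized at random in [0, 1]) and computing its test-set R-value, and the same logic applies whether the two hidden-layer sizes are varied jointly or independently.

```python
import itertools

def select_best_network(train_once, r_on_test, n1_range, n2_range,
                        restarts=100):
    """Repeat training `restarts` times for each hidden-layer combination
    and keep the network with the maximum R-value on the test set."""
    best_net, best_r = None, float("-inf")
    for n1, n2 in itertools.product(n1_range, n2_range):
        for _ in range(restarts):
            net = train_once(n1, n2)       # fresh random initialization
            r = r_on_test(net)             # generalization measure
            if r > best_r:
                best_net, best_r = net, r  # store and save the best network
    return best_net, best_r
```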
The modular networks N3-V, N3-T and N3-D were trained similarly with their
corresponding training datasets EDSO1TR, EDSO2TR and EDSO3TR, respectively. The
network training was repeated one hundred times. The networks generating the
maximum R-value on their respective test datasets were stored and saved each time.
The process was repeated with the number of hidden layer neurons varied from 1 to
20, with increments of one neuron.
6.4 Simulation result
6.4.1 Proposed network models
This sub-section elaborates on the generalization performance of the trained
networks N1, N2, N3-V, N3-T and N3-D. The generalization performance includes the
correlation coefficient, R, and the generalization error values. The generalization error
is the error generated by each network on its respective test dataset. As the
test set was unseen by the networks during the training process, the error and
correlation coefficient values generated by each network provide an
understanding of its generalization performance.
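A sketch of the two measures, assuming R is the Pearson correlation coefficient and the generalization error is the mean absolute error used as the training performance function (the error magnitudes reported below suggest they are computed on normalized values):

```python
import numpy as np

def correlation_r(predicted, actual):
    """Pearson correlation coefficient R between predicted and actual
    values of an in-flight particle characteristic."""
    return np.corrcoef(predicted, actual)[0, 1]

def generalization_error(predicted, actual):
    """Mean absolute error on the test set (scaling assumed)."""
    p, a = np.asarray(predicted, float), np.asarray(actual, float)
    return np.mean(np.abs(p - a))

# Example with N1-M velocity predictions for runs 1-3 (Table 6-3):
v_pred = [181.22, 197.50, 190.97]
v_true = [182.0, 194.0, 191.0]
print(correlation_r(v_pred, v_true), generalization_error(v_pred, v_true))
```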
Figure 6-8 plots the generalization performance of the networks N1. The figure
presents a bar chart comparison of R-values and generalization errors of all the
networks in N1 having different combinations of the number of neurons in the hidden
layers. The average R-value was 0.7756 with a maximum value of 0.9521 for a total of
only 9 neurons in the two hidden layers (5 and 4 neurons in the 1st and 2nd hidden layer
respectively). The average generalization error, of all the networks, was in the order of
0.1758. The network with 5 and 4 neurons, in the 1st and 2nd hidden layers respectively,
is marked as ‘N1-M’. This network was found to generate the maximum correlation (an
R-value of 0.9521) between the predicted and actual outputs, when simulated with the
test set.
Figure 6-8: Generalization performances of all the artificial neural networks N1 with
different combination of the number of hidden layer neurons.
The network N2 generated an average correlation coefficient of 0.6971. The
average generalization error of all the networks was 0.2238. A bar chart comparison of
all the R-values and generalization errors of the networks in N2, with different
combinations of the number of neurons in the hidden layers, is presented in Figure 6-9.
The network with 19 and 18 neurons, in the 1st and 2nd hidden layers respectively,
generated the maximum correlation of 0.8137 on the test set. The network is named as
‘N2-M’.
Figure 6-9: Generalization performances of all the artificial neural networks N2 with
different combination of the number of hidden layer neurons.
Figure 6-10 provides a bar chart comparison of the R and generalization error
values of the designed modular ANN N3-V, trained with a different number of neurons
in the hidden layer. The average R-value, over all values of the hidden layers, was
0.9979 with the corresponding average generalization error of 0.1225. N3-V generated
the highest R-value of 0.9999 with 12 hidden layer neurons. The corresponding
generalization error was 0.1239. This network is marked as ‘N3-V-M’.
Figure 6-10: Generalization performances of the modular artificial neural network N3-V
with different combination of the number of hidden layer neurons.
Figure 6-11 provides a bar chart comparison of the generalization performances
of N3-T trained with a different number of neurons in the hidden layer. The average R
and generalization error values were computed to be 0.9283 and 0.1792, respectively.
For N3-T, the network with 8 hidden layer neurons generated the best performance in
terms of R-value over all the neuron numbers. The maximum R-value generated was
0.9999 with a corresponding generalization error of 0.1484. This network is referred to
as ‘N3-T-M’.
Figure 6-11: Generalization performances of the modular artificial neural network N3-T
with different combination of the number of hidden layer neurons.
Figure 6-12 provides a bar chart comparison of the correlation coefficient (R)
and generalization error values for N3-D trained with a different number of neurons in
the hidden layer. The average R-value and generalization error, over all the networks
trained, was 0.9897 and 0.1403, respectively. Among all networks in N3-D, the
network with just 6 hidden layer neurons generated the best generalization
performance with a maximum R value of 0.9999 and corresponding generalization
error of 0.1285. The network is marked as ‘N3-D-M’.
Figure 6-12: Generalization performances of the modular artificial neural network N3-D
with different combination of the number of hidden layer neurons.
6.4.2 Performance comparison and result analysis
The simulation results of the five selected networks N1, N2, N3-V, N3-T and
N3-D are compared and analysed further to better understand the performance of
the proposed ANN models and structures in modelling the APS process.
The average generalization performances (average correlation coefficient (R)
values and average generalization errors) of all the selected networks are plotted as
bar graphs in Figure 6-13. The modular networks N3-V, N3-T and N3-D performed
better than the networks N1 and N2. The training dataset in this study was very
small, containing only 9 data values. In this case, the modular methodology
allowed the networks to learn the APS input/output process relationships better: it
simplified the problem by splitting the process and assigning separate
small single hidden layer ANNs to each sub-process.
Figure 6-13: Average generalization performance comparison of different artificial
neural network models.
The generalization performances of the best performing networks marked in
Section 6.4.1 are compared and analysed in the following paragraphs. The marked
networks are N1-M, N2-M, N3-V-M, N3-T-M and N3-D-M.
A bar chart comparison of the correlation coefficient and generalization error
values of all the selected networks is presented in Figure 6-14. Consistent with the
findings in Figure 6-13, the modular networks N3-V-M, N3-T-M and N3-D-M performed
better than N1-M and N2-M. The selected modular networks exhibited higher overall
R-values and lower overall generalization errors.
Figure 6-14: Generalization performance comparison of the various selected best
performing artificial neural network models.
For further analysis of the models, each of the selected networks N1-M, N2-M,
N3-V-M, N3-T-M and N3-D-M was simulated with the whole experimental database,
EDSO. For the modular networks N3-V-M, N3-T-M and N3-D-M, the outputs were
combined to generate the combined output in-flight particle characteristics, named
'N3-C'. The combined modular network output N3-C is used in the remainder of this
section in place of the individual modular networks N3-V, N3-T and N3-D.
Table 6-3 tabulates all the experimental in-flight particle characteristics from
EDSO along with the corresponding predicted in-flight particle characteristics values
from the networks N1-M, N2-M and N3-C.
Table 6-3: The experimental in-flight particle characteristics values from the
experimental database EDSO with the corresponding predicted values from the
developed artificial neural network models.

Run  Values        V [m/s]   T [°C]     D [μm]
1    Experimental  182       2,355      23
     N1-M          181.22    2,365.37   23.76
     N2-M          196.40    2,343.81   25.29
     N3-C          182.50    2,341.55   22.55
2    Experimental  194       2,412      23
     N1-M          197.50    2,309.67   24.51
     N2-M          197.51    2,310.81   24.64
     N3-C          198.91    2,308.03   24.93
3    Experimental  191       2,450      21
     N1-M          190.97    2,449.99   21.00
     N2-M          191.05    2,445.01   21.02
     N3-C          190.96    2,447.06   21.02
4    Experimental  170       2,394      23
     N1-M          170.01    2,393.68   22.99
     N2-M          170.00    2,393.85   23.01
     N3-C          170.17    2,394.09   23.03
5    Experimental  201       2,381      21
     N1-M          200.98    2,380.94   21.00
     N2-M          200.99    2,380.75   21.01
     N3-C          200.96    2,380.95   21.02
6    Experimental  185       2,400      23
     N1-M          185.00    2,400.04   23.00
     N2-M          185.08    2,400.76   23.02
     N3-C          185.18    2,400.32   23.02
7    Experimental  192       2,416      22
     N1-M          181.27    2,373.76   23.49
     N2-M          183.21    2,391.16   24.99
     N3-C          193.42    2,405.68   21.42
8    Experimental  213       2,215      25
     N1-M          212.97    2,214.96   24.99
     N2-M          212.91    2,214.82   24.90
     N3-C          211.31    2,215.79   24.74
9    Experimental  187       2,182      26
     N1-M          196.97    2,219.21   25.97
     N2-M          196.93    2,218.40   25.52
     N3-C          195.21    2,221.50   25.22
10   Experimental  197       2,219      26
     N1-M          196.97    2,219.21   25.97
     N2-M          196.93    2,218.40   25.52
     N3-C          195.21    2,221.50   25.22
11   Experimental  185       2,375      21
     N1-M          201.99    2,322.55   23.50
     N2-M          201.95    2,323.39   23.54
     N3-C          202.29    2,322.50   23.55
12   Experimental  219       2,270      26
     N1-M          201.99    2,322.55   23.50
     N2-M          201.95    2,323.39   23.54
     N3-C          202.29    2,322.50   23.55
13   Experimental  184       2,360      20
     N1-M          184.01    2,359.99   20.00
     N2-M          184.20    2,360.83   20.08
     N3-C          183.82    2,360.15   19.97
14   Experimental  201       2,207      26
     N1-M          197.50    2,309.67   24.51
     N2-M          197.51    2,310.81   24.64
     N3-C          198.91    2,308.03   24.93
The simulated output in-flight particle characteristics from N1-M, N2-M and N3-
C are compared with the corresponding expected outputs from EDSO. The values of
correlation coefficient (R) and generalization error are, thus, computed for each of the
networks. Figure 6-15 presents a bar chart comparison of the R and generalization
error values for the networks N1-M, N2-M and N3-C. The generalization performance
of the combined modular networks was found to be better than that of the
multi-layer ANNs N1-M and N2-M. Among the selected networks, N3-C generated the
maximum R-value of 0.8428 with a corresponding minimum generalization error of
0.1041. This finding is consistent with those obtained in Figure 6-13 and
Figure 6-14.
Figure 6-15: Generalization performance of the selected artificial neural network
models on the entire experimental database EDSO.
Figure 6-15 provided the generalization performance of the networks
considering all of the in-flight particle characteristics in EDSO. Further analysis is
performed below to observe the performance of each of the networks in predicting
individual in-flight particle characteristics.
The prediction errors of the networks N1-M, N2-M and N3-C, in predicting the
individual in-flight particle velocity, temperature and diameter, were computed by
subtracting the experimental values in EDSO from the predicted values. The standard
deviation of the individual particle characteristics for each run of the experiment is also
listed. Table 6-4, Table 6-5 and Table 6-6 list the values of prediction error and
standard deviation for the networks N1-M, N2-M and N3-C, respectively. In all cases,
the magnitude of the prediction error was found to be smaller than the corresponding
standard deviation, confirming that the network error was within
the range of experimental error.
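The run-by-run check can be sketched as follows, using the first three velocity rows of Table 6-4 (network N1-M) as the example:

```python
import numpy as np

def within_experimental_error(pred_err, std_dev):
    """True if every prediction error is smaller in magnitude than the
    corresponding experimental standard deviation."""
    return bool(np.all(np.abs(np.asarray(pred_err, float)) <
                       np.asarray(std_dev, float)))

print(within_experimental_error([-0.78, 3.50, -0.03], [50, 57, 65]))  # True
```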
Table 6-4: Standard deviations of the experimental in-flight particle characteristics of an
atmospheric plasma spray process along with the prediction errors of the selected
artificial neural network N1-M.

Network N1-M
Run  Pred. error  Std. dev.  Pred. error  Std. dev.  Pred. error  Std. dev.
     V [m/s]      V [m/s]    T [°C]       T [°C]     D [μm]       D [μm]
1    -0.78        50         10.37        270        0.76         8
2    3.50         57         -102.33      259        1.51         9
3    -0.03        65         -0.01        292        0.00         9
4    0.01         53         -0.32        255        -0.01        9
5    -0.02        59         -0.06        271        0.00         8
6    0.00         55         0.04         245        0.00         9
7    -10.73       60         -42.24       279        1.49         9
8    -0.03        50         -0.04        167        -0.01        9
9    9.97         46         37.21        162        -0.03        8
10   -0.03        46         0.21         147        -0.03        8
11   16.99        56         -52.45       244        2.50         8
12   -17.01       48         52.55        166        -2.50        9
13   0.01         54         -0.01        246        0.00         7
14   -3.50        46         102.67       149        -1.49        8
Table 6-5: Standard deviations of the experimental in-flight particle characteristics of an
atmospheric plasma spray process along with the prediction errors of the selected
artificial neural network N2-M.

Network N2-M
Run  Pred. error  Std. dev.  Pred. error  Std. dev.  Pred. error  Std. dev.
     V [m/s]      V [m/s]    T [°C]       T [°C]     D [μm]       D [μm]
1    14.40        50         -11.19       270        2.29         8
2    3.51         57         -101.19      259        1.64         9
3    0.05         65         -4.99        292        0.02         9
4    0.00         53         -0.15        255        0.01         9
5    -0.01        59         -0.25        271        0.01         8
6    0.08         55         0.76         245        0.02         9
7    -8.79        60         -24.84       279        2.99         9
8    -0.09        50         -0.18        167        -0.10        9
9    9.93         46         36.40        162        -0.48        8
10   -0.07        46         -0.60        147        -0.48        8
11   16.95        56         -51.61       244        2.54         8
12   -17.05       48         53.39        166        -2.46        9
13   0.20         54         0.83         246        0.08         7
14   -3.49        46         103.81       149        -1.36        8
Table 6-6: Standard deviations of the experimental in-flight particle characteristics of an
atmospheric plasma spray process along with the prediction errors of the selected
artificial neural network N3-C.

Network N3-C
Run  Pred. error  Std. dev.  Pred. error  Std. dev.  Pred. error  Std. dev.
     V [m/s]      V [m/s]    T [°C]       T [°C]     D [μm]       D [μm]
1    0.50         50         -13.45       270        -0.45        8
2    4.91         57         -103.97      259        1.93         9
3    -0.04        65         -2.94        292        0.02         9
4    0.17         53         0.09         255        0.03         9
5    -0.04        59         -0.05        271        0.02         8
6    0.18         55         0.32         245        0.02         9
7    1.42         60         -10.32       279        -0.58        9
8    -1.69        50         0.79         167        -0.26        9
9    8.21         46         39.50        162        -0.78        8
10   -1.79        46         2.50         147        -0.78        8
11   17.29        56         -52.50       244        2.55         8
12   -16.71       48         52.50        166        -2.45        9
13   -0.18        54         0.15         246        -0.03        7
14   -2.09        46         101.03       149        -1.07        8
The absolute average relative error percentage (with respect to the
experimental values) was computed for each of the in-flight particle
characteristics predicted by the networks N1-M, N2-M and N3-C (Table 6-7). The error
percentage was computed for variations of the individual input processing parameters as
well as for variations of all the parameters together. This provides an understanding of how well the
networks were able to correlate the in-flight particle characteristics with the input
parameters.
Table 6-7: Absolute average relative error percentage of the predicted in-flight particle
characteristics by different artificial neural network models with variations of the
atmospheric plasma spray input processing parameters.

                     Absolute average relative error percentage (%) *
                     Velocity, V          Temperature, T       Diameter, D
Input parameters     N1-M   N2-M   N3-C   N1-M   N2-M   N3-C   N1-M   N2-M   N3-C
All parameters       2.28   2.75   2.00   1.24   1.21   1.18   3.16   4.44   3.30
Current intensity    0.75   3.25   0.94   1.56   1.62   1.67   3.29   5.73   3.48
Argon plasma gas     0.01   0.00   0.06   0.01   0.01   0.00   0.01   0.05   0.11
Helium plasma gas    2.80   2.31   0.42   0.88   0.53   0.22   3.39   6.85   1.37
Total plasma gas     1.40   1.16   0.24   0.44   0.27   0.11   1.70   3.45   0.74
Argon carrier gas    3.72   3.74   3.86   1.04   1.04   1.08   3.63   4.34   4.79

* Absolute average relative error percentage of the predicted values with respect to the experimental values.
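A sketch of how the entries of Table 6-7 can be computed; the two-run example uses the N3-C velocity values of runs 1 and 2 from Table 6-3 and is illustrative only.

```python
import numpy as np

def abs_avg_relative_error_pct(predicted, experimental):
    """Absolute average relative error percentage of the predicted values
    with respect to the experimental values."""
    p = np.asarray(predicted, float)
    e = np.asarray(experimental, float)
    return 100.0 * np.mean(np.abs(p - e) / np.abs(e))

print(abs_avg_relative_error_pct([182.50, 198.91], [182.0, 194.0]))  # ~1.4 %
```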
Figure 6-16 plots a bar graph comparison for the absolute average relative
percentage error of the networks N1-M, N2-M and N3-C. The errors were computed for
the individual predicted in-flight particle characteristics with the variations of all the APS
input processing parameters. These values are presented in the first row of Table 6-7.
The modular network N3-C was found to generate the lowest error in predicting the in-
flight particle velocity and temperature. For the prediction of in-flight particle diameter,
the network N1-M performed slightly better than N3-C.
Figure 6-16: Absolute average relative percentage errors of different selected artificial
neural network models in predicting the in-flight particle characteristics of an
atmospheric plasma spray process from the input processing parameters.
6.5 Summary
The artificial neural network models proposed and used in Chapter 3 and
Chapter 4 were applied in this chapter to model the atmospheric plasma spray
process and predict the in-flight particle characteristics from the input processing
parameters. The experimental database, EDSO, with the Levenberg-Marquardt
back-propagation algorithm, was used for all network training and testing purposes. The
trained networks were found to handle the APS process dynamics and learn the
underlying input/output relationships, as demonstrated by the models predicting
the in-flight particle characteristics from the input processing parameters with good
generalization ability.
In the current work, the modular networks performed better than the
two standard ANNs used. The training dataset in this study contained only 9 data values. In
this case, the modular method simplified the problem by splitting the process into
sub-processes and assigning a separate small single hidden layer ANN to each
sub-process. This simplification of the task allowed the networks to learn the APS
input/output process relationships more efficiently.
The good generalization performance of the different network models on the
experimental database validates the proposed models and structures. The
different models were previously found, in their respective chapters, to work well with
the database from the literature. The applicability of the developed ANN models is,
thus, not limited to a single case. The networks can be re-trained and optimized for a
range of different cases and environments. The ANN-based models, within the limits of
their training data and the input processing parameters considered, can be
incorporated into an on-line plasma spray control system to allow the automated
system to achieve the desired process stability.
Chapter 7 Conclusion and Future Work
7.1 Conclusion
The atmospheric plasma spray (APS) process is a highly variable and versatile
process in terms of the input and output relationships. The in-flight particle
characteristics define and control the coating and its structure. Accurate predictions of
such parameters are important and assist thermal spray engineers in reducing the time
and complexity related to pre-spray tuning and parameter setting.
The artificial neural network (ANN) method has been employed to study and
design the APS process to predict the output in-flight particle characteristics from the
input power and injection parameters. This facilitates the experimental design and data
manipulation of the APS process and helps in understanding the correlations between
the output and input parameters. The ANN-based model, within the limits of its training
data and the input processing parameters considered, is suitable for incorporation into
an on-line plasma spray control system to allow the automated system to achieve the
desired process stability.
The general multi-layer perceptron (MLP) ANN structure with error back-
propagation (BP) algorithms successfully modelled the APS process. The trained ANN
models are sensitive to the training data set and the validity of the output is limited to
input processing parameters considered in this study. The use of a regularization
technique, rather than cross-validation with early stopping, overcame the problem of
over-fitting during the training process, and the generalization ability of the trained
ANN in predicting the in-flight particle characteristics was improved.
There was considerable scatter in the experimental values of the particle
velocity, temperature and diameter in the database obtained from the literature [40].
However, the ANN-predicted outputs were found to be in close agreement with
the experimental database from which the networks were trained and optimized. The
proposed MLP ANN structures successfully handled the non-linearity and versatility
associated with the plasma spray process.
The Levenberg-Marquardt and Bayesian regularization algorithms used in this
study successfully trained and optimized the multi-layer neural network structure with
the optimal number of hidden layer neurons. The trained networks were able to
correlate the effect of each processing parameter to each of the in-flight particle
characteristics. This provides the required in-flight particle characteristics for the
desired coating properties.
ANN training becomes difficult with a small database. The
database obtained from the literature contained only 16 data values, and it was
expanded for enhanced network training. Kernel regression was used to expand the
database to approximately nineteen times its original size. The database expansion had a
large impact on the ANNs' training performance and improved the networks'
generalization capability. The additional use of regularization in training the networks
resulted in fewer network parameters being used, which increased the level of network
parameter scattering. However, the generalization performance greatly improved in
comparison to cross-validation and early stopping.
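As a sketch of the expansion technique, the snippet below implements Nadaraya-Watson kernel regression with a Gaussian kernel on a one-dimensional toy database; the kernel form, bandwidth and query grid are assumptions for illustration, not the settings used in the thesis.

```python
import numpy as np

def nadaraya_watson(x_query, X, Y, h=0.5):
    """Kernel-regression estimate of the output at x_query from measured
    (X, Y) pairs; intermediate queries synthesize extra training records.

    X: (n, d) inputs, Y: (n,) one output, h: assumed kernel bandwidth."""
    d2 = np.sum((X - x_query) ** 2, axis=1)   # squared distances to data
    w = np.exp(-d2 / (2.0 * h ** 2))          # Gaussian kernel weights
    return np.sum(w * Y) / np.sum(w)          # weighted average of outputs

# Expand a toy 3-point database to 9 records on a finer input grid:
X = np.array([[0.0], [1.0], [2.0]])
Y = np.array([10.0, 12.0, 11.0])
grid = np.linspace(0.0, 2.0, 9).reshape(-1, 1)
expanded = [(float(x[0]), nadaraya_watson(x, X, Y)) for x in grid]
print(expanded[:3])
```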
The major technical challenge with the general MLP ANN structure was to
optimize the number of neurons in the hidden layers. The number of neurons needs to
be increased to provide the network with additional parameters that enhance the
optimization computations. However, increasing the number of neurons beyond a
certain limit has the effect of under-characterizing the network, creating a more
complex network that leads to over-fitting.
An optimized MLP ANN network structure was proposed in this study and
overcame such problems. The network was provided with additional parameters to
learn and generalize the process relationships without increasing the number of hidden
layer neurons. This was facilitated by modification of the layer connection matrix. The
structure resulted in (i) improvement of the training performance; (ii) a training
performance curve that moved monotonically towards the global minimum; and (iii)
reduction in the level of fluctuation of the training performance curve. The simulation
results and analysis illustrated that the trained networks successfully modelled the APS
process, predicting the in-flight particle characteristics from the input processing
parameters with good generalization performance.
The introduction of modular ANN methodology in modelling the APS process
was successful and performed better in terms of individually correlating each of the
output parameters with the input processing parameters. Breakdown of the task into
sub-tasks, and allocation of a separate ANN to concentrate on a single sub-task,
simplified the problem for the ANN to solve.
Each network was allocated only a sub-task; thereby allowing each of the
networks to comprehend the underlying input / output parameter relationships with a
relatively smaller number of hidden layer neurons. The use of a single hidden layer
reduced the total number of network parameters. Regularization in network training
further reduced the number of active network parameters.
The reduced number of parameters, available for network training and
optimization, reduced the level of fluctuations. The optimum training condition was
achieved with a smaller range of values of the training parameters. Furthermore, the
training process was more stable than general MLP ANN structures, with the response
of the networks to the changes in the number of hidden layer neurons following a
definite trend. These results relate to the overall stability and robustness of the trained
networks.
The commonly used error back-propagation algorithms have several drawbacks.
A crucial one is the network learning speed, which is far slower than desired and makes
them unsuitable for incorporation into any real-time system, or into an on-line thermal
spray control system with a diagnostic tool, to allow the automated system to achieve
the desired process stability.
The ELM algorithm, together with a single hidden layer feed-forward neural
network (SLFN) structure, was successful in modelling the APS process and
comprehending the process dynamics. The ELM algorithm showed better performance
than most of the standard back-propagation algorithms used to train multi-layer
feed-forward networks. Simulation results confirmed the better performance of ELM, both in
terms of generalization ability and shorter training times.
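The essence of the ELM algorithm, and the source of its speed, can be sketched as follows: the hidden weights of the SLFN are drawn at random and left fixed, and only the output weights are solved, in a single least-squares step, instead of by iterative back-propagation (toy data shown, not the APS database).

```python
import numpy as np

def train_elm(X, Y, n_hidden, seed=0):
    """Extreme learning machine for a SLFN: random fixed hidden layer,
    output weights obtained by one linear least-squares solve."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))              # hidden layer outputs
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)        # output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

rng = np.random.default_rng(1)
X = rng.random((20, 4))          # 20 toy samples, 4 inputs
Y = X.sum(axis=1)                # toy target function
W, b, beta = train_elm(X, Y, n_hidden=10)
print(np.abs(elm_predict(X, W, b, beta) - Y).mean())   # small training error
```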
The generalization performance of the networks trained with the ELM algorithm,
with different combinations of the number of hidden layer neurons, was more stable
than that of the corresponding networks trained with the standard BP algorithms. These
features reflect the stability and robustness of the ELM learning process. The
network stability, robustness and significantly reduced training times make the ELM
algorithm a desirable candidate to be incorporated into an on-line plasma spray control
system. Such a system would benefit the plasma spray manufacturing process and
assist the spray engineers in reducing the time and complexities associated with spray
tuning and setting the crucial thermal spray parameters.
The sensitivity of the ANN is an important parameter to study before
incorporating the models to any on-line APS control system. It is of great importance to
understand how the designed network model would respond when conditions stray
away from being ideal. The disturbances for an ANN can occur due to slight
fluctuations of the input parameters presented to the network or due to fluctuations of
the network parameters. Since hardware implementation of the ANN models was
not considered, only the disturbances to the input parameters were
studied.
Different ANN models, developed in the course of this research work, were
considered for the sensitivity analysis. The sensitivity of the selected networks to
fluctuations of the APS input processing parameters was examined. Uniformly
distributed noise, generated by MATLAB's Simulink tool, was used in this study to
simulate the effect of input parameter disturbances. With the gradual addition of
noise to the inputs, the networks were simulated each time and the correlation
coefficient (R) values demonstrated the changes in the networks' performance.
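The procedure can be sketched as below; the noise here is generated with numpy rather than Simulink, and the toy model stands in for a trained network, so the numbers are illustrative only.

```python
import numpy as np

def sensitivity_to_input_noise(predict, X, y_true, noise_pcts, seed=0):
    """Add uniformly distributed noise of increasing relative amplitude to
    the inputs and record how the correlation coefficient R degrades."""
    rng = np.random.default_rng(seed)
    r_values = []
    for pct in noise_pcts:
        noise = rng.uniform(-1.0, 1.0, X.shape) * (pct / 100.0) * X
        y_pred = predict(X + noise)
        r_values.append(np.corrcoef(y_pred, y_true)[0, 1])
    return r_values

# Toy stand-in network: R = 1 with no noise, then falls as noise grows.
X = np.linspace(1.0, 2.0, 50).reshape(-1, 1)
y = X.ravel() ** 2
print(sensitivity_to_input_noise(lambda Xn: Xn.ravel() ** 2, X, y, [0, 5, 20]))
```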
For all the considered networks, the correlation coefficient values reduced with
the gradual addition of noise, and the networks' outputs became more
scattered as the input data noise increased. However, there were variations in how
the networks responded. Some of the networks were less responsive to the noise
addition while some networks were sensitive to small percentages of input
disturbances. Such analysis would assist thermal spray engineers in selecting
appropriate artificial neural network models for the specific on-line plasma spray control
process on the basis of individual system requirements.
The MLP ANN structure, trained with a Bayesian regularization algorithm and
an expanded training set, was the most sensitive to the fluctuations of input
parameters. The modular network, a single hidden layer structure trained with the
original database and a Levenberg-Marquardt algorithm, was the least sensitive to any
changes in the input parameters.
In the course of the work, an experiment was set up in relation to the APS
process. The input processing parameters were varied and the changes in the dynamic
behaviour of the in-flight particle characteristics were observed using a dichromatic
sensor. The processing parameters and corresponding in-flight particle characteristic
values were processed to form the experimental database. The database was then
used to train selected ANN structures and models from previous simulation works from
the open literature. In spite of the differences in the experimental database, in
comparison to the one from the literature, the developed networks were found to
successfully model the APS process. The networks handled the process dynamics and
learned the underlying input / output relationships. This was demonstrated as the
models predicted the in-flight particle characteristics from the input processing
parameters with good generalization ability.
The good generalization performance of the network models on the experimental
database validates the proposed models and structures. The different
models were previously found, in their respective chapters, to work well with the
database from the literature. The applicability of the developed ANN models is, thus,
not limited to a single case. The networks can be re-trained and optimized for a range
of different cases and environments. The ANN-based models, within the limits of their
training data and the input processing parameters considered, can be incorporated
into an on-line plasma spray control system to allow the automated system to achieve
the desired process stability.
7.2 Future work
The current research work on the application of artificial neural networks in a
plasma spray manufacturing process has provided some fruitful results. It also presents
some potential future paths that can be explored further.
The current study considered the sensitivity of the developed artificial neural
network models to different levels of fluctuation of all the input processing
parameters simultaneously. The effects of variations of the individual input
processing parameters are also of great importance. Work can be performed to
observe how the networks behave with fluctuations of a single input parameter at
a time. This would help in outlining the networks' understanding of the relationships
between the output in-flight particle characteristics and each of the input processing
parameters. The effect of noise on the network parameters would also need to be
studied further.
The ELM algorithm in this study used the expanded training dataset. Further
work needs to be carried out to assess the applicability of the ELM algorithm to a smaller
training dataset. Such a study would help expand applications to different
areas where a large database might not be available or cannot be created using
statistical techniques.
The Levenberg-Marquardt, Bayesian regularization, resilient back-propagation
and ELM algorithms used in this study applied batch learning to the network training
process, where the weight update is performed after presentation of all the training
samples in an epoch. In pattern mode, the weight update is performed after
presentation of each training sample (input/output pattern) and, thus, calls for
lower local storage requirements. The analysis of the performance of ANNs trained
with a back-propagation algorithm based on pattern-mode learning is an exciting field
for further research; the two update schemes are contrasted in the sketch below.
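The difference between the two update schemes can be sketched on a single linear unit with a mean-squared-error cost; the learning rate and data are illustrative assumptions only.

```python
import numpy as np

def batch_epoch(w, X, y, lr=0.01):
    """Batch mode: one weight update per epoch, from the gradient
    accumulated over all training samples."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def pattern_epoch(w, X, y, lr=0.01):
    """Pattern (on-line) mode: a weight update after every sample, so only
    one sample needs to be held locally at a time."""
    for xi, yi in zip(X, y):
        w = w - lr * (xi @ w - yi) * xi
    return w

rng = np.random.default_rng(0)
X = rng.random((10, 3))
y = X @ np.array([1.0, 2.0, 3.0])
w0 = np.zeros(3)
print(batch_epoch(w0, X, y))
print(pattern_epoch(w0, X, y))
```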
References:
[1] E. Pfender, "Fundamental studies associated with the plasma spray process,"
Surface and Coatings Technology, vol. 34, pp. 1-14, 1988.
[2] P. Fauchais and M. Vardelle, "Plasma spraying - present and future," Pure and
Applied Chemistry, vol. 66, pp. 1247-1258, 1994.
[3] P. Fauchais, "Understanding plasma spraying," Journal of Physics D: Applied
Physics, vol. 37, pp. R86-R108, 2004.
[4] C. C. Berndt and T. F. Bernecki, Thermal spray coatings: ASM International,
2003.
[5] H. Herman and S. Sampath, "Thermal spray coatings," in Metallurgical and
Ceramic Protective Coatings, ed: Springer, 1996, pp. 261-289.
[6] J. Davis, Handbook of thermal spray technology: Materials Park, OH: ASM
International, 2005.
[7] G. E. Kim and J. Walker, "Successful application of nanostructured titanium
dioxide coating for high-pressure acid-leach application," Journal of Thermal
Spray Technology, vol. 16, pp. 34-39, 2007.
[8] C. Moreau, P. Gougeon, M. Lamontagne, V. Lacasse, G. Vaudreuil, and P.
Cielo, "On-line control of the plasma spraying process by monitoring the
temperature, velocity, and trajectory of in-flight particles," in 7th National
Thermal Spray Conference, C. C. Berndt and S. Sampath, Eds., Thermal Spray
Industrial Applications, Boston, MA, 1994, pp. 431-437.
[9] P. Nylén, J. Wigren, J. Idetjärn, L. Pejryd, M. Friis, and P. Moretto, "On-line
microstructure and property control of a thermal sprayed abrasive coating," in
Proceedings of International Thermal Spray Conference (ITSC), C. Berndt, K.
Khor, and E. Lugscheider, Eds., Thermal Spray 2001: New Surfaces for a New
Millennium (ASM International), Singapore, 2001, pp. 1213-1220.
[10] M. Friis and C. Persson, "Process window for plasma processes," in
Proceedings of International Thermal Spray Conference (ITSC), C. Berndt, K.
Khor, and E. Lugscheider, Eds., Thermal Spray 2001: New Surfaces for a New
Millennium (ASM International), Singapore, 2001, pp. 1313-1319.
[11] J. Guilemany, J. Nin, and J. Delgado, "On-line-monitoring control of stainless
steel coatings obtained by APS processes," in Proceedings of International
Thermal Spray Conference (ITSC), E. Lugscheider and P. A. Kammer, Eds.,
Thermal Spray 2002: International Thermal Spray Conference (DVS-ASM),
DVS-Verlag, Düsseldorf, Germany, 2002, pp. 86-90.
[12] J. Cizek and K. A. Khor, "Role of in-flight temperature and velocity of powder
particles on plasma sprayed hydroxyapatite coating characteristics," Surface
and Coatings Technology, vol. 206, pp. 2181-2191, 2012.
[13] A. F. Kanta, G. Montavon, M. P. Planche, and C. Coddet, "Artificial neural
networks implementation in plasma spray process: prediction of power
parameters and in-flight particle characteristics vs. desired coating structural
attributes," Surface and Coatings Technology, vol. 203, pp. 3361-3369, 2009.
[14] S. Guessasma, G. Montavon, and C. Coddet, "Modeling of the APS plasma
spray process using artificial neural networks: basis, requirements and an
example," Computational Materials Science, vol. 29, pp. 315-333, 2004.
[15] M. Vardelle and P. Fauchais, "Plasma spray processes: diagnostics and
control?," Pure and Applied Chemistry, vol. 71, pp. 1909-1918, 1999.
[16] J.-E. Döring, R. Vassen, and D. Stöve, "The influence of spray parameters on
particle properties," in Proceedings of International Thermal Spray Conference
(ITSC), E. Lugscheider and P. A. Kammer, Eds., Thermal Spray 2002:
International Thermal Spray Conference (DVS-ASM), DVS-Verlag, Düsseldorf,
Germany, 2002, pp. 440-445.
[17] E. Lugscheider and N. Papenfuß-Janzen, "Simulation of the influence of spray
parameters on particle properties in APS," in Proceedings of International
Thermal Spray Conference (ITSC), E. Lugscheider and P. A. Kammer, Eds.,
Thermal Spray 2002: International Thermal Spray Conference (DVS-ASM),
DVS-Verlag, Düsseldorf, Germany, 2002, pp. 42-46.
[18] A. Refke, G. Barbezat, and M. Loch, "The benefit of an on-line diagnostic system
for the optimisation of plasma spray devices and parameters," in Proceedings of
International Thermal Spray Conference (ITSC), C. Berndt, K. Khor, and E.
Lugscheider, Eds., Thermal Spray 2001: New Surfaces for a New Millennium
(ASM International), Singapore, 2001, pp. 765-770.
[19] C. Moreau, "Towards a better control of thermal spray processes," in
Proceedings of International Thermal Spray Conference (ITSC), C. Coddet, Ed.,
Thermal Spray 1998: Meeting the Challenges of the 21st Century (ASM
International), Nice, France, 1998, pp. 1681-1693.
[20] J. F. Bisson, B. Gauthier, and C. Moreau, "Effect of plasma fluctuations on in-
flight particle parameters," Journal of Thermal Spray Technology, vol. 12, pp.
38-43, 2003.
[21] C. J. Einerson, D. E. Clark, B. A. Detering, and P. L. Taylor, "Intelligent control
strategies for the plasma spray process," Thermal Spray Coatings: Research,
Design and Applications, Proceedings of the Sixth NTSC, June 1993, Anaheim,
ASM International, Materials Park, OH, USA, pp. 205-211, 1993.
[22] P. L. Bartlett, "For valid generalization, the size of the weights is more important
than the size of the network," in Advances in Neural Information Processing Systems 9, M. C.
Mozer, M. I. Jordan, and T. Petsche, Ed., The MIT press, 1997, pp. 134-140.
[23] T. Elsken, "Even on finite test sets smaller nets may perform better," Neural
Networks, vol. 10, pp. 369-385, 1997.
[24] P. Fauchais and M. Vardelle, "How to improve the reliability and reproducibility
of plasma sprayed coatings," in Proceedings of International Thermal Spray
Conference (ITSC), B. R. Marple and C. Moreau, Eds., Thermal Spray 2003:
Advancing the Science and Applying the Technology (ASM International),
Materials Park, OH, 2003, pp. 1165-1173.
[25] P. Fauchais, M. Vardelle, and A. Vardelle, "Reliability of plasma-sprayed
coatings: monitoring the plasma spray process and improving the quality of
coatings," Journal of Physics D: Applied Physics, vol. 46, 2013.
[26] A. F. Kanta, G. Montavon, and C. Coddet, "Predicting spray processing
parameters from required coating structural attributes by artificial intelligence,"
Advanced Engineering Materials, vol. 8, pp. 628-635, 2006.
[27] S. Guessasma, G. Montavon, and C. Coddet, "Plasma spray process modelling
using artificial neural networks: application to Al2O3-TiO2 (13% by weight)
ceramic coating structure," in 2nd International Conference on Thermal Process
Modelling and Computer Simulation (ICTPMCS), Journal De Physique. IV : JP,
Nancy; France, 2004, pp. 363-370.
[28] S. Guessasma and C. Coddet, "Neural computation applied to APS spray
process: porosity analysis," Surface and Coatings Technology, vol. 197, pp. 85-
92, 2005.
[29] S. Guessasma and C. Coddet, "Microstructure of APS alumina-titania coatings
analysed using artificial neural network," Acta Materialia, vol. 52, pp. 5157-
5164, 2004.
[30] S. Guessasma, G. Montavon, and C. Coddet, "Analysis of the influence of
atmospheric plasma spray (APS) parameters on adhesion properties of
alumina-titania coatings," Journal of Adhesion Science and Technology, vol. 18,
pp. 495-505, 2004.
[31] S. Guessasma, D. Hao, L. Moulla, H. L. Liao, and C. Comet, "Neural
computation to estimate heat flux in an atmospheric plasma spray process,"
Heat Transfer Engineering, vol. 26, pp. 65-72, 2005.
[32] M. D. Jean, B. T. Lin, and J. H. Chou, "Application of an artificial neural network
for simulating robust plasma-sprayed zirconia coatings," Journal of the
American Ceramic Society, vol. 91, pp. 1539-1547, 2008.
[33] L. Wang, J. Fang, Z. Zhao, and H. Zeng, "Application of backward propagation
network for forecasting hardness and porosity of coatings by plasma spraying,"
Surface and Coatings Technology, vol. 201, pp. 5085-5089, 2007.
[34] P. Fauchais and M. Vardelle, "Sensors in spray processes," Journal of Thermal
Spray Technology, vol. 19, pp. 668-694, 2010.
[35] S. Datta, D. K. Pratihar, and P. P. Bandyopadhyay, "Modeling of input-output
relationships for a plasma spray coating process using soft computing tools,"
Applied Soft Computing Journal, vol. 12, pp. 3356-3368, 2012.
[36] W. Xia, H. Zhang, G. Wang, and Y. Yang, "Intelligent process modeling of
robotic plasma spraying based on multi-layer artificial neural network," Hanjie
Xuebao/Transactions of the China Welding Institution, vol. 30, pp. 41-44, 2009.
[37] E. Lugscheider and K. Seemann, "Prediction of plasma sprayed coating
properties by the use of neural networks," in Proceedings of International
Thermal Spray Conference (ITSC), Thermal Spray 2004: Advances in
Technology and Applications (ASM International), Osaka, Japan, 2004, pp. 459-
463.
[38] S. Guessasma, G. Montavon, and C. Coddet, "Neural computation to predict in-
flight particle characteristic dependences from processing parameters in the
APS process," Journal of Thermal Spray Technology, vol. 13, pp. 570-585,
2004.
[39] S. Guessasma, Z. Salhi, G. Montavon, P. Gougeon, and C. Coddet, "Artificial
intelligence implementation in the APS process diagnostic," Materials Science
and Engineering B: Solid-State Materials for Advanced Technology, vol. 110,
pp. 285-295, 2004.
[40] S. Guessasma, G. Montavon, P. Gougeon, and C. Coddet, "Designing expert
system using neural computation in view of the control of plasma spray
processes," Materials & Design, vol. 24, pp. 497-502, 2003.
[41] D. W. Patterson, Artificial Neural Networks: theory and applications: Prentice
Hall, 1996.
[42] S. E. Fahlman, "Faster-learning variations on back propagation: an empirical
study," Proceedings of the 1988 Connectionist Models Summer School, pp. 38-
51, 1988.
[43] S. Guessasma, G. Montavon, and C. Coddet, "On the neural network concept
to describe the thermal spray deposition process: an introduction," in
Proceedings of International Thermal Spray Conference (ITSC), E. Lugscheider
and P. A. Kammer, Eds., Thermal Spray 2002: International Thermal Spray
Conference (DVS-ASM), DVS-Verlag, Düsseldorf, Germany, 2002, pp. 435-
439.
[44] S. Guessasma, G. Montavon, P. Gougeon, and C. Coddet, "On the neural
network concept to describe the thermal spray deposition process: correlation
between in-flight particles characteristics and processing parameters," in
Proceedings of International Thermal Spray Conference (ITSC), E. Lugscheider
and P. A. Kammer, Eds., Thermal Spray 2002: International Thermal Spray
Conference (DVS-ASM), DVS-Verlag, Düsseldorf, Germany, 2002, pp. 483-
488.
[45] A. F. Kanta, G. Montavon, M. Vardelle, M. P. Planche, C. C. Berndt, and C.
Coddet, "Artificial neural networks vs. fuzzy logic: simple tools to predict and
control complex processes-application to plasma spray processes," Journal of
Thermal Spray Technology, vol. 17, pp. 365-376, 2008.
[46] A. F. Kanta, G. Montavon, M. P. Planche, and C. Coddet, "Artificial intelligence
computation to establish relationships between APS process parameters and
alumina-titania coating properties," Plasma Chemistry and Plasma Processing,
vol. 28, pp. 249-262, 2008.
[47] A. F. Kanta, G. Montavon, C. C. Berndt, M. P. Planche, and C. Coddet,
"Intelligent system for prediction and control: application in plasma spray
process," Expert Systems with Applications, vol. 38, pp. 260-271, 2011.
[48] A. F. Kanta, M. P. Planche, G. Montavon, and C. Coddet, "In-flight and upon
impact particle characteristics modelling in plasma spray process," Surface and
Coatings Technology, vol. 204, pp. 1542-1548, 2010.
[49] C. C. Berndt, "Preparation of thermal spray powders," Education module on
thermal spray, Pub. ASM International, Ohio, 1992.
[50] S. K. Chidrawar, S. Bhaskarwar, and B. M. Patre, "Implementation of neural
network for generalized predictive control: a comparison between a Newton
Raphson and Levenberg Marquardt implementation," in Computer Science and
Information Engineering, 2009 WRI World Congress on, 2009, pp. 669-673.
[51] S. Guessasma, G. Montavon, and C. Coddet, "On the implementation of neural
network concept to optimize thermal spray deposition process," in
Combinatorial and Artificial Intelligence Methods in Materials Science vol. 700,
I. Takeuchi, J. M. Newsam, L. T. Wille, H. Koinuma, and E. J. Amis, Eds., ed.
Warrendale: Materials Research Society, 2002, pp. 253-258.
[52] E. Pfender, "Plasma jet behavior and modeling associated with the plasma
spray process," Thin Solid Films, vol. 238, pp. 228-241, 1994.
[53] K. Alamara, S. Saber Samandari, and C. C. Berndt, "Splat taxonomy of
polymeric thermal spray coating," Surface and Coatings Technology, vol. 205,
pp. 5028-5034, 2011.
[54] P. Fauchais, G. Montavon, and G. Bertrand, "From powders to thermally
sprayed coatings," Journal of Thermal Spray Technology, vol. 19, pp. 56-80,
2010.
[55] G. Mauer, R. Vassen, S. Zimmermann, T. Biermordt, M. Heinrich, J. L.
Marques, K. Landes, and J. Schein, "Investigation and comparison of in-flight
particle velocity during the plasma-spray process as measured by laser doppler
anemometry and DPV-2000," Journal of Thermal Spray Technology, vol. 22, pp.
892-900, 2013.
[56] P. Gougeon, C. Moreau, and F. Richard, "On-line control of plasma sprayed
particles in the aerospace industry," Advances in thermal spray science and
technology, pp. 149-155, 1995.
[57] J. Blain, F. Nadeau, L. Pouliot, C. Moreau, P. Gougeon, and L. Leblanc,
"Integrated infrared sensor system for on line monitoring of thermally sprayed
particles," Surface Engineering, vol. 13, pp. 420-424, 1997.
[58] A. Vaidya, G. Bancke, S. Sampath, and H. Herman, "Influence of process
variables on the plasma-sprayed coatings: an integrated study," in Proceedings
of International Thermal Spray Conference (ITSC), C. Berndt, K. Khor, and E.
Lugscheider, Eds., Thermal Spray 2001: New Surfaces for a New Millennium
(ASM International), Singapore, 2001, pp. 1345-1349.
[59] S. Chen, P. Siitonen, and P. Kettunen, "Experimental design and parameter optimization for plasma spraying of alumina coatings," in Proceedings of International Thermal Spray Conference (ITSC), C. Berndt, Ed., Thermal Spray 1992: International Advances in Coatings Technology, Materials Park, OH, 1992, pp. 51-56.
[60] X. Lin, Y. Zeng, S. W. Lee, and C. Ding, "Characterization of alumina–3 wt.%
titania coating prepared by plasma spraying of nanostructured powders,"
Journal of the European Ceramic Society, vol. 24, pp. 627-634, 2004.
[61] S. M. Forghani, M. J. Ghazali, A. Muchtar, A. R. Daud, N. H. N. Yusoff, and C.
H. Azhari, "Effects of plasma spray parameters on TiO2-coated mild steel using
design of experiment (DoE) approach," Ceramics International, vol. 39, pp.
3121-3127, 2013.
[62] I. Fisher, "Variables influencing the characteristics of plasma-sprayed coatings,"
International Metallurgical Reviews, vol. 17, pp. 117-129, 1972.
[63] I. Aleksander and H. Morton, An Introduction to Neural Computing, 2nd ed. London: International Thomson Computer Press, 1995.
[64] S. Haykin, Neural Networks: A Comprehensive Foundation. Upper Saddle River, NJ: Prentice Hall PTR, 1994.
[65] M. M. Nelson and W. T. Illingworth, A Practical Guide to Neural Nets. Reading, MA: Addison-Wesley, 1991.
[66] S. Haykin, Adaptive Filter Theory, 4th ed. Upper Saddle River, NJ: Prentice Hall, 2002, pp. 478-481.
[67] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[68] G. Bolt, "Fault tolerance in artificial neural networks," Advanced Computer Architecture Group, Department of Computer Science, University of York, Heslington, York, YO1 5DD, U.K., 1992.
[69] D. E. Rumelhart, "Brain style computation: learning and generalization," in An
introduction to neural and electronic networks, ed: Academic Press
Professional, Inc., 1990, pp. 405-420.
[70] P. P. San, S. H. Ling, and H. T. Nguyen, "Industrial application of evolvable
block-based neural network to hypoglycemia monitoring system," IEEE
Transactions on Industrial Electronics, vol. 60, pp. 5892-5901, 2013.
[71] K. C. Hsieh, Y. J. Chen, H. K. Lu, L. C. Lee, Y. C. Huang, and Y. Y. Chen, "The
novel application of artificial neural network on bioelectrical impedance analysis
to assess the body composition in elderly," Nutrition Journal, vol. 12, 2013.
[72] N. Gueli, A. Martinez, W. Verrusio, A. Linguanti, P. Passador, V. Martinelli, G.
Longo, B. Marigliano, F. Cacciafesta, and M. Cacciafesta, "Empirical antibiotic
therapy (ABT) of lower respiratory tract infections (LRTI) in the elderly:
application of artificial neural network (ANN) preliminary results," Archives of
Gerontology and Geriatrics, vol. 55, pp. 499-503, 2012.
[73] T. J. Pollard, L. Harra, D. Williams, S. Harris, D. Martinez, and K. Fong, "2012
PhysioNet challenge: an artificial neural network to predict mortality in ICU
patients and application of solar physics analysis methods," in 39th Computing
in Cardiology Conference (CinC), Krakow, Poland, 2012, pp. 485-488.
[74] J. Shu, Z. Zhang, I. Gonzalez, and R. Karoumi, "The application of a damage
detection method using artificial neural network and train-induced vibrations on
a simplified railway bridge model," Engineering Structures, vol. 52, pp. 408-421,
2013.
[75] S. P. Sotiroudis, S. K. Goudos, K. A. Gotsis, K. Siakavara, and J. N. Sahalos,
"Application of a composite differential evolution algorithm in optimal neural
network design for propagation path-loss prediction in mobile communication
systems," IEEE Antennas and Wireless Propagation Letters, vol. 12, pp. 364-
367, 2013.
[76] Y. Wang, T. Chai, J. Fu, J. Sun, and H. Wang, "Adaptive decoupling switching control of the forced-circulation evaporation system using neural networks," IEEE Transactions on Control Systems Technology, vol. 21, pp. 964-974, 2013.
[77] C.-W. Lin, Y. T. C. Yang, J.-S. Wang, and Y.-C. Yang, "A wearable sensor module with a neural network based activity classification algorithm for daily energy expenditure estimation," IEEE Transactions on Information Technology in Biomedicine, vol. 16, pp. 991-998, 2012.
[78] G. Sideratos and N. D. Hatziargyriou, "Probabilistic wind power forecasting
using radial basis function neural networks," IEEE Transactions on Power
Systems, vol. 27, pp. 1788-1796, 2012.
[79] C. Opathella, B. Singh, D. Cheng, and B. Venkatesh, "Intelligent wind generator
models for power flow studies in PSS E and PSS SINCAL," IEEE Transactions
on Power Systems, vol. 28, pp. 1149-1159, 2013.
[80] A. N. Al Masri, M. Z. A. Ab Kadir, H. Hizam, and N. Mariun, "A novel
implementation for generator rotor angle stability prediction using an adaptive
artificial neural network application for dynamic security assessment," IEEE
Transactions on Power Systems, vol. 28, pp. 2516-2525, 2013.
[81] C. Guan, P. B. Luh, L. D. Michel, Y. Wang, and P. B. Friedland, "Very short-term load forecasting: wavelet neural networks with data pre-filtering," IEEE Transactions on Power Systems, vol. 28, pp. 30-41, 2013.
[82] Z. Minghu, D. Wang, L. Shijun, Q. Enzhong, C. Shaojie, and L. Yingsong,
"Research on the wavelet neural network pattern recognition technology for
chemical agents," in International Conference of Information Science and
Management Engineering (ISME), 2010, pp. 241-244.
[83] G. Rigoll, "Mutual information neural networks for dynamic pattern recognition
tasks," in Proceedings of the IEEE International Symposium on Industrial
Electronics (ISIE), 1996, vol. 1, pp. 80-85.
[84] A. A. D. M. Meneses, A. P. De Almeida, J. Soares, P. Azambuja, M. S.
Gonzalez, S. Cardoso, D. Braz, C. E. De Almeida, and R. C. Barroso,
"Segmentation of X-ray micro-computed tomography using neural networks
trained with statistical information: application to biomedical images," in IEEE
Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC),
Valencia, Spain, 2011, pp. 3999-4001.
[85] A. Y. M. Ontman and G. J. Shiflet, "Application of artificial neural networks for
feature recognition in image registration," Journal of Microscopy, vol. 246, pp.
20-32, 2012.
[86] S. Ray, N. D. Prionas, K. K. Lindfors, and J. M. Boone, "Analysis of breast CT
lesions using computer-aided diagnosis: an application of neural networks on
extracted morphologic and texture features," in Progress in Biomedical Optics
and Imaging - Proceedings of SPIE, San Diego, CA; United States, 2012.
[87] G. An, "The effects of adding noise during backpropagation training on a
generalization performance," Neural Computation, vol. 8, pp. 643-674, 1996.
[88] K. Matsuoka, "Noise injection into inputs in back-propagation learning," IEEE
Transactions on Systems, Man and Cybernetics, vol. 22, pp. 436-440, 1992.
[89] Y. Jiang, R. M. Zur, L. L. Pesce, and K. Drukker, "A study of the effect of noise injection on the training of artificial neural networks," in International Joint Conference on Neural Networks (IJCNN), 2009, pp. 1428-1432.
[90] G. N. Karystinos and D. A. Pados, "On overfitting, generalization, and randomly
expanded training sets," IEEE Transactions on Neural Networks, vol. 11, pp.
1050-1057, 2000.
[91] K. Wang, J. Yang, G. Shi, and Q. Wang, "An expanded training set based validation method to avoid overfitting for neural network classifier," in Fourth International Conference on Natural Computation (ICNC), 2008, pp. 83-87.
[92] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation
and model selection," in International Joint Conference on Artificial Intelligence
(IJCAI), 1995, pp. 1137-1145.
[93] Y. Liu, "Create stable neural networks by cross-validation," in International Joint
Conference on Neural Networks (IJCNN), IEEE, 2006, pp. 3925-3928.
[94] L. Prechelt, "Automatic early stopping using cross validation: quantifying the
criteria," Neural Networks, vol. 11, pp. 761-767, 1998.
[95] H. Wu and J. L. Shapiro, "Parameter cross-validation and early-stopping in
univariate marginal distribution algorithm," in Proceedings of the 9th annual
conference on Genetic and evolutionary computation, ACM, 2007, pp. 632-633.
[96] F. Burden and D. Winkler, "Bayesian regularization of neural networks," Methods in Molecular Biology, vol. 458, pp. 25-44, 2008.
[97] P. S. Churchland and T. J. Sejnowski, The Computational Brain. Cambridge, MA: MIT Press, 1992.
[98] R. S. Sutton, "Temporal credit assignment in reinforcement learning," Ph.D. dissertation, University of Massachusetts Amherst, 1984.
[99] S. Becker, "Unsupervised learning procedures for neural networks,"
International Journal of Neural Systems, vol. 2, pp. 17-33, 1991.
[100] M. T. Hagan and M. B. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Transactions on Neural Networks, vol. 5, pp. 989-993, 1994.
[101] D. J. C. MacKay, "Bayesian interpolation," in Maximum Entropy and Bayesian Methods, vol. 50, C. R. Smith, G. J. Erickson, and P. O. Neudorfer, Eds. Dordrecht: Kluwer Academic Publishers, 1992, pp. 39-66.
[102] F. Dan Foresee and M. T. Hagan, "Gauss-Newton approximation to Bayesian
learning," in International Conference on Neural Networks, 1997, pp. 1930-
1935.
[103] D. Nguyen and B. Widrow, "Improving the learning speed of 2-layer neural
networks by choosing initial values of the adaptive weights," in Proceedings of
the international joint conference on neural networks, Washington, 1990, pp.
21-26.
[104] M. Riedmiller and H. Braun, "A direct adaptive method for faster
backpropagation learning: the RPROP algorithm," in IEEE International
Conference on Neural Networks, 1993, vol. 1, pp. 586-591.
[105] A. J. Sharkey, Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems. New York: Springer-Verlag, 1999.
[106] A. J. C. Sharkey, "On combining artificial neural nets," Connection Science, vol. 8, pp. 299-314, 1996.
[107] L. K. Hansen and P. Salamon, "Neural network ensembles," IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 12, pp. 993-1001, 1990.
[108] J. A. Fodor, The Modularity of Mind (A Bradford Book). Cambridge, MA: MIT Press, 1983.
[109] J. M. Bates and C. W. Granger, "The combination of forecasts," Operational Research Quarterly, vol. 20, pp. 451-468, 1969.
[110] A. Avizienis and J. P. J. Kelly, "Fault tolerance by design diversity: concepts and experiments," Computer, vol. 17, pp. 67-80, 1984.
[111] A. Krogh and J. Vedelsby, "Neural network ensembles, cross validation, and
active learning," Advances in neural information processing systems, pp. 231-
238, 1995.
[112] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, pp. 123-140, 1996.
[113] A. J. Sharkey, N. E. Sharkey, and G. Chandroth, "Neural nets and diversity," in
Proceedings of the 14th International Conference on Computer Safety,
Reliability and Security, 1995, pp. 375-389.
[114] K. Tumer and J. Ghosh, "Error correlation and error reduction in ensemble
classifiers," Connection Science, vol. 8, pp. 385-404, 1996.
[115] H. Drucker, C. Cortes, L. D. Jackel, Y. LeCun, and V. Vapnik, "Boosting and
other ensemble methods," Neural Computation, vol. 6, pp. 1289-1301, 1994.
[116] Y. Freund and R. E. Schapire, "Experiments with a new boosting algorithm," in
Machine learning: Proceedings of the thirteenth international conference, 1996,
pp. 148-156.
[117] A. J. Sharkey and N. E. Sharkey, "Combining diverse neural nets," Knowledge
Engineering Review, vol. 12, pp. 231-247, 1997.
[118] R. A. Jacobs, "Methods for combining experts' probability assessments," Neural
Computation, vol. 7, pp. 867-888, 1995.
[119] C. Genest and J. V. Zidek, "Combining probability distributions: a critique and
an annotated bibliography," Statistical Science, vol. 1, pp. 114-135, 1986.
[120] L. Xu, A. Krzyzak, and C. Y. Suen, "Methods of combining multiple classifiers
and their applications to handwriting recognition," IEEE Transactions on
Systems, Man and Cybernetics, vol. 22, pp. 418-435, 1992.
[121] P. A. Zhilkin and R. L. Somorjai, "Application of several methods of
classification fusion to magnetic resonance spectra," Connection Science, vol.
8, pp. 427-442, 1996.
[122] M. P. Perrone and L. N. Cooper, "When networks disagree: ensemble methods
for hybrid neural networks," DTIC Document1992.
[123] S. Hashem, "Effects of collinearity on combining neural networks," Connection
Science, vol. 8, pp. 315-336, 1996.
[124] S. Hashem, B. Schmeiser, and Y. Yih, "Optimal linear combinations of neural
networks: an overview," in IEEE World Congress on Computational Intelligence,
1994, pp. 1507-1512.
[125] G. Rogova, "Combining the results of several neural network classifiers," Neural
Networks, vol. 7, pp. 777-781, 1994.
[126] K. A. Al-Ghoneim and B. V. Kumar, "Learning ranks with neural networks," in
SPIE's 1995 Symposium on OE/Aerospace Sensing and Dual Use Photonics,
International Society for Optics and Photonics, 1995, pp. 446-464.
[127] K. Tumer and J. Ghosh, "Order statistics combiners for neural classifiers," in
Proceedings of the World Congress on Neural Networks, 1995, pp. 31-34.
[128] D. H. Wolpert, "Stacked generalization," Neural Networks, vol. 5, pp. 241-259,
1992.
[129] M. LeBlanc and R. Tibshirani, "Combining estimates in regression and
classification," Journal of the American statistical Association, vol. 91, pp. 1641-
1650, 1996.
[130] N. E. Sharkey and A. J. Sharkey, "Artificial neural networks for coordination and
control: the portability of experiential representations," Robotics and
Autonomous Systems, vol. 22, pp. 345-359, 1997.
[131] N. E. Sharkey and A. J. Sharkey, "A modular design for connectionist parsing,"
in Proceedings of Workshop on Language Technology, M. F. J. Drosaers and
A. Nijholt, Eds., 1992, pp. 87-96.
[132] T. Catfolis and K. Meert, "Hybridization and specialization of real-time recurrent
learning-based neural networks," Connection Science, vol. 9, pp. 51-70, 1997.
[133] Y. Bennani and P. Gallinari, "Task decomposition through a modular
connectionist architecture: a talker identification system," in Proceedings of the 3rd
International Conference on Artificial Neural Networks, Amsterdam, The
Netherlands: North-Holland, 1992, pp. 783-786.
[134] P. Gallinari, "Training of modular neural net systems," The Handbook of Brain
Theory and Neural Networks, pp. 582-585, 1995.
[135] L. Y. Pratt, J. Mostow, and C. A. Kamm, "Direct transfer of learned information
among neural networks," in Proceedings of the Ninth National Conference on
Artificial Intelligence, Anaheim, CA: AAAI Press, 1991, pp. 584-589.
[136] B. L. Lu and M. Ito, "Task decomposition and module combination based on
class relations: a modular neural network for pattern classification," IEEE
Transactions on Neural Networks, vol. 10, pp. 1244-1256, 1999.
[137] J. B. Hampshire II and A. Waibel, "The Meta-Pi network: building distributed
knowledge representations for robust multisource pattern recognition," IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 14, pp. 751-
769, 1992.
[138] A. Waibel, H. Sawai, and K. Shikano, "Modularity and scaling in large phonemic
neural networks," IEEE Transactions on Acoustics, Speech and Signal
Processing, vol. 37, pp. 1888-1898, 1989.
[139] W. G. Baxt, "Improving the accuracy of an artificial neural network using
multiple differently trained networks," Neural Computation, vol. 4, pp. 772-780,
1992.
[140] R. Anand, K. Mehrotra, C. K. Mohan, and S. Ranka, "Efficient classification for
multiclass problems using modular neural networks," IEEE Transactions on
Neural Networks, vol. 6, pp. 117-124, 1995.
[141] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, "Adaptive mixtures
of local experts," Neural Computation, vol. 3, pp. 79-87, 1991.
[142] M. I. Jordan and R. A. Jacobs, "Hierarchical mixtures of experts and the EM
algorithm," Neural Computation, vol. 6, pp. 181-214, 1994.
[143] F. Peng, R. A. Jacobs, and M. A. Tanner, "Bayesian inference in mixtures-of-
experts and hierarchical mixtures-of-experts models with an application to
speech recognition," Journal of the American statistical Association, pp. 953-
960, 1996.
[144] A. J. Sharkey, N. E. Sharkey, and S. S. Cross, "Adapting an ensemble
approach for the diagnosis of breast cancer," in International Conference on
Artificial Neural Networks (ICANN), 1998, pp. 281-286.
[145] C. McCormack, "Adaptation of learning rule parameters using a meta neural
network," Connection Science, vol. 9, pp. 123-136, 1997.
[146] K. Kim and E. B. Bartlett, "Error estimation by series association for neural
network systems," Neural Computation, vol. 7, pp. 799-808, 1995.
[147] P. Koistinen and L. Holmstrom, "Kernel regression and backpropagation training
with noise," in IEEE International Joint Conference on Neural Networks, IEEE,
1991, pp. 367-372.
[148] E. Parzen, "On estimation of a probability density function and mode," Annals of Mathematical Statistics, vol. 33, pp. 1065-1076, 1962.
[149] M. Rosenblatt, "Remarks on some nonparametric estimates of a density
function," Annals of Mathematical Statistics, vol. 27, pp. 832-837, 1956.
[150] T. Cacoullos, "Estimation of a multivariate density," Annals of the Institute of Statistical Mathematics, vol. 18, pp. 179-189, 1966.
[151] D. Zhang, Y. Tian, and P. Zhang, "Kernel-based nonparametric regression method," in IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2008, pp. 410-413.
[152] C. Charalambous, "Conjugate gradient algorithm for efficient training of artificial neural networks," IEE Proceedings G: Circuits, Devices and Systems, vol. 139, pp. 301-310, 1992.
[153] E. Barnard, "Optimization for training neural nets," IEEE Transactions on Neural
Networks, vol. 3, pp. 232-240, 1992.
[154] D. F. Shanno, Recent Advances in Numerical Techniques for Large-Scale Optimization. Cambridge, MA: MIT Press, 1990.
[155] S. Kollias and D. Anastassiou, "An adaptive least squares algorithm for the
efficient training of artificial neural networks," IEEE Transactions on Circuits and
Systems, vol. 36, pp. 1092-1101, 1989.
[156] D. Marquardt, "An algorithm for least-squares estimation of nonlinear
parameters," Journal of the Society for Industrial and Applied Mathematics, vol.
11, pp. 431-441, 1963.
[157] A. J. Adeloye and A. De Munari, "Artificial neural network based generalized
storage-yield-reliability models using the Levenberg-Marquardt algorithm,"
Journal of Hydrology, vol. 326, pp. 215-230, 2006.
[158] B. Pateyron, M.-F. Elchinger, G. Delluc, and P. Fauchais, "Thermodynamic and
transport properties of Ar-H2 and Ar-He plasma gases used for spraying at
atmospheric pressure. I: Properties of the mixtures," Plasma Chemistry and
Plasma Processing, vol. 12, pp. 421-448, 1992.
[159] M. I. Boulos, P. Fauchais, A. Vardelle, and E. Pfender, "Fundamentals of
plasma particle momentum and heat transfer," Plasma Spraying: Theory and
Applications, pp. 3–60, 1993.
[160] M. Friis, C. Persson, and J. Wigren, "Influence of particle in-flight characteristics
on the microstructure of atmospheric plasma sprayed yttria stabilized ZrO2,"
Surface and Coatings Technology, vol. 141, pp. 115-127, 2001.
[161] C. Bossoutrot, F. Braillard, T. Renault, M. Vardelle, and P. Fauchais,
"Preliminary studies of a closed-loop for a feedback control of air plasma spray
processes," in Proceedings of International Thermal Spray Conference (ITSC),
E. Lugscheider and P. A. Kammer, Eds., Thermal Spray 2002: International
Thermal Spray Conference (DVS-ASM), DVS-Verlag, Düsseldorf, Germany,
2002, pp. 56-61.
[162] A. F. Kanta, G. Montavon, M. P. Planche, and C. Coddet, "Prospect for plasma
spray process on-line control via artificial intelligence (neural networks and
fuzzy logic)," in Proceedings of International Thermal Spray Conference (ITSC),
Thermal Spray 2006: Science, Innovation, and Application (ASM International),
2006, pp. 1027 - 1033.
[163] T. A. Choudhury, N. Hosseinzadeh, and C. C. Berndt, "Improving the
generalization ability of an artificial neural network in predicting in-flight particle
characteristics of an atmospheric plasma spray process," Journal of Thermal
Spray Technology, vol. 21, pp. 935-949, 2012.
[164] T. A. Choudhury, N. Hosseinzadeh, and C. C. Berndt, "Artificial neural network
application for predicting in-flight particle characteristics of an atmospheric
plasma spray process," Surface and Coatings Technology, vol. 205, pp. 4886-
4895, 2011.
[165] G.-B. Huang, "Learning capability and storage capacity of two-hidden-layer feedforward networks," IEEE Transactions on Neural Networks, vol. 14, pp. 274-281, 2003.
[166] S. Tamura and M. Tateishi, "Capabilities of a four-layered feedforward neural
network: four layers versus three," IEEE Transactions on Neural Networks, vol.
8, pp. 251-255, 1997.
[167] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Real-time learning capability of neural networks," IEEE Transactions on Neural Networks, vol. 17, pp. 863-878, 2006.
[168] G.-B. Huang, L. Chen, and C.-K. Siew, "Universal approximation using incremental constructive feedforward networks with random hidden nodes," IEEE Transactions on Neural Networks, vol. 17, pp. 879-892, 2006.
[169] K. Hornik, "Approximation capabilities of multilayer feedforward networks,"
Neural Networks, vol. 4, pp. 251-257, 1991.
[170] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: a new learning scheme of feedforward neural networks," in IEEE International Joint Conference on Neural Networks, 2004, vol. 2, pp. 985-990.
[171] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: theory
and applications," Neurocomputing, vol. 70, pp. 489-501, 2006.
[172] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme learning machine for regression and multiclass classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 42, pp. 513-529, 2012.
[173] S. Samet and A. Miri, "Privacy-preserving back-propagation and extreme
learning machine algorithms," Data & Knowledge Engineering, vol. 79–80,
pp. 40-61, 2012.
[174] P. L. Bartlett, "The sample complexity of pattern classification with neural
networks: the size of the weights is more important than the size of the
network," IEEE Transactions on Information Theory, vol. 44, pp. 525-536, 1998.
[175] G. B. Huang and H. A. Babri, "Upper bounds on the number of hidden neurons
in feedforward networks with arbitrary bounded nonlinear activation functions,"
IEEE Transactions on Neural Networks, vol. 9, pp. 224-229, 1998.
[176] C. R. Rao and S. K. Mitra, "Generalized inverse of a matrix and its
applications," in Proceedings of the Sixth Berkeley Symposium on Mathematical
Statistics and Probability, Volume 1: Theory of Statistics, 1971, pp. 601-620.
[177] D. Serre, Matrices: Theory and Applications, vol. 216. New York: Springer, 2010.
Appendix A: List of Publications
Refereed International Journals:
[1] T. A. Choudhury, C. C. Berndt, and Z. Man, "An Extreme Learning Machine
Algorithm to Predict the In-flight Particle Characteristics of an Atmospheric Plasma
Spray Process," Plasma Chemistry and Plasma Processing, vol. 33, pp. 993-1023,
2013.
[2] T. A. Choudhury, N. Hosseinzadeh, and C. C. Berndt, "Improving the
generalization ability of an artificial neural network in predicting in-flight particle
characteristics of an atmospheric plasma spray process," Journal of Thermal Spray
Technology, vol. 21, pp. 935-949, 2012.
[3] T. A. Choudhury, N. Hosseinzadeh, and C. C. Berndt, "Artificial neural network
application for predicting in-flight particle characteristics of an atmospheric plasma
spray process," Surface and Coatings Technology, vol. 205, pp. 4886-4895, 2011.
Peer-Reviewed Conference Proceedings:
[4] T. A. Choudhury, N. Hosseinzadeh, and C. C. Berndt, "Using artificial neural
network to predict the particle characteristics of an atmospheric plasma spray process,"
in International Conference on Electrical and Computer Engineering (ICECE), Dhaka,
2010, pp. 726-729.
Appendix B: Expanded Database, DSE
Count | I [A] | V_Ar [SLPM] | V_H2 [SLPM] | V_CG [SLPM] | d_inj [mm] | ID [mm] | V [m/s] | T [°C] | D [μm]
(The last three columns, V, T and D, are the experimental values of the in-flight particle characteristics.)
1 303 40 14 3.2 6 1.8 232 2,212 40
2 308 40 14 3.2 6 1.8 233 2,218 40
3 314 40 14 3.2 6 1.8 234 2,224 41
4 319 40 14 3.2 6 1.8 235 2,229 41
5 324 40 14 3.2 6 1.8 237 2,235 41
6 330 40 14 3.2 6 1.8 238 2,240 42
7 335 40 14 3.2 6 1.8 239 2,246 42
8 341 40 14 3.2 6 1.8 240 2,251 42
9 346 40 14 3.2 6 1.8 241 2,257 43
10 351 40 14 3.2 6 1.8 242 2,262 43
11 357 40 14 3.2 6 1.8 243 2,267 43
12 362 40 14 3.2 6 1.8 244 2,273 44
13 367 40 14 3.2 6 1.8 245 2,278 44
14 373 40 14 3.2 6 1.8 246 2,283 44
15 378 40 14 3.2 6 1.8 247 2,288 45
16 384 40 14 3.2 6 1.8 248 2,293 45
17 389 40 14 3.2 6 1.8 249 2,298 45
18 394 40 14 3.2 6 1.8 250 2,303 46
19 400 40 14 3.2 6 1.8 251 2,308 46
20 405 40 14 3.2 6 1.8 252 2,312 46
21 410 40 14 3.2 6 1.8 253 2,317 46
22 416 40 14 3.2 6 1.8 254 2,322 47
23 421 40 14 3.2 6 1.8 255 2,326 47
24 427 40 14 3.2 6 1.8 256 2,331 47
25 432 40 14 3.2 6 1.8 257 2,335 47
26 437 40 14 3.2 6 1.8 258 2,339 48
27 443 40 14 3.2 6 1.8 258 2,343 48
28 448 40 14 3.2 6 1.8 259 2,348 48
Appendix B: Expanded Database, DSE (Continued: Count 29 to 56)
Count | I [A] | V_Ar [SLPM] | V_H2 [SLPM] | V_CG [SLPM] | d_inj [mm] | ID [mm] | V [m/s] | T [°C] | D [μm]
29 453 40 14 3.2 6 1.8 260 2,352 48
30 459 40 14 3.2 6 1.8 261 2,356 49
31 464 40 14 3.2 6 1.8 262 2,359 49
32 469 40 14 3.2 6 1.8 263 2,363 49
33 475 40 14 3.2 6 1.8 263 2,367 49
34 480 40 14 3.2 6 1.8 264 2,371 50
35 486 40 14 3.2 6 1.8 265 2,374 50
36 491 40 14 3.2 6 1.8 265 2,378 50
37 496 40 14 3.2 6 1.8 266 2,381 50
38 502 40 14 3.2 6 1.8 267 2,384 50
39 507 40 14 3.2 6 1.8 268 2,387 50
40 512 40 14 3.2 6 1.8 268 2,390 51
41 518 40 14 3.2 6 1.8 269 2,393 51
42 523 40 14 3.2 6 1.8 269 2,396 51
43 529 40 14 3.2 6 1.8 270 2,399 51
44 534 40 14 3.2 6 1.8 271 2,402 51
45 539 40 14 3.2 6 1.8 271 2,404 51
46 545 40 14 3.2 6 1.8 272 2,407 51
47 550 40 14 3.2 6 1.8 272 2,409 51
48 555 40 14 3.2 6 1.8 273 2,411 52
49 561 40 14 3.2 6 1.8 273 2,414 52
50 566 40 14 3.2 6 1.8 274 2,416 52
51 572 40 14 3.2 6 1.8 274 2,418 52
52 577 40 14 3.2 6 1.8 274 2,420 52
53 582 40 14 3.2 6 1.8 275 2,421 52
54 588 40 14 3.2 6 1.8 275 2,423 52
55 593 40 14 3.2 6 1.8 276 2,425 52
56 598 40 14 3.2 6 1.8 276 2,426 52
Appendix B: Expanded Database, DSE (Continued: Count 57 to 84)
Count | I [A] | V_Ar [SLPM] | V_H2 [SLPM] | V_CG [SLPM] | d_inj [mm] | ID [mm] | V [m/s] | T [°C] | D [μm]
57 604 40 14 3.2 6 1.8 276 2,427 52
58 609 40 14 3.2 6 1.8 277 2,429 52
59 614 40 14 3.2 6 1.8 277 2,430 52
60 620 40 14 3.2 6 1.8 277 2,431 52
61 625 40 14 3.2 6 1.8 277 2,432 52
62 631 40 14 3.2 6 1.8 278 2,433 52
63 636 40 14 3.2 6 1.8 278 2,434 52
64 641 40 14 3.2 6 1.8 278 2,434 52
65 647 40 14 3.2 6 1.8 278 2,435 52
66 652 40 14 3.2 6 1.8 278 2,435 52
67 657 40 14 3.2 6 1.8 278 2,436 52
68 663 40 14 3.2 6 1.8 279 2,436 52
69 668 40 14 3.2 6 1.8 279 2,436 52
70 674 40 14 3.2 6 1.8 279 2,436 52
71 679 40 14 3.2 6 1.8 279 2,436 52
72 684 40 14 3.2 6 1.8 279 2,436 52
73 690 40 14 3.2 6 1.8 279 2,436 51
74 695 40 14 3.2 6 1.8 279 2,436 51
75 700 40 14 3.2 6 1.8 279 2,435 51
76 706 40 14 3.2 6 1.8 279 2,435 51
77 711 40 14 3.2 6 1.8 279 2,434 51
78 716 40 14 3.2 6 1.8 279 2,434 51
79 722 40 14 3.2 6 1.8 279 2,433 51
80 727 40 14 3.2 6 1.8 279 2,432 51
81 733 40 14 3.2 6 1.8 278 2,431 50
82 738 40 14 3.2 6 1.8 278 2,430 50
83 743 40 14 3.2 6 1.8 278 2,429 50
84 749 40 14 3.2 6 1.8 278 2,428 50
Appendix B: Expanded Database, DSE (Continued: Count 85 to 112)
Count | I [A] | V_Ar [SLPM] | V_H2 [SLPM] | V_CG [SLPM] | d_inj [mm] | ID [mm] | V [m/s] | T [°C] | D [μm]
85 754 40 14 3.2 6 1.8 278 2,427 50
86 759 40 14 3.2 6 1.8 278 2,425 50
87 765 40 14 3.2 6 1.8 277 2,424 49
88 770 40 14 3.2 6 1.8 277 2,423 49
89 776 40 14 3.2 6 1.8 277 2,421 49
90 781 40 14 3.2 6 1.8 277 2,420 49
91 786 40 14 3.2 6 1.8 276 2,418 49
92 792 40 14 3.2 6 1.8 276 2,416 49
93 797 40 14 3.2 6 1.8 276 2,414 48
94 802 40 14 3.2 6 1.8 276 2,412 48
95 808 40 14 3.2 6 1.8 275 2,411 48
96 813 40 14 3.2 6 1.8 275 2,409 48
97 819 40 14 3.2 6 1.8 275 2,407 47
98 824 40 14 3.2 6 1.8 274 2,404 47
99 829 40 14 3.2 6 1.8 274 2,402 47
100 835 40 14 3.2 6 1.8 274 2,400 47
101 840 40 14 3.2 6 1.8 273 2,398 47
102 530 40 0.0 3.2 6 1.8 205 1,675 30
103 530 40 0.2 3.2 6 1.8 207 1,703 30
104 530 40 0.3 3.2 6 1.8 209 1,730 31
105 530 40 0.5 3.2 6 1.8 210 1,756 31
106 530 40 0.7 3.2 6 1.8 212 1,782 31
107 530 40 0.9 3.2 6 1.8 214 1,807 32
108 530 40 1.0 3.2 6 1.8 215 1,832 32
109 530 40 1.2 3.2 6 1.8 217 1,856 32
110 530 40 1.4 3.2 6 1.8 219 1,879 33
111 530 40 1.5 3.2 6 1.8 220 1,901 33
112 530 40 1.7 3.2 6 1.8 222 1,923 33
Appendix B: Expanded Database, DSE (Continued: Count 113 to 140)
Count | I [A] | V_Ar [SLPM] | V_H2 [SLPM] | V_CG [SLPM] | d_inj [mm] | ID [mm] | V [m/s] | T [°C] | D [μm]
113 530 40 1.9 3.2 6 1.8 223 1,945 34
114 530 40 2.0 3.2 6 1.8 225 1,965 34
115 530 40 2.2 3.2 6 1.8 226 1,985 34
116 530 40 2.4 3.2 6 1.8 228 2,005 35
117 530 40 2.6 3.2 6 1.8 229 2,024 35
118 530 40 2.7 3.2 6 1.8 231 2,042 35
119 530 40 2.9 3.2 6 1.8 232 2,060 36
120 530 40 3.1 3.2 6 1.8 234 2,077 36
121 530 40 3.2 3.2 6 1.8 235 2,094 36
122 530 40 3.4 3.2 6 1.8 236 2,110 37
123 530 40 3.6 3.2 6 1.8 237 2,125 37
124 530 40 3.7 3.2 6 1.8 239 2,140 37
125 530 40 3.9 3.2 6 1.8 240 2,154 38
126 530 40 4.1 3.2 6 1.8 241 2,168 38
127 530 40 4.3 3.2 6 1.8 242 2,182 38
128 530 40 4.4 3.2 6 1.8 243 2,194 39
129 530 40 4.6 3.2 6 1.8 244 2,206 39
130 530 40 4.8 3.2 6 1.8 245 2,218 39
131 530 40 4.9 3.2 6 1.8 246 2,229 40
132 530 40 5.1 3.2 6 1.8 247 2,240 40
133 530 40 5.3 3.2 6 1.8 248 2,250 40
134 530 40 5.4 3.2 6 1.8 249 2,260 41
135 530 40 5.6 3.2 6 1.8 250 2,269 41
136 530 40 5.8 3.2 6 1.8 251 2,278 41
137 530 40 6.0 3.2 6 1.8 252 2,286 42
138 530 40 6.1 3.2 6 1.8 253 2,294 42
139 530 40 6.3 3.2 6 1.8 254 2,302 42
140 530 40 6.5 3.2 6 1.8 254 2,309 42
Appendix B: Expanded Database, DSE (Continued: Count 141 to 168)
Count | I [A] | V_Ar [SLPM] | V_H2 [SLPM] | V_CG [SLPM] | d_inj [mm] | ID [mm] | V [m/s] | T [°C] | D [μm]
141 530 40 6.6 3.2 6 1.8 255 2,315 43
142 530 40 6.8 3.2 6 1.8 256 2,321 43
143 530 40 7.0 3.2 6 1.8 257 2,327 43
144 530 40 7.1 3.2 6 1.8 257 2,332 44
145 530 40 7.3 3.2 6 1.8 258 2,337 44
146 530 40 7.5 3.2 6 1.8 258 2,342 44
147 530 40 7.7 3.2 6 1.8 259 2,346 44
148 530 40 7.8 3.2 6 1.8 259 2,350 45
149 530 40 8.0 3.2 6 1.8 260 2,353 45
150 530 40 8.2 3.2 6 1.8 260 2,357 45
151 530 40 8.3 3.2 6 1.8 261 2,359 45
152 530 40 8.5 3.2 6 1.8 261 2,362 45
153 530 40 8.7 3.2 6 1.8 262 2,364 46
154 530 40 8.8 3.2 6 1.8 262 2,366 46
155 530 40 9.0 3.2 6 1.8 262 2,368 46
156 530 40 9.2 3.2 6 1.8 263 2,369 46
157 530 40 9.4 3.2 6 1.8 263 2,370 46
158 530 40 9.5 3.2 6 1.8 263 2,371 47
159 530 40 9.7 3.2 6 1.8 264 2,371 47
160 530 40 9.9 3.2 6 1.8 264 2,372 47
161 530 40 10.0 3.2 6 1.8 264 2,372 47
162 530 40 10.2 3.2 6 1.8 264 2,372 47
163 530 40 10.4 3.2 6 1.8 264 2,371 47
164 530 40 10.5 3.2 6 1.8 264 2,371 47
165 530 40 10.7 3.2 6 1.8 265 2,370 48
166 530 40 10.9 3.2 6 1.8 265 2,369 48
167 530 40 11.1 3.2 6 1.8 265 2,368 48
168 530 40 11.2 3.2 6 1.8 265 2,367 48
Appendix B: Expanded Database, DSE (Continued: Count 169 to 196)
Count | I [A] | V_Ar [SLPM] | V_H2 [SLPM] | V_CG [SLPM] | d_inj [mm] | ID [mm] | V [m/s] | T [°C] | D [μm]
169 530 40 11.4 3.2 6 1.8 265 2,365 48
170 530 40 11.6 3.2 6 1.8 265 2,363 48
171 530 40 11.7 3.2 6 1.8 265 2,362 48
172 530 40 11.9 3.2 6 1.8 264 2,360 48
173 530 40 12.1 3.2 6 1.8 264 2,358 48
174 530 40 12.2 3.2 6 1.8 264 2,356 48
175 530 40 12.4 3.2 6 1.8 264 2,354 48
176 530 40 12.6 3.2 6 1.8 264 2,351 48
177 530 40 12.8 3.2 6 1.8 264 2,349 48
178 530 40 12.9 3.2 6 1.8 264 2,347 48
179 530 40 13.1 3.2 6 1.8 263 2,344 48
180 530 40 13.3 3.2 6 1.8 263 2,342 48
181 530 40 13.4 3.2 6 1.8 263 2,339 48
182 530 40 13.6 3.2 6 1.8 262 2,336 48
183 530 40 13.8 3.2 6 1.8 262 2,334 48
184 530 40 13.9 3.2 6 1.8 262 2,331 48
185 530 40 14.1 3.2 6 1.8 261 2,328 48
186 530 40 14.3 3.2 6 1.8 261 2,326 48
187 530 40 14.5 3.2 6 1.8 261 2,323 48
188 530 40 14.6 3.2 6 1.8 260 2,320 48
189 530 40 14.8 3.2 6 1.8 260 2,318 47
190 530 40 15.0 3.2 6 1.8 259 2,315 47
191 530 40 15.1 3.2 6 1.8 259 2,312 47
192 530 40 15.3 3.2 6 1.8 259 2,310 47
193 530 40 15.5 3.2 6 1.8 258 2,307 47
194 530 40 15.6 3.2 6 1.8 258 2,305 47
195 530 40 15.8 3.2 6 1.8 257 2,302 47
196 530 40 16.0 3.2 6 1.8 256 2,300 46
Appendix B: Expanded Database, DSE (Continued: Count 197 to 224)
Count | I [A] | V_Ar [SLPM] | V_H2 [SLPM] | V_CG [SLPM] | d_inj [mm] | ID [mm] | V [m/s] | T [°C] | D [μm]
197 530 40 16.2 3.2 6 1.8 256 2,298 46
198 530 40 16.3 3.2 6 1.8 255 2,296 46
199 530 40 16.5 3.2 6 1.8 255 2,294 46
200 530 40 16.7 3.2 6 1.8 254 2,292 45
201 530 40 16.8 3.2 6 1.8 254 2,290 45
202 530 40 17.0 3.2 6 1.8 253 2,288 45
203 530 45 15 3.2 6 1.8 176 2,403 51
204 530 22.5 7.5 3.2 6 1.8 179 2,456 49
205 530 37.5 12.5 3.2 6 1.8 263 2,393 50
206 530 40 14 2.0 6 1.8 250 2,345 48
207 530 40 14 2.0 6 1.8 250 2,346 48
208 530 40 14 2.1 6 1.8 251 2,347 48
209 530 40 14 2.1 6 1.8 251 2,348 48
210 530 40 14 2.1 6 1.8 251 2,349 48
211 530 40 14 2.2 6 1.8 251 2,350 48
212 530 40 14 2.2 6 1.8 252 2,351 48
213 530 40 14 2.2 6 1.8 252 2,352 48
214 530 40 14 2.2 6 1.8 252 2,353 48
215 530 40 14 2.3 6 1.8 253 2,354 48
216 530 40 14 2.3 6 1.8 253 2,355 48
217 530 40 14 2.3 6 1.8 253 2,356 48
218 530 40 14 2.4 6 1.8 253 2,357 48
219 530 40 14 2.4 6 1.8 254 2,358 48
220 530 40 14 2.4 6 1.8 254 2,359 49
221 530 40 14 2.5 6 1.8 254 2,360 49
222 530 40 14 2.5 6 1.8 255 2,362 49
223 530 40 14 2.5 6 1.8 255 2,363 49
224 530 40 14 2.5 6 1.8 255 2,364 49
Appendix B: Expanded Database, DSE (Continued: Count 225 to 252)
Count | I [A] | V_Ar [SLPM] | V_H2 [SLPM] | V_CG [SLPM] | d_inj [mm] | ID [mm] | V [m/s] | T [°C] | D [μm]
225 530 40 14 2.6 6 1.8 256 2,365 49
226 530 40 14 2.6 6 1.8 256 2,366 49
227 530 40 14 2.6 6 1.8 256 2,367 49
228 530 40 14 2.7 6 1.8 257 2,369 49
229 530 40 14 2.7 6 1.8 257 2,370 49
230 530 40 14 2.7 6 1.8 257 2,371 49
231 530 40 14 2.8 6 1.8 258 2,372 49
232 530 40 14 2.8 6 1.8 258 2,373 49
233 530 40 14 2.8 6 1.8 258 2,375 50
234 530 40 14 2.8 6 1.8 259 2,376 50
235 530 40 14 2.9 6 1.8 259 2,377 50
236 530 40 14 2.9 6 1.8 260 2,378 50
237 530 40 14 2.9 6 1.8 260 2,380 50
238 530 40 14 3.0 6 1.8 260 2,381 50
239 530 40 14 3.0 6 1.8 261 2,382 50
240 530 40 14 3.0 6 1.8 261 2,384 50
241 530 40 14 3.1 6 1.8 261 2,385 50
242 530 40 14 3.1 6 1.8 262 2,386 50
243 530 40 14 3.1 6 1.8 262 2,388 50
244 530 40 14 3.1 6 1.8 262 2,389 51
245 530 40 14 3.2 6 1.8 263 2,390 51
246 530 40 14 3.2 6 1.8 263 2,391 51
247 530 40 14 3.2 6 1.8 264 2,393 51
248 530 40 14 3.3 6 1.8 264 2,394 51
249 530 40 14 3.3 6 1.8 264 2,395 51
250 530 40 14 3.3 6 1.8 265 2,397 51
251 530 40 14 3.4 6 1.8 265 2,398 51
252 530 40 14 3.4 6 1.8 265 2,399 51
Appendix B: Expanded Database, DSE (Continued: Count 253 to 280)
Count | I [A] | V_Ar [SLPM] | V_H2 [SLPM] | V_CG [SLPM] | d_inj [mm] | ID [mm] | V [m/s] | T [°C] | D [μm]
253 530 40 14 3.4 6 1.8 266 2,401 51
254 530 40 14 3.4 6 1.8 266 2,402 51
255 530 40 14 3.5 6 1.8 267 2,403 51
256 530 40 14 3.5 6 1.8 267 2,405 52
257 530 40 14 3.5 6 1.8 267 2,406 52
258 530 40 14 3.6 6 1.8 268 2,407 52
259 530 40 14 3.6 6 1.8 268 2,408 52
260 530 40 14 3.6 6 1.8 268 2,410 52
261 530 40 14 3.7 6 1.8 269 2,411 52
262 530 40 14 3.7 6 1.8 269 2,412 52
263 530 40 14 3.7 6 1.8 270 2,414 52
264 530 40 14 3.7 6 1.8 270 2,415 52
265 530 40 14 3.8 6 1.8 270 2,416 52
266 530 40 14 3.8 6 1.8 271 2,417 52
267 530 40 14 3.8 6 1.8 271 2,419 53
268 530 40 14 3.9 6 1.8 271 2,420 53
269 530 40 14 3.9 6 1.8 272 2,421 53
270 530 40 14 3.9 6 1.8 272 2,422 53
271 530 40 14 4.0 6 1.8 272 2,423 53
272 530 40 14 4.0 6 1.8 273 2,425 53
273 530 40 14 4.0 6 1.8 273 2,426 53
274 530 40 14 4.0 6 1.8 273 2,427 53
275 530 40 14 4.1 6 1.8 274 2,428 53
276 530 40 14 4.1 6 1.8 274 2,429 53
277 530 40 14 4.1 6 1.8 274 2,430 53
278 530 40 14 4.2 6 1.8 275 2,432 53
279 530 40 14 4.2 6 1.8 275 2,433 53
280 530 40 14 4.2 6 1.8 275 2,434 54
Appendix B: Expanded Database, DSE (Continued: Count 281 to 308)
Count | I [A] | V_Ar [SLPM] | V_H2 [SLPM] | V_CG [SLPM] | d_inj [mm] | ID [mm] | V [m/s] | T [°C] | D [μm]
281 530 40 14 4.3 6 1.8 276 2,435 54
282 530 40 14 4.3 6 1.8 276 2,436 54
283 530 40 14 4.3 6 1.8 276 2,437 54
284 530 40 14 4.3 6 1.8 276 2,438 54
285 530 40 14 4.4 6 1.8 277 2,439 54
286 530 40 14 4.4 6 1.8 277 2,440 54
287 530 40 14 4.4 6 1.8 277 2,441 54
288 530 40 14 4.5 6 1.8 278 2,442 54
289 530 40 14 4.5 6 1.8 278 2,443 54
290 530 40 14 4.5 6 1.8 278 2,444 54
291 530 40 14 4.6 6 1.8 278 2,445 54
292 530 40 14 4.6 6 1.8 279 2,446 54
293 530 40 14 4.6 6 1.8 279 2,447 54
294 530 40 14 4.6 6 1.8 279 2,448 55
295 530 40 14 4.7 6 1.8 279 2,448 55
296 530 40 14 4.7 6 1.8 280 2,449 55
297 530 40 14 4.7 6 1.8 280 2,450 55
298 530 40 14 4.8 6 1.8 280 2,451 55
299 530 40 14 4.8 6 1.8 280 2,452 55
300 530 40 14 4.8 6 1.8 281 2,453 55
301 530 40 14 4.9 6 1.8 281 2,453 55
302 530 40 14 4.9 6 1.8 281 2,454 55
303 530 40 14 4.9 6 1.8 281 2,455 55
304 530 40 14 4.9 6 1.8 281 2,456 55
305 530 40 14 5.0 6 1.8 282 2,456 55
306 530 40 14 5.0 6 1.8 282 2,457 55
307 530 40 14 3.2 7 1.8 270 2,434 47
308 530 40 14 3.2 8 1.8 278 2,451 52
Appendix B: Expanded Database, DSE (Continued: Count 309 to 310)
Count | I [A] | V_Ar [SLPM] | V_H2 [SLPM] | V_CG [SLPM] | d_inj [mm] | ID [mm] | V [m/s] | T [°C] | D [μm]
309 530 40 14 3.2 6 1.5 265 2,498 54
310 530 40 14 3.2 6 2.0 278 2,363 43
I      Current intensity
V_Ar   Argon primary plasma gas flow rate
V_H2   Hydrogen secondary plasma gas flow rate
V_CG   Argon carrier gas flow rate
d_inj  Injector stand-off distance
ID     Injector diameter
V      Average in-flight particle velocity
T      Average in-flight particle temperature
D      Average in-flight particle diameter
* Within each block of rows, only one input processing parameter is varied at a time; its values are shown in bold in the original table.
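For readers who wish to reuse DSE, the following minimal Python sketch (an illustration only, not part of the thesis workflow) shows how the table could be split into network inputs and targets. It assumes the table has been exported to a comma-separated file named dse.csv (a hypothetical file name) with one row per Count, the ten columns in the order of the header above, and the thousands separators removed from the temperature values.

import numpy as np

# Load the expanded database; skiprows=1 skips the single header line.
data = np.loadtxt("dse.csv", delimiter=",", skiprows=1)

# Column 0 is Count; columns 1 to 6 hold the six input processing
# parameters (I, V_Ar, V_H2, V_CG, d_inj, ID).
X = data[:, 1:7]

# Columns 7 to 9 hold the in-flight particle characteristics (V, T, D).
Y = data[:, 7:10]

# Min-max scaling of the inputs to [-1, 1] -- a common preprocessing
# step for feed-forward networks, assumed here rather than taken from
# the appendix.
lo, hi = X.min(axis=0), X.max(axis=0)
X_scaled = 2.0 * (X - lo) / (hi - lo) - 1.0

print(X_scaled.shape, Y.shape)  # expected: (310, 6) and (310, 3)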