CHAPTER 4
ANALYSIS OF BILATERAL INTELLIGENCE
LEARNING METHOD
4.1 INTRODUCTION TO LEARNING ALGORITHM
In engineering, models are generally employed to understand various
interactions, and they are classified as mathematical models and
implementation models. The mathematical model provides a problem
definition and problem description for combinatorial mathematical
problems. In a dynamic environment such as a network, the mathematical
model has some limitations, and a prediction model is proposed to
address them.
These prediction models are inspired by biological neural networks,
which are made up of real biological neurons that are physically
connected or functionally related in the human nervous system,
especially in the human brain. An artificial neural network (ANN), on
the other hand, is made up of artificial neurons interconnected with
one another to form a programming structure that mimics the behaviour,
neural processing, organisation and learning of biological neurons.
The human brain can perform tasks much faster than the
fastest existing computer, thanks to its special ability in massive parallel
data processing. ANN tries to mimic such a remarkable behaviour for
solving narrowly defined problems, i.e. problems with an associative or
cognitive tinge. To this effect, ANNs have been extensively and
successfully applied to pattern (speech/image) recognition, time-series
prediction and modelling, function approximation, classification,
adaptive control and other areas.
Neural networks are made of several processing units called
neurons. Three types of neurons are distinguished: input neurons which
receive data from outside the ANN and are organised in the so called
input layer, output neurons which send data out of the ANN and
generally comprise the output layer, and hidden neurons whose input and
output signals remain within the ANN and form the so called hidden
layer (or layers).
Neurons communicate with each other by sending
signals over a large number of weighted connections, thus creating a
network with a high degree of interconnection. The neurons are trained
using input–output data sets presented to the network. After the training
process, the network produces appropriate outcomes when tested with
similar data sets; in other words, it recognizes the introduced patterns. In
this study, neural networks were preferred not only for their ease of
application but also because they yield results comparable to, or even
better than, those of the other methods listed above.
A neural network has to be configured such that the
application of a set of inputs produces the desired set of outputs. Various
methods to set the strengths of the connections exist. One way is to set
the weights explicitly, using a priori knowledge. Another way is to 'train'
the neural network by feeding it teaching patterns and letting it change
its weights according to some learning rule.
Thus, the learning situations are categorized (Fergus et al,
2010) into two distinct sorts. These are:
• Supervised learning or Associative learning in which the
network is trained by providing it with input and matching
output patterns. These input-output pairs can be provided by
an external resource, or by the system which contains the
network (self-supervised).
• Unsupervised learning or Self-organisation in which an output
unit is trained to respond to clusters of patterns within the
input. In this paradigm the system is supposed to discover
statistically salient features of the input population. Unlike the
supervised learning paradigm, there is no a priori set of
categories into which the patterns are to be classified; rather
the system must develop its own representation of the input
stimuli.
Both learning paradigms discussed above result in an
adjustment of the weights of the connections between units, according to
some modification rule.
The basic idea of Hebbian learning is that if two units ‘j’ and ‘k’ are
active simultaneously, their interconnection must be strengthened. If j
receives input from k, the simplest version of Hebbian learning
prescribes modifying the weight wjk according to equation (4.1),
∆wjk = γ•yj•yk (4.1)
where γ is a positive constant of proportionality representing
the learning rate. Another common rule uses not the actual activation of
unit k but the difference between the actual and desired activation for
adjusting the weights, as in equation (4.2),
∆wjk = γ•yj•(dk - yk) (4.2)
in which dk is the desired activation. This is often called the
Widrow-Hoff rule or the delta rule.
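The two update rules in equations (4.1) and (4.2) can be sketched in a few lines of Python. This is only an illustration: the function names and the learning rate value of 0.1 are assumptions and do not come from the thesis.

```python
def hebbian_update(y_j, y_k, gamma=0.1):
    """Weight change from equation (4.1): gamma * y_j * y_k."""
    return gamma * y_j * y_k


def delta_rule_update(y_j, y_k, d_k, gamma=0.1):
    """Weight change from equation (4.2): gamma * y_j * (d_k - y_k)."""
    return gamma * y_j * (d_k - y_k)


# Both units active: the Hebbian rule strengthens the connection.
print(hebbian_update(y_j=1.0, y_k=1.0))
# Unit k already matches its desired activation: the delta rule changes nothing.
print(delta_rule_update(y_j=1.0, y_k=0.5, d_k=0.5))
```

The only difference between the two functions is the second factor: the Hebbian rule uses the activation of unit k itself, while the delta rule uses its distance from the desired activation.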
Suppose there is a set of learning samples consisting of an
input vector x and a desired output d(x). For a classification task, d(x)
is usually +1 or -1. The perceptron learning rule is very simple and can
be stated as follows:
1. Start with random weights for the connections;
2. Select an input vector x from the set of training samples;
3. If y ≠ d(x) (the perceptron gives an incorrect response),
modify all connections wi according to equation (4.3),
∆wi = d(x) • xi (4.3)
4. Go back to 2.
Note that the procedure is very similar to the Hebb rule; the
only difference is that, when the network responds correctly, no
connection weights are modified. Besides modifying the weights, the
system must also modify the threshold θ.
This θ is considered as a connection w0 between the output
neuron and a 'dummy' predicate unit which is always on: x0 = 1. Given
the perceptron learning rule as stated above, this threshold is modified
according to equation (4.4),
$$\Delta\theta = \begin{cases} 0 & \text{if the perceptron responds correctly} \\ d(x) & \text{otherwise} \end{cases} \qquad (4.4)$$
A perceptron is initialized with the following weights:
w1 = 1; w2 = 2; θ = -2
The perceptron learning rule is used to learn a correct
discriminant function for a number of samples. The first sample A, with
values x = (0.5, 1.5) and target value d(x) = +1, is presented to the
network. The net input is 1(0.5) + 2(1.5) - 2 = 1.5, so the network
output is +1 and no weights are adjusted. The same is the case for
point B, with values x = (-0.5, 0.5) and target value d(x) = -1; the
network output is negative, so no change is made.
When presenting point C with values x = (0.5, 0.5) the network output
will be -1, while the target value d(x) = +1.
According to the perceptron learning rule, the weight changes
are ∆w1 = 0.5, ∆w2 = 0.5, ∆θ = 1. The new weights are now w1 = 1.5,
w2 = 2.5, θ = -1, and sample C is classified correctly.
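The worked example above can be reproduced with a short Python sketch. It assumes that the threshold θ enters the net input additively (as the weight w0 of the always-on unit x0 = 1) and that a net input of exactly zero maps to -1; the function names are illustrative only.

```python
def perceptron_output(w, theta, x):
    """Threshold unit: +1 if the net input is positive, otherwise -1."""
    net = sum(wi * xi for wi, xi in zip(w, x)) + theta
    return 1 if net > 0 else -1


def perceptron_step(w, theta, x, d):
    """Apply equations (4.3) and (4.4) once; return the updated (w, theta)."""
    if perceptron_output(w, theta, x) == d:
        return w, theta  # correct response: nothing changes
    new_w = [wi + d * xi for wi, xi in zip(w, x)]
    return new_w, theta + d


w, theta = [1.0, 2.0], -2.0
samples = [((0.5, 1.5), +1), ((-0.5, 0.5), -1), ((0.5, 0.5), +1)]  # A, B, C
for x, d in samples:
    w, theta = perceptron_step(w, theta, x, d)
print(w, theta)  # [1.5, 2.5] -1.0
```

Samples A and B leave the weights untouched, and only sample C triggers an update, which matches the values stated in the text.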
For the perceptron learning rule there exists a convergence
theorem, which states the following:
Theorem: If there exists a set of connection weights w* which is able to
perform the transformation y = d(x), the perceptron learning rule will
converge to some solution (which may or may not be the same as w*) in
a finite number of steps for any initial choice of the weights.
An important generalisation of the perceptron training
algorithm was presented by Widrow and Hoff as the 'least mean square'
learning procedure, also known as the delta rule. The main functional
difference from the perceptron training rule is the way the output of the
system is used in the learning rule. The perceptron learning rule uses the
output of the threshold function (either -1 or +1) for learning. The delta
rule uses the net output without further mapping into output values
-1 or +1.
The learning rule was applied to the 'adaptive linear element',
also named Adaline, developed by Widrow and Hoff (Pan et al, 2011). In
a simple physical implementation, this device consists of a set of
controllable resistors connected to a circuit which can sum up currents
caused by the input voltage signals.
Usually the central block, the summer, is also followed by a
quantiser which outputs either +1 or -1, depending on the polarity of the
sum. The functionality of the Adaline learning method is shown in
Figure 4.1.
Figure 4.1 Working model of the Adaline
[Figure 4.1 labels: input pattern switches (-1/+1), level (+1), gains w0 to w3, summer (∑), quantizer (+1/-1), output, reference switch, error]
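A minimal sketch of Adaline training with the delta rule is given below. It is not the implementation used in the thesis: the data set, the learning rate and the number of epochs are hypothetical. The point it illustrates is that learning uses the summer output directly, while the quantiser is applied only when a +1/-1 decision is needed.

```python
def train_adaline(samples, gamma=0.1, epochs=50):
    """LMS (delta rule) training of a single linear unit with a bias w0."""
    n = len(samples[0][0])
    w = [0.0] * (n + 1)  # w[0] is the weight of the constant input x0 = 1
    for _ in range(epochs):
        for x, d in samples:
            xs = (1.0,) + tuple(x)
            net = sum(wi * xi for wi, xi in zip(w, xs))  # summer output, not quantised
            err = d - net
            w = [wi + gamma * err * xi for wi, xi in zip(w, xs)]
    return w


def quantise(w, x):
    """Quantiser: map the summer output to +1 or -1."""
    net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if net >= 0 else -1


# Hypothetical linearly separable data: the target follows the sign of the first input.
data = [((1.0, 0.5), 1), ((0.8, -0.3), 1), ((-1.0, 0.2), -1), ((-0.7, -0.6), -1)]
weights = train_adaline(data)
print([quantise(weights, x) for x, _ in data])
```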
4.2 LEARNING USING ARTIFICIAL NEURAL NETWORK
Neural networks, which evolved from the study of the human brain,
have many desirable characteristics (Chen and Salman, 2011) explored in
computational science which are not available in the traditional von
Neumann architecture or modern parallel computer architectures. The
characteristics are,
• Massive parallelism
• Distributed representation and computation
• Learning ability
• Generalization ability
• Adaptivity
• Inherent contextual information processing, and
• Fault tolerance
ANN research has experienced three periods of extensive
activity. The first period was in the 1940s and started with the work of
McCulloch and Pitts. The second period was in the 1960s and was driven by
Rosenblatt’s perceptron convergence theorem and by Minsky and Papert’s
review showing the limitations of a simple perceptron. Minsky and
Papert’s results dampened the enthusiasm of many researchers, and the
resulting lull in neural network research lasted almost 20 years.
The third period of evolution in ANN is from the 1980s.
Hopfield’s energy approach, the back-propagation learning algorithm, the
multilayer perceptron and continuing research in soft computing are
well-known examples of the importance of ANN over these periods. Many
neural network models have been designed over the last few decades. The
major kinds of networks are shown in Figure 4.2.
Figure 4.2 Kinds of Neural networks and its architecture
[Figure 4.2 content: Neural networks divide into feed-forward networks (single-layer perceptron, multilayer perceptron, radial basis function nets) and recurrent / feedback networks (competitive networks, Kohonen's SOM, Hopfield network, ART models)]
4.3 LEARNING ALGORITHMS
A variety of learning algorithms are applied to improve
the performance of ANN based classification and prediction models. The
following sub-sections explain some of the notable learning algorithms.
4.3.1 Error Back Propagation (EBP)
The popular EBP algorithm is relatively simple and it can
handle problems with basically an unlimited number of patterns. Also,
because of its simplicity, it was relatively easy to adapt the EBP
algorithm to more efficient neural network architectures where
connections across layers are allowed.
However, the EBP algorithm can be up to 1000 times slower
than more advanced second-order algorithms. Many improvements have
been made to speed up the EBP algorithm and some of them, such as
momentum and an adaptive learning constant, work relatively
well. But as long as first-order algorithms are used, the improvements
are not dramatic.
The EBP algorithm considered here uses traditional forward-backward
computation, which for EBP may work a little faster than
forward-only computation. It is currently used only for standard MLP
networks. The EBP algorithm converges slowly, but it can be used for
training with very large numbers of patterns.
One may notice in the literature that, in almost all cases, very
simple algorithms, such as least mean square or EBP, are used to train
neural networks. These algorithms converge very slowly in comparison
to second-order methods. One
reason why second-order algorithms are seldom used is their complexity,
which requires the computation of not only gradients but also Jacobian or
Hessian matrices.
Various methods of neural network training have already been
developed, ranging from evolutionary computation search to
gradient-based methods. The best known method is EBP, but this method
is characterized by very poor convergence. Several improvements to
EBP were developed, such as the Quickprop algorithm, resilient EBP,
back percolation, and delta-bar-delta, but much better results can be
obtained using second-order methods such as Newton or
Levenberg–Marquardt (LM). In the latter, not only the gradient but also
the Jacobian matrix must be found.
The neuron-by-neuron (NBN) method, described in section 4.3.3, offers
a new way of computing the Jacobian matrix. The computation of the
Jacobian matrix can be as simple as the computation of the gradient in
the EBP algorithm. However, more memory is required for the Jacobian.
For a network with np training patterns and no network outputs, the
Jacobian has np × no rows, which makes it larger than the gradient, and
it therefore requires more memory.
In this sense, the NBN algorithm has the same limitations as
the well-known LM algorithm. For example, in the case of 10 000
patterns and a neural network with 25 weights and 3 outputs, the Jacobian
J will have 30 000 rows and 25 columns, 750 000
elements altogether. However, the matrix inversion must be done only for
the quasi-Hessian JT × J of size 25 × 25.
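The storage figures in this example follow directly from the matrix dimensions. The short Python sketch below repeats the arithmetic; the function name is illustrative and not part of any library.

```python
def lm_storage(n_patterns, n_outputs, n_weights):
    """Return the row count and element counts of J and of the quasi-Hessian."""
    jacobian_rows = n_patterns * n_outputs  # one row per (pattern, output) pair
    jacobian_elements = jacobian_rows * n_weights
    quasi_hessian_elements = n_weights * n_weights
    return jacobian_rows, jacobian_elements, quasi_hessian_elements


rows, j_elems, q_elems = lm_storage(n_patterns=10_000, n_outputs=3, n_weights=25)
print(rows, j_elems, q_elems)  # 30000 750000 625
```

The quasi-Hessian holds only 625 elements against 750 000 for the Jacobian, which is why the inversion itself is cheap while storing J is the costly part.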
Back-propagation is one of the simplest
and most general methods for training multilayer neural networks. The
power of back-propagation is that it enables us to compute an effective
error for each hidden unit, and thus derive a learning rule for the
input-to-hidden weights. The goal is to set the interconnection weights
based on the training patterns and the desired outputs. The main
disadvantage of the error back-propagation algorithm is its slow
convergence speed.
4.3.2 Levenberg–Marquardt algorithm (LM)
The LM algorithm considered here uses traditional forward-backward
computation; for the LM (and NBN) algorithm, the improved forward-only
computation trains faster than forward-backward
computation for networks with multiple outputs. It is also currently used
only for standard MLP networks. The LM (and NBN) algorithm converges much
faster than the EBP algorithm when training with small and medium-sized
pattern sets.
As noted in section 4.3.1, the NBN method computes the
Jacobian matrix as simply as the gradient is computed in the EBP
algorithm; however, more memory is required for the Jacobian. For a
network with ‘np’ training patterns and ‘no’ network outputs, the
Jacobian has np × no rows, which makes it larger than the gradient, and
it therefore requires more memory.
4.3.3 Neuron By Neuron (NBN)
Neuron by neuron (NBN) is an implementation applied to
nonlinear signal processors in the field of digital signal processing,
proposed by Wilamowski (2008, 2009). In this model, the traditional
back propagation neural network is improved. The NBN is compared
with the existing EBP. EBP is the most powerful and popular learning
model, but it has a few pitfalls: 1) slow processing, which requires
100-1000 times more iterations, and 2) lower accuracy.
NBN was proposed by Wilamowski et al. (2008) and
redefined by Wilamowski et al. (2009). Since the development of the
EBP (error back propagation) algorithm for training neural networks,
many attempts have been made to improve the learning process. There are
some well-known methods, such as momentum or a variable learning rate, and
there are less known methods which significantly accelerate learning.
The recently developed NBN (neuron-by-neuron) algorithm is very
efficient for neural network training and compares favourably with the
well-known Levenberg–Marquardt algorithm.
The NBN algorithm is a modification of
the Levenberg–Marquardt algorithm for arbitrarily connected neuron
(ACN) networks. In its basic form it uses forward-backward computation.
The NBN algorithm is based on the LM algorithm, but it can handle
arbitrarily connected neuron networks, and its convergence is
improved.
The NBN algorithm has several advantages:
(1) The ability to handle arbitrarily connected neural networks;
(2) Forward-only computation (without back propagation process);
and
(3) Direct computation of quasi-Hessian matrix (no need to compute
and store Jacobian matrix).
The row elements of the Jacobian matrix for a given pattern
are computed in the following three steps:
(1) Forward Computation
(2) Backward Computation
(3) Jacobian Element Computation
• Forward Computation : In the forward computation, the
neurons connected to the network inputs are first processed so
that their outputs can be used as inputs to the subsequent
neurons. The neurons are then processed as their input values
become available.
• Backward Computation : The sequence of the backward
computation is opposite to the forward computation sequence.
The process starts with the last neuron and continues toward
the input. The vector δ represents signal propagation from a
network output to the inputs of all other neurons. The size of
this vector is equal to the number of neurons.
• Jacobian Element Computation : After the forward and
backward computation, all the neurons outputs y and vector δ
are calculated. By applying all training patterns, the whole
Jacobian matrix can be calculated and stored.
The NBN algorithm was introduced to overcome the structure and
memory limitations of the Levenberg–Marquardt algorithm. Based on the
specially designed NBN routings, the NBN algorithm can be used not
only for traditional MLP networks, but also for other arbitrarily connected
neural networks. The NBN algorithm can be organized in two
procedures: with the back propagation process and without the back
propagation process.
The NBN algorithm does not need to store and multiply a
large Jacobian matrix. As a consequence, the memory requirement for
quasi-Hessian matrix and gradient vector computation is decreased by a
factor of (P × M), where P is the number of patterns and M is the number of
outputs. An additional benefit of the memory reduction is a significant
reduction in computation time.
Therefore, the training speed of the NBN algorithm becomes
much faster than that of the traditional Levenberg–Marquardt algorithm.
In the NBN algorithm, the quasi-Hessian matrix can be computed on the fly
as training patterns are applied. Moreover, it has a special advantage for
applications which require dynamically changing the number of training
patterns. There is no need to repeat the entire multiplication JT × J; it
is enough to add to or subtract from the quasi-Hessian matrix. The
quasi-Hessian matrix can be modified as patterns are applied or removed.
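The idea of building the quasi-Hessian on the fly can be sketched as follows. The sketch uses random stand-in values for the Jacobian rows and errors, and all names and sizes are hypothetical; it only shows that summing one row's contribution at a time gives the same matrix as the full product, and that a pattern can later be removed by subtraction.

```python
import random

random.seed(0)
n_rows, n_weights = 12, 4  # hypothetical sizes; rows = patterns x outputs
J = [[random.gauss(0, 1) for _ in range(n_weights)] for _ in range(n_rows)]
e = [random.gauss(0, 1) for _ in range(n_rows)]  # stand-in error values


def accumulate(Q, g, j_row, err, sign=1.0):
    """Add (sign=+1) or remove (sign=-1) one Jacobian row's contribution in place."""
    for a, ja in enumerate(j_row):
        g[a] += sign * ja * err
        for b, jb in enumerate(j_row):
            Q[a][b] += sign * ja * jb


def reference(J, e):
    """Reference result from a stored Jacobian: J^T J and J^T e."""
    n = len(J[0])
    Q = [[sum(r[a] * r[b] for r in J) for b in range(n)] for a in range(n)]
    g = [sum(r[a] * err for r, err in zip(J, e)) for a in range(n)]
    return Q, g


def max_abs_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


Q = [[0.0] * n_weights for _ in range(n_weights)]
g = [0.0] * n_weights
for j_row, err in zip(J, e):
    accumulate(Q, g, j_row, err)  # only one row is needed at a time

Q_ref, g_ref = reference(J, e)
same_after_build = max_abs_diff(Q, Q_ref) < 1e-9

accumulate(Q, g, J[-1], e[-1], sign=-1.0)  # drop the last row again
Q_ref2, g_ref2 = reference(J[:-1], e[:-1])
same_after_removal = max_abs_diff(Q, Q_ref2) < 1e-9
print(same_after_build, same_after_removal)
```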
4.3.4 Forward-only Computation
The NBN procedure introduced in the earlier section
requires both forward and backward computation. In particular, one may
notice that for networks with multiple outputs, the back-propagation
process has to be repeated for each output.
Wilamowski et al. (2010) proposed an improved NBN
computation to overcome this problem by removing the
back-propagation process from the computation of the Jacobian matrix.
The method also allows training of arbitrarily connected
neural networks; therefore, more powerful neural network architectures
with connections across layers can be efficiently trained.
The proposed method also simplifies neural network training,
by using the forward-only computation instead of the traditionally used
forward and backward computation. Information needed for the gradient
vector (for first-order algorithms) and Jacobian or Hessian matrix (for
second-order algorithms) is obtained during forward computation.
With the proposed algorithm, it is now possible to solve the
same problems using a much smaller number of neurons because the
proposed algorithm is able to train more complex neural network
architectures that require a smaller number of neurons. Comparable
results of the computation cost show that the proposed forward-only
computation can be faster than the traditional implementation of the
Levenberg–Marquardt algorithm.
4.3.5 Improved Levenberg–Marquardt Algorithm (ILM)
Wilamowski et al. (2010) proposed the ILM, an
improved computation aimed at optimizing the neural network learning
process using the Levenberg–Marquardt (LM) algorithm. The quasi-Hessian
matrix and gradient vector are computed directly, without Jacobian
matrix multiplication and storage. The memory limitation problem of
LM training is thereby solved. Considering the symmetry of the
quasi-Hessian matrix, only the elements in its upper or lower triangular
array need to be calculated.
Therefore, training speed is improved significantly, not only
because of the smaller array stored in memory, but also because of the
reduced number of operations in the quasi-Hessian matrix calculation. The
improvements in memory and time efficiency are especially pronounced when
training with large numbers of patterns. In the above-mentioned work, the
improved computation is introduced to increase the training
efficiency of the LM algorithm.
4.3.6 Two Hidden Layers Artificial Neural Network (2HLANN)
A two hidden layers artificial neural network (2HLANN)
model was proposed by Mkadem, F and Boumaiza, S (2011). It is used for
predicting the dynamic nonlinear characteristics of wideband power
amplifiers. The 2HLANN is an improved model of the feed-forward neural
network. The 2HLANN is designed in terms of the number of neurons,
learning rate and memory space.
4.4 PROPOSED ANALYSIS OF BILATERAL
INTELLIGENCE LEARNING METHOD
Textual pattern mining is one of the major research areas in the
field of data mining. Data mining is an emerging technique which
applies many approaches and methods from other fields of study, and
it is in turn applied in other areas to learn hidden
knowledge. In this proposed work, ANN is used for learning textual
patterns in the Metadata conceptual mining model.
The proposed learning algorithm, called analysis of
bilateral intelligence (ABI), is used to identify and classify the
synonymy of sentences. The proposed method provides efficient learning
which identifies patterns that have synonymy, and the convergence of the
training algorithm is much faster than that of the existing methodology.
From the results, it is concluded that the performance of the proposed
ABI is optimized. Hence, the proposed Metadata conceptual mining model
with ABI learning provides better results than the existing clustering
algorithm.
In order to improve the performance of MCMM, a new
learning method is proposed. ANN is used for learning textual patterns in
the Metadata conceptual mining model. The proposed ANN based
unsupervised learning is called Analysis of Bilateral Intelligence. The
proposed learning algorithm is used to identify and classify the
synonymy of sentences. It applies the learning process to identify two
equivalent terms (bilateral) which have the same meaning. The data sets
consist of text documents; improved accuracy of text clustering is the
required output, and error-free clustering is the goal.
This thesis proposes an effective text clustering methodology.
For text clustering, MCMM is proposed, as described in section 3.3.
The performance of algorithms and techniques used in the computational
domain is improved by means of a proper learning method.
Hence, in order to improve the performance of the proposed MCMM, a
learning method is proposed.
The proposed learning model involves the learning of
conceptual terms from the MCMM. The terms learned from the proposed
learning algorithm are grouped and added to the STL. The frequent
update of conceptual terms in the STL is more important for effective
clustering. For learning of such terminologies, this proposed work
applies Artificial Neural Network based learning algorithm.
4.4.1 Unsupervised Learning Method
This section explains the learning method for text clustering
proposed in the earlier section. Many learning methods have been
proposed in the literature for varying engineering applications
(Tenenbaum et al, 2000). ANN is a better classifier than the decision tree
and the Bayesian classifier, as it provides higher accuracy. As the volume
of the data set increases, the performance of ANN also increases. It
imitates the neuron structure of animals and is based on the M-P model
and the Hebb learning rule. So, in essence, it is a distributed matrix
structure.
Through training, the neural network method gradually
calculates, by repeated iteration or cumulative calculation, the
weights of the connected neurons, so that at the end of the training
process the neural network provides error-free results. The neural
network model can be broadly divided into the following three types:
1) Feed-forward neural networks (FFNN), 2) Back-Propagation (BP)
networks, 3) Self-organizing networks. At present, the neural network
most commonly used in data mining is the BP network.
ANN is a developing science, and some theories, such as those concerning
convergence, stability, local minima and parameter
adjustment, have not really taken shape. For the BP network, frequently
encountered problems are that the training is slow, that it may fall into
a local minimum, and that it is difficult to determine training
parameters. To solve these problems, some researchers have combined
artificial neural networks with genetic algorithms and achieved
noteworthy results.
In the proposed ANN based unsupervised learning, the training
data are data sets containing text, improved accuracy of text
clustering is the required output, and achieving error-free clustering is
the goal. The advantages of the proposed approach are: discriminative
training is straightforward; parameters are used efficiently; local optimum
correlation is explicitly modelled; correlations, even of higher order,
between different features can be exploited without severe distributional
assumptions; and the highly parallel structure leads to efficient hardware
implementation.
The architecture of the proposed ANN based unsupervised
learning, the training and testing methodologies, the sample data set, and
the ratio of training to testing data are the important factors for
achieving an optimal result in a neural network based learning model.
Several variants of the ANN model are available, such as the Feed Forward
Neural Network, Back Propagation Neural Network, Hopfield Neural Network,
hybrid neural network and neocognitron neural network.
The feed-forward artificial neural network is a highly
desirable network model for researchers due to its simple design, low
hardware cost and relatively high performance (Jasna and Vesna, 2010).
The design of the architecture is important for a successful
implementation. The artificial neural network has the characteristics of
distributed information storage, parallel processing, information
reasoning and self-organized learning, and has the capability of rapidly
fitting non-linear data, so it can solve many problems which are
difficult for other methods to solve.
A major disadvantage of neural networks lies in their
knowledge representation. Acquired knowledge in the form of a network
of units connected by weight links is difficult for humans to interpret.
This factor has motivated research in extracting the knowledge
embedded in trained neural networks and in representing that
knowledge symbolically.
4.4.2 Analysis of Bilateral Intelligence (ABI)
The proposed ANN based unsupervised learning is termed
Analysis of Bilateral Intelligence (ABI). The ABI applies the learning
process to identify two equivalent terms which have the same meaning.
ABI takes text documents as its data sets; improved accuracy of text
clustering is the required output, and achieving error-free clustering
in a shorter time is the goal.
The working model of the proposed ABI learning method is
explained in the following steps.
The sigmoidal function shown in equation (4.5) is
applied in the proposed ABI,
$$X_A = \frac{1}{1 + e^{-x}} \qquad (4.5)$$
where XA is the output of a neuron in the hidden or output layer and
‘x’ is its input. The inputs are connected from the input layer to the
hidden layer through connections with weights ‘rai’. The weights ‘sba’
are the computational values between the hidden layer and the output
layer. Here ‘b’ denotes the neurons in the output layer, ‘a’ the neurons
in the hidden layer and ‘i’ the neurons in the input layer. The detailed
design diagram of the neuron model is shown in Figure 4.3.
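Equation (4.5) is the standard logistic sigmoid. A small Python sketch is given below for illustration; the derivative helper is an addition that is not stated in the text but is commonly used with this activation.

```python
import math


def sigmoid(x):
    """Equation (4.5): 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative of the sigmoid, written through its output: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25
```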
Figure 4.3 Design of Neuron Model
Step 1: Initial Phase
The proposed ABI starts from a well-known initial
phase in which the values of the weights are assigned. Let these
values be ‘R’ and ‘S’: ‘R’ holds the weights between the input layer and
the hidden layer, and ‘S’ holds the weights between the hidden layer and
the output layer.
The other constants, namely the penalty constant, denoted
µ, and the number of iterations (epochs), are also initialized in
the system. The weight vectors ‘R’ and ‘S’ are to be optimized in order
to minimize the error function.
The generalised delta rule is applied in the proposed ABI
and involves two stages of operation. In the first stage,
the input ‘x’ is presented and propagated in a forward direction through
the network to compute the output value ‘y’ for each output unit. This
output is compared with its desired value ‘do’, resulting in an error signal
(the difference between the actual value and the desired value) for each
output unit.
The second stage involves a backward pass
through the network, carried out after the error has been computed. The
error signal is passed to each unit in the network and the appropriate
weight changes are calculated.
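The two stages of the generalised delta rule can be sketched for a network with one hidden layer. This is a generic textbook-style sketch rather than the ABI implementation: it assumes sigmoid units without bias terms, a squared-error measure and a learning rate of 0.5, and the 2-2-1 network and training pattern are hypothetical. R and S follow the naming used in the text for the input-to-hidden and hidden-to-output weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(R, S, x):
    """Stage 1: propagate input x forward; return hidden and output activations."""
    z = [sigmoid(sum(r_ai * x_i for r_ai, x_i in zip(row, x))) for row in R]
    y = [sigmoid(sum(s_ba * z_a for s_ba, z_a in zip(row, z))) for row in S]
    return z, y


def backward(R, S, x, d, gamma=0.5):
    """Stage 2: propagate the error back, update R and S in place, return the error."""
    z, y = forward(R, S, x)
    # Error signal of each output unit (the sigmoid derivative is y * (1 - y)).
    delta_out = [(d_b - y_b) * y_b * (1.0 - y_b) for d_b, y_b in zip(d, y)]
    # Effective error of each hidden unit, collected through the weights S.
    delta_hid = [
        z_a * (1.0 - z_a) * sum(delta_out[b] * S[b][a] for b in range(len(S)))
        for a, z_a in enumerate(z)
    ]
    for b, row in enumerate(S):
        for a in range(len(row)):
            row[a] += gamma * delta_out[b] * z[a]
    for a, row in enumerate(R):
        for i in range(len(row)):
            row[i] += gamma * delta_hid[a] * x[i]
    return sum((d_b - y_b) ** 2 for d_b, y_b in zip(d, y))


# Hypothetical 2-2-1 network and a single training pattern.
R = [[0.1, -0.2], [0.3, 0.4]]
S = [[0.2, -0.1]]
errors = [backward(R, S, x=[1.0, 0.5], d=[1.0]) for _ in range(200)]
print(errors[0] > errors[-1])
```

The hidden-unit error signals are computed before S is updated, so both weight layers are adjusted from the same forward pass.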
Step 2: Weight Adjustment Phase
The weight adjustment step is based on the sigmoid
activation function shown in equation (4.5).
The weight of a connection is adjusted by an amount
proportional to the product of the error signal on the unit ‘k’
receiving the input, calculated in the second stage of the first phase,
and the output of the unit ‘j’ sending this signal along the connection.
Step 3: Optimization of Output Layer Weights
$$S_{optimum} = A^{-1} \times B \qquad (4.6)$$
where
$$A = \sum_{p=1}^{P} Z_a^p Z_i^p, \quad a, i = 1, \ldots, P \qquad (4.7)$$
$$B = \sum_{p=1}^{P} Z_a^p t_b^p, \quad a, b = 1, \ldots, P \qquad (4.8)$$
where ‘Zp’ is the scalar output of the hidden neuron for training
data ‘p’, ‘A’ and ‘B’ are the outputs of the hidden layer and output layer
respectively, ‘a’ and ‘b’ are neurons in the hidden layer and output layer,
‘i’ is a neuron in the input layer, and ‘t’ is the transaction function.
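Equations (4.6) to (4.8) have the form of a linear least-squares solution for the output-layer weights. The sketch below is one possible reading, not the thesis implementation: it treats ‘t’ as the desired output value for each pattern and the output units as linear during this step, both of which are assumptions. The small solver and the data are illustrative only.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[r]) + list(B[r]) for r in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]  # fails if A is singular
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [row[n:] for row in M]


def optimal_output_weights(Z, T):
    """S = A^-1 B, with A and B summed over patterns as in (4.7) and (4.8)."""
    n_hidden, n_out = len(Z[0]), len(T[0])
    A = [[sum(z[a] * z[i] for z in Z) for i in range(n_hidden)] for a in range(n_hidden)]
    B = [[sum(z[a] * t[b] for z, t in zip(Z, T)) for b in range(n_out)] for a in range(n_hidden)]
    return solve(A, B)


# Hypothetical hidden-layer outputs; targets generated by the weights (2.0, -1.0).
Z = [[0.2, 0.9], [0.7, 0.1], [0.5, 0.5], [0.9, 0.3]]
T = [[2.0 * z1 - 1.0 * z2] for z1, z2 in Z]
S_opt = optimal_output_weights(Z, T)
print([round(row[0], 6) for row in S_opt])
```

Because the targets were generated exactly by known weights, the solution recovers them up to rounding, which is a convenient check that A and B were assembled consistently.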
The concept of state is fundamental to this description. The
state vector or simply state, denoted by ‘xk’, is defined as the minimal set
of data that is sufficient to uniquely describe the unforced dynamical
behaviour of the system; the subscript ‘k’ denotes discrete time. In other
words, the state is the least amount of data on the past behaviour of the
system that is needed to predict its future behaviour. Typically, the state
‘xk’ is unknown. To estimate it, use a set of observed data, denoted by
the vector ‘yk’.
Step 4: Test for Completion
The RMS error (ERMS) is then calculated by comparing the ‘Rtest’
matrix with the ‘Soptimum’ matrix calculated in Step 3.
a. ERMS < E (4.9)
The hidden layer weight matrix ‘R’ is updated as ‘R’ = ‘Rtest’.
Decrease the influence of the penalty term by decreasing ‘µ’, and proceed
to Step 5.
b. ERMS ≥ E (4.10)
Increase the influence of ‘µ’ and repeat Step 4.
Step 5: Process Termination
If the RMS error is not within the desired range, repeat Step 3; otherwise
the training process ceases. After the successful completion of the
training phase, sample real-time data are given as input to the system.
The system will choose the comparatively best path. This thesis used 60%
of the dataset for training and 40% for testing.
Table 4.1: Summary of Errors (in %)
Type of ANN Model             % RMS Error in Estimation   % RMS Error in Elimination
NBN Model                     7.83                        5.15
2HLANN Model                  7.23                        8.65
Proposed ABI Learning Model   4.60                        4.75
Table 4.2 Comparison of error growth on Proposed model Vs Existing Models
No. of Epoch   NBN     2HLANN   Proposed ABI
50             0.175   0.15     0.13
100            0.14    0.11     0.08
150            0.10    0.08     0.05
200            0.075   0.06     0.025
250            0.04    0.025    0.010
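The 60/40 division of the dataset into training and testing parts can be sketched as below. The shuffling step, the fixed seed and the document identifiers are assumptions made for illustration; the thesis does not state how the documents were assigned to the two parts.

```python
import random


def split_dataset(documents, train_fraction=0.6, seed=0):
    """Shuffle a copy of the documents and split it into training and testing parts."""
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


docs = [f"doc_{i}" for i in range(10)]  # placeholder document identifiers
train, test = split_dataset(docs)
print(len(train), len(test))  # 6 4
```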
4.5 RESULTS AND ANALYSIS
This ANN based learning model is implemented using the Neural
Network Toolbox in MATLAB. In the training algorithm, the
goal is set to 0.01 and the number of epochs to 250.
Table 4.1 shows the % RMS error in estimation and
elimination for NBN, 2HLANN and the proposed learning model.
The estimation error relates to the number of documents and
terms identified in the clustering model. The elimination error defines
the mismatch ratio for document clustering.
A comparison of the RMS error in estimation and elimination for the
proposed ABI learning model versus the existing models is shown in Figure
4.4, and the % error in estimation and elimination for the proposed ABI
learning model versus the existing models is shown in Figure 4.5.
From the results in Table 4.1 and the performance shown
in Figure 4.4, it is concluded that the proposed ABI
learning model performs better than the existing methodologies.
Figure 4.4 shows that the proposed ABI learns synonymy
better than the existing systems. The ABI shows
around 30% improvement in estimation and around 23%
improvement in elimination.
Figure 4.4 Comparison of % RMS error in Estimation and
Elimination for proposed model vs. existing models
[Chart: % Error (0.00 to 0.20) against No. of Epoch (0 to 300), with series NBN, 2HLANN and Proposed ABI]
Figure 4.5 %Error in proposed model Vs existing models based on
number of epoch
[Chart titled "Comparison of % RMS error": % RMS Error in Estimation and % RMS Error in Elimination (scale 0 to 10) for NBN Model, 2HLANN Model and Proposed ABI Learning Model]
The convergence of the proposed ABI and the existing learning
models is compared in Figure 4.4. This shows that the proposed ABI
provides optimal results within a few iterations of training.
The percentage RMS error in estimation reaches 7.83% for
NBN and 7.23% for 2HLANN, whereas it is only 4.60% for the proposed
learning model.
The percentage RMS error in elimination reaches 5.15%
for NBN and 8.65% for 2HLANN, whereas it is only 4.75% for the proposed
learning model.
The proposed ABI learning method improved the estimation,
elimination and accuracy of the system. The estimation is improved by
around 25% over NBN and 33% over 2HLANN. Similarly, the
elimination is improved by around 30% over NBN and 33% over 2HLANN.
The accuracy of the proposed system is also improved, as shown by
the error rate and learning rate over the epochs.
Figure 4.4 shows the graphical representation of the
performance of the proposed and existing models. Figure 4.5 and Table 4.2
show that the proposed learning model reaches an error of 0.010
in 250 epochs (number of iterations), whereas the existing NBN model
reaches only 0.04 and 2HLANN only 0.025, both of which
are higher than the error of the proposed system. Therefore, the proposed
ABI learning is more optimal than the existing models.
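The comparison over epochs can be checked directly from the values in Table 4.2. The sketch below only transcribes those values and reports, for each epoch count, which model has the lowest error; the dictionary layout and the short model labels are illustrative choices.

```python
# Error values transcribed from Table 4.2.
table_4_2 = {
    50: {"NBN": 0.175, "2HLANN": 0.15, "ABI": 0.13},
    100: {"NBN": 0.14, "2HLANN": 0.11, "ABI": 0.08},
    150: {"NBN": 0.10, "2HLANN": 0.08, "ABI": 0.05},
    200: {"NBN": 0.075, "2HLANN": 0.06, "ABI": 0.025},
    250: {"NBN": 0.04, "2HLANN": 0.025, "ABI": 0.010},
}

# Model with the lowest error at each epoch count.
lowest = {epoch: min(row, key=row.get) for epoch, row in table_4_2.items()}
for epoch, model in lowest.items():
    print(epoch, model, table_4_2[epoch][model])
```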