
CHAPTER 4

ANALYSIS OF BILATERAL INTELLIGENCE

LEARNING METHOD

4.1 INTRODUCTION TO LEARNING ALGORITHM

Generally, in engineering studies, models have been employed to understand various interactions, and these are classified as mathematical models and implementation models. The mathematical model provides a problem definition and a problem description for combinatorial mathematical problems. In a dynamic, network-like environment the mathematical model has some limitations, and to address this a prediction model is proposed.

These prediction models draw on biological neural networks, which are made up of real biological neurons that are physically connected or functionally related in the human nervous system, especially in the human brain. An ANN, on the other hand, is made up of artificial neurons interconnected with one another to form a programming structure that mimics the behaviour, neural processing, organisation and learning of biological neurons.

The human brain can perform tasks much faster than the fastest existing computer, thanks to its special ability for massive parallel data processing. ANNs try to mimic such remarkable behaviour for solving narrowly defined problems, i.e. problems with an associative or cognitive tinge. To this effect, ANNs have been extensively and successfully applied to pattern (speech/image) recognition, time-series prediction and modelling, function approximation, classification, adaptive control and other areas.

Neural networks are made of several processing units called neurons. Three types of neurons are distinguished: input neurons, which receive data from outside the ANN and are organised in the so-called input layer; output neurons, which send data out of the ANN and generally comprise the output layer; and hidden neurons, whose input and output signals remain within the ANN and form the so-called hidden layer (or layers).

Neurons communicate with each other by sending signals over a large number of weighted connections, thus creating a network with a high degree of interconnection. The neurons are trained using input–output data sets presented to the network. After the training process, the network produces appropriate outcomes when tested with similar data sets; in other words, it recognizes the introduced patterns. In this study, neural networks were preferred not only for their ease of application but also because they yield comparable and even better results than other methods.

A neural network has to be configured such that the

application of a set of inputs produces the desired set of outputs. Various

methods to set the strengths of the connections exist. One way is to set

the weights explicitly, using a priori knowledge. Another way is to 'train' the neural network by feeding it teaching patterns and letting it change its weights according to some learning rule.


Thus, learning situations are categorized (Fergus et al, 2010) into two distinct sorts. These are:

• Supervised learning or Associative learning in which the

network is trained by providing it with input and matching

output patterns. These input-output pairs can be provided by

an external resource, or by the system which contains the

network (self-supervised).

• Unsupervised learning or Self-organisation in which an output

unit is trained to respond to clusters of patterns within the

input. In this paradigm the system is supposed to discover

statistically salient features of the input population. Unlike the

supervised learning paradigm, there is no a priori set of

categories into which the patterns are to be classified; rather

the system must develop its own representation of the input

stimuli.

Both learning paradigms discussed above result in an

adjustment of the weights of the connections between units, according to

some modification rule.

The basic idea is that if two units ‘j’ and ‘k’ are active

simultaneously, their interconnection must be strengthened. If j receives

input from k, the simplest version of Hebbian learning prescribes modifying the weight wjk according to equation (4.1),

∆wjk = γ · yj · yk (4.1)

where γ is a positive constant of proportionality representing the learning rate. Another common rule uses not the actual activation of unit k but the difference between the actual and desired activation for adjusting the weights, according to equation (4.2),

∆wjk = γ · yj · (dk − yk) (4.2)

in which dk is the desired activation. This is often called the Widrow-Hoff rule or the delta rule.
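To make the two update rules concrete, the following minimal sketch implements equations (4.1) and (4.2) directly; the function names, learning rate and sample activations are illustrative assumptions, not values taken from the thesis.

gamma = 0.1  # learning rate (the positive constant of proportionality)

def hebbian_update(w_jk, y_j, y_k):
    # Equation (4.1): strengthen the connection when units j and k are co-active
    return w_jk + gamma * y_j * y_k

def delta_update(w_jk, y_j, y_k, d_k):
    # Equation (4.2), the Widrow-Hoff / delta rule: use the error (d_k - y_k)
    return w_jk + gamma * y_j * (d_k - y_k)

# Example: unit j fires with y_j = 1.0, unit k outputs y_k = 0.2, desired d_k = 1.0
w = 0.5
print(hebbian_update(w, 1.0, 0.2))     # 0.52
print(delta_update(w, 1.0, 0.2, 1.0))  # 0.58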

Suppose there is a set of learning samples consisting of an input vector x and a desired output d(x). For a classification task, d(x) is usually +1 or -1. The perceptron learning rule is very simple and can be stated as follows:

1. Start with random weights for the connections;

2. Select an input vector x from the set of training samples;

3. If y ≠ d(x) (the perceptron gives an incorrect response), modify all connections wi according to equation (4.3)

∆wi = d(x) · xi (4.3)

4. Go back to 2.

Note that the procedure is very similar to the Hebb rule; the

only difference is that, when the network responds correctly, no

connection weights are modified. Besides modifying the weights, the

system must also modify the threshold θ.

This θ is considered as a connection w0 between the output

neuron and a 'dummy' predicate unit which is always on: x0 = 1. Given

the perceptron learning rule as stated above, this threshold is modified

according to equation (4.4),

∆θ = 0 if the perceptron responds correctly, and ∆θ = d(x) otherwise (4.4)

A perceptron is initialized with the following weights: w1 = 1, w2 = 2, θ = -2.

The perceptron learning rule is used to learn a correct discriminant function for a number of samples. The first sample A, with values x = (0.5, 1.5) and target value d(x) = +1, is presented to the network. It can be calculated that the network output is +1, so no weights are adjusted. The same is the case for point B, with values x = (-0.5, 0.5) and target value d(x) = -1; the network output is negative, so no change. When presenting point C with values x = (0.5, 0.5), the network output will be -1, while the target value is d(x) = +1.

According to the perceptron learning rule, the weight changes are ∆w1 = 0.5, ∆w2 = 0.5, ∆θ = 1. The new weights are now w1 = 1.5, w2 = 2.5, θ = -1, and sample C is classified correctly.
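The worked example above can be replayed with a short script; this is a minimal sketch of the perceptron rule (steps 1-4), with the sample points taken from the example and the variable names chosen here for illustration.

import numpy as np

w = np.array([1.0, 2.0])   # w1, w2
theta = -2.0               # threshold, treated as the weight of a dummy input x0 = 1

samples = [
    (np.array([0.5, 1.5]), +1),    # sample A: already classified correctly
    (np.array([-0.5, 0.5]), -1),   # sample B: already classified correctly
    (np.array([0.5, 0.5]), +1),    # sample C: misclassified, triggers an update
]

for x, d in samples:
    y = 1 if np.dot(w, x) + theta > 0 else -1
    if y != d:            # only an incorrect response changes the weights
        w = w + d * x     # equation (4.3): delta w_i = d(x) * x_i
        theta += d        # equation (4.4): delta theta = d(x)

print(w, theta)           # [1.5 2.5] -1.0, matching the text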

For the perceptron learning rule there exists a convergence theorem, which states the following:

Theorem: If there exists a set of connection weights w* which is able to perform the transformation y = d(x), the perceptron learning rule will

converge to some solution (which may or may not be the same as w*) in

a finite number of steps for any initial choice of the weights.

An important generalisation of the perceptron training algorithm was presented by Widrow and Hoff as the 'least mean square' learning procedure, also known as the delta rule. The main functional difference with the perceptron training rule is the way the output of the system is used in the learning rule. The perceptron learning rule uses the output of the threshold function (either -1 or +1) for learning. The delta rule uses the net output without further mapping into the output values -1 or +1.

The learning rule was applied to the 'adaptive linear element,'

also named Adaline, developed by Widrow and Hoff (Pan et al, 2011). In

a simple physical implementation, this device consists of a set of

controllable resistors connected to a circuit which can sum up currents

caused by the input voltage signals.

Figure 4.1 Working model of the Adaline

Usually the central block, the summer, is also followed by a

quantiser which outputs either +1 or -1, depending on the polarity of the

sum. The functionality of the Adaline learning method is shown in

Figure 4.1.
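As an illustration of this structure, the sketch below models one Adaline step: a weighted summer, a quantiser on the polarity of the sum, and the delta (LMS) rule applied to the raw summer output rather than the quantized value. The function and parameter names are assumptions made here for clarity.

import numpy as np

def adaline_step(w, w0, x, d, gamma=0.1):
    net = np.dot(w, x) + w0           # summer: weighted sum of the input pattern
    y = 1 if net >= 0 else -1         # quantizer: +1 or -1 depending on polarity
    w = w + gamma * (d - net) * x     # delta rule uses the net output, not y
    w0 = w0 + gamma * (d - net)
    return w, w0, y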


4.2 LEARNING USING ARTIFICIAL NEURAL NETWORK

Neural networks, whose design evolved from the human brain, have many desirable characteristics (Chen and Salman, 2011) explored in computational science that are not available in the traditional von Neumann architecture or in modern parallel computer architectures. These characteristics are:

• Massive parallelism

• Distributed representation and computation

• Learning ability

• Generalization ability

• Adaptivity

• Inherent contextual information processing, and

• Fault tolerance

ANN research has experienced three periods of extensive activity. The first period was in the 1940s and was started by McCulloch and Pitts. The second period was in the 1960s and evolved around Rosenblatt's perceptron convergence theorem and Minsky and Papert's review showing the limitations of a simple perceptron. This second wave attracted many researchers to the field of neural networks for some twenty years of continuous invention.

The third period of evolution in ANN research dates from the 1980s. Hopfield's energy approach, the back-propagation learning algorithm, the multilayer perceptron and continuing research in soft computing are well-known examples of the importance of ANNs over these periods. Many neural network models have been designed over the last few decades. The major algorithms and inventions are marked in Figure 4.2.


Figure 4.2 Kinds of neural networks and their architectures (feed-forward networks: single-layer perceptron, multilayer perceptron, radial basis function nets; recurrent/feedback networks: competitive networks, Kohonen's SOM, Hopfield network, ART models)


4.3 LEARNING ALGORITHMS

There are a variety of learning algorithms applied to improve

the performance of ANN-based classification and prediction models. The following sub-sections explain some of the notable learning algorithms.

4.3.1 Error Back Propagation (EBP)

The popular EBP algorithm is relatively simple and it can

handle problems with basically an unlimited number of patterns. Also,

because of its simplicity, it was relatively easy to adapt the EBP

algorithm for more efficient neural network architectures where

connections across layers are allowed.

However, the EBP algorithm can be up to 1000 times slower

than more advanced second-order algorithms. Many improvements have

been made to speed up the EBP algorithm and some of them, such as

momentum and the adaptive learning constant, work relatively

well. But as long as first-order algorithms are used, improvements are

not dramatic.

EBP is traditionally implemented with forward-backward computation; for the EBP algorithm, this may work slightly faster than forward-only computation. It is typically used only for standard MLP networks. The EBP algorithm converges slowly, but it can be used for training on very large pattern sets.

One may notice in the literature that, for almost all cases, very

simple algorithms, such as least mean square or EBP, are used to train

neural networks. These algorithms converge very slowly in comparison with second-order methods. One reason why second-order algorithms are seldom used is their complexity, which requires the computation of not only gradients but also Jacobian or Hessian matrices.

Various methods of neural network training have been developed, ranging from evolutionary computation searches to gradient-based methods. The best-known method is EBP, but this method is characterized by very poor convergence. Several improvements to EBP were developed, such as the quickprop algorithm, resilient EBP, back percolation and delta-bar-delta, but much better results can be obtained using second-order methods such as Newton or Levenberg–Marquardt (LM). In the latter, not only the gradient but also the Jacobian matrix must be found.

The above work presents a new neuron-by-neuron (NBN) method of computing the Jacobian matrix. The computation of the Jacobian matrix can be as simple as the computation of the gradient in the EBP algorithm. However, more memory is required for the Jacobian. In the case of a network with np training patterns and no network outputs, the Jacobian has np × no rows, so it is of larger dimensions than the gradient and therefore requires more memory.

In this sense, the NBN algorithm has the same limitations as

the well-known LM algorithm. For example, in the case of 10 000

patterns and neural networks with 25 weights and 3 outputs, the Jacobian

J will have 30 000 rows and 25 columns, all together having 750 000

elements. However, the matrix inversion must be done only for the quasi-Hessian JTJ of size 25 × 25.
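The sizes quoted above can be checked with a quick calculation; the byte figures below assume 8-byte double-precision storage, which is an assumption, since the text does not state a precision.

n_patterns, n_outputs, n_weights = 10_000, 3, 25
jacobian_rows = n_patterns * n_outputs          # 30 000 rows
jacobian_elems = jacobian_rows * n_weights      # 750 000 elements
quasi_hessian_elems = n_weights * n_weights     # 625 elements (25 x 25)
print(jacobian_elems * 8 / 1e6)                 # ~6.0 MB for the full Jacobian
print(quasi_hessian_elems * 8 / 1e3)            # ~5.0 kB for the quasi-Hessian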


Back-propagation is one of the simplest and most general methods for training multilayer neural networks. The power of back-propagation is that it enables us to compute an effective error for each hidden unit, and thus derive a learning rule for the input-to-hidden weights. The goal is to set the interconnection weights based on the training patterns and the desired outputs. Slow convergence speed is the main disadvantage of the error back-propagation algorithm.
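The following minimal sketch shows one EBP step for a single-hidden-layer sigmoid network as a first-order gradient update; the shapes, learning rate and function names are illustrative assumptions rather than the exact formulation used in this thesis.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ebp_step(W1, W2, x, d, lr=0.1):
    # forward pass
    h = sigmoid(W1 @ x)                     # hidden activations
    y = sigmoid(W2 @ h)                     # network outputs
    # backward pass: effective error for output and hidden units
    delta_out = (y - d) * y * (1 - y)
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    # first-order (gradient) weight updates
    W2 = W2 - lr * np.outer(delta_out, h)
    W1 = W1 - lr * np.outer(delta_hid, x)
    return W1, W2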

4.3.2 Levenberg–Marquardt algorithm (LM)

The LM algorithm is traditionally implemented with forward-backward computation; for the LM (and NBN) algorithms, the improved forward-only computation performs faster training than forward-backward computation for networks with multiple outputs. It is typically used only for standard MLP networks. The LM (and NBN) algorithm converges much faster than the EBP algorithm for small and medium-sized pattern sets.

This work presents a new NBN method of computing the Jacobian matrix. It is shown that the computation of the Jacobian matrix can be as simple as the computation of the gradient in the EBP algorithm; however, more memory is required for the Jacobian. In the case of a network with np training patterns and no network outputs, the Jacobian has np × no rows, so it is of larger dimensions than the gradient and therefore requires more memory.
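For reference, the core LM update described here can be sketched as follows; the Jacobian has one row per pattern-output pair and one column per weight, and mu is the damping parameter. The helper name and the dense solve are assumptions for illustration.

import numpy as np

def lm_step(weights, jacobian, errors, mu):
    JtJ = jacobian.T @ jacobian                      # quasi-Hessian (n_w x n_w)
    gradient = jacobian.T @ errors                   # gradient vector
    step = np.linalg.solve(JtJ + mu * np.eye(weights.size), gradient)
    return weights - step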

4.3.3 Neuron By Neuron (NBN)

The neuron-by-neuron method is an implementation applied to nonlinear signal processing in the field of digital signal processing, proposed by Wilamowski (2008, 2009). In this model, the traditional back-propagation neural network is improved. The NBN is compared with the existing EBP. The EBP is the most powerful and popular learning model, but it has a few pitfalls: 1) slow processing, requiring 100-1000 times more iterations, and 2) lower accuracy.

NBN was proposed by Wilamowski et al. (2008) and redefined by Wilamowski et al. (2009). Since the development of the EBP (error back propagation) algorithm for training neural networks, many attempts have been made to improve the learning process. There are some well-known methods, like momentum or a variable learning rate, and there are less well-known methods which significantly accelerate the learning rate. The recently developed NBN (neuron-by-neuron) algorithm is very efficient for neural network training and compares well with the well-known Levenberg–Marquardt algorithm.

The neuron-by-neuron algorithm is a modification of the Levenberg–Marquardt algorithm for arbitrarily connected neurons (ACN). This is an NBN algorithm with forward-backward computation. The NBN algorithm is developed based on the LM algorithm, but it can handle arbitrarily connected neuron networks, and its convergence is improved.

The NBN algorithm has several advantages:

(1) The ability to handle arbitrarily connected neural networks;

(2) Forward-only computation (without back propagation process);

and

(3) Direct computation of quasi-Hessian matrix (no need to compute

and store Jacobian matrix).


The row elements of the Jacobian matrix for a given pattern

are computed in the following three steps:

(1) Forward Computation

(2) Backward Computation

(3) Jacobian Element Computation

• Forward Computation : In the forward computation, the

neurons connected to the network inputs are first processed so

that their outputs can be used as inputs to the subsequent

neurons. The neurons are then processed as their input values

become available.

• Backward Computation : The sequence of the backward

computation is opposite to the forward computation sequence.

The process starts with the last neuron and continues toward

the input. The vector δ represents signal propagation from a

network output to the inputs of all other neurons. The size of

this vector is equal to the number of neurons.

• Jacobian Element Computation : After the forward and

backward computation, all the neurons outputs y and vector δ

are calculated. By applying all training patterns, the whole

Jacobian matrix can be calculated and stored.
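A minimal sketch of these three steps for a single pattern is given below, using a single-hidden-layer sigmoid network with one output; in the actual NBN method the same idea is applied neuron by neuron over arbitrarily connected topologies, so the names and shapes here are only illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def jacobian_row(W1, w2, x):
    # (1) forward computation: neurons are processed from inputs to outputs
    h = sigmoid(W1 @ x)
    o = sigmoid(w2 @ h)
    # (2) backward computation: delta runs from the network output back
    delta_o = o * (1 - o)                    # single network output here
    delta_h = delta_o * w2 * h * (1 - h)     # one delta per hidden neuron
    # (3) Jacobian elements: delta of the destination neuron times the input
    #     feeding the corresponding weight
    return np.concatenate([np.outer(delta_h, x).ravel(), delta_o * h])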

The NBN algorithm is introduced to solve the structure and

memory limitation in the Levenberg–Marquardt algorithm. Based on the

specially designed NBN routings, the NBN algorithm can be used not

only for traditional MLP networks, but also for other arbitrarily connected neural networks. The NBN algorithm can be organized in two procedures: with the back-propagation process and without the back-propagation process.

The NBN algorithm does not require storing and multiplying the large Jacobian matrix. As a consequence, the memory requirement for

quasi-Hessian matrix and gradient vector computation is decreased by (P ×

M) times, where P is the number of patterns and M is the number of

outputs. An additional benefit of memory reduction is also a significant

reduction in computation time.

Therefore, the training speed of the NBN algorithm becomes much faster than that of the traditional Levenberg–Marquardt algorithm. In the NBN algorithm, the quasi-Hessian matrix can be computed on the fly as training patterns are applied. Moreover, it has a special advantage for applications which require dynamically changing the number of training patterns. There is no need to repeat the entire multiplication of JTJ, but only to add to or subtract from the quasi-Hessian matrix. The quasi-Hessian matrix can be modified as patterns are applied or removed.
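This pattern-by-pattern accumulation can be sketched as follows: the quasi-Hessian Q and gradient g are built from single Jacobian rows, so the full Jacobian is never stored, and a pattern's contribution can later be subtracted again. The signature is an assumption; j_row is the Jacobian row and err the error for one pattern-output pair (see the sketch in the previous subsection).

import numpy as np

def accumulate(Q, g, j_row, err, sign=+1):
    Q += sign * np.outer(j_row, j_row)   # add (or subtract) j j^T
    g += sign * j_row * err              # add (or subtract) j * e
    return Q, g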

4.3.4 Forward-only Computation

The NBN procedure introduced in the earlier section requires both forward and backward computation. In particular, one may notice that for networks with multiple outputs, the back-propagation process has to be repeated for each output.

Wilamowski et al. (2010) proposed an improved NBN computation to overcome this problem by removing the back-propagation process from the computation of the Jacobian matrix. The method was also introduced to allow the training of arbitrarily connected neural networks; therefore, more powerful neural network architectures with connections across layers can be efficiently trained.

The proposed method also simplifies neural network training,

by using the forward-only computation instead of the traditionally used

forward and backward computation. Information needed for the gradient

vector (for first-order algorithms) and Jacobian or Hessian matrix (for

second-order algorithms) is obtained during forward computation.

With the proposed algorithm, it is now possible to solve the same problems using a much smaller number of neurons, because the algorithm is able to train more complex neural network architectures in which fewer neurons are required. Comparisons of the computation cost show that the proposed forward-only computation can be faster than the traditional implementation of the Levenberg–Marquardt algorithm.

4.3.5 Improved Levenberg–Marquardt Algorithm (ILM)

Wilamowski et al. (2010) proposed the ILM, an improved computation aimed at optimizing the neural network learning process using the Levenberg–Marquardt (LM) algorithm. The quasi-Hessian matrix and gradient vector are computed directly, without Jacobian matrix multiplication and storage, which solves the memory limitation problem of LM training. Considering the symmetry of the quasi-Hessian matrix, only the elements in its upper (or lower) triangular part need to be calculated.

Therefore, training speed is improved significantly, not only because of the smaller array stored in memory, but also because of the reduced number of operations in the quasi-Hessian matrix calculation. The improved memory and time efficiencies are especially pronounced for training on large pattern sets. The improved computation is introduced to increase the training efficiency of the LM algorithm in the above-mentioned work.

4.3.6 Two Hidden Layers Artificial Neural Network (2HLANN)

A two-hidden-layer artificial neural network (2HLANN) model was proposed by Mkadem and Boumaiza (2011). It is used for predicting the dynamic nonlinear characteristics of wideband power amplifiers. The 2HLANN is an improved feed-forward neural network model, designed in terms of the number of neurons, the learning rate and the memory space.

4.4 PROPOSED ANALYSIS OF BILATERAL

INTELLIGENCE LEARNING METHOD

Textual pattern mining is one of the major research areas in the field of data mining. Data mining is an emerging technique which applies many approaches and methods from other fields of study, and it is implemented in other areas to learn hidden knowledge. In this proposed work, an ANN is used for learning textual patterns in the Metadata conceptual mining model.

The proposed learning algorithm, called Analysis of Bilateral Intelligence, is used to identify and classify the synonymy of sentences. The proposed method provides efficient learning which identifies patterns that have synonymy, and the convergence of the training algorithm is much faster than that of existing methodologies. From the results, it is concluded that the performance of the proposed ABI is optimized. Hence, the proposed Metadata conceptual mining model with ABI learning provides better optimality than existing clustering algorithms.


In order to improve the performance of the MCMM, a new learning method is proposed. An ANN is used for learning textual patterns in the Metadata conceptual mining model. The proposed ANN-based unsupervised learning is called Analysis of Bilateral Intelligence. The proposed learning algorithm is used to identify and classify the synonymy of sentences; it applies the learning process to identify two equivalent terms (hence 'bilateral') which have the same meaning. It uses text documents as datasets. Improving the accuracy of text clustering is the required output, and achieving error-free clustering is the goal.

This thesis proposes an effective text clustering methodology. For text clustering, the MCMM described in Section 3.3 is proposed. The performance of the algorithms and techniques used in a computational domain is improved by means of a proper learning method. Hence, in order to improve the performance of the proposed MCMM, a learning method is proposed.

The proposed learning model involves the learning of

conceptual terms from the MCMM. The terms learned from the proposed

learning algorithm are grouped and added to the STL. The frequent

update of conceptual terms in the STL is important for effective clustering. For learning such terminologies, this proposed work applies an Artificial Neural Network based learning algorithm.

4.4.1 Unsupervised Learning Method

This section explains the learning method for text clustering

proposed in the earlier section. There are many learning methods

proposed in the literature for varying engineering applications

(Tenenbaum et al, 2000). ANN is a better classifier than decision trees and Bayesian classifiers, as it provides higher accuracy. As the volume of the data set increases, the performance of the ANN also increases. It imitates the neuron structure of animals and is based on the M-P model and the Hebb learning rule; so, in essence, it is a distributed matrix structure.

Through training, the neural network method gradually calculates, through repeated iteration or cumulative calculation, the weights of the connected neurons, so that at the end of the training process the neural network provides error-free results. Neural network models can be broadly divided into the following three types: 1) feed-forward (FFNN) neural networks, 2) back-propagation (BP) networks, and 3) self-organizing networks. At present, the neural network most commonly used in data mining is the BP network.

ANN is a developing science, and some theoretical issues such as convergence, stability, local minima and parameter adjustment have not yet really taken shape. For the BP network, the frequently arising problems are that training is slow, that it may fall into a local minimum, and that it is difficult to determine the training parameters. To solve these problems, some researchers have adopted the method of combining artificial neural networks with genetic algorithms and have achieved noteworthy results.

In the proposed ANN-based unsupervised learning, the training data sets contain text, improving the accuracy of text clustering is the required output, and achieving error-free clustering is the goal. The advantages of the proposed approach are: discriminative training is straightforward; parameters are used efficiently; local optimum correlation is explicitly modelled; correlations between different features, even higher-order ones, can be exploited without severe distributional assumptions; and the highly parallel structure leads to efficient hardware implementation.

The architecture of the proposed ANN-based unsupervised learning, the training and testing methodologies, the sample data set, and the ratio of training to testing data are the important factors for achieving an optimal result in a neural network based learning model. There are several variants of the ANN model available, such as the feed-forward neural network, the back-propagation neural network, the Hopfield neural network, hybrid neural networks and the neocognitron.

The feed-forward artificial neural network is a highly desirable network model for researchers due to its simple design, lower hardware cost and relatively high performance (Jasna and Vesna, 2010). The design of the architecture is most important for a successful implementation. The artificial neural network has the characteristics of distributed information storage, parallel processing, information reasoning and self-organized learning, and has the capability of rapidly fitting non-linear data, so it can solve many problems which are difficult for other methods to solve.

A major disadvantage of neural networks lies in their

knowledge representation. Acquired knowledge in the form of a network

of units connected by weight links is difficult for humans to interpret.

This factor has motivated research in extracting the knowledge

embedded in trained neural networks and in representing that

knowledge symbolically.


4.4.2 Analysis of Bilateral Intelligence (ABI)

The proposed ANN-based unsupervised learning is termed Analysis of Bilateral Intelligence (ABI). The ABI applies the learning process to identify two equivalent terms which have the same meaning. ABI uses text documents as datasets; improving the accuracy of text clustering is the required output, and achieving error-free clustering in a shorter time is the goal.

The working model of the proposed ABI learning method is explained in the following steps:

The sigmoidal function shown in equation (4.5) is applied in the proposed ABI,

XA = 1 / (1 + e^(-x)) (4.5)

where XA is the output of a neuron in the hidden or output layer and x is the input it receives. The inputs x are connected from the input layer to the hidden layer; these connections carry the weights 'rai' between the input and hidden layers, while the weights 'sba' lie between the hidden and output layers. Here 'b' indexes the neurons in the output layer, 'a' the neurons in the hidden layer and 'i' the neurons in the input layer. The detailed design of the neuron model is shown in Figure 4.3.


Figure 4.3 Design of Neuron Model
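Under this notation, the forward pass of the network in Figure 4.3 can be sketched as follows, with R = [rai] and S = [sba] as weight matrices and the sigmoid of equation (4.5) applied in both layers; the helper names are assumptions for illustration.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # X_A = 1 / (1 + e^(-x))

def abi_forward(R, S, x):
    z = sigmoid(R @ x)                 # hidden-layer outputs
    y = sigmoid(S @ z)                 # output-layer values
    return z, y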

Step 1: Initial Phase

The proposed ABI starts from the well-known initial phase. In the initial phase, the values of the weights are assigned; let these values be 'R' and 'S', where 'R' holds the weights between the input and hidden layers and 'S' holds the weights between the hidden and output layers. The other constants are the penalty constant, which is defined as µ, and the number of iterations, called epochs, which are initialized in the system. The weight vectors 'R' and 'S' are to be optimized in order to minimize the error function.

The generalised delta rule is employed in the proposed ABI, which involves two stages of operation. In the first stage, the input 'x' is presented and propagated in a forward direction through the network to compute the output value 'y' for each output unit. Each output is compared with its desired value 'do', resulting in an error signal (the difference between the actual value and the desired value) for each output unit.

The second stage involves a backward transmission, passed through the network after the error has been computed. The error signal is passed to each unit in the network and the appropriate weight changes are calculated.

Step 2: Weight Adjustment Phase

This weight adjustment step is based on the sigmoid activation function shown in the first phase.

The weight of a connection is adjusted by an amount proportional to the product of the error signal calculated in the second stage of the first phase, on the unit 'k' receiving the input, and the output of the unit 'j' sending this signal along the connection.

Step 3: Optimization of Output Layer Weights

Soptimum = A⁻¹ × B (4.6)

where

A = Σ (p = 1 to P) Za^p · Zi^p,  a, i = 1, …, P (4.7)

B = Σ (p = 1 to P) Za^p · tb^p,  a, b = 1, …, P (4.8)

where Z^p is the scalar output of a hidden neuron for training pattern 'p', 'A' and 'B' are formed from the outputs of the hidden layer and output layer respectively, 'a' and 'b' are neurons in the hidden layer and output layer, 'i' is a neuron in the input layer, and 't' is the transaction function.

The concept of state is fundamental to this description. The

state vector or simply state, denoted by ‘xk’, is defined as the minimal set

of data that is sufficient to uniquely describe the unforced dynamical

behaviour of the system; the subscript ‘k’ denotes discrete time. In other

words, the state is the least amount of data on the past behaviour of the

system that is needed to predict its future behaviour. Typically, the state

‘xk’ is unknown. To estimate it, use a set of observed data, denoted by

the vector ‘yk’.

Step 4: Test for Completion

The RMS error (ERMS) is then calculated by comparing the 'Rtest' matrix with the 'Soptimum' matrix calculated in Step 3.

a. If ERMS < E (4.9): the hidden layer weight matrix 'R' is updated as R = Rtest. Decrease the influence of the penalty term by decreasing 'µ' and proceed to Step 5.

b. If ERMS ≥ E (4.10): increase the influence of 'µ' and repeat Step 4.

Step 5: Process Termination

If the RMS error is not within the desired range, repeat Step 3; otherwise the training process is stopped. After the successful completion of the training phase, sample real-time data are given as input to the system.
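A compact sketch of the whole loop (Steps 1-5) is given below. It should be read as one plausible interpretation rather than the exact procedure: the penalty constant µ is folded into the linear solve of equation (4.6) as a damping term, the trial update of R is a simple perturbation because its exact form is not specified, and acceptance is judged against the best error so far. All names, shapes and constants are assumptions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_abi(X, T, n_hidden, E=0.01, mu=1.0, epochs=250, seed=0):
    # X is (patterns x inputs), T is (patterns x target outputs)
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(n_hidden, X.shape[1]))        # Step 1: initial weights
    best_rms, S = np.inf, None
    for _ in range(epochs):
        R_test = R + 0.01 * rng.normal(size=R.shape)   # trial hidden-layer weights
        Z = sigmoid(X @ R_test.T)                      # hidden outputs for all patterns
        A = Z.T @ Z                                    # equation (4.7)
        B = Z.T @ T                                    # equation (4.8)
        S_opt = np.linalg.solve(A + mu * np.eye(n_hidden), B)   # Step 3, eq. (4.6)
        rms = np.sqrt(np.mean((Z @ S_opt - T) ** 2))   # Step 4: RMS error
        if rms < best_rms:                             # Step 4a: accept, relax penalty
            R, S, best_rms, mu = R_test, S_opt, rms, mu * 0.5
        else:                                          # Step 4b: reject, increase penalty
            mu *= 2.0
        if best_rms < E:                               # Step 5: stop once within the goal
            break
    return R, S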


Table 4.1 Summary of Errors (in %)

Type of ANN Model              % RMS Error in Estimation    % RMS Error in Elimination
NBN Model                      7.83                         5.15
2HLANN Model                   7.23                         8.65
Proposed ABI Learning Model    4.60                         4.75

Table 4.2 Comparison of error growth for the proposed model vs. existing models

No. of Epochs    NBN      2HLANN    Proposed ABI
50               0.175    0.15      0.13
100              0.14     0.11      0.08
150              0.10     0.08      0.05
200              0.075    0.06      0.025
250              0.04     0.025     0.010


The system will then choose the comparatively best path. This thesis used 60% of the dataset for training and 40% for testing.

4.5 RESULTS AND ANALYSIS

This ANN-based learning model is implemented using the Neural Network Toolbox in MATLAB. In the training algorithm, the goal is set to 0.01 and the number of epochs is set to 250.

Table 4.1 shows the % RMS error in estimation and elimination for NBN, 2HLANN and the proposed learning model.

The estimation error reflects the number of documents and terms identified in the clustering model. The elimination error defines the mismatch ratio for document clustering.

The comparison of RMS error in estimation and elimination for the proposed ABI learning model versus the existing models is shown in Figure 4.4, and the % error as a function of the number of epochs for the proposed ABI learning model versus the existing models is shown in Figure 4.5.

From the results shown in Table 4.1 and the performance shown in Figure 4.4, it is concluded that the proposed ABI learning model always performs better than the existing methodologies.

Figure 4.4 shows that the proposed ABI learns synonymy better than the existing systems. From this, it is concluded that the proposed ABI performs better than the existing systems. The ABI shows around 30% improvement in estimation and around 23% improvement in elimination.


Figure 4.4 Comparison of % RMS error in estimation and elimination for the proposed model vs. existing models

Figure 4.5 % Error of the proposed model vs. existing models against the number of epochs

The convergence of the proposed ABI and the existing learning models is compared in Figure 4.5. This shows that the proposed ABI provides optimal results within a few iterations of training.

The percentage RMS error in estimation reaches 7.83% for NBN and 7.23% for 2HLANN, whereas it is only 4.60% for the proposed learning model.

The percentage RMS error in elimination reaches 5.15% for NBN and 8.65% for 2HLANN, whereas it is only 4.75% for the proposed learning model.

The proposed ABI learning method improved the estimation, elimination and accuracy of the system. The estimation is improved by around 25% compared with NBN and by 33% compared with 2HLANN. Similarly, the elimination is improved by around 30% compared with NBN and by 33% compared with 2HLANN. The accuracy of the proposed system is also improved, as shown by the error rate and learning rate over the epochs.

Figure 4.4 shows the graphical representation of the performance of the proposed and existing models. Figure 4.5 and Table 4.2 show that the proposed learning model reaches an error of 0.010 in 250 epochs (iterations), whereas the existing NBN model reaches only 0.04 and the 2HLANN only 0.025, both higher than that of the proposed system. Therefore, the proposed ABI learning is more optimal than the existing models.