Upload
haxuyen
View
213
Download
0
Embed Size (px)
Citation preview
ADAPTIVE NONLINEAR SYSTEM IDENTIFICATION AND CHANNEL EQUALIZATION USING
FUNCTIONAL LINK ARTIFICIAL NEURAL NETWORK
A THESIS SUBMITTED IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
Master of Technology
in
Telematics and Signal Processing
By
AJIT KUMAR SAHOO
Department of Electronics and Communication Engineering
National Institute Of Technology
Rourkela
2007
ADAPTIVE NONLINEAR SYSTEM IDENTIFICATION AND CHANNEL EQUALIZATION USING
FUNCTIONAL LINK ARTIFICIAL NEURAL NETWORK
A THESIS SUBMITTED IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
Master of Technology in
Telematics and Signal Processing
By
AJIT KUMAR SAHOO
Under the Guidance of Prof. G. Panda
Department of Electronics and Communication Engineering
National Institute Of Technology
Rourkela
2007
National Institute Of Technology
Rourkela
CERTIFICATE
This is to certify that the thesis entitled, “Adaptive Nonlinear System Identification and
Channel Equalization Using Functional Link Artificial Neural Network” submitted by
Sri Ajit kumar Sahoo in partial fulfillment of the requirements for the award of Master of
Technology Degree in Electronics & communication Engineering with specialization in
“Telematics and Signal Processing” at the National Institute of Technology, Rourkela
(Deemed University) is an authentic work carried out by him under my supervision and
guidance.
To the best of my knowledge, the matter embodied in the thesis has not been submitted to any
other University / Institute for the award of any Degree or Diploma.
Prof. G. Panda Dept. of Electronics & Communication Engg.
Date: National Institute of Technology Rourkela-769008
ACKNOWLEDGEMENTS
This project is by far the most significant accomplishment in my life and it would be
impossible without people who supported me and believed in me.
I would like to extend my gratitude and my sincere thanks to my honorable, esteemed
supervisor Prof. G. Panda, Head, Department of Electronics and Communication
Engineering. He is not only a great lecturer with deep vision but also most importantly a kind
person. I sincerely thank for his exemplary guidance and encouragement. His trust and
support inspired me in the most important moments of making right decisions and I am glad
to work with him.
I want to thank all my teachers Prof. G.S. Rath, Prof. K. K. Mahapatra, Prof. S.K.
Patra and Prof. S.K. Meher for providing a solid background for my studies and research
thereafter. They have been great sources of inspiration to me and I thank them from the
bottom of my heart.
I would like to thank all my friends and especially my classmates for all the
thoughtful and mind stimulating discussions we had, which prompted us to think beyond the
obvious. I’ve enjoyed their companionship so much during my stay at NIT, Rourkela.
I would like to thank all those who made my stay in Rourkela an unforgettable and
rewarding experience.
Last but not least I would like to thank my parents, who taught me the value of hard
work by their own example. They rendered me enormous support during the whole tenure of
my stay in NIT Rourkela.
Ajit Kumar Sahoo
CONTENTS Page No.
Abstract. i
List of Figures. iii
List of Tables. v
Abbreviations Used. vi
Chapter 1. Introduction.
1.1. Introduction. 1
1.2. Motivation. 1
1.3. Thesis Layout. 3
Chapter 2. Adaptive Modeling and System Identification.
2.1. Introduction. 4
2.2. Adaptive Filter. 5
2.3. Filter Structures. 7
2.4. Application of Adaptive Filters. 8
2.4.1. Direct Modeling. 8
2.4.2. Inverse Modeling. 10
2.5. Gradient Based Adaptive Algorithm. 10
2.5.1. General Form of Adaptive FIR Algorithm. 11
2.5.2. The Mean-Squared Error Cost Function. 11
2.5.3. The Wiener Solution. 12
2.5.4. The Method of Steepest Descent. 13
2.6. Least Mean Square (LMS) Algorithm. 14
2.7. System Identification. 16
2.8. Simulation Results. 17
2.9. Summary. 21
Chapter 3. System Identification Using Artificial Neural Network (ANN).
3.1. Introduction. 22
3.2. Single Neuron Structure. 23
3.2.1. Activation Functions and Bias. 24
3.2.2. Learning Process. 24
3.3. Multilayer Perceptron. 26
3.3.1. Back Propagation Algorithm. 27
3.4. Functional Link ANN (FLANN). 29
3.4.1. Learning Algorithm. 30
3.5. Cascaded FLANN (CFLANN). 32
3.5.1. Learning Algorithm. 32
3.6. Simulation Results. 36
3.7. Summary 42
Chapter 4. Pruning Using Genetic Algorithm (GA).
4.1. Introduction. 43
4.2. Genetic Algorithm. 44
4.2.1. GA Operations. 45
4.2.2. Population Variable. 46
4.2.3. Chromosome Selection. 46
4.2.4. Gene Crossover. 48
4.2.5. Chromosome Mutation. 49
4.3. Parameters of GA. 50
4.4. Pruning Using GA. 51
4.5. Simulation Results. 55
4.6. Summary. 59
Chapter 5. Channel Equalization.
5.1. Introduction. 60
5.2 .Base Band Communication System. 61
5.3. Channel Interference. 61
5.3.1. Multipath Propagation. 62
5.4. Minimum and Nonminimum Phase Channels. 63
5.5. Inter Symbol Interference. 64
5.5.1. Symbol Overlap. 64
5.6. Channel Equalization. 65
5.6.1. Transversal Filter. 67
5.7. Simulation Results. 68
5.8. Summary. 70
Chapter 6. Conclusions.
6.1. Conclusions. 71
6.2. Scope for Future Work. 71
References. 72
Abstract
In system theory, characterization and identification are fundamental problems. When the
plant behavior is completely unknown, it may be characterized using certain model and then,
its identification may be carried out with some artificial neural networks(ANN) like
multilayer perceptron(MLP) or functional link artificial neural network(FLANN) using some
learning rules such as back propagation (BP) algorithm. They offer flexibility, adaptability
and versatility, so that a variety of approaches may be used to meet a specific goal, depending
upon the circumstances and the requirements of the design specifications. The primary aim of
the present thesis is to provide a framework for the systematic design of adaptation laws for
nonlinear system identification and channel equalization. While constructing an artificial
neural network the designer is often faced with the problem of choosing a network of the
right size for the task. The advantages of using a smaller neural network are cheaper cost of
computation and better generalization ability. However, a network which is too small may
never solve the problem, while a larger network may even have the advantage of a faster
learning rate. Thus it makes sense to start with a large network and then reduce its size. For
this reason a Genetic Algorithm (GA) based pruning strategy is reported. GA is based upon
the process of natural selection and does not require error gradient statistics. As a
consequence, a GA is able to find a global error minimum.
Transmission bandwidth is one of the most precious resources in digital communication
systems. Communication channels are usually modeled as band-limited linear finite impulse
response (FIR) filters with low pass frequency response. When the amplitude and the
envelope delay response are not constant within the bandwidth of the filter, the channel
distorts the transmitted signal causing intersymbol interference (ISI). The addition of noise
during propagation also degrades the quality of the received signal. All the signal processing
methods used at the receiver's end to compensate the introduced channel distortion and
recover the transmitted symbols are referred as channel equalization techniques.
When the nonlinearity associated with the system or the channel is more the number of
branches in FLANN increases even some cases give poor performance. To decrease the
number of branches and increase the performance a two stage FLANN called cascaded
FLANN (CFLANN) is proposed.
i
This thesis presents a comprehensive study covering artificial neural network (ANN)
implementation for nonlinear system identification and channel equalization. Three ANN
structures, MLP, FLANN, CFLANN and their conventional gradient-descent training
methods are extensively studied.
Simulation results demonstrate that FLANN and CFLANN methods are directly applicable
for a large class of nonlinear control systems and communication problems.
ii
LIST OF FIGURES
Figure No Figure Title Page No.
Fig.2.1 Type of adaptations 5
Fig.2.2 General Adaptive Filtering 6
Fig.2.3 Structure of an FIR Filter 8
Fig.2.4 Direct Modeling 9
Fig.2.5 Inverse Modeling 10
Fig.2.6 Block diagram of system identification 17
Fig.2.7 Response and MSE plot for linear system using LMS algorithm 18
Fig.2.8-2.11 Response and MSE plot for nonlinear systems using LMS algorithm 19
Fig.3.1 A single neuron structure 23
Fig. 3.2 Structure of multilayer perceptron 26
Fig. 3.3 Neural network using BP algorithm 27
Fig.3.4 Structure of the FLANN model 30
Fig. 3.5 Structure of CFLANN Model. 33
Fig.3.6-3.10 Response comparison between MLP and FLANN 37
Fig.3.11-3.12 Performance comparison between LMS, FLANN and CFLANN 41
Fig.4.1. GA Iteration Cycle 45
Fig.4.2. Biased roulette-wheel for the selection of the mating pool 47
Fig.4.3. Gene crossover 49
Fig.4.4 Mutation operation in GA 50
Fig.4.5. FLANN based identification model showing pruning path 53
iii
Fig.4.6 Bit allocation scheme for pruning and weight updating 54
Fig.4.7. Output plot for static and dynamic systems 58
Fig.5.1. A baseband Communication System 61
Fig.5.2. Impulse Response of a transmitted signal in a channel 62
Fig.5.3. Interaction between two neighboring symbols 65
Fig.5.4. Block diagram of Channel Equalization 66
Fig.5.5. Linear Transversal Filter 67
Fig.5.6. BER plot comparison between LMS, FLANN, CFLANN 69
iv
LIST OF TABLES
Table No. Table Title Page No.
3.1 Common activation functions. 24
4.1 Comparison of computational complexity between FLANN
and pruned FLANN structure for static systems. 56
4.2 Comparison of computational complexity between FLANN
and pruned FLANN structure for dynamic systems. 59
v
ABBREVIATIONS USED
ANN Artificial Neural Network
BGA Binary Coded Genetic Algorithm (BGA)
BP Back Propagation
CFLANN Cascaded Functional Link Artificial Neural Network
DCR Digital Cellular Radio
DSP Digital Signal Processing
FIR Finite Impulse Response
FLANN Functional Link Artificial Neural Network
FPGA Field Programmable Gate Array
GA Genetic Algorithm
IIR Infinite Impulse Response
ISDN Integrated Service Digital Network
ISI Inter Symbol Interference
LAN Local Area Network
LMS Least Mean Square
MLANN Multilayer Artificial Neural Network
MLP Multilayer Perceptron
MLSE Maximum Likelihood Sequence Estimator
MSE Mean Square Error
PPN Polynomial Perceptron Network
vi
1. INTRODUCTION
1.1. INTRODUCTION.
System identification is one of the most important areas in engineering because of its
applicability to a wide range of problems.Mathmatical system theory, which has in the past
few decades evolved into a powerful scientific discipline of wide applicability, deals with
analysis and synthesis of systems. The best developed theory for systems defined by linear
operators using well established techniques based on linear algebra, complex variable theory
and theory of ordinary linear differential equations. Design techniques for dynamical systems
are closely related to their stability properties. Necessary and sufficient conditions for
stability of linear time-invariant systems have been generated over past century, well-known
design methods have been established for such systems. In contrast to this, the stability of
nonlinear systems can be established for the most part only on a system-by-system basis.
In the past few decades major advances have been made in adaptive identification and
control for identifying and controlling linear time-invariant plants with unknown parameters.
The choice of the identifier and the controller structures based on well established results in
linear systems theory. Stable adaptive laws for the adjustment of parameters in these which
assures the global stability of the relevant overall systems are also based on properties of
linear systems as well as stability results that are well known for such systems [1.1].
In recent years, with the growth of internet technologies, high speed and efficient data
transmission over communication channels has gained significant importance. The rapidly
increasing computer communication has necessitated higher speed data transmission over
wide spread network of voice bandwidth channels. In digital communications the symbols are
sent through linearly dispersive mediums such as telephone, cable and wireless. In band
width efficient data transmission systems, the effect of each symbol transmitted over such
time-dispersive channel extends to the neighboring symbol intervals. This distortion caused
by the resulting overlap of received data is called intersymbol interference (ISI) [1.2].
1.2. MOTIVATION
Adaptive filtering has proven to be useful in many contexts such as linear prediction,
channel equalization, noise cancellation, and system identification. The adaptive filter
attempts to iteratively determine an optimal model for the unknown system, or “plant”, based
on some function of the error between the output of the adaptive filter and the output of the
1
plant. The optimal model or solution is attained when this function of the error is
minimized. The adequacy of the resulting model depends on the structure of the adaptive
filter, the algorithm used to update the adaptive filter parameters, and the characteristics of
the input signal.
When the parameters of a physical system are not available or time dependent it is difficult
to obtain the mathematical model of the system. In such situations, the system parameters
should be obtained using a system identification procedure. The purpose of system
identification is to construct a mathematical model of a physical system from input-output.
Studies on linear system identification have been carried out for more than three decades
[1.3]. However, identification of nonlinear systems is a promising research area. Nonlinear
characteristics such as saturation, dead-zone, etc. are inherent in many real systems. In order
to analyze and control such systems, identification of nonlinear system is necessary. Hence,
adaptive nonlinear system identification has become more challenging and received much
attention in recent years [1.4].
High speed data transmission over communication channels is subject to intersymbol
interference (ISI) and noise. The intersymbol interference is usually the result of the restricted
bandwidth allocated to the channel and/or the presence of multipath distortion in the medium
through which the information is transmitted. Equalization is the process which reconstructs
the transmitted data jointly combating the ISI and the noise in the communication link. The
simplest architecture in the class of equalizers making decisions in a symbol–by–symbol
basis is the linear transversal filter. The field of digital data communications has experienced
an explosive growth in recent years and its demand reaches at the peak as additional services
are being added to existing infrastructure. The telephone networks were originally designed
for voice communication but, in recent times, the advances in digital communications using
Integrated Service Digital Network (ISDN), data communications with computers, fax, video
conferencing etc. have pushed the use of these facilities far beyond the scope of their original
intended use. Similarly, introduction of digital cellular radio (DCR) and wireless local area
networks (LAN’s) have stretched the limited available radio spectrum capacity to the limits it
can offer. These advances in digital communications have been made possible by the
effective use of the existing communication channels with aid of signal processing
techniques. Nevertheless these advances on the existing infrastructure have introduced a host
of new unanticipated problems. The conventional LMS algorithm [1.5] fails in case of
nonlinear channels. Hence non-linear channel estimation is a key problem in communication
2
system. Several approaches based on Artificial Neural Network (ANN) have been discussed
recently for estimation of nonlinear channels.
1.3. THESIS LAYOUT
In Chapter 2, adaptive modeling and system identification problem is defined for linear and
nonlinear plants. The conventional LMS algorithm and other gradient based algorithm for
FIR system are derived. Nonlinearity problems are discussed briefly and various methods are
proposed for its solution.
In Chapter 3, the theory, structure and algorithms of various artificial neural networks are
discussed. We focus on Multilayer Perceptron (MLP), Functional Link ANN (FLANN) and
Cascaded Functional Link ANN (CFLANN). We discuss the learning rule in each of the
methods. Simulation results are carried out for comparisons of ANN technique with
conventional LMS method under different nonlinear condition and noise.
Chapter 4 gives an introduction to evolutionary computing technique and discusses in
details about genetic algorithm and its operators. It also discusses various selection schemes
for population and crossover. In this chapter Genetic Algorithm is used for simultaneous
pruning and weight updation for efficient nonlinear system identification.
In Chapter 5, the adaptive channel equalization is defined for and nonlinear channels.
Different kinds of communication channel and inter symbol interference is discussed. The
performance of conventional LMS algorithm based equalizer and other ANN structures such
as FLANN and CFLANN equalizer are compared.
Chapter 6 summarizes the work done in this thesis work and points to possible directions
for future work.
3
2. ADAPTIVE MODELING AND SYSTEM IDENTIFICATION
2.1. INTRODUCTION
Modeling and system identification is a very broad subject, of great importance in the
fields of control system, communications, and signal processing. Modeling is also important
outside the traditional engineering discipline such as social systems, economic systems, or
biological systems. An adaptive filter can be used in modeling that is, imitating the behavior
of physical systems which may be regarded as unknown “black boxes” having one or more
inputs and one or more outputs. The essential and principal property of an adaptive system is its time-varying, self-
adjusting performance. System identification [2.1, 2.2] is the experimental approach to
process modeling. System identification includes the following steps • Experiment design Its purpose is to obtain good experimental data and it includes
the choice of the measured variables and of the character of the input signals.
• Selection of model structure A suitable model structure is chosen using prior
knowledge and trial and error.
• Choice of the criterion to fit: A suitable cost function is chosen, which reflects how
well the model fits the experimental data.
• Parameter estimation An optimization problem is solved to obtain the numerical
values of the model parameters.
• Model validation: The model is tested in order to reveal any inadequacies.
The adaptive systems have following characteristics
1) They can automatically adapt (self-optimize) in the face of changing (non-
stationary) environments and changing system requirements.
2) They can be trained to perform specific filtering and decision making tasks.
3) They can extrapolate a model of behavior to deal with new situations after
trained on a finite and often small number of training signals and patterns.
4) They can repair themselves to a limited extent.
5) They can be described as nonlinear systems with time varying parameters.
The adaptation is of two types
(i) open-loop adaptation
The open-loop adaptive process is shown in Fig.2.1.(a). It involves making measurements
4
of input or environment characteristics, applying this information to a formula or to a
computational algorithm, and using the results to set the adjustments of the adaptive system.
The adaptation of process parameters don’t depend upon the output signal.
Processor
Input signal
Output signal
Other data
Adaptive algorithm
(a)
Processor
Input signal
Output signal
Other data
Adaptive algorithm
Performance calculation
(b)
Fig.2.1. Type of adaptations (a) Open-loop adaptation and (b) Closed-loop adaptation
(ii) closed-loop adaptation
Close-loop adaptation, as shown in Fig. 2.1.(b),on the other hand involves the automatic
experimentation with these adjustments and knowledge of their outcome in order to optimize
a measured system performance. The latter process may be called adaptation by
“performance feedback”. The adaptation of process parameters depends upon the input as
well as output signal.
2.2. ADAPTIVE FILTER
An adaptive filter [2.3, 2.4] is a computational device that attempts to model the
relationship between two signals in real time in an iterative manner. Adaptive filters are
often realized either as a set of program instructions running on an arithmetical
processing device such as a microprocessor or digital signal processing (DSP) chip, or
as a set of logic operations implemented in a field-programmable gate array (FPGA).
However, ignoring any errors introduced by numerical precision effects in these
implementations, the fundamental operation of an adaptive filter can be characterized
independently of the specific physical realization that it takes. For this reason, we
5
shall focus on the mathematical forms of adaptive filters as opposed to their specific
realizations in software or hardware. An adaptive filter is defined by four aspects:
1. The signals being processed by the filter.
2. The structure that defines how the output signal of the filter is computed from its
input signal
3. The parameters within this structure that can be iteratively changed to alter the
filter's input-output relationship
4. The adaptive algorithm that describes how the parameters are adjusted from one
time instant to the next.
By choosing a particular adaptive filter structure, one specifies the number and type
of parameters that can be adjusted. The adaptive algorithm used to update the
parameter values of the system can take on an infinite number of forms and is often
derived as a form of optimization procedure that minimizes an error.
Adaptive Filter
d(n) x(n) Σ
y (n)
e(n)
+
Fig.2.2. General Adaptive Filtering
Fig.2.2. shows a block diagram in which a sample from a digital input signal x(n) is
fed into a device, called an adaptive filter, that computes a corresponding output
signal sample y(n) at time n. For the moment, the structure of the adaptive filter is
not important, except for the fact that it contains adjustable parameters whose
values affect how y(n) is computed. The output signal is compared to a second signal
d(n), called the desired response signal, by subtracting the two samples at time n. This
difference signal, given by
( ) ( ) ( )e n d n y n= − (2.1)
is known as the error signal. The error signal is fed into a procedure which alters or
6
adapts the parameters of the filter from time n to time (n + 1) in a well-defined
manner. As the time index n is incremented, it is hoped that the output of the adaptive
filter becomes a better and better match to the desired response signal through this
adaptation process, such that the magnitude of decreases over time. In the ( )e n
adaptive filtering task, adaptation refers to the method by which the parameters of the
system are changed from time index n to time index (n +1). The number and types of
parameters within this system depend on the computational structure chosen for the
system. We now discuss different filter structures that have been proven useful for
adaptive filtering tasks.
2.3. FILTER STRUCTURES
In general, any system with a finite number of parameters that affect how y(n) is computed
from x(n) could be used for the adaptive filter in Fig. 2.2.. Define the parameter or
coefficient vector W(n)
0 1 -1( ) [ ( ) ( ) . . . ( )]TLW n w n w n w n= (2.2)
where {wi (n)}, 0 < i < L - 1 are the L parameters of the system at time n. With this definition,
we could define a general input-output relationship for the adaptive filter as
( ) ( ( ), ( - ), ( - 2), ..., ( - ), ( ), ( - ),..., ( - ))y n f W n y n l y n y n N x n x n l x n M l= + (2.3)
where f ( ) represents any well-defined linear or nonlinear function and M and N are positive
integers. Implicit in this definition is the fact that the filter is causal, such that future values of
( )x n are not needed to compute. While non-causal filters can be handled in practice by suitably
buffering or storing the input signal samples, we do not consider this possibility.
Although Equation (2.3) is the most general description of an adaptive filter structure, we
are interested in determining the best linear relationship between the input and desired
response signals for many problems. This relationship typically takes the form of a finite-
impulse-response (FIR) or infinite-impulse-response (IIR) filter. Figure2.3. shows the structure
of a direct-form FIR filter, also known as a tapped-delay-line or transversal filter, where z-1
denotes the unit delay element and each wi (n) is a multiplicative gain within the system. In
this case, the parameters in W(n) correspond to the impulse response values of the filter at
time n. We can write the output signal y(n) as
1
0
( ) ( ) ( )L
ii
y n w n x n i−
=
= ∑ − (2.4)
7
(2.5) ( ) ( )TW n X n=
where ( ) [ ( ) ( - 1) ( - )]TX n x n x n x n L l= + denotes the input signal vector and -T
denotes vector transpose. Note that this system requires L multiplies and L - 1 adds to
implement and these computations are easily performed by a processor or circuit so long as L
is not too large and the sampling period for the signals is not too short. It also requires a total
of 2L memory locations to store the L input signal samples and the L coefficient values,
respectively.
2.4. APPLICATION OF ADAPTIVE FILTERS.
Perhaps the most important driving forces behind the developments in adaptive filters
throughout their history have been the wide range of applications in which such systems can be
used. We now discuss the forms of these applications in terms of more-general problem classes
that describe the assumed relationship between d(n) and x(n). Our discussion illustrates the
key issues in selecting an adaptive filter for a particular task.
2.4.1. Direct Modeling (System Identification) In this type of modeling the adaptive model is kept parallel with the unknown plant.
Modeling a single-input, single-output system is illustrated in Fig.2.4..Both the unknown system
and adaptive filter are driven by the same input. The adaptive filter adjusts itself in such a way
that its output is match with that of the unknown system. Upon convergence, the structure and
parameter values of the adaptive system may or may not resemble those of unknown systems, but
the input-output response relationship will match. In this sense, the adaptive system becomes a
model of the unknown plant
wL-1(n)
z-1z-1x(n) z-1
. . . . . .
. . . . . .
. . . . . .
∑
∑
∑
w0(n) w1(n) w2(n)
y(n)
Fig. 2.3. Structure of an FIR Filter
x(n-2) x(n-1) x(n-L+1)
8
Unknown plant d(n)
x(n) Σ
y (n)
e(n) +
Adaptive model
Fig.2.4. Direct Modelling
Let d(n) and y(n) represent the output of the unknown system and adaptive model with
x(n) as its input.
Here, the task of the adaptive filter is to accurately represent the signal d(n) at its output. If
y(n) = d (n), then the adaptive filter has accurately modeled or identified the portion of the
unknown system that is driven by x(n).
Since the model typically chosen for the adaptive filter is a linear filter, the practical goal of
the adaptive filter is to determine the best linear model that describes the input-output
relationship of the unknown system. Such a procedure makes the most sense when the
unknown system is also a linear model of the same structure as the adaptive filter, as it is
possible that y(n) = d(n) for some set of adaptive filter parameters. For ease of discussion, let
the unknown system and the adaptive filter both be FIR filters, such that
( ) ( ) ( )TOPTd n W n X n= (2.6)
where WOPT(n) is an optimum set of filter coefficients for the unknown system at time n. In
this problem formulation, the ideal adaptation procedure would adjust W(n) such that W(n) =
WOPT (n) as n . In practice, the adaptive filter can only adjust W(n) such that y(n) closely ∞→
approximates d(n) over time.
The system identification task is at the heart of numerous adaptive filtering applications. We
list several of these applications here[2.3]
• Plant Identification
• Echo Cancellation for Long-Distance Transmission
• Acoustic Echo Cancellation
• Adaptive Noise Canceling
9
2.4.2. Inverse Modeling
We now consider the general problem of inverse modeling, as shown in Fig.2.5. In this
diagram, a source signals s(n) is fed into a plant that produces the input signal x(n) for the
adaptive filter. The output of the adaptive filter is subtracted from a desired response signal that
is a delayed version of the source signal, such that
( ) ( - )d n s n= Δ (2.7)
where ∆ is a positive integer value. The goal of the adaptive filter is to adjust its
characteristics such that the output signal is an accurate representation of the delayed source
signal.
plant
d(n) s(n)
Σ y (n) + Adaptive filter
(inverse model) Σ
delay
x(n) ++
Plant noise
Fig.2.5. Inverse Modelling
e(n)
2.5. GRADIENT BASED ADAPTIVE ALGORITHM
An adaptive algorithm is a procedure for adjusting the parameters of an adaptive
filter to minimize a cost function chosen for the task at hand. In this section, we
describe the general form of many adaptive FIR filtering algorithms and present a
simple derivation of the LMS adaptive algorithm. In our discussion, we only consider
an adaptive FIR filter structure, such that the output signal y(n) is given by (2.5). Such
systems are currently more popular than adaptive IIR filters because
(1) The input-output stability of the FIR filter structure is guaranteed for any set
of fixed coefficients, and
(2) The algorithms for adjusting the coefficients of FIR filters are simpler in general
than those for adjusting the coefficients of IIR filters.
10
2.5.1. General Form of Adaptive FIR Algorithm
The general form of an adaptive FIR filtering algorithm is
))(),(),(()()()1( nnXneGnnWnW φμ+=+ (2.8)
where G(-) is a particular vector-valued nonlinear function, μ(n) is a step size
parameter, e(n) and X(n) are the error signal and input signal vector, respectively, and
( )nφ is a vector of states that store pertinent information about the characteristics of
the input and error signals and/or the coefficients at previous time instants. In the
simplest algorithms, ( )nφ is not used, and the only information needed to adjust the
coefficients at time n are the error signal, input signal vector, and step size.
The step size is so called because it determines the magnitude of the change or
"step" that is taken by the algorithm in iteratively determining a useful coefficient
vector. Much research effort has been spent characterizing the role that ( )nμ plays in
the performance of adaptive filters in terms of the statistical or frequency
characteristics of the input and desired response signals. Often, success or failure of
an adaptive filtering application depends on how the value of μ(n) is chosen or
calculated to obtain the best performance from the adaptive filter.
2.5.2. The Mean-Squared Error Cost Function
The form of G(-) in (2.8) depends on the cost function chosen for the given
adaptive filtering task. We now consider one particular cost function that yields a
popular adaptive algorithm. Define the mean-squared error (MSE) cost function as
∫∞
∞−
= )())(()(21)( 2 ndenepnenJ nMSE (2.9)
)}({21 2 neE= (2.10)
where pn(e(n)) represents the probability density function of the error at time n and
E{-} is shorthand for the expectation integral on the right-hand side of (2.10). The MSE
cost function is useful for adaptive FIR filters because
• JMSE (n) has a well-defined minimum with respect to the parameters in W(n);
11
• the coefficient values obtained at this minimum are the ones that minimize the power in
the error signal e(n), indicating that y(n) has approached d{n); and
• JMSE is a smooth function of each of the parameters in W(n), such that it is
differentiable with respect to each of the parameters in W(n).
The third point is important in that it enables us to determine both the optimum
coefficient values given knowledge of the statistics of d(n) and x(n) as well as a simple
iterative procedure for adjusting the parameters of an FIR filter.
2.5.3. The Wiener Solution.
For the FIR filter structure, the coefficient values in W(n) that minimize JMSE (n) are
well-defined if the statistics of the input and desired response signals are known. The
formulation of this problem for continuous-time signals and the resulting solution was
first derived by Wiener [2.5]. Hence, this optimum coefficient vector WMSE (n) is often
called the Wiener solution to the adaptive filtering problem. The extension of
Wiener's analysis to the discrete-time case is attributed to Levinson [2.6]. To
determine WMSE (n) we note that the function JMSE(n) in (2.10) is quadratic in the
parameters {wi(n)}, and the function is also differentiable. Thus, we can use a result
from optimization theory that states that the derivatives of a smooth cost function with
respect to each of the parameters is zero at a minimizing point on the cost function
error surface. Thus, WMSE (n) can be found from the solution to the system of
equations
0)()(=
∂∂
nwnJ
i
MSE , (2.11) 10 −≤≤ Li
Taking derivatives of JMSE (n) in (2.10) we obtain
})()()({
)()(
nwneneE
nwnJ
ii
MSE
∂∂
=∂∂ (2.12)
})()()({
nwnyneE
i∂∂
−= (2.13)
(2.14) )}()({ inxneE −−=
12
-1
0 - ( { ( ) ( - )} - { ( - ) ( - )} ( ))
L
jj
E d n x n i E x n i x n j w n=
= ∑ (2.15)
where we have used the definitions of e(n) and of y(n) for the FIR filter structure in
(2.1) and (2.5), respectively, to expand the last result in (2.15). By defining the matrix
RXX(n)(autocorrelation matrix) and vector Pdx(n)(cross correlation matrix) as
{ ( ) ( )}
( ) { ( ). ( )}
TXX
dx
R E X n X nandP n E d n X n
=
= (2.16)
respectively, we can combine (2.11) and (2.15) to obtain the system of equations in
vector form as
(2.17) 0)()()( =− nPnWnR dxMSEXX
where 0 is the zero vector. Thus, so long as the matrix RXX(n) is invertible, the
optimum Wiener solution vector for this problem is
)()()( 1 nPnRnW dxXXMSE−= (2.18)
2.5.4. The Method of Steepest Descent
The method of steepest descent is a celebrated optimization procedure for
minimizing the value of a cost function J(n) with respect to a set of adjustable
parameters W(n). This procedure adjusts each parameter of the system according to
)()()()()1(
nwnJnnwnw
iii ∂
∂−=+ μ (2.19)
In other words, the ith parameter of the system is altered according to the
derivative of the cost function with respect to the ith parameter. Collecting these
equations in vector form, we have
)()()()()1(
nWnJnnWnW
∂∂
−=+ μ (2.20)
where ∂J(n)/∂W(n) is a vector of derivatives dJ(n)/dwi(n).
Substituting these results into (2.19) yields the update equation for W(n) as
))()()()(()()1( nWnRnPnnWnW XXdx −+=+ μ (2.21)
13
However, this steepest descent procedure depends on the statistical quantities
E{d(n)x(n-i)} and E{x(n-i)x(n-j)} contained in Pdx(n) and Rxx(n), respectively. In
practice, we only have measurements of both d(n) and x(n) to be used within the
adaptation procedure. While suitable estimates of the statistical quantities needed for
(2.21) could be determined from the signals x(n) and d{n), we instead develop an
approximate version of the method of steepest descent that depends on the signal
values themselves. This procedure is known as the LMS(least mean square)
algorithm.
2.6. LMS ALGORITHM
The cost function J(n) chosen for the steepest descent algorithm of (2.19) determines
the coefficient solution obtained by the adaptive filter. If the MSE cost function in
(2.10) is chosen, the resulting algorithm depends on the statistics of x(n) and d(n)
because of the expectation operation that defines this cost function. Since we
typically only have measurements of d(n) and of x(n) available to us, we substitute
an alternative cost function that depends only on these measurements. One such cost
function is the least-squares cost function given by
(2.22) ∑=
−=n
k
TLS kXnWkdknJ
0
2))()()()(()( α
where α(n) is a suitable weighting sequence for the terms within the summation.
This cost function, however, is complicated by the fact that it requires numerous
computations to calculate its value as well as its derivatives with respect to each
W(n), although efficient recursive methods for its minimization can be developed.
Alternatively, we can propose the simplified cost function JLMS(n)given by
)(21)( 2 nenJLMS = (2.23)
This cost function can be thought of as an instantaneous estimate of the MSE cost
function, as JMSE(n) = E{JLMS(n)}. Although it might not appear to be useful, the
resulting algorithm obtained when JLMS(n) is used for J(n) in (2.19) is extremely
useful for practical applications. Taking derivatives of JLMS(n) with respect to the
elements of W(n) and substituting the result into (2.19), we obtain the LMS adaptive
algorithm given by
14
)()()()()1( nXnennWnW μ+=+ (2.24)
Equation (2.24) requires only multiplications and additions to implement. In fact,
the number and type of operations needed for the LMS algorithm is nearly the same
as that of the FIR filter structure with fixed coefficient values, which is one of the
reasons for the algorithm's popularity.
The behavior of the LMS algorithm has been widely studied, and numerous results
concerning its adaptation characteristics under different situations have been
developed. For now, we indicate its useful behavior by noting that the solution
obtained by the LMS algorithm near its convergent point is related to the Wiener
solution. In fact, analysis of the LMS algorithm under certain statistical assumptions
about the input and desired response signals show that
{ ( )} ( )lim MSEn
E W n W n→∞
= (2.25)
when the Wiener solution WMSE (n) is a fixed vector. Moreover, the average behavior
of the LMS algorithm is quite similar to that of the steepest descent algorithm in (2.21)
that depends explicitly on the statistics of the input and desired response signals. In
effect, the iterative nature of the LMS coefficient updates is a form of time-averaging
that smoothes the errors in the instantaneous gradient calculations to obtain a more
reasonable estimate of the true gradient.
The problem is that gradient descent is a local optimization technique, which is limited
because it is unable to converge to the global optimum on a multimodal error surface if the
algorithm is not initialized in the basin of attraction of the global optimum.
Several modifications exist for gradient based algorithms in attempt to enable them to
overcome local optima. One approach is to simply add a momentum term [2.3] to the
gradient computation of the gradient descent algorithm to enable it to be more likely to
escape from a local minimum. This approach is only likely to be successful when the
error surface is relatively smooth with minor local minima, or some information can be
inferred about the topology of the surface such that the additional gradient parameters can
be assigned accordingly. Other approaches attempt to transform the error surface to
eliminate or diminish the presence of local minima [2.16], which would ideally result in a
unimodal error surface. The problem with these approaches is that the resulting minimum
transformed error used to update the adaptive filter can be biased from the true minimum
output error and the algorithm may not be able to converge to the desired minimum error
15
condition. These algorithms also tend to be complex, slow to converge, and may not be
guaranteed to emerge from a local minimum. Some work has been done with regard to
removing the bias of equation error LMS [2.7][2.8] and Steiglitz-McBride [2.9] adaptive IIR
filters, which add further complexity with varying degrees of success.
Another approach [2.10], attempts to locate the global optimum by running several LMS
algorithms in parallel, initialized with different initial coefficients. The notion is that a
larger, concurrent sampling of the error surface will increase the likelihood that one process
will be initialized in the global optimum valley. This technique does have potential, but it
is inefficient and may still suffer the fate of a standard gradient technique in that it will be
unable to locate the global optimum. By using a similar congregational scheme, but one in
which information is collectively exchanged between estimates and intelligent randomization
is introduced, structured stochastic algorithms are able to hill-climb out of local minima.
This enables the algorithms to achieve better, more consistent results using a fewer
number of total estimate.
2.7. SYSTEM IDENTIFICATION
System identification concerns with the determination of a system, on the basis of input
output data samples. The identification task is to determine a suitable estimate of finite
dimensional parameters which completely characterize the plant. The selection of the
estimate is based on comparison between the actual output sample and a predicted value on
the basis of input data up to that instant. An adaptive automaton is a system whose structure
is alterable or adjustable in such a way that its behavior or performance improves through
contact with its environment.
Depending upon input-output relation, the identification of systems can have two groups
A. Static System Identification
In this type of identification the output at any instant depends upon the input at that instant.
These systems are described by the algebraic equations. The system is essentially a
memoryless one and mathematically it is represented as y(n) = f [x(n)] where y(n) is the
output at the nth instant corresponding to the input x(n).
B. Dynamic System Identification
In this type of identification the output at any instant depends upon the input at that instant
as well as the past inputs and outputs. Dynamic systems are described by the difference or
differential equations. These systems have memory to store past values and mathematically
16
represented as y(n)=f [x(n), x(n-1),x(n-2)………..y(n-1),y(n-2),……] where y(n) is the output at
the nth instant corresponding to the input x(n).
A system identification structure is shown in Fig.2.6. The model is placed parallel to the
nonlinear plant and same input is given to the plant as well as the model. The impulse
response of the linear segment of the plant is represented by h(n) which is followed by
nonlinearity(NL) associated with it. White Gaussian noise q(n) is added with nonlinear output
accounts for measurement noise. The desired output d(n) is compared with the estimated
output y(n) of the identifier to generate the error e(n) which is used by some adaptive
algorithm for updating the weights of the model. The training of the filter weights is
continued until the error becomes minimum and does not decrease further. At this stage the
correlation between input signal and error signal is minimum. Then the training is stopped
and the weights are stored for testing. For testing purpose new samples are passed through
both the plant and the model and their responses are compared.
2.8. SIMULATION RESULTS
The performance of LMS algorithm is tested for both linear and nonlinear systems. For
identification purpose a tap delay filter with three taps is used. The parameter of the linear
∑
Plant h(n)
Update algorithm
Model
+ N.L.
Noise
+
_ y(n)
d(n)
x(n)
e(n)
a(n) b(n)
q(n) Nonlinear plant
Fig.2.6. Block diagram of system identification
17
part of the plant is h(n)= [0.26 0.93 0.26].The different type of nonlinearity considered here
are
(I) ( ) tanh( ( ))b n a n=
(II) 2( ) ( ) 0.2 ( ) 0.1 ( )b n a n a n a n= + − 3
(III) 3( ) ( ) 0.9 ( )b n a n a n= −
(IV) (2.26) 2 3( ) ( ) 0.2 ( ) 0.1 ( ) 0.5cos( ( ))b n a n a n a n a nπ= + − +
For the simulation the initial parameters of the model taken as zeros. Gaussian noise of
signal to noise ratio (SNR) 30dB was added which accounts for measurement noise. The
input to the plant was taken from a uniformly distributed random signal over the interval
[-0.5, 0.5] .The adaptation is continued for 2000 iterations which is ensembled over 50
iterations. After training filter weights remain fixed. For testing new 20 samples are
generated and pass through the plant as well as model. The mean square error (MSE) and
responses are plotted for the linear and nonlinear systems.
(i)For linear system:
0 500 1000 1500 2000-35
-30
-25
-20
-15
-10
-5
0
no. of iterations
MS
E in
dB
MSE plot
0 5 10 15 20-0.5
0
0.5
no. of samples
repo
nses
response plot
actuallms
(a) (b)
Fig. 2.7. (a) MSE plot ,(b)response plot
18
(ii)For nonlinearity (I)
0 500 1000 1500 2000-35
-30
-25
-20
-15
-10
-5
0
no. of iterations
MS
E in
dB
MSE plot
0 5 10 15 20-0.5
0
0.5
no. of samples
repo
nses
response plot
actuallms
(a) (b)
Fig. 2.8. (a) MSE plot ,(b)response plot
(iii)For nonlinearity (II)
0 500 1000 1500 2000-30
-25
-20
-15
-10
-5
0
no. of iterations
MS
E in
dB
MSE plot
0 5 10 15 20-0.5
0
0.5
no. of samples
repo
nses
response plot
actuallms
(a) (b)
Fig. 2.9. (a) MSE plot ,(b)response plot
19
(iv)For nonlinearity (III)
0 500 1000 1500 2000-25
-20
-15
-10
-5
0
no. of iterations
MS
E in
dB
MSE plot
0 5 10 15 20-0.4
-0.3
-0.2
-0.1
0
0.1
0.2
0.3
0.4
no. of samples
repo
nses
response plot
actuallms
(a) (b)
Fig. 2.10. (a) MSE plot ,(b)response plot
(v)For nonlinearity (IV)
0 500 1000 1500 2000-4
-3
-2
-1
0
no. of iterations
MS
E in
dB
MSE plot
0 5 10 15 20-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
no. of samples
repo
nses
response plot
actuallms
(a) (b)
Fig. 2.11. (a) MSE plot ,(b)response plot
20
2.9. SUMMARY
Application of adaptive filter and two types of modeling is described in this chapter.
System identification deals with direct modeling. The LMS algorithm is used for system
identification purpose because of its simplicity. From Fig (2.7) to (2.11) it is observed that for
linear system LMS algorithm based model gives best result. As the nonlinearity associated
with the system goes on increasing the LMS based model response deviates from the actual
response. Taking different types of nonlinearity the MSE and responses are plotted. From
Fig.2.11 it is seen that the actual response and the LMS based model response do not match
anywhere. From this we conclude that LMS based models are best for linear systems.
21
3. SYSTEM IDENTIFICATION USING ANN
3.1. INTRODUCTION
Because of nonlinear signal processing and learning capability, Artificial Neural Networks
(ANN’s) have become a powerful tool for many complex applications including functional
approximation, nonlinear system identification and control, pattern recognition and
classification, and optimization. The ANN’s are capable of generating complex mapping
between the input and the output space and thus, arbitrarily complex nonlinear decision
boundaries can be formed by these networks. An artificial neuron basically consists of a
computing element that performs the weighted sum of the input signal and the connecting
weight. The sum is added with the bias or threshold and the resultant signal is then passed
through a non-linear element of tanh(.) type. Each neuron is associated with three parameters
whose learning can be adjusted; these are the connecting weights, the bias and the slope of
the non-linear function. For the structural point of view a neural network(NN) may be single
layer or it may be multi-layer. In multi-layer structure, there is one or many artificial neurons
in each layer and for a practical case there may be a number of layers. Each neuron of the one
layer is connected to each and every neuron of the next layer.
A neural network is a massively parallel distributed processor made up of simple
processing unit, which has a natural propensity for storing experimental knowledge and
making it available for use. It resembles the brain in two types
1. Knowledge is acquired by the network from its environment through a learning
process.
2. Interneuron connection strengths, known as synaptic weights, are used to store the
acquired knowledge.
Artificial Neural Networks (ANN) has emerged as a powerful learning technique to
perform complex tasks in highly nonlinear dynamic environments. Some of the prime
advantages of using ANN models are their ability to learn based on optimization of an
appropriate error function and their excellent performance for approximation of nonlinear
function [3.1]. At present, most of the work on system identification using neural networks
are based on multilayer feed forward neural networks with back propagation learning or more
efficient variations of this algorithm [3.2] ,[3.3].On the otherhand the Functional link
ANN(FLANN) originally proposed by Pao[3.4] is a single layer structure with functionally
mapped inputs. The performance of FLANN for system identification of nonlinear systems
22
has been reported [3.5] in the literature. Patra and Kot [3.6] have used Chebyschev
expansions for nonlinear system identification and have shown that the identification
performance is better than that offered by the multilayer ANN (MLANN) model. Wang and
Chen [3.7] have presented a fully automated recurrent neural network (FARNN) that is
capable of self-structuring its network in a minimal representation with satisfactory
performance for unknown dynamic system identification and control.
3.2. SINGLE NEURON STRUCTURE
In 1958, Rosenblatt demonstrated some practical applications using the perceptron [3.8].
The perceptron is a single level connection of McCulloch-Pitts neurons sometimes called
single-layer feed forward networks. The network is capable of linearly separating the input
vectors into pattern of classes by a hyper plane. A linear associative memory is an example of
a single-layer neural network. In such an application, the network associates an output pattern
(vector) with an input pattern (vector), and information is stored in the network by virtue of
modifications made to the synaptic weights of the network.
The structure of a single neuron is presented in Fig. 3.1.An artificial neuron involves the
computation of the weighted sum of inputs and threshold [3.9, 3.10]. The resultant signal is
then passed through a non-linear activation function. The output of the neuron may be
represented as,
(3.1) ( ) ( ) ( )1
( )N
j jj
y n f w n x n b n=
⎡ ⎤= ⎢
⎣ ⎦∑
∑ f(.)
• • •
x1
x2
xN
b(n)
y(n)
Fig. 3.1. A single Neuron
w2
w1
wN
+ ⎥
Where b(n) = threshold to the neuron is called as bias.
wj(n) = weight associated with the jth input, and N = no. of inputs to the neuron.
23
3.2.1. Activation Functions and Bias.
The perceptron internal sum of the inputs is passed through an activation function, which
can be any monotonic function. Linear functions can be used but these will not contribute to a
non-linear transformation within a layered structure, which defeats the purpose of using a
neural filter implementation. A function that limits the amplitude range and limits the output
strength of each perceptron of a layered network to a defined range in a non-linear manner
will contribute to a nonlinear transformation. There are many forms of activation functions,
which are selected according to the specific problem. All the neural network architectures
employ the activation function [3.1, 3.8] which defines as the output of a neuron in terms of
the activity level at its input (ranges from -1 to 1 or 0 to 1). Table 3.1 summarizes the basic
types of activation functions. The most practical activation functions are the sigmoid and the
hyperbolic tangent functions. This is because they are differentiable.
The bias gives the network an extra variable and the networks with bias are more powerful
than those of without bias. The neuron without a bias always gives a net input of zero to the
activation function when the network inputs are zero. This may not be desirable and can be
avoided by the use of a bias.
Table 3.1 COMMON ACTIVATION FUNCTIONS
Name Definition
Linear ( )f x kx=
Step ( ) ,
,f x if x
if x kkβ
δ= ≥= <
Sigmoid 1( ) , 01 xf x
e α α−= >+
Hyperbolic Tangent 1( ) tanh( ) , 01
x
x
ef x xe
γ
γγ γ−
−
−= = >
+
Gaussian 2
22
1 (( ) exp22
xf x μσπσ
)⎡ ⎤−= −⎢ ⎥
⎣ ⎦
3.2.2 Learning Processes
The property that is of primary significance for a neural network is that the ability of the
network to learn from its environment, and to improve its performance through learning. The
improvement in performance takes place over time in accordance with some prescribed
24
measure. A neural network learns about its environment through an interactive process of
adjustments applied to its synaptic weights and bias levels. Ideally, the network becomes
more knowledgeable about its environment after each iteration of learning process. Hence we
define learning as:
“It is a process by which the free parameters of a neural network are adapted through a
process of stimulation by the environment in which the network is embedded.”
The processes used are classified into two categories as described in [3.1]:
(A) Supervised Learning (Learning With a Teacher)
(B) Unsupervised Learning (Learning Without a Teacher)
(A) Supervised Learning:
We may think of the teacher as having knowledge of the environment, with that knowledge
being represented by a set of input-output examples. The environment is, however unknown
to neural network of interest. Suppose now the teacher and the neural network are both
exposed to a training vector, by virtue of built-in knowledge, the teacher is able to provide the
neural network with a desired response for that training vector. Hence the desired response
represents the optimum action to be performed by the neural network. The network
parameters such as the weights and the thresholds are chosen arbitrarily and are updated
during the training procedure to minimize the difference between the desired and the
estimated signal. This updation is carried out iteratively in a step-by-step procedure with the
aim of eventually making the neural network emulate the teacher. In this way knowledge of
the environment available to the teacher is transferred to the neural network. When this
condition is reached, we may then dispense with the teacher and let the neural network deal
with the environment completely by itself. This is the form of supervised learning.
The update equations for weights are derived as LMS [3.11]:
( )1 ( ) (j jw n w n w nμ+ = + Δ )j (3.2)
( )jw nΔ is the change in wj in nth iteration.
(B) Unsupervised Learning:
In unsupervised learning or self-supervised learning there is no teacher to over-see the
learning process, rather provision is made for a task independent measure of the quantity of
representation that the network is required to learn, and the free parameters of the network are
optimized with respect to that measure. Once the network has become turned to the statistical
regularities of the input data, it develops the ability to form the internal representations for
25
encoding features of the input and thereby to create new classes automatically. In this
learning the weights and biases are updated in response to network input only. There are no
desired outputs available. Most of these algorithms perform some kind of clustering
operation. They learn to categorize the input patterns into some classes.
3.3. MULTILAYER PERCEPTRON
In the multilayer neural network or multilayer perceptron (MLP), the input signal
propagates through the network in a forward direction, on a layer-by-layer basis. This
network has been applied successfully to solve some difficult and diverse problems by
training in a supervised manner with a highly popular algorithm known as the error back-
propagation algorithm [3.1,3.9]. The scheme of MLP using four layers is shown in Fig.3.2.
( )ix n represent the input to the network, jf and kf represent the output of the two hidden
layers and ( )ly n represents the output of the final layer of the neural network. The
connecting weights between the input to the first hidden layer, first to second hidden layer
and the second hidden layer to the output layers are represented by
respectively.
, and ij jk klw w w
Fig. 3.2 Structure of multilayer perceptron
•••
••• ••
•
First Hidden layer
(Layer-2)
Input layer
(Layer-1)
Output layer
(Layer-4)
Input Signal,
( )ix n
Output Signal
( )ly n
Second Hidden
layer (Layer-3)
+1 +1ijw +1jkw
klw
If P1 is the number of neurons in the first hidden layer, each element of the output vector of
first hidden layer may be calculated as,
( )1
j j ij i ji
N
f w x n bϕ=
⎡ ⎤= +⎢ ⎥
⎣ ⎦∑ 11, 2,3,... , 1, 2,3,...i N j= = P (3.3)
26
where jb is the threshold to the neurons of the first hidden layer, N is the no. of inputs and
( ).ϕ is the nonlinear activation function in the first hidden layer chosen from the Table 3.1.
The time index n has been dropped to make the equations simpler. Let P2 be the number of
neurons in the second hidden layer. The output of this layer is represented as, kf and may be
written as
1
1k k jk j k
j
P
f w f bϕ=
⎡ ⎤= ⎢
⎣ ⎦∑ + ⎥
P
l+ ⎥
, k=1, 2, 3, …, P2 (3.4)
where, is the threshold to the neurons of the second hidden layer. The output of the final
output layer can be calculated as
kb
( )2
1l l kl k
ky n w f bϕ
=
⎡ ⎤= ⎢
⎣ ⎦∑ , l=1, 2, 3, … , P3 (3.5)
where, lα is the threshold to the neuron of the final layer and P3 is the no. of neurons in the
output layer. The output of the MLP may be expressed as
( ) ( )2 1
1 1 1
N
l n kl k jk j ij i j kk j i
y n w w w x n b b bϕ ϕ ϕ= = =
⎡ ⎤⎛ ⎞⎧ ⎫= ⎢ ⎥⎨ ⎬⎜
⎩ ⎭⎢ ⎥⎝ ⎠⎣ ⎦∑ ∑ ∑P P
l+ + +⎟ (3.6)
3.3.1. Backpropagation Algorithm.
Fig. 3.3 Neural network using BP algorithm
ΣBack-Propagation
Algorithm
x1
x2
( )ly n
( )d n ( )le n
-+
An MLP network with 2-3-2-1 neurons (2, 3, 2 and 1 denote the number of neurons in the
input layer, the first hidden layer, the second hidden layer and the output layer respectively)
with the back-propagation (BP) learning algorithm, is depicted in Fig.3.3. The parameters of
27
the neural network can be updated in both sequential and batch mode of operation. In BP
algorithm, initially the weights and the thresholds are initialized as very small random values.
The intermediate and the final outputs of the MLP are calculated by using (3.3), (3.4.), and
(3.5.) respectively.
( The final output )ly n at the output of neuron l , is compared with the desired
output and the resulting error signal ( )d n ( )
l
le n is obtained as
( ) ( ) ( )le n d n y n= − (3.7)
The instantaneous value of the total error energy is obtained by summing all error signals
over all neurons in the output layer, that is
( ) ( )3
2
1
12 l
l
n eξ=
= ∑P
n (3.8)
where P3 is the no. of neurons in the output layer.
This error signal is used to update the weights and thresholds of the hidden layers as well as
the output layer. The reflected error components at each of the hidden layers is computed
using the errors of the last layer and the connecting weights between the hidden and the last
layer and error obtained at this stage is used to update the weights between the input and the
hidden layer. The thresholds are also updated in a similar manner as that of the corresponding
connecting weights. The weights and the thresholds are updated in an iterative method until
the error signal becomes minimum. For measuring the degree of matching, the Mean Square
Error (MSE) is taken as a performance measurement.
The updated weights are,
( ) ( ) ( )1+ = +Δkl kl klw n w n w n (3.9)
( ) ( ) ( )1+ = + Δjk jk jkw n w n w n (3.10)
( ) ( ) ( )1+ = + Δij ij ijw n w n w n (3.11)
where, ( ) ( ) ( ), and Δ Δ Δkl jk ijw n w n w n are the change in weights of the second hidden
layer-to-output layer, first hidden layer-to-second hidden layer and input layer-to-first hidden
layer respectively. That is,
28
( ) ( )
( ) ( ) ( )( )
( )2
1
2 lkl
kl kl
P
l kl k lk
d n dy nw n e n
dw n dw n
e n w f f
ξμ μ
μ ϕ α=
Δ = − =
⎡ ⎤′= +⎢ ⎥⎣ ⎦∑ k
(3.12)
Where, μ is the convergence coefficient ( 0 1μ≤ ≤ ). Similarly the can
be computed [3.1].
( ) ( )
( ) ( ) ( )l
( ) ( ) ( )k
j
( ) ( ) ( )
( )
and Δ Δjk ijw n w n
The thresholds of each layer can be updated in a similar manner, i.e.
(3.13) 1l lb n b n b n+ = + Δ
(3.14) 1k kb n b n b n+ = + Δ
(3.15) ( ) ( ) ( )1j jb n b n b n+ = + Δ
where, are the change in thresholds of the output, hidden and
input layer respectively. The change in threshold is represented as,
, and l k jb n b n b nΔ Δ Δ
( ) ( ) ( ) ( )( )
( )2
1
2 ll
l
P
l kl k lk
d n dy nb n e n
db n db n
e n w f b
ξμ μ
μ ϕ=
Δ = − =
⎡ ⎤′= +⎢ ⎥⎣ ⎦∑
l (3.16)
3.4. FUNCTIONAL LINK ANN
Pao originally proposed FLANN and it is a novel single layer ANN structure capable of
forming arbitrarily complex decision regions by generating nonlinear decision boundaries
[3.4]. Here, the initial representation of a pattern is enhanced by using nonlinear function and
thus the pattern dimension space is increased. The functional link acts on an element of a
pattern or entire pattern itself by generating a set of linearly independent function and then
evaluates these functions with the pattern as the argument. Hence separation of the patterns
becomes possible in the enhanced space. The use of FLANN not only increases the learning
rate but also has less computational complexity [3.13]. Pao et al [3.12] have investigated the
learning and generalization characteristics of a random vector FLANN and compared with
those attainable with MLP structure trained with back propagation algorithm by taking few
functional approximation problems. A FLANN structure with two inputs is shown in Fig. 3.4.
29
3.4.1. Learning Algorithm.
Let X is the input vector of size N×1 which represents N number of elements; the kth
element is given by:
(3.17)
Each element undergoes nonlinear expansion to form M elements such that the resultant
matrix has the dimension of N×M.
( ) ,1kk x k N= ≤ ≤X
The functional expansion of the element kx by power series expansion is carried out using
the equation given in (3.18)
ki l
k
xs
x⎧
= ⎨⎩
for 1 for 2,3, 4, ,
ii M== …
(3.18)
where . 1, 2, ,l M=
For trigonometric expansion, the
(3.19) ( )( )
sin
cos
k
i k
k
x
s l xfor 1 for 2,4, , for 3,5, , +1
ii Mi M
===
……l x
π
π
⎧⎪⎪= ⎨⎪⎪⎩
Where 1,2, , 2l = M . In matrix notation the expanded elements of the input vector E, is
denoted by S of size N×(M+1).
The bias input is unity. So an extra unity value is padded with the S matrix and the
dimension of the S matrix becomes N×Q, where ( )2Q M= + .
Fig.3.4 Structure of the FLANN model
Func
tiona
l Exp
ansi
on
.
.
.
∑ ∑
Adaptive Algorithm
S W
1
y(n)
d(n)
e(n)
x2
x1 +_
30
Let the weight vector is represented as W having Q elements. The output is given as y
(3.20) 1
Q
i ii
y s=
= ∑ w
In matrix notation the output can be,
(3. 21) T= ⋅Y S W
At nth iteration the error signal ( )e n can be computed as
(3.22) ( ) ( ) ( )e n d n y n= −
Let denotes the cost function at iteration k and is given by ( )nξ
( ) ( )2
1
12
P
jj
n eξ=
= ∑ n (3.23)
where P is the number of nodes at the output layer.The weight vector can be updated by least
mean square (LMS) algorithm, as
ˆ( 1) ( ) (2
w k w k k)μ+ = − ∇ (3.24)
where is an instantaneous estimate of the gradient of ˆ ( )n∇ ( )nξ with respect to the weight
vector Now ( )w n
( ) [ ( ) ( )]ˆ ( ) 2 ( ) 2 ( )
2 ( ) ( )
y n w n s nn e n e nw w we n s n
∇ = = − = −∂ ∂ ∂
= −
ξ∂ ∂ ∂
ˆ
( ) ( )
(3.25)
Substituting the values of in (3.24) we get ( )n∇
( ) ( )1w n w n e n s nμ+ = + (3.26)
where μ denotes the step-size , which controls the convergence speed of the LMS
algorithm.
(0 μ≤ ≤ )1
The functions used for Functional Expansion is linearly independent and this may be
achieved by the use of suitable orthogonal polynomials for functional expansion. The
examples of which include Legendre, Chebyshev and trigonometric polynomials. Some of
the advantages of using trigonometric polynomials for use in the functional expansion are
explained below. Of all the polynomials of Nth order with respect to an orthonormal system
31
{ } 1( ) N
i iuφ
= the best approximation in the metric space is given by the Nth partial sum of its
Fourier series with respect this system Thus, the trigonometric polynomial basis functions
given by {
2L
}1,cos( ),sin( ),cos(2 ),sin(2 ),...cos( ),sin( )u u u u N u N uπ π π π π π provides a compact
representation of the function in the mean square sense. However, when the outer product
terms were used along with the trigonometric polynomials for functional expansion, better
results were obtained in the case of learning of a two-variable function .
3.5. CASCADED FUNCTIONAL LINK ANN (CFLANN)
For the identification of highly nonlinear systems the number of branches in the FLANN
increases. Even some cases give poor performance. To decrease the number of branches and
increase the performance a two-stage FLANN is proposed. Here the output of the first stage
again undergoes functional expansion. The weights of cascaded FLANN are updated by
using BP algorithm.
3.5.1. Learning Algorithm.
A two stage CFLANN structure is shown in Fig.(3.5). x(n-1) and x(n-2) are the one unit
time delay and two unit time delay of input signal x(n).Here each term is expanded into three
terms in the first stage.y2(n) which is the output of first stage is again expanded into three
terms. The number of expansion depends upon the nonlinearity associated with the system.
For highly nonlinear system the number of expansion is more. For mathematical simplicity
here we have considered that each term is expanded into three terms. This can be extended
into any number of terms.
In the Fig.3.5. are the weights of the first stage. )(...).........(),(,)( 9221 nwnwnwnw
)(),(,)( 321 nhnhnh are the weights of the second stage.
))]2(cos())2(sin()2())1(cos())1(sin()1())(cos())(sin()([))((
−−−−−−=
nxnxnxnxnxnxnxnxnxnx
ππππππφ
(3.27)
(3.28) )](......).........()()([)( 9221 nwnwnwnwnW =
(3.29) ))](cos())(sin()([))(( 2222 nynynyny ππψ =
)]()()([)( 321 nhnhnhnH = (3.30)
Here f1(.) and f2(.) are taken as tanh(.).
The reason why we choose the hyperbolic tangent function in the output is twofold. First,
the function has to be differentiable when using a BP algorithm to train the network. This
32
f 2( )
d(n)
∑
y 4(n
)
w1
x(n)
cos(πx
(n))
(n)
w3(
n)
w2(
n)
sin(πx
(n))
z-1
z-1
∑
+1
x(n)
x(n-
2)
sin(πx
(n-2
))
cos(πx
(n-2
)) w
7(n)
w9(
n)
w8(
n)
x(n-
2)
x(n-
1)
sin(πx
(n-1
))
cos(πx
(n-1
)) w
4(n)
w6(
n)
w5(
n)
x(n-
1)
x(n)
∑
y 1(n
) f 1(
) y 2
(n)
y 2(n
)
cos(πy
2(n)
)
sin(πy
2(n)
)
h 3(n
)
h 2(n
)
h 1(n
) b 2
(n)
+1
y 3(n
)
Bac
kpra
gatio
n A
lgor
ithm
b 1(n
)
e(n)
+-
Firs
t Sta
ge
Seco
nd S
tage
Fig.
3.5
. Stru
ctur
e of
CFL
AN
N
33
ensures the possibility of calculating the gradients of the error functions (also called the
Performance function). And secondly, one wants to choose a nonlinear function that is close
to a binary-valued one to increase the speed of convergence. It turns out that tanh( .) is good
choice.
Weight updation in the second stage
))(tanh()( 34 nyny = (3.31)
))](cos()())(sin()()()([)( 2322213 nynhnynhnynhny ππ ++= (3.32)
error are nth iteration (3.33) )()()( 4 nyndne −=
where is the desired response. ( )d n
Cost function )(21)( 2 nen =ξ (3.34)
We will drop the time index (n) for simplicity
The change in weight is proportional to 1h1h∂
∂ξ
1
11 hhh
∂∂
−=ξη (3.35)
The use of minus sign in equation accounts for gradient descent in weight space
By using chain rule we can write
34
1 4 3
yyeh e y y h1
ξ ξ ∂∂∂ ∂ ∂=
∂ ∂ ∂ ∂ ∂ (3.36)
From equation (3.34) eeξ∂=
∂ (3.37)
From equation (3.33) 4
1ey∂
= −∂
(3.38)
From equation (3.31) 244
3
1y yy∂
= −∂
(3.39)
From equation (3.32) 32
1
y yh∂
=∂
(3.40)
Now equation (3.36) becomes 24 2
1
(1 )e y yhξ∂= − −
∂ (3.41)
From equation (3.35) and (3.41)
(3.42) 22411 )1( yyehh −+= η
34
Similarly proceeding as above we can get
(3.43) )sin()1( 22422 yyehh πη −+=
(3.44) )cos()1( 22433 yyehh πη −+=
In general
(3.45) TTT yyeHH )()1( 224 ψη −+=
Weight updation in the first stage
)tanh( 12 yy = (3.46)
)]cos(........).........cos()sin([ 391312111 xwxwxwxwy πππ +++= (3.47)
The change in weight is proportional to 1w1w∂
∂ξ
1
11 www
∂∂
−=ξη (3.48)
The use of minus sign in equation accounts for gradient descent in weight space
By using chain rule we can write
34 2
1 4 3 2 1 1
yy yew e y y y y w
1yξ ξ ∂∂ ∂ ∂∂ ∂ ∂=
∂ ∂ ∂ ∂ ∂ ∂ ∂ (3.49)
From equation (3.32) )sin()cos( 232212
3 yhyhhyy
ππππ −+=∂∂
(3.50)
From equation (3.46) )1( 22
1
2 yyy
−=∂∂
(3.51)
From equation (3.47) 11
1 xwy
=∂∂
(3.52)
Now equation (3.49) becomes
12223221
24
1
)1))(sin()cos()(1( xyyhyhhyew
−−+−−=∂∂ ππππξ (3.53)
From equation (3.48) and (3.53)
12223221
2411 )1))(sin()cos()(1( xyyhyhhyeww −−+−+= ππππη (3.54)
Let )sin()cos( 23221 yhyhhbp ππππ −+=
122
2411 )1()1( xybpyeww −−+= η (3.55)
Similarly proceeding as above we can get
)sin()1()1( 122
2422 xybpyeww πη −−+=
35
.
.
.
)cos()1()1( 322
2499 xybpyeww πη −−+=
TTT xybpyeWW )()1()1( 22
24 φη −−+= (3.56)
In the similar way by taking different functions for f1(.) and f2(.) and taking different
number of expansions the weight updation algorithm can be derived.
3.6. SIMULATION RESULTS
In this section different types of ANN models and systems are considered for comparison of
their performances.
(A)Comparison between MLP and FLANN
Example 1:Here, different nonlinear systems are chosen to examine the approximation
capabilities of the MLP and the FLANN. The structure considered for simulation purpose is
shown in Fig.2.4.The unknown plant is described by some nonlinear function given in
equation (3.57).The adaptive system is either MLP or FLANN. A three-layer MLP structure
with 20 and 10 nodes (excluding the threshold unit) in the first and second layers respectively
and one input node and one output node was chosen for the purpose of identification.. Where
as, the FLANN structure has 14 number of input nodes. Thus, it has only 15 weights
including the threshold unit, which are to be updated in one iteration.
3 2
1
23 2
3 5 4 3 2
34 3
( ) ( ) 0.3 0.4 ,( ) ( ) 0.6sin( ) 0.3sin(3 ) 0.1sin(5 ),
4 1.2 1.2( ) ( ) ,0.4 0.8 1.2 0.2 3
2( ) ( ) 0.5sin ( ) 0.1cos(4 ) 1.1252
a f u u u ub f u u u u
u uc f uu u u u
d f u u uu
π π π
π π
= + −= + +
− +=
+ − + −
= − − ++
(3.57)
The input pattern was expanded by using trigonometric polynomials, i.e., by using cos( )n uπ
and sin( )n uπ , for n = 0,1,2; ….. .In some cases, the cross product terms were also included
in the functional expansion. The nonlinearity used in a node of the MLP and the FLANN is
the function. The BP algorithm was used to adapt the weights of both the ANN
structures. The input u was a random signal drawn from a uniform distribution in the interval
[-1, 1]. Both the convergence parameter and the momentum term were set to 0.1. Both the
()tahh
36
MLP and the FLANN were trained for 30000 iterations, after which the weights of the ANN
were stored for testing. For testing the input signal is an uniform distribution in the interval
[-1 1]. The response is given in Fig.3.6. to 3.8.
(i)For nonlinearity (3.57(a))
-1 -0.5 0 0.5 1-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
input----->
outp
ut---
->
trueestimated
-1 -0.5 0 0.5 1-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
input----->
outp
ut---
->
trueestimated
(a) (b) Fig.3.6. Output plot (Example 1) (a) f1 using MLP (b) f1 using FLANN
(ii) For nonlinearity (3.57(b))
-1 -0.5 0 0.5 1-1
-0.5
0
0.5
1
input----->
outp
ut---
->
trueestimated
-1 -0.5 0 0.5 1-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
input----->
outp
ut---
->
trueestimated
(a) (b) Fig.3.7. Output plot (Example 1) (a) f2 using MLP (b) f2 using FLANN
37
(iii) For nonlinearity (3.57(c))
-1 -0.5 0 0.5 1-1
-0.5
0
0.5
1
input----->
outp
ut---
->
trueestimated
-1 -0.5 0 0.5 1-1
-0.5
0
0.5
1
input----->ou
tput
---->
trueestimated
(a) (b)
Fig.3.8. Output plot (Example 1) (a) f3 using MLP (b) f3 using FLANN Example 2: The plant is assumed to be of second order and is described by the following difference
equation.
(3.58) ( 1) 0.3 ( ) 0.6 ( 1) [ ( )]y n y n y n g u n+ = + − +
The unknown function g was taken from the nonlinear functions of (3.57).
To identify the plant a model was considered which is governed by the difference equation
ˆ( 1) 0.3 ( ) 0.6 ( 1) [ ( )]y n y n y n N u n+ = + − + (3.59)
The MLP and FLANN structure are same as Example1.N [u(n)] in equation (3.59) represents the
neural network if the input is u(n) . The input to the plant was taken from an uniformly distributed
random signal over the interval [-1,1]. The adaptation continues for 30000 iterations during which
the series-parallel scheme of identification was used. Then, the adaptation was stopped and the
network was used for testing for identification using the parallel scheme. The testing of the
network models was undertaken by presenting a sinusoidal input to the identified model given by
2sin 250250
( )2 20.8sin 0.2sin 250250 25
n for nu n
n n for n
π
π π
⎧ ⎛ ⎞ ≤⎜ ⎟⎪⎪ ⎝ ⎠= ⎨⎛ ⎞ ⎛ ⎞⎪ + >⎜ ⎟ ⎜ ⎟⎪ ⎝ ⎠ ⎝ ⎠⎩
(3.60)
The results of identification (3.58) with nonlinear function 3f and 4f of (3.57) are shown in
Fig.3.9 and Fig.3.10 respectively.
38
(i) For nonlinearity (3.57(c))
(a) (b)
i) For nonlinearity (3.57(d))
ple 2) (a) f4 using MLP (b) f4 using FLANN
0 200 400 600-8
-6
-4
-2
0
2
4
6
no. of iterationou
tput
plantmodel
0 200 400 600-8
-6
-4
-2
0
2
4
6
no. of iteration
outp
ut
plantmodel
Fig.3.9. Output plot (Example 2) (a) f3 using MLP (b) f3 using FLANN
(i
Fig.3.10. Output plot (Exam
0 200 400 600-8
-6
-4
-2
0
2
4
66
no. of iteration
outp
ut
plantmodel
(a) (b)
4
2
0
-2
-4
0 200 400 600-8
-6
no. of iteration
outp
ut
plantmodel
39
(B)Comparison between FLANN and CFLANN
Extensive simulation studies were carried out with several examples of nonlinear systems.
We have compared the performance of the CFLANN (two stages) structure with that of
single layer FLANN structure and LMS based identifiers. The input to the system was taken
from a uniformly distributed random signals over interval [-0.5, 0.5].White Gaussian noise
was added to the output of the nonlinear system. The convergence factor is chosen as
0.03.The adaptation continues for 5000 iterations during parallel scheme of identification was
used. The adaptation is then stopped and the network is used for identification purpose. The
testing of the network model was done by applying new random samples to the input.
Example 3: We consider a system described by transfer function
The nonlinearity used is
For the identification the structure considered is given in Fig.2.6.The adaptive models
considered here are FIR filter, FLANN and CFLANN. In FLANN structure each branch is
expanded into nine branches out of which one is direct input and other eight are trigonometric
expansions such as sin(πx),cos(πx),sin(2πx),cos(2πx)………… sin(4πx),cos(4πx) ,if x is the
input to the identifier. The total number of weights used in this case is 28(including a bias
term) . A white Gaussian noise of -30dB is added to the output of the nonlinear system.
In case of CFLANN in the first stage each branch is expanded into five branches by using
trigonometric expansions. The output of the first stage is again expanded into three branches.
The total number of weights used is 20(including two bias term, one at each stage).The mean
square error (MSE) and output responses are compared in Fig.3.11.It is clear from the graphs
the performance of CFLANN is better than that of FLANN and LMS based systems.
Example 4: The system is described by
The nonlinearity used is .All other conditions are same as example 1.
The mean square error (MSE) and output responses are compared in Fig.3.12.
1 2( ) 0.26 0.93 0.26H z z z− −= + +
2 3( ) ( ) 0.2 ( ) 0.1 ( ) 0.5cos( ( ))b n a n a n a n a nπ= + − +
1 2( ) 0.26 0.93 0.26H z z z− −= + +
3( ) ( ) 0.9 ( )b n a n a n= −
40
0 1000 2000 3000 4000 5000-35
-30
-25
-20
-15
-10
-5
0M
SE
in d
BMSE plot
LMS
FLANN
CFLANN
-0.4
-0.2
0
0.2
0.4
0.6
0.8
no. of iterations (a) (b)
parison between LMS,FLANN and CFLANN (Example 3.)
(a) MSE plot (b) Response plot
Fig.3.11. Performance com
0 1000 2000 3000 4000 5000-35
-30
-25
-20
-15
-10
-5
0
no. of iterations
MS
E in
dB
MSE plot
LMS
FLANN
CFLANN
0 10 20 30-0.5
0
0.5
no. of samples
repo
nses
actuallmsflanncflann
ance comparison between LMS,FLANN and CFLANN (Example 4.)
) MSE plot (b) Response plot
(a) (b) Fig.3.12. Perform
(a
0 10 20 30-0.6
repo
nses
actuallmsflanncflann
no. of samples
41
42
.7. SUMMARY
Different types of ANN structures and thei
chapter. For identification purpose three ANN st
used. In FLANN, the input pattern is expanded
the input vector. The functional expansion ma
processing of signals in the hidden layer of an
dimensionality of the input pattern and thus, cre
multidimensional space and identification of c
with this network. From Fig.3.6 to Fig.3.10 it
FLANN structure is better than that of MLP.Since,
the computational complexity is less and thus, th arning is faster in comparison to an MLP.
. From the response and MSE plot it can be observed that for
near system the performance of CFLANN model is considerably better than that
f FLANN and LMS based models. The CFLANN model exhibits two advantages. The first
ne is that it involves less number of expansions. Secondly it provides better learning and
response matching performance for nonlinear system in identification.
3
r learning algorithms are discussed in this
ructures (MLP, FLANN and CFLANN) are
using trigonometric or polynomials terms of
y be thought of analogous to the nonlinear
MLP. This functional expansion increases the
ation of nonlinear decision boundaries in the
omplex nonlinear functions become simple
is evident that system identification with the
the hidden layer is absent in this structure,
e le
Therefore, this structure may be implemented for on-line applications.
From Fig. 3.11 and 3.12, it is observed that the CFLANN model is better than that of
FLANN and LMS based models
highly nonli
o
o
42
4. PRUNING USING GA
4.1. INTRODUCTION
System identification is a pre-requisite to analysis of a dynamic system and design of an
appropriate controller for improving its performance. The more accurate the mathematical
model identified for a system, the more effective will be the controller designed for it. In
many identification processes, however, the obtainable model using available techniques is
generally crude and approximate.
In conventional identification methods, a model structure is selected and the parameters of
that model are calculated by optimizing an objective function. The methods typically used for
optimization of the objective function are based on gradient descent techniques. On-line
Gradient-descent training algorithms are the most common form of training algorithms in
gnal processing today because they have a solid mathematical foundation and have been
roven over the last five decades to work in many environments. Gradient-descent training,
owever leads to suboptimal performance under nonlinear conditions. Genetic Algorithm
A)[4.1] has been widely used in many applications to produce a global optimal solution.
his approach is a probabilistically guided optimization process which simulates the genetic
volution. The algorithm cannot be trapped in local minima as it employs a random mutation
rocedure. In contrast to classical optimization algorithm, genetic algorithms are not guided
their search process by local derivatives. Through coding the variables population with
ronger fitness are identified and maintained while population with weaker fitness are
moved. This process ensures that better offsprings are produced from their parents. This
arch process is stable and robust and can identify global optimal parameters of a system.
he underlying principles of GA’s were first published by Holland in 1962[4.2].GA’ has
een used in many diverse areas such as function optimization [4.3],image processing [4.4],
the traveling salesman problem [4.5] ,[4.6] and system identification [4.7]-[4.11].
system identification used to date are based on recursive implementation of off-line methods
such as least squares, maximum likely-hood or instrumental variable. Those recursive
schemes are in essence local search techniques. They go from one point in the search point to
another at every sampling instant, as a new input-output pair becomes available. This process
usually requires a large set of input/output data from the system which is not always
available. In addition the obtained parameters may be locally optimal.
si
p
h
(G
T
e
p
in
st
re
se
T
b
43
In this thesis GA is used for simultaneously pruning and weight updation.While
constructing an artificial n faced with the problem of
choosing a network of the right size for the task to be carried out. The advantage of using a
less costly and faster in operation. However, a much reduced
hich do not contribute to estimate output. Jearanaitanakij
ptimization methods is that GA uses a population of points at one time in contrast to the
eural network the designer is often
reduced neural network is
network cannot solve the required problem while a fully ANN may lead to accurate solution.
Choosing an appropriate ANN architecture of a learning task is then an important issue in
training neural networks. Giles and Omlin [4.12] have applied the pruning strategy for
recurrent networks. Markel has employed [4.13] the pruning technique to FFT algorithm. He
has eliminated those operations w
and Pinngern [4.14] have analyzed on the minimum number of hidden units that is required to
recognize English capital letters of the ANN. Thus to achieve the cost and speed advantage,
appropriate pruning of ANN structure is required. In this chapter we have considered an
adequately expanded FLANN model for the identification of nonlinear plant and then used
Genetic Algorithm (GA) to train the filter weights as well to obtain the pruned input paths
based on their contributions. Procedure for simultaneous pruning and training of weights
have been carried out in subsequent sections to obtain a low complexity reduced structure.
4.2. GENETIC ALGORITHM
In the case of deterministic search, algorithm methods such as steepest gradient methods
are employed (using gradient concept), where as in stochastic approach, random variables are
introduced. Whether the search is deterministic or stochastic, it is possible to improve the
reliability of the results. GA’s are stochastic search mechanisms that utilize a Darwinian
criterion of population evolution. The GA has robustness that allows its structural
functionality to be applied to many different search problems [4.17]. This effectively
means that once the search variables are encoded into a suitable format, the GA scheme can
be applied in many environments. The process of natural selection, described by Darwin,
is used to raise the effectiveness of a group of possible solutions to meet an
environmental optimum [4.16].
Genetic algorithms are very different from most of the traditional optimization methods.
Genetic algorithms need design space to be converted into genetic space. So genetic
algorithms work with coding variables. The advantages of working with a coding variable
space is that coding discretizes the search space even though the function may be continuous.
A more striking difference between genetic algorithms and most of the traditional
o
44
single point approach by traditional optimization methods. This means that GA processes a
number of designs at the same time.
4.2.1. GA Operations
The GA operates on the basis that a population of possible solutions, called
chromosomes, is used to assess the cost surface of the problem. The GA evolutionary
process can be thought of as solution breeding in that it creates a new generation of
solutions by crossing two chromosomes. The solution variables or genes that provide a
positive contribution to the population will multiply and be passed through each
subsequent generation until an optimal combination is obtained.
The population is updated after each learning cycle through three evolutionary
processes: selection, crossover and mutation. These create the new generation of solution
variables. From the population a pool of individuals is randomly selected, some of these
survive into the next iterations population. A mating pool is randomly created and each
individual is paired off. These pairs undergo evolutionary operators to produce two new
individuals that are added to the new population.
The selection function creates a mating pool of parent solution strings based upon the
rom the mating pool the crossover operator exchanges “survival of the fittest” criterion. F
gene information. This essentially crosses the more productive genes from within the
solution population to create an improved, more productive, generation. Mutation randomly
alters selected genes, which helps prevent premature convergence by pulling the population
into unexplored areas of the solution surface and add new gene information into the
population.
Fig.4.1. A GA Iteration Cycle
45
4.2.2. Population Variable
A chromosome consists of the problem variables, where these can be arranged in a vector or a
ssover process, corresponding genes are crossed so that there is no
ber is used in the genetic
(4.1)
osome selection.
election process is used to weed out the weaker chromosomes from the population so
at the more
hromosome finesses are used to rank the population with each individual assigned it own
f
matrix. In the gene cro
inter-variable crossing and therefore each chromosome uses the same fixed structure. An
initial population that contains a diverse gene pool offers a better picture of the cost surface
where each chromosome within the population is initialized independently by the same
random process.
In the case of binary-genes each bit is generated randomly and the resulting bit-words
are decoded into their real value equivalent .The binary num
search process and the real value is used in the problem evaluation. This type of
initialization results in a normally distributed population of variables across a specific range.
This type of results in a normally distributed population of variables across a specific range.
A GA population, P, consists of a set of N chromosomes {Cj... CN} and N fitness values
{f1……fN}, where the fitness is some function of the error matrix.
1 1 2 2 3 3{( , ) ( , ) ( , ) ( , )}N NP C f C f C f C f=
The GA is an iterative update algorithm and each chromosome requires its fitness to be
evaluated individually. Therefore, N separate solutions need to be assessed upon the same
training set in each training iteration. This is a large evaluation overhead where population
sizes can range between twenty and a hundred, but the GA is seen to have learning rates that
evens this overhead out over the training convergence.
4.2.3. Chrom
The s
th productive genes may be used in the production of the next generation. The
c
fitness value,
2
1
1( ) ( )M
i jj
E in e nM =
= ∑ (4.2)
he solution cost value Et of the f chromosome in the population is calculated from a training-
lock of M training signals and from this cost an associated fitness is assigned:
T
b
1( )(1 ( ))i
i
f nE n
=+
(4.3)
46
The fitness can be considered to be the inverse
easons, i.e. E n
of the cost but the fitness function in Equation
(4.3) is preferred for stability r ( ) 0=i
osomes are selected randomly for the two pools but biased
of the finesses
are used to normalize these values, fmm=2.08.
4.2 shows a roulette wheel that has been
When the fitness of each chromosome in the population has been evaluated, two pools are
generated, a survival pool and a mating pool. The chromosomes from the mating pool
will be used to create a new set of chromosomes through the evolutional processes of
natural selection and the survival pool allows a number of chromosomes to pass onto the
next generation. The chrom
towards the fittest. Each chromosome may be chosen more than once and the fitter
chromosomes are more likely to be chosen so that they will have a greater influence in
the new generation of solutions.
The selection procedure can be described using a biased roulette wheel with the buckets of
the wheel sized according to the individual fitness relative to the population's total fitness
[4.17]. Consider an example population often chromosomes that have the fitness assessment
of f = {0.16, 0.16, 0.48, 0.08, 0.16, 0.24, 0.32, 0.08, 0.24, 0.16} and the sum
Figure split into ten segments and each segment is
in proportion to the population chromosomes relative fitness. The third segment therefore
fills nearly a quarter of the roulette wheels area. The random selector points to a chosen
chromosome, which is then copied into the mating pool because the third individual controls
a greater proportion of the wheel, it has a greater probability of being selected.
Chromosome segments Population roulette wheel
Fig.4.2. Biased roulette-wheel that is used in the selection of the mating pool
perator where
a string is selected from the mating pool with a probability proportional to the fitness. Thus
ith string in the population is selected with a probability proportional to fi where fi is the
fitness value of that string. Since the population size is usually kept fixed in a simple GA,
The commonly used reproduction operator is the proportionate reproductive o
47
the sum of probabilities of each string being selected for the mating pool must be one. The
probability of ith selected string is
1
ii n
jj
fpf
=
=
∑ (4.4)
Where ‘n’ is the population size.
Using the fitness value fi of all strings, the probability of selecting a string pi can be
calculated. There after, cumulative probability Pi of each string can be calculated by adding
the individual probabilities from the top of the list. Thus the bottom most string in the
population should have a cumulative probability of 1.The roulette wheel concept can be th
robability range(calculate from fitness value)
4.2.4. Gene Crossover
The crossover operator exchanges gene information between two selected chromosomes, (Cq,
Cr), where this operation aims to improve the diversity of the solution vectors. The pair of
chromosomes, taken from the mating pool, becomes the parents of two offspring
chromosomes for the new generation.
In the case of a binary crossover operation the least significant bits are exchanged between
correspond
along the bit sequence is chosen and then all of the bits right of the crossover point are
xchanged. In Fig.4.3 (a), which shows a single point crossover, the fifth crossover position
simulated by realizing that the i string in the population represents the cumulative
probability from Pi-1 to Pi.Thus the first string represents the cumulative values from 0 to P1.
Hence cumulative probability of any string lies between 0-1. In order to choose n strings, n
random numbers between zero and one is created at random. Thus the string that represents
the chosen random number in the cumulative p
for the string, is copied from to the mating pool. This way the string with a higher fitness
value will represent a larger range in the cumulative probability values and therefore, has a
higher probability of being copied into the mating pool. On the other hand string with a
smaller fitness value will represent a smaller range in the cumulative probability values and
therefore, has a lesser probability of being copied into the mating pool.
ing genes within the two parents. For each gene-crossover a random position
e
is randomly chosen, where the first position corresponds to the left side. The bits from the
right of the fourth bit will be exchanged. Fig.4.3 (b) shows a two point crossover in which
two points are randomly chosen and the bits in between them are exchanged. Fig.4.3. shows a
basic genetic crossover with the same crossover point chosen for both offspring genes. At the
48
start of the learning process the extent of crossing over the whole population can be decided
allowing the evolutionary process to randomly select the individual genes. The probability of
within each parent. P(crossing)≤1 allows all the gene values to be crossed and P(crossing)=0
leaves the parents unchanged, where a random gene selection value, ω Є {1,0}, is governed
ited to this simple operation. The crossover operator
for mutation with a probability that some of its genes will be mutated after the crossover
a gene crossing, P(crossing), provides a percentage estimate of the genes that will be affected
by this probability of crossing.
Before crossover
After crossover
1 0 1 0 0 1 0 1
0 0 1 0 1 1 1 0
1 0 1 0 1 1 1 0
0 0 1 0 0 1 0 1
(a)
Before crossover
After crossover
1 0 1 0 0 1 0 1
0 0 1 0 1 1 1 0
1 0 1 0 1 1 0 1
0 0 1 0 0 1 1 0
(b) Fig.4.3. Gene crossover (a) Single point crossover (b) Double point crossover .
The crossover does not have to be lim
can be applied to each chromosome independently, taking different random crossing points in
each gene. This operation would be more like grafting parts of the original genes onto each
other to create the new gene pair. All of a chromosome's genes are not altered within a single
crossover. A probability of gene-crossover is used to randomly select a percentage of the
genes and those genes that are not crossed remain the same as one of the parents.
4.2.5 Chromosome Mutation
The last operator within the breeding process is mutation. Each chromosome is considered
49
operation. A random number is generated for each gene, if this value is within the specified
mutation selection probability, P(mutation), the gene will be mutated. The probability of
mutation occurring tends to be low with around one percent of the population genes being
affected in a single generation. In the case of a binary mutation operator, the state of the
randomly selected gene-bits is changed, from zero to one or vice-versa.
Fig.4.4 Mutation operation in GA A simple genetic algorithm treats the mutation as a secondary operator with the role of
storing lost genetic materials. Suppose, for example, all the string in a population have
onveyed to to a zero at osition,
en crossover cannot regenerate a one at that position while a mutation can. It helps the
arch algorithm to escape from local minima’s traps since the modification is not related to
ny previous genetic structure of the population. The mutation is also used to maintain
iversity in the population .For example, consider the following population having four
ight-bit strings.
0110 1011
0011 1101
0001 0110
result these
arameters should be chosen properly.
obability.
re
c a given position and the optimal solution has a one at that p
th
se
a
d
e
0111 1100
All the four strings have a zero in the left most bit position. If the true optimum solution
requires a one in that position, then neither reproduction nor crossover operator will be able
to create a one in that position. Only mutation operation can change that zero to one.
4.3. PARAMETERS OF GA.
There are some parameter values required for GA. To get the desired
p
(a) Crossover and Mutation Pr
There are two basic parameters of GA - crossover probability and mutation probability.
Selected bit for mutation 1 0 1 1 0 0 1 0 1 0 1 1 1 0 1 0
Before mutation
After mutation
50
Crossover probability: This probability controls the frequency at which the crossover occurs
for every chromosome in the search process. This is a number between (0,l) which is
determined according to the sensitivity of the variables of the search process. The crossover
probability is chosen small for systems with sensitive variables. If there is crossover,
ffspring are made from parts of both parent's chromosome. Crossover is made in hope that
ew chrom ore the new
hromosomes will be better. However, it is good to leave some part of old population
rvives to next generation.
Mutation probability: This parameter decides how often parts of chromosome will be
utated. If there is no muta ediately after crossover (or
so some other parameters in GA. One important parameter is population
population in one generation. If there
to perform crossover and only a small
f there are too many chromosomes, GA
is proposed. Such a choice has led to effective pruning
runing strategy is based on the idea of successive
om
the FLANN architecture. As a result, many branches (functional expansions) are pruned and
o
n osomes will contain good parts of old chromosomes and theref
c
su
m tion, offspring are generated imm
directly copied) without any change. If mutation is performed, one or more parts of a
chromosome are changed. If mutation probability is 100%, whole chromosome is changed, if
it is 0%, nothing is changed. Mutation generally prevents the GA from falling into local
extremes. Mutation should not occur very often, because then GA will in fact change to
random search.
(b) Other Parameters.
There are al
size.
Population size: How many chromosomes are in
are too few chromosomes, GA has few possibilities
part of search space is explored. On the other hand, i
slows down. Research shows that after some limit (which depends mainly on encoding and
the problem) it is not useful to use very large populations because it does not solve the
problem faster than moderate sized populations.
4.4. PRUNING USING GA.
In this Section a new algorithm for simultaneous training and pruning of weights using
binary coded genetic algorithm (BGA)
of branch and updating of weights. The p
elimination of less productive paths (functional expansions) and elimination of weights fr
51
the overall architecture of the FLANN based model is reduced which in turn reduces the
e FLANN model. Again (T – 1) represents the order the filter
nd E represents the number of expansions specified for each input to the filter. Thus each
ining L bits.
nput training data:
s are generated to identify the static and feed forward dynamic plants.
n strength, there by producing k number of
esired signals. Thus the training data are produced to train the network.
corresponding computational cost associated with the proposed model without sacrificing the
performance. Various steps involved in this algorithm are dealt in this section.
Step 1- Initialization in GA:
A population of M chromosomes is selected in GA in which each chromosome constitutes
(T×E+1)×L number of random binary bits where the first L number of bits are called Pruning
bits (P) and the remaining bits represent the weights associated with various branches
(functional expansions) of th
a
chromosome can be schematically represented as shown in the Fig. 4.6.
A pruning bit (p) from the set P indicates the presence or absence of expansion branch
which ultimately signifies the usefulness of a feature extracted from the time series. In other
words a binary 1 will indicate that the corresponding branch contributes and thus establishes
a physical connection where as a 0-bit indicates that the effect of that path is insignificant and
hence can be neglected. The remaining (T.E.L) bits represent the (T.E) weight values of the
model each conta
Step 2- Generation of i
K (≥500) number of signal samples is generated. In the present case two different types of
signal
(i) To identify a feed forward dynamic plant, a zero mean signal which is uniformly
distributed between ±0.5 is generated.
(ii) To identify a static system, a uniformly distributed signal is generated within ±1.
Each of the input samples are passed through the unknown plant (static and feed forward
dynamic plant) and K such outputs are obtained. The plant output is then added with the
measurement noise (white uniform noise) of know
d
Step 3- Decoding:
Each chromosome in GA constitutes random binary bits. So these chromosomes need to be
converted to decimal values lying between some ranges to compute the fitness function. The
equation that converts the binary coded chromosome in to real numbers is given by
52
Plant
NL
z-1
z-1
.
.
.
.
.
.
w11
w12
w1E
p11
p1E
p12
.
.
. φ
φ11
1E
φ12
.
.
.
.
.
.
wt1
wt2
wTE
pt1
TE
pTE
pt2
.
.
. φ
φ
φt1
t2
.
.
.
.
.
.
w21
w22 p22
.
φ22
w2E
p21
p2E. . φ
φ21
2E. . . .
∑
∑ ∑
GA Based Algorithm
+
-
x(n)
noise Nonlinear Plant
++
e(n)
x(n-1)
x(n-T+1)
x(n)
b
d(n)
+1
FLANN model before pruning
y(n)
Fig.4.5. FLANN based identification model showing updating weights and pruning path
53
( )( )
max minmin 2 1L
R R
54
RV R DV⎧ ⎫⎛ ⎞
= + ×⎜ ⎟⎨ ⎬⎜ ⎟−⎪ ⎪⎝ ⎠⎩ ⎭ (4.5)
here Rmin, Rmax, RV and DV represent the minimum range, maximum range, decimal and
e representation. The first L number of bits is not
ce they represent pruning bits.
- To compute the estimated output:
t nth instant the estimated output of the neuron can be computed as
n W n P n b nφ= =
= × × +∑∑ (4.6)
here φij (n) represents jth expansion of the ith signal sample at the nth instant. Wijm (n) and
ijm (n) represent the jth expansion weight and jth pruning weight of the ith signal sample for
th chromosome at kth instant. Again bm(n) corresponds to the bias value fed to the neuron
th chromosome at nth instant.
- Calculation of cost function:
pared with corresponding estimated output and K errors are
he Mean-square-error (MSE) corresponding to mth chromosome is determined by
:
−⎪ ⎪
W
decoded value of an L bit coding schem
decoded sin
. . . L bits L bits L bits L bits
Pruning bits (P) V=T×E×L bits
V=T×E. nos. of weight bits (L)
Fig.4.6. Bit allocation scheme for pruning and weight updating
Step 4
A
1 1( ) ( ) ( ) ( ) ( )
T Em m m
ij ij iji j
y n
W
P
m
for m
Step 5
Each of the desired output is com
produced. T
using the relation2
1( )
Kk
k
eMSE m K=
= ∑ (4.7)
epeated for M times (i.e. for all the possible solutions). This is r
55
Step 6- Operations of GA:
Here the GA is used to minimize the MSE
ect the best M individuals which will be treated as parents in
ing procedure will be ceased when the MSE settles to a desirable level. At this
moment all the chromosomes attain the same genes. Then each gene in the chromosome
repres
.5. SIMULATION RESULTS
xtensive simulation studies are carried out with several examples from static as well as feed
rward dynamic systems. For updating the weights of the FLANN we will follow the
arning algorithm given in section 3.4. The performance of the proposed Pruned FLANN
N structure.
(A) Static Systems
Here different nonlinear static systems are
. The crossover, mutation and selection operators
are carried out sequentially to sel
the next generation.
Step 7- Stopping Criteria:
The train
ents an estimated weight.
4
E
fo
le
model is compared with that of basic FLAN
chosen to examine the approximation capabilities
of the basic FLANN and proposed Pruned FLANN models. In all the simulation studies
reported in this Section a single layer FLANN structure having one input node and one
neuron is considered. Each input pattern is expanded using trigonometric polynomials i.e. by
using cos( )n uπ and sin( )n uπ , for n = 0,1,2,…6. In addition a bias is also fed to the output. In
e simulation work the data used are K = 500, M = 40, N = 15, L = 30, probability of
utation = 0.1. Besides that the Rmax and Rmin values are
Example-1:
th
crossover = 0.7 and probability of m
judiciously chosen to attain satisfactory results. Three nonlinear static plants considered for
this study are as follows:
3 2
1( ) 0.3 0.4f u u u u= + − (4.8) Example-2: 2 ( ) 0.6sin( ) 0.3sin(3 ) 0.1sin(5 )f u u u uπ π π= + + (4.9)
Example-3:
3 2
3 5 4 3 2
4 1.2 1.2( )0.4 0.8 1.2 0.2 3
u uf uu u u u
− +=
+ − + − (4.10)
At any nth instant, the output of the ANN model y (n) and the output of the system d (n) is
compared to produce error e(n) which is then utilized to update the weights of the model. The
LMS algorithm is used to adapt the weights of basic FLANN model where as a proposed GA
ased algorithm is employed for simultaneous adaptation of weights and pruning of the
N model is trained for 30000 iterations where as the pruned
(a), (b), (c). The comparison of computational complexity
etween FLANN and pruned FLANN is given in Table.4.1.
TABLE. 4.1
help of several examples. In each example, one particular model of
e unknown system is considered. In this simulation a single layer FLANN structure having
one input node and one neuron is consid
input as well as the trigonometric polynomials i.e. by using
b
branches. The basic FLAN
FLANN model is trained for only 60 generations. Finally the weights of the ANN are stored
for testing purpose. The responses of both the networks are compared during testing
operation and shown in Figs.4.7
b
COMPARISON OF COMPUTATIONAL COMPLEXITIES BETWEEN A BASIC FLANN AND A PRUNED FLANN MODEL
Number of operations
Additions Multiplications Number of weights
Ex. No.
N Pruned FLANN
FLANN
Pruned FLANN
FLANN
Pruned FLANN FLAN
B. Dynamic Systems
In the following the simulation studies of nonlinear dynamic feed forward systems has
been carried out with the
Ex-1
14
3
14
3
15
4
Ex-2
14
2
14
3
15
3
Ex-3
14
5
14
5
15
6
th
ered. Each input pattern is expanded using the direct
, for n = ,sin( ) cos( )u n u and n uπ π
1. In this case the bias is removed. In the simulation work we have considered K = 500, M
= 40, N = 9, L = 20, probability of crossover = 0.7
Besides that the Rmax and Rmin values are judiciously chosen to attain satisfactory results. The
three nonlinear dynamic feed forward plants considered for this study are as follows:
and probability of mutation = 0.03.
56
Example-4:
(a) Parameter of the linear system of 410 , 0.8760 , 0.3410]
(
The basic FLANN m e proposed FLANN is
trained or only 60 genera white uniform noise of strength -30dB is
added to actual system response to assess the performance of two nder
noisy condition. T he AN sting. Finally the te he
networks mo dert pre ze wh signal to the
identified model. Performance comparison between the FLANN and pruned FLANN
structure in term ated output of
responses of both the networks are compared during testing operation and shown in Figs.4.7
(d), (e) The comparison f computational complexity between FLANN and pruned
LAN given able.4.2.
(a) Parameter of the linear system of the plant [ 0.2600 , 0.9300 , 0.2600 ]
(b) Nonlinearity associated with the plant yn(k) = yk + 0.2 yk2 – 0.1 yk
3,
Example-5:
(a) Parameter of the linear system of the plant [0.3040,0.9029,0.3040]
(b) Nonlinearity associated with the plant yn(k) = tanh(yk),
Example-6:
the plant [0.3
b) Nonlinearity associated with the plant yn(k) = yk – 0.9 yk3.
odel is trained for 2000 iterations where as th
f tions. While training, a
different models u
hen the weights of t
aken by
N are stored for te
ro mean
sting of t
del is un senting a ite random
s of estim the unknown plant has been carried out. The
, (f). o
F N is in T
0 10 20 30 40-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
no. of samples
outp
ut
desiredFLANNpruned FLANN
0 10 20 30 40-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8desiredFLANNpruned FLANN
no. of samples
outp
ut
(a) (b)
57
10 20 30 40-0.7
0.1 0.6
-0.6
-0.5
-0.4
-0.3
-0.2
-0.1
0
0 5no. of samples
outp
ut
desiredFLANNpruned FLANN
10 15 20-0.4
-0.2
0
0.2
desiredFLANNpruned FLANN
0.4
outp
utno. of samples
(c) (d)
0 5 10 15 20-0.4
-0.2
0
0.2
0.4
0.6
no. of samples
outp
ut
desiredFLANNpruned FLANN
0 5 10 15 20
-0.4
-0.2
0
0.2
0.4
0.6desiredFLANNpruned FLANN
no. of samples
outp
ut
(e) (f)
Fig.4.7. Output plot for static and dynamic systems (a) output plot for Example 1. (b) Output
plot for Example 2. (c) output plot for Example 3. (d) output plot for Example 4.(e)output
plot for Example 5. (f) output plot for Example 6.
58
TABLE. 4.2 COMPARISON OF COMPUTATIONAL COMPLEXITIES BETWEEN A BASIC
FLANN AND A PRUNED FLANN MODEL
4.6. SUMMARY
Simultaneous weight updating and pruning of FLANN identification models using GA is
presented. The pruning strategy is based on idea of successive elimination of less productive
path. For each weight a separate pruning bit reserved in this process. Computer simulation
studies on static and dynamic nonlinear plants demonstrate that there is more than 50% active
paths are pruned keeping response matching almost identical with those obtaining from
conventional FLANN identification models.
Number of operations
Additions
Multiplications
Number of weights
Ex. No.
FLANN Pruned FLANN
FLANN
Pruned FLANN
FLANN
Pruned FLANN
Ex-1
8
3
9
4
9
4
Ex-2
8
2
9
3
9
3
Ex-3
8
2
9
3
9
3
59
5. CHANNEL EQUALIZATION
5.1. INTRODUCTIO
Recently, there has been substantial increase of demand for high speed digital data
transmis ion effectively v ion channel. C els are
usuall deled as ed linear fi onse (FIR) filters with low pass
frequency response. When litud e en elay e ar nstant
within e bandw the filter, the channel disto ts the transmitted signal causing
inters interf nce (ISI) ecause of this linear d tion, the mitted symbols are
spread and overlapped over successive tim tervals. In addition to the linear di rtion, the
transmitted symbols are subjec to other im ments such as therm pulse noise,
nd n ear di ortion aris g from the modulation/demodulation process, cross-talk
terference, the use of amplifiers and converters, and the nature of the channel itself. All the
gnal processing methods used at the receiver's end to compensate the introduced channel
ver the transmitted symbols are referred as channel equalization
h going on to introduce new algorithms to train
lizer. daptive channel equalization was first proposed and analyzed by Lucky in 1965[5.11].
daptive channel equalizer employing a multilayer perceptron (MLP) structure has been
of the MLP structure is the long training
e required for generalization and thus, this network has very poor convergence speed
hich is primarily due to its multilayer architecture. A single layer polynomial perceptron
N) has been utilized for the purpose of channel equalization [5.3] in which the
xpanded using polynomials and cross-product terms of the pattern
r the equalization problem. Superior performance
ork over a linear equalizer has been reported. An alternative ANN structure called
LANN) originally proposed by Pao [5.8] is a novel single layer ANN
ing arbitrarily complex decision regions. In this network, the initial
N
s o er physical communicat ommunication chann
y mo band-limit nite impulse resp
the amp e a d thn velope d re onssp e not co
th idth of r
ymbol ere . B istor trans
e in sto
t pair al noise, im
a onlin st in
in
si
distortion and reco
techniques. High speed communications channels are often impaired by channel inter symbol
interference (ISI) and additive noise. Adaptive equalizers are required in these
communication systems to obtain reliable data transmission. In adaptive equalizers the main
constraint is training the equalizer. Many algorithms have been applied to train the equalizer,
each having their own advantages and disadvantages. More over the importance of the
channel equalization always keeps the researc
the equa
A
A
reported [5.2], [5.7]. One of the major drawback
tim
w
network (PP
original input pattern is e
and then, this expanded pattern is utilized fo
of this netw
functional link ANN (F
capable of form
60
representation of a patt ns resulting in higher
imensional pat tern and hence, the separability of the patterns becomes possible. The PPN,
for the expansion of the input pattern, in fact, is a subset of the
y
.
t
ern is enhanced by the use of nonlinear functio
d
which uses the polynomials
broader FLANN famil . Applications of the FLANN have been reported for functional
approximation [5.8] and for channel equalization [5.l], [5.4]. It has been shown [5.9] that in
the case of 2-ary PAM signal, BER and MSE performance of the FLANN-based equalizer is
superior than two other ANN structures such as MLP and PPN.
5.2. BASEBAND COMMUNICATION SYSTEM
In an ideal communication channel, the received information is identical to that transmitted.
However, this is not the case for real communication channels, where signal distortions take
place. A channel can interfere with the transmitted data through three types of distorting
effects: power degradation and fades, multi-path time dispersions and background thermal
noise. Equalization is the process of recovering the data sequence from the corrupted channel
samples. A typical base band transmission system is depicted in Fig.5.1., where an equalizer
is incorporated within the receiver.
OutpuTransmitter Filter
Equalizer Receiver Filter
Channel Medium +
Noise
put
The equalization approaches investigated in this thesis are applied to a BPSK (binary phase
shift keying) base band communication system. Each of the transmitted data belongs to a
binary and 180° out of phase alphabet {-1, +1}.
5.3. CHANNEL INTERFERENCE
In a communication system data signals can either be transmitted sequentially or in parallel
across a channel medium in a manner that can be recovered at the receiver. To increase the
data rate within a fixed bandwidth, data compression in space and/or time is required
In
Fig.5.1. A baseband Communication System
61
5.3.1. Multipath Propagation.
Within telecommunication channels multiple paths of propagation commonly occur. In
practical terms this is equivalent to transmitting the same signal through a number of separate
channels, each having a different attenuation and delay [5.13]. Consider an open-air radio
transmission channel that has three propagation paths, as illustrated in Fig.5.2 [14].These
could be direct, earth bound and sky bound.
Fig.5.2 (b) describes how a receiver picks up the transmitted data. The direct signal is
received first whilst the earth and sky bound are delayed. All three of the signals are
attenuated with the sky path suffering the most.
Multipath interference between consecutively transmitted signals will take place if one
signal is received whilst the previous signal is still being detected [5.13]. In Fig.5.2. this
would occur if the symbol transmission rate is greater than1/τ . Because bandwidth
efficiency leads to high data rates, multi-path interference commonly occurs.
Multiple Transmission Paths
Sky Bound Transmitter
Receiver
Direct
Earth Bound
(a) Signal Strength
at Receiver Earth Bound
Direct Sky Bound
τ
Receiver Samples
(b)
Fig.5.2. Impulse Response of a transmitted signal in a channel which has 3 modes of
propagation, (a) The signal transmitted paths, (b) The received samples.
Channel models are used to describe the channel distorting effects and are given as a
62
summation of weighted time delayed channel inputs ( )d n i− .
phase channel is convergent, illustrated by
1 2
0( ) ( ) ( ) ( 1) ( 2) ......
mi
i
H z d n i z d n d n z d n z− − −
=
= − = + − + − +∑ (5.1)
The transfer function of a multi-path channel is given in Equation 5.1. The model coefficients
( )d n i− describe the strength of each multipath signal.
5.4. MINIMUM AND NONMINIMUM PHASE CHANNELS
When all the roots of the model z-transform lie within the unit circle, the channel is termed
minimum phase [5.15] The inverse of a minimum
Equation(5.2): 1
1
( ) 1.0 0.51 1( ) 1.0 0.5
H z z
H z z
−
−
= +
=+
0
12
ii
iz
∞−
=
⎛ ⎞= −⎜ ⎟⎝ ⎠
∑(5.2)
nels are not convergent, as shown in
Equation (5.3)
1 2 31 0.5 0.25 0.125z z z− − −= − + − +
ase chanwhere as the inverse of non-minimum ph
1
0 2i=
2 3.5 0.25 0 5z z z ⎤+ −
( ) 0.5 1.01( ) 1.0 0.5
1.
. 1 0 .12
ii
H z zz
H z z
z z
z
−
∞−
= +
=+
⎡ ⎤⎛ ⎞= −⎢ ⎥⎜ ⎟⎝ ⎠
⎡= −⎣ ⎦
∑ (5.3)
Since equalizers are de inve e c annel istortio
model the channel inverse. The minimum phase hannel has a linear inverse model therefore
a linear equalization solution exists. However, limiting the inverse model to m-dimensions
will that non-linear solutions can provide a
perior inverse model in the same dimension.
phase model, where
nger delays are necessary to provide a reasonable equalizer. Equation (5.4) describes a non-
e. The
latter of these is the more suitable form for a linear filter.
⎢ ⎥⎣ ⎦
signed to rt th h d n process they will in effect
c
approximate the solution and it has been shown
su
A linear inverse of a non-minimum phase channel does not exist without incorporating
time delays. A time delay creates a convergent series for a non-minimum
lo
minimum phase channel with a single delay inverse and a four sample delay invers
63
1( ) 0.5 1.0H z z−= +
1 2 31 1 1 0.5 0.25 0.125 (( ) 1 0.5
z z z z non caH z z
− = = − + − ++
.5. INTERSYMBOL INTERFERENCE
overlapping of the
imensionality of the channel output vector helps characterize the multipath
propagation. This has the effect of not only increasing the number of symbols but also
increases the Euclidean distance between the output classes.
When additive Gaussian noise
)usal 5.4)
4 3 2 11 0.5 0.25 0.125 ( )( )
z z z z z truncated and causalH z
− − − −= − + − +
5
Inter-symbol interference (ISI) has already been described as the
transmitted data. It is difficult to recover the original data from one channel sample
dimension because there is no statistical information about the multipath propagation.
Increasing the d
, η , is pre
Gaussian clusters around the symbol centers. These symbol clusters can be characterized by a
probability density function (pdf) with a noise variance
sent within the channel, the input sample will form
2ησ , where the noise can cause the
f the input samples. Error control coding schemes can be employed in such
cases but these often require extra bandwidth.
5.5.1. Symbol Overlap.
The expected number of errors can be calculated by considering the amount of symbol
interaction, assuming Gaussian noise. Taking any two neighboring symbols, the cumulative
distribution function (CDF) can be used to describe the overlap between the two noise
symbol clusters to interfere. Once this occurs, equalization filtering will become inadequate
to classify all o
characteristics. The overlap is directly related to the probability of error between the two
symbols and if these two symbols belong to opposing classes, a class error will occur.
Figure 2.3 shows two Gaussian functions that could represent two symbol noise
distributions. The Euclidean distance, L, between symbol canters and the noise variance, 2σ ,
can be used in the cumulative distribution function of Equation (5.5) to calculate the area of
overlap between the two symbol noise distributions and therefore the probability of error, as
in Equation (5.6) 2
2
1( ) exp22
x xCDF x dxσπσ−∞
⎡ ⎤= −⎢ ⎥
⎣ ⎦ ∫ (5.5)
64
Area of overlap = Probability of error
Fig.5.3. Interaction between two neighboring symbols.
( ) 22LP e CDF ⎛ ⎞= ⎜ ⎟
⎝ ⎠ (5.6)
Since each channel symbol is equally likely to occur, the probability of unrecoverable errors
occurring in the equalization space can be calculated using the sum of all the CDF overlap
between each opposing class symbol. The probability of error is more commonly described
as the BER. Equation(5.7)describes the BER based upon the Gaussian noise overlap, where
spN is the number of symbols in the positive class, mN is the number of number of symbols
in the negative class and iΔ , is the distance between the thi positive symbol and its closest
neighboring symbol in the negative class.
1
2( ) log2
in
isp m n
BER CDFN N
σσ=
= ⎢ ⎥spN ⎤⎛ ⎞Δ⎡
⎜ ⎟+⎢ ⎥⎝ ⎠
ymbol
ai
eeps the research going on to introduce new algorithms to train
the equalizer. The optimal BER equalization
sequence estimator (MLSE) on the entire transmitted data sequence [5.18].A more practical
⎣ ⎦∑ (5.7)
5.6. CHANNEL EQUALIZATION
High speed communications channels are often impaired by channel inter s
interference (ISI) and additive noise. Adaptive equalizers are required in these
communication systems to obtain reliable data transmission. In adaptive equalizers the m n
constraint is training the equalizer. Many algorithms have been applied to train the equalizer,
each having their own advantages and disadvantages. More over the importance of the
channel equalization always k
performance is obtained using a maximum likelihood
65
MLSE would operate on smaller data sequences but these can still be computationally
xpensive, they also have problems tracking time-varying channels and can only produce
quences of outputs with a significant time delay. Another equalization approach
plements a symbol-by-symbol detection procedure and is based upon adaptive filters. The
mbol-by-symbol approach to equalization applies the channel decision
lassifier that separates the symbol into their respective classes. Two types of symbol-by-
mbol equalizers are examined in this thesis, the transvers decision feedback
q linear filters, LTE
nd LDFE, with a simple FIR structure. The ideal equalizer will model the inverse of the
hannel model but this does not take into account the effect of noise within the channel.
e
se
im
sy output samples to a
c
sy al (TE) and
e ualizer (DFE). Traditionally these equalizers have been designed using
a
c
Noise
A basic block diagram of channel equalization is shown in Fig. 5.4.The transmitted signal
X(n) pass through the channel .The block N.L accounts for the nonlinearity associated with
the channel. q(n) is the Gaussian noise added through the channel. The equalizer is placed at
the receiver end. The output of the equalizer is compared with the delayed version of the
transmitted signal to calculate the error signal e(n),which is used by the update algorithm to
update the equalization coefficient such that the error becomes minimum.
∑
Channel
Update Algorithm
+ N.L.
+
y (n)
d(n)
_
X(n)
e(n)
Equalizer
a(n) b(n)
r(n)
q(n)
Delay
N.L. =Nonlinearity
Fig.5.4. Block diagram of Channel Equalization
66
5.6.1. Transversal Filter
The transversal equalizer uses a time-delay vector, ( )Y n (Equation (5.8)), of channel
output samples to determine the symbol class. The {m} TE notation used to represent the
transversal equalizer specifies m inputs.
( ) [ ( ), ( 1), .... , ( ( 1))]Y n y n y n y n m= − − − (5.8)
The equalizer filter output will be classified through a threshold activation device (Fig. 5.5)
so that the equalizer decision will belong to one of the BPSK states i.e. 1 or -1.
z-1 z-1 z-1 z-1y(n) y(n-2) y(n-3) y(n-1) y(n-4) w2(n) w3(n) w4(n) w1(n) w5(n) Adder
d(n)
Fig.5.5. Linear Transversal Filter
y(n)= Received samples
d(n)=Equalized output
1( ) 1.0 0.5H z z−= +Considering the inverse of the channel that was given in Equation (5.3),
is is an infinitely long convergent linear series: 0
1 1( ) 2
ii
i
zH z
∞−
=
⎛ ⎞= −⎜ ⎟⎝ ⎠
∑th . Each coefficient of
this inverse model can be used in a linear equalizer as a FIR tap-weight. Each tap-dimension
will improve the accuracy; however, high input dimensions leave the equalizer susceptible to
noisy samples. If a noisy sample is received, this will remain within the filter affecting the
output from each equalizer tap. Rather than designing a linear equalizer, a non-linear filter
can be used to provide the desired performance that has a shorter input dimension; this will
reduce the sensitivity to noise.
67
5.7. SIMULATION RESULTS
BP algorithm and a linear FIR equalizer with LMS
algorithm as discussed in section 2.6.. Th
e simulations
e assign the same input signals to the two neural-network based equalizer structures
onsidered. To the channel output a zero mean white Gaussian noise of SNR 30dB was
dded. Th ei s is to s the SNR equal to the
c ocal of noise varian t of the equ
Fo NN th zer h branch each s expanded to five branches
sing trigonometric functions. The output of the FLANN contains a tanh(.) function. Total
umber of weights used for FLANN is 31 (including one bias term).In case of CFLANN in
e first stage each branch is expanded into five branches. The output of first s gain
xpanded into three terms. The number of weights used in the first stage is 19 (including one
ias term). The number of weights used in the second stage is 4 (including one bias term).
otal number of weights used for FLANN is 23. At each stage CFLANN contains tanh(.)
nction. The l as 0.03 for both the structure. To study the BER
erformance, each of the equalizer structures was trained with 5000 iterations for optimal
eight solution. After comple qualizer was carried out. The
ER was calculated 105 data samples.
The channel considered here has the normalized transfer function given in z-transform
the nonlinearity is plotted in Fig.5.6.From the BER plot it is
bserved that the performance of CFLANN equalizer is better than those of FLANN and
MS based equalizers.
Extensive simulation studies have been carried out for channel equalization problem as
described in Fig 5.4. using the two discussed ANN structures (FLANN and CFLANN), as
discussed in section 3.4. and 3.5., with
e digital message was with binary phase shift
keying (BPSK) signal constellation and in the form [-1 1] in which each symbol was
obtained from a random distribution. To obtain meaningful comparisons from th
w
c
a e rec ved ignal power normalized unity so a to make
re ipr ce at the inpu alizer.
r FLA e equali ave six es and branch i
u
n
th tage is a
e
b
T
fu earning rate is chosen
p
w tion of the training, testing of the e
B
form: 1 2( ) 0.26 0.93 0.26H z z z− −= + + (5.9)
The following types of nonlinearity were introduced in the channel.
(1) 2 3( ) ( ) 0.2 ( ) 0.1 ( )b n a n a n a n= + − (5.10)
(2) 2 3( ) ( ) 0.2 ( ) 0.1 ( ) 0.5cos( ( ))b n a n a n a n a nπ= + − + (5.11)
The BER performance for both
o
L
68
10 12 14 16 18 20 2210-5
10-4
10-3
10-2
10-1BER plot
lmsflanncflann
BE
R---
--->
SNR in dB----->
(a)
10 11 12 13 14 15 16 17 18 19 2010-5
10-4
10-3
10-2
10-1BER plot
lmsflanncflann
BE
R---
-->
SNR in dB----->
plot (a) For nonlinearity(5.10) (b) For nonlinearity(5.11)
(b)
Fig.5.6. BER
69
5.8. SUMMARY
To compensate the effect of ISI and other type noises on the bits, when transmitted through
the channel, an equalizer is placed at the receiver end. For the equalization of highly
nonlinear systems the number of branches in the FLANN increases. Even some cases give
poor performance. To decrease the number of branches and increase the performance a two-
stage FLANN is described in this chapter. Here the output of the first stage again undergoes
functional expansion. From the BER plots it can be observed that for nonlinear channels the
performance of CFLANN equalizer is considerably better than those of FLANN and LMS
based equalizers.
70
6. CONCLUSIONS
ctures, MLP, FLANN and CFLANN, are discussed. Due to multilayer
ructure of the MLP, its convergence is slow. On the otherhand FLANN is a single layer
ructure with functionally mapped inputs. Here, the initial representation of a pattern is
nhanced by using nonlinear function and thus the pattern dimension space is increased. The
nctional link acts on an element of a pattern or entire pattern itself by generating a set of
nearly independent function and then evaluates these functions with the pattern as the
rgument. Hence separation of the patterns becomes possible in the enhanced space. While
onstructing an artificial neural network the designer is often faced with the problem of
hoosing a network of the right size for the task to be carried out. To overcome this problem
GA based pruning strategy and weight updation algorithm is used.
Transmission bandwidth is one of the precious resources in digital communication
stems. To achieve better use of this resource, signals are commonly transmitted through
and-limited channels. So the received signals inevitably affected by inter symbol
terference(ISI). A channel equalizer is used to recover the transmitted data from the
ceived signals. If the nonlinearity associated with the system or channel is high the number
f branches in the FLANN increases. Even some cases give poor performance. To decrease
e number of branches and increase the performance CFLANN is used.
.2. SCOPE FOR FUTURE WORK
In this thesis CFLANN is used for only FIR system and channels. This can be extended to
finite impulse response (IIR) systems and channels.
In pruning technique GA is used which is a very slow process and has a very high
omputational complexity. Hence research can be done to find out a faster method for
runing.
6.1. CONCLUSIONS
The aim of this thesis is to find a proper artificial neural network (ANN) model for adaptive
nonlinear system identification and channel equalization. The prime advantages of using
ANN models are their ability to learn based on optimization of an appropriate error function
and their excellent performance for approximation of nonlinear functions. .
Three ANN stru
st
st
e
fu
li
a
c
c
a
sy
b
in
re
o
th
6
in
c
p
71
REFERENCES
Chapter. 1 [1.1] Narendra K.S. and Annaswamy A.M., Stable Adaptive Systems. Englewood Cliffs,
NJ: Prentice - Hall, 1989.
[1.2] Proakis J. G.,
Digital Communications, third edition, McGraw Hill, 1995.
[1.3] Chen S.,Billings, S. A. and Grant, P. M.,“Nonlinear system identification using
neural networks”, Int. J. Contr., vol. 51, no. 6, June 1990, pp. 1191-1214.
[1.4] Narendra K. S. and Parthasarathy K., “Identification and control of dynamic systems
using neural networks”,
IEEE Trans. on Neural Networks, vol. 1, Mar, 1990, pp. 4-27.
[1.5] Widrow B. and. Stearns S. D., Adaptive Signal Processing, Second Edition, Pearson
Education, 2002.
[1.6] Qureshi S., “Adaptive equalization”, IEEE Communications Magazine, Mar 1982, pp
9-16.
[1.7] Siller C., “Multipath propagation”, IEEE communications magazine, vol.22, no.2,
Feb.1984, pp.6-15.
[1.8] Chen S., Mulgrew B. and McLaughlin S., “Adaptive Bayesian Equalizer with
Decision Feedback”, IEEE Trans. on Signal Processing, vol.41, no.9, Sept. 1993,
pp.2918-2927.
[1.9] Patra J. C., Pal R. N., Chatterji B. N. and Panda G., “Identification of nonlinear
dynamic systems using functional link artificial neural networks”, IEEE Trans. On
Systems, Man and Cybernetics-part B: Cybernetics, vol. 29, no. 2, April 1999, pp.
254-262.
Chapter. 2 [2.1] Ljung L., System Identification. Englewood Cliffs, NJ: Prentice-Hall, 1987.
[2.2] Akaike H.,“A new look at the statistical model identification”. IEEE Trans. on
Automatic Control, AC-19:716-723, 1974.
[2.3] Widrow Bernard & D.Streans Samuel, Adaptive Signal Processing, Second Edition,
Pearson Education, 2002.
72
[2.4] Haykin Simon, Adaptive Filter Theory, Pearson Education Publisher, 2003.
[2.5] Wiener N., “Extrapolation, Interpolation, and Smoothing of Stationary Time Series
with Engineering Applications”, MIT Press, vol.15.pp.18-25, Cambridge, MA, 1949.
n N., “The Wiener RMS (root-mean-square) error criterion in filter design and [2.6] Levinso
prediction”, Math Phys., pp. 261-278, Vol.25, 1947.
Fan H. and Jenkins, W.K. "A[2.7] New Adaptive IIR Filter", IEEE Trans. on Circuits and
Systems, vol. CAS-33, no. 10, Oct. 1986 , pp. 939-947.
[2.8] Ho K.C. and Chan Y.T., "Bias Removal in Equation-Error Adaptive IIR Filters", IEEE
Trans. on Signal Processing, vol. 43, no. 1, Jan. 1995.
[2.9] Cousseau J.E. and Diniz P.S.R., "New Adaptive IIR Filtering Algorithms Based on the
Steiglitz-McBride Method", IEEE Trans Signal Processing, vol. 45, no. 5, May 1997.
[
regational Gradient Descent", Mathematics of Control, Signals, and
2.10] Blackmore K.L., Williamson R.C., Mareels I.M.Y and Sethares, W.A., "Online
Learning via Cong
Systems vol.10, pp. 331-363,1997.
Haykin[2.11] Simon, “Adaptive Filter Theory”, Third Edition, Prentice-Hall Inc., Upper
[2.12] Nambiar, R., “Learning Algorithms: Theory and Applications
in Signal Processing, Control, and Communications”, CRC Press
Saddle River, NJ, 1996.
Mars P., Chen J.R. and
, vol.10,pp.18-25Inc.,
Chapter. 3 [3.1] Haykin S.,“
1996.
NeuralNetworks: A ComprehensiveFoundation”, Pearson Education Asia,
S., Lewis F.L.,“Identification of a class of nonlinear dynamical systems
ultilayered neu
2002.
[3.2] Jagannathan
using m ral networks”, IEEE International Symposium on Intelligent
Control, Columbus, Ohio, USA, 1994), pp. 345–351.
a K. S., and Parthasarathy K., “Identification and control of dynamical system [3.3] Narendr
using neural networks,” IEEE Trans. on Neural Networks, vol. 1, no.1, Mar. 1990,
[3.4] P Neural Network
pp. 4-26.
ao Y. H., Adaptive Pattern Recognition and , Reading, MA, Addison
Wesley, 1989, Chapter 8, pp.197-222.
73
[3.5] Patra J.C., Pal R.N., Chatterji B.N., and Panda G., “Identification of Nonlinear
Dynamic Systems Using Functional Link Artificial Neural Networks”, IEEE Trans.
on Systems, Man and Cybernetics-Part B : Cybernetics Vol. 29, No.2, Apr.
[3.6] P Identification Using Chebyshev
1999,pp.254-262.
atra J.C., Kot A.C.,“Nonlinear Dynamic System
Functional Link Artificial Neural Networks”, IEEE Trans. on Systems Man and
Cybernetics-Part B. Cybernetics.Vol.32,No.4,Aug. 2002,pp.505-510.
D ans. on circuits and systems-I:
[3.7] Wang J.and Chen Y., “A Fully Automated Recurrent Neural Network for Unknown
ynamic System Identification and Control”, IEEE Tr
Regular papers, Vol.53, No.6, June 2006 pp.1363-1372..
artin T. Hagan, Howard B. Demuth., [3.8] M Neural Network Design, Thomson Learning
[3.9] D
2003.
ayhoff E.J.,“Neural Network Architecture – An Introduction” Van Norstand Reilold,
ew York, 1990. N
[3.10] Bose N.K., and Liang P., “Neural Network Fundamentals with Graphs, Algorithms,
Applications”, TMH Publishing Company Ltd, 1998.
idrow Bernard and D.Streans Samuel. [3.11] W Adaptive Signal Processing, Pearson
Education Publisher.
[3.12] Park G.H. and Sobjic D.J., “Learning and Generalization Characteristics of
dom Vector Function”, Neuro Computation
Pao Y.H.,
the Ran , vol.6, pp.163-180, 1994.
l Equalization”, Signal Processing 43(1995)
[3.13] Patra J.C., and Pal R.N., “A Functional Link Artificial Neural Network for Adaptive
Channe , vol.43, no.2, pp.181-195May
[3.14] VDT and Its
1995.
Mishra S. K. and Panda G., “A Novel Method for Designing L
Comparison with Conventional Design”, SAS 2006-IEEE Senosrs Applications
Symposium,pp.129-134, Houstan, Texas, USA, 7-9 Feb.2006
[4.1] nd Artificial Systems
Chapter. 4 Holland J.H., Adaptation in Natural a . University of Michigan
Press, Ann Arbor, 1975.
74
[4.2] Holland J. H., “Outline for a logical theory of adaptive systems,” J. ACM, vol. 3, pp.
297 - 314, July 1962; also in A. W. Burks, Ed., Essays on Cellular Automata, Univ.
Illinois Press, 1970, pp. 297-319.
DeJong K. A., “A[4.3] n analysis of the behavior of a class of genetic adaptive systems,”
Ph.D. dissertation (CCS), Univ. Mich., Ann Arbor, MI, 1975.
Fitzpatrick J. M., Grefenstette J. J., and Gucht D. Van, “Image registration by genetic
search,”
[4.4]
in Proc. IEEE Southeastcon ’84, 1984, pp.460-464.
[4.5] Goldherg D. E. and Lingle R., ‘‘Alleles, loci and the traveling salesman problem,” in
Proc. Int. Conf. Genetic Algorithms and Their Appl., pp. 154-159. 1985
Grefenstette J. J., Gopal R., Rosmaita B. and VanGucht[4.6] D., “Genetic algorithms for
the traveling salesman problem,” in Proc. Int. Conf Genetic Algorithms and Their
Applications, pp. 160-165, 1985.
rints from the Proc. 19th annual Pittsburgh Conf Modeling and
[4.7] Das R. and Goldberg D.E., “Discrete-time parameter estimation with genetic
algorithms,” prep
Simulation, 1988.
nf Acoustics, Speech, Signal
[4.8] Etter D. M., Hicks M. J. and. Cho K. H, “Recursive adaptive filter design using an
adaptive genetic algorithm,” Proc. IEEE Int. Co
Processing, vol. 2, pp. 635-638, 1982.
Goldberg D. E. System[4.9] identification via genetic algorithm, unpublished manuscript,
Univ. Mich., Ann Arbor, MI, 1981.
Kristinsson K. and Dumont G. A.,” Genetic algorithms in system identif[4.10] ication,”
Third IEEE Int. Symp. Intelligent Contr., Arlington, VA, pp. 597-602, 1988.
Smith T. and DeJong K.A., “Genetic Algorithms applied to the calibration of
inform
[4.11]
ation driven models of US migration patterns,” in Proc. 12th Annu. Pittsburgh
Conf Modeling and Simulation, 1981, pp. 955-959, 1981.
[4.12] Giles C. Lee and Christian W. Omlin, “Pruning Recurrent Neural Networks for
Improved Generalization Performance”, IEEE Trans. on Neural Networks Vol.5,
No.5, Sept.1994,pp.848-851.
[4.13] Markel J.D., “FFT pruning”, IEEE Trans. on Audio and Electro Acoustics, Vol.AU-
4, Dec.1971,pp.305-311.
[
l Letter Reduction”, IEEE conference
19, No.
4.14] Jearanaitanakij K.and Pinngern O., “Hidden Unit Reduction of Artificial Neural
Network on English Capita , pp.1-5. June 2006.
75
4.15] Chung T., Leung H., “A genetic algorithm approach in optimal capacitor selection with
harmonic distortion considerations”, International Journal of Electrical Power & Energy
Systems, vol.21, no.8, Nov. 1999, pp
[
.561-9.
[4.16] Fogel D., “What is evolutionary computing”, IEEE spectrum magazine, Feb. 2000,
pp.26-32.
[4.17] Goldberg D., Genetic algorithms in search optimization , Addison-Wesley, 1989.
C[5.1] transmission
hapter. 5 Arcens S., Sueiro, J. C. and Vidal, A. R. F. "Pao networks for data
equalization," Proc. Int. Jt. Conf. on Neural Networks, Baltimore, Maryland, pp.
11.963-11.968, June 1992.
Chen S., Gibson G.J., Cowan, C
[5.2] .F.N. and Grant P. M., “Adaptive equalization of
finite nonlinear channels using multilayer perceptrons,” EURASIP Signal Processing
Jnl., V01.20, 1990, pp.107-119.
Chen S., Gibson G[5.3] .J. and Cowan C.F.N., "Adaptive channel equalization using a
polynomial perceptron structure," Proc.IEE, Pt.I, Vo1.137, pp.257-264. October 1990.
Gan W. S., Saraghan J. J., and Durrani T. S., "New functional-link based equalizer", [5.4]
Electronics Letters, Vol.28, No. 17, 13 Aug. 1992, pp. 1643-1645.
[5.5] Haykin S., Adaptive Filter Theory, Prectice Hall, Englewood Cliffs, NJ. 1986,
Chapter 5, pp.194-268.
[5.6] Hornik K., Stichcombe M. and White H.,"Multilayer feed forward networks are
universal approximators", IEEE Trans. on Neural Networks, Vol. 2, 1989, pp. 359-
[5.7]
366.
Meyer M. and Pfeiffer G.,"Multilayer perceptron based decision feedback equalizers
for channels with intersymbol interference,"Proc. IEE, Pt- I ,Vol 140, No 6, pp 420-
[5.8]
424,June 1992.
Pao Y. H. Adaptive Pattern Recognition and Neural Networks, Reading, MA,
Addison Wesley, 1989, Chapter 8, pp. 197-222, Dec 1993.
P Signal Processing Jnl
[5.9] Patra J. C. and Pal R. N.,"A functional link artificial neural network for adaptive
channel equalization", EURASI ., Vol. 43, No. 2, 1995
[5.10] ,
madaline and back propagation," Proc. IEEE
.pp.662-670.
Widrow B. and Lehr, M. A.,"30 years of adaptive neural networks: perceptron
, vo1.78, No.9, September 1990,
pp.1415-1442.
76
[5.11] Lucky R. W., "Automatic equalization for digital communications", Bell Syst. Tech.
Journal, vol. no. 44, April 1965, pp. 547-588.
Proakis J. G., [5.12] Digital Communications, third edition, McGraw Hill, 1995.
[5.13] Qureshi S., “Adaptive equalization”, Proceedings of the IEEE, vol.73, no.9, pp.1349-
1387, Sept. 1985.
[5.14] Widrow B., Stearns S., Adaptive signal processing, Chap.9, Prentice-Hall Signal
processing series, New Jersey 1985.
i O., “Adaptive processing, the least mean squares approach with [5.15] Macch
applications in transmission”, John Wiley and Sons, West Sussex. England, 1995.
Siu S., “Non-linear Adaptive Equalization based on multi-layer perceptron
Architecture”,
[5.16]
Faculty of Science, University of EdinburghPh.D. Thesis, , 1990.
77