
Modeling EM Structures in the Neural Network Toolbox of MATLAB

Zbyněk Raida

Department of Radio Electronics, Brno University of Technology, Purkyňova 118, 612 00 Brno, Czech Republic

Tel: +420-5-4114-9114; Fax: +420-5-4114-9244; E-mail: [email protected]

Abstract

Neural networks are systems that can be trained to remember the behavior of a modeled structure at given operational points, and that can be used to approximate the behavior of the structure outside of the training points. These neural-net approximation abilities are demonstrated in the modeling of a frequency-selective surface, a microstrip transmission line, and a microstrip dipole. Attention is given to the accuracy and to the efficiency of neural models. The association between neural models and genetic algorithms, which can provide a global design tool, is discussed. Portions of MATLAB code illustrate the descriptions.

Keywords: Feed-forward neural networks, genetic algorithms, planar transmission lines, frequency selective surfaces, microstrip antennas, dipole antennas, modeling, optimization methods

1. Introduction

An artificial neural network (ANN) is a system that is built in accordance with the human brain. Therefore, an ANN consists of a few types of many, simple, nonlinear functional blocks, which are called neurons. Neurons are organized into layers, which are mutually connected by highly parallel synaptic weights. The ANN exhibits a learning ability: synaptic weights can be strengthened or weakened during the learning process, and in this way, information can be stored in the neural network [1, 2].

Due to its nonlinearity, the ANN is able to solve even such types of problems as are unsolvable by linear systems. Due to the massive parallelism, the ANN exhibits a very high operational speed. Due to the learning ability, the ANN can behave as an adaptive system, which automatically reacts to changes in its surroundings. Also, due to the presence of only a few types of functional blocks in the structure, the ANN is suitable for implementation in hardware (VLSI circuits) or software (an object-oriented approach) [1, 2].

ANNs have been intensively exploited in electrical engineering since the eighties, when sufficient processor computational power and sufficient computer-memory capacity became available. ANNs have been applied in pattern-recognition systems, and have been exploited for input-output mapping, for system identification, for adaptive prediction, etc. Dealing with antenna applications, ANNs have been used as adaptive controllers in adaptive antenna arrays [3], have been applied in direction-finding arrays [4], and have been exploited for the modeling and optimization of antenna systems.

Concentrating on the neural modeling of antennas and microwave structures, ANNs have been applied to the calculation of the resonant frequencies of microstrip antennas [5], to the computation of the complex resonant frequencies of microstrip resonators [6], to the modeling of microwave circuits [7, 8], to the reverse modeling [engineering] of microwave devices [9], to the calculation of effective dielectric constants of microstrip lines [10], etc. Moreover, neural networks have been applied to the optimization of microwave structures and antennas [11, 12].

The exploitation of neural-network techniques in electromagnetics is even described in a few monographs. In [13], ANNs are shown being applied in RF and mobile communications, in radar and remote sensing, in scattering, to antennas, and in computational electromagnetics. In [14], ANNs are described as being used for modeling interconnects and active devices, and for circuit analysis and optimization.

Moreover, MATLAB users can obtain a Neural Network Toolbox, which is ready for the immediate exploitation of ANNs for modeling, optimization, etc., and which is complemented by a well-written user's guide [15].

In this paper, a brief introduction to neural networks is given in Section 2. The reader can become familiar with the terminology of neural networks, can meet various types of architecture, different types of neurons, and selected methods of training ANNs. Feed-forward neural networks are introduced here as an appropriate tool for frequency-domain modeling of EM structures. Then, Section 2 gives a comparison between the genetic training of neural networks and gradient training. Moreover, a software implementation of neural networks using the MATLAB Neural Network Toolbox is described here. It is concluded that the Neural Network Toolbox is an efficient tool for frequency-domain modeling of EM structures. Therefore, the rest of the paper concentrates only on feed-forward ANNs, implemented in this Toolbox. Section 3 discusses the frequency-domain neural modeling of a selected frequency-selective surface (an FSS, consisting of rectangular patches on an infinite plane); of a selected transmission line (a TL, made of shielded microstrip in layered media); and of a selected microwave antenna (an MA, a microstrip dipole on a dielectric substrate). These structures are first modeled using numerical methods. In the second step, the numerical results obtained are exploited as teachers that can train neural nets. Finally, neural models are compared with the numerical models. In this section, an original description of the influence of the number of training patterns and their position in the modeling space on the model accuracy is presented. Section 4 deals with the exploitation of ANNs for the optimization of the above three structures. The approach presented combines neural models and genetic algorithms in order to reveal regions suspected of containing a global minimum. The conclusion tries to formulate a general algorithm that enables an efficient development of accurate neural models.

2. Neural Networks

An artificial neural network (ANN) is a human-made system that is of a structure similar to the human brain, and that operates in a similar way. An ANN consists of a huge number of relatively primitive nonlinear functional blocks (neurons), which simply process a certain number of input signals into an output signal. All the neurons in the network are usually of the same type.

Single neurons are organized into layers. In a simple, typical ANN structure, the inputs of every single neuron in the layer n are connected with the outputs of all neurons in the layer n − 1. The outputs of all single neurons in the layer n are led to the inputs of every single neuron in the layer n + 1 (Figure 1).

The first layer of neurons (n = 0) is called the input layer. The input layer primarily distributes input signals to the inputs of the next (second) layer, without any further processing. In the second layer, input signals are processed in a given way, and then they continue their way to the following layers. The last layer in the network is called the output layer. Layers between the input layer and the output layer are called hidden layers (Figure 1).

Neural networks can be trained. During the training period, synaptic weights are strengthened or weakened so that the ANN can react to a given input pattern with a prescribed output response. The strength of the connection between every two neurons is described by a synaptic weight. At the beginning of the training, synaptic weights are set to random values. In a typical, simple ANN structure, input patterns are sequentially introduced to the inputs of the ANN, and synaptic weights are changed in order to reach the corresponding output patterns with the desired accuracy, or until a prescribed number of iteration steps is performed.

Desired output responses for given input patterns can be obtained from measured results or from numeric-modeling results. The structure is analyzed for a certain number of unique combinations of parameters of the structure (input patterns), and then the ANN is trained, using the results of the analysis as desired responses. The computationally modest neural model then approximates a numeric model.

The learning is influenced not only by synaptic weights, but even by the values of biases. A bias, b, represents a weight that does not couple an output and an input of two neurons, but which is multiplied by a unitary signal and introduced to the neuron. A bias sets a certain level of the output signal of a neuron that is independent of the input signals.


When the training phase is finished (when the ANN can react to input patterns by producing the proper output patterns), the ANN can be used for modeling. In the modeling period, the ANN exploits the information that was inserted into the ANN during training in the form of changed values of synaptic weights and biases.

At present, many types of neural networks have been developed. These ANNs differ in their architecture (the connection of single neurons into the network), they can vary in the type of neurons, they can exploit different ways of learning, etc. In order to get an orientation among the various types of ANNs, a classification of ANNs according to the above three criteria is given in the following paragraphs. ANNs are classified here in accordance with [2].

2.1 Architecture

There are three basic ways of connecting single neurons into the network (i.e., three basic types of architecture are at one's disposal). Nevertheless, only the following two architectures can be used for modeling.

Figure 1. An example of the structure of an ANN. The symbol w_ji^(n) denotes the synaptic weight between the output of the ith neuron in the layer (n − 1) and the input of the jth neuron in layer n. The symbol b_i^(n) denotes the threshold of the ith neuron in layer n.

Figure 2. A block diagram of the process of training a neural network.


A quite different model structure is used in the adaptive nonlinear neuron. The adaptive neuron consists of an adaptive finite-impulse-response (FIR) filter, completed by a nonlinear threshold function at the output (Figure 4c). The adaptive control, which is the inherent part of the neuron, changes the neuron's synaptic weights in order to minimize the squared error (i.e., the difference between the actual response of the neuron and the desired response). Minimization is performed using standard algorithms (steepest descent, conjugate gradient, the Newton method, etc.).

Figure 3a. The feed-forward neural-network architecture.

In all of the above models of a neuron, a proper type of nonlinear threshold function has to be selected. If continuous systems are going to be modeled, a bipolar sigmoid is usually used in the hidden layers, because bipolar output signals of neurons increase the number of degrees of freedom when synthesizing the desired response.

In the output layer, the bipolar sigmoid is used at the outputs which are associated with bipolar variables (e.g., the phase can be both positive and negative). If the output is associated with a unipolar variable (e.g., the length of the dipole can only be positive), the respective neuron should contain a unipolar sigmoid as the threshold function.

Figure 3b. The recurrent neural-network architecture.

In the framework of this paper, the McCulloch-Pitts model of the neuron is used when genetic algorithms (Section 2.3.1) train the ANN, and an adaptive nonlinear neuron is exploited when gradient training is applied (Section 2.3.2).

In the feed-forward ANN (Figure 3a), input signals are directly conveyed from the inputs of the network to its outputs. Since no feedback is present in the network, the feed-forward ANN exhibits a static behavior; in fact, it simply maps patterns from the input space to responses in the output space. Feed-forward ANNs are described by nonlinear algebraic equations.

In contrast, the recurrent ANN includes feedback in its structure (Figure 3b). Thanks to this feedback, recurrent networks exhibit inertial behavior. Therefore, recurrent ANNs are not convenient for the simple mapping of patterns, but they can be used to model dynamic processes. Recurrent networks are described by nonlinear differential equations.

In this paper, we are going to work on static modeling, and, therefore, the feed-forward architecture is the only one used.

Figure 4a. A model of a McCulloch-Pitts neuron.

2.2 Neuron

Neurons are simple functional blocks, which are placed into the empty boxes in the architecture scheme (Figure 3).

The simplest type of neuron, the McCulloch-Pitts model (Figure 4a), consists of a summing junction and of a nonlinear threshold function (an activation function, limiting function, cell body, etc.). The summing junction performs the addition of all the output signals of the foregoing layer, which are multiplied by the respective synaptic weights of the neuron, and adds the neuron's bias to the result. Then, the result is constrained, using the threshold function.

Figure 4b. A model of a Grossberg neuron.

In more complex models of the neuron, further functional blocks complete the basic McCulloch-Pitts model. In the Hopfield model, an integrator is inserted between the summing junction and the threshold function. The integrator should model the process of the integration of nervous pulses in the real human neuron. The Grossberg model of the neuron adds feedback to the Hopfield model. The feedback should model the process of forgetting, which can be controlled using the parameter γ (Figure 4b).

Figure 4c. A model of an adaptive nonlinear neuron.


2.3 Training

During the training (learning), new information is stored into an ANN by changing the synaptic weights. There are many methods of training; however, Hebbian and supervised training are applied in most cases.

Hebbian training simulates the natural training process. The value of a synaptic weight lying between two simultaneously activated neurons is increased, and the values of weights connecting neurons that are not activated at the same time are decreased (the process of forgetting).

Supervised learning was already described in the above paragraphs (and was depicted in Figure 2). During the learning process, input patterns are sequentially introduced to the inputs of an ANN, and the actual output responses are compared with the desired, known output patterns. Then, the error (the difference between the output response and the output pattern) is computed, and the synaptic weights plus biases are changed in order to minimize the error for all the training patterns.

In the following paragraphs, we are going to concentrate on supervised learning. First, global learning using genetic algorithms is described, and then a local gradient-based training is introduced.

2.3.1 Global Training Algorithms

Since ANNs can represent complicated nonlinear systems, the error surface can contain many local minima. The error surface is a function describing the dependence of the squared difference between the actual responses and the desired responses on the values of the synaptic weights. Global optimization has to be used in reaching the global minimum of the error surface. Since genetic algorithms have become some of the most popular global-optimization techniques, we choose them as representative of global methods.

Genetic algorithms are based on the concept of natural selection (the best individuals of today's generation are chosen in order to become parents of a new, better generation). Let us assume a generation of individuals with different properties, and let us judge the quality of those individuals according to the difference between their real parameters and the desired parameters. Then a new, better generation can contain individuals that more closely approach the desired parameters.

If the genetic optimization is going to be applied to the training of an ANN, then the whole neural network has to be considered as a single individual. All the weights and biases (in binary form) of all the neurons creating the ANN have to be understood as genes, and all the genes of the ANN together have to form an individual. All the input patterns are sequentially introduced to the input of the ANN, and the total squared error is computed. This total squared error serves as a measure of the quality of the ANN. Having more ANNs (more individuals in the generation) at one's disposal, individuals can be ranked by the proper selection strategy, and the crossover of individuals can create a new generation of ANNs. Since new values of weights and biases are created by crossover, no adaptive algorithm is needed for their modification. Therefore, the simplest neurons (McCulloch-Pitts) can be used.
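As an illustration of the binary coding just described, the following sketch shows one conventional way of mapping a binary chromosome back to real-valued weights and biases. The function name, the fixed-point scaling to an interval <wmin, wmax>, and the gene layout are assumptions made for this example; they are not the decode routine of the downloadable m-files.

% Decode one individual (a row vector of 0/1 genes) into real parameters.
% Every parameter is coded by wbit bits and mapped linearly to <wmin, wmax>.
function params = decode_individual(chrom, wbit, wmin, wmax)
nPar   = floor(length(chrom) / wbit);                            % number of coded parameters
params = zeros(1, nPar);
for n = 1:nPar
    bits      = reshape(chrom((n-1)*wbit+1 : n*wbit), 1, []);    % bits of the nth parameter
    intval    = sum(bits .* 2.^(wbit-1:-1:0));                   % binary -> integer
    params(n) = wmin + (wmax - wmin) * intval / (2^wbit - 1);    % integer -> real value
end

The weights and the biases of the whole network can then be cut out of params in the order in which they are stored in the individual (weights first, then thresholds, as described above).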

The genetic training of an ANN as described can be illustrated by building a neural model of a frequency-selective surface (FSS), which consisted of rectangular, perfect electrical conductor (PEC) elements (Figure 5). The elements were placed on an infinite plane of the same electrical parameters as the surrounding environment. Both the dimensions of all the elements and their spacing were identical. For simplicity, the height of the conductive element was fixed to the value a = 11 mm, and the height of the cell was assumed to be A = 12 mm. The width of the conductive element was within b ∈ <1 mm, 7 mm>, and the width of the cell could be within B ∈ <10 mm, 22 mm>.

In the first step, a certain number of FSSes, differing in b and B, were analyzed using the spectral-domain Method of Moments [24]. In the analysis, a linearly polarized plane wave (with the electric intensity vector parallel to y), arriving from the direction normal to the FSS, was assumed. As a result, we obtained the frequency, f2, of the first maximum of the reflection-coefficient modulus of the Floquet mode (0,0), and the frequencies f1 and f3 for a 3-dB decrease of the reflection-coefficient modulus (f1 < f2 < f3). Doublets [b, B] formed the input patterns, and the respective triplets [f1, f2, f3] were the desired output responses. Therefore, the input layer of the ANNs consisted of two neurons, and the output layer contained three neurons (Figure 6). Since all the output quantities (f1, f2, f3) were positive numbers, the output neurons should contain an asymmetric sigmoid as the nonlinearity.

While the structure of the input layer and the structure of the output layer were given by the input patterns and output responses, the number of hidden layers and the number of neurons in the hidden layers were not fixed. The nonlinearity in the hidden neurons was symmetric, in order to yield flexibility in forming input patterns into output responses.

The above-described principles can be simply programmed in MATLAB. The M-files described in this paper can be downloaded via http://www.feec.vutbr.cz/~raida (follow the link "Artificial Neural Networks in Antenna Techniques").

Figure 5. A frequency-selective surface (FSS) consisting of perfectly electrically conductive (PEC) rectangles. The rectangles are assumed to be equidistantly placed on an infinite plane of the same electrical parameters as the surrounding environment.


Figure 6. The structure of the neural model of the FSS. The numbers of neurons in the input layer and the output layer are given; the number of hidden layers and the number of neurons in the hidden layers are to be determined.

Figure 7a. A comparison of the convergence properties of the genetic training algorithms: the best learning curve (solid) and the worst learning curve (dashed) for population decimation.

Figure 7b. A comparison of the convergence properties of the genetic training algorithms: the best learning curve (solid) and the worst learning curve (dashed) for tournament selection.


Figure 7c. A comparison of the convergence properties of the genetic training algorithms: the time history of the average squared error for population decimation (solid) and tournament selection (dashed).

In the main module, trainga.m, all the parameters of the genetic training (the number of individuals, ind; the number of bits per weight and threshold, wbit; the probability of crossover, pcr, and of mutation, pmut; the number of generations, iter; ...) and the basic parameters of the ANN (the number of neurons in the layers, N; the type of nonlinearity in the layers, nlin; ...) are initialized. Then, the first generation is created as a matrix of binary numbers, gen: the number of rows is equal to the number of individuals, the columns correspond to the genes of an individual, and the last column is added in order to store the value of the cost of the individual. In the individual (i.e., in the row of the matrix gen), binary-coded weights are stored, followed by binary-coded thresholds.

When the first generation gen is initialized, the main iteration cycle starts. In this cycle, the order of training patterns is changed first, [P,T]=reorder(P,T), and then the binary information of the individual is converted into real weights and biases, [w,b]=decode(gen). Real weights and biases are needed for evaluating the cost function for all the individuals over all the training patterns. The weights and biases of the best individual in a generation are archived in the vector x, and the corresponding value of the cost function is stored in the vector e. Finally, the value of the cost function of the individuals is used for breeding a new generation of ANNs, using population decimation, gen=decim(gen,pcr,pmut,ind,cbit), or tournament selection, gen=tour(gen,pcr,pmut,ind,cbit). The main iteration cycle runs for a prescribed number of times, iter.
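The breeding step itself can be implemented in many ways. The following self-contained sketch shows one conventional tournament-selection, single-point-crossover, and mutation step for a binary-coded population; it is a generic illustration, not the tour.m of the downloadable package (in particular, the cost is kept in a separate vector here, and the dummy cost stands in for the training error of the ANN).

ind  = 20;  cbit = 80;  pcr = 0.9;  pmut = 0.1;      % population size, chromosome length
gen  = rand(ind, cbit) > 0.5;                        % random binary population
cost = sum(gen, 2);                                  % dummy cost (replace by the ANN training error)

newgen = false(size(gen));
for k = 1:2:ind
    c = randi(ind, 2, 2);                            % two binary tournaments
    [cmin, i1] = min(cost(c(1,:)));  p1 = gen(c(1,i1), :);
    [cmin, i2] = min(cost(c(2,:)));  p2 = gen(c(2,i2), :);
    if rand < pcr                                    % single-point crossover
        xp = randi(cbit - 1);
        o1 = [p1(1:xp), p2(xp+1:end)];  o2 = [p2(1:xp), p1(xp+1:end)];
    else
        o1 = p1;  o2 = p2;
    end
    newgen(k, :) = o1;  newgen(k+1, :) = o2;
end
flip = rand(size(newgen)) < pmut;                    % mutation: flip bits with probability pmut
newgen(flip) = ~newgen(flip);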

The above-described learning procedure was used for training ANNs which consisted of two hidden layers. Every hidden layer contained five neurons. In the individual, eight bits represented every weight and bias. Generations consisted of 30 individuals. The probability of crossover was set to 90%, and the probability of mutation was set to 10%. The training was done over five realizations. In Figure 7, the best learning curve and the worst learning curve (the time history of the squared error) are depicted for population decimation (Figure 7a) and for tournament selection (Figure 7b). Although a global training algorithm was used, there are significant differences between the qualities of learning of the different realizations.


Table 1. The learning patterns of an ANN trained by genetic algorithms: the frequency f2 [GHz] for the maximum of the reflection-coefficient modulus, and the frequencies f1 and f3 for its 3-dB decrease, when B [mm] and b [mm] are varied; a = 11 mm, A = 12 mm. [The table lists f1, f2, and f3 for b ∈ {1.0, 4.0, 7.0} mm and B ∈ {10.0, 13.0, 16.0, 19.0, 22.0} mm; the individual values are only partially legible in the source and are not reproduced here.]

Table 2. The dependence of the CPU time (in seconds) required for performing one training cycle on the number of individuals in the generation, for population decimation and for tournament selection. [The tabulated values are not legible in the source.]

An ANN was trained on a training set consisting of 15 training patterns (see Table 1). The input quantities of the ANN were sampled with the spatial step Δb = ΔB = 3 mm, and the respective frequencies [f1, f2, f3] were computed for all the combinations of spatial samples.

The CPU-time requirements of the genetic training were investigated using the MATLAB profiler (Table 2). The CPU-time requirements depended strongly on the number of individuals in every generation (considering our experience, 20 individuals seems to be a good choice).

In the following section, classical optimization techniques (gradient methods) are applied to the training process, so that genetic training can be compared with the classical approach.

2.3.2 Local Training Algorithms

Since the CPU-time requirements of global training algorithms are relatively high, ANNs can be trained by local-optimization methods. The danger that the training process will deadlock in a local minimum of the error function can be reduced by reordering the training patterns in every iteration cycle, or by a multi-start training with random values of weights and biases.

Among the local training techniques, gradient-based methods (the steepest-descent method, the conjugate-gradient method, etc.) are popular. Exploitation of the gradient algorithms for the training of ANNs results in the following implications:

1. Since the values of the state variables (weights, biases) are changed in the direction opposite to the gradient of the error function, the error gradient has to be distributed from the output of the neural network to all the neurons. Since the error propagates in the opposite direction from the input signals, the respective ANN is called a back-propagation neural network.

2. State variables (weights, biases) are changed in order to reach the minimum of the error function (i.e., the gradient of the output error is enforced as being zero). Therefore, the values of the state variables have to be adaptively modified, in order to meet the described goal. Obviously, back-propagation neural networks have to be built from adaptive nonlinear neurons.

A training procedure that considers the back propagation of the error signal and the presence of adaptive neurons in the network starts from the instantaneous value of the error of the jth output neuron in the kth iteration step (i.e., the kth training pattern is considered):

$$ e_j(k) = d_j(k) - y_j(k). \qquad (1) $$

Here, d_j(k) denotes the desired value of the output signal at the jth neuron in the kth iteration cycle, and y_j(k) is the actual response.

Let us introduce the average squared error as the sum of the instantaneous errors over all the outputs of the ANN and over all the patterns; the summation is divided by the number of patterns. If the average squared error is going to be minimized with respect to the weights and biases, then the error has to be expressed as a function of those parameters. This can be done by expressing the actual output signal at the jth neuron in the kth iteration cycle:

$$ y_j(k) = \varphi_j\bigl(v_j(k)\bigr), \qquad v_j(k) = \sum_{i=0}^{p} w_{ji}\, x_i(k). \qquad (2) $$

Here, w_ji denotes the ith synaptic weight of the jth neuron in the output layer (for simplicity, the threshold is considered as a weight of index 0, b_j = w_j0); the x_i are the signals at the inputs of the synaptic weights (i.e., the outputs of the neurons from the foregoing layer, and x_0 = −1 for the threshold); p denotes the total number of inputs of the jth neuron, the threshold excluded; and φ_j symbolizes the nonlinear activation function of the jth neuron (see Figure 4c).

Minimizing the average squared error, the average value is approximated by the instantaneous error of the actual iteration (the training pattern). The synaptic weights of the neurons in the output layer are shifted in the opposite direction to the local gradient:

$$ w_{ji}(k+1) = w_{ji}(k) + \eta\, \delta_j(k)\, x_i(k), \qquad (3) $$

where

$$ \delta_j(k) = e_j(k)\, \varphi_j'\bigl(v_j(k)\bigr) $$


is denoted as the local gradient of the instantaneous error. In this way, the value of the instantaneous quadratic error at the output is decreased. The symbol η denotes the learning constant, which influences the convergence rate, on the one hand, and the stability of the learning process, on the other hand.

The method described can be applied to resetting the weights in the output layer. In the case of neurons in hidden layers, the local gradients have to be recomputed from the output to the respective layer. Performing the re-computation, the following relation for the local gradient is obtained:

$$ \delta_j(k) = \varphi_j'\bigl(v_j(k)\bigr) \sum_{n} \delta_n(k)\, w_{nj}(k). $$

Here, variables containing the index j are related to the jth neuron in the hidden layer, and variables containing the index n are related to the neurons in the output layer. The local gradient of the jth neuron in the hidden layer is evaluated by summing the local gradients of all the neurons in the output layer, δ_n, multiplied by the synaptic weights, w_nj, which connect the output of the jth hidden neuron to the inputs of all the neurons in the output layer. The result of the summation is multiplied by the derivative of the activation function of the jth hidden neuron, evaluated at v_j of this neuron.

Obviously, the error propagates in a similar manner as the input signal in ANNs (only the direction of propagation is opposite).

If the local gradients at the outputs of the hidden layer are evaluated, the synaptic weights and biases can be modified according to Equation (3). Further re-computation of the local gradients from the last hidden layer to the foregoing one enables changing the synaptic weights of the neurons in the respective layer. In this way, the procedure continues until the first hidden layer is reached. Modification of the synaptic weights in the first hidden layer terminates a single learning cycle. Repeating single learning cycles over all the training patterns in the training set forms one iteration step of the training procedure. Iteration steps are repeated as long as the error does not fall below the prescribed level.

The training procedure described is denoted as pattern-by-pattern learning. As an alternative, batch learning can be implemented, which consists of updating the weights after the presentation of all the training patterns of the training set. The pattern method can be more stochastic when the training patterns are reordered; the batch method can provide a more accurate estimate of the gradient.
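To make the procedure concrete, the following self-contained sketch implements one possible pattern-by-pattern learning loop for a small feed-forward network with a single hidden layer (bipolar sigmoid) and a unipolar-sigmoid output layer. The random training data, the network size, and the simplified bias convention (the bias is added to the net input rather than driven by a −1 signal) are illustrative assumptions; the sketch is not the backprop.m module discussed below.

P  = rand(2, 15);                     % input patterns  (2 inputs  x 15 patterns)
T  = rand(3, 15);                     % desired outputs (3 outputs x 15 patterns)
H  = 5;                               % number of hidden neurons
lc = 0.1;                             % learning constant (eta)

W1 = rand(H, 2) - 0.5;  b1 = rand(H, 1) - 0.5;   % hidden-layer weights and biases
W2 = rand(3, H) - 0.5;  b2 = rand(3, 1) - 0.5;   % output-layer weights and biases

for it = 1:200                                   % iteration steps
    for k = randperm(size(P, 2))                 % reorder the patterns in every cycle
        % forward pass
        v1 = W1*P(:, k) + b1;   y1 = tanh(v1);              % bipolar sigmoid
        v2 = W2*y1      + b2;   y2 = 1 ./ (1 + exp(-v2));   % unipolar sigmoid
        % backward pass: output error and local gradients
        e  = T(:, k) - y2;
        d2 = e .* y2 .* (1 - y2);                % output-layer local gradients
        d1 = (1 - y1.^2) .* (W2' * d2);          % hidden-layer local gradients
        % weight and bias updates, shifted against the error gradient
        W2 = W2 + lc * d2 * y1';        b2 = b2 + lc * d2;
        W1 = W1 + lc * d1 * P(:, k)';   b1 = b1 + lc * d1;
    end
end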

Now, we can try to implement the training method in MATLAB. In order to enable a comparison of the local training method with the genetic method, we programmed an ANN of the same structure (two input neurons, two hidden layers consisting of five neurons each, and three output neurons), and we trained it on the same training set (the 15 patterns listed in Table 1). In this way, a simple neural model of the FSS (Figure 5) was built.

Comparing the main module of the gradient training, traingr.m, with its genetic counterpart, trainga.m, only minor differences can be identified here:

1. In the declaratory part of the module, the number of individuals in a generation, ind; the probability of crossover, pcr; the probability of mutation, pmut; and the number of bits for coding synaptic weights, wbit, are omitted in the gradient-training code. On the other hand, the learning constant, η (denoted lc), appears here.

2. In traingr.m, weights and biases are generated as real random numbers from the interval (0, 1). Both weights and biases are organized in the row vectors w and b, as depicted in Figure 8. In the genetic training program, weights and biases were organized the same way, but they were binary coded, and were present in more copies (more individuals in every generation) in the computer memory.

3. The main iteration cycle starts with reordering the training patterns in the training set (the m-file reorder.m is the same for both gradient training and genetic training). On the other hand, decoding binary parameters to real parameters, and a cycle over all the individuals in a generation, are omitted in the gradient-training code.

4. The rest of the training code is concentrated in the m-file backprop.m. The function returns new values of the synaptic weights and biases (after performing a single learning cycle), and sets a global variable, e, which contains a vector of the errors at the outputs of the ANN for the respective training pattern. The function backprop performs pattern-by-pattern training.

Concentrating more deeply on the backprop code, two basic blocks can be found. The first block simulates the signal flow from the inputs of an ANN to its outputs (the solid lines in Figure 8). This block computes the responses of an ANN to a given input pattern, and, from the implementation point of view, it is identical with the function nsim.m. The second block simulates the signal flow from the output of an ANN to its inputs (the dashed lines in Figure 8). This way, the gradient of the error function can be distributed to all the neurons, and to all the biases plus synaptic weights, and can adaptively modify them in order to minimize the training error.

The results of the back-propagation training are depicted in Figure 9. Figure 9a compares the average learning curves (over six realizations) for different values of the learning constant. Comparisons of the best training curves for a very high training constant (solid) and for the optimal training constant (dash) are depicted in Figure 9b.

Figure 8. The structure of the back-propagation neural network implemented in MATLAB. Matrix X contains the signals at the outputs of the neurons, and vector w contains the synaptic weights.


Figure 9a. A comparison of the convergence properties of the pattern-by-pattern gradient-training algorithms: the time history of the average squared error for learning constants η = 10⁻³ (dash-dot), η = 10⁻² (dash), and η = 10⁻¹ (solid).

Figure 9b. A comparison of the convergence properties of the pattern-by-pattern gradient-training algorithms: the best learning curve for a learning constant η = 10⁻² (dash) and η = 10⁻¹ (solid).

Dealing with the efficiency of the software implementation of the back-propagation algorithm in MATLAB, the execution of one iteration step consumed 0.54 seconds of CPU time on a Digital 433au workstation, which was approximately 10 times less than in the case of genetic training.

The above-described m-files can simply be rewritten in order to implement batch training. The module backprop.m is invoked to compute the changes of the weights and biases, Δw_ji and Δb_j, for a given training pattern, without modifying w_ji and b_j (see batprop.m). The changes Δw_ji and Δb_j are accumulated for all the training patterns in the training set, and at the end of the iteration, the setting of the weights and biases is modified (see batgr.m). Batch training is characterized by very smooth learning curves (Figure 10).
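The accumulation just described can be sketched by a batch variant of the inner loop of the previous example (the same network variables W1, b1, W2, b2, P, T, and lc are reused); this is an illustration of the principle, not the batprop.m and batgr.m modules themselves.

for it = 1:200
    dW1 = zeros(size(W1));  db1 = zeros(size(b1));   % accumulated increments
    dW2 = zeros(size(W2));  db2 = zeros(size(b2));
    for k = 1:size(P, 2)
        v1 = W1*P(:, k) + b1;   y1 = tanh(v1);
        v2 = W2*y1      + b2;   y2 = 1 ./ (1 + exp(-v2));
        e  = T(:, k) - y2;
        d2 = e .* y2 .* (1 - y2);
        d1 = (1 - y1.^2) .* (W2' * d2);
        dW2 = dW2 + lc * d2 * y1';        db2 = db2 + lc * d2;
        dW1 = dW1 + lc * d1 * P(:, k)';   db1 = db1 + lc * d1;
    end
    W2 = W2 + dW2;  b2 = b2 + db2;       % a single update per iteration step
    W1 = W1 + dW1;  b1 = b1 + db1;
end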

In the next section, the neural model of an FSS is built using the Neural Network Toolbox of MATLAB, so that a final comparison of the neural nets can be done. In the Toolbox, we are going to concentrate on batch training.

2.3.3 Neural Network Toolbox

A neural model of the above-described FSS can be efficiently built using the Neural Network Toolbox of MATLAB. Therefore, the training set, consisting of the matrix of input patterns, P, and of the matrix of desired output responses, T, stays unchanged (i.e., the number of columns of P and T is equal to the number of training patterns, the number of rows of P corresponds to the number of inputs of an ANN, and the number of rows of T corresponds to the number of outputs of an ANN).

The declaration of the learning set is followed by normalizing the outputs, and by creating a new feed-forward back-propagation neural network. Calling the standard function of the Toolbox, newff, simply creates the ANN. The first parameter of newff is a matrix consisting of two columns and as many rows as there are inputs of the ANN. The first column contains the minimum value introduced to the respective input, and the second column contains the maximum value of that input. The number of neurons in the hidden layers and the number of output neurons are determined through the second parameter of newff (the number of input neurons is given by the dimension of the matrix P, and, therefore, this information is not required). In the third parameter, the type of the nonlinear activation function in the hidden layers and in the output layer has to be given. The last parameter determines the learning algorithm used for training the neural network. In our situation (the above-described neural model of an FSS), the call of newff is of the following form:

"0 40 80 120 160 + iterations

Figure 10. The convergence properties of the gradient batch- training algorithm: The time history of the average squared

(solid).

error for a learning constant 7 =lo-* (dash) and q = 10- 1


Table 3. Selected batch-training algorithms available in MATLAB's Neural Network Toolbox.

Function   | Method
traingdm   | Steepest descent (SD) with momentum
traingda   | SD with adaptive learning rate
traincgf   | Conjugate gradient (Fletcher-Reeves)
traincgp   | Conjugate gradient (Polak-Ribiere)
trainbfg   | Quasi-Newton (QN) BFGS
trainlm    | QN Levenberg-Marquardt
trainbr    | Bayesian regularization

Table 4. The best and the average training errors of the examined batch-training algorithms (sd-own, sd-tbx., lm-tbx., br-tbx.) after 200 iteration steps, for default adaptation constants. [The tabulated error values are not legible in the source.]

Table 5. A comparison of the CPU-time demands, in seconds, of the batch-training algorithms in Table 4.

Algorithm | sd-own | sd-tbx. | lm-tbx. | br-tbx.
Time [s]  |  0.54  |  0.04   |  0.07   |  0.08

net = newff( [min(min(b)) max(max(b)); min(min(B)) max(max(B))], ...
             [5, 5, 3], ...
             {'tansig', 'tansig', 'logsig'}, ...
             'traingd' );

If an ANN is created (the weights and biases are random numbers), the parameters of the training process have to he set: lr denotes the learning rate of the steepest-descent algorithm, show determines the number of iteration steps performed between two printouts of the training status in MATLAB’s command window, epochs denotes the number of iteration steps until finishing the training, goal gives the error lever below which the training is stopped, etc. The above parameters can be set as follows:

net.trainParam.lr     = 0.01;    % learning rate of steepest descent
net.trainParam.show   = 10;      % status shown each 10th iteration
net.trainParam.epochs = 200;     % stopped if number of iter. > 200
net.trainParam.goal   = 1e-5;    % stopped if error below 1e-5


If the above settings are not done, the training is performed with the default values of the parameters. The training is started by calling the function

net = train( net, P, T );

Finally, the response, S, of the trained ANN to an arbitrary input pattern, Q, can be obtained by simulating the operation of the ANN (Tmax is the maximum output value, which is used to normalize the output patterns):

S = sim( net, Q ) * Tmax;
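Putting the individual calls together, a complete training script for the FSS model might look as follows. The matrices P (2 × 15) and T (3 × 15) are assumed to already hold the training patterns of Table 1, the normalization by Tmax follows the description above, and the test pattern Q is an arbitrary illustrative point; this is a sketch under those assumptions, not a listing of the downloadable m-files. Replacing 'traingd' by 'trainlm' or 'trainbr' selects the other batch algorithms of Table 3.

% P (2 x 15): doublets [b; B];  T (3 x 15): triplets [f1; f2; f3] from the numerical analysis
Tmax = max(max(T));                            % normalization constant for the outputs
Tn   = T / Tmax;                               % outputs normalized to (0, 1) for the logsig layer

net = newff( [min(P(1,:)) max(P(1,:)); ...
              min(P(2,:)) max(P(2,:))], ...    % ranges of the two inputs
             [5, 5, 3], ...                    % two hidden layers of 5 neurons, 3 outputs
             {'tansig', 'tansig', 'logsig'}, ...
             'trainlm' );                      % quasi-Newton (Levenberg-Marquardt) training

net.trainParam.show   = 10;                    % status shown every 10th iteration
net.trainParam.epochs = 200;                   % stop after 200 iteration steps at the latest
net.trainParam.goal   = 1e-5;                  % ... or when the error falls below 1e-5

net = train( net, P, Tn );                     % batch training

Q = [4.0; 14.5];                               % an arbitrary input pattern [b; B] in mm
S = sim( net, Q ) * Tmax;                      % de-normalized response [f1; f2; f3] in GHz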

We concentrate on the performance and CPU-time demands of the steepest descent (sd-tbx. in Table 4) for the unspecified learning constant, of the quasi-Newton (Levenberg-Marquardt) algorithm (lm-tbx. in Table 4), and of the Bayesian regularization (br-tbx. in Table 4).

Dealing with the performance of these training algorithms, we let each of them run six times, we recorded the best training result after 200 iteration steps, and we computed the average value of the training error after 200 iterations. The results obtained are summarized in Table 4, for the default adaptation constants.

The CPU-time demands of the above algorithms are given in Table 5. Although Bayesian regularization seemed to be disqualified by the comparison, it provided very important information during its run about the number of efficiently used parameters (weights and biases). If the number of efficiently used parameters is too small in comparison with the total number of parameters, the quantities approximated by an ANN can oscillate (so-called overtraining appears). On the contrary, if the number of efficiently used parameters approaches the total number of parameters, the number of neurons should be increased, in order to reach a smaller training error (i.e., better training is limited by the insufficient number of free parameters in this case).

The final conclusion of this paragraph can be summarized by the following items:

1. The quasi-Newton training algorithm (the smallest best-case error) and Bayesian regularization (an estimate of the number of efficiently used parameters) seem to be the most suitable training methods.

2. The Neural Network Toolbox seems to be an efficient modeling tool for electromagnetic structures. If the Toolbox is at one's disposal, its m-functions should be preferred to writing one's own code.

2.4 Conclusion

In Section 2, we introduced the reader to the basics of neural networks and to their implementation in MATLAB. The following conclusions can be drawn, observing the contents of Table 6:

1. Genetic training does not show better convergence than local training with reordering training patterns. The CPU time demands of genetic training are the highest.

2. The pattern-to-pattern version of the steepest-descent algorithm shows better convergence than the batch version. The CPU-time demands are the same.

IEEE Antenna’s and Propagation Magazine. Vol. 44, No. 6 , Decenlber 2002

Page 10: Modeling em structures in the neural network toolbox of ... Network Toolbox of MATLAB ... A block diagram of the process of training a neural ... function at the output (Figure 4c)

Table 6. A comparison of the performance and CPU-time demands of the training algorithms examined. Genetic algorithms: 20 individuals per generation, eight bits per weight/bias, 90% probability of crossover, 10% probability of mutation; steepest descent: η = 10⁻¹ for pattern-to-pattern training, η = 10⁻¹ for batch training; Toolbox training: default settings used. [The body of the table, comparing population decimation, tournament selection, the gradient algorithms, and the Toolbox algorithms, is not legible in the source.]

3. Batch algorithms are preferred in the Neural Network Toolbox. For EM modeling, the quasi-Newton (Levenberg-Marquardt) method and Bayesian regularization seem to be the most suitable approaches.

Considering the above conclusions, only Bayesian regularization and the quasi-Newton method are used in the rest of this paper. In the following section, both training methods are exploited for building accurate neural models of a frequency-selective surface (FSS), a microstrip transmission line (TL), and a microstrip antenna (MA).

3. Neural Modeling of EM Structures

In the previous section, our investigation was oriented towards the selection of the proper architecture of ANNs, the proper type of neurons, and the proper training algorithm for building neural models of EM structures, represented by an FSS. In dealing with the architecture of ANNs, we are going to concentrate on feed-forward structures, because feed-forward ANNs statically map input patterns to output ones. In dealing with learning, back-propagation ANNs, driven by the quasi-Newton algorithm (Levenberg-Marquardt) or by the Bayesian regularization as implemented in the Neural Network Toolbox of MATLAB, seem to be the most suitable. In dealing with the types of neurons, back-propagation ANNs require adaptive nonlinear neurons that modify the settings of the weights and biases, in order to minimize the learning error of an ANN.

Attention can now be turned to building neural models of selected EM structures that are as accurate as possible, and the preparation of which consumes as short a time as possible. The set of selected EM structures consists of a frequency-selective surface, as an example of a scatterer; of a microstrip transmission line, as an example of a transmission line; and of a microstrip dipole, as an example of an antenna.

3.1 Frequency-Selective Surface

The FSS modeled is depicted in Figure 5. The FSS consisted of equidistantly distributed identical rectangular PEC elements. The conductive rectangles were positioned in the center of a discrete cell of the infinite plane with the same electrical parameters as the surrounding environment. The height of the conductive element was fixed at a value a = 11 mm, and the height of the cell was assumed to be constant at A = 12 mm. The width of the conductive element was changed within the interval b ∈ <1 mm, 7 mm>, and the width of the cell could be in the range B ∈ <10 mm, 22 mm>.


The FSS being described was numerically modeled by the spectral-domain Method of Moments [24], utilizing harmonic basis and weighting functions. As a result, the frequency, f2, of the first maximum of the reflection-coefficient modulus of the Floquet mode (0,0), and the frequencies, f1 and f3, for the 3-dB decrease of the reflection-coefficient modulus (f1 < f2 < f3) were obtained. The analysis was performed for perpendicular incidence of a linearly polarized EM wave, with an electric-intensity vector that was oriented in the direction of the y axis (see Figure 5).

The neural model of the FSS consisted of two input neurons (doublets [b, B] formed the input patterns), and the output layer contained three neurons (the respective triplets [f1, f2, f3] were the desired responses). Since all the output quantities (f1, f2, f3) were positive numbers, the output neurons contained a unipolar sigmoid as the nonlinear activation function (i.e., the opposite type of nonlinearity was used at the output neurons than at the hidden neurons).

Before the training of an ANN is started, the number of training patterns, their position in the training space, and the number of hidden neurons have to be determined.

In discussing the training patterns, there are two contradictory requirements: the building process should consume as short a time as possible (i.e., the number of training patterns should be minimized), and the neural model developed should be as accurate as possible (i.e., the number of training patterns should be high). Therefore, some compromise had to be found, in order to get a relatively accurate model that could be quickly developed. Therefore, the input space of the ANN was first sampled with a constant sampling step. The sampling step was relatively long, in order to obtain an initial notion about the behavior of the modeled structure with minimum effort. Second, the sampling was refined, in order to reach a desired accuracy.

In discussing the number of hidden neurons, the initial architecture had to be estimated. Then, Bayesian regularization was run, and the number of hidden neurons was changed until the number of efficiently used parameters fell between 60% and 90%.

In the initial step, both b ∈ <1 mm, 7 mm> and B ∈ <10 mm, 22 mm> were changed with a discretization step of Δb = ΔB = 3.0 mm, and the output responses [f1, f2, f3] were computed for all the combinations of [b, B]. Thus, 3 × 5 = 15 analyses had to be performed. The complete training set was stored in the Excel file fss.xls.
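The initial coarse sampling of the input space can be generated, for instance, as sketched below; this is an illustration only, since the actual training set of the paper was produced by the Method-of-Moments analysis and stored in fss.xls.

b = 1.0 : 3.0 : 7.0;                  % element widths [mm]: 1, 4, 7
B = 10.0 : 3.0 : 22.0;                % cell widths    [mm]: 10, 13, 16, 19, 22
[bb, BB] = meshgrid(b, B);
P = [bb(:)'; BB(:)'];                 % 2 x 15 matrix of input doublets [b; B]
% Every column of P is then analyzed numerically to obtain the corresponding
% column [f1; f2; f3] of the matrix T of desired responses.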


Using Bayesian regularization (fss_br_30mm.m), the proper structure of an ANN was estimated: if an ANN contained two hidden layers consisting of five neurons each, then 60% of the parameters were efficiently used (after 500 iteration steps). When the proposed ANN was trained using the Levenberg-Marquardt algorithm (fss_lm_30mm.m), the training error reached a level of 10⁻⁵ within 98 iteration steps (the best result from five training processes performed).

The accuracy of the neural model (fss_lm_3.mat) over the training area was tested by comparing the results of the numerical analysis and the respective simulation results of the ANN. For every input pattern, the relative errors were computed and averaged. The result was called the cumulative error:

$$ \Delta(b,B) = \frac{1}{3} \sum_{n=1}^{3} \frac{\left| f_n(b,B) - \tilde{f}_n(b,B) \right|}{f_n(b,B)} \cdot 100\% . $$

Here, b is the width of the metallic element, B denotes the width of the cell, f_n is the frequency obtained by the numerical analysis, and f̃_n is the frequency produced by the neural model (n = 1, 3 are associated with the 3-dB decrease of the modulus of the reflection coefficient; n = 2 corresponds to its maximum).

Figure 11a. The cumulative error of the neural model of an FSS for a constant sampling step Δb = ΔB = 3.0 mm, in the interval B ∈ <10.0 mm, 16.0 mm> and b ∈ <1.0 mm, 7.0 mm>.

Figure 11b. The cumulative error of the neural model of an FSS for finer sampling, Δb = ΔB = 1.5 mm, in the interval B ∈ <10.0 mm, 16.0 mm> and b ∈ <1.0 mm, 7.0 mm>.

Figure 11c. The cumulative error of the neural model of an FSS with the initial sampling refined in order to reduce the relative variation of f2.
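The cumulative error can be evaluated in MATLAB along the following lines. The variables Ptest, fnum, net, and Tmax are assumptions standing for a grid of test doublets, the corresponding frequencies from the numerical analysis, the trained neural model, and the normalization constant used during training.

% Ptest (2 x N): test doublets [b; B];  fnum (3 x N): frequencies [f1; f2; f3]
% obtained by the numerical analysis for the same doublets.
fann   = sim(net, Ptest) * Tmax;                      % frequencies given by the neural model
cumerr = mean(abs(fnum - fann) ./ fnum, 1) * 100;     % cumulative error per pattern, in percent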

The cumulative error of the neural model fss_lm_3.mat is depicted in Figure 11a. Points corresponding to training patterns exhibit negligible error (e.g., [b, B] = [1, 10], [4, 10], [7, 10]). Whereas the cumulative error for B > 16 mm might be considered to be sufficiently small, the error was very high for B < 16 mm.

In order to increase the accuracy of the model, the part of the input space corresponding to an unacceptably high error (B < 16 mm for all b) was re-sampled with a smaller discretization step (Δb = ΔB = 1.5 mm). Therefore, 35 training patterns (5 × 7) had to be prepared in this case. The new ANN consisted of three hidden layers containing 6, 3, and 6 neurons. Performing the Levenberg-Marquardt training (fss_lm_15mm.m), the training error reached a level of 10⁻⁵ within 606 iteration cycles (the best result from five training processes performed). Observing the accuracy of the new model (fss_lm_15c.mat) in Figure 11b, a very low error was reached, except for the area near the point [b, B] = [7, 10]. Even finer re-sampling of the surroundings of this point could again reduce the error in the respective area.

In practical neural modeling, the distribution of the approximation error is unknown, because the modeled structure is analyzed only for the training patterns. Therefore, a different criterion for pattern refinement has to be found.

Observing the training patterns in fss.xls, the approximated function fn = fn(b, B), n = 1, 2, 3, was very steep in the area of the highest error (i.e., Δfn was high for two neighboring learning patterns). If the sampling in this area were refined, then Δfn would be reduced for neighboring patterns. Therefore, from a practical standpoint, we can conclude that Δfn should be similar for all the neighboring patterns in the training set.

Let us verify the above conclusion. In fss.xls, we can compute the relative variation of the nth approximated (output) quantity with respect to the ith input quantity:



Table 7a. The percentage variation of the central frequency, f2, for neighboring widths of cells. Unacceptable variations are highlighted.

                 B: 10→13 mm   13→16 mm   16→19 mm   19→22 mm
b = 4.0 mm          14.7%        10.6%      10.1%       9.3%
b = 7.0 mm            ?          18.9%      13.5%       9.9%

Table 7b. The percentage variation of the central frequency, f2, for neighboring widths of elements. Unacceptable variations are highlighted.

δfn^(i) = 100 |fn,i+1 − fn,i| / min{fn,i, fn,i+1}   [%],   n = 1, 2, 3,   (7)

where i indexes the respective input parameter in the training set, and fn,i and fn,i+1 denote the values of the nth output quantity at two neighboring values of this parameter. If the relative variation exceeds a prescribed level, then a new training pattern is inserted between the two already existing patterns.
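The refinement test can be sketched as follows; the column layout of fss.xls and the min-of-neighbors normalization of Equation (7), as reconstructed above, are assumptions:

% Flagging pairs of neighboring patterns whose relative variation of f2
% exceeds 10 percent along the B direction (sketch).
S = xlsread('fss.xls');                 % [b, B, f1, f2, f3] per row (assumed layout)
bvals = unique(S(:,1));
for i = 1:length(bvals)
   rows = find(S(:,1) == bvals(i));
   [Bs, order] = sort(S(rows,2));
   f2 = S(rows(order), 4);
   d = 100 * abs(diff(f2)) ./ min(f2(1:end-1), f2(2:end));
   bad = find(d > 10);                  % pairs violating the 10% limit
   newB = (Bs(bad) + Bs(bad+1)) / 2;    % B values of the patterns to be inserted
end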

In our case, we required the relative variation to be lower than 10% for the central frequency, f2. This condition was not met for the pairs [b, Bi − Bi+1] and [bi − bi+1, B] indicated in Table 7. Therefore, we had to insert eight new training patterns into the existing training set: p1 = [4 mm, 11.5 mm], p2 = [4 mm, 14.5 mm], p3 = [4 mm, 17.5 mm], p4 = [7 mm, 11.5 mm], p5 = [7 mm, 14.5 mm], p6 = [7 mm, 17.5 mm], p7 = [5.5 mm, 10 mm], and p8 = [5.5 mm, 13 mm].

A new training set, containing 15 + 8 = 23 patterns, was used to train the ANN (three hidden layers, consisting of 5-3-5 neurons; the training error fell below the prescribed level within 530 iteration steps, the best result from five training processes performed). The cumulative error of the neural model (fss_lm_xe.mat) is depicted in Figure 11c. The error was lower than 1.5% over all the output space, and even the number of training patterns was lower (23 versus 35). Moreover, no information about the error distribution over the input space was required. Electing an admissible relative variation lower than 10%, the number of training patterns has to be increased, on the one hand, and the approximation error can be reduced, on the other hand.

In the following section, the described procedure of building neural models is applied to a transmission line.


3.2 Transmission Line

The microstrip transmission line modeled is depicted in Figure 12. The transmission line was assumed to be longitudinally homogeneous. The transmission line was shielded by a rectangular waveguide with PEC walls, with dimensions fixed at A = B = 12.7 mm. A lossless dielectric substrate of dielectric constant εr1 ∈ <1.0, 5.0> and of height h = 0.635 mm was placed at the bottom of the shielding waveguide. A PEC microstrip of negligible thickness, t = 0, and of fixed width w = 1.27 mm was placed at the center of the substrate. The microstrip could be covered by another dielectric layer, of dielectric constant εr2 ∈ <1.0, 5.0> and of height h = 0.635 mm. A vacuum was assumed above the second dielectric layer. This transmission line was numerically modeled by a finite-element method, exploiting hybrid nodal-edge finite elements [30, 31]. As a result, the propagation constants of the dominant mode at the frequency f1 = 20 GHz (β1) and at the frequency f2 = 30 GHz (β2) were obtained.

The neural model of the transmission line consisted of two inputs, because doublets [εr1, εr2] formed the input patterns. The output layer again contained two neurons, because the respective doublets of propagation constants [β1, β2] formed the desired output responses. Since the propagation constants are positive numbers, a unipolar sigmoid was used as the activation function in the output layer.
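The corresponding network can be created as sketched below; Ptl and Ttl are assumed to hold the training doublets with the propagation constants scaled into the (0, 1) range of the logsig output, the 4-4 hidden structure anticipates the architecture selected later in this subsection, and the error goal is illustrative:

% Neural model of the shielded microstrip line (sketch).
net = newff(minmax(Ptl), [4 4 2], {'tansig','tansig','logsig'}, 'trainlm');
net.trainParam.goal   = 1e-4;           % illustrative training-error goal
net.trainParam.epochs = 1000;
[net, tr] = train(net, Ptl, Ttl);
beta = sim(net, [3.2; 1.5]);            % e.g., substrate eps_r1 = 3.2, cover eps_r2 = 1.5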

Both the dielectric constant of the substrate, εr1 ∈ <1.0, 5.0>, and the dielectric constant of the second layer, εr2 ∈ <1.0, 5.0>,


Figure 12. A shielded microstrip transmission line on a substrate (εr1, h), which might be covered by another dielectric layer (εr2, h). Longitudinal homogeneity is assumed. Losses in the dielectrics and metal are neglected.



Table 8a. The percentage variation of the propagation constant at 20 GHz for neighboring dielectric constants of the cover. Unacceptable variations are highlighted.

Substrate εr1 = 5.0:   5.0%   4.4%

Table 8b. The percentage variation of the propagation constant at 20 GHz for neighboring dielectric constants of the substrate. Unacceptable variations are highlighted.

Cover εr2 | substrate εr1: 1→3: 44.3%

were changed with a discretization step Δεr = 2.0 in the initial stage of constructing the neural model of the transmission line. The desired output responses, [β1, β2], were computed for all of the combinations of [εr1, εr2]. Therefore, 3 × 3 = 9 numerical analyses had to be done. The resultant training set can be found in the Excel file tl.xls.

As described in Section 3.1, the initial training set was tested from the point of view of relative variations among the output patterns. In Table 8, the relative variation was computed for the propagation constants at the frequency f = 20 GHz. If the relative variation was required to be lower than 10%, then the training set had to be completed by the additional eight patterns p1 = [2.0, 1.0], p2 = [4.0, 1.0], p3 = [1.0, 2.0], p4 = [3.0, 2.0], p5 = [5.0, 2.0], p6 = [1.0, 4.0], p7 = [3.0, 4.0], and p8 = [5.0, 4.0]. In the brackets, the first value is associated with εr2, and the second one with εr1. When these patterns were included in the training set, a new test of the relative variations was performed. As a result, another two training patterns, p9 = [2.0, 2.0] and p10 = [4.0, 2.0], were required to be included in the training set.

In total, the training set contained 9 + 8 + 2 = 19 patterns. Exploiting our experience with building the neural model of the FSS, we initially used an ANN consisting of five neurons in each of the two hidden layers. The Bayesian regularization told us that 60% of the parameters were efficiently used (296 iteration cycles, desired error 10⁻⁷). Since the result seemed to be all right, we ran the same learning using the Levenberg-Marquardt algorithm. The network was trained within 141 cycles (tl_lm_55a.mat). A cumulative error of up to 4% was observed in verifying the accuracy of the neural model (Figure 13a).

Let us try to interpret the relatively high cumulative error corresponding to non-training patterns as an over-training of the ANN. If the ANN is over-trained, then the approximation at the output of the ANN oscillates among the training patterns. Therefore, the training error is very small, but the approximation error is relatively high.


In order to solve this problem, the number of hidden neurons was reduced to four in each of the hidden layers (70% of efficiently used parameters). With the initial setting of the maximal training error, the learning process was finished within 1531 cycles, and the value of the maximum cumulative error was about 1% (tl_lm_44a.mat). If the desired training error was reduced to 10⁻⁴, then the training was over within 812 cycles, and the cumulative error was lower than 0.6% (tl_lm_44b.mat), as depicted in Figure 13b. If the number of hidden neurons was further reduced to three in each hidden layer (80% of efficiently used parameters), then the ANN was trained within 150 cycles for both desired training errors (tl_lm_33a.mat, tl_lm_33b.mat). In both cases, the cumulative error again reached a value of 1%. This result was caused by the fact that the ANN contained an insufficient number of free parameters in order to be well trained.

Keeping the above results in mind, we can postulate the validity of the following conclusions:

* The number of efficiently used parameters should be within the interval of 65% to 75%.

Figure 13a. The cumulative error of the neural model of a transmission line with five neurons in each of the two hidden layers, and a desired training error of 10⁻⁷.


Figure 13b. The cumulative error of the neural model of a transmission line with four neurons in each of the two hidden layers, and a desired training error of 10⁻⁴.



Top View

Side View

Figure 14. A microstrip dipole on a dielectric substrate of dielectric constant εr and height h. Both the dipole and the reflector (the ground plane) are perfectly conductive. It is assumed there are no losses in the dielectrics.

* Training should be finished within a reasonable number of iteration steps (below 1000, in our case).

* The value of the desired training error has to be selected in such a way that both a reasonable number of learning cycles and quality training are reached (10⁻⁵ or 10⁻⁷, in our case).

The above practical conclusions can be verified on the final neural model of the FSS (Figure 11c, 5-3-5 hidden neurons, the training error reached within 530 iteration steps). Although the number of efficiently used parameters was about 60%, over-training was eliminated here by the use of a bottleneck (a narrow central layer, consisting of three neurons).



The processes of building the neural models of the FSS and the transmission line are similar: the approximated unipolar output quantities change monotonically when the continuous input parameters are changed. In the next paragraphs, a different situation appears: the output quantity (the input impedance of a microwave antenna, Figure 14) is bipolar (the reactance can be both positive and negative), and it is not of a monotonic nature (the impedance characteristics of a microstrip dipole exhibit a resonance). Moreover, two input parameters can be changed continuously (the length of the dipole and the width of the dipole), and two can acquire discrete values only (the height of the substrate and the dielectric constant of the substrate). Therefore, the procedure developed for building neural models has to be modified.

3.3 Microstrip Antenna

The modeled microstrip antenna (MA) is depicted in Figure 14. The microstrip antenna consists of a microstrip dipole of length A ∈ <1.0 mm, 4.0 mm> and of width B ∈ <0.05 mm, 0.10 mm>, which is fed by a symmetric transmission line. The metallic ground plane plays the role of the planar reflector. The microstrip antenna was fabricated from dielectric substrates of dielectric constant εr = [1.0, 1.6, 2.0] and with a height of h = [1.0 mm, 1.5 mm]. Losses, both in the dielectrics and in the metal, were neglected. The antenna was assumed to operate at a frequency f = 30 GHz. The antenna was numerically modeled by the Method of Moments [27-29], using piecewise-constant basis functions and Dirac weighting. The analysis results were the values of the input impedance of the antenna, Zin = Rin + jXin, at the operating frequency f = 30 GHz.


The neural model of the microstrip antenna consisted of four inputs, because quadruplets [A, B, εr, h] formed the input patterns. The output layer contained two neurons, because the respective doublets [Rin, Xin] formed the desired output responses. Since the input reactance of the antenna, Xin, could be both positive and negative, the output neurons contained a bipolar sigmoid as the nonlinearity (i.e., the same type of nonlinearity as the hidden neurons). In dealing with the proper discretization of the input space, only ΔA and ΔB had to be determined, because the discretization of h and εr was prescribed. Although the dimension of the input space was four, we operated on two-dimensional input spaces [A, B], organized into relatively independent planes, which were associated with the doublets [h, εr]. The described training set is at one's disposal in the Excel file ma.xls.
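The construction of such a network can be sketched as follows; Pma and Tma are assumed to hold the (suitably scaled) quadruplets and impedance doublets, and the 17-7-17 hidden structure anticipates the model examined below:

% Neural model of the microstrip antenna with a bipolar output (sketch).
net = newff(minmax(Pma), [17 7 17 2], ...
            {'tansig','tansig','tansig','tansig'}, 'trainlm');
net.trainParam.goal   = 1e-7;           % desired training error
net.trainParam.epochs = 500;
[net, tr] = train(net, Pma, Tma);
Z = sim(net, [3.0e-3; 0.075e-3; 1.6; 1.5e-3]);   % [A; B; eps_r; h] -> scaled [Rin; Xin]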

The proper choice of the discretization steps, ΔA and ΔB, had to differ from the procedure described in the previous paragraphs. Whereas the output quantities of the neural models of the FSS and of the transmission line were positive and changed monotonically, the input impedance of a microstrip antenna exhibits a non-monotonic behavior, due to the resonance of the antenna, and the input reactance is of a bipolar nature. Whereas the dynamic ranges of the output quantities (the ratio of the lowest output value and the highest value) of the FSS and the transmission line were relatively low,

ρFSS = f1(7.0, 10.0) / f3(7.0, 10.0) = 7.9 / 30.1 ≈ 3 × 10⁻¹,

ρTL = β1(1.0, 1.0) / β2(5.0, 5.0) = 419.9 / 1394 ≈ 3 × 10⁻¹,

the output data of the microstrip antenna had very high dynamic ranges:

ρMA = min{Rin(A, B); |Xin(A, B)|} / max{Rin(A, B); |Xin(A, B)|}
    = Rin(1.00, 0.10) / |Xin(1.00, 0.05)| = 0.4413 Ω / 1063 Ω ≈ 4 × 10⁻⁴.

Considering both the non-monotonic nature and the high dynamic ranges of the approximated quantities, the initial sampling step was set to be very short: ΔA = 0.25 mm, ΔB = 12.5 µm. Then, the total training set was created by N_A × N_B × N_h × N_εr = 13 × 5 × 2 × 3 = 390 training patterns, as shown in ma.xls.
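The grid can be generated as sketched below; analyze_ma is a hypothetical wrapper around the Method-of-Moments analysis:

% Building the 390-pattern training set of the microstrip antenna (sketch).
A  = linspace(1.0e-3, 4.0e-3, 13);      % 13 dipole lengths, step 0.25 mm
Bw = linspace(0.05e-3, 0.10e-3, 5);     % 5 dipole widths, step 12.5 um
h  = [1.0e-3 1.5e-3];                   % 2 substrate heights
er = [1.0 1.6 2.0];                     % 3 dielectric constants
[AA, BB, HH, EE] = ndgrid(A, Bw, h, er);
Pma = [AA(:)'; BB(:)'; EE(:)'; HH(:)']; % 4 x 390 matrix of input patterns
Tma = zeros(2, size(Pma,2));
for q = 1:size(Pma,2)
   Tma(:,q) = analyze_ma(Pma(:,q));     % hypothetical numerical analysis -> [Rin; Xin]
end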

The relative variation among training patterns was computed for the described discretization. Due to the bipolar nature of the input reactance of the microstrip antenna, the denominator of Equation (7) might have approached zero, and the relative variation was very high, although the output quantity did not change dramatically between the respective sampling points. In order to



eliminate this phenomenon, we modified the relations of Equation (7) for the microstrip antenna in the following way:

δfn^(i) = 100 |fn,i+1 − fn,i| / max{|fn,i|, |fn,i+1|}   [%],   n = 1, 2,   (8)

where f1 = Rin is the input resistance and f2 = Xin is the input reactance of the microstrip antenna. In addition, A denotes the length of the microstrip antenna, B is the width of the microstrip antenna, and i is the index of the respective input parameter in the training set.
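For one input direction, this criterion can be evaluated as sketched below; Xin is assumed to hold the reactances of the patterns that neighbor each other along that direction, and the max-of-neighbors normalization follows Equation (8) as reconstructed above:

% Relative variation of a bipolar quantity along one input direction (sketch).
dX = 100 * abs(diff(Xin)) ./ max(abs(Xin(1:end-1)), abs(Xin(2:end)));
refine = find(dX > 10);                 % neighboring pairs that call for a new pattern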

In evaluating our training set, the relative variations of the input resistance with respect to the dipole length A were δRin^(A) ∈ <17%, 49%>, and the relative variations of the input reactance were within the interval δXin^(A) ∈ <12%, 197%>. The highest values of δRin^(A) were dominantly located at A ∈ <1.00 mm, 1.25 mm> for all B. On the other hand, the highest values of δXin^(A) could be found at A ∈ <2.50 mm, 2.75 mm> and at A ∈ <3.00 mm, 3.25 mm> for all B.

Dealing with the relative variations with respect to the dipole width, B, the variations of the input resistance, δRin^(B) ∈ <0.01%, 9.5%>, were in all cases lower than 10%. The variations of the input reactance, δXin^(B) ∈ <0.02%, 79.6%>, exceeded the limit of 10% dominantly for A = 2.50 mm and A = 3.00 mm, for all B.

Considering the locations of the highest relative variations, a high approximation error for the input resistance could be expected for A < 1.50 mm, and a high error for the input reactance could be expected at A ∈ <2.50 mm, 3.50 mm>.

In order to verify the validity of our expectations, a neural model of the microstrip antenna, consisting of 17-7-17 hidden neurons, was developed (ma_lm_17_7_17a.mat). The ANN was trained within 133 iteration cycles, with the training error being lower than 10⁻⁷. The neural model exhibited a cumulative error of the input resistance,

cR(A, B) = (100/6) Σ(m=1..2) Σ(n=1..3) |R̃in(A, B, hm, εr,n) − Rin(A, B, hm, εr,n)| / Rin(A, B, hm, εr,n)   [%],   (9a)

as depicted in Figure 15a. The cumulative error of the input reactance,

cX(A, B) = (100/6) Σ(m=1..2) Σ(n=1..3) |X̃in(A, B, hm, εr,n) − Xin(A, B, hm, εr,n)| / |Xin(A, B, hm, εr,n)|   [%],   (9b)

is depicted in Figure 15b. In Equation (9), A is the length of the microstrip dipole, B is its width, h = [1.0 mm, 1.5 mm] denotes the height of the substrate, and εr = [1.0, 1.6, 2.0] is the dielectric


constant of the substrate. In addition, R̃in is the input resistance of the microstrip antenna provided by the neural model, and Rin is the same quantity provided by the numerical analysis. Similarly, X̃in and Xin denote the input reactances of the microstrip antenna.


Figure 15a. The cumulative error in the input resistance of the neural model of a microstrip antenna. The ANN consisted of 17-7-17 neurons, and the desired training error was 10⁻⁷.


Figure 15b. The cumulative error in the input reactance of the neural model of a microstrip antenna. The ANN consisted of 17-7-17 neurons, and the desired training error was 10⁻⁷.

Figure 15c. The cumulative error in the input reactance of the neural model of a microstrip antenna, when the error maximum is eliminated. The ANN consisted of 17-7-17 neurons, and the desired training error was 10⁻⁷.



Figure 15a confirms our hypothesis that the highest error of the neural modeling of the input resistance would be located at A < 1.50 mm. Dealing with the input reactance, the relative error of over 150% at the position [3.125 mm, 0.075 mm] makes the rest of the figure unreadable. Suppressing this error, Figure 15c was obtained. In this figure, the relative error of the input reactance reached up to approximately 20% for A ∈ <2.50 mm, 3.50 mm>. Therefore, our hypothesis can be considered as being confirmed.

If the very high approximation error of the neural model of the microstrip antenna is to be reduced, the discretization steps, ΔA and ΔB, have to be shortened in the regions where high relative variations were revealed. Unfortunately, the refinement of the training set significantly increases the number of training patterns (i.e., the number of numerical analyses that have to be performed), and, consequently, the ANN has to contain more neurons (i.e., the training process consumes more CPU time). Therefore, instead of refining the training set, we tried to approach the problem of modeling the microstrip antenna in a manner similar to that used when modeling the FSS and the transmission line, which provided satisfactory results with minimal effort.

In the first step, we decreased the dynamic ranges of the output patterns by applying the natural logarithm to the whole output set. Since the input reactance of the microstrip antenna might be negative, we added a constant to every reactance, in order to get positive numbers higher than one. If every input resistance of the microstrip antenna was also increased in order to be higher than one, then all the logarithms were positive. In our case,

rin(A, B) = ln[Rin(A, B) + 1],   (10a)

xin(A, B) = ln[Xin(A, B) + 1065],   (10b)

where Rin and Xin are the input resistance and the input reactance of the microstrip antenna from the original training set, respectively. The symbols rin and xin denote the input resistance and the input reactance of the microstrip antenna in the transformed data set. The constant 1065 was derived from the minimum value of the input reactance, min{Xin} = −1063 Ω.
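The transform and its inversion can be sketched as follows; the +1 offset of Equation (10a) is the reconstruction given above, and rhat and xhat stand for hypothetical outputs of the trained network:

% Logarithmic compression of the output dynamic range and its inversion (sketch).
r = log(Rin + 1);                       % Equation (10a)
x = log(Xin + 1065);                    % Equation (10b); 1065 > -min(Xin)
% ... the ANN is trained on the transformed targets [r; x] ...
Rhat = exp(rhat) - 1;                   % back-transform of the neural-model outputs
Xhat = exp(xhat) - 1065;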

The transformed data set contained positive numbers only, and, therefore, a unipolar sigmoid could be used in the output layer of the ANN. Moreover, the dynamic range of the original data set was reduced from the value ρMA = 4 × 10⁻⁴ to

ρ′MA = rin(1.00, 0.10) / xin(4.00, 0.10) = 0.366 / 7.38 ≈ 5 × 10⁻².

Further, we have to investigate the relative variations in the transformed training set (see ma.xls). The variations of the input resistance with respect to the antenna length, δrin^(A), exceeded 30% for small values of A, and were about 10% for large values of A. The variations of the input resistance with respect to the antenna width, δrin^(B), were lower than 2% in all cases. The variations of the input reactance, δxin^(A) and δxin^(B), were lower than 10%, except for a few singular cases for A ∈ <1.00 mm, 1.50 mm> and


for B ∈ <0.050 mm, 0.075 mm>. Therefore, the neural model of the microstrip antenna might have been expected to exhibit the highest approximation error for small values of A and B.

In the first step of verifying the above hypothesis, the proper structure of the hidden layers was estimated to be 17-8-17 neurons. The Bayesian regularization told us that 87% of the parameters of the ANN were efficiently used (500 steps, a training error lower than 10⁻⁴). The Bayesian regularization produced a neural model of the microstrip antenna that is stored in ma_br_17_8_17a.mat. The approximation error of this model is depicted in Figure 16. As shown, the highest approximation error (0.8% for rin, 1.3% for xin) was really associated with the smallest values of A and B, as we had predicted.

In the second step, the same ANN was trained using the Levenberg-Marquardt algorithm. Approximation oscillations were observed when testing the results of the training. Therefore, the number of neurons in the hidden layers was consecutively reduced to 16-6-16 (further reduction increased the approximation error). Within 260 iteration steps, the neural model of the microstrip antenna (ma_lm_16_6_16b.mat) was trained with an error lower than 10⁻⁷. The approximation error was lower than 3% for both rin and xin.


Figure 16a. The cumulative error of the input resistance of the neural model of a microstrip antenna, exploiting the logarithmic transform. The ANN consisted of 17-8-17 neurons, the desired training error was 10⁻⁴, and Bayesian regularization was applied.

Figure 16b. The cumulative error of the input reactance of the neural model of a microstrip antenna, exploiting the logarithmic transform. The ANN consisted of 17-8-17 neurons, the desired training error was 10⁻⁴, and Bayesian regularization was applied.



In this case, Bayesian training provided better results than the Levenberg-Marquardt algorithm did. The higher CPU-time demands of the Bayesian training were the price we had to pay for a more accurate neural model of the microstrip antenna.

Finally, we had to investigate how the approximation error (of the natural logarithm of the input resistance and reactance) transforms into the deviation of the obtained input impedance from the numerical model. The highest approximation error was located in the area where A and B were small. In that region, rin < 1.15, which converted the approximation error of 0.8% to

δR = 100 {[exp(1.008 × 1.15) − 1] − [exp(0.992 × 1.15) − 1]} / [exp(0.992 × 1.15) − 1] ≈ 3%.

Similarly, in the region of the highest error, where xin < 6.77 and the approximation error was a maximum of 1.3%, the deviation of the input reactance of the microstrip antenna could reach the value

δX = 100 |[exp(1.013 × 6.74) − 1065] − [exp(0.987 × 6.74) − 1065]| / |exp(0.987 × 6.74) − 1065| ≈ 50%.

Therefore, the region of the highest error had to be re-sampled, and a new ANN had to be trained.

In the rest of the training space, the approximation error was lower than 0.2%, which kept the resulting error of Rin and Xin below 5%.

In summarizing our experience with building a neural model of the microstrip antenna, the following conclusions can be drawn:

* If the ANN is asked to approximate non-monotonic bipolar quantities, then the discretization step has to be relatively short. Attention has to be carefully paid to the relative variations of the approximated quantities (computed according to the modified relations in the case of bipolar output values), and to the dynamic range of the approximated quantities.

* If the approximated quantities exhibit very high dynamic ranges (as in the case of the microstrip antenna), then the dynamic ranges have to be properly reduced. As shown in our paper, exploiting the natural logarithm for this purpose is not the best solution (the error of 5% is much higher than in the case of the neural models of the FSS and transmission line).

* If the approximated quantities exhibit high relative variations, then the discretization has to be refined in the respective area. The refinement is performed in the same way as described in Sections 3.1 and 3.2.

* If even the optimal architecture of an ANN does not provide satisfactory results when trained by the Levenberg-Marquardt algorithm, then Bayesian regularization can be used to achieve better results.

Now, we are familiar with the techniques used for building neural models of electromagnetic systems. In the next section, we are going to discuss the CPU-time demands of building neural models in depth, so that we can determine whether neural modeling provides more advantages or disadvantages.


3.4 CPU-Time Demands of Neural Modeling

In this section, we utilize our experience with developing the neural models of the FSS, the transmission line, and the microstrip antenna in order to evaluate the CPU-time demands of this development. The CPU-time demands consist of the time necessary for building the training patterns, and of the time used for training an ANN.

The time demands of the numerical modeling of our structures are summarized in Table 9. The total time used for computing a single training pattern was obtained by multiplying the time of a single analysis by the number of its executions. A single pattern of the FSS required 19 executions (on average), because the maximum of the reflection coefficient and its 3-dB decrease had to be found numerically. A single pattern of the transmission line needed four executions, because the structure was analyzed at two frequencies, for both the electric intensity and the magnetic intensity (in order to minimize the error of the analysis [32]). A single pattern of the microstrip antenna was equivalent to a single analysis.

The CPU-time demands of training are given in Table 10. In this table, we concentrate on those neural models that were selected as optimal in the previous paragraphs.

A neural model of the FSS (fss_lm_xe.mat) was developed using a training set that consisted of 23 patterns (the initial training set of 15 patterns was completed by an additional eight

Table 9. The CPU-time demands of the preparation of a single training pattern. Analysis: the time of a single numerical analysis of the structure; repetitions: the number of single-analysis repetitions needed for completing the training pattern; total: the time necessary for building a single training pattern.

Structure              Analysis [s]   Repetitions   Total [s]
FSS                       ≈24.7           19          ≈469
Transmission line          5.6             4           22.4
Microstrip antenna        16.6             1           16.6

Table 10. The total CPU time needed for developing a neural model of the FSS (5-3-5 neurons, 530 epochs), the transmission line (4-4 neurons, 872 epochs), and the microstrip antenna (16-6-16 neurons and 260 epochs for the Levenberg-Marquardt (LM) training, 17-8-17 neurons and 500 epochs for the Bayesian (BR) training). Patterns: the total time needed for computing the whole training set; train: the total duration of the training process; total: the sum of patterns and train.

Structure                    Patterns [min]        Train [min]   Total [min]
FSS                          23 × 469 s ≈ 180           1           ≈181
Transmission line            19 × 22.4 s ≈ 7            1            ≈8
Microstrip antenna (LM)      390 × 16.6 s ≈ 108        30           ≈138
Microstrip antenna (BR)      390 × 16.6 s ≈ 108        75           ≈183



patterns, in order to reduce the very high relative variations). Since the numerical computation of a single pattern took approximately 469 seconds, the building of the whole training set was finished within 180 minutes. Due to the small size of the respective ANN (5-3-5 hidden neurons), the training process was over within one minute, using the Levenberg-Marquardt algorithm.


A neural model of the transmission line (tl_lm_44b.mat) was based on a training set of 19 patterns (the initial training set, consisting of nine patterns, was completed by eight patterns in the first step and by an additional two patterns in the second step, because the first step did not satisfactorily reduce the relative variations). The numerical computation of a single pattern was completed within approximately 22.4 seconds, and therefore the whole training set was prepared within seven minutes. The size of the respective ANN was again very small, and therefore the training took only one minute when the Levenberg-Marquardt algorithm was used.


Due to the non-monotonic nature of the input impedance, the neural model of the microstrip antenna had to be trained on a training set consisting of 390 patterns. Since the numerical analysis of a single pattern took 16.6 seconds, the whole training set was built within 108 minutes. The large amount of training data corresponded to the relatively large size of the respective ANN (16-6-16 hidden neurons for the Levenberg-Marquardt training, ma_lm_16_6_16b.mat; 17-8-17 hidden neurons for the Bayesian training, ma_br_17_8_17a.mat). Therefore, the training time was much longer (30 minutes and 75 minutes, respectively) than in the previous cases.


Unfortunately, the time needed for the development of neural models does not consist of the CPU time only. In addition, we have to consider the time used for refining the training set, for testing the approximation error, for optimizing the architecture of the ANN, etc. Moreover, training is performed on a multi-start basis, because random starting values of weights and biases lead to different results for the same training patterns and neural networks.



In our evaluation, we respected the above-described additional time requirements by multiplying the CPU time from Table 10 by a coefficient c_E. The value of the coefficient strongly depends on the experience of the person developing a neural model, and on good fortune (sometimes, a very good model is obtained from the first training; other times, we have to repeat the training many times to get a good approximation). In Table 11, we selected c_E = 5, and computed the number of numerical analyses whose CPU-time demands are equivalent to the time needed for building a neural model.
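The bookkeeping behind Table 11 can be reproduced, e.g., for the transmission line:

% Number of numerical analyses equivalent to building the neural model (sketch).
cE       = 5;                           % empirical overhead coefficient
t_total  = 8 * 60;                      % total CPU time of the neural model [s], Table 10
t_single = 22.4;                        % one numerical analysis of the line [s]
n_equiv  = cE * t_total / t_single      % approximately 107, listed as 110 in Table 11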

The good efficiency of the development of the neural model of the FSS was caused by the fact that the numerical analysis was relatively time consuming compared with that of the microstrip antenna (469 seconds versus 16.6 seconds). Moreover, an accurate neural model of the FSS was based on only 23 training patterns (versus 390 patterns for the microstrip antenna), and the training was finished within one minute (versus 30 minutes or 75 minutes for the microstrip antenna: the first value corresponds to the Levenberg-Marquardt training, and the second value is for Bayesian regularization).

The development of the neural model of the transmission line was efficient, too. Although the CPU time of a single numerical analysis of the transmission line was comparable to that of the microstrip antenna (22.4 seconds versus 16.6 seconds), even fewer

Table 11. The number of numerical analyses equivalent to the CPU-time demands of building a neural model.

Structure                    c_E = 1 [min]   c_E = 5 [min]   Analyses
FSS                               181             905           120
Transmission line                   8              40           110
Microstrip antenna (LM)           138             690          2500
Microstrip antenna (BR)           183             915          3300



patterns than for the FSS were needed (19 versus 23), and the training took the same time (one minute).

Obviously, the low efficiency of building a neural model of the microstrip antenna was caused by the very extensive training set, which was necessary due to the non-monotonic nature of the approximated quantities, and consequently by a training process that consumed a lot of CPU time. Moreover, these long time periods are compared with the very short duration of a single numerical analysis.

Nevertheless, the final answer to the question of whether building a neural model makes any sense or not can only be given by an optimization result. If the optimization of a given structure can be completed within a lower number of steps than is required for building a neural model, then neural modeling does not make any sense, and vice versa.

In the following section, we use genetic algorithms in conjunction with the neural models in order to optimize the FSS, the transmission line, and the microstrip antenna. Then, we compute the number of numerical analyses needed, and we compare these values with the content of Table 11.

4. Neural Design

In this section, we are going to utilize the neural models of the FSS, the transmission line, and the microstrip antenna, in conjunction with a genetic algorithm, in order to reveal regions that are suspected of containing the global minimum of a cost function. The localities revealed can be efficiently examined using numerical models and local optimization techniques. The same genetic algorithm is used for all three structures of interest. Population decimation is exploited as the selection strategy. Every generation consisted of 20 individuals, the probability of cross-over was set to 90%, and the probability of mutation was equal to 10%. Continuous parameters were binary encoded using eight bits. The genetic optimization was stopped when a prescribed value of the cost function was reached (a successful realization), or when 500 iteration steps had passed (an unsuccessful realization). The optimization of every structure was performed over five successful realizations (unsuccessful realizations were not considered).

The FSS was optimized using the neural model fss_lm_xe.mat. During the optimization, the width of the conductive element, b, and the width of the discretization cell, B, were searched so that the modulus of the reflection coefficient of the Floquet mode (0, 0) was maximal at the frequency f2 = 12.0 GHz, and so that its 3-dB decrease appeared at the frequencies f1 = 9.0 GHz and f3 = 15.0 GHz. The optimization was


Table 12. The genetic optimization of the FSS using the neural model. The desired frequency properties of the modulus of the reflection coefficient: f1 = 9.0 GHz, f2 = 12.0 GHz, and f3 = 15.0 GHz. The stopping value of the cost function: e_FSS < 0.050 GHz².

Table 13. The genetic optimization of a transmission line using the neural model. The desired phase constant at 20 GHz was β1 = 800 m⁻¹, and at 30 GHz it was β2 = 1200 m⁻¹. The stopping value of the cost function was e_TL < 25 m⁻².

stopped when the value of the cost function was lower than e_FSS < 0.050 GHz². Successful realizations of the optimization process are listed in Table 12. The results indicated a potential global minimum of the cost function in the region b ∈ <3.27 mm, 4.04 mm> and B ∈ <16.38 mm, 16.65 mm>. A numerical analysis of the FSS with the optimal widths b_opt and B_opt (the columns labeled "numeric analysis" in Table 12) confirmed the vicinity of the desired frequency properties.
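The cost function evaluated by the genetic algorithm can be sketched as follows; the quadratic form of the cost is an assumption consistent with the GHz² unit of the stopping value, and the variable name of the network stored in fss_lm_xe.mat is assumed:

% Cost function of the FSS optimization driven by the neural model (sketch).
load fss_lm_xe                          % trained neural model (variable name net assumed)
ftarget = [9.0; 12.0; 15.0];            % desired f1, f2, f3 [GHz]
b = 3.5e-3;  B = 16.5e-3;               % widths proposed by one GA individual [m]
f = sim(net, [b; B]);                   % neural model replaces the numerical analysis
eFSS = sum((f - ftarget).^2);           % assumed quadratic cost; stop when eFSS < 0.050 GHz^2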

The average number of iteration steps of the genetic optimization was approximately equal to eight generations (see the "cost" column of Table 12). Since every generation consisted of 20 individuals, 160 triplets [f1, f2, f3] had to be computed. In comparison, the CPU-time demands of the development of the neural model were equivalent to computing 120 triplets (Table 11). We can therefore conclude that the neural model of the FSS replaced the numerical model in an effective manner.

The transmission line was optimized using the neural model tl_lm_44b.mat. The optimization was aimed at estimating the dielectric constant of the substrate, εr1, and of the dielectric cover layer, εr2, so that the phase constant of the dominant mode was β1 = 800 m⁻¹ at 20 GHz, and was equal to β2 = 1200 m⁻¹ at 30 GHz. The optimization was stopped when the value of the cost function was lower than e_TL < 25 m⁻². Successful realizations of


the optimization process are listed in Table 13. The results showed that a potential global minimum of the cost function could be located in the regions εr1 ∈ (3.50, 3.77) and εr2 ∈ (3.52, 4.44). A numerical analysis of the transmission line with the optimum dielectric constants, εr1,opt and εr2,opt (the "numeric analysis" columns of Table 13), confirmed the vicinity of the desired dispersion characteristics.

The average number of iteration steps of the genetic optimization was approximately equal to 15 generations (see the "cost" column of Table 13). Since every generation consisted of 20 individuals, 300 doublets had to be computed. By comparison, the CPU-time demands of the development of the neural model were equivalent to computing 110 doublets (Table 11). We can therefore conclude that the neural model of the transmission line again replaced the numerical model in an effective manner.

The microstrip antenna was optimized using the logarithmic neural model ma_br_17_8_17a.mat. The optimization procedure was asked to estimate the length of the dipole, A, the width of the dipole, B, the dielectric constant of the substrate, εr, and the height of the substrate, h, so that the input impedance at 30 GHz was equal to Zin = (25 + j0) Ω. In the genetic optimization, the dielectric constant was binary coded using two bits (three possible values), and the height of the substrate was binary coded using one bit (two possible values). The optimization was stopped when the value of the cost function was lower than e_MA < 0.001 Ω². Successful realizations of the optimization process are listed in Table 14. The results show that a potential global minimum of the cost function could be located in the region A ∈ <3.17 mm, 3.20 mm>, B ∈ <0.055 mm, 0.093 mm>, εr = 1.6, and h = 1.5 mm. A numerical analysis of the microstrip antenna with the optimum parameters (the "numeric analysis" columns of Table 14) showed that the input resistance was very close to the desired value, while the input reactance deviated from the desired value by 20 Ω. Nevertheless, the results can be considered sufficiently close to the optimum, so that a local optimization routine could be applied.
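The decoding of one individual of this mixed chromosome can be sketched as follows; the gene ordering, the mapping of the four two-bit codes onto the three permitted dielectric constants, the quadratic cost, and the variable name of the network are assumptions, and any input scaling of the neural model is omitted:

% Decoding one binary chromosome of the microstrip-antenna optimization (sketch).
chrom = [1 0 1 1 0 1 0 1, 0 1 1 0 1 0 0 1, 1 0, 1];               % 8 + 8 + 2 + 1 genes
A  = 1.0e-3  + bin2dec(char(chrom(1:8)  + '0')) / 255 * 3.0e-3;   % 1.0 ... 4.0 mm
Bw = 0.05e-3 + bin2dec(char(chrom(9:16) + '0')) / 255 * 0.05e-3;  % 0.05 ... 0.10 mm
ervals = [1.0 1.6 2.0];
er = ervals(min(bin2dec(char(chrom(17:18) + '0')) + 1, 3));       % 2 bits -> 3 values
hvals = [1.0e-3 1.5e-3];
h  = hvals(chrom(19) + 1);                                        % 1 bit -> 2 values
load ma_br_17_8_17a                     % trained logarithmic model (variable name net assumed)
y = sim(net, [A; Bw; er; h]);           % transformed outputs [r; x]
Rin = exp(y(1)) - 1;  Xin = exp(y(2)) - 1065;   % back-transform, Equations (10a), (10b)
eMA = (Rin - 25)^2 + Xin^2;             % assumed quadratic cost; stop when eMA < 0.001 Ohm^2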

The average number of iteration steps of the genetic optimization was equal to approximately five generations (see the "cost" column of Table 14). Since every generation consisted of 20 individuals, 100 quadruplets [A, B, εr, h] had to be computed. In

Table 14. The genetic optimization of a microstrip antenna using the neural model. The desired input impedance at 30 GHz was Rin = 25 Ω and Xin = 0 Ω. The stopping value of the cost function was e_MA ≤ 0.001 Ω².




comparison, the CPU-time demands of the development of a neural model were equivalent to computing 3300 quadruplets (Table 11). We can therefore conclude that building the neural model of the microstrip antenna did not make any sense in our situation.

Finally, our experience can be summarized in the following items:

* Neural models, which are intended to approximate monotonic quantities, can be developed efficiently (a small number of training patterns, rapid training, good accuracy), and they can therefore successfully replace numeric models of EM structures.

* If neural models are asked to approximate quantities of a non-monotonic, oscillatory-like nature, then their development is rather laborious, and building them for a single use is inefficient. On the other hand, these models could be included in CAD tools instead of approximate mathematical models, and then their development makes sense.

In the last section, we are going to briefly summarize all the conclusions from the paper and to end by making a few comments on the topic.

5. Conclusions

In this paper, we have discussed the exploitation of artificial neural networks for the modeling of EM structures. We used an ANN in the role of a transformer that statically maps the physical parameters of the modeled structures (dimensions, permittivity, permeability, etc.) to their technical parameters (reflection coefficient, phase constant, input impedance). Feed-forward neural networks can be used for this purpose.

An ANN can consist of McCulloch-Pitts neurons (when trained by genetic algorithms), or it can contain neurons called ADALINE (back-propagation networks completed by local training routines). We used a tangential sigmoid (hidden layers, bipolar output quantities) or a logarithmic sigmoid (unipolar output quantities) as the activation function.

In our situation, the best results were obtained using the professionally programmed ANNs from the Neural Network Toolbox of MATLAB. As training algorithms, we exploited local training routines, reordering the training patterns and using multi-starting in order to avoid convergence to a local minimum of the error surface. A proper training set has to be prepared in the first step of the development of a neural model. In order to accomplish this aim, we equidistantly sampled the input space (the physical parameters) with a relatively long sampling step. We performed a numerical analysis (or a measurement) in order to obtain the corresponding output responses (the technical parameters) for all the samples of the input parameters. The initial set was built in this way.

The training set had to be refined in order to achieve a good accuracy of the neural model. Computing the relative variations of the output responses, we added new patterns to the initial training set in order to reduce the relative variations below the prescribed level (e.g., 10%). If the approximated quantities exhibited very high dynamic ranges, then the dynamic ranges had to be properly reduced (a suitable transform had to be used). As shown in our paper, exploiting the natural logarithm for this purpose is not the best solution.

A proper architecture of the ANN had to be estimated in the second step. The number of hidden layers and the number of their neurons were guessed according to the number of training patterns. Then, Bayesian regularization was used in order to estimate the number of efficiently used parameters, the percentage of which should be from 70% to 90%. If the number of efficiently used parameters was not within this interval, the architecture had to be modified.

The ANN of the proposed architecture had to be trained using the Levenberg-Marquardt algorithm in the third step. The training should be finished within a reasonable number of iteration steps (200 to 1000), and with a sufficiently low training error (from 10⁻⁵ to 10⁻⁷). Since the training error describes the deviation between the numeric and neural models at the sampling points, the quality of the neural model had to be tested even for points lying between the sampling points (e.g., a certain number of randomly located samples was generated, and the error was evaluated for those samples). If significant deviations were revealed, then the ANN had to be re-trained with a lower number of neurons, with a bottleneck introduced, or with the Levenberg-Marquardt procedure replaced by Bayesian regularization (so that over-training was prevented).

An accurate model of an EM structure can be efficiently built by performing the above-described steps.

6. Notes

The MATLAB code and the Excel files described in the paper can be downloaded free of charge via the Web site http://www.feec.vutbr.cz/~raida

The investigation of the efficiency of the developed MATLAB code was performed using the profiler tools of MATLAB 5.3, running on a Digital Alpha 433au workstation under the Digital UNIX 4.0 operating system.


7. Acknowledgments

The research published in this paper was financed by grants No. 102/01/0571 and No. 102/01/0573 of the Czech Grant Agency, and by the research program MSM 262200011, "Research of Electronic Communication Systems and Technologies." The author thanks Prof. Dušan Černohorský for his help in improving the manuscript of the paper, and for fruitful discussions concerning the topics described here. The author is grateful to Dr. Kenneth Froehling for proofreading the manuscript.

8. References

1. S. Haykin, Neural Networks: A Comprehensive Foundation, Englewood Cliffs, New Jersey, Macmillan Publishing Company, 1994.

2. A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Processing, Chichester, England, J. Wiley & Sons, 1994.



3. P. R. Chang, W. H. Yang, and K. K. Chan, "A Neural Network Approach to MVDR Beamforming Problem," IEEE Transactions on Antennas and Propagation, AP-40, 3, March 1992, pp. 313-322.

4. R. J. Mailloux and H. L. Southall, "The Analogy Between the Butler Matrix and the Neural-Network Direction-Finding Array," IEEE Antennas and Propagation Magazine, 39, 6, December 1997, pp. 27-32.

5. S. Sagiroglu and K. Guney, "Calculation of Resonant Frequency for an Equilateral Triangular Microstrip Antenna with the Use of Artificial Neural Networks," Microwave and Optical Technology Letters, 14, 2, February 1997, pp. 89-93.

6. R. K. Mishra and A. Patnaik, "Neurospectral Computation for Complex Resonant Frequency of Microstrip Resonators," IEEE Microwave and Guided Wave Letters, 9, 9, September 1999, pp. 351-353.

7. J. W. Bandler, M. A. Ismail, J. E. Rayas-Sánchez, and Q.-J. Zhang, "Neuromodeling of Microwave Circuits Exploiting Space-Mapping Technology," IEEE Transactions on Microwave Theory and Techniques, MTT-47, 12, December 1999, pp. 2417-2427.

8. S. Wang, F. Wang, V. K. Devabhaktuni, and Q.-J. Zhang, "A Hybrid Neural and Circuit-Based Model Structure for Microwave Modeling," Proceedings of the 29th European Microwave Conference, Munich, Germany, 1999, pp. 174-177.

9. S. Wu, M. Vai, and S. Prasad, "Reverse Modeling of Microwave Devices Using a Neural Network Approach," Proceedings of the Asia Pacific Microwave Conference, New Delhi, India, 1996, pp. 951-954.

10. A. Patnaik, G. K. Patro, R. K. Mishra, and S. K. Dash, "Effective Dielectric Constant of Microstrip Line Using Neural Network," Proceedings of the Asia Pacific Microwave Conference, New Delhi, India, 1996, pp. 955-957.

11. K. C. Gupta and P. M. Watson, "Applications of ANN Computing to Microwave Design," Proceedings of the Asia Pacific Microwave Conference, New Delhi, India, 1996, pp. 825-828.

12. G. Antonini and A. Orlandi, “Gradient Evaluation for Neural- Networks-Based Electromagnetic Optimization Procedures,” IEEE Transactions on Microwave Theory and Techniques, MTT-48, 5 , May 2000, pp. 874-786.

13. C. Christodoulou and M. Georgiopoulos, Applications of Neu- ral Networks in Electromagnetics, Norwood, Massachusetts, Artech House, 2000.

14. Q. J. Zhang and K. C. Gupta, Neural Networks for RF and Microwave Design, Norwood, Massachusetts, Artech House, 2000.

15. H. Demuth and M. Beale, Neural Network Toolbox for Use with Matlab: User’s Guide (version 4), Natick, The MathWorks Inc., 2000.

16. R. L. Haupt, "An Introduction to Genetic Algorithms for Electromagnetics," IEEE Antennas and Propagation Magazine, 37, 2, April 1995, pp. 7-15.

17. D. S. Weile and E. Michielssen, "Genetic Algorithm Optimization Applied to Electromagnetics: A Review," IEEE Transactions on Antennas and Propagation, AP-45, 3, March 1997, pp. 343-353.

18. J. M. Johnson and Y. Rahmat-Samii, "Genetic Algorithms in Engineering Electromagnetics," IEEE Antennas and Propagation Magazine, 39, 4, August 1997, pp. 7-21.

19. Z. Altman, R. Mittra, and A. Boag, "New Designs of Ultra Wide-Band Communication Antennas Using a Genetic Algorithm," IEEE Transactions on Antennas and Propagation, AP-45, 10, October 1997, pp. 1494-1501.

20. E. A. Jones and W. T. Joines, "Design of Yagi-Uda Antennas Using Genetic Algorithms," IEEE Transactions on Antennas and Propagation, AP-45, 9, September 1997, pp. 1368-1392.

21. K.-K. Yan and Y. Lu, "Sidelobe Reduction in Array-Pattern Synthesis Using Genetic Algorithm," IEEE Transactions on Antennas and Propagation, AP-45, 7, July 1997, pp. 1117-1121.

22. M. Gen and R. Cheng, Genetic Algorithms & Engineering Design, Chichester, England, John Wiley & Sons, 1997.

23. R. L. Haupt and S. E. Haupt, Practical Genetic Algorithms, Chichester, England, John Wiley & Sons, 1998.

24. C. Scott, Spectral Domain Method in Electromagnetics, Norwood, Massachusetts, Artech House, 1989.

25. J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation, Reading, Massachusetts, Addison-Wesley, 1991.

26. S. Haykin, Adaptive Filter Theory, Second Edition, Englewood Cliffs, New Jersey, Prentice-Hall, 1991.

27. J. R. Mosig and F. E. Gardiol, "A Dynamical Radiation Model for Microstrip Structures," in P. Hawkes (ed.), Advances in Electronics and Electron Physics, New York, Academic Press, 1982, pp. 139-237.

28. J. R. Mosig and F. E. Gardiol, "Analytical and Numerical Techniques in the Green's Function Treatment of Microstrip Antennas and Scatterers," IEE Proceedings H, 130, 2, February 1982, pp. 172-182.

29. J. R. Mosig and F. E. Gardiol, "General Integral Equation Formulation for Microstrip Antennas and Scatterers," IEE Proceedings H, 132, 7, July 1985, pp. 424-432.

30. J. F. Lee, D. K. Sun, and Z. J. Cendes, "Full-Wave Analysis of Dielectric Waveguides Using Tangential Vector Finite-Elements," IEEE Transactions on Microwave Theory and Techniques, MTT-39, 8, August 1991, pp. 669-678.

31. J. F. Lee, "Finite Element Analysis of Lossy Dielectric Waveguides," IEEE Transactions on Microwave Theory and Techniques, MTT-42, 6, June 1994, pp. 1025-1031.

32. Z. Raida, Full-Wave Finite-Element Analysis of Microwave Transmission Lines, Habilitation Thesis, Brno, Brno University of Technology, 1998.

33. D. Černohorský, Z. Raida, Z. Škvor, and Z. Nováček, Analýza a optimalizace mikrovlnných struktur (Analysis and Optimization of Microwave Structures), Brno, VUTIUM Publishing, 1999 (in Czech).



Introducing the Feature Article Author

Zbyněk Raida received the Ing. (MS) and Dr. (PhD) degrees from the Brno University of Technology (BUT), the Czech Republic, in 1991 and 1994, respectively. From 1993 to 1998, he was an Assistant Professor in the Department of Radio Electronics at BUT, and since 1999 he has been an Associate Professor in the same department. From 1996 to 1997, he spent six months at the Laboratoire de Hyperfréquences, Université Catholique de Louvain, Belgium, as an independent researcher. He is author or coauthor of more than 50 papers in scientific journals and conference proceedings in the areas of numerical modeling of EM structures, optimization of EM structures, application of neural networks to EM modeling and design, and adaptive antennas. In 1999, he received the Young Scientist Award to attend the URSI General Assembly in Toronto, Canada.

Dr. Raida is a member of the IEEE Microwave Theory and Techniques Society, and, since 2001, he has served as the Chair of the MTT/AP/ED joint Chapter of the Czech-Slovak Section of the IEEE. Since 2001, he has been an Executive Editor of the journal Radioengineering (a publication of the Czech and Slovak Technical Universities and URSI committees).
