Soft Comput (2006) 10:257–263. DOI 10.1007/s00500-005-0479-7
ORIGINAL PAPER
R. N. Yadav · Prem K. Kalra · Joseph John
Neural network learning with generalized-mean based neuron model
Published online: 27 April 2005. © Springer-Verlag 2005
Abstract Advances in the biophysics of computation and in neurocomputing models have brought to the foreground the importance of the dendritic structure of the neuron. These structures are assumed to be the basic computational units of the neuron, capable of realizing various mathematical operations. Well-structured higher-order neurons have shown improved computational power and generalization ability. However, these models are difficult to train because of a combinatorial explosion of higher-order terms as the number of inputs to the neuron increases. In this paper we present a neural network using a new neuron architecture, the generalized mean neuron (GMN) model. This neuron model consists of an aggregation function based on the generalized mean of all the inputs applied to it. The resulting neuron model has the same number of parameters as the existing multilayer perceptron (MLP) model but improved computational power. The capability of this model has been tested on classification and time series prediction problems.
Keywords Generalized-mean neuron · Classification · Function approximation · Multilayer perceptrons
1 Introduction
An artificial neuron is a mathematical model for a biological neuron that can approximate its functional capabilities. The major issue in artificial neuron models is the description of single-neuron computation and of the interaction among neurons under applied input signals. In the literature, various neuron models [1–9,17] and their application to solving
R. N. Yadav (✉) · P. K. Kalra · J. John
ACES-107, Department of Electrical Engineering, Indian Institute of Technology Kanpur, India
E-mail: [email protected]
Tel.: +91-512-2597007
Fax: +91-512-2590063
R. N. Yadav
Department of Electronics and Communication Engineering, Maulana Azad National Institute of Technology Bhopal, India
linear and nonlinear problems have been presented. The function of neurons has been clearly explained by the author in [15]. The McCulloch-Pitts [1] neuron model initiated the use of summing units as the neuron model, while neglecting all possible nonlinear capabilities of the single neuron and the role of dendrites in information processing in the neural system. This model makes several drastic simplifications: it allows binary {0, 1} states only, operates under a discrete-time assumption, and assumes synchrony of operation of all neurons in a larger network. Its aggregation function is, in a sense, a weighted mean of all the inputs applied to it. We propose a simple model for the generalized mean neuron (GMN) with a well-defined training procedure based on standard back-propagation. The proposed GMN considers a weighted generalized mean of all inputs in the space. This ensures that the McCulloch-Pitts model is a special case of the proposed neuron model.
In Section 2 we discuss the motivation, physical architecture and mathematical representation of the proposed neuron. Section 3 presents the architecture and learning rule for a multilayer feedforward neural network based on the GMN. Section 4 discusses the performance of the neural network using the proposed neuron model on two typical pattern recognition problems: classification and function approximation. We solve the channel equalization, Pima Indians diabetes and synthetic two-class problems using a GMN based network and compare it with a multilayer perceptron (MLP) based network, which requires more parameters and longer training time. Similarly, the Mackey-Glass, Box-Jenkins gas furnace and HCL internet incoming traffic datasets are used to demonstrate the function approximation capabilities of the proposed neuron model. Conclusions are presented in Section 5.
2 Generalized mean based neuron model
Neuron modeling concerns relating the function of a neuron to its structure on the basis of its operation. As the name suggests, the proposed neuron model is based on the concept of the generalized mean [18] of the input signals. The generalized
Table 1 Forms of the generalized mean as the value of r changes

r     M_r                                    Operation
∞     max(x_j)                               Maximum
2     ((1/N) \sum_{j=1}^{N} x_j^2)^{1/2}     Root mean square
1     (1/N) \sum_{j=1}^{N} x_j               Arithmetic mean
0     (\prod_{j=1}^{N} x_j)^{1/N}            Geometric mean
−1    ((1/N) \sum_{j=1}^{N} 1/x_j)^{-1}      Harmonic mean
−∞    min(x_j)                               Minimum
mean (GM) of N input signals x_j (j = 1, 2, ..., N, N ∈ I) can be given as

GM = \left( \frac{1}{N} \sum_{j=1}^{N} x_j^r \right)^{1/r}    (1)
where r (r ∈ R) is a generalization parameter which yields the various means (arithmetic, geometric and harmonic) depending upon its value. It also gives the Max and Min operators when r tends to +∞ and −∞ respectively. Table 1 shows the different operations attained by the generalized mean operator as the value of r changes. Inspired by the flexibility of the above equation, the aggregation function of the GMN can be defined as
y(x_j, w_j) = \left( \sum_{j=1}^{N} w_j x_j^r + w_0 \right)^{1/r}    (2)
where w_j is the adaptive parameter corresponding to each x_j and w_0 is the threshold of the neuron. From equation (2) we find that

y(x_j, w_j) = \sum_{j=1}^{N} w_j x_j + w_0 \quad \text{for } r = 1    (3)
which is the output of the McCulloch-Pitts model. Thus the perceptron model is a special case of the proposed generalized mean based neuron model. The physical architecture of the proposed neuron model is the same as that of the perceptron model. For r = 0, equation (2) can be modified as follows. Let x_j (j = 1, 2, ..., N) ∈ ℝ⁺ and w_j (j = 1, 2, ..., N) ∈ ℝ⁺ such that w_1 + w_2 + ... + w_N = 1. For r ≠ 0 the weighted generalized mean of x_1, x_2, ..., x_N is given as
y(x, w) = (w_1 x_1^r + w_2 x_2^r + \cdots + w_N x_N^r)^{1/r}.    (4)
Using the Taylor series expansion e^t = 1 + t + O(t^2), where O(t^2) is Landau notation [22] for terms of order t^2 and higher, we can write x_j^r as x_j^r = e^{r \log x_j} = 1 + r \log x_j + O(r^2). Substituting this into the definition of y(x, w) in equation (4), we get
y(x, w) = [w_1(1 + r \log x_1) + w_2(1 + r \log x_2) + \cdots + w_N(1 + r \log x_N) + O(r^2)]^{1/r}
        = [1 + r(w_1 \log x_1 + w_2 \log x_2 + \cdots + w_N \log x_N) + O(r^2)]^{1/r}
        = [1 + r \log(x_1^{w_1} x_2^{w_2} \cdots x_N^{w_N}) + O(r^2)]^{1/r}
        = \exp\left[ \frac{1}{r} \log\{1 + r \log(x_1^{w_1} x_2^{w_2} \cdots x_N^{w_N}) + O(r^2)\} \right]    (5)
Now, using the Taylor series log(1 + t) = t + O(t^2), we get

y(x, w) = \exp\left[ \frac{1}{r} \{ r \log(x_1^{w_1} x_2^{w_2} \cdots x_N^{w_N}) + O(r^2) \} \right]
        = \exp[ \log(x_1^{w_1} x_2^{w_2} \cdots x_N^{w_N}) + O(r) ]    (6)
Now, taking the limit r → 0, we find

y(x, w) = \exp[ \log(x_1^{w_1} x_2^{w_2} \cdots x_N^{w_N}) ]
        = x_1^{w_1} x_2^{w_2} \cdots x_N^{w_N}    (7)
which is the most general type of multiplicative unit given in [3], whose function approximation capability is proved there. This means that the multiplicative neuron unit given in [3] is a special case of the proposed GMN model.
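The special cases above (r = 1 recovering the weighted sum of equation (3), r → 0 recovering the product unit of equation (7)) can be checked numerically. A minimal sketch in Python; the function name, tolerances and the w0 = 0 convention in the limiting case are illustrative choices, not from the paper:

```python
import math

def gmn_aggregate(x, w, w0, r, eps=1e-9):
    """Generalized-mean aggregation y = (sum_j w_j * x_j**r + w0)**(1/r), eq. (2).

    For |r| < eps the r -> 0 limit is used: the product unit x_1^w_1 ... x_N^w_N
    of eq. (7), which assumes positive inputs, weights summing to 1, and w0 = 0.
    """
    if abs(r) < eps:
        return math.exp(sum(wj * math.log(xj) for wj, xj in zip(w, x)))
    return (sum(wj * xj ** r for wj, xj in zip(w, x)) + w0) ** (1.0 / r)

x = [0.3, 0.5, 0.8]
w = [0.2, 0.3, 0.5]  # positive weights summing to 1

# r = 1 reduces to the McCulloch-Pitts weighted sum of eq. (3)
weighted_sum = sum(wj * xj for wj, xj in zip(w, x)) + 0.1
assert abs(gmn_aggregate(x, w, 0.1, r=1.0) - weighted_sum) < 1e-12

# small r approaches the multiplicative unit of eq. (7)
prod_unit = x[0] ** w[0] * x[1] ** w[1] * x[2] ** w[2]
assert abs(gmn_aggregate(x, w, 0.0, r=1e-6) - prod_unit) < 1e-4
```

The second assertion illustrates the derivation numerically: as r shrinks, the generalized mean converges to the weighted geometric mean at rate O(r).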
3 Multilayer feedforward network using GMN model
3.1 Network architecture and description
Let us consider a feedforward multilayer architecture of a network in which M hidden layer neurons receive N inputs, as shown in Fig. 1. The input and output vectors of the network are X = [x_1 x_2 ... x_N]^T and Y = [y_1 y_2 ... y_K]^T respectively. If w_{ij} is the weight that connects the ith neuron with the jth input, the activation value of the ith neuron can be given as
net_i = \left( \sum_{j=1}^{N} w_{ij} x_j^r + w_{0i} \right)^{1/r} \quad \text{for } i = 1, 2, ..., M    (8)
where w_{0i} is the bias of the ith neuron in the hidden layer. The nonlinear transformation performed by each of the M neurons in the network is given as
y_i = f(net_i) \quad \text{for } i = 1, 2, ..., M    (9)
where f denotes a nonlinear function (the sigmoid function in this case). Similarly, the output of the kth neuron in the output layer can be given as
y_k = f(net_k) \quad \text{for } k = 1, 2, ..., K    (10)
where

net_k = \left[ \sum_{i=1}^{M} w_{ki} y_i^r + w_{0k} \right]^{1/r} \quad \text{for } k = 1, 2, ..., K    (11)
Fig. 1 Multilayer feedforward network using GMN model
where w_{ki} is the weight that connects the ith neuron of the hidden layer to the kth neuron of the output layer, and w_{0k} is the bias of the corresponding output layer neuron. The value of the generalization parameter r is, for simplicity, taken to be the same for every neuron in our simulations.
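The forward pass of equations (8)-(11) can be sketched as follows. The weight ranges are illustrative, and positive inputs are assumed so that the fractional powers stay real (the experiments below normalize all data to [0.1, 0.9]):

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gmn_layer(x, W, b, r):
    """One GMN layer: net_i = (sum_j W[i][j] * x_j**r + b[i])**(1/r) of eqs. (8)/(11),
    followed by the sigmoid nonlinearity of eqs. (9)/(10)."""
    out = []
    for Wi, bi in zip(W, b):
        net = sum(wij * xj ** r for wij, xj in zip(Wi, x)) + bi
        out.append(sigmoid(net ** (1.0 / r)))
    return out

def gmn_network(x, W1, b1, W2, b2, r):
    """N -> M -> K feedforward GMN network of Fig. 1, same r in every neuron."""
    return gmn_layer(gmn_layer(x, W1, b1, r), W2, b2, r)

random.seed(0)
N, M, K, r = 4, 3, 1, 1.2
W1 = [[random.uniform(0.1, 0.5) for _ in range(N)] for _ in range(M)]
b1 = [0.1] * M
W2 = [[random.uniform(0.1, 0.5) for _ in range(M)] for _ in range(K)]
b2 = [0.1] * K

y = gmn_network([0.2, 0.4, 0.6, 0.8], W1, b1, W2, b2, r)
assert len(y) == K and 0.0 < y[0] < 1.0
```

With r = 1 this reduces to an ordinary MLP forward pass, which is why the two models have the same parameter count.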
3.2 Learning rule
We describe an error backpropagation based learning rule for the network using the proposed GMN model. The simplicity of the learning method makes the model convenient to use in different situations, unlike the higher-order neuron model [12], which is difficult to train and is susceptible to a combinatorial explosion of terms.
A simple gradient descent rule, using a mean-squared error function, is described by the following set of equations.

Output layer: From equation (10) we have

y_k = f(net_k) = \frac{1}{1 + e^{-net_k}}.    (12)
The mean-squared error (MSE) is given as

E_{MSE} = \frac{1}{2PK} \sum_{k=1}^{K} \sum_{p=1}^{P} \left( y_k^p - y_{dk}^p \right)^2,    (13)
where y_k^p and y_{dk}^p are the actual and desired values of the kth neuron in the output layer for the pth pattern, respectively, and P is the number of training patterns in the input space. The weight update rules are defined by equations (14)–(17).
\Delta w_{ki} = -\eta \frac{\partial E}{\partial w_{ki}} = \frac{1}{PK} \cdot \frac{\delta\, y_i^r\, net_k^{1-r}}{r}    (14)

\Delta w_{0k} = -\eta \frac{\partial E}{\partial w_{0k}} = \frac{1}{PK} \cdot \frac{\delta\, net_k^{1-r}}{r}    (15)

w_{ki}^{new} = w_{ki}^{old} + \Delta w_{ki}    (16)

w_{0k}^{new} = w_{0k}^{old} + \Delta w_{0k},    (17)
where \delta = \eta\, y_k (y_k - y_{dk})(1 - y_k) and \eta (\eta \in [0, 1]) is the learning rate.
Hidden layer: Now, from equations (8) and (9), we can define the update rules for the weights w_{ij} and w_{0i} by equations (18) and (19).
\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}} = \frac{1}{PK} \cdot \frac{\delta (1 - y_i)\, y_i^r\, net_i^{1-r}\, x_j^{r-1} \left[ \sum_{k=1}^{K} net_k^{1-r} w_{ki} \right]}{r}    (18)

\Delta w_{0i} = -\eta \frac{\partial E}{\partial w_{0i}} = \frac{1}{PK} \cdot \frac{\delta (1 - y_i)\, y_i^r\, net_i^{1-r} \left[ \sum_{k=1}^{K} net_k^{1-r} w_{ki} \right]}{r}.    (19)
The new weights w_{ij}^{new} and w_{0i}^{new} can be determined according to equations (16) and (17).
The learning rate η can either be adapted with epochs or fixed to a small number based on heuristics. This learning method is used to train the network in the next section to solve some well-known benchmark problems in both classification and function approximation.
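As a runnable illustration of gradient-descent training of a single GMN unit, the sketch below uses finite-difference gradients in place of the closed-form updates (14)-(19), so it stays self-contained. The toy dataset, learning rate, and the positivity guard on net are assumptions for the sketch, not details from the paper:

```python
import math

def forward(x, params, r):
    # Single GMN unit with sigmoid output: y = f((sum_j w_j x_j**r + w0)**(1/r))
    *w, w0 = params
    net = sum(wj * xj ** r for wj, xj in zip(w, x)) + w0
    net = max(net, 1e-9)  # guard: keep the base positive for the fractional power
    return 1.0 / (1.0 + math.exp(-(net ** (1.0 / r))))

def mse(params, data, r):
    # Mean-squared error in the spirit of eq. (13)
    return sum((forward(x, params, r) - yd) ** 2 for x, yd in data) / (2 * len(data))

def train(data, r=1.2, eta=0.3, epochs=300, h=1e-6):
    # Plain gradient descent; finite-difference gradients stand in for eqs. (14)-(19)
    params = [0.3, 0.3, 0.1]  # [w1, w2, w0]
    for _ in range(epochs):
        base = mse(params, data, r)
        grads = []
        for i in range(len(params)):
            bumped = params[:]
            bumped[i] += h
            grads.append((mse(bumped, data, r) - base) / h)
        params = [p - eta * g for p, g in zip(params, grads)]
    return params

data = [([0.1, 0.9], 0.1), ([0.9, 0.1], 0.9), ([0.8, 0.2], 0.9), ([0.2, 0.8], 0.1)]
before = mse([0.3, 0.3, 0.1], data, 1.2)
after = mse(train(data), data, 1.2)
assert after < before
```

Numeric gradients are a convenient correctness check against the analytic updates: both should drive the error of equation (13) downward on the same data.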
4 Results and discussions
We discuss some important problems arising in machine learning that can be broadly categorized as classification or function approximation. Detailed experiments and comparison with the existing multilayer network (MLN) topology suggest that networks using the proposed neuron model achieve better results with fewer computations. In all the problems we discuss, the datasets have been pre-processed by normalizing them between 0.1 and 0.9. In all simulations, the results reported are averages of ten runs over the range of reported learning rates. All multilayer networks reported are
trained using the standard gradient descent learning algorithm. The network topology reported is of the form
n × h1 × · · · × hk × o
where n is the number of input nodes, h_i is the number of nodes in the ith hidden layer (for i = 1, ..., k) and o is the number of output nodes. Along with the training and testing errors for each network topology, we also use some statistical measures, namely covariance, correlation and Akaike's information criterion (AIC) [20,21], for testing the capability of the proposed neuron model. The AIC, defined by equation (20), evaluates the goodness of fit of a model based on the MSE for the training data and the number of estimated parameters.
AIC = −2 log (maximum likelihood) + 2L (20)
where L is the number of independently estimated parameters. If the output errors are statistically independent of each other and follow a normal distribution with zero mean and constant variance, equation (20) can be written as
AIC = −2Pk log(σ²) + 2L    (21)
where P is the number of training data, k the number of output units and σ² the maximum likelihood estimate of the MSE. The model which minimizes the AIC is optimal in the minimal averaging loss sense, i.e. it minimizes the expected discrepancy [16]. In all simulations we take the absolute values of the estimated MSE and outputs to avoid complex computations.
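The pre-processing used throughout this section, normalizing each dataset into [0.1, 0.9], is a plain min-max scaling; a minimal sketch (the constant-input fallback is an illustrative choice):

```python
def normalize(values, lo=0.1, hi=0.9):
    """Min-max scale a sequence into [lo, hi], as done for every dataset here."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # degenerate constant input: map everything to the interval midpoint
        return [(lo + hi) / 2.0] * len(values)
    scale = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * scale for v in values]

data = [3.0, 7.0, 11.0]
scaled = normalize(data)
assert all(abs(a - b) < 1e-12 for a, b in zip(scaled, [0.1, 0.5, 0.9]))
```

Keeping inputs strictly positive in this way also keeps the fractional powers x_j^r of the GMN aggregation real-valued.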
4.1 Classification
We discuss the results of some popular classification prob-lems using the GMN model: channel equalization, syntheticdata and Pima Indians diabetes data.
4.1.1 Channel equalization
Band-limited communication channels driven at high data rates often display inter-symbol interference (ISI). Nonlinear channel equalization [13] is a popular problem in communication systems: it recovers an estimate of s(t − τ), denoted ŝ(t − τ), given the present and past channel outputs y(t), with τ the equalizer delay. Thus, the channel output vector
y(t) = [y(t), y(t − 1), . . . , y(t − m + 1)] , (22)
is used to compute ŝ(t − τ), where m is the equalizer order. Considering the fact that s(t − τ) is binary, the problem is essentially a classification task. We consider a nonlinear model that uses equations (23) and (24), where the equalizer delay and order are both two.
o = s(i) + 0.5s(i − 1) (23)
x(i − 2) = o − 0.9o3 . (24)
The data generated is then subjected to a 10 dB noise level.Fig. 2 shows the plot of two classes (zero and one) in the
Fig. 2 A channel equalization problem with two classes
space y(t) and y(t − 1). Five hundred points were taken fortraining and the system was tested with 4500 points.
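A sketch of the data generation implied by equations (23) and (24). The ±1 symbol alphabet, the additive Gaussian noise model for the 10 dB level, and the label mapping are assumptions the paper does not spell out:

```python
import math
import random

def make_channel_data(n, snr_db=10.0, seed=1):
    """Generate (features, label) pairs from the channel model of eqs. (23)-(24):
    o = s(i) + 0.5 s(i-1), then x(i-2) = o - 0.9 o**3, followed by additive
    Gaussian noise at the given SNR. Symbols s(i) are drawn from {-1, +1}
    (an assumption); the label is the delayed symbol mapped to {0, 1}."""
    rng = random.Random(seed)
    s = [rng.choice((-1.0, 1.0)) for _ in range(n + 2)]
    x = []
    for i in range(1, n + 2):
        o = s[i] + 0.5 * s[i - 1]
        x.append(o - 0.9 * o ** 3)
    # scale the noise so the signal-to-noise ratio is snr_db (assumed Gaussian)
    power = sum(v * v for v in x) / len(x)
    sigma = math.sqrt(power / 10 ** (snr_db / 10.0))
    x = [v + rng.gauss(0.0, sigma) for v in x]
    # feature vector [y(t), y(t-1)] as in eq. (22); label = delayed symbol
    return [([x[t], x[t - 1]], 1 if s[t - 1] > 0 else 0)
            for t in range(1, len(x))]

data = make_channel_data(500)
assert len(data) == 500
assert all(label in (0, 1) for _, label in data)
```

Training pairs built this way match the setup of Fig. 2: each point lives in the (y(t), y(t − 1)) plane with a binary class label.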
The performance on the channel equalization problem is given in Table 2. A similar network using the GMN model proves better for nonlinear channel equalization than the multilayer network and attains a bit error rate (BER) of 1.38%.
4.1.2 Diabetes – Pima Indians
The famous diabetes dataset of Pima Indian women is used as a benchmark for classifier systems. The idea is to predict the presence of diabetes using seven variables: number of pregnancies, plasma glucose concentration, diastolic blood pressure, triceps skin fold thickness, body mass index (weight/height²), diabetes pedigree, and age. In [14], the author provides an analysis of the dataset, which has a total of 532 diabetes records. Out of the total 532, 200 are used for training and 332 for testing, with about 33% of the total dataset having diabetes. Table 3 shows the performance of the GMN based network as compared to a multilayer network. The GMN based network in this case shows slightly improved performance on the training and testing sets with fewer parameters.
4.1.3 Synthetic two-class problem
This is a ‘realistic’ problem from Ripley [19] that is used to illustrate how methods work. There are two features and two classes, each class having a bimodal distribution. The class distributions were chosen to allow a best-possible error rate of about 8% and are in fact equal mixtures of two distributions; the component normal distributions have a common covariance matrix. The GMN based multilayer network was trained using 250 data samples and was tested with 1000 samples. The performance of this network was compared with a similar network using MLPs, and it was observed that the GMN based network performs better than the network
Table 2 Comparison of performance for the channel equalization problem between the GMN based network and a standard multilayer network

Method           Structure  Learning rate  Training error (%)  Testing error (%)  AIC      Epochs
GMNs (r = 1.2)   2 × 3 × 1  0.4            1                   1.38               −9.1582  151
MLPs             2 × 3 × 1  0.1            1.2                 1.91               −8.8398  29
Table 3 Comparison of performance for the Pima Indians diabetes dataset between the GMN based network and a standard multilayer network

Method           Structure  Learning rate  Training error (%)  Testing error (%)  AIC    Epochs
GMNs (r = 0.9)   7 × 3 × 1  0.5            20.5                12.03              −2.13  855
MLPs             7 × 4 × 1  0.2            21                  12.22              −2.08  300
Table 4 Comparison of performance for the synthetic two-class data between the GMN based network and a standard multilayer network

Method           Structure  Learning rate  Training error (%)  Testing error (%)  AIC     Epochs
GMNs (r = 1.2)   2 × 5 × 1  0.1            14                  8.64               −13.24  214
MLPs             2 × 5 × 1  0.1            19.2                13.84              −12.84  335
of perceptrons. Table 4 shows the performance of the GMNbased network as compared to a multilayer network.
4.2 Function approximation
We evaluate the capabilities of the proposed GMN model on the following problems:

(1) Mackey–Glass time series dataset
(2) Short-term internet incoming traffic prediction
(3) Box–Jenkins gas furnace dataset
The Mackey–Glass and Box–Jenkins datasets are benchmark problems and are popularly used to evaluate a proposed learning method. We also investigate short-term internet incoming traffic prediction using the HCL-infinet internet traffic dataset.
4.2.1 Mackey–Glass
The Mackey–Glass (MG) time series [11] represents a model for white blood cell production in leukemia patients and has nonlinear oscillations. The MG delay-difference equation is given by equation (25):

y(t + 1) = (1 - b)\, y(t) + \frac{a\, y(t - \tau)}{1 + y^{10}(t - \tau)}    (25)
where a = 0.2, b = 0.1, and τ = 17. The time delay τ is a source of complications in the nature of the time series. The objective of the modeling is to predict the value of the time series based on four previous values: the measurements y(t), y(t − 6), y(t − 12) and y(t − 18) are used to predict y(t + 1). Training is performed on 250 samples and the model is tested on 200 time instants after training. A mean squared error of 7.06 × 10⁻⁶ was achieved on training the model for 3598 epochs. Figure 3 shows the training and prediction results.
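Equation (25) can be iterated directly to produce the training series; the constant initial history y0 is an assumption, since the paper does not state its initial conditions:

```python
def mackey_glass(n, a=0.2, b=0.1, tau=17, y0=1.2):
    """Iterate the delay-difference equation (25):
    y(t+1) = (1 - b) y(t) + a y(t - tau) / (1 + y(t - tau)**10).
    The constant history y(t) = y0 for t <= 0 is an assumed initial condition."""
    y = [y0] * (tau + 1) + [0.0] * n
    for t in range(tau, tau + n):
        y[t + 1] = (1 - b) * y[t] + a * y[t - tau] / (1 + y[t - tau] ** 10)
    return y[tau + 1:]  # drop the assumed history, keep n generated values

series = mackey_glass(450)
assert len(series) == 450
assert all(0.0 < v < 2.0 for v in series)
```

From such a series, the input vectors [y(t), y(t − 6), y(t − 12), y(t − 18)] and targets y(t + 1) described above can be sliced out directly.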
In Table 5, the performance of the GMN model based network is compared with a multilayer network with one hidden
Fig. 3 Long term prediction results for the Mackey–Glass time seriesdataset using the proposed neuron model
layer having five nodes, trained using gradient descent. The performance of the GMN model based network is definitely better than that of the multilayer network in this case, even though it has fewer parameters.
4.2.2 HCL–infinet internet traffic
Short-term internet traffic data was supplied by HCL-infinet (a leading Indian ISP). The weekly internet traffic graph with a 30-min average is shown in Fig. 4.
The solid gray graph shows the incoming traffic while the black line graph represents the outgoing traffic. All values are reported in bits per second. We propose a model for predicting the internet traffic using previous values. Three measurements y(t), y(t − 1) and y(t − 2) are used to predict y(t + 1) for the incoming internet traffic. For the incoming traffic, 150 training samples were taken and the model was tested on
Fig. 4 Weekly Graph (30 Min. Average) of the internet traffic for the HCL–infinet router at Delhi, INDIA
Table 5 Comparison of performance for the Mackey–Glass time series dataset between the GMN network and a standard multilayer network, both trained using the gradient descent method

                 GMNs (r = 0.75)  MLPs
Topology         4 × 3 × 1        4 × 5 × 1
Epochs           3598             6637
Training error   7.06 × 10⁻⁶      6.14 × 10⁻⁴
Testing error    2.19 × 10⁻⁴      3.02 × 10⁻⁴
Covariance       2.5 × 10⁻⁵       7.21 × 10⁻⁵
Correlation      0.9970           0.9934
AIC              −11.4477         −7.1422
150 samples. Figure 5 shows the prediction results for the incoming internet traffic data. The performance is compared with the multilayer network in Table 6.
4.2.3 Box–Jenkins gas furnace
The Box–Jenkins gas furnace dataset [10] reports the furnace input as the gas flow rate u(t) and the furnace output y(t) as the CO2 concentration. In this gas furnace, air and methane were combined in order to obtain a mixture of gases containing CO2. We model the furnace output y(t + 1) as a function of the previous output y(t) and input u(t − 3). The training and testing results of the GMN based network and
Fig. 5 Testing result on the HCL-infinet MRTG incoming internet band-width usage data
Table 6 Comparison of performance for the incoming internet bandwidth usage of the HCL-infinet router data between the GMN based network and a standard multilayer network, both trained using the gradient descent method

                 GMNs (r = 1.2)  MLPs
Topology         4 × 3 × 1       4 × 5 × 1
Epochs           4916            10000
Training error   2.11 × 10⁻⁴     5.50 × 10⁻³
Testing error    4.2 × 10⁻³      2.9 × 10⁻³
Covariance       1.53 × 10⁻⁵     6.90 × 10⁻⁶
Correlation      0.8822          0.9076
AIC              −11.4477        −7.1422
the MLP network are shown in Fig. 6. Table 7 shows the detailed comparison of these networks.
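The lagged regressor used in this experiment, predicting y(t + 1) from y(t) and u(t − 3), can be built with a small helper (an illustrative sketch, not the authors' code):

```python
def make_pairs(y, u, lag_u=3):
    """Build (input, target) pairs modeling y(t+1) from y(t) and u(t - lag_u),
    as in the Box-Jenkins experiment above."""
    pairs = []
    for t in range(lag_u, len(y) - 1):
        pairs.append(([y[t], u[t - lag_u]], y[t + 1]))
    return pairs

# tiny synthetic check: y ramps by 1, u by 10, so the pairs are easy to verify
y = [float(i) for i in range(10)]
u = [float(10 * i) for i in range(10)]
pairs = make_pairs(y, u)
assert pairs[0] == ([3.0, 0.0], 4.0)
assert len(pairs) == 6
```

The same pattern (with different lags) covers the Mackey–Glass and internet-traffic regressors as well.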
5 Conclusions
This paper presents a new approach towards the conceptualization of a neuron model with better learning and generalization capabilities. The idea was motivated by nonlinear activities in the brain, which have been modeled here by the most basic of all nonlinearities. While this is not the first instance in which a new neuron has been proposed as a potent model, this work provides a simpler and more generalized method to implement the model so that it can be used without the hassles of
Fig. 6 Performance result on the Box–Jenkins dataset
Table 7 Comparison of performance for the Box–Jenkins gas furnace dataset between the GMN model and a standard multilayer network, both trained using the gradient descent method

                 GMNs (r = 0.9)  MLPs
Topology         2 × 3 × 1       2 × 5 × 1
Epochs           389             4000
Training error   1.802 × 10⁻⁶    3.841 × 10⁻⁴
Testing error    9.22 × 10⁻⁴     0.0010
Covariance       1.46 × 10⁻⁴     2.27 × 10⁻⁵
Correlation      0.9894          0.9856
AIC              −13.0532        −7.5846
possible combinatorial explosions, as in higher-order neurons. The simulation results show that the proposed GMN model outperforms the existing perceptron model.
Acknowledgements We would like to thank P.V. Ramadas, HCL-Infinet, for providing the internet traffic data and Prof. D.H. Ballard, University of Rochester, USA, for useful discussions related to the proposed neuron model.
References
1. McCulloch WS, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 5:115–133
2. Koch C (1999) Biophysics of computation: information processingin single neurons. Oxford University Press, New York
3. Schmitt M (2001) On the complexity of computing and learningwith multiplicative neural networks. Neural Comput 14:241–301
4. Shin Y, Ghosh J (1995) Ridge polynomial networks. IEEE Trans Neural Netw 6:610–622
5. Zhang CN, Zhao M, Wang M (2000) Logic operations based on single neuron rational model. IEEE Trans Neural Netw 11:739–747
6. Basu M, Ho TK (1999) Learning behavior of single neuron classifiers on linearly separable or nonseparable inputs. IEEE IJCNN'99 2:1259–1264
7. Labib R (1999) New single neuron structure for solving nonlinear problems. IEEE IJCNN'99 1:617–620
8. Iyoda EM, Nobuhara H, Hirota K (2003) A Solution for the N-bit parity problem using a single translated multiplicative neuron.Neural Processing Lett 18:233–238
9. Hoppensteadt F, Izhikevich E (2001) Canonical neuron models.In: Arbib MA (ed), Brain theory and neural networks. MIT Press,Cambridge
10. Box GEP, Jenkins GM, Reinsel GC (1994) Time series analysis: forecasting and control. Prentice Hall, Englewood Cliffs
11. Mackey M, Glass L (1977) Oscillation and chaos in physiological control systems. Science 197:287–289
12. Guler M, Sahin E (1994) A new higher-order binary-input neural unit: learning and generalizing effectively via using minimal number of monomials. In: Proceedings of the Third Turkish Symposium on Artificial Intelligence and Neural Networks, pp 51–60
13. Proakis JG (2001) Digital communications. McGraw Hill Interna-tional, Singapore
14. Ripley BD (1996) Pattern recognition and neural networks. Cam-bridge University Press, Cambridge
15. Schreiner K (2001) Neuron function: the mystery persists. IEEE Intell Syst 16:4–7
16. Murata N, Yoshizawa S, Amari S (1994) Network information criterion: determining the number of hidden units for an artificial neural network model. IEEE Trans Neural Netw 5:865–872
17. Plate TA (2000) Randomly connected sigma-pi neurons can form associator networks. Network: Comput Neural Syst 11:321–332
18. Piegat A (2001) Fuzzy modeling and control. Physica-Verlag, Hei-delberg, New York
19. Ripley BD (1994) Neural networks and related methods of classification. J Roy Stat Soc Ser B 56:409–456
20. Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control AC-19:716–723
21. Fogel DB (1991) An information criterion for optimal neural network selection. IEEE Trans Neural Netw 2:490–497
22. Hardy GH, Wright EM (1979) Some notations. In: An introduction to the theory of numbers, 5th edn. Clarendon Press, Oxford, pp 7–8