
Applied Numerical Mathematics 45 (2003) 87–98
www.elsevier.com/locate/apnum

A comparative analysis of neural network performances in astronomical imaging

Rossella Cancelliere∗, Mario Gai

University of Torino, Dipartimento di Matematica, Via Carlo Alberto 10, Torino 10123, Italy

Abstract

Neural networks have been widely used as recognisers and classifiers since the second half of the 1980s; this is related to their capability of solving a nonlinear approximation problem. A neural network achieves this result by training; this iterative procedure has very useful features like parallelism, robustness and easy implementation.

The choice of the best neural network is often problem dependent; in the literature, the most used are the radial and sigmoidal networks. In this paper we compare the performances and properties of both when applied to a problem of aberration detection in astronomical imaging.

Images are encoded using an innovative technique that associates each of them with its most convenient moments, evaluated along the {x, y} axes; in this way we obtain a parsimonious but effective method with respect to the usual pixel by pixel description. © 2002 IMACS. Published by Elsevier Science B.V. All rights reserved.

1. Introduction

The first largely successful neural network model, the multilayer perceptron, was presented in 1986 by D. Rumelhart et al. [15] as an extension of the perceptron model [12]. It is well known that there is a quite intimate connection between approximation theory and neural network design: in fact, given a compact set D ⊂ R^P, a multilayer perceptron with one hidden layer can uniformly approximate any continuous function in C(D) to any required accuracy. The first scientists to establish this result were Cybenko [5], Hornik et al. [10] and Funahashi [7], while an excellent survey of this topic is available in [6].

* Corresponding author. E-mail addresses: [email protected] (R. Cancelliere), [email protected] (M. Gai).

0168-9274/02/$30.00 © 2002 IMACS. Published by Elsevier Science B.V. All rights reserved. doi:10.1016/S0168-9274(02)00237-4


Many other neural models were developed in the following years, but for the sake of brevity, and because of its particular importance for the aims of this paper, we only recall here the fundamental work by Poggio and Girosi [14] on radial basis function neural networks.

Recently some attempts to use neural networks in astronomy have been made, mainly in the field of adaptive optics: the reader can find details in the papers by Loyd-Hart et al. [11] and Wizinowich et al. [16].

The possibility of solving a nonlinear problem by learning from examples, instead of developing a detailed analytical model, is appealing in the case of high precision astronomical imaging from modern telescopes. In particular, it is possible to locate the position of a stellar image with accuracy well below its characteristic size, when the signal to noise ratio (SNR) is sufficiently high. The location uncertainty is σ = α · L/SNR, where α is a factor taking into account geometric factors and the centring algorithm, and L is the root mean square width of the image [8]. The best estimate of the image position is obtained by a least squares approach, evaluating the discrepancy between the data and the template describing the reference image. The location algorithm is therefore very sensitive to any variation of the effective image with respect to the selected template.

It is essential to be able to verify the compatibility of the real images with the reference profile; also, the capability of extracting from the data some parameters suitable for a new definition of the template, in order to improve its similarity to the data, is of paramount importance. Self-calibration of the data, by deduction of the parameters for optimisation of the image template, is a key element in the control of the systematic effects in the position measurement.

Our objective is the implementation of a tool for analysis of realistic images and deduction of a set of aberration parameters able to describe their discrepancy with respect to the ideal, non-aberrated image.

In Section 2 we review the main characteristics of neural networks and the backpropagation algorithm, with a brief reminder of the specific definitions of networks using sigmoidal and radial basis functions. In Section 3 we discuss the image characterisation problem addressed in the present work, both in a general case and in the simplified framework adopted here. In Section 4 we describe the generation of the data set and its processing. Finally, in Section 5 we discuss the current results and we outline the guidelines for our future work on the subject.

2. Neural networks

A comprehensive review of neural network properties and applications can be found in [9]; below, we just mention some of the basic definitions and characteristics.

Neural networks solve the problem of learning from examples, that is, given a training set of N multi-dimensional data pairs {(x_i, F(x_i)) | x_i ∈ R^P, F(x_i) ∈ R^Q}, i = 1, ..., N, we want that

(1) if x_i is the input to the network, the output is close to, or coincident with, the desired answer F(x_i);
(2) the network also has generalization properties, that is, it gives as output F(x_i) even if the input is only “close to” x_i, for instance, a noisy or distorted or incomplete version of x_i.

This can be formalized as a classical approximation problem: given a set of points x_i, i = 1, ..., N, distinct and generally scattered, in a domain D ⊂ R^P, and a linear space Φ(D), spanned by continuous real basis functions o_j, j = 1, ..., H, the multivariate approximation problem at the scattered data pairs (x_i, F(x_i)) consists in finding a function o(x) ∈ R^Q, whose components are a linear superposition of the basis functions o_j, minimising the error function

E ≡ Σ_i Σ_{m=1..Q} (F_m(x_i) − o_m(x_i))² = Σ_i Σ_m (F_m(x_i) − Σ_{j=1..H} w_{mj} o_j(x_i))².

There are two main classes of neural networks, depending on the functions o_j implemented in the hidden layers: sigmoidal and radial networks. They compute distances in the input space (i.e., among patterns x_i ∈ R^P) using two different metrics, namely a metric based on inner products in the former case and the Euclidean distance in the latter. Both kinds of network are briefly described below.

2.1. Sigmoidal neural network

Sigmoidal neural networks are described in more detail in [15], as well as the backpropagation algorithm. Their architecture is shown schematically in Fig. 1, in which the most common three-layer case is shown, and it is described by the following equations:

in^{k+1}_j = Σ_{j′} w_{jj′} o^k_{j′} + bias_j,   o^{k+1}_j = σ(in^{k+1}_j) ≡ 1 / (1 + e^{−in^{k+1}_j}),   o^{out}_m ≡ Σ_j w_{mj} o^{out−1}_j.

Fig. 1. A three layer sigmoidal neural network.


Here in is the input to each unit, o is its output and w_{ij} is the weight associated to the connection between units i and j; each unit is defined by two indexes, a superscript specifying its layer (i.e., input, hidden or output layer) and a subscript labelling each unit in a layer.

The training procedure must identify the best set of weights {w_{ij}} solving the approximation problem o(x_i) ≈ F(x_i).

This is obtained, usually, with an iterative process corresponding to the standard backpropagation algorithm. At each step, each weight is modified according to the gradient descent rule, completed with the momentum term:

w_{ij} = w_{ij} + Δw_{ij},   Δw_{ij} = −η ∂E/∂w_{ij},

where E is the error function defined above.

This procedure can be iterated many times over the complete set of examples {x_i, F(x_i)} (the training set), and under appropriate conditions it converges to a suitable set of weights defining the desired approximating function. Convergence is usually defined in terms of the error function, integrated over the whole training set, dropping below a pre-selected threshold E_T; at this point the neural network can be tested using a different set of data {x′_i, F(x′_i)}, the so-called test set. If the procedure is performed correctly, the network error integrated over the test set should be small, comparable with E_T apart from scaling for the number of test and training instances.
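The update rule above can be sketched numerically. The following is a minimal illustration only, not the authors' implementation: a one-hidden-layer sigmoidal network trained by full-batch gradient descent with a momentum term on a made-up smooth target; the network sizes, learning rate η and momentum coefficient α are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up smooth target F(x) = sin(x1) + cos(x2) on [-1, 1]^2
P, H, Q, N = 2, 8, 1, 200           # input dim, hidden units, output dim, samples
X = rng.uniform(-1.0, 1.0, (N, P))
F = np.sin(X[:, :1]) + np.cos(X[:, 1:])

W1 = rng.normal(0.0, 0.5, (P, H)); b1 = np.zeros(H)   # input -> hidden
W2 = rng.normal(0.0, 0.5, (H, Q))                     # hidden -> output (linear)
dW1 = np.zeros_like(W1); db1 = np.zeros_like(b1); dW2 = np.zeros_like(W2)

E0 = float(((sigmoid(X @ W1 + b1) @ W2 - F) ** 2).sum())  # initial error E

eta, alpha = 0.02, 0.9              # learning rate and momentum (arbitrary)
for epoch in range(2000):
    Hid = sigmoid(X @ W1 + b1)      # o = sigma(in), in = sum w o + bias
    O = Hid @ W2
    err = O - F
    # gradients of E = sum_i sum_m (F_m(x_i) - o_m(x_i))^2
    gW2 = Hid.T @ (2.0 * err)
    dHid = (2.0 * err) @ W2.T * Hid * (1.0 - Hid)
    gW1 = X.T @ dHid
    gb1 = dHid.sum(axis=0)
    # Delta w = -eta dE/dw, completed with the momentum term alpha * previous step
    dW2 = -eta * gW2 / N + alpha * dW2; W2 += dW2
    dW1 = -eta * gW1 / N + alpha * dW1; W1 += dW1
    db1 = -eta * gb1 / N + alpha * db1; b1 += db1

E = float(((sigmoid(X @ W1 + b1) @ W2 - F) ** 2).sum())   # final training error
```

In practice the iteration would stop once E drops below the threshold E_T, rather than after a fixed number of epochs.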

2.2. Radial basis functions neural network

The radial basis function networks (described, e.g., in [4]) offer a viable alternative to sigmoidal networks in many signal processing applications. With respect to the previous case, the main architectural difference concerns the hidden layer functions o_j, which in this case are radial, i.e., the Q components of the output function o(x) are expressed as: o^{out}_m(x_i) = Σ_j w_{mj} G(‖x_i − c_j‖).

Theoretical investigations and practical results suggest that the choice of the nonlinear function G(·) is not crucial to the performances. The thin-plate-spline function G(ν) = ν² log(ν) and the Gaussian function G(ν) = exp(−ν²/β²) are two typical choices; we adopt the latter.

The weights between the input and hidden layers contain the coordinates of the function centres c_j, which are usually chosen according to three different procedures: a random selection, a self-organized selection or a supervised selection. We used the procedure from [4], which selects the radial function centres one at a time using clustering techniques, until the error function is reduced below the pre-selected threshold E_T.

The weights w_{mj} connecting the mth component of the output function o(x) to the H radial functions are evaluated by solving the linear system G^T G W_m = G^T F_m (see [9]), where:

W_m = [w_{m1}, ..., w_{mH}]^T,   F_m = [F_m(x_1), ..., F_m(x_N)]^T,

and G is the N × H matrix with entries G_{ij} = G(‖x_i − c_j‖).

The weight vector W_m is the pseudoinverse (minimum-norm) solution to the overdetermined least-squares data-fitting problem, as shown in [2]; in this case the whole matrix G must be stored in memory to solve the system, and matrices of the same dimension must be inverted; the computational load grows steeply with the number of training samples N, as will be discussed below. In such a case, it is still possible to find practical methods to overcome this problem (see [3]): the matrix G can be split into sub-matrices, separately processed and then merged again for the reconstruction of the desired pseudoinverse matrix.
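The minimum-norm least-squares solution of the system above can be sketched as follows. This is an illustration with made-up data, and for brevity the centres are a random subset of the training points rather than the clustering selection of [4]; the Gaussian width β and the problem sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian(r, beta=1.0):
    # The radial function adopted in the text: G(nu) = exp(-nu^2 / beta^2)
    return np.exp(-(r / beta) ** 2)

# Made-up training data: N points in R^P with a scalar target (Q = 1)
N, P, H = 80, 2, 10
X = rng.uniform(-1.0, 1.0, (N, P))
F = np.sin(X[:, 0]) * np.cos(X[:, 1])

# Centres c_j: here simply a random subset of the data, for brevity
# (the paper instead selects them one at a time by clustering, after [4])
C = X[rng.choice(N, size=H, replace=False)]

# Design matrix G_ij = G(||x_i - c_j||), of size N x H
G = gaussian(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2))

# Minimum-norm least-squares solution of G^T G W = G^T F
W, *_ = np.linalg.lstsq(G, F, rcond=None)
fit = G @ W   # network output at the training points
```

Note that `lstsq` factorises G directly instead of forming G^T G explicitly, which is numerically preferable; for very large N the matrix could be split into sub-matrices as discussed above.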


3. Imaging in astronomy

The photons from a remote point-like source (e.g., a star), when reaching a telescope, can be described to an excellent approximation in terms of a planar wavefront. Then they are beamed onto a focal plane to provide an image of the observed region, translating the angular distribution of brightness on the sky into a bi-dimensional pattern of light intensity. The image of a star is a distribution of finite size, due to the finite aperture of the instrument. We will not address in more detail the Fraunhofer diffraction theory, described in textbooks on optics (see [1], from which the notation is derived); however, we recall that the ideal monochromatic image of a point-like source at wavelength λ, obtained from an unobstructed circular pupil of diameter D, without aberrations, is radial and described by the Airy function:

I(r) = k [2 J_1(r)/r]²,   (1)

where J_1 is the Bessel function of the first kind, order one, and r = D/2 is the aperture radius; the Airy function is shown by the solid line in Fig. 2, compared with an aberrated image (dashed line) affected by one wavelength of coma (see below). The Airy radius, enclosing the central lobe, is 1.22 λ/D. The perturbations introduced on the image by real optics can be described as a perturbation of the wavefront, i.e., a surface displacement; this surface can be decomposed according to a set of basis functions.

Accurate decomposition of a generic aberrated image may require a large number of terms; this is usually done by means of the Zernike polynomials. Historically, a simpler decomposition was used, based on five basis functions only, which proved adequate for most instruments: the Seidel aberrations, corresponding, apart from normalisation, to the lowest Zernike polynomials. Both function sets are described in [1]; the Seidel aberrations are used herein, to provide a manageable implementation of the problem (see below).

Fig. 2. Section of the Airy image (solid line) and of an aberrated image generated with one wavelength of coma (dashed line). The latter image is displaced and asymmetric. Horizontal units are Airy radii, and the normalisation is to the peak value of the Airy image.
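Eq. (1) is simple to evaluate numerically. As a self-contained sketch (a midpoint quadrature of the integral representation of J_1 stands in for a library Bessel routine such as scipy.special.j1; the sampling and range are arbitrary):

```python
import numpy as np

def bessel_j1(x, n=2000):
    # J1 via its integral representation,
    # J1(x) = (1/pi) int_0^pi cos(theta - x sin(theta)) d(theta),
    # approximated by midpoint quadrature
    theta = (np.arange(n) + 0.5) * np.pi / n
    return np.cos(theta[None, :] - np.outer(x, np.sin(theta))).mean(axis=-1)

def airy(r, k=1.0):
    # Eq. (1): I(r) = k [2 J1(r) / r]^2; the r -> 0 limit is k
    r = np.atleast_1d(np.asarray(r, dtype=float))
    out = np.full_like(r, k)
    nz = r != 0.0
    out[nz] = k * (2.0 * bessel_j1(r[nz]) / r[nz]) ** 2
    return out

r = np.linspace(0.0, 10.0, 1001)
profile = airy(r)
# The first dark ring (the Airy radius) falls at the first zero of J1,
# r ~ 3.83 in these dimensionless units
```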


The wavefront deviation from planarity is expressed by the phase aberration Φ, which defines the pupil function e^{iΦ}. The diffraction image on the focal plane is given by the square modulus of the Fourier transform:

I(r, φ) = (k/π²) |∫₀¹ dρ ∫₀^{2π} dθ ρ e^{iΦ(ρ,θ)} e^{−iπrρ cos(θ−φ)}|²   (2)

where ρ and θ are the pupil coordinates (normalised radius and azimuth), whereas r and φ are the image coordinates. When Φ = 0 (the non-aberrated case), Eq. (1) is retrieved.

In the simplified framework of the five classical Seidel aberrations, i.e.:

A_s: spherical aberration;
A_c: coma;
A_a: astigmatism;
A_d: defocus (field curvature);
A_t: distortion (t = tilt),

the phase aberration is generated by the superposition of the five terms:

Φ(ρ, θ) = (2π/λ) [A_s ρ⁴ + A_c ρ³ cos θ + A_a ρ² cos² θ + A_d ρ² + A_t ρ cos θ];   (3)

by replacement of Eq. (3) in Eq. (2), we can see that the relation between the aberrations and the image is nonlinear.

We consider the regime of “small” or “acceptable” aberrations, corresponding to the classical Rayleigh criterion of one quarter of a wavelength ([−0.25, 0.25]), for the test set. We select for training the slightly larger interval [−0.3, 0.3], convenient to avoid potential boundary problems. Over a larger aberration range, the image quality degrades significantly, and we are most interested in the network classification capability in the case of small image perturbations. The Seidel aberrations are the targets of our networks, whereas the encoding of the input image is discussed below. Moreover, since the effect of spherical aberration is small, we consider it as noise and restrict our target to the four more relevant terms; therefore the output vector of the neural network is four-dimensional.
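Eqs. (2) and (3) can be evaluated together by direct quadrature over the pupil. The sketch below is an illustration only (not the authors' code): wavelength units, the quadrature resolution n and the sample aberration values are arbitrary choices.

```python
import numpy as np

lam = 1.0   # work in units of the wavelength (arbitrary)

def seidel_phase(rho, theta, A):
    # Eq. (3): Phi(rho, theta) = (2 pi / lambda) [As rho^4 + Ac rho^3 cos th
    #          + Aa rho^2 cos^2 th + Ad rho^2 + At rho cos th]
    As, Ac, Aa, Ad, At = A
    c = np.cos(theta)
    return (2.0 * np.pi / lam) * (As * rho**4 + Ac * rho**3 * c
                                  + Aa * rho**2 * c**2 + Ad * rho**2
                                  + At * rho * c)

def psf(r, phi, A, k=1.0, n=200):
    # Eq. (2): I(r, phi) = (k / pi^2) |pupil integral|^2,
    # evaluated by midpoint quadrature on an n x 2n (rho, theta) grid
    rho = (np.arange(n) + 0.5) / n
    theta = (np.arange(2 * n) + 0.5) * np.pi / n
    R, T = np.meshgrid(rho, theta, indexing="ij")
    integrand = R * np.exp(1j * seidel_phase(R, T, A)) \
                  * np.exp(-1j * np.pi * r * R * np.cos(T - phi))
    cell = (1.0 / n) * (np.pi / n)          # d(rho) * d(theta)
    return k / np.pi**2 * abs(integrand.sum() * cell) ** 2

no_aberration = (0.0, 0.0, 0.0, 0.0, 0.0)
quarter_wave_coma = (0.0, 0.25 * lam, 0.0, 0.0, 0.0)
# With Phi = 0 the on-axis value is k, the Airy peak of Eq. (1);
# a quarter wave of coma lowers the on-axis intensity (Strehl ratio < 1)
```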

3.1. Number of training instances

Our guideline in defining the number of training cases is the following. If Q output variables are desired, then the corresponding domain O ⊂ R^Q must be divided into a sufficient number of intervals to allow classification; the most convenient number depends upon the problem. In practice, we may take into account C = 5−10 classes; we should place a few points over each region bearing a relevant variation of the function to be approximated: if it is sufficiently smooth, a comparably small number of classes may be sufficient. In order to place an example within each cell of O, we must provide a number T = C^Q of training instances; they can be distributed either regularly or randomly, depending upon the problem.

Using Q = 4 target variables corresponding to the Seidel aberrations andC = 5−10 classes, weneed order ofT = 625−10000 examples; for convenience, we select for the training set an intermediatenumber ofT = 1900 cases, randomly distributed. The validity of this assumption will be discussed onthe basis of the results achieved.

We remark that a more detailed description of the aberrations, e.g., based on the most relevant 12 Zernike polynomials (Q = 12), requires T = 5^12 ≈ 2.44 × 10^8 cases when using C = 5 classes and T = 10^12 cases when using C = 10 classes. Taking into account the computational load, we restrict our initial assessment of the problem to the simpler case of the Seidel aberrations, which remains manageable on small computers.

3.2. Image encoding

Usually, in astronomical imaging, the elementary image is sampled over a small number of pixels, to maximise the field of view, i.e., observe the largest possible area. The minimum sampling requirements are of the order of two pixels over the full width at half maximum, or about four to five pixels within the Airy diameter. In this case, as shown in Fig. 3, the detected signal is subject to dramatic changes depending on the relative position of the image with respect to the detector array. Therefore, using the measured images, the changes in the pixel to pixel distribution are not convenient for evaluation of the discrepancy vs. the nominal Airy image. It is possible to modify part of the focal plane assembly in order to add a magnifying device providing good sampling for the images in a small region. Assuming a sampling of 20 pixels per Airy diameter, and reading up to the third Airy ring, the image size in this case is of the order of 60 × 60 = 3600 pixels. Direct usage of such images as input data to the neural network is impractical, and the identification of a more compact encoding, possibly removing the need for additional custom hardware, appears as an appealing strategy.

The encoding scheme we adopt for the images allows extraction of the desired information for classification, without resorting to the expensive classical pixel by pixel technique. Each input image is described by the centre of gravity and the first central moments, up to the fourth order:

µ_x = ∫dx dy x · I(x, y) / ∫dx dy I(x, y),   µ_y = ∫dx dy y · I(x, y) / ∫dx dy I(x, y),

σ²_x = ∫dx dy (x − µ_x)² · I(x, y) / ∫dx dy I(x, y),   σ²_y = ∫dx dy (y − µ_y)² · I(x, y) / ∫dx dy I(x, y),

M(i, j) = ∫dx dy ((x − µ_x)/σ_x)^i ((y − µ_y)/σ_y)^j · I(x, y) / ∫dx dy I(x, y).   (4)

Fig. 3. Airy image sampled over a low-resolution pixel array. Small displacements of the image vs. the detector provide large variations of the detected signal.
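On a sampled image, the integrals of Eq. (4) become sums over the pixel grid. A minimal sketch (a symmetric Gaussian test image stands in for the sampled PSF; the grid and width are arbitrary):

```python
import numpy as np

def central_moments(I, x, y):
    # Eq. (4) with sums over the pixel grid in place of integrals;
    # x and y are the 1-D pixel coordinate arrays, I the intensity map
    X, Y = np.meshgrid(x, y, indexing="ij")
    norm = I.sum()
    mux = (X * I).sum() / norm
    muy = (Y * I).sum() / norm
    sx2 = ((X - mux) ** 2 * I).sum() / norm
    sy2 = ((Y - muy) ** 2 * I).sum() / norm
    def M(i, j):
        # normalised central moment M(i, j)
        u = (X - mux) / np.sqrt(sx2)
        v = (Y - muy) / np.sqrt(sy2)
        return (u ** i * v ** j * I).sum() / norm
    return mux, muy, sx2, sy2, M

x = y = np.linspace(-5.0, 5.0, 201)
X, Y = np.meshgrid(x, y, indexing="ij")
I = np.exp(-(X ** 2 + Y ** 2) / 2.0)      # a symmetric Gaussian test image
mux, muy, sx2, sy2, M = central_moments(I, x, y)
# Expected: mu ~ 0, sigma^2 ~ 1, skewness M(0,3) ~ 0, kurtosis M(0,4) ~ 3
```

The Gaussian test case reproduces the physical meaning listed in the text: zero skewness for a symmetric image, and kurtosis 3.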


The central moments are much less sensitive than the image itself to the effects related to the finite pixel size; besides, they can be deduced also from the low resolution images mentioned above, without the need for high resolution detectors. Moreover, the central moments have an immediate physical meaning:

– the first order moment provides the centre of gravity of the image;
– the second order central moment is the mean square width;
– the third order central moment (skewness) is an index of the image asymmetry;
– the fourth order central moment (kurtosis) is an index of how peaked the distribution is; for a Gaussian, its value is 3.

Central moments are described in most textbooks on statistics, e.g., [13].

First of all, we must verify that the moments are sensitive to each aberration; the centre of gravity along x is shown in Fig. 4, where the values obtained for variation of each aberration, independently, over the

Fig. 4. Independent variation of the x centre of gravity vs. each aberration (spherical, coma, astigmatism, defocus and distortion), subsequently, over the range [−0.3λ, 0.3λ]. No sensitivity is evidenced.

Fig. 5. Variation of the x variance vs. each aberration: it shows the highest sensitivity to defocus and lower sensitivity to spherical aberration and coma.


Fig. 6. Variation of the moments (left to right, top to bottom) µ_y, σ²_y, M(0,3), M(0,4), M(1,1) and M(2,2) for independent variation of each aberration.

selected range [−0.3λ, 0.3λ], are plotted side by side for ease of comparison. No relevant variation of this parameter with any aberration is perceivable, and the parameter is therefore discarded. Conversely, the x square width, shown in Fig. 5, appears to be sensitive mostly to defocus, much less to spherical aberration and coma, and insensitive to astigmatism and distortion. Similarly, we evaluate the behaviour of the other low order moments with respect to variations of each aberration, selecting the suitable input variables for our neural network. The seven selected moments are σ²_x, µ_y, σ²_y, M(0,3), M(0,4), M(1,1) and M(2,2); the last six are shown in Fig. 6. Therefore, the input vector to the neural network has seven components.

4. Image processing and results

We have defined above the structure of the input data (seven component vector), of the output data (four component vector), and the size of the training set. Here we describe in more detail the generation of the training and test sets, the neural networks evaluated, and their results.

The training set is a list of T = 1900 instances, each represented by an {input, output} pair, i.e., 11 values per row. We use a denser distribution of the points along the individual axes of the target space, placing 100 samples on each axis (variation of a single aberration at a time); the remaining 1400 points are randomly distributed, i.e., each variable is independently chosen. From a given set of five aberration values, the image is built according to Eqs. (2) and (3); then the moments corresponding to Eq. (4) are computed and the values inserted in the data file. We neglect the details of the numeric implementation for brevity.

The test set is built in a similar way, picking independent variations for each output variable, building the image and evaluating the input variables to the network. A total of 1000 test instances has been generated.

Two neural networks are optimised on the training set and verified on the test set, a sigmoidal network and a radial basis function network, hereafter labelled respectively SNN and RNN. The hidden layers are composed respectively of 60 units (SNN) and 175 units (RNN); the difference is based on previous experience ([9] and references therein) showing that in general the former requires fewer internal nodes, but more training iterations, than the latter. Actually, the training required 1000 iterations (SNN) and 175 iterations (RNN); in both cases we achieve convergence to a rather small global error on the training set, and a comparable global error on the test set. Both optimised networks provide reasonably good results in terms of approximation of the desired unknown function relating aberrations and moments. However, we achieve better results with the RNN than with the SNN.

For result evaluation, we remark that the desired behaviour of the neural network is the computation, over the test set, of output values coincident with the pre-defined target values. Thus, the plot of output vs. target, shown in Fig. 7, should ideally be a straight line (y = a + bx) at angle π/4, passing through the origin, i.e., a = 0 and b = 1; we compute the best fit parameters, together with the fit errors, and the results are shown in Table 1. We remark that a discrepancy with respect to these straight line parameters puts in evidence a systematic error in the network optimisation, which is then not able to provide a qualitatively adequate description of the desired approximation function, whereas the spread of the points around the straight line is the symptom of an exceeding level of noise. Both may be due to an insufficient number of training instances; however, the former may be caused by insufficient information encoded in the input variables.
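The straight-line check just described amounts to an ordinary least-squares fit of output against target. A sketch with made-up output values (the coefficients 0.01 and 0.95 are illustrative, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical output/target pairs for one aberration; an ideal network gives
# output = target, i.e. a = 0 and b = 1 in output = a + b * target.
# The coefficients 0.01 and 0.95 below are made up for illustration.
target = rng.uniform(-0.25, 0.25, 1000)
output = 0.01 + 0.95 * target + rng.normal(0.0, 0.005, target.size)

# Ordinary least-squares fit and standard errors of the coefficients
A = np.column_stack([np.ones_like(target), target])
coef, *_ = np.linalg.lstsq(A, output, rcond=None)
a, b = coef
resid = output - A @ coef
s2 = (resid ** 2).sum() / (target.size - 2)       # residual variance
cov = s2 * np.linalg.inv(A.T @ A)                 # covariance of (a, b)
sigma_a, sigma_b = np.sqrt(np.diag(cov))
```

A nonzero a or b ≠ 1 (beyond sigma_a, sigma_b) signals the systematic error discussed above, while a large s2 signals the noise contribution.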

We remark that the outputs computed by both networks are quite consistent with the desired test targets; this means that both are correctly optimised for the problem of aberration reconstruction from the images encoded in terms of moments. However, the radial basis function neural network provides better results, since the straight line fit is closer in most cases to the ideal bisector of the first quadrant, i.e., the systematic errors are lower. Coma and distortion are in both cases the best recognized aberrations,

Table 1
Results of best fit between the network output and the test set targets, for each output variable and for both networks

                      SNN                                   RNN
Coma          a = 0.00355    σa = 0.00020     a = −6.378e−6   σa = 6.22e−6
              b = 1.00651    σb = 0.00141     b = 1.00028     σb = 4.29e−5
Astigmatism   a = −0.00501   σa = 0.00112     a = −0.01432    σa = 0.00093
              b = 0.94863    σb = 0.00770     b = 1.08102     σb = 0.00643
Defocus       a = 0.01507    σa = 0.00196     a = 0.00565     σa = 0.00177
              b = 0.78881    σb = 0.01347     b = 0.94540     σb = 0.01210
Distortion    a = 0.00248    σa = 0.00013     a = −3.53e−6    σa = 1.67e−6
              b = 1.02264    σb = 0.00091     b = 0.99997     σb = 1.15e−5


whereas astigmatism and defocus are affected by larger systematic errors and larger dispersion. Besides, the spherical aberration, present but ignored, induces effects on the image which are similar to those of astigmatism and defocus; it is therefore reasonable that its noise contributions are mainly found on this pair of variables. Increasing the number of network outputs to include explicitly the spherical aberration may improve the performance on astigmatism and defocus as well; besides, increasing the number of output variables is also required in case of adoption of a more detailed decomposition of the aberrations, e.g., in terms of a certain number of Zernike functions. This will probably require also an increase in the number of input variables, and of training instances, in order to provide sufficient information to the network and allow for the inversion of the problem.

Fig. 7. Performances of the sigmoidal (left) and radial (right) neural networks; from top to bottom, coma, astigmatism, defocus and distortion are shown. The blue (solid) line follows the test set targets, whereas the dots are the values computed by the networks.


5. Conclusions

In this paper we deal with two versions of neural networks, used to reconstruct the aberrations present in astronomical images. We test both the possibility of encoding the problem in a compact set of image descriptors with an immediate physical meaning, i.e. the moments, and the performance of the two networks, using different nonlinear functions in their hidden layer, and a different number of hidden nodes.

We achieve good results from both networks, confirming that the problem can be solved by this technique; also, the performance is better in the case of radial basis function networks.

We can also draw the guidelines for our future work: the number of nodes in each network may be increased, as well as the number of training instances, in order to improve the approximation provided by the neural network. Also, in order to improve the practical results, the formal description of the problem requires increasing the number of output variables (to account for spherical aberration or other parameters) and input variables (encoding additional image information). This also requires a larger number of training cases for optimisation of the network, resulting in an increase in the necessary computing power. Our objective is therefore to find the best definition of the problem, also in this more challenging framework.

References

[1] M. Born, E. Wolf, Principles of Optics, Pergamon, New York, 1985.
[2] D.S. Broomhead, D. Lowe, Multivariable functional interpolation and adaptive networks, Complex Systems 2 (1988) 321–355.
[3] R. Cancelliere, A highly parallel procedure to initialize the output weights of a radial basis function or BP neural network, in: T. Sorevik, F. Manne, R. Moe, A.H. Gebremedhin (Eds.), Applied Parallel Computing, New Paradigms for HPC in Industry and Academia, PARA 2000, in: Lecture Notes Comput. Sci., Vol. 1947, Springer, Berlin, 2001, pp. 384–390.
[4] S. Chen, C.F.N. Cowan, P.M. Grant, Orthogonal least squares learning algorithm for radial basis function networks, IEEE Trans. Neural Networks 2 (2) (1991) 302–309.
[5] G. Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signals Syst. 2 (1989) 303–314.
[6] S.W. Ellacott, Aspects of the numerical analysis of neural networks, Acta Numer. (1994) 145–202.
[7] K.I. Funahashi, On the approximate realization of continuous mappings by neural networks, Neural Networks 2 (1989) 183–192.
[8] M. Gai, S. Casertano, D. Carollo, M.G. Lattanzi, Location estimators for interferometric fringes, Publ. Astron. Soc. Pac. 110 (1998) 848–862.
[9] S. Haykin, Neural Networks: A Comprehensive Foundation, IEEE Computer Society Press, New York, 1994.
[10] K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Networks 2 (1989) 359–366.
[11] M. Loyd-Hart, P. Wizinowich, B. McLeod, D. Wittman, D. Colucci, R. Dekany, D. McCarthy, J.R.P. Angel, D. Sandler, First results of an on-line adaptive optics system with atmospheric wavefront sensing by an artificial neural network, Ap. J. 390 (1992) L41–L44.
[12] M. Minsky, S. Papert, Perceptrons, MIT Press, Cambridge, MA, 1969.
[13] A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 1984.
[14] T. Poggio, F. Girosi, Networks for approximation and learning, Proc. IEEE 78 (9) (1990) 1481–1497.
[15] D. Rumelhart, G.E. Hinton, R.J. Williams, Learning internal representations by error propagation, in: D.E. Rumelhart, J.L. McClelland (Eds.), Parallel Distributed Processing (PDP): Exploration in the Microstructure of Cognition, Vol. 1, MIT Press, Cambridge, MA, 1986, pp. 318–362.
[16] P. Wizinowich, M. Loyd-Hart, R. Angel, Adaptive optics for array telescopes using neural network techniques on transputers, in: Transputing ’91, Vol. 1, IOS Press, Amsterdam, 1991, pp. 170–183.