
Classification of Normal, Benign and Malignant Tissues Using Co-occurrence Matrix and Bayesian Neural Network in Mammographic Images

Leonardo de O. Martins¹, Alcione M. dos Santos², Aristófanes C. Silva¹ and Anselmo C. Paiva¹

¹Universidade Federal do Maranhão - UFMA
²Núcleo de Estatística e Informática do Hospital Universitário - UFMA

Av. dos Portugueses, SN, Campus do Bacanga, Bacanga
CEP 65085-580 - São Luís, MA, Brasil

[email protected], [email protected], [email protected], [email protected]

Abstract

This work analyzes the application of the co-occurrence matrix to the characterization of breast tissue as normal, benign or malignant in mammographic images. The characterization method is based on a process that selects, using the forward selection technique, the computed measures that best discriminate among normal, benign and malignant tissues. Then, a Bayesian neural network is used to evaluate the ability of these features to predict the classification of each tissue sample. To verify this application, we also describe tests carried out on a set of 218 tissue samples: 68 benign, 51 malignant and 99 normal. The analysis of the results gave an accuracy of 86.84%, an encouraging outcome. These preliminary results indicate that the approach is very promising for characterizing breast tissue.

1 Introduction

Breast cancer is the major cause of death by cancer in the female population. It is known that the best prevention method is early diagnosis, which lessens mortality and enhances treatment [1]. Every three minutes a woman is diagnosed with breast cancer, and every 13 minutes a woman dies from the disease. It is also estimated that one in eight women will be diagnosed with breast cancer in her lifetime, and one in 30 will die from it [18]. Mammography is currently the best technique for reliable detection of early, non-palpable, potentially curable breast cancer [1]. The mortality rate from this disease decreased for the first time in 1995, due in part to the increasing utilization of screening mammography [1]. However, radiologists vary in their interpretation of mammograms. In addition, interpretation is a repetitive task that requires much attention, which makes misinterpretation easy. Therefore, in the past decade there has been tremendous interest in the use of image processing and analysis techniques for Computer Aided Diagnosis (CAD) from digital mammograms, which are expected to replace conventional mammograms in the near future. The goal has been to increase diagnostic accuracy as well as the reproducibility of mammographic interpretation.

CAD systems can aid radiologists by providing a second opinion, and may be used in the first stage of examination in the near future, reducing the variability among radiologists' interpretations of mammograms [14]. For that, it is important to develop techniques to detect and recognize suspicious lesions, and also to analyze and discriminate them. One of the methods reported for diagnosing lesions in mammograms is the neural network.

Neural networks have been used as a final step for classification, after feature selection. In [23], a neural-genetic algorithm for feature selection, in conjunction with neural and statistical classifiers, obtained a classification rate of 85.0% on a test set. Recently, metaheuristics have been applied to the identification of suspicious regions in mammograms [15], using border points enhanced by a genetic algorithm and the nipple position identified by an ant colony system. The results obtained with a set of mammograms indicate that this method can improve the sensitivity and reliability of systems for automated detection of breast tumors. A computer-aided neural network classification of regions of suspicion on digitized mammograms is also presented in [5], in which a Radial Basis Function Neural Network (RBFNN) performs the classification, fed by features selected by a new technique based on independent component analysis. Experiments on the MIAS database have shown a recognition accuracy of 88.23% in the detection of all kinds of abnormalities and 79.31% in the task of distinguishing between benign and malignant regions, outperforming in both cases standard textural features widely used for cancer detection in mammograms [5].

In this paper we investigate the effectiveness of a classification method based on a Bayesian neural network fed with a set of tissue textural features, computed from the co-occurrence matrix applied to mammographic images.

This work is organized as follows. In Section 2, we present the techniques for feature extraction and tissue classification. Next, in Section 3, the results are shown and we discuss the application of the techniques under study. Finally, Section 4 presents some concluding remarks.

2 Material and Methods

2.1 Image Acquisition

For the development and evaluation of the proposed system, we used the mini-MIAS database [22]. This database contains left and right breast images for a total of 161 patients with ages ranging from 50 to 65. The spatial resolution of the images is 50µm × 50µm. All images are digitized at a resolution of 1024 × 1024 pixels with 8-bit gray levels. All images also include the locations of any abnormalities that may be present. The data in the collection consist of the location of the abnormality (given as the center of a circle surrounding the tumor), its radius, the breast position (left or right), the type of breast tissue (fatty, fatty-glandular or dense) and the tumor type, if one exists (benign or malignant).

2.2 Bayesian Neural Networks

In this paper, we focus exclusively on feedforward networks with a single hidden layer with m nodes and k outputs, and do not allow direct connections from inputs to outputs. The particular form of the neural model we work with is

f_k(x, w) = \phi_k\left( w_{k0} + \sum_{j=1}^{m} w_{kj} \, \phi_j\left( w_{j0} + \sum_{i=1}^{p} w_{ij} x_i \right) \right) \qquad (1)

where x is the input vector with p explanatory variables x_i, \phi represents the activation function, and w is the vector of all weights (parameters), including input-hidden weights, biases and hidden-output weights. If the neural model is used for a classification problem, the output f_k(x, w) is the final value used in the classification process.
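As a concrete reading of Equation 1, here is a minimal NumPy sketch of the forward pass. The shapes match the network used later in Section 3 (8 inputs, 12 hidden nodes, 3 outputs); the random weights are placeholders, and the outer activation \phi_k is taken as the identity here, since the outputs are passed through the softmax of Equation 2.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_in, b_in, W_out, b_out):
    """Single-hidden-layer forward pass of Equation 1.

    x     : (p,)   input vector
    W_in  : (m, p) input-to-hidden weights w_ij
    b_in  : (m,)   hidden biases w_j0
    W_out : (k, m) hidden-to-output weights w_kj
    b_out : (k,)   output biases w_k0
    """
    h = sigmoid(W_in @ x + b_in)   # hidden activations phi_j(.)
    return W_out @ h + b_out       # outputs f_k(x, w), before the softmax

# Example with the paper's final architecture: p=8 inputs, m=12 hidden, k=3 outputs
rng = np.random.default_rng(0)
x = rng.normal(size=8)
f = forward(x, rng.normal(size=(12, 8)), rng.normal(size=12),
            rng.normal(size=(3, 12)), rng.normal(size=3))
```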

In the Bayesian approach to learning neural networks [4], [16], [17], the objective is to find the mode of the posterior distribution of the weights. To obtain the posterior distribution of the weights, we need to specify the prior distribution, which is a probability distribution representing the prior information associated with the weights of the network, and the data likelihood. First, we discuss how we choose the prior distribution of the weights.

Many implementations of Bayesian Neural Networks (BNN) use a Gaussian distribution, with zero mean and some specified width, as the prior for all weights and biases in the network. To specify the prior distributions, the weights were divided into three separate groups: bias terms, input-to-hidden weights and hidden-to-output weights.

We consider a Gaussian prior with zero mean and unknown variance 1/λ_α for the input-to-hidden weights, where λ_α is a precision parameter. Instead of fixing the value of λ_α, we regard it as another parameter; we then call it a hyperparameter to distinguish it from the weights and biases. Now, we need to specify a hyperprior distribution for λ_α.

Although there are several ways to implement the required hyperprior, we choose a Gamma hyperprior, with specified mean and shape parameter [2]. This process can be extended so that each input weight has its own prior and hyperprior; this is called Automatic Relevance Determination (ARD) [19]. Using this prior distribution, it is possible to determine the relative importance of the different inputs. The relevance of each input is considered to be inversely proportional to the variance of this distribution.

Similarly, the prior distribution for the hidden-to-output weights was also considered Gaussian, with zero mean and unknown variance 1/λ_β. We use a Gamma hyperprior, with specified mean and shape, for the hyperparameter λ_β.

Finally, all bias terms are assumed to be distributed according to a Gaussian prior with zero mean and variance 1/λ_γ, where a Gamma hyperprior, with specified mean and shape, was again used for the hyperparameter λ_γ. To simplify the notation, let us denote the set of hyperparameters by ϕ = (λ_α, λ_β, λ_γ).

Once we have chosen the prior distributions, we combine them with the evidence from the data to obtain the posterior distribution of the parameters and hyperparameters.

We assume that we have a data set consisting of n input vectors x_1, . . . , x_n and the corresponding targets y_i. For a multi-class classification problem, the targets are represented by a binary label vector in which a single component is set to one, to denote the correct class, and the other components are set to zero.

Consider a network with one output f_k(x, w) for each class. In this case, the outputs of the trained network are interpreted as class probabilities, and it is common to use a generalization of the logistic sigmoid activation function.


Therefore, the probability that a target y_i has value j is

p(y_i = j \mid x, w) = \frac{\exp(f_j(x, w))}{\sum_k \exp(f_k(x, w))} \qquad (2)
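Equation 2 is the softmax function. A small sketch of the mapping from network outputs to class probabilities follows; the max-subtraction is a standard numerical-stability detail, not something prescribed by the paper.

```python
import numpy as np

def class_probabilities(f):
    """Softmax of Equation 2: p(y = j | x, w) for each class j."""
    z = f - np.max(f)   # stabilizes exp() without changing the result
    e = np.exp(z)
    return e / e.sum()

print(class_probabilities(np.array([2.0, 1.0, 0.1])))  # probabilities, sum to 1
```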

Assuming that (x_1, y_1), . . . , (x_n, y_n) are independent and identically distributed according to a multinomial distribution, we have the following likelihood function for the training data

P(D \mid w) \propto \prod_{i=1}^{n} \prod_{k=1}^{c} f_k(x_i, w)^{y_{ik}} \qquad (3)

After observing the data, using Bayes' theorem and the likelihood, the prior distribution is updated to the posterior distribution

P(w, ϕ \mid D) = \frac{P(D \mid w, ϕ) P(w, ϕ)}{P(D)} = \frac{P(D \mid w, ϕ) P(w \mid ϕ) P(ϕ)}{\int\!\!\int P(D \mid w, ϕ) P(w, ϕ) \, dw \, dϕ} \qquad (4)

The denominator in Equation 4 is sometimes called the normalizing constant; it ensures that the posterior distribution integrates to one. This constant can be ignored, since it is irrelevant to the first level of inference. Therefore, the theorem may also be written as

P(w, ϕ \mid D) \propto P(D \mid w, ϕ) P(w \mid ϕ) P(ϕ) \qquad (5)

Given the training data, finding the weight vector w*, corresponding to the maximum of the posterior distribution, is equivalent to minimizing the error function E(w), given by

E(w) = -\ln P(D \mid w) - \ln P(w, ϕ) \qquad (6)

where P(D \mid w) is the likelihood function presented in Equation 3.

In Bayesian learning, the posterior distribution is used to find the predictive distribution for the target value of a new test case, given the inputs for that case as well as the inputs and targets of the training cases [9]. To predict the new output y^{(n+1)} for a new input x^{(n+1)}, the predictive distribution is obtained by integrating the predictions of the model with respect to the posterior distribution of the model parameters

P(y^{(n+1)} \mid x^{(n+1)}, D) = \int\!\!\int P(y^{(n+1)} \mid x^{(n+1)}, w, ϕ) P(w, ϕ \mid D) \, dw \, dϕ \qquad (7)

The posterior distribution presented in Equation 5 is very complex, and its evaluation requires high-dimensional numerical integration, so it is impossible to compute it exactly. To make this integral analytically tractable, we need to introduce some simplifying approximations.

There are different approaches to calculating the posterior distribution. In [17], a Gaussian approximation of the posterior distribution is used, while [19] introduces a hybrid Monte Carlo method. Another approach to approximating the posterior distribution uses Markov Chain Monte Carlo (MCMC) methods [20]. For a review of these methods see, for instance, [3].

The idea of MCMC is to draw a sample of values w^{(t)}, t = 1, . . . , M, from the posterior distribution of the network parameters. In this work, we used Gibbs sampling [8] to generate samples from the posterior distribution.

Gibbs sampling is perhaps the simplest MCMC method. It is applicable when the joint distribution is not known explicitly but the full conditional distribution of each parameter is known. In a single iteration, Gibbs sampling involves sampling each parameter in turn from its full conditional distribution given all the other parameters.

Gibbs sampling requires that all conditional distributions of the target distribution can be sampled exactly. When a full conditional distribution is unknown, the Metropolis-Hastings algorithm [12] or an adaptive sampling procedure [10] is used instead. For more details on this method, see [7].
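To make the scheme concrete, here is a toy Gibbs sampler for a bivariate standard Gaussian with correlation rho; this is our illustration, not the paper's network posterior. Both full conditionals are univariate Gaussians and can be sampled exactly, which is precisely the situation Gibbs sampling requires.

```python
import numpy as np

rho, n_iter = 0.8, 5000
rng = np.random.default_rng(1)
samples = np.empty((n_iter, 2))
x, y = 0.0, 0.0
for t in range(n_iter):
    # Each full conditional is N(rho * other, 1 - rho^2), sampled exactly.
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))
    samples[t] = (x, y)

print(np.corrcoef(samples[1000:].T))  # off-diagonal ~0.8 after burn-in
```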

We can observe that the integral in Equation 7 is the expectation of the function f_k(x^{(n+1)}, w) with respect to the posterior distribution of the parameters. This expectation can be approximated by MCMC, using a sample of values w^{(t)} drawn from the posterior distribution of the parameters. These values are then used to calculate

y^{(n+1)} \approx \frac{1}{M} \sum_{t=1}^{M} f_k(x^{(n+1)}, w^{(t)}) \qquad (8)
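A minimal sketch of this Monte Carlo average; the `forward` callback is hypothetical, standing in for any function that evaluates the network output at a given weight vector (for instance, the forward-pass sketch after Equation 1).

```python
import numpy as np

def predictive_mean(x_new, posterior_samples, forward):
    """Monte Carlo estimate of Equation 8: average the network output
    over M weight vectors w^(t) drawn from the posterior."""
    outputs = [forward(x_new, w) for w in posterior_samples]
    return np.mean(outputs, axis=0)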

2.3 Co-occurrence Matrix

The co-occurrence matrix, or Spatial Gray Level Dependence Method (SGLDM), is a texture analysis technique that has frequently been used in 2D image segmentation and identification [13], [11].

The co-occurrence matrix captures the gray-level spatial dependency along different angular relationships: the horizontal, vertical and two diagonal directions of the image. A co-occurrence matrix is specified by the relative frequencies P(i, j, r, Θ) with which two pixels, separated by a distance r, occur in a texture along the direction of angle Θ, one with gray level i and the other with gray level j. A co-occurrence matrix is therefore a function of the distance r, the angle Θ and the gray levels. It is assumed that a textured image Y is defined over an M = N_x × N_y finite lattice Ω:

\Omega = \{((k, l), (m, n)) : 1 \le (k, l), (m, n) \le M\} \qquad (9)

The co-occurrence matrices used in this work capture texture characteristics along 0°, 45°, 90° and 135°, as follows:

P(i, j, r, 0^\circ) = \#\{((k, l), (m, n)) \in \Omega : |k - m| = 0, \ |l - n| = r, \ Y(k, l) = i, \ Y(m, n) = j\} \qquad (10)

P(i, j, r, 45^\circ) = \#\{((k, l), (m, n)) \in \Omega : ((k - m) = r, \ (l - n) = -r) \text{ or } ((k - m) = -r, \ (l - n) = r), \ Y(k, l) = i, \ Y(m, n) = j\} \qquad (11)

P(i, j, r, 90^\circ) = \#\{((k, l), (m, n)) \in \Omega : |k - m| = r, \ |l - n| = 0, \ Y(k, l) = i, \ Y(m, n) = j\} \qquad (12)

P(i, j, r, 135^\circ) = \#\{((k, l), (m, n)) \in \Omega : ((k - m) = r, \ (l - n) = r) \text{ or } ((k - m) = -r, \ (l - n) = -r), \ Y(k, l) = i, \ Y(m, n) = j\} \qquad (13)

where \# denotes the number of elements in the set. Observe that the co-occurrence matrix is symmetric, since P(i, j, r, Θ) = P(j, i, r, Θ).

The total number of pairs of compared pixels differs among the angular relationships. To overcome this, it is necessary to normalize the frequencies of the co-occurrence matrices. The normalizing constant R is explicitly defined as the number of pairs of compared pixels in each co-occurrence matrix.
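As an illustration, here is a minimal NumPy sketch of Equations 10-13 together with the normalization by R. The direction offsets and the 8-level example image are our choices for the sketch, not values prescribed by the paper.

```python
import numpy as np

def cooccurrence(img, r=1, direction=(0, 1), levels=8):
    """Symmetric, normalized co-occurrence matrix P(i, j) for one displacement.

    direction is (row offset, column offset): (0, 1) is 0 degrees, (-1, 1) is 45,
    (-1, 0) is 90 and (-1, -1) is 135, matching Equations 10-13 with distance r.
    """
    dr, dc = direction[0] * r, direction[1] * r
    P = np.zeros((levels, levels))
    rows, cols = img.shape
    for k in range(rows):
        for l in range(cols):
            m, n = k + dr, l + dc
            if 0 <= m < rows and 0 <= n < cols:
                P[img[k, l], img[m, n]] += 1
    P += P.T              # enforce the symmetry P(i, j) = P(j, i)
    return P / P.sum()    # normalize by the pair count R

img = np.random.default_rng(2).integers(0, 8, size=(50, 50))  # toy 8-level ROI
P = cooccurrence(img)
```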

From the co-occurrence matrix above, many different texture measures can be produced. In this paper we used just 5 of the 13 measures proposed by Haralick et al. [11] to perform pattern recognition based on the co-occurrence matrix P. The measures used in this work are: Contrast, Homogeneity, Inverse Difference Moment, Entropy and Energy.
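The paper does not reproduce the formulas, so the sketch below uses common textbook definitions of these five measures; in particular, conventions for Homogeneity and the Inverse Difference Moment vary across implementations.

```python
import numpy as np

def haralick_measures(P):
    """Five texture measures from a normalized co-occurrence matrix P,
    after Haralick et al. [11] (textbook conventions; variants exist)."""
    i, j = np.indices(P.shape)
    eps = 1e-12  # avoids log(0) in the entropy term
    return {
        "contrast":        np.sum((i - j) ** 2 * P),
        "homogeneity":     np.sum(P / (1.0 + np.abs(i - j))),
        "inv_diff_moment": np.sum(P / (1.0 + (i - j) ** 2)),
        "entropy":        -np.sum(P * np.log(P + eps)),
        "energy":          np.sum(P ** 2),
    }
```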

2.4 Selection of Most Significant Features

Our main objective is to identify the effectiveness of a feature, or a combination of features, when applied to a Bayesian neural network. Thus, the choice of features to be extracted is important.

Forward selection is a method for finding the “best” combination of features (variables) by starting with a single feature and increasing the number of used features step by step [6]. In this approach, features are added to the model one at a time. At each step, each feature that is not already in the model is tested for inclusion. The most significant of these features is added to the model, as long as its p-value is below some pre-set level.
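A sketch of that greedy loop; `pvalue_of` is a hypothetical callback that refits the model with the candidate feature added to the currently selected set and returns the candidate's p-value.

```python
def forward_selection(candidates, pvalue_of, alpha=0.05):
    """Greedy forward selection: repeatedly add the most significant
    remaining feature while its p-value stays below alpha.

    pvalue_of(feature, selected) -> p-value of `feature` in a model
    fitted on selected + [feature]  (hypothetical callback).
    """
    selected, remaining = [], list(candidates)
    while remaining:
        pvals = {f: pvalue_of(f, selected) for f in remaining}
        best = min(pvals, key=pvals.get)     # most significant candidate
        if pvals[best] >= alpha:             # nothing significant left
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```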

2.5 Evaluation of the Classification Method

Sensitivity and specificity are the most widely used statistics to describe a diagnostic test. Sensitivity is the proportion of true positives correctly identified by the test, defined as Se = TP/(TP + FN). Specificity is the proportion of true negatives correctly identified by the test, defined as Sp = TN/(TN + FP), where FN, FP, TN and TP denote, respectively, the numbers of false-negative, false-positive, true-negative and true-positive diagnoses.
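These two definitions translate directly to code:

```python
def sensitivity(tp, fn):
    """Se = TP / (TP + FN): fraction of actual positives the test detects."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Sp = TN / (TN + FP): fraction of actual negatives the test rejects."""
    return tn / (tn + fp)
```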

3 Experimental Results

The regions of interest (ROI) were manually extracted from each image based on the information provided by the MIAS database. From this database, we selected 119 abnormal samples (68 benign and 51 malignant mammograms) and 99 normal tissue samples, totaling 218 mammograms. For each abnormal mammogram, a ROI containing the lesion was manually selected, as shown in Figure 1. For the normal mammograms, the ROI was randomly selected; only the pectoral muscle was excluded as a possible ROI, while glandular and fatty tissue were allowed. Since the extracted tissue regions had different sizes, each ROI was rescaled to 50 × 50 pixels.

Figure 1. Illustration of a ROI example of malignant tissue.

To compute the co-occurrence matrices we used four directions: 0°, 45°, 90° and 135°. For each direction we computed the measures at a distance of 1 pixel. These parameters were empirically defined, after exhaustive tests. We used the five measures defined in Section 2.3, with the number of gray levels set to 8, 16, 32, 64, 128 and 256. Using these parameters, the co-occurrence matrix generated 120 measures for each ROI (4 directions × 1 distance × 5 measures × 6 quantization levels).
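A sketch of this 120-measure extraction using scikit-image's GLCM routine; the library is our tooling choice for the example, not necessarily the authors', and its conventions may differ slightly from Section 2.3, so entropy is computed by hand.

```python
import numpy as np
from skimage.feature import graycomatrix  # named greycomatrix in older scikit-image

def roi_features(roi):
    """120 features per ROI: 4 directions x 1 distance x 5 measures
    x 6 quantization levels, as in Section 3. `roi` is a uint8 image."""
    angles = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]   # 0, 45, 90, 135 degrees
    feats = []
    for levels in (8, 16, 32, 64, 128, 256):
        q = (roi.astype(np.uint16) * levels // 256).astype(np.uint8)  # quantize
        P = graycomatrix(q, distances=[1], angles=angles,
                         levels=levels, symmetric=True, normed=True)
        for a in range(4):
            p = P[:, :, 0, a]
            i, j = np.indices(p.shape)
            feats += [np.sum((i - j) ** 2 * p),            # contrast
                      np.sum(p / (1 + np.abs(i - j))),     # homogeneity
                      np.sum(p / (1 + (i - j) ** 2)),      # inverse difference moment
                      -np.sum(p * np.log(p + 1e-12)),      # entropy
                      np.sum(p ** 2)]                      # energy
    return np.array(feats)   # 6 levels * 4 directions * 5 measures = 120
```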


To make the computation feasible, we needed to select, from all the obtained measures, the minimum set with the power to discriminate benign from malignant tissues. To do this, we used the forward selection technique, which reduced the number of variables to 17. To reduce multicollinearity among the variables, we then examined pairwise correlations and removed one variable from each pair with high correlation (absolute correlation above 0.9). Thus, the number of variables decreased to 8. This shows that, for the database used, these were the most adequate characteristics to describe the tissue texture.
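A minimal sketch of that pruning step, keeping the first-seen feature of each highly correlated pair; which member of a pair to drop is not specified in the paper, so this tie-breaking rule is our assumption.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """Remove one feature from each pair with |correlation| > threshold.

    X is (n_samples, n_features); keeps the first-seen feature of each pair.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for i in range(corr.shape[0]):
        if all(corr[i, j] <= threshold for j in keep):
            keep.append(i)
    return X[:, keep], [names[k] for k in keep]
```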

We generated a BNN with 8 inputs, a single hidden layer with a fixed number m of hidden nodes, and three output nodes. Several values m = 5, 6, . . . , 18 were tried in the experiments; we report only the best result, which was obtained with m = 12.

The nonlinear activation function used for the hidden units was the logistic sigmoid, which produces an output between 0 and 1. For the output units, the softmax generalization of Equation 2 is used, because these outputs are probabilities and must sum to one.

We used the Gaussian prior distributions described in Section 2.2, with three separate weight groups. The priors over the network parameters are

u_i \mid \lambda_\alpha \sim N(0, \lambda_\alpha^{-1}), \quad i = 1, \ldots, 96
v_j \mid \lambda_\beta \sim N(0, \lambda_\beta^{-1}), \quad j = 1, \ldots, 36
b_k \mid \lambda_\gamma \sim N(0, \lambda_\gamma^{-1}), \quad k = 1, \ldots, 15 \qquad (14)

where u_i represents the input-hidden weights, v_j the hidden-output weights and b_k the bias terms (with m = 12 hidden nodes, there are 8 × 12 = 96 input-hidden weights, 12 × 3 = 36 hidden-output weights and 12 + 3 = 15 biases).

A convenient form for the hyperprior distributions is a vague Gamma distribution. Here, we considered all hyperparameters distributed according to a Gamma distribution with scale and shape parameters equal to 0.001. The priors for the different parameters and hyperparameters are all independent.
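For illustration, this hierarchical prior can be written out as below, reading the paper's Gamma(0.001, 0.001) in the BUGS shape/rate convention (an assumption on our part, since WinBUGS parameterizes Gamma by shape and rate); the precision is fixed at its prior mean just to keep the example draw well behaved.

```python
from scipy import stats

# Hierarchical prior of Section 2.2 / Equation 14:
# precision ~ Gamma(shape=0.001, rate=0.001)  (vague; mean = shape/rate = 1),
# weights   ~ N(0, 1/precision).
prec_prior = stats.gamma(a=0.001, scale=1 / 0.001)  # SciPy scale = 1/rate
lam = prec_prior.mean()                             # prior mean, for illustration
u = stats.norm(0, 1 / lam**0.5).rvs(size=96, random_state=1)  # input-hidden weights
v = stats.norm(0, 1 / lam**0.5).rvs(size=36, random_state=2)  # hidden-output weights
b = stats.norm(0, 1 / lam**0.5).rvs(size=15, random_state=3)  # biases
```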

The software WinBUGS [21] was used to implement the Bayesian neural network. Through WinBUGS, we specified the model described in Section 2.2. Next, the software simulated values from the posterior distribution of each parameter of interest (weights and biases), using the Metropolis-within-Gibbs procedure. We computed a single MCMC chain for each parameter of interest, simulating 20,000 iterations and discarding the first 10,000 of each sequence.

The posterior distribution samples of the model parameters were used to estimate the predictive distribution for the new test inputs. At each iteration t, the BNN has parameters w^{(t)} and produces an output y^{(t)} for an input vector x. Thus, for each test sample, we calculated the arithmetic mean of the M network outputs, according to Equation 8. As shown in Section 2.2, this value is a good approximation of Equation 7.

The experiment was configured with 180 training samples and 38 test samples. On the training set, we obtained an accuracy of 100.0%. Table 1 shows the results obtained with the proposed method on the test set. Based on Table 1, we can see that the method obtained a mean success rate of 86.84% in discriminating malignant from benign and normal tissues.

Tissue        Benign   Normal   Malignant
Total              8       20          10
Correct            5       19           9
Correct (%)     62.5     95.0        90.0

Table 1. Detailed accuracy analysis of the breast tissue characterization.

To compute sensitivity and specificity more precisely, we divided the results obtained in this multiclass classification problem into three binary classification problems. Thus, we examined the efficiency of the proposed method in classifying benign vs. normal, malignant vs. normal and malignant vs. benign tissues. Tables 2, 3 and 4 show the grouped results. Examining Table 2, we can see that the method has a sensitivity of 83.33% and a specificity of 95%, considering benign as “positive” and normal as “negative”. For Table 3, which considers only malignant and normal tissues, the method obtained an accuracy of 100%. Finally, considering malignant as “positive” and benign as “negative”, the method obtained a sensitivity of 90% and a specificity of 71.42%.
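As a quick worked check against Table 2 below (benign taken as “positive”, normal as “negative”):

```python
# Counts read off Table 2: 5 benign correctly classified, 1 missed;
# 19 normal correctly classified, 1 misclassified as benign.
tp, fn, fp, tn = 5, 1, 1, 19
print(tp / (tp + fn))   # sensitivity: 5/6   ~ 0.8333 -> 83.33%
print(tn / (tn + fp))   # specificity: 19/20 = 0.95   -> 95.0%
```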

Tissue        Benign   Normal
Benign             5        1
Normal             1       19
Correct (%)    83.33     95.0

Table 2. Detailed accuracy analysis of the benign vs. normal tissue characterization.

Tissue        Malignant   Normal
Malignant             9        0
Normal                0       19
Correct (%)       100.0    100.0

Table 3. Detailed accuracy analysis of the malignant vs. normal tissue characterization.

Tissue        Malignant   Benign
Malignant             9        2
Benign                1        5
Correct (%)        90.0    71.42

Table 4. Detailed accuracy analysis of the malignant vs. benign tissue characterization.

The number of samples studied in our dataset is not sufficiently large to allow us to reach definitive conclusions, but the preliminary results of this work are very encouraging, demonstrating the potential of BNNs, together with co-occurrence matrix features, for multi-class classification, and their effectiveness in discriminating benign from malignant and normal tissue in mammography.

4 Conclusion

Based on these results, we have observed that such measures provide significant support for a more detailed clinical investigation, and the results were very encouraging when tissues were classified using the co-occurrence matrix and Bayesian neural networks. Nevertheless, tests with a larger database and more complex cases are needed in order to obtain a more precise behavior pattern.

Despite the good results obtained by analyzing texture alone, further information can be obtained by analyzing the geometry. As future work, we propose a combination of texture and geometry measures for a more precise and reliable diagnosis. We also need to compare the BNN results with other neural networks to assess its effectiveness.

Acknowledgements

The authors acknowledge CAPES for its financial support of this project (process number 0044/05-9) and for scholarships.

References

[1] American Cancer Society (ACS). Learn about breast cancer, 2006. Available at http://www.cancer.org.

[2] J. O. Berger. Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, New York, 2nd edition, 1985.

[3] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, New York, 1996.

[4] W. L. Buntine and A. S. Weigend. Bayesian backpropagation. Complex Systems, 5:603–643, 1991.

[5] A. K. Christoyianni, E. Dermatas, and G. Kokkinakis. Computer aided diagnosis of breast cancer in digitized mammograms. Computerized Medical Imaging and Graphics, 26:309–319, 2002.

[6] R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. Wiley-Interscience, New York, 1973.

[7] D. Gamerman. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Chapman and Hall, London, 1997.

[8] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721–741, 1984.

[9] M. Ghosh, T. Maiti, S. Chakraborty, and A. Tewari. Hierarchical Bayesian neural networks: An application to a prostate cancer study. Journal of the American Statistical Association, 99(467):601–608, 2004.

[10] W. Gilks and P. Wild. Adaptive rejection sampling for Gibbs sampling. Journal of the Royal Statistical Society, Series C (Applied Statistics), 41:337–348, 1992.

[11] R. Haralick, K. Shanmugam, and I. Dinstein. Textural features for image classification. IEEE Transactions on Systems, Man and Cybernetics, SMC-3(6):610–621, November 1973.

[12] W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109, 1970.

[13] A. K. Jain. Fundamentals of Digital Image Processing. Prentice Hall, Englewood Cliffs, NJ, USA, 1989.

[14] Y. Jiang, R. M. Nishikawa, R. A. Schmidt, A. Y. Toledano, and K. Doi. Potential of computer-aided diagnosis to reduce variability in radiologists' interpretations of mammograms depicting microcalcifications. Radiology, 220:787–794, 2001.

[15] K. Thangavel and M. Karnan. Computer aided diagnosis in digital mammograms: Detection of microcalcifications by metaheuristic algorithms. GVIP, 5, 2005.

[16] D. J. C. MacKay. Bayesian Methods for Adaptive Models. PhD thesis, California Institute of Technology, 1991.

[17] D. J. C. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3):448–472, 1992.

[18] National Cancer Institute (NCI). Cancer stat fact sheets: Cancer of the breast, 2006. Available at http://seer.cancer.gov/statfacts/html/breast.html.

[19] R. M. Neal. Bayesian Learning for Neural Networks. Springer-Verlag, New York, 1996.

[20] D. Ríos Insua and P. Müller. Feedforward neural networks for nonparametric regression. In D. Dey, P. Müller, and D. Sinha, editors, Practical Nonparametric and Semiparametric Bayesian Statistics, pages 181–193. Springer-Verlag, New York, 1998.

[21] D. J. Spiegelhalter, A. Thomas, N. Best, and D. Lunn. WinBUGS manual, version 1.4. Technical report, MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge, 2003.

[22] J. Suckling, J. Parker, D. Dance, S. Astley, I. Hutt, C. Boggis, et al. The Mammographic Image Analysis Society digital mammogram database. Excerpta Medica, 1069:375–378, 1994.

[23] P. Zhang, B. Verma, and K. Kumar. Neural vs. statistical classifier in conjunction with genetic algorithm based feature selection. Pattern Recognition Letters, 26:909–919, 2005.
