EAI 320 Practical 3 Report: Neural Networks
Supervised Learning in an Expert System
GJ Fouche, u13004019
June, 2015
DECLARATION OF ORIGINALITY
UNIVERSITY OF PRETORIA
The University of Pretoria places great emphasis upon integrity and ethical conduct in the preparation of all written work submitted for academic evaluation.

While academic staff teach you about referencing techniques and how to avoid plagiarism, you too have a responsibility in this regard. If you are at any stage uncertain as to what is required, you should speak to your lecturer before any written work is submitted.

You are guilty of plagiarism if you copy something from another author's work (e.g. a book, an article or a website) without acknowledging the source and pass it off as your own. In effect you are stealing something that belongs to someone else. This is not only the case when you copy work word-for-word (verbatim), but also when you submit someone else's work in a slightly altered form (paraphrase) or use a line of argument without acknowledging it. You are not allowed to use work previously produced by another student. You are also not allowed to let anybody copy your work with the intention of passing it off as his/her work.

Students who commit plagiarism will not be given any credit for plagiarised work. The matter may also be referred to the Disciplinary Committee (Students) for a ruling. Plagiarism is regarded as a serious contravention of the University's rules and can lead to expulsion from the University.

The declaration which follows must accompany all written work submitted while you are a student of the University of Pretoria. No written work will be accepted unless the declaration has been completed and attached.
Full names of student:
Student number:
Topic of work:
Declaration
1. I understand what plagiarism is and am aware of the University’s policy in this regard.
2. I declare that this assignment report is my own original work. Where other people's work has been used (either from a printed source, Internet or any other source), this has been properly acknowledged and referenced in accordance with departmental requirements.

3. I have not used work previously produced by another student or any other person to hand in as my own.

4. I have not allowed, and will not allow, anyone to copy my work with the intention of passing it off as his or her own work.
SIGNATURE: DATE:
Contents
1.1 Introduction
1.2 Problem Definition and Methodology
    1.2.1 Hidden Layer size and amount
    1.2.2 Learning Rate
1.3 Results and Graphs
    1.3.1 Instability of using single neuron hidden layer
    1.3.2 Results for different hidden layer sizes
    1.3.3 The effect of learning rate
1.4 Discussion and Conclusion
    1.4.1 Effects of hidden layer size and the dataset
    1.4.2 The effect of learning rate
    1.4.3 Final Properties and Stop Condition Notes
1.5 Appendix A: Hidden Neurons
1.6 Appendix B: Learning Rate Results
1.1 Introduction
Artificial Neural Networks (ANNs) are a topic of paramount importance in the scope of artificial intelligence and intelligent systems. In effect, ANNs attempt, to some degree, to computationally model the inner workings of biological neural nets (like those contained in the brain). They are arguably the most significant early attempt to model "intelligence" in the field, especially in deep learning [1].

The architecture of an ANN generally consists of a network of parallel, multi-layered, interconnected neurons connected by inputs and outputs. Different architectures exist, but only the most standard of neural networks is explored in this report. One characteristic that ANNs share with their biological counterparts is the necessity for some kind of learning (or training) process in order to adapt them to the required dataset. Neural networks are well suited to problems where pattern recognition and data interpolation in noisy spaces form part of the problem. If sufficiently trained, they have the ability to generalize beyond the data used for training.
This report explores the fundamental aspects and problems of supervised learning and neural-network topology, all within the context of a simple expert system that diagnoses malignant growths based on 9 characteristic inputs.
1.2 Problem Definition and Methodology
1.2.1 Hidden Layer size and amount
Number of hidden neurons
According to some sources, the optimum number of neurons in the hidden layer lies somewhere between the number of inputs to the network and the number of outputs [2]. For this reason, this report focuses on analysing networks with 2 to 9 hidden neurons. For interest's sake, the case of a single hidden neuron is also tested.
Number of hidden layers
In this case, only a single hidden layer is used. Although networks with more layers do exist, the potential generalization gain is minimal in most cases. Since zero hidden layers suffice to resolve linearly separable data, and it is not known whether this dataset is linearly separable, only experiment can determine exactly how many hidden layers are required. There is no well-defined formula or process for choosing the number of hidden layers, so the general practice of starting with one layer and adding more only if really needed is followed here [3].
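The single-hidden-layer topology described above can be sketched as follows. This is a minimal NumPy illustration, not the code used for the report's simulations; the 6-neuron hidden layer, the sigmoid activation, and the random initialisation scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions assumed from the report: 9 characteristic inputs,
# a single hidden layer (6 neurons as an example), one output.
n_in, n_hidden, n_out = 9, 6, 1

# Random weight initialisation (as referenced in the discussion section).
W1 = rng.standard_normal((n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_out, n_hidden))
b2 = np.zeros(n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Propagate one 9-feature sample through the network."""
    h = sigmoid(W1 @ x + b1)   # hidden-layer activations
    y = sigmoid(W2 @ h + b2)   # output score in (0, 1)
    return y

print(forward(rng.random(n_in)))
```

The forward pass returns a value in (0, 1), which an expert system can threshold into a benign/malignant decision.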
1.2.2 Learning Rate
A relatively slow but visible learning rate is α ≈ 0.01 [4]. This rate is used for the initial observations, with limited stop conditions, to determine the optimal network size; further learning rates are then simulated in neural networks of the best size(s) found.
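The role of α can be seen in a toy gradient-descent example (not the report's code): each weight update is scaled by α as in w ← w − α·df/dw, and an overly large α makes the iterates oscillate, as discussed in the conclusions.

```python
# Gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
def descend(alpha, steps=500, w=0.0):
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # derivative of (w - 3)^2
        w -= alpha * grad        # learning-rate-scaled update
    return w

print(descend(0.01))   # small alpha: slow but stable convergence towards 3
print(descend(1.1))    # too large: each step overshoots and the error grows
```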
1.3 Results and Graphs
1.3.1 Instability of using single neuron hidden layer
Sample   Prediction Accuracy
1        58 %
2        96 %
3        78 %
Mean     77.3 %
Figure 1.1: A table of sample runs for a single-neuron hidden layer.
1.3.2 Results for different hidden layer sizes
The results in Appendix A are combined to form the following graph.
Figure 1.2: A summary plot of measured results from simulations with differing hidden neuron counts (see Appendix A).
1.3.3 The effect of learning rate
The following plot is based on the raw data from Appendix B.
Figure 1.3: Plot of training rate vs. α, based on the Appendix B results.
1.4 Discussion and Conclusion
1.4.1 Effects of hidden layer size and the dataset
Findings for fewer than 2 neurons
At first glance the accuracy of a single-neuron hidden layer is surprising; only on further investigation does it become evident that the average accuracy over many samples is actually very low. Given this inconsistency between samples, and the fact that the only qualities differing between runs are the weight initialization and the training data sampling (both random processes), the effectiveness of the single-neuron hidden layer appears exceptionally sensitive to the training data and initial weights. This is the indication of a poor network, and the inefficiency of the low-neuron-count network suggests that the Proben1 cancer dataset and its resultant generalization curve are non-linear.
For more than 2 hidden neurons
In the data presented in Figure 1.2 it can be seen that, between 2 and 4 neurons in the hidden layer, the number of epochs required to reach the best test error decreases, which suggests that more hidden neurons make pattern recognition easier. On investigation, the plot data suggests the following trade-offs:
Number of Hidden Neurons   Advantage          Disadvantage
2 - 4                      Fast Train Time    Poor Performance
5 - 6                      Good Performance   Increased Train Times
7 - 9                      Good Performance   Very Long Train Times
Figure 1.4: A table of trade-offs for small and large hidden layers
We can conclude, then, that the best compromise is a hidden layer with 6 to 7 neurons.
1.4.2 The effect of learning rate
The learning rate (α) has a profound influence on the training of the neural network. It directly controls the rate at which the network adapts to the data, but the network is very prone to oscillation when the learning rate is too high. The results establish that the ideal learning rate for this network sits somewhere in the interval (0.01, 0.03).
1.4.3 Final Properties and Stop Condition Notes
Data suggests the following Neural Network properties:
1. Hidden Layers = 1
2. Hidden Neurons ∈ [6, 7]
3. Learning Rate α ∈ (0.01, 0.03)
Due to the oscillatory nature of the validation error on this dataset, the implemented stop condition only acts on positive gradients of validation error after a certain desired level of validation error has been reached. This seemed to work well.
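The stop condition just described can be sketched as follows. The function and threshold names are illustrative, not taken from the report's code: rises in validation error are ignored until the error has dropped below a desired threshold, after which the first positive gradient between epochs stops training.

```python
def should_stop(val_errors, threshold):
    """Stop only once the error dipped below threshold AND is now rising."""
    if len(val_errors) < 2:
        return False
    armed = min(val_errors) < threshold          # desired error level reached
    rising = val_errors[-1] > val_errors[-2]     # positive gradient this epoch
    return armed and rising

# Example: early oscillation (0.5 -> 0.3) above the threshold is tolerated.
history = []
for err in [0.9, 0.5, 0.3, 0.12, 0.09, 0.11]:
    history.append(err)
    if should_stop(history, threshold=0.10):
        break
print(len(history))  # prints 6: training stops on the rise after 0.09
```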
Bibliography
[1] R. Collobert, "Deep learning for efficient discriminative parsing," in International Conference on Artificial Intelligence and Statistics, no. EPFL-CONF-192374, 2011.

[2] J. Heaton, Introduction to Neural Networks with Java. Heaton Research, Inc., 2008.

[3] P. McCullagh and J. A. Nelder, Generalized Linear Models, vol. 2. Chapman and Hall, London, 1989.

[4] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Pearson, third ed., 2010.
1.5 Appendix A: Hidden Neurons
Figure 1.5: Results for nHidden = 2
Figure 1.6: Results for nHidden = 3
Figure 1.7: Results for nHidden = 4
Figure 1.8: Results for nHidden = 5
Figure 1.9: Results for nHidden = 6
Figure 1.10: Results for nHidden = 7
Figure 1.11: Results for nHidden = 8
Figure 1.12: Results for nHidden = 9
1.6 Appendix B: Learning Rate Results
Figure 1.13: Results for alpha = 0.01
Figure 1.14: Results for alpha = 0.02
Figure 1.15: Results for alpha = 0.03
Figure 1.16: Results for alpha = 0.04
Figure 1.17: Results for alpha = 0.05