EAI 320 Practical 3 Report: Neural Networks
Supervised Learning in an Expert System
GJ Fouche, u13004019
June, 2015
DECLARATION OF ORIGINALITY
UNIVERSITY OF PRETORIA
The University of Pretoria places great emphasis upon integrity and ethical conduct in the preparation of all written work submitted for academic evaluation.

While academic staff teach you about referencing techniques and how to avoid plagiarism, you too have a responsibility in this regard. If you are at any stage uncertain as to what is required, you should speak to your lecturer before any written work is submitted.

You are guilty of plagiarism if you copy something from another author's work (e.g. a book, an article or a website) without acknowledging the source and pass it off as your own. In effect you are stealing something that belongs to someone else. This is not only the case when you copy work word-for-word (verbatim), but also when you submit someone else's work in a slightly altered form (paraphrase) or use a line of argument without acknowledging it. You are not allowed to use work previously produced by another student. You are also not allowed to let anybody copy your work with the intention of passing it off as his/her work.

Students who commit plagiarism will not be given any credit for plagiarised work. The matter may also be referred to the Disciplinary Committee (Students) for a ruling. Plagiarism is regarded as a serious contravention of the University's rules and can lead to expulsion from the University.

The declaration which follows must accompany all written work submitted while you are a student of the University of Pretoria. No written work will be accepted unless the declaration has been completed and attached.
Full names of student:
Student number:
Topic of work:
Declaration
1. I understand what plagiarism is and am aware of the University’s policy in this regard.
2. I declare that this assignment report is my own original work. Where other people's work has been used (either from a printed source, Internet or any other source), this has been properly acknowledged and referenced in accordance with departmental requirements.

3. I have not used work previously produced by another student or any other person to hand in as my own.

4. I have not allowed, and will not allow, anyone to copy my work with the intention of passing it off as his or her own work.
SIGNATURE: DATE:
Contents
1.1 Introduction
1.2 Problem Definition and Methodology
    1.2.1 Hidden Layer size and amount
    1.2.2 Learning Rate
1.3 Results and Graphs
    1.3.1 Instability of using single neuron hidden layer
    1.3.2 Results for different hidden layer sizes
    1.3.3 The effect of learning rate
1.4 Discussion and Conclusion
    1.4.1 Effects of hidden layer size and the dataset
    1.4.2 The effect of learning rate
    1.4.3 Final Properties and Stop Condition Notes
1.5 Appendix A: Hidden Neurons
1.6 Appendix B: Learning Rate Results
1.1 Introduction
Artificial Neural Networks (ANNs) are a topic of paramount importance in the scope of artificial intelligence and intelligent systems. In effect, ANNs attempt, to some degree, to computationally model the inner workings of biological neural nets (like those contained in the brain). They are arguably the most significant early attempt to model "intelligence" in the field, especially in deep learning [1].

The architecture of an ANN generally consists of a network of parallel, multi-layered, interconnected neurons connected by inputs and outputs. Different architectures exist, but only the most standard of neural networks is explored in this report. One characteristic that ANNs share with their biological counterparts is the necessity for some kind of learning (or training) process in order to adapt them to the required dataset. Neural networks are well suited to problems where pattern recognition and data interpolation in noisy spaces form part of the problem. If sufficiently trained, they have the ability to generalize beyond the data used for training.
This report explores the fundamental aspects and problems of supervised learning and neural-network topology, all within the context of a simple expert system that diagnoses malignant growths based on 9 characteristic inputs.
1.2 Problem Definition and Methodology
1.2.1 Hidden Layer size and amount
Number of hidden neurons
According to some sources, the optimum number of neurons in the hidden layer lies somewhere between the number of inputs to the network and the number of outputs [2]. For this reason, this report focuses on analysing networks with 2 to 9 hidden neurons. For interest's sake, the case of a single hidden neuron is also tested.
Number of hidden layers
In this case, only a single hidden layer is used. Although networks with more layers do exist, the potential generalization gain is minimal in most cases. Since zero hidden layers suffice to resolve linearly separable data, and it is not known whether this dataset is linearly separable, only experiment can determine exactly how many hidden layers are required. There is no well-defined formula or process for choosing the number of hidden layers, so the general practice of starting with one layer and adding more only if really needed is followed here [3].
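The single-hidden-layer topology described above can be sketched as follows. This is a minimal NumPy illustration, not the code used for the report's simulations; the 6-neuron hidden layer, the sigmoid activation, and the random initialisation scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions assumed from the report: 9 characteristic inputs,
# a single hidden layer (6 neurons as an example), one output.
n_in, n_hidden, n_out = 9, 6, 1

# Random weight initialisation (as referenced in the discussion section).
W1 = rng.standard_normal((n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_out, n_hidden))
b2 = np.zeros(n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Propagate one 9-feature sample through the network."""
    h = sigmoid(W1 @ x + b1)   # hidden-layer activations
    y = sigmoid(W2 @ h + b2)   # output score in (0, 1)
    return y

print(forward(rng.random(n_in)))
```

The forward pass returns a value in (0, 1), which an expert system can threshold into a benign/malignant decision.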
1.2.2 Learning Rate
A relatively slow but visible learning rate is α ≈ 0.01 [4]. This rate is used for the initial observations, with limited stop conditions, to determine the optimal network size; further learning rates are then simulated in neural networks of the best size(s) found.
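The role of α can be seen in a toy gradient-descent example (not the report's code): each weight update is scaled by α as in w ← w − α·df/dw, and an overly large α makes the iterates oscillate, as discussed in the conclusions.

```python
# Gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
def descend(alpha, steps=500, w=0.0):
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # derivative of (w - 3)^2
        w -= alpha * grad        # learning-rate-scaled update
    return w

print(descend(0.01))   # small alpha: slow but stable convergence towards 3
print(descend(1.1))    # too large: each step overshoots and the error grows
```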
1.3 Results and Graphs
1.3.1 Instability of using single neuron hidden layer
Sample   Prediction Accuracy
1        58 %
2        96 %
3        78 %
Mean     77.3 %
Figure 1.1: A table of sample runs for a single-neuron hidden layer.
1.3.2 Results for different hidden layer sizes
The results in Appendix A are combined to form the following graph.
Figure 1.2: A summary plot of measured results from simulations with differing hidden neuron counts (see Appendix A).
1.3.3 The effect of learning rate
The following plot is based on the raw data from Appendix B.
Figure 1.3: Plot of training rate vs. α, based on the Appendix B results.
1.4 Discussion and Conclusion
1.4.1 Effects of hidden layer size and the dataset
Findings for fewer than 2 neurons
At first glance the accuracy of a single-neuron hidden layer is surprising; only on further investigation does it become evident that the average accuracy over many samples is actually very low. Given this inconsistency between samples, and the fact that the only qualities differing between runs are the weight initialization and the training data sampling (both random processes), the effectiveness of the single-neuron hidden layer appears exceptionally sensitive to the training data and initial weights. This is the indication of a poor network, and the inefficiency of the low-neuron-count network suggests that the Proben1 cancer dataset and its resultant generalization curve are non-linear.
For more than 2 hidden neurons
In the data presented in Figure 1.2 it can be seen that, between 2 and 4 neurons in the hidden layer, the number of epochs required to reach the best test error decreases, which suggests that more hidden neurons make pattern recognition easier. On investigation, the plot data suggests the following trade-offs:
Number of Hidden Neurons   Advantage          Disadvantage
2 - 4                      Fast Train Time    Poor Performance
5 - 6                      Good Performance   Increased Train Times
7 - 9                      Good Performance   Very Long Train Times
Figure 1.4: A table of trade-offs for small and large hidden layers
We can conclude, then, that the best compromise is a hidden layer with 6 to 7 neurons.
1.4.2 The effect of learning rate
The learning rate (α) has a profound influence on the training of the neural network. It directly controls the rate at which the network adapts to the data, but the network is very prone to oscillation when the learning rate is too high. The results establish that the ideal learning rate for this network sits somewhere in the interval (0.01, 0.03).
1.4.3 Final Properties and Stop Condition Notes
Data suggests the following Neural Network properties:
1. Hidden Layers = 1
2. Hidden Neurons ∈ [6, 7]
3. Learning Rate α ∈ (0.01, 0.03)
Due to the oscillatory nature of the validation error on this dataset, the implemented stop condition only acts on positive gradients of validation error after a certain desired level of validation error has been reached. This seemed to work well.
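The stop condition just described can be sketched as follows. The function and threshold names are illustrative, not taken from the report's code: rises in validation error are ignored until the error has dropped below a desired threshold, after which the first positive gradient between epochs stops training.

```python
def should_stop(val_errors, threshold):
    """Stop only once the error dipped below threshold AND is now rising."""
    if len(val_errors) < 2:
        return False
    armed = min(val_errors) < threshold          # desired error level reached
    rising = val_errors[-1] > val_errors[-2]     # positive gradient this epoch
    return armed and rising

# Example: early oscillation (0.5 -> 0.3) above the threshold is tolerated.
history = []
for err in [0.9, 0.5, 0.3, 0.12, 0.09, 0.11]:
    history.append(err)
    if should_stop(history, threshold=0.10):
        break
print(len(history))  # prints 6: training stops on the rise after 0.09
```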
Bibliography
[1] R. Collobert, "Deep learning for efficient discriminative parsing," in International Conference on Artificial Intelligence and Statistics, no. EPFL-CONF-192374, 2011.

[2] J. Heaton, Introduction to Neural Networks with Java. Heaton Research, Inc., 2008.

[3] P. McCullagh and J. A. Nelder, Generalized Linear Models, vol. 2. Chapman and Hall, London, 1989.

[4] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Pearson, third ed., 2010.
1.5 Appendix A: Hidden Neurons
Figure 1.5: Results for nHidden = 2
Figure 1.6: Results for nHidden = 3
Figure 1.7: Results for nHidden = 4
Figure 1.8: Results for nHidden = 5
Figure 1.9: Results for nHidden = 6
Figure 1.10: Results for nHidden = 7
Figure 1.11: Results for nHidden = 8
Figure 1.12: Results for nHidden = 9
1.6 Appendix B: Learning Rate Results
Figure 1.13: Results for alpha = 0.01
Figure 1.14: Results for alpha = 0.02
Figure 1.15: Results for alpha = 0.03
Figure 1.16: Results for alpha = 0.04
Figure 1.17: Results for alpha = 0.05