ZONE, ETHIOPIA
FASIKA, LACHORE
BAHIR DAR UNIVERSITY
SCHOOL OF RESEARCH AND POSTGRADUATE STUDIES
FACULTY OF COMPUTING
IN CASE OF WOLAYTA ZONE, ETHIOPIA.
By
Fasika Lachore Laba
A thesis submitted to the School of Research and Graduate Studies of Bahir Dar
Institute of Technology, BDU, in partial fulfillment of the requirements for the
degree of Master of Science in Software Engineering in the Faculty of Computing.
Advisor Name: Mekuanint Agegnehu (PhD)
DECLARATION
I, the undersigned, declare that the thesis comprises my own work. In compliance
with internationally accepted practices, I have acknowledged and referenced all
materials used in this work. I understand that non-adherence to the principles of
academic honesty and integrity, or misrepresentation/fabrication of any
idea/data/fact/source, will constitute sufficient ground for disciplinary action
by the University and can also evoke penal action from the sources which have not
been properly cited or acknowledged.
Name of the student_______________________________ Signature
_____________
Date of submission: ________________
Place: Bahir Dar
This thesis has been submitted for examination with my approval as
a university
advisor.
School of Research and Graduate Studies
Faculty of Computing
THESIS APPROVAL SHEET
The following graduate faculty members certify that this student
has successfully presented
the necessary written final thesis and oral presentation for
partial fulfillment of the thesis
requirements for the Degree of Master of Science in Software
Engineering.
Approved By:
ACKNOWLEDGEMENTS
First of all, I would like to express my deepest gratitude to my sponsors Mona L.
Jordan, Raymond Beck, Sr Sofie Op, Mother General of the Dominican Sisters of St
Catherine of Seina, Sr Annaliza Hipolito Op, Sr Cecilia Op, Sr Jennifer Abasolo
Op, and all the sisters of St Catherine of Seina in Bahir Dar for their assurance
of helping me until the completion of my study. The start of this study would not
have been possible without their continual support. Special thanks go to Mama
Nancy and Liyod in America for enabling me to get my new laptop before the
beginning of this course, which was the most important requirement for my entire
study.
I would also like to express my sincere thanks to my passionate and capable
advisor Mekuanint Agegnehu (PhD) for his continuous support of my MSc research. I
will always be grateful for his patience, motivation, enthusiasm, and enormous
knowledge in the area of my study. His supervision has been very helpful
throughout my research.
My sincere thanks also go to the researchers Zerihun Yemataw (PhD), Mr Yasin Goa,
Mr Tesfaye Dejene, Mr Mikiyas Yeshitela, Mr Henok, Mr Worku, Mr Genene and Mr
Mesgana for providing ideas on the Enset Yield Prediction model, for offering me
the chance to visit the Areka Agricultural Research Center (AARC), which is the
national center of excellence for research on Enset, and for providing the most
important part of this research, the historical data.
I extend my thanks and appreciation to Mr Abiyot and his coworkers, who helped a
lot in preparing the primary data for this research and who also provided the
materials needed to collect the primary data in the field of the national Enset
research center in Areka-Wolaita. It is also an honor for me to thank Mr Yohannes
Doboche, Fr Birhanu Lemma and Mr Minasie for granting me privileged access to the
very advanced library and Internet at their respective workplaces, Dubbo Catholic
Primary School and Areka National Enset Research Center.
My thanks also go to my family members: my wife Monica Haile and her family, in
particular Birkie Haile, Solomon Seyoum and the children; my parents Lachore Laba
and Theressa Yohannes; my aunt Sr Waje Yohannes; my big brother Tesfaye and his
wife Woinshet Zemachu; and my little siblings Ashenaf L, Tinsae L, Lukas L,
Peteros L, Etenesh L and Senait L, for supporting me in many ways.
I am very thankful to H.E. Bishop Lesanuchristos Mathewos, Fr Groum Tesfaye SJ, Fr
Abenet Abebe CM, Fr Michael Mungi SJ, Br Paul Klonzo SJ and Br Ayele Shalamo SJ
(the two Jesuit scholars), Sr Trufatu Beshir and all my churchmates, brothers and
sisters in Christ, for all your prayers, care and guidance, and for being a model
for every success in my life. Your spiritual follow-up and your lifestyles have
made me the Fasika I am today.
Last but not least, I would like to extend my thanks to my instructors, advisors,
faculty leadership, colleagues, classmates, friends and mentors for all your
inspiration and for your insightful comments and remarks, which helped me in
exploring the study area and identifying the research title, “ENSET YIELD
PREDICTION MODEL USING ARTIFICIAL NEURAL NETWORK: IN CASE OF WOLAYTA, ETHIOPIA”.
ABSTRACT
Enset (Ensete ventricosum) is a crop on which approximately 20% of the total
population of Ethiopia depends for food security. Yield prediction remains a major
issue to be solved based on available data. It is a crucial function in planning
for the food security of the population of a district or even of the whole
country.
Enset yield estimation models that account for inter-clonal, age group, and
harvesting time differences to predict the different yield products (kocho in two
forms, bulla, amicho, and fiber) of an Enset plant non-destructively are still
lacking. Although Enset has five yields/products, the already developed Enset
yield estimation model is of limited use in that it only works for 'kocho' yield
estimation. Prediction models for 'bulla', 'amicho' and fiber yield estimation are
not yet developed. In this research we used an Artificial Neural Network model to
predict the amounts of the different Enset yields (kocho in two forms, bulla,
amicho, and fiber) of an Enset plant. The objective of this research is to design
and develop an Enset yield prediction model using machine learning algorithms, in
the case of Wolayta, Ethiopia.
The Enset data are numerical and were collected from Areka Agricultural Research
Center for seven years (2013 to 2019). An exhaustive study was performed on the
given dataset and algorithms. The research approach has five phases: data
gathering, data pre-processing, implementation of the prediction model, model
training and, finally, model evaluation.
We built an MLP-ANN model, an RF model and an Ensemble MLP-ANN (EMLP-ANN) model,
evaluated their performance, and compared the results based on the errors
generated. The results of comparing the three models are as follows. For the RF
model, R2, MSE, and RMSE were 0.81, 0.176, and 0.419 respectively; for the
MLP-ANN model they were 0.857, 0.14, and 0.374 respectively; and for the newly
proposed EMLP-ANN model they were 0.92, 0.077, and 0.277 respectively.
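The three measures reported above are related: RMSE is the square root of MSE, and
R2 compares the squared error of a model with that of always predicting the mean.
The following is a minimal sketch of how they could be computed with NumPy. The
function names and the sample values are illustrative assumptions, not the thesis
code or data; only the reported MSE/RMSE pairs are taken from the text.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return float(np.sqrt(mse(y_true, y_pred)))

def r2(y_true, y_pred):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up values for illustration only (not thesis data).
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]
print(mse(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))

# Consistency check of the reported figures: RMSE should be close to sqrt(MSE).
reported = {"RF": (0.176, 0.419), "MLP-ANN": (0.14, 0.374), "EMLP-ANN": (0.077, 0.277)}
for name, (reported_mse, reported_rmse) in reported.items():
    print(name, float(np.sqrt(reported_mse)), reported_rmse)
```

For each of the three models, the square root of the reported MSE agrees with the
reported RMSE to within about 0.001, so the two columns are mutually consistent.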
The results show that using the stacking ensemble method for MLP-ANN yields better
predictions. This could be improved further by combining more advanced machine
learning algorithms.
Keywords: Enset Yield Prediction, ANN, MLP Neural Network,
Ensemble-ANN,
Backpropagation, Enset ventricosum, Enset, KOCHO, BULLA, FIBER,
AMICHO
1.5.1. General Objective
1.5.2. Specific objectives
1.7. Significance of the study
1.8. Thesis Organization
2.1.1. Neuron (Node)
2.3. Artificial Neural Network in Agriculture
2.4. Crop Yield Prediction
2.5. Background of Enset (Ensete ventricosum (Welw.) Cheesman)
2.7. Enset (Ensete ventricosum) varieties in Wolaita Zone
2.8. Health Benefits of Enset
2.9. Approaches to this Study/Related works
2.10. Summary of Related Works
2.11. Gaps in Previous Study
2.12. Recommendation from previous Study
CHAPTER THREE
3. METHODOLOGY
3.1.3. Designing and Development
3.1.4. Demonstration
3.1.5. Evaluation
3.1.6. Communication
3.3. Data Analysis
3.3.1. Data Collection
3.3.2. Data Description
3.4. Data Preprocessing
3.4.2. Data integration
3.4.3. Data cleaning
3.6. Prediction Model Skeleton
3.7. Proposed Enset Yield Prediction Model
3.8. Ensemble Multilayer Perceptron Neural Network (EMPL-NN)
3.8.1. Input Layer
3.11.1. Python 3.7
3.11.3. NumPy
3.11.4. Anaconda
3.11.5. Pandas
3.11.6. Scikit-learn-Sklearn
3.11.7. TensorFlow
3.11.8. Keras
CHAPTER FOUR
4.1. Experimental Simulation for the Model
4.1.1. Training phase
4.1.2. Testing phase
4.3.2. R-squared (R2) as Model Evaluation Metrics
CHAPTER FIVE
5.1. Conclusions
5.2. Contributions
5.3. Recommendations
SNNPRS Southern Nations, Nationalities and Peoples Regional State
GDP Gross Domestic Product
ANN Artificial Neural Network
BGCORM Corm weight before grating (kg)
BWT Bulla Weight (kg)
EMLP-ANN Ensemble Multilayer Perceptron Artificial Neural Network
RF Random Forest
R2 R-Squared
Figure 2-2: Operations at one neuron in ANN
Figure 2-3: Multilayer Artificial Neural Network (Patterson & Gibson, 2017)
Figure 2-4: Flow Chart for Back Propagation Algorithm (Kim & Seo, 2018)
Figure 3-1: Data Preprocessing Stages (Kung et al., 2016)
Figure 3-2: Tabular representation of Dataset
Figure 3-3: Visualized Input dataset
Figure 3-4: Visualized output dataset
Figure 3-5: Skeleton for Yield Prediction Model for Enset
Figure 3-6: MLP-ANN Architecture
Figure 3-7: Proposed Ensemble MLP-ANN Model
Figure 3-8: Learning Process Artificial Neural Network
Figure 3-9: Process flow diagram
Figure 3-10: Spyder python development environment
Figure 4-1: Optimization of Backpropagation algorithm
Figure 4-2: Training and validating loss of MLP-ANN model
Figure 4-3: Graphical view of losses of MLP-ANN Model
Figure 4-4: System generated Architecture for Ensemble MLP-ANN Model
Figure 4-5: Graphical view of losses of EMLP-ANN Model
Figure 4-6: Graphical view of losses of EMLP-ANN Model

Table 3-1: Quantitative parameters of the Enset data
Table 3-2: Input Parameters of the Enset
Table 3-3: Output Parameters of the Enset
Table 3-4: List of Material used to collect primary data
Table 4-1: Evaluation of Enset Yields prediction models
Table 4-2: Evaluation of models terms of individual yield
CHAPTER ONE
1. INTRODUCTION
1.1. Background
Agriculture is the main source of national income for most developing countries
(Mohan & Patil, 2017b). Agriculture is the largest component of Ethiopia's economy
and employs the majority of the Ethiopian population. Most of these are
smallholder farmers practicing subsistence farming on less than one hectare of
land. Ethiopian agriculture is subsistence-oriented and depends on unpredictable
and erratic rainfall.
The agricultural system is very complex, since it involves large amounts of data
arising from many factors. Many techniques and approaches have been used to
identify interactions between the factors that affect yields and crop performance.
The application of neural networks to non-linear and complex systems is promising
(Bejo & Mustaffha, 2014).
Agriculture is the livelihood of more than 90% of the population in rural areas.
Enset is an essential element in the Wolayita food economy and acts as a staple,
or co-staple, food. Where land is very scarce and cereal harvests are consequently
low, high-yielding Enset offers some opportunity for food security. Enset is also
popular because of its drought-resistant properties (Zengele, 2017).
Agricultural management needs simple and accurate estimation techniques to predict
yields in the planning process (S. S. Dahikar, Extc, & College, 2015). Most
farmers rely on their long-term experience in the field with particular crops to
anticipate a higher yield in the next harvesting period. Two approaches to
prediction have also been listed: the first uses traditional mathematical models,
and the second applies artificial intelligence to the prediction.
for Ethiopia that ensures year-round food and feed security, traditional medicine
and fiber (Brandt & Mccabe, 1997). The Enset cultivation system is economically
viable and well adapted to Ethiopian agricultural systems. Every part of the plant
can be used in one way or another. Farmers often acknowledge that Enset is their
food, cloth, house, bed, cattle feed and plate (Tsegaye & Struik, 2003).
Enset (Ensete ventricosum), commonly known as the Ethiopian banana, Abyssinian
banana, false banana, Enset or Ensete, is a herbaceous species of flowering plant
in the banana family Musaceae. It is the main crop of a sustainable and indigenous
cropping system in Ethiopia that ensures food security for several million people
(Yesuf & Hunduma, 2012a). Approximately 20% of the total population of Ethiopia
depends upon Enset for food security. Moreover, different parts of Enset are also
widely used as feed, fiber and construction material (Yesuf & Hunduma, 2012a).
According to (Ayalew & Yeshitila, 2011), the three major Enset products utilized
as food are commonly known as Kocho, Bulla and Amicho. Kocho is a fermented
product from the scraped parenchymatic tissue of the leaf sheath and pulverized
corm. Bulla is made by dehydrating the juice arising from the mixture of scraped
parenchymatic tissue of leaf sheaths, pulverized corm and grated stalk of
inflorescence. Amicho is the stripped corm of younger Enset plants, which is
boiled and consumed. Apart from its multipurpose use, the Enset plant has cultural
and socioeconomic value, mainly in the south and south-west parts of Ethiopia.
Yield prediction is a very important issue in agriculture. Any farmer is
interested in knowing how much yield to expect. In the past, yield prediction was
performed by considering the farmer's experience with a particular field and crop.
Yield prediction remains a major issue to be solved based on available data
(Manjula, 2017).
Assessment of the usable yield of Enset, however, is difficult due to complicated
production methods and processing procedures. Enset is a perennial, and the
vegetatively propagated planting material is transplanted yearly into several
nurseries until it is finally planted in a part of the field where it matures
until harvest (Tsegaye & Struik, 2001a).
According to (Zerihun et al., 2016), Areka Agricultural Research Centre, Ethiopia,
hosts the coordination of the National Enset Improvement Program and is situated
in the heart of one of the major Enset-producing areas of the country.
Agriculture, as the backbone of many developing economies (especially in
Ethiopia), provides a substantial portion of their Gross Domestic Product (GDP)
(Manjula, 2017). Thus, the possibility of obtaining yield predictions with
reasonable accuracy prior to harvest is important, since timely interventions can
take place when low yields are predicted.
Better predictions can be achieved through models that consider the factors
affecting crop growth and yield for a year of interest. Accurate information about
the history of crop yield is important for making decisions related to
agricultural risk management. Crop yield prediction is therefore an important
agricultural problem, and this research focuses on developing a prediction model
that may be used to predict Enset yield.
There are works that predict some Enset yields, such as kocho, using regression
models. According to (Bejo & Mustaffha, 2014), the combination of advanced
technology and agriculture to improve crop yield has recently attracted growing
interest. (Bejo & Mustaffha, 2014) add that ANN has become a popular method among
authors because of its ability in prediction, forecasting and classification in
biological science fields.
Several previous researchers have worked on related problems: (B. & Louella, 2018)
developed Bitter Melon Crop Yield Prediction using Machine Learning Algorithm;
(Kung, Kuo, et al., 2016) developed an Accuracy Analysis Mechanism for Agriculture
Data Using the Ensemble Neural Network Method; and (Mohan & Patil, 2017a) designed
a model for Crop Cost Forecasting using Artificial Neural Network with the
feed-forward back-propagation method. Others, such as (Ramesh & Vishnu, 2015),
(Manjula, 2017), (Prasad, Chai, Singh, & Kafatos, 2006), (Sahle, Yeshitela, &
Saito, 2018), and (Haile, 2014b), used machine learning and data mining algorithms
to design and analyze crop prediction models.
1.2. Research Motivation
The motivation behind this study was the study area, particularly AARC. This
research center hosts national research on Enset, and the researcher comes from
this area. The farmers' livelihood depends almost entirely on Enset products. Crop
yield prediction is a general problem. Farmers are curious to know how much yield
to expect, yet Enset yield estimation models are not yet ready. The researcher was
therefore motivated to design a model for Enset yield prediction using machine
learning algorithms.
1.3. Problem Statement
The benefits of a crop yield prediction model, as mentioned by (Menaka, 2017), are
to improve crop marketing and planning, crop field-level investment, production
planning, crop production input and crop field operation, and to mitigate negative
soil impact. However, although numerous crops are cultivated in Ethiopia, Enset
being one of them, the yield prediction model for Enset is not predictive enough.
According to (Yesuf & Hunduma, 2012a), attempts were also made to develop a
regression model which non-destructively predicts the yield of Enset with better
precision, simplifies yield evaluation in experiments, and resolves difficulties
in estimating kocho yield when assessing the production balance in the
Enset-producing regions of the country. However, the yield of Enset also has a
non-linear relationship with critical input parameters that are not considered in
the regression model. Hence, in this study a non-linear model, the Artificial
Neural Network (ANN), is used to predict Enset yields more accurately than the
regression model.
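The limitation of a purely linear model on a non-linear relationship can be
illustrated with a small NumPy sketch. Everything in it is an assumption made for
illustration: the data are synthetic (a quadratic curve, not Enset measurements),
and the "hidden layer" uses fixed random weights with only the output layer fitted
by least squares, which is a simplification and not the backpropagation-trained
model of this study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, made-up data: a purely non-linear (quadratic) relationship.
x = np.linspace(-2.0, 2.0, 101)
y = x ** 2

def r2_score(y_true, y_pred):
    """R-squared of predictions against observed values."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def least_squares_fit(features, target):
    """Fit target ~ features + intercept and return the fitted values."""
    design = np.column_stack([features, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ coef

# (a) Linear regression on the raw input.
r2_linear = r2_score(y, least_squares_fit(x[:, None], y))

# (b) One hidden layer of 20 tanh units with fixed random weights;
#     only the output layer is fitted (a simplification, not backpropagation).
w = rng.normal(size=20)
b = rng.normal(size=20)
hidden = np.tanh(x[:, None] * w + b)
r2_hidden = r2_score(y, least_squares_fit(hidden, y))

print(f"linear R2 = {r2_linear:.3f}, hidden-layer R2 = {r2_hidden:.3f}")
```

On this symmetric quadratic curve a straight line explains essentially none of the
variance, while the non-linear hidden units allow a much closer fit. Real yield
data are noisier, so the gap on field measurements may well be smaller.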
Several researchers have developed models to estimate the yield of Enset in
Ethiopia. In order to reduce confusion around the yield and production estimates,
(Tsegaye & Struik, 2001b) developed a linear model for predicting Enset plant
yield and assessing kocho production in Ethiopia. At the Areka Agricultural
Research Center in Southern Ethiopia, (Tsegaye & Struik, 2003) investigated the
kocho yield of Enset in terms of weight and energy under different crop
establishment methods. By considering different clones and using multiple
regression models, (Haile, 2014b) further developed simple linear models and
investigated fermented unsqueezed kocho as a function of Enset plant height and
pseudostem circumference measurements (Sahle et al., 2018).
According to (Struik, 2003), yield data on Enset are very scarce. (Struik, 2003)
also added that there is a lack of knowledge on the physiological parameters of
Enset that determine the growth of the crop, and on how these parameters develop
and affect growth under field conditions in which other factors are highly
variable.
The study by (Yesuf & Hunduma, 2012a) noted that Enset yield estimation models
that account for inter-clonal, age group, agro-ecological, and harvesting time
differences to predict the different yield products (kocho, bulla, amicho, and
fiber) of an Enset plant non-destructively are still lacking.
The already developed Enset yield estimation model is of limited use in that it
only works for 'kocho' yield estimation; models for 'bulla', 'amicho' and fiber
yield estimation are not yet developed. As (Yuvaraj, 2016) stated, one of the
difficulties faced in the prediction process is that most of the essential
parameters necessary for accurate prediction are not considered.
In summary, in this research an Artificial Neural Network model that accounts for
inter-clonal, age group, and harvesting time differences is developed to predict
the different yield products (kocho, bulla, amicho, and fiber) of an Enset plant.
1.4. Research Questions
More specifically, the following research questions are formulated and addressed
in this study.
1. What parameters are needed to build an Enset Yield Prediction model using
machine learning algorithms?
2. What are the appropriate methods and techniques for Enset data processing and
structuring?
3. How can neural networks be integrated to improve the performance of the Enset
Yield Prediction model on learning problems?
4. What is the performance of the newly developed Enset Yield Prediction model?
1.5. Objective of the study
In this section, the general and specific objectives of this research are
described.
1.5.1. General Objective
The general objective of the research is to design and develop an Enset yield
prediction model using machine learning algorithms to predict Enset yields (KOCHO
in two forms, BULLA, CORM (AMICHO) and FIBER, the by-product) using vegetative
parameters of the Enset plant.
1.5.2. Specific objectives
In order to achieve the overall objective of this research work, we have
formulated the following specific objectives:
To explore the existing yield prediction models for the Enset plant.
To identify the vegetative determinant factors of Enset yields.
To apply machine learning algorithms for Enset data analysis and structuring.
To design and develop an Enset yield prediction model.
To evaluate the performance of the newly developed model.
1.6. Scope of the study and Limitation
The scope of the study is to develop an Enset yield prediction model using machine
learning algorithms. The data for this study were collected from Areka
Agricultural Research Center in Wolaita zone. The study is limited to designing an
Enset yield prediction model that could be used as a baseline for implementation.
The researchers did not go further to implement the proposed model due to limited
time and financial resources.
1.7. Significance of the study
The significance of the proposed study is as follows.
This research helps farmers and growers in making planting decisions, setting
appropriate food reserve levels and improving risk management of crop-related
derivatives.
It saves farmers the time spent waiting to know the estimated amounts of the
products, since it takes a long time to know the amounts of Enset products after
harvesting.
The study contributes to increasing farmers' income by predicting the amount of
the yields prior to their collection.
It also helps farmers and the agriculture sector in addressing food security
challenges and planning for the next planting.
The study could help the research center in Areka to understand Enset yield
prediction in consideration of inter-clonal, age group, and harvesting time
differences.
It could also help in maximizing Enset yield, since selecting the appropriate
Enset to plant plays a vital role.
The output of this research could serve future scholars who want to do research in
the same area, acting as a base for further improvement of the model or the
techniques used in this study.
1.8. Thesis Organization
The remaining part of the thesis is organized as follows. Chapter two presents a
literature review covering the definition of Artificial Neural Networks, ANN
applications in agriculture, crop yield prediction and analysis, Enset benefits
and varieties, recommendations from previous studies, and gaps identified in
previous studies. Chapter three discusses the methodologies used to accomplish
this thesis, including data collection, data preprocessing, proposed model design
and prediction methods. Chapter four discusses the results of the designed model;
the results of the experiments are also analyzed and interpreted. Chapter five
summarizes the conclusions and recommendations of the study for future work.
CHAPTER TWO
2. LITERATURE REVIEW
This chapter gives a brief overview of the Artificial Neural Network and its
application in agriculture. The algorithms and approaches used to design network
models for predicting Enset yields are reviewed, together with recent related
work.
2.1. Overview of the Artificial Neural Network
An Artificial Neural Network (ANN) is a machine learning model that tries to solve
problems in the same way as the human brain does. Instead of biological neurons,
an ANN uses artificial neurons, also known as perceptrons. In the human brain,
neurons are connected by axons, while in the ANN, weight matrices are used for the
connections between artificial neurons. Information travels through the neurons
using the connections between them; from one neuron, the information travels to
all neurons connected to it (Jukic, Saracevic, & Subasi, 2020). By adjusting the
weights between the neurons, the system can be trained from input examples.
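Training by adjusting weights can be sketched with a single linear neuron fitted by
gradient descent on the mean squared error. This is a minimal illustration under
stated assumptions: the training examples are made up (generated from a known rule
so the result can be checked), and the learning rate and number of iterations are
arbitrary choices rather than settings used in this study.

```python
import numpy as np

# Made-up training examples generated by the rule y = 2*x1 - 1*x2 + 0.5.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = X @ np.array([2.0, -1.0]) + 0.5

weights = np.zeros(2)   # connection weights, initially zero
bias = 0.0
learning_rate = 0.1     # assumed value, chosen for illustration

def loss(w, b):
    """Mean squared error of the neuron's predictions on the examples."""
    return float(np.mean((X @ w + b - y) ** 2))

initial_loss = loss(weights, bias)
for _ in range(500):
    error = X @ weights + bias - y           # prediction error per example
    grad_w = 2.0 * X.T @ error / len(y)      # gradient of the MSE w.r.t. the weights
    grad_b = 2.0 * error.mean()              # gradient of the MSE w.r.t. the bias
    weights -= learning_rate * grad_w        # move the weights against the gradient
    bias -= learning_rate * grad_b
final_loss = loss(weights, bias)

print(weights, bias, initial_loss, final_loss)
```

After the loop the learned weights and bias approach the values used to generate
the examples, which is what "trained from input examples" means in practice.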
Artificial Neural Network (ANN) technology is a group of computer algorithms
designed to simulate neurological processing, so as to process information and
produce outcomes resembling the human thinking process in learning, decision
making and problem solving. The uniqueness of ANN is its ability to deliver
desirable results even from incomplete or historical data, without the need for a
structured experimental design, by means of modeling and pattern recognition. It
absorbs data through repetition with suitable learning models, similarly to
humans, without explicit programming.
It works through processing elements that are connected to the user-supplied
inputs, pass them through a transfer function and deliver the result as output.
Moreover, the present output of an ANN is a combined effect of the data collected
from previous inputs and the current responsiveness of the system (Ankith &
Damodharan, 2018).
Artificial Neural Networks are nonlinear data-driven approaches, as opposed to model-based nonlinear methods, and are capable of performing nonlinear modeling without a priori knowledge about the relationships between input and output variables. Thus they are a more general and flexible modeling tool for forecasting. The idea of using ANNs for forecasting is not new; because of their ability to tackle complex computational problems, they are progressively applied to solve practical problems.
The main advantage of ANNs is that tasks are solved by presenting input signals that stimulate the network's capability to learn and recognize patterns. ANNs are sometimes preferred over complex algorithms or rule-based programming for solving various tasks. As defined by Khatib (2011), an Artificial Neural Network (ANN) is a mathematical model designed to train, visualize, and validate neural network models. It was developed following the recognition of the way the human brain computes. Khatib (2011) also added that an ANN resembles the brain in two respects:
1. Knowledge is acquired by the network from its environment
through a learning process.
2. Interneuron connection strengths, known as synaptic weights, are
used to store the acquired
knowledge.
Artificial neural networks (ANNs) are biologically inspired
computer programs designed to
simulate the way in which the human brain processes information.
ANNs gather their
knowledge by detecting the patterns and relationships in data and
learn (or are trained) through
experience, not from programming. An ANN is formed from hundreds of
single units, artificial
neurons or processing elements (PE), connected with coefficients
(weights), which constitute
the neural structure and are organized in layers. The power of
neural computations comes from
connecting neurons in a network.
Figure 2-1: Structure of artificial neuron
An Artificial Neural Network (ANN) is a network of artificial neurons. It is based on the human brain's biological processes (Mishra, Mishra, & Santra, 2016). The benefits of Neural Network (NN) models are simplicity of application and robustness of results, and NN models have developed into a powerful approach that can approximate any nonlinear input-output mapping function (Safa, Samarasinghe, & Nejat, 2020).
The types of artificial neural networks depend on the architecture, the neuron activation function, loops in the architecture, the learning algorithm, and other attributes. There are also types of artificial neural networks that are capable of learning without human interaction (Jukic et al., 2020). Due to its documented ability to model any function, the multilayer perceptron (MLP) trained with backpropagation (BP) is selected to develop apparatus, process, and product prediction models (Coast & Safa, 2015).
2.1.1. Neuron (Node)
The neuron is the basic unit of a neural network. It receives a certain number of inputs and a bias value. When a signal (value) arrives, it is multiplied by a weight value. In this research there are 11 inputs, so a neuron receiving them has 11 weight values, plus a bias, which can be adjusted during training. An ANN consists of very simple and highly interconnected processors called neurons. A neuron is an information-processing unit that is fundamental to the operation of a neural network, and consists of weights and an activation function (Kim & Seo, 2018).
(2.1)
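The computation performed by a single neuron can be illustrated with a minimal Python sketch. It assumes a sigmoid activation and arbitrary example values for the 11 inputs, the weights and the bias; the function names and numbers are illustrative assumptions and are not taken from the model developed in this study.

```python
import math


def sigmoid(z):
    """Sigmoid activation (assumed here for illustration only)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical values: 11 inputs, one weight per input, and one bias.
example_inputs = [0.5] * 11
example_weights = [0.1] * 11
example_bias = 0.0

y = neuron_output(example_inputs, example_weights, example_bias)
```

During training, only `example_weights` and `example_bias` would change; the inputs come from the data.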
2.1.2. Connections
A connection links a neuron in one layer to a neuron in another layer or in the same layer. A connection always has a weight value associated with it. The goal of training is to update these weight values so as to decrease the loss (error). The weight parameters on the links between neurons are determined by the training algorithm. The weights are the most important parameters, acting as the memory of the ANN, and the activation function provides the network with its nonlinear mapping potential (Kim & Seo, 2018).
2.1.3. Bias (Offset)
The bias is an extra input to a neuron; it is always 1 and has its own connection weight. This ensures that even when all the inputs are zero, the neuron can still produce an activation.
2.1.4. Activation Function (Transfer Function)
Activation functions are used to introduce non-linearity into neural networks. Many of them also squash their input values into a smaller range.
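A few commonly used activation functions are sketched below in Python to make the idea of "squashing" concrete. The choice of sigmoid, tanh and ReLU is an assumption made for illustration; this section does not state which activation functions are used in the study.

```python
import math


def sigmoid(z):
    """Maps any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Maps any real value into the open interval (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Keeps positive values unchanged and replaces negative values with 0."""
    return max(0.0, z)


# Evaluate each function on a few sample values.
samples = [-5.0, 0.0, 5.0]
squashed = {
    "sigmoid": [sigmoid(z) for z in samples],
    "tanh": [tanh(z) for z in samples],
    "relu": [relu(z) for z in samples],
}
```

Sigmoid and tanh bound their outputs, whereas ReLU is only bounded from below; all three are nonlinear, which is what allows a multilayer network to model nonlinear relationships.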
The artificial neural network is organized into multiple layers, where each layer contains multiple neurons. The information inside the network travels from the input layer to the output layer. Between the input and output layers, the artificial neural network can have zero or more hidden layers. The number of layers and the number of neurons inside the artificial neural network are called the architecture of the neural network (Jukic et al., 2020).
Typically, a minimum of three layers, namely the input layer, the hidden layer and the output layer, is required to develop an ANN system (Bejo & Mustaffha, 2014). A network's architecture can be defined by the number of neurons, the number of layers and the types of connections between layers.
Figure 2-3: Multilayer Artificial Neural Network (Patterson & Gibson, 2017)
Figure 2-3 shows a feed-forward artificial neural network, also known as a multilayer perceptron. It receives inputs from the external world and consists of an input layer, hidden layers and an output layer, from which the output of the network is produced. The function of the input layer is to send signals to the hidden layers. The hidden layers perform the computation and send the result to the output layer.
An ANN consists of computing units called neurons that are connected to each other in a complex communication network, modeled on the network through which the brain is able to carry out highly complex computations. A multilayer perceptron can be trained with a variety of learning techniques.
In the artificial neural network, the smallest building block is the perceptron, which has multiple weighted inputs, a bias input, and an activation function. Propagating the signal from input to output in the artificial neural network is called forward propagation, while propagating the error signal from output to input is called backpropagation.
The most popular algorithm for training artificial neural networks is the backpropagation algorithm (Jukic et al., 2020). Backpropagation uses gradient descent on the weights of the connections to minimize the error on the output of the network. It is the best known and easiest to understand training algorithm (Patterson & Gibson, 2017). Each layer can consist of a different number of neurons, and each layer is fully connected to the next layer.
The performance of the prediction is determined by the values of the weights and biases. The process of fine-tuning the weights and biases from the input data is known as training the neural network. Each iteration of the training process has the following steps:
1. Calculating the predicted output, known as feedforward.
2. Updating the weights and biases, known as backpropagation.
Figure 2-4: Flow Chart for Back Propagation Algorithm (Kim &
Seo, 2018)
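The two training steps, feedforward and backpropagation, can be sketched in plain Python for a very small network. The sketch assumes one scalar input, one hidden layer of three tanh neurons, a linear output, a mean squared error loss, a learning rate of 0.1 and synthetic data (y = x²); all of these are illustrative assumptions rather than the configuration used in this study.

```python
import math
import random


def forward(x, params):
    """Feedforward pass: returns the hidden activations and the predicted output."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    y_hat = sum(w * h for w, h in zip(w2, hidden)) + b2
    return hidden, y_hat


def mean_squared_error(data, params):
    return sum((forward(x, params)[1] - y) ** 2 for x, y in data) / len(data)


def train_step(data, params, lr):
    """One iteration: feedforward, backpropagation of the error, weight update."""
    w1, b1, w2, b2 = params
    n = len(data)
    g_w1 = [0.0] * len(w1)
    g_b1 = [0.0] * len(b1)
    g_w2 = [0.0] * len(w2)
    g_b2 = 0.0
    for x, y in data:
        hidden, y_hat = forward(x, params)          # step 1: feedforward
        d_out = 2.0 * (y_hat - y) / n               # gradient of the loss w.r.t. y_hat
        g_b2 += d_out
        for j, h in enumerate(hidden):              # step 2: backpropagation
            g_w2[j] += d_out * h
            d_hidden = d_out * w2[j] * (1.0 - h * h)  # derivative of tanh is 1 - h^2
            g_w1[j] += d_hidden * x
            g_b1[j] += d_hidden
    # Gradient descent update of all weights and biases.
    return (
        [w - lr * g for w, g in zip(w1, g_w1)],
        [b - lr * g for b, g in zip(b1, g_b1)],
        [w - lr * g for w, g in zip(w2, g_w2)],
        b2 - lr * g_b2,
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
n_hidden = 3
params = (
    [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
    [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
    [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
    0.0,
)
data = [(i / 4.0, (i / 4.0) ** 2) for i in range(-4, 5)]  # synthetic data

initial_loss = mean_squared_error(data, params)
for _ in range(500):
    params = train_step(data, params, lr=0.1)
final_loss = mean_squared_error(data, params)
```

Comparing `initial_loss` with `final_loss` shows the effect of repeating the two steps; in practice a library such as Keras performs the same loop internally.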
2.2. Ensemble Artificial Neural Network
Ensemble machine learning techniques are algorithms that combine the outputs of multiple learners to achieve better performance (Jukic et al., 2020). Ensemble learning is a machine learning paradigm in which multiple models (learners) are trained to solve the same problem. By using multiple learners, the generalization ability of an ensemble can be much better than that of a single learner. Ensemble learning algorithms are meta-algorithms that combine several machine learning algorithms into one predictive model in order to decrease variance, decrease bias or improve predictions.
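The simplest way to combine learners is to average their predictions, as in the following Python sketch. The three base models are hypothetical stand-ins (each is the same linear function with a different constant error); they are assumptions for illustration and are not the learners trained in this study.

```python
def ensemble_predict(models, x):
    """Average the predictions of all base models for a single input."""
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)


# Hypothetical base learners: the target is 2 * x, and each learner is off
# by a different constant amount.
base_models = [
    lambda x: 2.0 * x + 0.3,
    lambda x: 2.0 * x - 0.2,
    lambda x: 2.0 * x - 0.1,
]

combined = ensemble_predict(base_models, 1.5)
individual_errors = [abs(m(1.5) - 3.0) for m in base_models]
ensemble_error = abs(combined - 3.0)
```

Because the individual errors partly cancel, the averaged prediction is closer to the target than any single learner in this example; real learners rarely cancel this neatly.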
2.3. Artificial Neural Network in Agriculture
This section reviews the current state of applications of the Artificial Neural Network in agriculture. Agricultural production in Ethiopia is characterized by subsistence orientation, low productivity, a low level of technology and inputs, lack of infrastructure and market institutions, and extreme vulnerability to rainfall variability (Bhanose, Bogawar, Dhotre, & Gaidhani, 2016). Bhanose et al. (2016) added that agricultural production is dominated by smallholder households, which produce more than 90% of agricultural output and cultivate more than 90% of the total cropped land.
2.4. Crop Yield Prediction
Crop prediction is a major problem in agriculture. A farmer is interested in knowing how much produce to expect. Traditionally, farmers decide this based on long-standing experience with specific yields, plants and weather conditions, and they think directly about yield prediction rather than crop prediction (Bhanose et al., 2016). Extending this, Manjula (2017) also defined yield prediction as an important agricultural problem: every farmer tries to know how much yield to expect.
According to Manjula (2017), agricultural yield primarily depends on weather conditions, pests and the planning of harvest operations. Accurate information about the history of crop yield is important for making decisions related to agricultural risk management (Manjula, 2017).
Crop yield determination is a crucial function in planning for food
security of the population of
a district or even of the whole country. Agriculture, as the
backbone of many developing
economies like Ethiopia, provides a substantial portion of their
Gross Domestic Product (GDP).
Thus, the possibility to obtain yield estimates with reasonable
accuracy prior to harvest is
important, since timely interventions can take place in case low
yields are predicted.
2.5. Background of Enset (Ensete ventricosum (Welw.) Cheesman)
Ensete ventricosum (Welw.) Cheesman is a major food security crop in Southern Ethiopia, where it was originally domesticated (Zengele, 2017). As stated by Ayalew & Yeshitila (2011), the three major products utilized as food are commonly known as Kocho, Bulla and Amicho. Kocho is a fermented product made from the scraped parenchymatic tissue of the leaf sheath and pulverized corm. Bulla is made by dehydrating the juice arising from the mixture of scraped parenchymatic tissue of leaf sheaths, pulverized corm and grated stalk of inflorescence. Amicho is the stripped corm of younger Enset plants, which is boiled and consumed in a similar way to Irish potato, sweet potato and cassava (Ayalew & Yeshitila, 2011).
In the Southern Nations, Nationalities and Peoples' Regional State (SNNPRS), the 1994 estimates show that 300,000 hectares of Enset are projected to yield almost 10 tons per hectare. The Enset planting economy is one of the major agricultural activities in SNNPRS. The area accounts for over 80% of the Enset production of the country (Zengele, 2017). Inclusion of Enset production from Oromiya Region (Oromia) and the national root crop production would have placed estimated Enset and root crop production at more than 1/4 of the total cereal and pulse production of Ethiopia. This would have created, on paper, a food surplus situation in Ethiopia that could endanger the food security of numerous communities, which are in fact food deficit areas (Zengele, 2017).
2.6. Enset and Wolaita Background
Wolaita Zone is one of the major Enset production areas in the SNNPRS, Ethiopia. Based on the 2007 Census conducted by the Central Statistical Agency of Ethiopia (CSA), this Zone has a total population of 2,473,190; with an area of 4,208.64 square kilometers, Wolaita has a population density of 356.67 people per square kilometer. While 172,514 (11.49%) are urban inhabitants, a further 1,196 (0.08%) are pastoralists. The Wolaita are known as 'the people of Enset culture'.
Geographically, Wolaita Zone is located at about 7° 00' North latitude and 37° 45' East longitude, at the edge of the East African Great Rift Valley. Inhabitants of the Wolaita Zone are primarily the Wolaita ethno-linguistic communities speaking the Omotic Wolaita language, Wolaitato Donaa. The Wolaita are predominantly agriculturalists, practicing mixed crop-livestock production and living in permanent settlements. Within their landholdings, community members maintain fruit orchards, nurseries, medicinal plants, vegetables, root and tuber crops, ornamentals and spices, as well as open areas for raising domestic animals (S. Dahikar & Rode, 2014).
The Wolaita are a people whose agriculture is based on Enset, locally known as uutta. The Wolaita are regarded as 'the Enset people' or 'the people of Enset culture' because of the strong link that exists between Enset cultivation and the local food and material culture of the people (Zengele, 2017). Zengele (2017) added that, as indicated in the Regional Statistical Abstract, the area coverage of Enset production in the Zone is 5,400 hectares. The estimated annual production is 2,032,656 quintals.
2.7. Enset (Ensete ventricosum) varieties in Wolaita Zone
Enset cultivation is the centre of the cropping system on which the entire farming system is based, and the crop is the major food security and livelihood source in the Wolaita community (Zengele, 2017). The study by Haile (2014a) found that different Enset (Ensete ventricosum) vernaculars/clones are identified by the farmers in the study area and have their own names, which are used uniformly across the study zone. Enset clones are very diverse in the area, ranging from 2 to more than 50 clones per farm. Each farmer possesses a varying number of Enset varieties on his farm.
Farmers give vernacular names to each clone. They differentiate one from another phenotypically by looking at the color (dark green, light green, brown, light brown, red, pinkish, etc.) of the petiole, mid-rib and leaf sheath, the angle of leaf orientation, the size and color of the leaves, and the circumference and length of the pseudostem (tall, medium, short, very short, etc.). Almost all the farmers in the area grow many Enset clones in mixtures that are used for different purposes (Haile, 2014a).
The Wolaita hold a great repository of Enset landrace diversity in their home gardens. The Wolaita agricultural systems maintain a greater level of intra-specific diversity for Enset than for any other crop species. Enset is maintained in the home garden (darkuwa) ring in poly-varietal perennial plantations without any crop rotation or land fallowing.
A study by Tsegaye & Struik (2001b) indicated that there are 55 morphologically diverse Enset clones known to the Wolaita people. However, a review done two years later by (S. Dahikar & Rode, 2014) at the then Areka Research Station showed that there were 77 Enset accessions in the Wolaita administrative regions (Tsegaye & Struik, 2001a).
Eight years later, a similar study by the Areka Agricultural Research Center (AARC), 2012 indicated that, of all the landraces known to the Wolaita farming communities, only 35 are represented in the national ex situ Enset collection of AARC. This showed that 42 landraces of Enset are either genetically eroded or not well recorded. The results of different researchers indicate a decreasing trend in the maintenance of Enset landrace diversity in Wolaita. Some of the landrace genotypes have become rare; many more are no longer cultivated.
Recent research by Olango, Tesfaye, Catellani, & Pè (2014) indicated that 67 different vernacular names of Enset landraces were under cultivation. Of these, 31 landraces were found in the lowland and 52 in each of the highland and midland agroecologies, 22 of which were shared across the 3 agroecologies. In general, many landraces identified by vernacular names showed a narrow and unique pattern of distribution, whereas 39 (41%) of the landraces known to the Wolaita community were commonly reported by at least 3 of the 5 kebeles (Haile, 2014a).
Different previous studies showed that the genetic diversity of Enset has been decreasing over time. This may be because farmers give priority to some selected clones, because of genetic erosion, or because of the limited sample sizes used by researchers. Overall, the combined results of different researchers identified and named a total of 95 Enset landrace vernacular names known to the Wolaita farming communities (Haile, 2014a; Olango et al., 2014).
2.8. Health Benefits of Enset
Some Enset varieties are believed to have medicinal value and are used for this purpose by the Enset growing community. For example, in the Areka area, a variety called sweete is strongly recommended for treating a person with a bone problem. This may be because it contains high levels of calcium and phosphorus. Even in the central highlands and cities where Enset is not a staple, bulla is fed to a mother who has given birth, for strengthening and fast recovery.
Atmit (gruel) is also made from it and given to a person who has caught a cold. Different Enset varieties were reported to have medicinal and religious (ritual) significance for prevention, healing, and other therapeutic purposes (Bekele, Diro, Yeshitla, Agricultural, & Agricultural, 2013).
2.9. Approaches to this Study/Related works
A study on Enset yield prediction for kocho was done by Haile (2014b). Its main objective was to develop multiple regression models that take into account a large number of samples, different Enset clones ranging from low to high yielders, and other vegetative parameters, and to construct a more precise model for predicting kocho and fiber yields non-destructively from the linear dimensions of Enset plants. According to (“Emergencies Unit for Ethiopia,” 1996), based on data from a sample of 67 Enset plants, a positive relationship was obtained between measurements of pseudostem girth and height and the kocho yield of the plant. The Enset yield prediction model previously developed by (“Emergencies Unit for Ethiopia,” 1996) did not take into account different types of clones; the model also predicts no yield for very small and very tall plants.
Multiple Linear Regression (MLR) is a method used to model the linear relationship between a dependent variable and one or more independent variables. The dependent variable is sometimes termed the predictand and the independent variables are called predictors (Ramesh & Vishnu, 2015).
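As an illustration of how such a model is fitted, the following Python sketch computes the ordinary least squares line for the single-predictor special case of linear regression. The variable names and the data values are synthetic assumptions chosen so that the expected line is known; they are not measurements from any of the reviewed studies.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic data generated from y = 2 + 10 * x (predictor: a plant dimension,
# predictand: a yield); the numbers are made up for illustration.
circumference_m = [0.8, 1.0, 1.2, 1.4]
yield_kg = [10.0, 12.0, 14.0, 16.0]

intercept, slope = fit_line(circumference_m, yield_kg)
predicted = intercept + slope * 1.1
```

With several predictors, the same least squares principle applies, but the coefficients are obtained by solving a system of equations rather than from a single ratio.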
Multiple regression models that predict kocho yield from the linear dimensions of Enset, considering different clones, were developed by Haile (2014b). An attempt was also made to estimate fiber yield from measurements of the vegetative parameters, though none of the regression relationships gave a significant result.
The experiment was carried out at the Areka Agricultural Research on-station site on a total of 328 Enset clones from the six major Enset growing areas of Southern Ethiopia. Plant height and pseudostem circumference were found to be the best non-destructive predictors of Enset kocho yield (Haile, 2014b). Kocho assessment is a difficult task, as Enset is a multi-year crop that is transplanted from nursery to nursery and then to the main field at ever wider spacing.
According to Yesuf & Hunduma (2012a), attempts were also made to develop a regression model that non-destructively predicts the yield of Enset. The aims were better precision, simpler yield evaluation in experiments, and resolving the difficulties of estimating kocho yield when assessing the production balance in the Enset production regions of the country.
2.10. Summary of Related Works
Research Methods and
width measurements an Enset, kocho
yield regression model done for
kocho yield prediction.
fermented unsqueezed kocho yield
scheme was implemented in order to
improve the cost prediction accuracy
of crop.
farmers, which gives the analysis of
rice production based on available
data? Different
simplifying yield evaluation in
experiments and also solve
Enset yield estimation
models accounting the
plant non-destructively
are still lacking.
The already developed Enset yield estimation model only works for
'kocho' yield estimation; models
for 'bulla', 'amicho' and fiber yield estimations are not yet
developed.
Table 2-1: Summary of related works
2.11. Gaps in Previous Study
The Enset yield prediction model previously developed by (“Emergencies Unit for Ethiopia,” 1996) did not take into account Enset plants of different types of clones. The model also predicts no yield for very small and very tall plants.
Enset yield estimation models that account for inter-clonal, age group, agro-ecological, and harvesting time differences to predict the different yield products (kocho, bulla, amicho, and fiber) of an Enset plant non-destructively are still lacking. The already developed Enset yield estimation model is of limited use, in that the sample clones were considered at one location and it only works for 'kocho' yield estimation. Models for 'bulla', 'amicho' and fiber yield estimation are not yet developed (Yesuf & Hunduma, 2012a).
2.12. Recommendation from previous Study
For the future, it is recommended that fiber yield be estimated from the linear dimensions of the Enset plant by using many samples from a specific Enset clone with similar fiber content (Haile, 2014b). Enset yield estimation models that account for inter-clonal, age group, agro-ecological, and harvesting time differences should be developed to predict the different yield products (kocho, bulla, amicho, and fiber) of an Enset plant non-destructively (Yesuf & Hunduma, 2012a).
The ensemble neural network (ENN) method is based on backpropagation networks (BPNs). The ENN mechanism randomly generates a number of neural networks, each with a different architecture; for instance, the numbers of hidden layers and hidden-layer neurons are generated randomly (Kung et al., n.d.). As added by Shahhosseini, Hu, & Archontoulis (2020), stacked generalization aims to minimize the generalization error of ML models by performing at least one more level of learning, using the outputs of the ML base models as inputs and the actual response values of part of the data set (the training data) as outputs.
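The second level of learning in stacked generalization can be sketched in Python as follows. The sketch assumes two base models whose predictions are already available, and a linear meta-learner without intercept fitted by least squares; the helper names and the numbers are hypothetical and serve only to show how base-model outputs become the inputs of the meta-learner.

```python
def fit_meta_weights(base_preds, targets):
    """Least squares weights (no intercept) for combining two base models.

    base_preds is a list of (prediction_1, prediction_2) pairs, one per sample.
    """
    a11 = sum(p1 * p1 for p1, _ in base_preds)
    a12 = sum(p1 * p2 for p1, p2 in base_preds)
    a22 = sum(p2 * p2 for _, p2 in base_preds)
    c1 = sum(p1 * y for (p1, _), y in zip(base_preds, targets))
    c2 = sum(p2 * y for (_, p2), y in zip(base_preds, targets))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("base model predictions are collinear")
    # Solve the 2x2 normal equations with Cramer's rule.
    w1 = (c1 * a22 - a12 * c2) / det
    w2 = (a11 * c2 - a12 * c1) / det
    return w1, w2


def stacked_predict(weights, preds):
    """Meta-learner output for one sample, given the base model predictions."""
    return weights[0] * preds[0] + weights[1] * preds[1]


# Hypothetical base-model predictions on training samples, and targets that
# happen to equal 0.3 * model_1 + 0.7 * model_2.
train_preds = [(1.0, 2.0), (2.0, 1.5), (3.0, 3.5), (4.0, 3.0)]
train_targets = [0.3 * p1 + 0.7 * p2 for p1, p2 in train_preds]

meta_weights = fit_meta_weights(train_preds, train_targets)
new_prediction = stacked_predict(meta_weights, (2.0, 4.0))
```

In a stacked ensemble of neural networks the meta-learner may itself be a network; the linear meta-learner here is a simplification.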
3.1. Research Methodology
For this research work, we followed a design science research methodology, a type of information technology research methodology that focuses on evaluating the performance of the outcome. It is a research paradigm in which the creation of a new artifact and the evaluation of that artifact are the key contributions.
In this research we used the process model designed by (Peffers et al., 2007), (Offermann, Levina, & Schönherr, 2009), and (Rossi, Hui, & Bragge, 2006), which has six phases: problem identification and motivation, defining the objectives for a solution, design and development, demonstration, evaluation, and communication. This methodology is used in order to achieve the objective of our research and to answer the research questions formulated in the statement of the problem section.
3.1.1. Problem identification and motivation
In this phase, the research problem is identified and the motivation arising from the identified problem is defined. The problem definition is used to develop an artifact that provides a solution. Literature is reviewed to acquire knowledge about the state of the problem and the importance of the solution. Literature that supports our research work is reviewed, the gaps in related research works are analyzed, and the way we fill these gaps is presented. We have reviewed several related journal articles, books, and other materials. Relevant documentation about tools and techniques for model design and development has also been reviewed and analyzed.
3.1.2. Objectives for the solution
The objectives of a solution are inferred from the problem definition or specification. Much literature has been reviewed to understand the state of the problem, the state of current solutions and the state of the art. The objective of the research is to solve the stated problem by developing an Enset yield prediction model.
3.1.3. Designing and Development
In this phase, the solution is designed and developed. This activity includes determining the desired functionality and architecture of the artifact and then developing the actual model. Keras (using TensorFlow as backend) is used for designing the RF, MLP-ANN and the Stacked Ensemble of MLP-ANN, and Python is used for writing the required source code.
3.1.4. Demonstration
The developed system is demonstrated by simulating how the developed model predicts Enset yields (Kocho in two forms, Bulla, Amicho, and Fiber, the byproduct) from the historical data of Enset. We used the Anaconda distribution and the Spyder editor with the Python language to develop the model.
3.1.5. Evaluation
The developed system is evaluated to measure how well it supports a solution to the problem. To evaluate the system systematically, testing datasets were fed into the developed model. Subsequently, the model was evaluated by comparing its output with the observed data using the coefficient of determination (R²), the mean squared error (MSE), and the root mean squared error (RMSE).
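The three evaluation measures follow their standard definitions and can be computed as in the Python sketch below. The observed and predicted values are made-up numbers used only to exercise the functions; they are not results of this study.

```python
import math


def mse(observed, predicted):
    """Mean squared error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error, in the same unit as the observed values."""
    return math.sqrt(mse(observed, predicted))


def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up example values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

scores = {
    "MSE": mse(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "R2": r2(observed, predicted),
}
```

Lower MSE and RMSE and an R² closer to 1 indicate that the predictions follow the observed data more closely.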
3.1.6. Communication
The researchers will communicate with AARC regarding the further implementation of the proposed model. Communication with the local agriculture sector will also be considered, to seek financing for improving the model and deploying it for the use of farmers. Other scholars from the areas of agriculture and engineering will be consulted on ways to improve the model and build it into a practical tool. The zonal administration agriculture office will also be a target sector for discussing how the model can be used by the sector and by farmers. The researchers will also communicate with entrepreneurs, investors and NGOs working on improving crop yields for the betterment of farmers.
3.2. Rationale of the Research
This study follows a quantitative research approach. In a quantitative research approach, data obtained from different sources are collected and analyzed in a structured way, using computational, statistical, and mathematical tools to derive results. This approach was used because the study began with data collection based on document analysis and physical observation of the study site, and important stakeholders helped in providing the necessary data for the study. In addition, secondary data were extracted through literature review.
The input and output data needed for this study were collected from Areka Agricultural Research Center. The data contain 11 input parameters, including:
Maturity time
Plant height
Pseudostem height
Pseudostem circumference
Leaf number
Leaf length
Leaf width
and 5 output parameters:
Corm weight before grating (Amicho)
Bulla Weight (Bulla)
Fiber Weight (Fiber)
Fermented unsqueezed kocho (Kocho-Unsqueezed)
Fermented squeezed kocho (Kocho-Squeezed)
These are used for the ANN yield prediction model. The parameters considered in this research were chosen after discussions with experts and researchers in the area of study, in addition to the literature review, as those to which the outputs are most sensitive. They are used as the inputs and outputs of our newly developed yield prediction model. The model generated from the ANN is applied to the collected vegetative and agronomic parameters of Enset to predict the yields, that is, fermented unsqueezed Kocho, fermented squeezed Kocho, Bulla, Amicho/Corm and Fiber, the byproduct.
3.3. Data Analysis
3.3.1. Data Collection
We collected both primary and secondary data in Wolaita Zone from Areka Agricultural Research Center (AARC). As stated by Beyene Teklu Mellisse, Descheemaeker, Giller, Abebe, & van de Ven (2018), AARC is located at 7° 09' N latitude and 37° 47' E longitude, at an elevation of between 1750 and 1820 masl. Based on five years of data, the average annual rainfall is 1615.2 mm, with minimum/maximum mean air temperatures of 13.9 °C/25.6 °C and 63% relative humidity. The soil is silt clay loam with a pH value of 5.2. The research center has been serving the area for about 25 years. It also carries out Enset research at the national level, as the national center of excellence for Enset (Ensete ventricosum).
After reviewing many documents, talking to experts in the area of the study, and making field observations, and as stated by (Yemataw, Chala, Ambachew, Grant, & Tesfaye, 2017), we identified 15 quantitative parameters used for Enset yield prediction. Enset has five (5) yields: Kocho in two forms (Kocho-Unsqueezed and Kocho-Squeezed), Amicho/Corm, Bulla, and Fiber, the byproduct. As stated by Yemataw et al. (2017), and added by Haile (2014b) and B. T. Mellisse, Descheemaeker, Mourik, & van de Ven (2017), Enset yield prediction depends on vegetative parameters (agronomic characteristics), described as 15 quantitative traits. In this study we introduced one additional parameter: the weight of the central shoot after the inflorescence is removed, measured before grating. It was added because it is correlated with Bulla yield, one of the Enset yields, whose prediction was not attempted in previous studies. Therefore, 16 quantitative parameters were used in this study.
For primary data, we used physical site observation at the Areka Agricultural Research Center (AARC), which is the national center of excellence for research on Enset. The 16 identified quantitative parameters were recorded from 36 selected clones of Enset that are well suited to the area around AARC. The 36 clones were selected for sample collection because they are the most widely available in the study area, and the quantitative measurements were taken for each of them.
3.3.2. Data Description
The data for this research were collected from Areka Agricultural Research Center (AARC), the national center of excellence for Enset (Ensete ventricosum). As shown in Table 3-1 below, there are 16 quantitative parameters in the Enset (Ensete ventricosum) data used for this research. These quantitative parameters were selected after consulting many experts and researchers, such as Dr Zerihun Yemataw and Dr Yasin Goa, and the technical teams in the area of the study.
The procedure started with identifying the quantitative parameters used throughout the research. After in-depth discussion with the research center, it was agreed that yield prediction for Enset (Ensete ventricosum) should use the following quantitative parameters.
No | Quantitative trait | Code | Description
1 | Maturity time | MT | Number of years from transplanting up to harvesting.
2 | Plant height (m) | PLHT | Plant height was measured prior to harvesting by measuring it from ground level to the tip of the longest leaf using a tape meter.
3 | Pseudostem height (m) | PSHT | ... harvesting by measuring it from ground level to the start of the leaf petiole using a tape meter.
4 | Pseudostem circumference (m) | ... | ... to the middle height point using a tape meter.
5 | Leaf number | LFNO | Leaf number was taken prior to harvesting by counting all the fully expanded and green leaves.
6 | Leaf length (m) | LFL | Leaf length was taken prior to harvesting by measuring from the end point of the petiole to the tip edge of the leaf using a tape meter across the midrib.
7 | Leaf width (m) | LFWTH | The leaf width was measured prior to harvesting by measuring at the middle wider point using a tape meter.
8 | Leaf sheath number | LFSTH NO | ... from each plant at harvest.
9 | Leaf sheath weight before decortication | ... | ... and measured before decortication.
10 | ... | ... | ... and measured after decortication.
11 | Central shoot weight | ... | ... inflorescence removed measured before grating.
12 | Corm weight before grating | ... | ... removal and prior to grating.
13 | Bulla Weight (kg) | BWT | The weight of the dehydrated mixture of scraped parenchymatic tissue of leaf sheaths, pulverized corm and grated stalk of inflorescence.
14 | Fiber Weight (kg) | FYield | Fiber yield was measured by weighing all the fiber left, soon after decorticating the leaf sheath.
15 | Fermented unsqueezed kocho yield (kg/plant) | ... | The unfermented kocho yield is left in the pit for some time, usually 30 days, for fermentation.
16 | Fermented squeezed kocho yield (kg/plant) | ... | ... applying human force to reduce its water as much as possible.
Table 3-1: Quantitative parameters of the Enset data
3.3.3. Independent and Dependent Variables
The input data (independent variables) and the output data (dependent
variables) needed for this research were taken from the Areka Agricultural
Research Center (AARC), which is the national center of excellence for Enset.
Eleven input parameters and five output parameters were used for the ANN
processing. The inputs considered for the research are those to which the
outputs are most sensitive (Yemataw, Mohamed, Diro, Addis, & Blomme, 2014).
These 11 input parameters are listed in Table 3-2:
No | Quantitative trait | Code | Description
1 | Maturity time | MT | Number of years from transplanting up to harvesting.
2 | Plant height (m) | PLHT | Measurement from ground level to the tip of the longest leaf at flowering.
3 | Pseudostem height (m) | PSHT | Measurement from ground level to the start of the petioles.
4 | Pseudostem circumference (m) | PSCIR | Measurement at the middle height of the pseudostem.
5 | Leaf number | LFNO | The number of 50% green and 50% unrolled leaves.
6 | Leaf length (m) | LFL | Measurement of all functional leaves from the end of the petiole to the tip of the leaf; their mean was taken for analysis.
7 | Leaf width (m) | LFWTH | Measurement of the widest part of all functional leaf blades just below the flag leaf; their mean was taken for analysis.
8 | Leaf sheath number | LFSTH NO | Count of all decorticable leaf sheaths obtained from each plant at harvest.
9 | Leaf sheath weight before decortication (kg) | LFSTH BD | Weight of all leaf sheaths of each plant, measured before decortication.
10 | Leaf sheath weight after decortication (kg) | LFSTH AD | Weight of pulp of each plant, measured after decortication.
11 | Central shoot weight before grating (kg) | SBG | Weight of the central shoot after the inflorescence is removed, measured before grating.
Table 3-2: Input Parameters of the Enset
There are five output parameters, listed in Table 3-3: Corm Weight, Bulla
Weight, Fiber Weight, Fermented unsqueezed Kocho yield and Fermented squeezed
Kocho yield (Yemataw et al., 2014).
No | Quantitative trait | Code | Description
1 | Corm Yield | CORM | ... root removal and prior to grating.
2 | Fiber Yield | FY | The weight of fiber measured ...
3 | Bulla Yield | BWT | The weight of bulla measured ...
4 | Fermented unsqueezed kocho yield | | ... fermentation.
5 | Fermented squeezed kocho yield | SQKOCHO | ... applying human force to reduce its water as much as possible.
Table 3-3: Output Parameters of the Enset
3.4. Data Preprocessing
Data preprocessing is a technique used to convert raw data into a clean data
set. In our research, the data collected from AARC were preprocessed so that
they are suitable for designing a good artificial neural network model.
Real-world data such as those we collected are often incomplete, inconsistent,
and/or lacking in certain behaviors or trends, and are likely to contain many
errors (García, Ramírez-Gallego, Luengo, Benítez, & Herrera, 2016).
Data preprocessing is a proven method of resolving such issues. As mentioned
by Alasadi (2017), it follows a sequence of steps. According to Kung et al.
(2016), data preprocessing has three stages: data integration, data cleaning
and data transformation.
Figure 3-1: Data Preprocessing Stages (Kung et al., 2016)
3.4.2. Data integration
We collected the raw data and stored them in one place where data cleaning can
be performed. In total, 2520 rows of data with 16 columns were collected and
stored. We considered the differences among the Enset clones, taking the
number of years from transplanting up to harvesting as the maturity time.
Plant height was measured from ground level to the tip of the longest leaf at
flowering; it is very significant when designing a prediction model for Kocho
and Fiber. Pseudostem height is also an important factor determining the
yields of Enset. We measured it from ground level to the start of the
petioles, in meters.
Pseudostem circumference, measured at the middle height of the Enset
pseudostem, is another determinant of Kocho yield. Leaf length also helps to
determine the yields of Enset: all functional leaves were measured from the
end of the petiole to the tip of the leaf, and their mean was taken for
analysis. To get the leaf width in meters, we measured the widest part of all
functional leaf blades just below the flag leaf and took their mean for
analysis. Leaf sheath number is also an important factor: we counted all
decorticable leaf sheaths obtained from each plant at harvest. Leaf sheath
weight before decortication is another factor affecting Enset yields. It is
the weight, in kilograms, of all leaf sheaths of each plant, measured before
decortication.
Leaf sheath weight after decortication, in kilograms, is another factor. It is
the weight of pulp of each plant, measured after decortication. Central shoot
weight before grating, in kilograms, is the main determinant of Bulla yield.
It is the weight of the central shoot after the inflorescence is removed,
measured before grating.
3.4.3. Data cleaning
In data preprocessing, this step is used to fill in missing values (attribute
or class values). In our case, the collected data are summarized records, and
no missing data occurred in our research work.
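The claim that no values are missing can be checked mechanically before
training. The following is a minimal sketch in plain Python, assuming the
records are available as CSV text; the three-record excerpt and its values are
made up for illustration, and `count_missing` is a hypothetical helper name,
not part of the thesis code.

```python
import csv
import io


def count_missing(rows):
    """Count empty or whitespace-only cells per column in a list of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                counts[column] = counts.get(column, 0) + 1
    return counts


# Hypothetical excerpt (made-up values) using three of the input codes.
sample_csv = "MT,PLHT,LFNO\n4,5.2,11\n5,,12\n4,4.9,\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))
missing = count_missing(rows)  # columns that never have gaps are omitted
```

An empty result for the full dataset would confirm that no filling of missing
values is needed.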
3.4.4. Data transformation and Standardization:
The mean and standard deviation estimates of a dataset can be more robust to
new data than the minimum and maximum. Data standardization is about making
sure that data are internally consistent; that is, each data type has the same
content and format. Once standardization is done, all the features have a mean
of zero, a standard deviation of one, and thus the same scale.
Standardizing a dataset involves rescaling the distribution of values so that
the mean of the observed values is 0 and the standard deviation is 1.
Subtracting the mean from the data is called centering, whereas dividing by
the standard deviation is called scaling. As such, the method is sometimes
called “center scaling”.
The most straightforward and common data transformation is to center and scale
the predictor variables. To center a predictor variable, the average predictor
value is subtracted from all the values, so that the predictor has a zero
mean. Similarly, to scale the data, each value of the predictor variable is
divided by its standard deviation, which coerces the values to have a common
standard deviation of one.
In machine learning this technique is known as feature scaling, and it is very
important when designing a neural network model. There are two approaches to
feature scaling: normalization and standardization. We used the
standardization approach, which scales features so that their distribution is
centered around 0 with a standard deviation of 1. StandardScaler gives each
feature the mean and standard deviation of the Standard Normal Distribution
(SND).
It therefore makes the mean 0 and scales the data to unit variance. Its main
purpose is to change the values of numeric columns in the dataset to a common
scale without distorting differences in the ranges of values (García et al.,
2016). It is applied to the independent and dependent variables of the data.
Sometimes it also helps to speed up the calculations in an algorithm (Zhu,
2016).
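Standardization:
\[ z = \frac{X - \mu}{\sigma} \]
Standard deviation:
\[ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2} \qquad (3.3) \]
where \mu is the mean of the N observed values X_i of a variable.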
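The centering and scaling described in this section can be illustrated with a
short sketch. It uses only the Python standard library rather than
StandardScaler, it assumes the population standard deviation (dividing by N),
and the plant-height values are made up for illustration.

```python
import statistics


def standardize(values):
    """Center a variable on its mean and scale it by its standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("a constant variable cannot be standardized")
    return [(v - mu) / sigma for v in values]


# Hypothetical plant heights in meters (made-up values).
plant_heights = [4.0, 5.0, 6.0, 5.0]
z_scores = standardize(plant_heights)
```

After the transformation the values have mean 0 and standard deviation 1. A
common practice, which the text does not confirm was followed here, is to
estimate the mean and standard deviation on the training data only and reuse
them for the testing data.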
In this step of model development, the prepared Enset data, in CSV format, are
loaded into the Python workspace using the Spyder editor with Python 3.7 under
Anaconda. The data are then divided into an input dataset, whose columns we
call independent variables, and an output dataset, whose columns we call
output variables.
We also split the dataset into a training dataset and a testing dataset. Of
the 2520 records, covering 7 years of data for 36 clones of Enset, 70% are
assigned to the training dataset and 30% to the testing dataset.
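The 70/30 split described above can be sketched as follows. This is a
plain-Python illustration, not the code used in the study: whether the records
were shuffled before splitting, and with which seed, is an assumption, and
`split_records` is a hypothetical helper name.

```python
import random


def split_records(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and split it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # shuffling and seed are assumptions
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in for the 2520 Enset records: record indices only.
records = list(range(2520))
train_set, test_set = split_records(records)
```

With 2520 records this gives 1764 training records and 756 testing records.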
3.5. Tabular Visualization of Data
In this section we discuss the analysis of 7 years of data, from 2005 up to
April 2011 E.C. (2013 to 2019 G.C.). We summarized the data using a table and
a graph. After preprocessing the Enset records, we have 2520 records with 16
features, of which 11 parameters are inputs and 5 parameters are outputs
(targets).
Figure 3-2: Tabular representation of Dataset
Figure 3-3: Visualized Input dataset
3.6. Prediction Model Skeleton
Figure 3-5: Skeleton for Yield Prediction Model for Enset
As Figure 3-5 shows, the skeleton of the prediction model is divided into five
working zones: the data zone, the training/learning zone, the evaluation zone,
the prediction zone and the final/output zone.
The data zone covers all data-related activities, namely data collection, data
analysis and data preprocessing. In the training/learning zone, once the
data-related activities are done, the model is trained using different machine
learning algorithms. In the evaluation zone, the trained model is evaluated
for its performance. In the prediction zone, the evaluated model is made ready
to predict Enset yields, which is the goal of this research. Finally, the
final/output zone holds the outputs of the predictions, which are the yields
of Enset (Kocho in two forms, Bulla, Amicho and Fiber, the byproduct).
3.7. Proposed Enset Yield Prediction Model
In this section we describe the model developed for Enset yield prediction.
MLPs are capable of approximating any continuous function. Multilayer
perceptrons are often applied to supervised learning problems: they train on a
set of input-output pairs and learn to model the correlation (or dependencies)
between those inputs and outputs.
In our study, we use an Ensemble Multilayer Perceptron Neural Network model
(EMLP-NN) with two hidden layers. Stacked generalization is an ensemble
technique that combines the predictions from multiple trained models.
Stacked generalization aims to minimize the generalization error of some ML
models by performing at least one more level of learning task, using the
outputs of the ML base models as inputs and the actual response values of some
part of the data set (training data) as outputs (Shahhosseini et al., 2020).
Studies such as Kung et al. (n.d.) show that the Ensemble Neural Network (ENN)
method performs better than traditional back-propagation neural networks and
multiple regression analysis.
Kim and Seo (2018) describe the backpropagation algorithm as the most common
and standard training algorithm, the central idea of which is that the errors
for the neurons of the hidden layer are determined by back-propagation of the
error of the neurons of the output layer. We trained the network with
backpropagation using the Adam optimizer, after finding that it performed
better than other optimizers such as SGD and RMSprop.
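To make the role of the optimizer concrete, the following sketch shows one
Adam update for a single weight, written in plain Python. The learning rate
and the toy objective (minimizing theta squared) are assumptions chosen for
illustration; the hyperparameters used in the study are not stated in this
section.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar weight and return the new state."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimize_quadratic(steps, theta=1.0):
    """Minimize f(theta) = theta**2 (toy objective) with Adam."""
    m = v = 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * theta  # derivative of theta**2
        theta, m, v = adam_step(theta, grad, m, v, t)
    return theta


theta_after_training = minimize_quadratic(200)
```

Unlike plain SGD, the step size is scaled per weight by the running estimate
of the squared gradient, which is one reason Adam is often less sensitive to
the choice of learning rate.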
3.8. Ensemble Multilayer Perceptron Neural Network (EMLP-NN)
A feed-forward back-propagation neural network with three kinds of layers (an
input layer, hidden layers and an output layer) is developed. This network is
used to predict Enset yields, particularly Kocho in two forms (unsqueezed
Kocho and squeezed Kocho), Bulla, Amicho/Corm and Fiber, the byproduct,
because of its ability to accommodate large input data and to solve problems
of vast complexity.
3.8.1. Input Layer
This is the first layer in the neural network. It takes the input signals
(values) and passes them on to the next layer. It does not apply any
operations to the input signals and has no associated weights or bias values.
In our research the network model has 11 input signals: MT, PLHT, PSHT, PSCIR,
LFNO, LFL, LFWTH, LFSTH NO, LFSTH BD, LFSTH AD and SBG.
3.8.2. Hidden Layers
In neural networks, a hidden layer is located between the input and the output
of the network; it applies weights to its inputs and passes the result through
an activation function to produce its output. Hidden layers have neurons
(nodes) which apply different transformations to the input data.
In our research we have 2 hidden layers, with 11 neurons in the first hidden
layer and 5 neurons in the second, which passes its values on to the output
layer. We compared different structures for the hidden layers and found that
the chosen numbers of layers and neurons gave better results. All the neurons
in the last hidden layer are connected to each neuron in the output layer. In
short, the hidden layers perform nonlinear transformations of the inputs
entered into the network.
3.8.3. Output Layer
This is the last layer in the network; it receives its input from the last
hidden layer. With this layer we can obtain the desired number of output
values. In this research the network model has 5 neurons in the output layer,
one for each output: Corm (Amicho), Fiber (the byproduct), Bulla, and Kocho in
two forms (fermented squeezed Kocho and fermented unsqueezed Kocho).
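The layer sizes described in Sections 3.8.1 to 3.8.3 (11 inputs, hidden layers
of 11 and 5 neurons, 5 outputs) can be sketched as a forward pass in plain
Python. The activation functions are assumptions, since the text does not name
them: ReLU is used in the hidden layers and the identity in the output layer.
The weights are random placeholders, not trained values.

```python
import random

LAYER_SIZES = [11, 11, 5, 5]  # inputs, hidden layer 1, hidden layer 2, outputs


def init_network(layer_sizes, seed=0):
    """Create (weights, biases) for each pair of consecutive layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Propagate one input vector through the network."""
    activations = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        sums = [sum(w * a for w, a in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        # Assumption: ReLU in hidden layers, identity in the output layer.
        activations = sums if is_output else [max(0.0, s) for s in sums]
    return activations


def count_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


network = init_network(LAYER_SIZES)
prediction = forward(network, [0.0] * 11)  # one standardized input record
```

With these sizes a single sub-model has 222 trainable parameters: 132 in the
first hidden layer, 60 in the second and 30 in the output layer.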
Figure 3-7: Proposed Ensemble MLP-ANN Model
Figure 3-7 shows the proposed ensemble ANN model. We designed the proposed
EMLP-ANN model using a technique called stacking. Stacking extends model
averaging, in which multiple sub-models contribute equally to a combined
prediction, by training a meta-learner to learn how best to combine the
predictions of the sub-models. The technique of combining multiple models into
a single one is referred to as ensemble modeling (Kim & Seo, 2018). As Kim and
Seo (2018) add, the application of an ensemble technique is divided into two
steps: the first step is the creation of the individual ensemble members, and
the second step is the combination of the outputs of the ensemble members to
produce the most appropriate output.
Random forests are an ensemble learning method for regression that operates by
constructing a multitude of decision trees at training time and outputting the
mean prediction of the individual trees. A random forest is usually trained
with the “bagging” method, whose general idea is that a combination of
learning models improves the overall result. In our study we used Random
Forest, which is an ensemble of decision trees, to combine the MLP-ANNs.
Instead of searching for the most important feature among all features when
splitting a node, a random forest searches for the best feature among a random
subset of the features. This produces a wide diversity among the trees, which
generally results in a better model.
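The three ingredients named above (bootstrap sampling, a random subset of
features at each split, and averaging of the tree predictions) can be
illustrated separately. This sketch does not build decision trees; the helper
names and the numbers are hypothetical and only show the mechanics.

```python
import random


def bootstrap_sample(records, rng):
    """Draw len(records) records with replacement (the sampling used in bagging)."""
    return [rng.choice(records) for _ in records]


def random_feature_subset(features, k, rng):
    """Pick k distinct candidate features to consider at one node split."""
    return rng.sample(features, k)


def bagged_prediction(tree_predictions):
    """Combine the regression outputs of the individual trees by averaging."""
    return sum(tree_predictions) / len(tree_predictions)


rng = random.Random(0)
features = ["MT", "PLHT", "PSHT", "PSCIR", "LFNO", "LFL", "LFWTH"]
sample = bootstrap_sample(list(range(10)), rng)
subset = random_feature_subset(features, 3, rng)
average = bagged_prediction([10.0, 12.0, 14.0])  # made-up tree outputs
```

Because every tree sees a different sample and different candidate features,
the trees make partly independent errors, which the averaging step reduces.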
Our proposed EMLP-ANN model is built in two steps. The first step is training
the MLP-ANN sub-models on the training dataset so that they can make
predictions from this data set. These sub-models, or base models, are trained
using the backpropagation algorithm.
The second step is training the single model, or meta-learner. The
meta-learner takes the outputs of the base models as inputs and learns to make
predictions from these data. In our study, the Random Forest algorithm is used
as the meta-learner that best combines the predictions of the sub-models. The
outputs of the sub-models are merged using a simple concatenation merge, and
the average of the result then goes to the output layer. Finally, the larger
Ensemble Multilayer Perceptron Artificial Neural Network is created to predict
Enset yields (Kocho in two forms, Bulla, Amicho/Corm and Fiber, the
byproduct).
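The merge step described above can be sketched as follows. The example assumes
three sub-models, each producing five yield predictions for one plant; the
numbers are made up. It shows only the concatenation and the per-target
average. The Random Forest meta-learner, which would be trained on the
concatenated vector, is not reproduced here.

```python
def concatenate_outputs(sub_model_outputs):
    """Merge the output vectors of all sub-models into one flat feature vector."""
    return [value for outputs in sub_model_outputs for value in outputs]


def average_per_target(merged, n_targets):
    """Average the merged vector target by target across the sub-models."""
    n_models = len(merged) // n_targets
    return [sum(merged[m * n_targets + t] for m in range(n_models)) / n_models
            for t in range(n_targets)]


# Hypothetical predictions of three sub-models for the five outputs
# (made-up numbers, one row per sub-model).
sub_model_outputs = [
    [10.0, 2.0, 1.0, 30.0, 20.0],
    [12.0, 4.0, 1.0, 34.0, 22.0],
    [14.0, 3.0, 1.0, 32.0, 24.0],
]
merged = concatenate_outputs(sub_model_outputs)
combined = average_per_target(merged, n_targets=5)
```

The merged vector has 15 entries (3 sub-models times 5 targets); it is this
vector that a meta-learner would receive as its input features.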
3.9. Back propagation algorithm
Backpropagation is the most widely used algorithm for training artificial
neural networks. It is based on an optimization method called gradient descent
(Jukic et al., 2020). Artificial neural networks use backpropagation as a
learning algorithm to compute the gradient of the error with respect to the
weights, which gradient descent then uses to update them. It is much
recommended for predictive analysis. Desired outputs are compared to the
achieved system outputs, and the system is then tuned by adjusting the
connection weights to narrow the difference between the two as much as
possible.
The algorithm gets its name because the weights are updated backwards, from
the output towards the input. Because backpropagation requires a known,
desired output for each input value in order to calculate the gradient of the
loss function, it is usually classified as a type of supervised machine
learning.
In a nutshell, the algorithm has three phases (Jukic et al., 2020): forward
propagation, error calculation, and weight updates. The input data sample is
propagated through the artificial neural network and the output is calculated.
The output for the input sample is compared with the expected output and the
error is calculated. The error is then propagated backward and used to update
the weights in all layers so as to minimize the error. This process is
repeated for every data sample in the input, and again over the whole data set
until the mean square error of the artificial neural network reaches the
desired level.
The backpropagation training algorithm is the most common and standard
algorithm; its central idea is that the errors for the neurons of the hidden
layer are determined by back-propagation of the error of the neurons of the
output layer (Kim & Seo, 2018). As mentioned by Ranjeet and Armstrong (2014),
the process involved in the backpropagation algorithm is shown in the
following steps:
Step 1: Randomly initialize the weights to small numbers close to 0 but not 0.
Step 2: Provide the input data sets to the input layer, together with the
desired outcomes.
Step 3: Forward propagation: from left to right, the neurons are activated in
a way that the impact of each neuron's activation is limited by the weights.
Step 4: Compute the error between the actual and desired outcomes.
Step 5: Back propagation: from right to left, amend the weights associated
with the inputs and functions.
Step 6: Compare the error with the tolerance ratio.
Step 7: If the error is still higher than the tolerance, repeat from Step 2;
otherwise stop. When the whole training set has passed through the ANN, that
makes an epoch.
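The steps listed above can be sketched for a very small network in plain
Python. Everything specific in this example is an assumption made for
illustration: one input, one hidden layer of three tanh neurons, one linear
output, a toy target (the square of the input), a half squared error per
sample, and the chosen learning rate and tolerance. It is not the network
trained in this study.

```python
import math
import random


def train(xs, ys, n_hidden=3, lr=0.1, tolerance=1e-3, max_epochs=500, seed=0):
    """Train a 1-input, 1-output network with one tanh hidden layer."""
    rng = random.Random(seed)

    def small():
        # Small random weight close to 0 but not 0.
        return rng.uniform(-0.1, 0.1) or 0.01

    w1 = [small() for _ in range(n_hidden)]  # input -> hidden weights
    b1 = [small() for _ in range(n_hidden)]  # hidden biases
    w2 = [small() for _ in range(n_hidden)]  # hidden -> output weights
    b2 = small()                             # output bias

    def forward(x):
        hidden = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
        return hidden, sum(w2[j] * hidden[j] for j in range(n_hidden)) + b2

    def mean_squared_error():
        return sum((forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    history = [mean_squared_error()]
    for _ in range(max_epochs):              # one pass over the data = one epoch
        for x, y in zip(xs, ys):
            hidden, out = forward(x)         # forward propagation
            err = out - y                    # error for this sample
            for j in range(n_hidden):        # back propagation of the error
                grad_hidden = err * w2[j] * (1 - hidden[j] ** 2)
                w2[j] -= lr * err * hidden[j]
                w1[j] -= lr * grad_hidden * x
                b1[j] -= lr * grad_hidden
            b2 -= lr * err
        history.append(mean_squared_error())
        if history[-1] <= tolerance:         # compare error with the tolerance
            break
    return history


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [x * x for x in xs]                     # toy target, not Enset data
history = train(xs, ys)
```

The returned list holds the mean squared error before training and after each
epoch, so a falling sequence indicates that the weight updates are reducing
the error. Note that the weights are initialized once and kept between epochs.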
In the backpropagation algorithm, the term “feed forward” describes how the
neural network processes inputs and recalls patterns, and the term “back
propagation” describes how this kind of neural network is trained. The network
obtains its inputs through the neurons in the input layer, and the output of
the network is given by the neurons in the output layer (Šťastný, Konečný, &
Trenz, 2011).
Next, the network calculates a loss function to estimate the loss (or error)
and to measure how good or bad the prediction result is.
After forward propagation has completed, we get an output value, which is the
predicted value. To calculate the error, we compare the predicted value with
the actual output value, using a loss function. Once the loss has been
calculated, this information is propagated backwards: starting from the output
layer, the loss information propagates to all the neurons in the hidden layer
that contribute directly to the output. We then calculate the derivative of
the error value with respect to each weight in the neural network.
Loss function/cost function: the loss function computes the error for a single
training example. The cost function is the average of the loss functions over
the entire training set.
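As an illustration, and assuming squared error as the per-example loss (the
text refers to the mean square error of the network but does not state the
loss formula), the two definitions can be written as follows. The symbols are
introduced here for illustration: y^{(i)} is the actual value, \hat{y}^{(i)}
the predicted value for training example i, and N the number of training
examples.

```latex
L^{(i)} = \left( y^{(i)} - \hat{y}^{(i)} \right)^2,
\qquad
J = \frac{1}{N} \sum_{i=1}^{N} L^{(i)}
```

Under this assumption the cost J is exactly the mean square error used as the
stopping criterion during training.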
Figure 3-8: Learning Process Artificial Neural Network
3.10. Process flow diagram
Figure 3-9: Process flow diagram
Figure 3-9 describes the Enset yield prediction model. The model has four core
phases. These are:
Data preprocessing,
Prediction
In data preproces