International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976 – 6480(Print), ISSN 0976 – 6499(Online) Volume 5, Issue 7, July (2014), pp. 71-82 © IAEME
LEAF IDENTIFICATION BASED ON FUZZY C MEANS AND NAÏVE
BAYESIAN CLASSIFICATION
Shilpa Ankalaki1, Laxmidevi Noolvi2, Dr. Jharna Majumdar3
1Department of CSE (PG), NMIT, Bangalore-560064, India
2Department of CSE, Assistant Professor, NMIT, Bangalore-560064, India
3Dean R&D, Prof and Head CSE (PG), NMIT, Bangalore-560064, India
ABSTRACT
Recognition of plants has become an active area of research because many plant species are at
risk of extinction. This paper extracts shape features, including moment invariants, during the
feature extraction phase. The proposed system uses Fuzzy C Means clustering to group similar
leaf images and Naïve Bayesian classification to assign a leaf image to one of the clusters.
Different distance measures can be used to identify the closest match of the leaf; the proposed
system uses Euclidean distance to search for the most similar leaf within the cluster.
Keywords: Euclidean distance, Fuzzy C Means Clustering, Naïve Bayesian Classification.
1. INTRODUCTION
Plants are among the most important forms of life on earth. They maintain the balance of
oxygen and carbon dioxide in the earth's atmosphere [13]. The relationship between plants and
human beings is also very close: plants are an important means of livelihood and production, and
they are vitally important for environmental protection. However, recognizing plant species is an
important and difficult task. Many species carry significant information for the development of
human society, and many of them are at risk of extinction [10]. It is therefore necessary to set
up a database for plant protection [3-4]. The proposed method concentrates on leaf shape features
rather than color features, because the color of a leaf may change with the climate or with
disease, which makes color features unreliable.
INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING AND TECHNOLOGY (IJARET)
ISSN 0976 - 6480 (Print) ISSN 0976 - 6499 (Online) Volume 5, Issue 7, July (2014), pp. 71-82 © IAEME: http://www.iaeme.com/IJARET.asp Journal Impact Factor (2014): 7.8273 (Calculated by GISI) www.jifactor.com
2. PROPOSED METHODOLOGY
The proposed methodology consists of two phases:
i. Learning phase
ii. Identification phase
Fig. 1 shows the flow diagram of the learning phase and the identification phase.
2.1 Image Acquisition
Leaves on a plant usually grow in clusters, so it is difficult to automatically separate the
features of one leaf from the unneeded background. We therefore created leaf image plates: each
leaf is placed on a light panel and photographed with a digital camera. In this way, we obtain an
image that contains only one leaf.
2.2 Image Pre-processing
The raw data, depending on the data acquisition type, is subjected to a number of pre-processing
steps to make it usable in the descriptive stages of analysis. Pre-processing aims to produce
image data that the leaf identification system can process quickly and accurately.
2.2.1 Conversion of Color image to Gray scale image
The color of a plant leaf is usually green. Moreover, shading and variations in water,
nutrients, atmosphere and season can change the color, so the color feature has low
reliability. We therefore recognize plants from the grey-level image of the leaf. Fig. 2
shows the pre-processing of a leaf image.
The RGB image is first converted into a grayscale image. Eq. (1) is the formula used to
convert the RGB value of a pixel into its grayscale value.
Gray = 0.2989 * R + 0.5870 * G + 0.1140 * B (1)
where R, G and B are the red, green and blue components of the pixel, respectively.
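As an illustration of Eq. (1), the following minimal Python sketch converts an image stored as rows of (R, G, B) tuples into gray levels. The function name rgb_to_gray and the nested-list image layout are assumptions made for this example and are not part of the original system.

```python
def rgb_to_gray(image):
    """Convert rows of (R, G, B) tuples to rows of gray levels using Eq. (1)."""
    return [
        [0.2989 * r + 0.5870 * g + 0.1140 * b for (r, g, b) in row]
        for row in image
    ]


# Hypothetical 1 x 3 image: a pure red, a pure green and a white pixel.
sample_rgb = [[(255, 0, 0), (0, 255, 0), (255, 255, 255)]]
sample_gray_levels = rgb_to_gray(sample_rgb)
```

Because the three weights sum to 0.9999, a white pixel maps to a value just under 255, and the green channel contributes the most to the gray level.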
Fig.1: Flow diagram of learning and identification phase
2.2.2 Generation of Binary Image
In the analysis of images, it is essential to separate the objects of interest from the background.
The techniques used to separate the objects of interest from the rest are referred to as
thresholding techniques; the pixels corresponding to the region of interest are known as
foreground pixels and the rest are known as background pixels. The image is converted to a
two-level binary image whose pixels take the value 0 or 255. This is done by thresholding: all
pixels above a certain level are assigned 255 and the remaining pixels are assigned 0. The
proposed methodology uses the statistical mean method and Otsu's method for automatic thresholding.
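The two automatic thresholding methods named above can be sketched as follows. This is an illustrative pure-Python version that assumes an 8-bit grayscale image stored as rows of integers in 0..255; the function names are hypothetical, and the Otsu routine is the textbook between-class-variance search, not necessarily the exact implementation used by the authors.

```python
def mean_threshold(gray):
    """Statistical mean method: the threshold is the average gray level."""
    pixels = [p for row in gray for p in row]
    return sum(pixels) / len(pixels)


def otsu_threshold(gray):
    """Return the level t that maximizes the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    count_low, sum_low = 0, 0  # pixels at or below t, and their gray-level sum
    for t in range(256):
        count_low += hist[t]
        sum_low += t * hist[t]
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        # Proportional to the between-class variance of the two groups.
        score = count_low * count_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def binarize(gray, t):
    """Pixels above t become 255, the rest become 0."""
    return [[255 if p > t else 0 for p in row] for row in gray]


# Hypothetical 2 x 3 image with three dark and three bright pixels.
sample_gray = [[10, 12, 200], [11, 210, 205]]
binary = binarize(sample_gray, otsu_threshold(sample_gray))
```

On this small example both methods separate the same three bright pixels; on real leaf photographs the two thresholds generally differ, which is one reason to try both.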
2.2.3 Extraction of Boundary
Boundary extraction can be applied to any image from which only the boundary information is
needed. Once a single boundary point is found, the operation seeks all other pixels on that
boundary. The boundary can be extracted using the chain code technique. The system defines the
boundary of the leaf in terms of x-y coordinates [11]. From a starting point, the system traces
the boundary coordinates in a clockwise direction.
Fig.2: Pre Processing of leaf image
2.3 Feature Extraction
Feature extraction involves the extraction of geometric features that represent the shape of
the leaf. These features are used by the classifier to classify the leaf image. The different
features are discussed below.
2.3.1 Aspect ratio: The aspect ratio [1] is the ratio between the length and the width of the
minimum bounding rectangle of the leaf image. It is a scale-invariant feature.
Aspect Ratio = Length / Width (2)
2.3.2 Rectangularity: Rectangularity measures how closely the shape of the leaf approaches a
rectangle, i.e. the similarity between the leaf and a rectangle. To calculate it, first create
the bounding box of the leaf image, then find the ratio of the leaf area to the area of the
bounding box.
Rectangularity = Leaf area / (Length × Width) (3)
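The aspect ratio and rectangularity can be computed together from a binary leaf mask. The sketch below is a simplified illustration: it assumes the mask is a list of rows with 1 marking leaf pixels, and it uses the axis-aligned bounding box as a stand-in for the minimum bounding rectangle. The function name is hypothetical.

```python
def bounding_box_features(mask):
    """Return (aspect_ratio, rectangularity) of the leaf pixels in a 0/1 mask."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    # Length is taken as the longer side of the box, width as the shorter one.
    length, breadth = max(height, width), min(height, width)
    leaf_area = len(coords)  # number of leaf pixels
    return length / breadth, leaf_area / (length * breadth)


# Hypothetical 2 x 4 mask: six leaf pixels inside a 4 x 2 bounding box.
sample_mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
]
aspect_ratio, rectangularity = bounding_box_features(sample_mask)
```

For this mask the aspect ratio is 4 / 2 = 2.0 and the rectangularity is 6 / 8 = 0.75. A rotated leaf would need a true minimum-area rectangle, which this sketch does not attempt.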
2.3.3 Perimeter: The total number of pixels on the leaf boundary.
2.3.4 Roundness: Roundness [2][8] measures how closely the shape of the leaf approaches that
of a circle. The difference between a leaf and a circle is calculated using Eq. (4):
Roundness = 4π × Area / Perimeter² (4)
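A short numerical check of the roundness measure is given below. It assumes the usual form of Eq. (4), in which the area is divided by the squared perimeter so that a perfect circle scores exactly 1; the function name is chosen for this example only.

```python
import math


def roundness(area, perimeter):
    """Roundness = 4 * pi * area / perimeter^2 (equals 1 for a perfect circle)."""
    return 4 * math.pi * area / perimeter ** 2


# A circle of radius 10 and a square of side 4, as idealized shapes.
circle_score = roundness(math.pi * 10 ** 2, 2 * math.pi * 10)
square_score = roundness(4 * 4, 4 * 4)
```

The circle gives 1 and the square gives π/4, about 0.785. Elongated or toothed leaves fall further below 1.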
2.3.5 Sphericity: Sphericity [4][5] is the ratio of the radius of the incircle of the leaf object (ri)
to the radius of the excircle of the leaf object (rc). The incircle and excircle are shown in Fig. 3(a).
2.3.6 Principal axes: The principal axes of a given shape are uniquely defined as the two line
segments that cross each other orthogonally at the centroid of the shape and represent the
directions with zero cross-correlation. In this way, a contour is seen as an instance of a
statistical distribution. Fig. 3(b) shows the principal axes.
2.3.7 Eccentricity: Eccentricity is defined as the ratio of the minor principal axis to the major principal axis.
2.3.8 Tooth Feature: A tooth point [14] is a pixel on the contour that has a high curvature, i.e., it is a
peak. To determine whether a point P_i on the contour is a tooth point, we examine the angle θ
subtended at P_i by its neighbors P_(i-k) and P_(i+k) (where k is a threshold). Fig. 3(c) shows an
example. If the angle θ is within a particular range, then P_i is a tooth; otherwise, it is not.
Two different types of leaves can have nearly the same number of teeth at a particular threshold [12],
so we compute the tooth-based features at multiple increasing threshold values.
Fig.3: (a) Incircle and Excircle, (b) Principal axes, (c) Tooth detection at two different thresholds,
(d)Black occupancy (e) Convex hull
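The tooth test of Section 2.3.8 can be sketched as follows. The contour is assumed to be a closed list of distinct (x, y) points with more than 2k entries, and the angle range (low, high) is a free parameter; the paper does not state the range it uses, so the values in the example are purely illustrative, as are the function names.

```python
import math


def angle_at(contour, i, k):
    """Angle in degrees at contour[i] subtended by contour[i - k] and contour[i + k]."""
    n = len(contour)
    px, py = contour[i]
    ax, ay = contour[(i - k) % n]
    bx, by = contour[(i + k) % n]
    v1 = (ax - px, ay - py)
    v2 = (bx - px, by - py)
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cosine))


def count_teeth(contour, k, low, high):
    """Count contour points whose subtended angle lies in [low, high] degrees."""
    return sum(1 for i in range(len(contour)) if low <= angle_at(contour, i, k) <= high)


# Hypothetical closed contour with one sharp peak at (5, 3).
sample_contour = [(0, 0), (2, 0), (4, 0), (5, 3), (6, 0), (8, 0), (8, -4), (0, -4)]
teeth = count_teeth(sample_contour, k=1, low=20, high=60)
```

Only the peak at (5, 3), whose angle is about 36.9 degrees, falls in the chosen range. Repeating the count for several values of k gives the multi-threshold tooth features described in the text.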
2.3.9 Black occupancy: Black occupancy gives the number of boxes that are occupied by leaf
pixels. It is a scale-invariant feature. Fig. 3(d) shows the black occupancy. The input leaf
image is divided into 36 equal boxes (a 6 × 6 grid), and the boxes that contain leaf pixels are
counted.
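Black occupancy is straightforward to compute from a binary mask. The sketch below assumes a mask of 0/1 rows and counts a box as occupied when it contains at least one leaf pixel; the paper does not say whether a minimum fill fraction is required, so that rule is an assumption, as is the function name.

```python
def black_occupancy(mask, grid=6):
    """Count the grid x grid boxes that contain at least one leaf pixel (value 1)."""
    height, width = len(mask), len(mask[0])
    occupied = set()
    for r, row in enumerate(mask):
        for c, v in enumerate(row):
            if v:
                # Map the pixel to the index of the box that contains it.
                occupied.add((r * grid // height, c * grid // width))
    return len(occupied)


# Hypothetical 12 x 12 mask with leaf pixels in rows 2..5 and columns 0..3.
sample_mask = [[1 if 2 <= r <= 5 and c <= 3 else 0 for c in range(12)] for r in range(12)]
boxes = black_occupancy(sample_mask)
```

Each box covers 2 x 2 pixels here, so the 4 x 4 leaf patch occupies box rows 1 and 2 and box columns 0 and 1, giving 4 occupied boxes out of 36.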
2.3.10 Convex Hull: Convex hull of a set of points S is the boundary of the smallest convex region
that contains all the points of S inside it or on its boundary. There are a number of applications of the
convex hull problem, including partitioning problems, shape testing problems, and separation
problems. Fig 3(e) shows the convex hull. The algorithm for convex hull is given in the Appendix A.
Using convex hull we can derive two more features based on the shape of leaf image and its
convex hull.
• Convexity
Convexity is defined as the ratio between the perimeter of the convex hull of the leaf and the
perimeter of the leaf. Mathematically, it is written as
Convexity = Convex Perimeter / Leaf Perimeter
• Solidity
Solidity is defined as the ratio between the area of the leaf and the area of its convex hull.
Solidity = Area of Leaf / Area of Convex Hull
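Both hull-based features can be illustrated on polygons. The sketch below assumes the leaf contour and its convex hull are already available as ordered lists of (x, y) vertices, and it takes solidity as leaf area over hull area, following the prose definition. Areas come from the shoelace formula; all function names are hypothetical.

```python
import math


def polygon_area(points):
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    n = len(points)
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1] - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    n = len(points)
    return sum(
        math.hypot(points[(i + 1) % n][0] - points[i][0],
                   points[(i + 1) % n][1] - points[i][1])
        for i in range(n)
    )


def convexity(leaf, hull):
    return polygon_perimeter(hull) / polygon_perimeter(leaf)


def solidity(leaf, hull):
    return polygon_area(leaf) / polygon_area(hull)


# Hypothetical notched square (the "leaf") and its convex hull.
leaf_outline = [(0, 0), (4, 0), (4, 4), (2, 2), (0, 4)]
hull_outline = [(0, 0), (4, 0), (4, 4), (0, 4)]
leaf_convexity = convexity(leaf_outline, hull_outline)
leaf_solidity = solidity(leaf_outline, hull_outline)
```

The notch removes a quarter of the hull area, so solidity is 12 / 16 = 0.75, while convexity is 16 / (12 + 4√2), about 0.906. Both values are at most 1 and equal 1 only for a convex shape.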
2.3.11 Leaf Vein Extraction: The leaf vein structure [3][6] is one of the important features of the leaf.
Leaf veins can be extracted using grayscale morphological operations. We perform the
morphological top-hat transformation on the grayscale image with disk-shaped structuring elements
of radius 2 and 3. The result resembles the leaf vein structure. The vein areas, i.e. the total
numbers of pixels on the leaf veins obtained with the structuring elements of radius 2 and 3, are
denoted Av1 and Av2 respectively. The leaf vein features are the ratios Av1/A and Av2/A, where A
is the leaf area. Algorithm 2 in Appendix A describes the leaf vein extraction. Fig. 6 shows the
leaf vein extraction using a disk-shaped structuring element of radius 2.
Fig. 6: Leaf vein extraction
2.4 Normalization of Database Features
Data normalization is a useful step, often adopted prior to designing a classifier, as a
precaution when the feature values vary over different dynamic ranges [8]. In the absence of
normalization, features with large values have a stronger influence on the cost function in designing
the classifier. After normalization, the values of all features lie in a predetermined range.
Normalization is done using the following formula:
$$X = \frac{x_i - x_{min}}{x_{max} - x_{min}}$$
X represents the new value of the feature, x_i represents the original value of the feature, x_min is
the smallest value of the original feature, and x_max is the largest value of the original feature.
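A minimal sketch of this min-max normalization over a feature database is shown below. The database is assumed to be a list of feature vectors (one per leaf); mapping a constant column to 0.0 is a choice made here to avoid division by zero and is not taken from the paper.

```python
def min_max_normalize(rows):
    """Scale every feature column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical database: three leaves, two features with very different ranges.
feature_rows = [[1.0, 200.0], [2.0, 400.0], [3.0, 300.0]]
normalized = min_max_normalize(feature_rows)
```

After scaling, both columns span 0 to 1, so the second feature no longer dominates distance computations. A test leaf should be scaled with the minimum and maximum values of the training database, not with its own values.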
3. LEARNING PHASE
During the learning phase, the first step is to acquire a leaf image from the image database.
The image is then pre-processed, its geometric features are extracted, and the extracted features
are added to the feature database. In the same way, geometric features are extracted for all the
images in the image database and added to the feature database. Feature normalization is then
applied to the feature database so that all features lie within a predetermined range. The
proposed system uses the Fuzzy C Means clustering method to cluster similar leaves based on the
normalized feature database.
3.1 Fuzzy C Means Clustering
Data clustering is the process of dividing data elements into classes or clusters so that items
in the same class are as similar as possible, and items in different classes are as dissimilar as
possible. Depending on the nature of the data and the purpose for which clustering is being used,
different measures of similarity may be used to place items into classes, where the similarity measure
controls how the clusters are formed. Examples of measures that can be used in clustering
include distance, connectivity, and intensity. In fuzzy clustering, data elements can belong to more
than one cluster, and each element has an associated set of membership levels. These indicate the
strength of the association between that data element and a particular cluster. Fuzzy clustering is the
process of assigning these membership levels and then using them to assign data elements to one or
more clusters.
Fuzzy C Means takes the database features as input. The user needs to specify the number of
clusters required. The algorithm assigns each data point a membership in each cluster on the
basis of the distance between the cluster centre and the data point: the closer a data point is
to a cluster centre, the higher its membership in that cluster. The memberships of each data
point across all clusters must sum to one. Algorithm 3 in Appendix A describes the Fuzzy C Means
clustering algorithm.
4. IDENTIFICATION PHASE
During the identification phase, the first step is to read the test leaf image. The second step
is pre-processing of the input leaf image, and the third step is feature extraction. The features
of the given leaf are normalized and stored in the feature vector. The next step is to find the
cluster that contains the input image to be identified; Naïve Bayesian classification is used for
this purpose.
4.1 Naïve Bayesian Classification
Naïve Bayesian classification is a supervised method; it takes its prior knowledge from
the clusters. It is possible to use Naïve Bayesian classification without a clustering step,
but the database must then be organized so that all similar leaves are in one class. The
proposed system therefore uses unsupervised clustering to form the classes and supervised
classification to assign a leaf image to one of them. Bayesian classifiers are statistical
classifiers. They can predict class membership probabilities, such as the probability that a given tuple
belongs to a particular class. Naïve Bayesian classifiers assume that the effect of an attribute value
on a given class is independent of the values of the other attributes. This assumption is called class
conditional independence. In the proposed methodology, Naïve Bayesian classification is used to find
the probability that the input leaf image belongs to each cluster. The input leaf image is assigned
to the cluster with the maximum probability. The Naïve Bayesian classifier is based on Bayes'
theorem, given as follows:
$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$$
Where
X: feature vector of the given leaf.
C_i: the i-th leaf cluster.
i: index of the cluster.
P(C_i|X) is the posterior probability that the cluster C_i holds given the observed data tuple X.
P(X|C_i) is the probability of X conditioned on C_i (the likelihood).
P(X) is the prior probability of X.
P(C_i) is the prior probability, or a priori probability, of C_i.
The calculation of P(C_i|X), P(X|C_i), P(C_i) and P(X) is described in Algorithm 4 of Appendix A.
4.2 Euclidean Distance
In the proposed methodology, the Euclidean distance measure [7][13] is used to identify the leaf
image within the cluster. To identify the leaf, the distance is computed between the feature vector
of the input leaf image and the feature vectors of all leaves in the cluster selected by the Naïve
Bayesian classifier. The leaf image with the minimum distance to the input leaf is reported as the
recognized leaf.
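The within-cluster search amounts to a nearest-neighbour lookup. The sketch below assumes the selected cluster is available as a dictionary from leaf name to normalized feature vector; the names and values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_leaf(query, cluster):
    """Return the name of the leaf in the cluster closest to the query vector."""
    return min(cluster, key=lambda name: euclidean(query, cluster[name]))


# Hypothetical cluster with two stored leaves and one query vector.
selected_cluster = {"leaf_a": [0.1, 0.9], "leaf_b": [0.8, 0.2]}
match = nearest_leaf([0.75, 0.25], selected_cluster)
```

Here the query is much closer to leaf_b, so leaf_b is returned. Searching only the cluster chosen by the classifier keeps the number of distance computations small compared with scanning the whole database.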
5. EXPERIMENTAL RESULTS
The proposed methodology uses invariant shape features. The training database consists of
50 plant leaves. The proposed leaf identification system recognizes a leaf correctly even when
it is damaged. The experiments were designed to classify each test image into a single class.
Since all the leaf images were taken by us, their true classes are known. In our experiment,
the Fuzzy C Means clustering method is used to cluster similar leaves. The clusters of sample
leaves obtained with Fuzzy C Means are shown in Fig. 7. Naïve Bayesian classification and
Euclidean distance are used for identification. Fig. 8 shows the identification of the cluster
containing the input image using Naïve Bayesian classification.
Fig. 9 shows the identification of the leaf within the cluster using the Euclidean distance measure.
Fig 7: Clusters obtained using Fuzzy C Means Clustering method
Fig.8: Identification of cluster which contains input image to be identified.
Fig.9: Identification of input image
6. CONCLUSION
The work described in this paper is concerned with two challenging phases of image analysis
applications: feature extraction and classification. Since no general feature extraction
method suits all types of images, experiments are needed to determine suitable methods for
plant leaf images. We therefore investigated a set of suitable shape features and moment
invariant techniques and applied them to the feature extraction of plant leaf images. The
proposed methodology gives an accuracy of 83.24%. One limitation of this research is the
limited sample of leaf images. Future work includes the identification of compound leaves
against different backgrounds and improving the performance of the identification system.
7. ACKNOWLEDGMENT
The authors acknowledge Prof. N R Shetty, Director, Nitte Meenakshi Institute of
Technology and Dr. H C Nagaraj, Principal, Nitte Meenakshi Institute of Technology for providing
the support and infrastructure to carry out our research.
REFERENCES
1. Chia-Ling Lee and Shu-Yuan Chen, “Classification for Leaf Images”, 16th IPPR Conference
on Computer Vision, Graphics and Image Processing (CVGIP 2003)
2. Qingfeng Wu, Changle Zhou and Chaonan Wang, “Feature Extraction and Automatic
Recognition of Plant Leaf Using Artificial Neural Network”, © A. Gelbukh, S. Torres, I.
López (Eds.) Avances en Ciencias de la Computación, 2006, pp. 5-12.
3. S. G. Wu, F. S. Bao, E. Y Xu, Y-X. Wang, Y-F. Chang, & Q-L.Xiang, “A Leaf Recognition
Algorithm for Plant Classification Using Probabilistic Neural Network”, IEEE 7th
International Symposium on Signal Processing and Information Technology, Cairo, 2007.
4. J. Du, X. Wang, and G. Zhang, “Leaf shape based plant species recognition,” Applied
Mathematics and Computation, vol. 185-2, pp. 883-893, February 2007.
5. David Knight, James Painte, Matthew Potter, “Automatic Plant Leaf Classification for a
Mobile Field Guide”.
6. Xiaodong Zheng, Xiaojie Wang, “ Leaf Vein Extraction Based on Gray-scale Morphology”,
I.J. Image, Graphics and Signal Processing, 2010, 2, 25-31 Published Online December 2010
in MECS (http://www.mecs-press.org/)
7. Chomtip Pornpanomchai, Chawin Kuakiatngam ,Pitchayuk Supapattranon, and Nititat
Siriwisesokul, “Leaf and Flower Recognition System (e-Botanist)”, IACSIT International
Journal of Engineering and Technology, Vol.3, No.4, August 2011
8. Abdul Kadir, Lukito Edi Nugroho, Adhi Susanto and Paulus Insap Santosa, “Leaf
Classification Using Shape, Color, and Texture Features”, International Journal of Computer
Trends and Technology- July to Aug Issue 2011
9. Jyotismita Chaki and Ranjan Parekh, “Plant Leaf Recognition using Shape based Features and
Neural Network classifiers”, (IJACSA) International Journal of Advanced Computer Science
and Applications, Vol. 2, No. 10, 2011
10. Prof. Meeta Kumar, Mrunali Kamble, Shubhada Pawar, Prajakta Patil, and Neha Bonde,
“Survey on Techniques for Plant Leaf Classification”, International Journal of Modern
Engineering Research (IJMER), www.ijmer.com, Vol. 1, Issue 2, pp. 538-544, ISSN: 2249-6645
11. Chomtip Pornpanomchai, Supolgaj Rimdusit, Piyawan Tanasap and Chutpong Chaiyod, “Thai
Herb Leaf Image Recognition System (THLIRS)”, Kasetsart Journal: Natural Science May
2011 45 : 551 - 562
12. Akhil Arora, Ankit Gupta, Nitesh Bagmar, Shashwat Mishra, and Arnab Bhattacharya, “A
Plant Identification System using Shape and Morphological Features on Segmented Leaflets:
Team IITK, CLEF 2012”
13. Anant Bhardwaj, Manpreet Kaur, and Anupam Kumar, “Recognition of plants by Leaf Image
using Moment Invariant and Texture Analysis”, International Journal Of Innovation And
Applied Studies ISSN 2028-9324 Vol. 3 No. 1 May 2013, Pp. 237-248 © 2013 Innovative
Space Of Scientific Research Journals.
14. Vijay Satti and Anshul Satya, “An Automatic Leaf Recognition System For Plant
Identification Using Machine Vision Technology”, International Journal of Engineering
Science and Technology (IJEST) ISSN : 0975-5462 Vol. 5 No.04 April 2013
15. Jyotismita Chaki and Ranjan Parekh, “Designing an Automated System for Plant Leaf
Recognition”, International Journal of Advances in Engineering & Technology, Jan 2012.
©IJAET ISSN: 2231-1963.
16. Laura Keyes, Adam Winstanley, “USING MOMENT INVARIANTS FOR CLASSIFYING
SHAPES ON LARGE_SCALE MAPS”.
17. George H. John, Pat Langley, “Estimating Continuous Distributions in Bayesian
Classification”, In Proceedings of Conference on uncertainty in Artificial Intelligence, Morgan
Kaufmann Publishers, San Mateo, 1995.
18. James C. Bezdek, Robert Ehrlich and William Full, “FCM: The Fuzzy C-Means Clustering
Algorithm”, Computers & Geosciences, Vol. 10, No. 2-3, pp. 191-203, 1984.
19. Garima Agarwal, Rekha Nair and Pravin Shrinath, “A Review of Plant Leaf Classification
Features and Techniques”, International Journal of Computer Engineering & Technology
(IJCET), Volume 4, Issue 5, 2013, pp. 204 - 216, ISSN Print: 0976 – 6367, ISSN Online:
0976 – 6375.
APPENDIX A
Algorithm 1: Algorithm for convex hull
The idea is to use one extreme edge as an anchor for finding the next. Suppose the
algorithm has found an extreme edge whose unlinked endpoint is x.
• For each point y of the set S, compute the angle θ between the extreme edge and the segment from x to y
• The point that yields the smallest θ determines the next extreme edge
• The output of this algorithm is all the points on the hull in boundary traversal order
Algorithm: Convex Hull
Input: Color leaf image I (M x N)
Output: Convex hull of the given image
Steps:
Step 1: Generate the grayscale image
Step 2: Generate the binary image
Step 3: Find the lowest point (smallest y coordinate)
Step 4: Find the extreme points on every row and store them in an array
Step 5: Let i0 be the index of the lowest point in this array, and set i ← i0
Step 6: For every point stored in the array, compute the counterclockwise angle θ from the previous hull edge at pi
Step 7: Find the point with the smallest θ
Step 8: Let k be the index of the point with the smallest θ
Step 9: Output (pi, pk) as a hull edge
Step 10: i ← k
Step 11: Repeat from Step 6 until i = i0
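The wrapping idea behind Algorithm 1 can be sketched in Python as below. This is a generic gift-wrapping (Jarvis march) routine over a list of (x, y) points, not the authors' code: it replaces the explicit angle computation with cross-product orientation tests, which select the same next hull point, and it starts from the lowest point as in Step 3. Function names are hypothetical.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive if b is left of o->a)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def gift_wrap(points):
    """Return the hull vertices in counterclockwise order, starting at the lowest point."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    start = min(pts, key=lambda p: (p[1], p[0]))  # lowest, then leftmost
    hull = []
    current = start
    while True:
        hull.append(current)
        candidate = pts[0] if pts[0] != current else pts[1]
        for p in pts:
            if p == current:
                continue
            turn = cross(current, candidate, p)
            # Prefer points to the right of the candidate edge; among collinear
            # points keep the farthest one so intermediate points are skipped.
            if turn < 0 or (turn == 0 and
                            squared_distance(current, p) > squared_distance(current, candidate)):
                candidate = p
        current = candidate
        if current == start:
            break
    return hull


# Hypothetical point set: a square with extra points inside and on one edge.
sample_points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (2, 0), (1, 3)]
hull_vertices = gift_wrap(sample_points)
```

Each wrap step scans all points, so the cost is proportional to the number of points times the number of hull vertices. Restricting the input to the extreme points of every row, as Step 4 does, keeps that scan short.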
Algorithm 2: Algorithm for vein extraction
Algorithm: To find the veins of a leaf
Input: Grayscale leaf image
Output: Leaf vein structure
Step 1: Read the grayscale image.
Step 2: Let f be the grayscale leaf image and b be the disk-shaped structuring element. A
structuring element of radius 2, 3, 4 or 5 can be used.
Step 3: Perform erosion on the grayscale image using the disk-shaped structuring element; that
is, replace the pixel at the origin of the structuring element with the minimum value in its
neighborhood.
Step 4: Perform dilation on the eroded image using the same structuring element; that is,
replace the pixel at the origin of the structuring element with the maximum value in its
neighborhood. Erosion followed by dilation is called the morphological opening.
Step 5: Subtract the result of the opening from the original grayscale image. This is called
the top-hat transformation.
Step 6: Convert the result of the top-hat transformation into a binary image.
Step 7: Compute Av1/A, where Av1 is the vein area obtained with the structuring element of
radius 2 and A is the leaf area.
Step 8: Repeat with disk-shaped structuring elements of other radii (2, 3, 4, 5) and store the
vein ratio obtained for each radius as a feature.
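The steps of the vein extraction algorithm can be sketched in pure Python as follows. The image is assumed to be a list of rows of gray levels with veins brighter than the surrounding leaf tissue; neighbors that fall outside the image are ignored, and the binarization threshold is a free parameter. These choices and the function names are assumptions made for this example.

```python
def disk_offsets(radius):
    """Offsets (dr, dc) that lie inside a disk of the given radius."""
    return [(dr, dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if dr * dr + dc * dc <= radius * radius]


def morph(image, offsets, pick):
    """Apply pick (min for erosion, max for dilation) over each pixel's neighborhood."""
    h, w = len(image), len(image[0])
    return [
        [pick(image[r + dr][c + dc]
              for dr, dc in offsets
              if 0 <= r + dr < h and 0 <= c + dc < w)
         for c in range(w)]
        for r in range(h)
    ]


def top_hat(image, radius):
    """Original image minus its morphological opening (Steps 3 to 5)."""
    offsets = disk_offsets(radius)
    opened = morph(morph(image, offsets, min), offsets, max)
    return [[p - o for p, o in zip(prow, orow)] for prow, orow in zip(image, opened)]


def vein_ratio(image, leaf_area, radius, threshold):
    """Vein pixel count after thresholding the top-hat image, divided by leaf area."""
    hat = top_hat(image, radius)
    vein_pixels = sum(1 for row in hat for v in row if v > threshold)
    return vein_pixels / leaf_area


# Hypothetical 7 x 7 leaf patch: gray level 50 with a bright one-pixel vein in column 3.
patch = [[200 if c == 3 else 50 for c in range(7)] for r in range(7)]
ratio_r2 = vein_ratio(patch, leaf_area=49, radius=2, threshold=100)
```

The opening removes the thin bright line because it is narrower than the disk, so the top-hat image keeps only the seven vein pixels and the ratio is 7 / 49. Calling vein_ratio again with radius 3 would give the second feature.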
Algorithm 3: Algorithm for Fuzzy C Means
The algorithm for Fuzzy C Means clustering [18] is as follows:
Algorithm: The Fuzzy C Means algorithm
Input:
• c: the number of clusters,
• D: a data set containing n objects.
Output: A set of c clusters.
Method:
Let X = {x1, x2, x3, ..., xn} be the set of data points and V = {v1, v2, v3, ..., vc} be the set of centers.
Step 1: Randomly select 'c' cluster centers.
Step 2: Calculate the fuzzy membership 'µij' using Eq. (8):
$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{ik} \right)^{2/(m-1)}} \quad (8)$$
Step 3: Compute the fuzzy centers 'vj' using Eq. (9):
$$v_j = \frac{\sum_{i=1}^{n} \mu_{ij}^{m}\, x_i}{\sum_{i=1}^{n} \mu_{ij}^{m}}, \quad \forall j = 1, 2, \ldots, c \quad (9)$$
'n' is the number of data points.
'vj' represents the jth cluster center.
'm' is the fuzziness index, m ∈ (1, ∞); Eq. (8) is undefined at m = 1.
'c' represents the number of cluster centers.
'µij' represents the membership of the ith data point in the jth cluster.
'dij' represents the Euclidean distance between the ith data point and the jth cluster center.
The main objective of the fuzzy c-means algorithm is to minimize the least-squares error, which
is calculated using Eq. (10):
$$J(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} (\mu_{ij})^{m}\, \lVert x_i - v_j \rVert^{2} \quad (10)$$
Where
'||xi – vj||' is the Euclidean distance between the ith data point and the jth cluster center.
Step 4: Repeat Steps 2 and 3 until the minimum 'J' value is achieved or
$$\lVert U^{(k+1)} - U^{(k)} \rVert < \beta$$
Where
'k' is the iteration step.
'β' is the termination criterion, between 0 and 1.
'U = (µij)n*c' is the fuzzy membership matrix.
'J' is the objective function.
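A compact pure-Python sketch of Steps 1 to 4 is given below for illustration. It assumes the standard forms of Eqs. (8) and (9), measures the change in U by the largest absolute membership difference (the paper does not name the norm), and adds a hypothetical init argument so that the initial centers can be fixed for reproducible runs. It is not the authors' implementation.

```python
import math
import random


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(data, c, m=2.0, beta=1e-5, max_iter=100, init=None, seed=0):
    """Return (centers, memberships) following Steps 1 to 4."""
    n, dim = len(data), len(data[0])
    # Step 1: pick c data points as initial centers (random unless init is given).
    centers = [list(v) for v in (init or random.Random(seed).sample(data, c))]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(max_iter):
        # Step 2: membership of point i in cluster j, Eq. (8).
        new_u = []
        for x in data:
            d = [distance(x, v) for v in centers]
            if min(d) == 0.0:
                # The point sits on a center: give it full membership there.
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([
                    1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                    for j in range(c)
                ])
        # Step 3: centers as membership-weighted means, Eq. (9).
        centers = []
        for j in range(c):
            w = [new_u[i][j] ** m for i in range(n)]
            centers.append([sum(w[i] * data[i][t] for i in range(n)) / sum(w)
                            for t in range(dim)])
        # Step 4: stop once the largest membership change drops below beta.
        change = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if change < beta:
            break
    return centers, u


# Hypothetical normalized feature vectors forming two well-separated groups.
points = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [1.0, 1.0], [0.8, 1.1], [1.2, 0.9]]
centers, memberships = fuzzy_c_means(points, c=2)
```

Each row of memberships sums to one, and a hard cluster label can be read off as the column with the largest membership. Because the starting centers are random, different seeds can number the clusters differently.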
Algorithm 4: Naïve Bayesian Classification
Input: Leaf features of the training data set
Output: Classification of the test leaf
Step 1: Apply any clustering method to the training data set to form the clusters.
Step 2: Store the features of the input leaf in a feature vector.
Step 3: Find the probability that each cluster holds the given leaf, based on its features. This
probability is called the posterior probability and is calculated using Eq. (11):
$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)} \quad (11)$$
Where
X: feature vector of the given leaf.
C_i: the i-th leaf cluster.
i: index of the cluster.
The calculation of P(C_i), P(X|C_i) and P(X) is given in the following steps.
Step 4: Calculate the class probability P(C_i) using Eq. (12). It is a constant for each cluster.
$$P(C_i) = \frac{|C_{i,D}|}{|D|} \quad (12)$$
Where
|D|: total number of training tuples in the database D.
|C_{i,D}|: number of training tuples of class C_i in D.
Step 5: P(X) is the same for every cluster, so maximizing the posterior probability only requires
maximizing P(X|C_i) P(C_i). P(X) can be calculated using Eq. (13), where the sum runs over all clusters:
$$P(X) = \sum_{i} P(X \mid C_i)\,P(C_i) \quad (13)$$
Step 6: In order to reduce computation in evaluating P(X|C_i), the naive assumption of class
conditional independence is made. This presumes that the values of the attributes are conditionally
independent of one another, given the class label of the tuple. Thus,
$$P(X \mid C_i) = P(x_1, x_2, \ldots, x_n \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \cdots \times P(x_n \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)$$
Where X is the feature vector with attributes {x1, x2, ..., xn}, n is the total number of
attributes, and k indexes the attributes.
Step 7: For continuous features, P(x_k|C_i) is estimated with a Gaussian distribution, i.e. the
probability that the leaf belongs to the cluster with respect to each feature. It involves the
following steps.
• Calculate the mean of each feature for each cluster.
• Calculate the variance of each feature for each cluster and apply the Gaussian
distribution formula [17] shown in Eq. (14):
$$g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \quad (14)$$
Finally, the probability of feature x_k given cluster C_i is given by Eq. (15):
$$P(x_k \mid C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i}) \quad (15)$$
The leaf image being tested is assigned to the cluster with the highest probability.
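Steps 4 to 7 can be sketched as a small Gaussian Naïve Bayes classifier. The version below is illustrative only: it assumes the clusters are given as a dictionary from cluster label to a list of feature vectors, works with log-probabilities (which leaves the winning cluster unchanged but avoids numerical underflow), skips P(X) because it is the same for every cluster, and adds a small floor to the standard deviation so that a constant feature does not cause a division by zero. None of these implementation choices come from the paper.

```python
import math


def log_gaussian(x, mean, std):
    """Logarithm of the Gaussian density of Eq. (14)."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - (x - mean) ** 2 / (2 * std ** 2)


def fit(clusters):
    """Estimate the prior (Eq. 12) and per-feature mean and std for each cluster."""
    total = sum(len(rows) for rows in clusters.values())
    model = {}
    for label, rows in clusters.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            variance = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, math.sqrt(variance) + 1e-6))  # floor is an assumption
        model[label] = (len(rows) / total, stats)
    return model


def classify(model, x):
    """Return the cluster that maximizes log P(C_i) + sum_k log P(x_k | C_i)."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, std) in zip(x, stats):
            score += log_gaussian(value, mean, std)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical clusters of normalized two-feature leaf vectors.
training_clusters = {
    "A": [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25]],
    "B": [[0.80, 0.90], [0.90, 0.70], [0.85, 0.80]],
}
nb_model = fit(training_clusters)
predicted = classify(nb_model, [0.12, 0.18])
```

The test vector lies near the mean of cluster A, so A is returned. Once the cluster is known, the Euclidean search of Section 4.2 is run only over the leaves stored in that cluster.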