Ref. code: 25605722300067ONR
SIGNAL PROCESSING FOR ELECTRONIC NOSE
BY
MD. MIZANUR RAHMAN
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF
PHILOSOPHY (ENGINEERING & TECHNOLOGY)
SIRINDHORN INTERNATIONAL INSTITUTE OF TECHNOLOGY
THAMMASAT UNIVERSITY
ACADEMIC YEAR 2017
Acknowledgements
I express my gratitude to Almighty Allah, who blessed me to perform
this work. My sincere gratitude goes to my advisor, Assoc. Prof. Dr. Chalie
Charoenlarpnopparut, for his continuous support of my Ph.D. study and related
research, and for his patience, motivation, and knowledgeable guidance throughout
this research. His guidance helped me throughout the research and the writing of this
thesis. I could not have imagined having a better advisor and mentor for my Ph.D.
study. My special gratitude goes to Dr. Attaphongse Taparugssanagorn and Dr.
Wiroonsak Santipach for their precious and deliberate suggestions, which helped me
to improve this work.
Besides my advisor, I am also grateful to my co-advisor, Dr. Prapun
Suksompong, for his benevolent and insightful suggestions. I would like to thank the
rest of my thesis committee: Prof. Dr. Banlue Srisuchinwong, Assoc. Prof. Pisanu
Toochinda, Ph.D., and Asst. Prof. Dr. Attaphongse Taparugssanagorn, for their
insightful comments and encouragement.
I am also grateful to all the SIIT faculty members and staff for every
kind of support during my study and research at SIIT. I also thank my friends at SIIT
for their love and company.
Finally, I recall my parents and family members, especially my son
Mohammad Ateeb Mahir, who has sacrificed his father's affection and company since
October 25, 2012.
Abstract
SIGNAL PROCESSING FOR ELECTRONIC NOSE
by
MD. MIZANUR RAHMAN
Bachelor of Science (Electronics and Communication Engineering), Khulna
University, 2002
Master of Science (Engineering and Technology), Sirindhorn International Institute of
Technology, Thammasat University, 2014
Doctor of Philosophy (Engineering and Technology), Sirindhorn International
Institute of Technology, Thammasat University, 2017
Ripeness identification of fruits with a short shelf life is important to both
cultivators and consumers. Electronic noses (E-Noses) have recently become popular
for fruit quality checking because, unlike human experts, they are sturdy and can be
used repeatedly without fatigue. The primary components of an E-Nose are a data
acquisition device, a sensor panel, and a classification algorithm. Most sensors used
in E-Noses are expensive; in addition, a sensor panel with a large number of sensors
increases design complexity. Thus, finding a minimal set of sensors with maximum
relevant data classification efficiency is of vital importance. To analyze the
classification efficiencies of different classification methods, fruits such as banana,
mango, sapodilla, and pineapple are chosen. Two novel methods for finding a
minimal set of sensors are proposed in this thesis: one is a principal component
loading and mutual information based approach, and the other is a threshold based
approach. With these methods, minimal sets of sensors are found that show more
than 90% classification accuracy when classifying each of the four fruit types at three
ripeness states. Once a sensor panel is designed and a data acquisition device chosen,
a simple, fast, and efficient classification method is required for classifying data of
relevant training classes and for rejecting any irrelevant data. At present, k-nearest
neighbor (k-NN), support vector machine (SVM), and multilayer perceptron neural
network (MLPNN) classification algorithms are often applied to classify E-Nose data.
Because their classification boundaries are open-ended hyperplanes, these algorithms
falsely classify extraneous odor data. To reduce false classification error and thereby
improve correct rejection performance, classification algorithms with hyperspheric
boundaries, such as the generalized regression neural network (GRNN) and the radial
basis function neural network (RBFNN), should be used. Simulation results show that
GRNN overcomes the false classification problem better than RBFNN. However,
because of the large number of neurons required, designing a GRNN is complex
and expensive. A simple hyperspheric classification method based on the minimum,
maximum, and mean (MMM) values of the training data is therefore also proposed in
this thesis. It is observed that the MMM algorithm is simpler, faster, and more
accurate in classifying data of the training classes and correctly rejecting data of
irrelevant classes.
Keywords: Electronic nose, Pattern recognition, Signal processing, False alarm,
Classification
Table of Contents

Chapter Title

Signature Page
Acknowledgements
Abstract
Table of Contents
List of Figures
List of Tables

1 Introduction
1.1 Problem statement
1.2 Motivation
1.3 Objective
1.4 Contribution of this thesis

2 Background Study and Literature Review
2.1 Human olfactory system
2.2 Human nose versus E-Nose
2.3 Literature review
2.3.1 Sensor array minimization techniques
2.3.2 False alarm reduction techniques

3 System Design and Methodology
3.1 E-Nose design process
3.2 Sample fruit collection
3.3 Sensor array
3.4 Experimental setup
3.5 Data acquisition system
3.6 Pattern recognition and classification algorithms
3.6.1 Principal component analysis (PCA)
3.6.2 k-nearest neighbor (k-NN)
3.6.3 Support vector machine (SVM)
3.6.4 Multilayer perceptron neural network (MLPNN)
3.6.5 Generalized regression neural network (GRNN)
3.6.6 Radial basis function neural network (RBFNN)
3.6.7 Linear discriminant analysis (LDA)
3.7 Sensor panel minimization techniques
3.7.1 Exhaustive search method
3.7.2 PCA loading and mutual information based approach
3.7.3 Threshold based approach
3.8 Hyperplane versus hyperspheric classification
3.8.1 Minimum-maximum-mean based hyperspheric classification

4 Results and Discussions
4.1 Data preprocessing
4.2 Sensor panel minimization methods
4.2.1 Between-to-within variance method
4.2.2 Exhaustive search method
4.2.3 PCA loading and mutual information based approach
4.2.4 Threshold based method
4.3 False classification reduction: Hyperplane versus hyperspheric
4.3.1 GRNN compared to SVM, LDA, k-NN, and MLPNN
4.3.2 MMM versus k-NN, SVM, GRNN, RBFNN, and MLPNN

5 Conclusions and Recommendations
5.1 Conclusions
5.2 Future works

References
Appendices
Appendix A
List of Figures

Figures

1.1 (a) Hyperspheric (ellipsoid in two dimensions) vs. (b) hyperplane (line in two dimensions) classification boundary
2.1 Position of a human olfactory system
2.2 A human olfactory system and its parts: 1: olfactory bulb, 2: mitral cells, 3: bone, 4: nasal epithelium, 5: glomerulus, 6: olfactory receptor cells
2.3 Detailed view of a human olfactory system
2.4 Comparison of a human olfaction system to an E-Nose olfaction system
3.1 An E-Nose design process
3.2 Fruit samples at different ripeness states: (a) unripe banana, (b) ripe banana, (c) rotten banana, (d) unripe mango, (e) ripe mango, (f) rotten mango, (g) unripe sapodilla, (h) ripe sapodilla, (i) rotten sapodilla, (j) unripe pineapple, (k) ripe pineapple, and (l) rotten pineapple
3.3 The E-Nose sensor array: (a) sensors mounted on a breadboard, (b) PCB design. Five sensors are of the TGS 26XX series and three are of the TGS 8XX series; cermet-type variable resistors are used
3.4 The E-Nose experimental setup. Valves 1 and 2 and fans 1 and 2 control the air flow; valves 3 and 4 and fans 3 and 4 circulate the odor between the sample chamber and the measurement chamber. An E-Nose power supply powers the sensors, fans, and the myRIO
3.5 LabVIEW schematic diagram for data acquisition
3.6 Block diagram of a multilayer perceptron neural network
3.7 Generalized regression neural network block diagram. d²(l,m) is the squared Euclidean distance, σ is the spreading factor, m is the class index, and l is the data index within class m
3.8 Two-dimensional PC loadings of the fruit data to explain the relation of PC loadings to mutual information
3.9 Gaussian functions with means µ = 0.61, µ = 1.30, and µ = 1.90 and standard deviations σ = 0.04, σ = 0.22, and σ = 0.34 for green mango, ripe banana, and ripe sapodilla, respectively
3.10 A PCA of the dataset to explain the false classification problems of k-NN, SVM, LDA, and MLPNN. The dataset is a PCA scores plot of the four fruit types, each at three ripeness states. In the figure, GB is unripe banana, RB is ripe banana, RtB is rotten banana, GM is unripe mango, RM is ripe mango, RtM is rotten mango, GS is unripe sapodilla, RS is ripe sapodilla, RtS is rotten sapodilla, GP is unripe pineapple, RP is ripe pineapple, and RtP is rotten pineapple
3.11 A PCA plot with closed boundaries to explain the effect of hyperspheric closed boundaries in avoiding false classification. In the figure, GB is unripe banana, RB is ripe banana, RtB is rotten banana, GM is unripe mango, RM is ripe mango, RtM is rotten mango, GS is unripe sapodilla, RS is ripe sapodilla, RtS is rotten sapodilla, GP is unripe pineapple, RP is ripe pineapple, and RtP is rotten pineapple
4.1 Sensor responses from eight sensors for ripe sapodilla
4.2 PCA scores plot of training data of four fruit types at three ripeness states
4.3 Between-to-within variance of all the classes for each sensor
4.4 Signature patterns for three types of fruits: (a) ripe mango, (b) ripe sapodilla, and (c) ripe pineapple
4.5 Scores plot of three types of fruits
4.6 Classification of ripe mango samples, and test samples of sapodilla and pineapple, by exact GRNN and approximate GRNN at different spreading factors
4.7 Signature patterns of the means of three types of fruits at three ripeness states: (a) green banana, (b) ripe banana, (c) rotten banana, (d) green sapodilla, (e) ripe sapodilla, (f) rotten sapodilla, (g) green pineapple, (h) ripe pineapple, and (i) rotten pineapple
4.8 Box plots showing minimum and maximum values of the sensor variables for the training classes, i.e., green banana (GB), ripe banana (RB), rotten banana (RtB), green sapodilla (GS), ripe sapodilla (RS), rotten sapodilla (RtS), green pineapple (GP), ripe pineapple (RP), and rotten pineapple (RtP)
List of Tables

Tables

1.1 Total fresh fruit import value for mainland China by country, 2012–2015
3.1 Figaro gas and VOC sensors used in the E-Nose design
3.2 Experimental data collection plan for fruit odor classification
4.1 Pattern recognition error rates of test data for the between-to-within variance method
4.2 Classification error by the MLPNN and RBFNN methods. For each number of sensors, the combinations with minimum pattern recognition errors are recorded
4.3 Classification error by the k-NN and SVM methods. For each number of sensors, the combinations with minimum pattern recognition errors are recorded
4.4 Principal component loadings
4.5 Mutual information between pairs of sensors. The self-information in diagonal cells is omitted
4.6 Maximum number of pairs of classes classifiable by combinations of three sensors
4.7 Maximum number of pairs of classes classifiable by combinations of four sensors
4.8 Classification of ripe mango, ripe sapodilla, and ripe pineapple samples by different algorithms
4.9 Training and testing time taken by the algorithms to train and test with the data samples of banana, sapodilla, and pineapple, each at three ripeness states
4.10 Misclassification error and correct classification rate of the classification algorithms when testing with a test data set from the training classes, i.e., banana, sapodilla, and pineapple, each at three ripeness states
4.11 False classification performance of the algorithms with irrelevant data (i.e., mango odor data at three ripeness states)
Chapter 1
Introduction
1.1 Problem statement
The sense of smell is a fundamental sense for humans as well as animals.
Vertebrates and other organisms have olfactory receptors in their olfactory systems to
identify food, predators, and mates; these receptors provide the sensual pleasure of
good smells as well as warnings about food quality, chemical dangers, the ripeness of
fruits, etc. Humans and animals discern a smell from the sensitivity pattern generated
by the olfactory sensory neurons located in their noses, without identifying the
individual chemical components within the odor. In contrast, traditional chemical or
analytical instruments, such as chemical analysis or gas chromatography/mass
spectrometry (GC/MS), find the composition of the volatile organic compounds
(VOCs) in the odor instead of deciding the object type or quality. Human experts, on
the other hand, suffer from fatigue. As a result, the concept of an electronic nose (E-
Nose) was initiated in 1982 by Persaud and Dodd to mimic a human or animal nose
[1].
The gas sensors used in E-Nose applications are expensive and have
distinct sensitivities to wide ranges of gases and VOCs. A specific odor or VOC can
be sensed by multiple sensors with different sensitivities. This wide-range sensitivity
suggests that redundant sensors can be excluded from the sensor panel to reduce
design complexity and cost. However, finding and excluding the redundant sensors
without significant degradation of E-Nose performance is a challenge.
A proficient, fast, and simple classification algorithm that classifies the
odors of target objects with an acceptable error rate is required for E-Nose training.
When odor data with which an E-Nose is trained is correctly assigned to the expected
training class, this is called "correct classification". On the other hand, odor data of an
untrained or irrelevant class should be correctly rejected: it should not be classified
into any training class and thereby should not produce a false classification error,
called a "false alarm". In short, data from training classes must be correctly classified,
and data from irrelevant odor classes must be correctly
excluded. Incorrect classification of data that truly belongs to the training classes or
to irrelevant classes results in classification errors. These classification errors can be
categorized as follows:
i) misclassification: odor data of a training class is classified into
another training class or an irrelevant class, and
ii) false classification: odor data of an unrelated class is classified into a
known training class. A false alarm is caused whenever a false classification occurs.
Thus, along with good classification accuracy (correct classification rate),
low processing time, and simplicity, the false classification rate as well as the correct
rejection rate must be considered as essential criteria when choosing a classification
method for E-Nose applications.
Classification methods such as principal component analysis (PCA),
k-nearest neighbor (k-NN), support vector machine (SVM), multilayer perceptron
neural network (MLPNN), radial basis function neural network (RBFNN),
generalized regression neural network (GRNN), and linear discriminant analysis
(LDA) are commonly used for E-Nose data classification. Among these methods,
k-NN, SVM, MLPNN, and LDA classify data by open-ended hyperplane classification
boundaries; because of the wide sensitivity ranges of the sensors, they are susceptible
to false classification and thereby generate false alarms in the presence of irrelevant
odors. Classification methods such as RBFNN and GRNN, which use Gaussian
activation functions in their hidden layers, generate bounded hyperspheric
classification boundaries and are able to cope with the false alarm issue. However,
because of the large number of neurons required, these methods are expensive in the
presence of a large training dataset. Thus, a simple classification method is required
for E-Nose applications.
E-Nose data does not extend indefinitely into the feature space; rather, the
acquired sensor data of any odor varies within definite limits. The feature variables of
an odor class are distributed around a mean (or mean vector, in the multidimensional
case) within distinct limits. Any data that lies in the empty space around the class
boundaries and is not reasonably close to any class should not be classified into any
training class but should instead be rejected. This objective is achievable by a
classification algorithm that uses a hyperspheric
classification boundary. The effect of hyperspheric versus hyperplane classification
boundaries is shown in Figure 1.1. Hyperspheric classification boundaries (ellipsoids
in two dimensions) always form closed boundaries around classes (e.g., class
boundaries D and E with centers 'd' and 'e' in Figure 1.1(a)) and classify data to a
class only if it resides within the class boundary. In contrast, hyperplane classification
boundaries (lines in two dimensions) are unbounded and classify data to a class
regardless of its distance from class D or E (Figure 1.1(b)). Consider a data point 'f'
that is far from the class centers 'd' and 'e' in both Figure 1.1(a) and Figure 1.1(b).
The hyperspheric classification boundaries (Figure 1.1(a)) will not classify the data
point 'f' to either class D or E, while the data point 'f' will be classified to class E by
a hyperplane based classification algorithm (Figure 1.1(b)).
Figure 1.1 (a) Hyperspheric (ellipsoid in two dimensions) vs. (b) hyperplane (line in
two dimensions) classification boundary.
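The contrast between the two boundary types can be sketched with a toy two-class example. The class centers, radius, and test point below are invented solely for illustration and are not taken from the thesis data.

```python
import math

# Two training classes D and E, represented by their centers
# (hypothetical 2-D values chosen only for illustration).
centers = {"D": (1.0, 1.0), "E": (5.0, 1.0)}
RADIUS = 1.5  # hyperspheric boundary radius around each center

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def hyperplane_classify(x):
    # Open-ended boundary: always assign the point to the nearest
    # center, no matter how far away it lies.
    return min(centers, key=lambda c: dist(x, centers[c]))

def hyperspheric_classify(x):
    # Closed boundary: assign only if the point falls inside the sphere
    # of its nearest class; otherwise reject it.
    label = hyperplane_classify(x)
    return label if dist(x, centers[label]) <= RADIUS else "rejected"

f = (9.0, 8.0)  # a point far from both class centers, like 'f' above
print(hyperplane_classify(f))    # "E" -> false classification
print(hyperspheric_classify(f))  # "rejected"
```

The open-ended rule is forced to assign 'f' to some class, while the bounded rule can reject it, which is exactly the behavior needed to avoid false alarms.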
1.2 Motivation
Fruits and vegetables have always been essential to human growth and
development. Fruit supports a healthy lifestyle by providing carbohydrates, fiber, and
micronutrients that help our bodies function properly.
Fruits give more energy than sugar or sweets as they contain natural glucose and
fructose.
As food value chains become longer and more complex, quality and
safety of foods and fruits are becoming an increasingly important issue for consumers
as well as for governments. Fresh foods and fruits need to meet food safety
requirements and marketing standards. National and international buyers also use
their own quality specifications. Fresh foods and fruits are perishable products. The
quality of these products deteriorates quickly, which causes prices to fall. Table 1.1
shows that Thailand is the top exporter of fruits to China. To keep this position and to
find more opportunities in other countries, quality and standards must be maintained,
which implies that fruits should be picked and distributed at the right maturity level.
As human experts suffer from fatigue and are also expensive, an E-Nose can be used
instead for fruit type and quality identification.
Table 1.1 Total Fresh Fruit Import Value for Mainland China by Country, 2012–2015
[2].
Country 2012 2013 2014 2015
(fresh fruit imported by China, in millions of USD)
Thailand 975.6 1106.2 1022.8 1066.3
Chile 571.1 615.6 776.1 971
Vietnam 441.5 546.3 682.2 861.4
Philippines 320.3 315.3 607.0 564.8
USA 288.1 253.8 253.1 290.6
Peru 66.8 98.4 202.6 214.3
Ecuador 31.0 19.9 154.7 220.5
South Africa 64.5 88.0 156.3 140.5
New Zealand 117.5 108.5 153.9 274.9
Australia 22.5 46.8 71.2 114.5
Four popular and healthy fruits, namely banana, mango, pineapple, and
sapodilla, are chosen for this research. Due to their fast metabolism, these fruits rot
quickly once they are plucked. Identifying these fruits at the right maturity level
can benefit farmers, businessmen, and consumers.
Designing an E-Nose requires knowledge of electronic circuit design, i.e.,
designing a sensor panel with gas and/or VOC sensors, interfacing the sensor panel
with a data acquisition device, digital signal processing, and pattern classification. As
an Electronics and Communication Engineer, the author is inspired to learn deeply
and accumulate this knowledge, to design and explore the working process of an
E-Nose, and to contribute to ongoing research in this area. This helps in understanding
and finding solutions to the existing problems. Furthermore, an E-Nose is no longer
only a laboratory device; it has practical applications in analyzing food quality and
fruit ripeness, detecting hazardous chemicals or gases, etc. An E-Nose designed for
fruit ripeness detection benefits both the farmer and the consumer by identifying the
proper state of a fruit so that it can be plucked or consumed at the best time before it
rots. For mass availability, designing an E-Nose with a minimum number of sensors,
maximum correct classification, and minimum false classification is essential.
1.3 Objective
The objectives of this thesis include:
i. to design an E-Nose with gas sensors to identify the ripeness of
tropical fruits,
ii. to develop a redundant sensor exclusion method to reduce E-Nose
design complexity and cost, and
iii. to find a classification algorithm with irrelevant data rejection
capability and thereby reduce false alarms in an E-Nose.
1.4 Contribution of this thesis
In this thesis, a sensor panel has been designed with eight metal oxide gas
(MOG) sensors, which efficiently classifies four tropical fruits, namely banana, mango,
sapodilla, and pineapple, each at three ripeness states: unripe, ripe, and rotten.
Two novel methods for finding an optimal set of sensors are then proposed, which are
less complex and more efficient than existing methods. One is a principal
component loading and mutual information based approach, and the other is a
threshold based approach. An optimal set of three sensors is found by the proposed
methods; its performance is tested with the RBFNN, MLPNN, k-NN, and SVM
classification algorithms and shows minor classification errors.
The term 'false classification' has not been addressed for E-Nose
applications in the literature, except for fire alarm detection. It is shown that an
RBFNN and a GRNN with Gaussian activation functions at the hidden layer neurons
produce bounded hyperspheric classification boundaries around the training classes.
The GRNN method is found to be more efficient at reducing false classification, and
hence false alarms, by rejecting any irrelevant odor data. In addition, a simple
hyperspheric classification method based on the minimum, maximum, and mean
(MMM) values of the features of the training data is proposed in this thesis. It is
shown that this MMM algorithm is simpler, faster, and more accurate in classifying
odor data of the training classes and correctly rejects data of extraneous classes.
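A minimal sketch of such an MMM-style classifier follows, under the assumption that a sample is accepted by a class only if every feature lies within that class's [min, max] range, with ties broken by distance to the class mean. The class names and numbers are invented for illustration; the exact decision rule is the one specified in Chapter 3.

```python
import numpy as np

def fit_mmm(train):
    """train: dict mapping class name -> (n_samples, n_features) array.
    Stores only the per-feature minimum, maximum, and mean per class."""
    return {c: (X.min(axis=0), X.max(axis=0), X.mean(axis=0))
            for c, X in train.items()}

def classify_mmm(model, x):
    # A sample qualifies for a class only if every feature lies inside
    # that class's [min, max] box; among qualifying classes, pick the
    # one with the nearest mean. No qualifying class means rejection.
    candidates = {c: float(np.linalg.norm(x - mean))
                  for c, (lo, hi, mean) in model.items()
                  if np.all(x >= lo) and np.all(x <= hi)}
    return min(candidates, key=candidates.get) if candidates else "rejected"

train = {"ripe": np.array([[2.0, 3.0], [2.2, 3.4], [1.9, 3.1]]),
         "rotten": np.array([[5.0, 6.0], [5.3, 6.2], [4.8, 5.9]])}
model = fit_mmm(train)
print(classify_mmm(model, np.array([2.1, 3.2])))  # "ripe"
print(classify_mmm(model, np.array([9.0, 9.0])))  # "rejected"
```

Because each class keeps only three vectors regardless of training set size, training and testing stay cheap, which matches the simplicity argument made above.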
In summary, this thesis makes the following contributions:
i. the design of a sensor panel that classifies four fruits, each at three
ripeness states,
ii. two novel methods for sensor panel minimization,
iii. a demonstration that hyperspheric boundary based classification
methods are better suited to E-Nose data classification, and
iv. a novel hyperspheric classification method for E-Nose data
classification based on the minimum, maximum, and mean of the
features of each class.
Chapter 2
Background Study and Literature Review
2.1 Human olfactory system
Humans and animals sense smell through the sensory system built into
their noses, called the olfactory system. The olfactory system has two distinct parts:
the main olfactory system, used for detecting volatile, airborne substances, and
the accessory olfactory system, which senses fluid-phase stimuli.
The mechanism of the olfactory system can be divided into
a peripheral part, in which an external stimulus is sensed and encoded as an
electrical signal in neurons, and
a central part, in which all signals are assimilated in the central nervous system.
Figure 2.1 Position of a human olfactory system. [3]
Figure 2.2 A human olfactory system and its parts: 1: Olfactory bulb, 2: Mitral cells,
3: Bone, 4: Nasal Epithelium, 5: Glomerulus, 6: Olfactory receptor cells. [4]
Figure 2.3 Detailed view of a human olfactory system. [3]
The position of the human olfactory system is shown in Figures 2.1 and 2.2, and
a detailed view is shown in Figures 2.2 and 2.3. The main
olfactory system of humans detects odorants, i.e., VOCs that are inhaled through
the nose from the atmosphere. Objects emitting VOCs into the air can be recognized
by humans and animals from past odour experiences recorded in their
memory. The VOCs reach the olfactory sensory cells at the olfactory epithelium,
located in the innermost part of the nose. Detection is performed within a small
area of about 2.5 square centimeters. VOC reception and sensory transduction
start at the mucous-cilia layer. The mucous layer is approximately 60 microns thick.
Between 8 and 20 whip-like cilia, 30–200 microns long, are connected to each main
olfactory receptor neuron (also called an olfactory receptor cell) in the olfactory
epithelium. The olfactory epithelium contains approximately 50 million sensory
receptor cells. Olfactory receptor neurons transduce the VOC sensation at the cilia
into electrical signals. The olfactory receptor cells form axons within the epithelium;
10–100 axons are bundled into groups, penetrate the cribriform bone, and converge
and terminate to form synaptic structures called glomeruli, which are located in the
olfactory bulb. Identical signals from different olfactory receptors combine at the
glomeruli to which they are connected and travel along the olfactory nerve, which
terminates in the olfactory bulb and belongs to the central nervous system. A new
odor and its concentration are classified by a complex set of olfactory receptor
neurons. The central nervous system views the odors as distinct neural activity
patterns. The data remain in memory as synaptic weights for future identification
of similar VOCs and identification of an object [1].
2.2 Human Nose versus E-Nose
An E-Nose is a measurement unit that attempts to imitate the human
olfactory system to identify odors or flavors. An E-Nose generates complex
multidimensional data for each measurement, and the data is interpreted and mapped
to a target value or class label by a pattern recognition technique. The stages of the
recognition process of an E-Nose are similar to those of human olfaction. The
olfaction process is performed to identify, compare, and quantify the type or quality
of odor-emitting objects, among other applications. A comparison of human olfaction
to that of an E-Nose is shown in Figure 2.4.
As seen in Figure 2.4, the olfaction process for both an E-Nose and a
human nose begins from an odor source. Similar to a human olfactory receptor, an
E-Nose uses a sensor array and an acquisition device to generate and collect signals
in response to
an odor type. A signal processing algorithm preprocesses the signal into a suitable
format as required by the classification algorithm. An implemented classification
algorithm, such as k-NN, SVM, MLPNN, RBFNN, GRNN, or LDA, to name just a
few, compares the presently sensed data to the previously stored data and provides a
decision on an odor or its source type.
Figure 2.4 Comparison of a human olfaction system to an E-Nose olfaction system.
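The stages just described can be mimicked in a toy script: normalize a raw sensor reading as a stand-in for the preprocessing stage, then compare it with stored signature patterns using a nearest-neighbour rule standing in for the classifier. All values and class names are invented for illustration only.

```python
import math

# Previously acquired, normalized signature patterns standing in for
# the "previously stored data" of a trained E-Nose.
stored = {
    "ripe banana": [0.8, 0.2, 0.5],
    "ripe mango":  [0.3, 0.9, 0.4],
}

def preprocess(raw):
    # Simple peak normalization as a stand-in for signal preprocessing.
    peak = max(raw)
    return [v / peak for v in raw]

def classify(raw):
    # Nearest stored pattern plays the role of the classification stage.
    x = preprocess(raw)
    return min(stored, key=lambda c: math.dist(x, stored[c]))

print(classify([4.1, 1.0, 2.4]))  # "ripe banana"
```

Real E-Nose classifiers replace the nearest-pattern rule with the algorithms listed above, but the pipeline shape (acquire, preprocess, compare to stored data, decide) is the same.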
2.3 Literature review
In this section, different applications of E-Noses and classification
methods in the literature are presented first; then, existing sensor array
minimization techniques and false alarm reduction methods are discussed.
Beverages such as orange juice, mango juice, and blackcurrant juice,
along with spoiled and fresh milk, pasteurized milk, and milk processed at ultrahigh
temperature, were classified by PCA and MLPNN using an E-Nose in [5]. Biodiesels
such as babassu, beef tallow, palm, and chicken grease were classified in [6], also by
PCA and MLPNN. MLPNN was used in [7] to classify eastern and north-eastern
Indian black tea, in [8] for fruit (peach, pear, and apple) ripeness (green, ripe, and
overripe) determination, and in [9] for identification of ammonia in waste water. In
[10], three separate experiments were performed to classify four types of tea; five
types of coffee; and water, ethanol, triethylamine, and methyl salicylate using
MLPNN. MLPNN was also applied in [11] and [12] for classification. In [11] it was
used to classify fragrant herb species such as lemongrass (cymbopogon citratus), curry
(murraya koenigii), pandan leaves (pandanus amaryllifolius), kaffir lime/limau purut
(citrus hystrix), and golden lime/limau Kasturi (citrus microcarpa) and in [12] to
classify coffee, tea, and cocoa. In [13] three concentration levels of gasoline, benzene,
xylene, ethyl benzene, and toluene were classified by PCA and GRNN. Twenty diesel
fuels were classified by LDA and PCA in [14]. PCA, probabilistic neural networks
(PNN) were applied to classify and monitor the quality of diesel and gasoline fuel in
[15]. A review of the classification of foods and beverages, such as grains, fish,
alcoholic drinks, fruits, meat, non-alcoholic drinks, milk and dairy products, fresh
vegetables, eggs, olive oils, and nuts, was presented in [16], where partial least squares
(PLS), cluster analysis (CA), PCA, and MLPNN were used as classification methods.
In [17] and [18], correlation based analyses of E-Nose data were performed for beef
freshness and beverage (blackcurrant juice, orange juice, and soy milk)
identification, respectively. An E-Nose used a k-NN method to classify lemon,
banana, and litchi fragrances in [19]. Five different tea samples of different qualities
were classified by RBFNN, fuzzy C means, and self-organizing map (SOM) in [20].
Another E-Nose designed with MOG sensors used PCA to analyse the quality of soft
drink, olive oil, and tomato pulp [21]. In [22] ripening shelf-life of peaches,
nectarines, apples, and pears were monitored by an E-Nose designed with MOG
sensors. It was shown by PCA analysis that fruit ripeness assessment by E-Nose is
reliable. A MOG sensor based E-Nose was proposed in [23] where MLPNN was
applied for classification and detection of apple ripeness. A machine vision system,
a near-infrared spectrophotometer, and an E-Nose system were combined in
[24], and few misclassifications were observed. Here the E-Nose was used
to identify the rotting stage of apples by MLPNN. In [25], the concentration of
formaldehyde was estimated by a chaotic sequence optimized MLPNN. Basal Stem
Rot disease of oil palm was identified with application of ANNs such as: MLPNN,
PNN, and RBFNN in [26]. In [27], an E-Nose with five semiconductor thin-film
sensors was designed; the quality of two groups of seven coffees was analysed by
training an MLPNN. Before feeding the data to the MLPNN, the dimensionality of the data was
reduced by PCA. MOG sensors were used to develop an E-Nose to recognize
benzene, toluene, ethyl benzene, xylene, and gasoline based on PCA and ANN [13].
In [28] dimensionality was reduced by PCA and then a neural network was used to
discriminate different types of coffee, aroma oils, perfumes and alcohol. Six clonal
varieties of orthodox finished tea were classified by PCA and MLPNN [29]. PCA was
applied to visualize data in a lower dimension, and PNN was applied for classification in
Data collected from a MOG sensor based E-Nose were analyzed by MLPNN and
SVM to classify shelf-life stages of banana [31]. An SVM was also applied in [32] to
recognize ethanol, gasoline and acetone with good accuracy and in [33] to identify
spoiled beef, and fish. Eleven different types of Spanish olive oils were classified by
PCA followed by an LDA projection with 79% accuracy in [34]. A MOG sensor based
E-Nose was designed in [35] with wireless connectivity where k-NN, MLPNN, SVM
with linear and RBF kernel, PCA, and LDA algorithms were used for classification.
Coffee roasting degree was analysed in [36] by applying two-dimensional PCA scores
to MLPNN, and GRNN. In [37] a portable E-Nose was used to collect sensing data
and pattern recognition algorithms such as LDA, MLPNN, PNN, and GRNN were used to
identify adulteration of sesame oil. Applications of an E-Nose in determining fruit
quality and ripening have been studied in the literature. PCA and MLPNN were used
in [38] to study the quality and ripening of peach, pear, and apple. Tomato aroma
profile was studied with the aid of PCA and an E-Nose in [39]. Maturity and storage
shelf life of tomato were studied by a portable E-Nose named 'PEN 2' in [40] and
[41], respectively, where data analysis and classification were done by PCA and
LDA. Analysis of the ripening process of pinklady apples during shelf life was done
in [42] by PCA and Fuzzy adaptive resonance theory map. The partial least square
method was applied to predict apple harvesting data in [43]. PCA was applied to
analyse the shelf life of apples from GS-MS and E-Nose data in [44]. The non-
destructive quality of “Fuji” apples was analysed by fusion of three different sensors,
namely: a NIR, a machine vision system, and an E-Nose based on MLPNN in [45].
To monitor mandarin picking, a PEN2 E-Nose was used in [46], where data analysis
was done by PCA and LDA. Four peach cultivars were classified and peach ripening
stages were assessed by a commercial E-Nose named PEN2, based on PCA and LDA
in [47]. The quality of post-harvest oranges and apples was analysed by thickness
shear mode quartz resonators, based on PCA and PLS in [48]. An embedded E-Nose
was designed in [49], which showed good response to onions and oranges. An ANN,
trained with back-propagation and an artificial bee colony algorithm, was applied to
classify strawberry, lemon, cherry, and melon [50].
2.3.1 Sensor array minimization techniques
Reducing E-Nose manufacturing cost and computational complexity is
important for mass production; this can be achieved by excluding sensors carrying
insignificant information from the E-Nose sensor panel. Various methods have been
proposed in the literature.
A sub-array based sensor optimization technique was presented in [51],
where, at first, sensor sub-arrays were formed by combining sensors with similar gas
discriminating capability and later, sensors from the sub-arrays were chosen as per
classification capability. Another method named combination optimization method
was used in [52] to discriminate different grades of Longjing tea. This method is
based on hypothesis testing by analysis of variance (ANOVA) at the beginning and
later principal component (PC) loading analysis is utilized to reduce the number of
sensors. The PC loading value method was also used in [53] to exclude redundant
sensors, as in [52]. It was also shown that ANOVA and the Tukey multiple comparison
suggest similar choices of sensors. However, the Tukey multiple comparison is mainly
based on the distance between the means of each sensor's data, with all the classes
considered as one group; each sensor's capability to separate the classes is not taken
into account. In addition, [52] and [53] did not show how to apply this method in cases
where higher-dimensional PCs might contain significant data variance.
A genetic algorithm (GA) was used in [54] to optimize an E-Nose sensor
array and determine the tea quality. In the GA, an individual gene represents a
sensor. Genes are combined to construct chromosomes, and a number of such
chromosomes (combinations of sensors) form a population. The GA technique aims to
reduce the number of sensors (genes) in the combination, which limits the length of the
constructed chromosome. A fixed chromosome length overlooks longer or shorter
chromosomes, which can lead to a suboptimal set of sensors. To overcome this
problem, one way is to perform an exhaustive search over all combinations of
sensors and apply a classification algorithm to find the classification errors. The
optimal sensor set is then the one that meets the acceptable pattern recognition errors
(set by the designer or application) with the minimum number of sensors. In addition,
the global optimum cannot be achieved every time with GA, especially when the
overall solution space has various populations; an uncertain number of trials is
required to reach the optimum [55]. The ratio of between-class variance to
within-class variance [56] of individual sensors over all classes was applied as a
method to choose the optimal set of sensors for an E-Nose identifying ethanol,
2-propanol, acetone, and ammonia [57]. With this method the optimal set of sensors
was chosen according to the higher ratio of inter-class to within-class variation of
each sensor. The performance of the optimal sensor set was verified by a three-layer
MLPNN. This method did not consider the similarity or dissimilarity, i.e., the mutual
information, of the selected sensors, which might mislead the procedure into choosing
alike sensors.
2.3.2 False alarm reduction techniques
To date, few studies have addressed "false classification" or "false
alarm" for E-Nose applications. The issue was addressed for fire detection in [58-60].
A threshold-based approach was used by the authors in [61] to detect false alarms in
concentration estimation and chemical agent detection. The false alarm reduction
performance of an E-Nose in the presence of irrelevant gases or VOCs has not been
considered. A false alarm is likely to originate when an E-Nose comes in contact with
irrelevant gases or VOCs, because the sensors respond to wide ranges of gases and VOCs.
Classification algorithms such as PCA [39, 41, 42, 44, 46-48, 62-66], k-
NN [19, 65-67], SVM [67-72], GRNN [13], RBFNN [73-77], MLPNN [63-66, 75-
79], and LDA [34, 35, 37, 41, 46, 47], to name a few, are commonly applied for
odor classification by different E-Noses. SVM, LDA, and MLPNN separate the
classes by unbounded classification boundaries, i.e., lines in two-dimensional space,
planes in three-dimensional space, and hyperplanes in higher-dimensional spaces. The
classification boundaries generated by lines, planes, or hyperplanes are similar to
Voronoi diagrams, where the outer classes have open boundaries in multiclass
problems. For two- or three-class problems the boundaries are always open. In this
thesis it is shown that classification algorithms with open-ended classification boundaries
are not suitable for reducing false alarms. The classification boundaries produced by
the Gaussian activation functions of RBFNN and GRNN are hyperspheric. A Gaussian
activation function produces a boundary around the training classes by de-emphasizing
data far from the mean and emphasizing data near it. Thus the classification boundaries
produced by RBFNN and GRNN are bounded in two, three, or higher dimensions; for
more than three dimensions they produce hyperspheric classification boundaries.
RBFNN and GRNN training is fast, but their design complexity is high, as the required
number of neurons is of the order of the number of training data samples. A less
complex hyperspheric classification algorithm is proposed in [80]. This method
defines a hypersphere in an n-dimensional space with center c = (c_1, c_2, \ldots, c_n)
and radius R_n as

\sum_{i=1}^{n} (x_i - c_i)^2 = R_n^2, (2.1)

and an unknown n-dimensional data point x = (x_1, x_2, \ldots, x_n) is then classified
as follows. If

\sum_{i=1}^{n} (x_i - c_i)^2 \le R_n^2, (2.2)

class 1 is decided; otherwise, class 2. Thus, this method is not suitable for classifying
more than two classes. An inversion algorithm for MLPNN shown in [81] generates a
closed boundary around the training data and can classify two classes of data, where
one class lies inside the boundary and the other outside it. A Gaussian kernel based
hyperspheric decision boundary is applied for one-class classification by SVM
[82]. We observe that the classification methods in [80-82] are suitable for one-class
classification, novelty testing, or outlier detection, and are not fitted to classify more
than two classes.
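The hypersphere membership test of Eqs. 2.1 and 2.2 can be sketched in a few lines of Python. This is a minimal illustration only; the center, radius, and sample points below are made up.

```python
# Two-class decision by a single hypersphere (Eqs. 2.1-2.2):
# class 1 if sum_i (x_i - c_i)^2 <= R^2, else class 2.

def hypersphere_class(x, c, R):
    """Return 1 if x lies inside or on the hypersphere (center c, radius R), else 2."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return 1 if dist_sq <= R ** 2 else 2

# Hypothetical 3-D example: center at the origin, radius 2.
print(hypersphere_class((1.0, 1.0, 1.0), (0.0, 0.0, 0.0), 2.0))  # inside -> 1
print(hypersphere_class((3.0, 0.0, 0.0), (0.0, 0.0, 0.0), 2.0))  # outside -> 2
```

Everything outside the sphere falls into "class 2", which is exactly why such a boundary suits novelty or outlier detection rather than multi-class problems.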
In the next chapter the design process of the E-Nose developed for this
thesis is explained. The existing popular pattern recognition techniques and sensor
array minimization methods are also presented, in contrast to the proposed sensor
array minimization and false alarm reduction techniques.
Chapter 3
System Design and Methodology
In this chapter the E-Nose design procedure, sample collection, sensor
array design, data acquisition system, experimental setup, existing classification
methods and sensor array minimization techniques, and the proposed methods for
sensor array minimization and hyperspheric classification to reduce E-Nose false
alarms are presented.
3.1 E-Nose design process
Figure 3.1 An E-Nose design process.
The E-Nose design process is presented in Figure 3.1. Odour samples are
prepared and kept in the sample chamber. The odour source can be foods, fruits,
chemicals, explosives, etc.; this research is performed on fruit odours to classify fruits
and their ripeness states. The odour samples are allowed to enter from the sample
chamber into a measurement chamber that contains a sensor array and a data
acquisition device powered by an external source. The sensor array comprises
sensors with wide and dissimilar selectivities. Odour VOCs are adsorbed at the sensor
surface and cause a physical change in the sensor behaviour (resistivity, for gas
sensors). In response to an odour, the current flowing through the gas sensors, as well
as the voltage developed across the sensor load resistors, changes. The responses are
acquired by the data acquisition device, which transforms the signals into digital
values and produces signature patterns for different kinds of objects. The data are
collected from the acquisition device and recorded by a Wi-Fi-connected computer.
The recorded data are then pre-processed by statistical methods: the sensed signature
patterns are converted to a suitable format by reducing the dimensionality of the
measurement space and extracting information relevant for pattern recognition, e.g.,
by PCA or LDA, so that the data can be applied to pattern recognition and
classification algorithms. For classification, k-NN, MLPNN, RBFNN, GRNN, SVM,
PCA, and LDA are applied in this thesis. In the decision-making stage, decisions such
as odour class, concentration, or unidentified are mapped to the results of the pattern
recognition and classification stage.
3.2 Sample fruit collection
Sapodilla, pineapple, banana, and mango, shown in Figure 3.2, are selected
for this research because their ripening process is fast and they rot soon, which may
cause losses to businesses or customers. Four sapodillas, one pineapple, two bananas,
and a mango are kept in turn in the sample chamber during the experiment with each
type of fruit. Throughout the experiment the fruits are preserved at 28 °C. Separate
impermeable boxes are used to store the fruits to prevent their odours from mixing
and thereby inducing noise in each other's measurements.
3.3 Sensor Array
For this research eight MOG sensors which have a wide range of
sensitivity to a variety of gases and VOCs are purchased. The sensors are TGS2612
(S1), TGS821 (S2), TGS822 (S3), TGS813 (S4), TGS2602 (S5), TGS2603 (S6),
TGS2620 (S7) and TGS2610 (S8) as listed in Table 3.1. S1 to S8 are indices assigned
to the sensors. A sensor panel for the E-Nose is designed on a breadboard as shown in
Figure 3.3(a) using the sensors, and later the sensor panel is implemented on a PCB as
shown in Figure 3.3(b). The sensor panel is sensitive to methyl mercaptan, trimethyl
amine, hexane, acetone, ammonia, benzene, carbon monoxide, hydrogen sulphide,
hydrogen, butane, acetylene, ethylene, and propane.
The MOG sensors used for this research are conductivity sensors. The basic
construction material of these sensors is tin dioxide (SnO2). The resistance of these
sensors is high in the presence of free air or oxygen. In contact with target VOCs or
gases, the resistivity of the sensors decreases, which increases the sensor current as
well as the load current; as a result, the voltage across the load resistor increases in
proportion to the VOC concentration level. These voltage signals across the load
resistors of the sensors are recorded. The circuit voltage and heater voltage of the
sensors are set to 5 V dc throughout the experiment.
Figure 3.2 Fruit samples at different ripeness states: (a) unripe banana, (b) ripe
banana, (c) rotten banana, (d) unripe mango, (e) ripe mango, (f) rotten mango, (g)
unripe sapodilla, (h) ripe sapodilla, (i) rotten sapodilla, (j) unripe pineapple, (k) ripe
pineapple, and (l) rotten pineapple.
(a)
(b)
Figure 3.3 The E-Nose sensor array. (a) Sensors mounted on a breadboard, (b) PCB
design. 5 Sensors are of TGS 26XX series and 3 sensors are of TGS 8XX series.
Cermet type variable resistors are used. [83]
Table 3.1. Figaro gas sensors used in the E-Nose design and the gases or VOCs to which the sensors are sensitive.
[83]
Sensor
model
Sensor
Index
Gases and VOCs
CH2 C2H2 C3H8 C4H10 H2 H2S CO C6H6 NH3 (CH3)2CO C6H14
Trimethyl
amine and
Methyl
mercaptan
TGS 2612 S1
TGS 821 S2
TGS 822 S3
TGS 813 S4
TGS 2602 S5
TGS 2603 S6
TGS 2620 S7
TGS 2610 S8
3.4 Experimental setup
The block diagram of the experimental setup is shown in Figure 3.4. The
setup comprises a sample chamber, a measurement chamber, and a data acquisition
system. Concentration of the odour VOCs and measurement of the corresponding
electronic signals at the sensor load resistors are the two phases of the experiment.
The sample chamber and the measurement chamber are connected by two 1-inch
transparent plastic tubes via control valves 3 and 4. Fans 3 and 4 circulate the odour
headspace between the sample chamber and the measurement chamber during the
measurement phase. To prevent measurement contamination, both chambers are kept
airtight during the experiment. After each experiment, valves 1 and 2 and dc fans 1
and 2 are used to circulate free air through the measurement chamber so that the
sensors return to their base-level response. The sensors, fans, and acquisition device
are powered by a dc voltage source. The overall system is in a temperature-controlled
laboratory at 28 °C.
Figure 3.4 The E-Nose experimental setup. Valves 1 and 2, and fans 1 and 2 are for
air flow control, valves 3 and 4, and fans 3 and 4 are for circulating the odor between
sample chamber and measurement chamber. A dc power supply powers the sensors,
fans, and the myRIO. [83]
3.5 Data acquisition system
Data acquisition is performed by a myRIO data acquisition device
operated wirelessly from a computer with LabVIEW installed. The LabVIEW
schematic diagram for data acquisition is shown in Figure 3.5. The myRIO is powered
by a 10 V dc power supply. At each ripeness state of each fruit type, the experiments
are repeated 20 times to produce 20 data samples per class, as shown in Table 3.2. A
two-digit class code is assigned to each class, i.e., each fruit at each ripeness state: the
left digit defines the fruit type and the right digit indicates the ripeness level. For
example, in the class code 13, 1 stands for banana and 3 stands for rotten. Before
experimenting with a fruit, the sensors are preheated for five minutes to achieve
proper sensing behaviour. The odour headspace of the sample chamber containing the
fruit sample under test is sampled in the following sequence: i) a sample
measurement, which typically takes two minutes to complete; ii) to remove any
residual odour and return the sensors to their base level, free air is pumped into the
measurement chamber using control valves 1 and 2 on its right side for three minutes;
during these three minutes the headspace accumulates for the next experiment cycle.
The signals corresponding to the odour concentration have a transient shape at the
beginning; it is observed that after 30 to 60 seconds (different for each fruit at each
ripeness state) they start to become steady. The steepness of the transient and the
value of the steady-state response differ between fruits and their ripeness states.
Table 3.2. Experimental data collection plan for fruit odor classification.
Fruit Class label Ripeness state Number of
experiments
Banana
11 Unripe 20
12 Ripe 20
13 Rotten 20
Mango
21 Unripe 20
22 Ripe 20
23 Rotten 20
Sapodilla
31 Unripe 20
32 Ripe 20
33 Rotten 20
Pineapple
41 Unripe 20
42 Ripe 20
43 Rotten 20
Figure 3.5 LabVIEW schematic diagram for data acquisition.
3.6 Pattern Recognition and Classification Algorithms
The SVM, PCA, k-NN, GRNN, MLPNN, RBFNN, LDA, and the
proposed MMM classification methods are presented in this section. X is the data
matrix and t is the corresponding target vector. The elements of t are class labels as
given in Table 3.2. X and t are expressed as in Eq. 3.1 and Eq. 3.2,
\mathbf{X} = \begin{bmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_m \\ \vdots \\ \mathbf{X}_M \end{bmatrix} \quad \text{and} \quad \mathbf{t} = \begin{bmatrix} \mathbf{t}_1 \\ \mathbf{t}_2 \\ \vdots \\ \mathbf{t}_m \\ \vdots \\ \mathbf{t}_M \end{bmatrix}, (3.1)

where

\mathbf{X}_m = \begin{bmatrix} X_{1,m,1} & \cdots & X_{1,m,n} & \cdots & X_{1,m,N} \\ X_{2,m,1} & \cdots & X_{2,m,n} & \cdots & X_{2,m,N} \\ \vdots & & \vdots & & \vdots \\ X_{l,m,1} & \cdots & X_{l,m,n} & \cdots & X_{l,m,N} \\ \vdots & & \vdots & & \vdots \\ X_{L,m,1} & \cdots & X_{L,m,n} & \cdots & X_{L,m,N} \end{bmatrix} \quad \text{and} \quad \mathbf{t}_m = \begin{bmatrix} t_{1,m} \\ t_{2,m} \\ \vdots \\ t_{l,m} \\ \vdots \\ t_{L,m} \end{bmatrix}, (3.2)
where l = 1, 2, \ldots, L is the experiment index within class m; m = 1, 2, \ldots, M, with M
the number of classes; L is the number of data samples in each class; and n = 1, 2, \ldots, N
is the sensor index, with N the number of sensors. The rows of X are experimental
samples and the columns are feature (sensor) variables. The training, validation, and
testing data are chosen from X according to the ratio defined by the designer. In this
research, 70% of the samples, i.e., 0.7L from each class, are used to train the
classification algorithms.
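The per-class 70/15/15 split can be sketched as follows. This is an illustrative helper only; the function name and the fixed seed are my own choices, not from the thesis.

```python
import random

def split_indices(L, train=0.7, val=0.15, seed=0):
    """Shuffle the L sample indices of one class and split them into
    training, validation, and test index lists (70/15/15 by default)."""
    idx = list(range(L))
    random.Random(seed).shuffle(idx)   # shuffle within the class
    n_tr, n_val = int(train * L), int(val * L)
    return idx[:n_tr], idx[n_tr:n_tr + n_val], idx[n_tr + n_val:]

tr, va, te = split_indices(20)    # L = 20 samples per class (Table 3.2)
print(len(tr), len(va), len(te))  # -> 14 3 3
```

Splitting per class, as here, keeps every class equally represented in the training, validation, and test sets.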
3.6.1 Principal Component Analysis (PCA)
PCA [41, 62-66] is an unsupervised method that reduces dimensionality
while preserving as much data variance as possible. An orthogonal transformation
converts the data to a new feature space. PC1 captures the maximum variance of the
data in the original space; PC2 captures the second-largest variance along a direction
orthogonal to PC1; PC3 captures the third-largest variance along a direction
orthogonal to both PC1 and PC2; and so on for the higher-order PCs. The PCA
algorithm is given as follows:
Step 1. Evaluate the covariance matrix of X.
Step 2. Calculate the eigenvectors and eigenvalues of the covariance matrix.
Step 3. Eigenvectors are to be sorted in descending order of eigenvalues.
Step 4. PC1 is the eigenvector with maximum eigenvalue, and so on for the other
PCs according to descending eigenvalues.
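The four steps above can be sketched with NumPy. This is a minimal illustration on a made-up two-sensor data matrix; the data are mean-centered before the covariance is taken.

```python
import numpy as np

def pca_components(X):
    """Return the eigenvectors (as columns) of the covariance of X, sorted
    by descending eigenvalue, i.e. PC1 first (Steps 1-4)."""
    Xc = X - X.mean(axis=0)              # mean-center each sensor variable
    cov = np.cov(Xc, rowvar=False)       # Step 1: covariance matrix
    vals, vecs = np.linalg.eigh(cov)     # Step 2: eigenvalues/eigenvectors
    order = np.argsort(vals)[::-1]       # Step 3: sort by descending eigenvalue
    return vecs[:, order], vals[order]   # Step 4: PC1 is the first column

# Toy data: two correlated "sensor" variables; most variance lies along x1 = x2.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t + 0.05 * rng.normal(size=200),
                     t + 0.05 * rng.normal(size=200)])
pcs, eigvals = pca_components(X)
# PC1 should point roughly along (1, 1)/sqrt(2), up to sign.
```

Projecting the data onto the first few columns of `pcs` gives the low-dimensional PCA scores used later for visualization and classification.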
3.6.2 k-Nearest Neighbor (k-NN)
The k-NN [19, 65-69] is one of the simplest machine learning techniques. During
training phase feature vectors and class labels corresponding to training data are
loaded into memory from data matrix X and the target vector t, respectively as in Eq.
3.1. The Euclidean distance metric is applied to find the nearest k training samples
from the test data vector. A test sample is assigned to the winning class by majority
voting among its k nearest training samples. k is chosen to be an odd number that is
not a multiple of the number of classes, to avoid ties. The order of
computational complexity of k-NN is given by O(LM(N+k)).
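A minimal k-NN classifier matching this description might look as follows. The toy 2-D vectors and labels are invented for illustration; real feature vectors would come from the sensor array.

```python
from collections import Counter
import math

def knn_classify(train_X, train_t, u, k=3):
    """Assign test vector u the majority label among its k nearest
    training samples, using the Euclidean distance."""
    dists = sorted(
        (math.dist(x, u), label) for x, label in zip(train_X, train_t)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D data: two well-separated classes, labels in the style of Table 3.2.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (2.0, 2.1), (2.1, 2.0), (1.9, 2.2)]
train_t = [11, 11, 11, 12, 12, 12]
print(knn_classify(train_X, train_t, (0.15, 0.1), k=3))  # -> 11
```

With k = 3 and two classes no tie can occur, consistent with the rule of picking an odd k that is not a multiple of the number of classes.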
3.6.3 Support Vector Machine (SVM)
For SVM [65-72], classification hyperplanes are found such that they do
not include interior data points and provide the maximum margin between pairs of
classes. Support vectors are the data points which lie on the margin hyperplanes. The
maximum-margin hyperplane lies between two such hyperplanes so as to maximize
the distances to the support vectors of the related classes. The SVM algorithm is given
as follows:
Step 1. The Lagrangian dual in Eq. 3.3 is maximized subject to
\sum_{l,m} \alpha_{l,m} t_{l,m} = 0 and \alpha_{l,m} \ge 0:

G(\alpha) = \sum_{l,m} \alpha_{l,m} - \frac{1}{2} \sum_{l,m} \sum_{g,j} \alpha_{l,m} \alpha_{g,j} t_{l,m} t_{g,j} \mathbf{x}_{l,m}^T \mathbf{x}_{g,j}, (3.3)

where g and l are experiment indices within a class, j and m are class indices,
\alpha_{l,m} are the Lagrange multipliers, t_{l,m} are the class labels (+1 or -1), and
\mathbf{x} are the data vectors.
Step 2. Calculate the values of \alpha_{l,m} from Eq. 3.3.
Step 3. Find the weight vector \mathbf{w} using \alpha_{l,m} and Eq. 3.4:

\mathbf{w} = \sum_{l,m} \alpha_{l,m} t_{l,m} \mathbf{x}_{l,m}. (3.4)

Step 4. The bias b is calculated from the Karush–Kuhn–Tucker condition [84] in Eq.
3.5, using the weights found in Step 3:

\alpha_{l,m} \left[ t_{l,m} ( \mathbf{w}^T \mathbf{x}_{l,m} + b ) - 1 \right] = 0. (3.5)

Step 5. Test data are classified according to the sign of Eq. 3.6:

f(\mathbf{u}) = \sum_{s=1}^{S} \alpha_s t_s \mathbf{x}_s^T \mathbf{u} + b, (3.6)

where s, S, b, and \mathbf{u} are the support vector index, the number of support
vectors, the bias, and the data vector under test, respectively.
The basic SVM is a binary classifier. A multiclass SVM classifier is a
combination of multiple binary SVM classifiers; classification is performed by
majority voting among the binary SVM classifiers. Depending on whether the margin
violations are small or large, SVM training complexity varies between O(NL^2M^2)
and O(NL^3M^3).
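The pairwise voting described above can be sketched as follows. For brevity the binary decision sign(f(u)) of Eq. 3.6 is replaced here by a simple nearest-mean discriminant, so this shows only the multiclass voting skeleton, not a full SVM; all names and the toy class means are my own.

```python
from collections import Counter
from itertools import combinations

def binary_linear_decide(u, mean_a, mean_b):
    """Stand-in for the binary SVM decision sign(f(u)): pick the closer class mean."""
    da = sum((ui - mi) ** 2 for ui, mi in zip(u, mean_a))
    db = sum((ui - mi) ** 2 for ui, mi in zip(u, mean_b))
    return 'a' if da <= db else 'b'

def multiclass_vote(u, class_means):
    """Majority vote over all M(M-1)/2 pairwise binary classifiers."""
    votes = Counter()
    for ca, cb in combinations(sorted(class_means), 2):
        side = binary_linear_decide(u, class_means[ca], class_means[cb])
        votes[ca if side == 'a' else cb] += 1
    return votes.most_common(1)[0][0]

# Toy example with three classes, labels in the style of Table 3.2.
means = {11: (0.0, 0.0), 12: (2.0, 2.0), 13: (4.0, 0.0)}
print(multiclass_vote((1.8, 1.9), means))  # -> 12
```

Replacing `binary_linear_decide` with a trained binary SVM decision function yields the one-against-one multiclass SVM used in this thesis.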
3.6.4 Multilayer Perceptron Neural Network (MLPNN)
The MLPNN [63-66, 75-79] is a supervised feed-forward, error back-
propagation neural network. Figure 3.6 shows a three-layer MLPNN, in which each
layer is fully connected to the next layer as a directed graph. After every epoch the
mean squared error (MSE) between the outputs and the targets is calculated, and the
synaptic weights and activation thresholds are adjusted to reduce the MSE in the next
cycle. The input, hidden, and output layers have N, J, and D neurons, respectively,
where N corresponds to the eight sensors. The hidden layer neurons use a sigmoid
activation function and the output layer uses a linear activation function.

Figure 3.6 Block diagram of a multilayer perceptron neural network.

Assumptions: w^1_{ij} are the weights from node i of layer 1 (the input layer) to node
j of layer 2 (the hidden layer); w^2_{jd} are the weights from node j of layer 2 to node
d of layer 3 (the output layer); the hidden layer neurons' activation functions are
tan-sigmoid transfer functions; y^2_j is the output of node j of layer 2; y^3_{l,m} is
the output of the output layer; t_{l,m} is the corresponding target; \eta is the learning
rate; and the bias to every neuron in layer 2 is taken to be 1 (not shown in the
algorithm or in Figure 3.6, for simplicity).

The MLPNN training algorithm is given below.
Step 1. Initialize the weights to small negative and positive random values.
Step 2. Apply the training data to the network to obtain the network outputs, and
calculate the errors as shown in Steps 3 to 6.
Step 3. Compute the back-propagation error terms for the links from the hidden
neurons to the output neurons as in Eq. 3.7:

\delta^2_{l,m} = y^3_{l,m} ( 1 - y^3_{l,m} ) ( y^3_{l,m} - t_{l,m} ). (3.7)

Step 4. The back-propagation error term of each hidden node is calculated by Eq. 3.8:

\delta^1_j = y^2_j ( 1 - y^2_j ) \sum_{d=1}^{D} \delta^2_{l,m,d} w^2_{jd}, (3.8)

where \delta^2_{l,m,d} is the error term of Eq. 3.7 at output node d.
Step 5. The synaptic weights from a node in layer 1 to a node (neuron) in layer 2 are
updated as \Delta w^1_{ij} = -\eta \delta^1_j x_{l,m,n} and w^1_{ij} \leftarrow
w^1_{ij} + \Delta w^1_{ij}. The synaptic weights from a node in layer 2 to a node in
layer 3 are updated as \Delta w^2_{jd} = -\eta \delta^2_{l,m,d} y^2_j and
w^2_{jd} \leftarrow w^2_{jd} + \Delta w^2_{jd}.
Step 6. The mean squared error (MSE) is calculated by Eq. 3.9:

MSE = \frac{1}{0.15LM} \sum_{m=1}^{M} \sum_{l=1}^{0.15L} ( y^3_{l,m} - t_{l,m} )^2. (3.9)

As 15% of the data of each class in matrix X are used for validation, L in the above
equation is multiplied by 0.15.
Step 7. The process is repeated from Step 2 until the epoch limit or the error limit is
reached.

After training and validation, the network classification performance is tested with
the remaining 15% of the data. The order of computational cost of the MLPNN is
O(I^2_{MLPNN}), where I_{MLPNN} is the total number of neurons in the MLPNN.
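A single training pass (Steps 2 to 6) can be sketched with NumPy. This is a minimal illustration on one made-up sample, with a sigmoid output unit so that the y(1 - y) factor of Eq. 3.7 applies directly (the network described above uses a linear output layer); the network sizes, learning rate, and data are my own choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_train_step(x, t, W1, W2, eta=0.5):
    """One forward/backward pass for a single sample x with scalar target t.
    W1: (J, N) input->hidden weights, W2: (1, J) hidden->output weights.
    Returns updated weights and the squared output error."""
    y2 = sigmoid(W1 @ x)                          # hidden-layer outputs
    y3 = sigmoid(W2 @ y2)[0]                      # network output
    delta2 = y3 * (1.0 - y3) * (y3 - t)           # output error term (Eq. 3.7)
    delta1 = y2 * (1.0 - y2) * (W2[0] * delta2)   # hidden error terms (Eq. 3.8)
    W2 = W2 - eta * delta2 * y2[np.newaxis, :]    # Step 5 weight updates
    W1 = W1 - eta * np.outer(delta1, x)
    return W1, W2, (y3 - t) ** 2

# Tiny demo: drive the output for one sample toward target 1.
rng = np.random.default_rng(1)
W1 = rng.uniform(-0.5, 0.5, size=(4, 3))          # Step 1: small random weights
W2 = rng.uniform(-0.5, 0.5, size=(1, 4))
x, t = np.array([0.2, 0.7, 0.1]), 1.0
errs = []
for _ in range(200):                               # Step 7: repeat until a limit
    W1, W2, e = mlp_train_step(x, t, W1, W2)
    errs.append(e)
# The squared error decreases over the repeated passes.
```

Batch training over all samples and a validation-based stopping rule, as in Steps 6 and 7, are omitted here for brevity.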
Ref. code: 25605722300067ONR
29
3.6.5 Generalized Regression Neural Network (GRNN)
A GRNN [13, 65] has three main layers, namely the input, hidden, and
output layers, as shown in Figure 3.7. At the beginning of the training phase the
synaptic weights w^1_{ij} are initialized to the training data vectors and the synaptic
weights w^2_{jd} are initialized to the targets corresponding to the training data. The
hidden layer receives inputs from the input layer, calculates the Euclidean distances
from the training data to the test data, and produces its outputs through Gaussian
activation functions (h_j). At the output layer the class label of the test data is
obtained by a predefined decision mapping.

Figure 3.7 Generalized regression neural network block diagram. d^2_{l,m} is the
squared Euclidean distance, \sigma is the spreading factor, m is the class index, and l
is the data index within class m.

The GRNN training algorithm is given below.
Step 1. Choose 70% training samples, 15% validation samples, and 15% test samples
from matrix X.
Step 2. Initialize the synaptic weights w^1_{ij} to the training samples and the
synaptic weights w^2_{jd} to the corresponding training targets in vector t.
Step 3. The validation inputs x_{v,m} are applied to find the validation outputs by
Eq. 3.10:

y(\mathbf{x}_{v,m}) = \frac{ \sum_{l,m} t_{l,m} \exp( -d^2_{l,m} / 2\sigma^2 ) }{ \sum_{l,m} \exp( -d^2_{l,m} / 2\sigma^2 ) }, (3.10)

where \sigma is the spreading factor, v = 1, 2, \ldots, 0.15L is the validation data
index in class m, and d^2_{l,m} = ( \mathbf{x}_{v,m} - \mathbf{x}_{l,m} )^T
( \mathbf{x}_{v,m} - \mathbf{x}_{l,m} ) is the squared Euclidean distance from a
training sample \mathbf{x}_{l,m} to a validation sample \mathbf{x}_{v,m}
((\cdot)^T indicates the transpose).
Step 4. The mean squared error E is calculated as

E = \sum_{v,m} ( y(\mathbf{x}_{v,m}) - t_{v,m} )^2. (3.11)

Step 5. If E > E_{threshold}, adjust \sigma and continue from Step 3; otherwise stop.

Considering that I_{GRNN} is the total number of neurons in the GRNN, the
computational cost of the GRNN is O(I_{GRNN}).
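Eq. 3.10 amounts to a Gaussian-weighted average of the training targets. A minimal sketch follows; the toy data are made up and σ is fixed by hand rather than tuned by the validation loop of Steps 3 to 5.

```python
import numpy as np

def grnn_predict(train_X, train_t, u, sigma=0.5):
    """GRNN output for test vector u (Eq. 3.10): training targets weighted
    by Gaussian kernels of the squared Euclidean distances."""
    d2 = np.sum((train_X - u) ** 2, axis=1)          # squared distances d^2
    h = np.exp(-d2 / (2.0 * sigma ** 2))             # hidden-layer activations
    return float(np.sum(train_t * h) / np.sum(h))    # weighted average of targets

# Toy training set: two clusters with class labels 11 and 12.
train_X = np.array([[0.0, 0.0], [0.1, 0.1], [2.0, 2.0], [2.1, 1.9]])
train_t = np.array([11.0, 11.0, 12.0, 12.0])
print(round(grnn_predict(train_X, train_t, np.array([0.05, 0.05]))))  # -> 11
```

Because every training sample contributes one hidden neuron, the evaluation cost grows with the number of training samples, matching the O(I_GRNN) remark above.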
3.6.6 Radial Basis Function Neural Network (RBFNN)
The RBFNN [65, 66, 73-77] can be designed in a similar way to the
GRNN shown in Figure 3.7, with the exception that during the training phase the
synaptic weights w^2_{jd} are initialized to small random numbers instead of the true
targets. The RBFNN algorithm is presented below.
Step 1. \sigma is initially chosen at random.
Step 2. Choose random weights for w^2_{jd}.
Step 3. Calculate the entries of the matrix \Phi given in Eq. 3.12:

\Phi = \begin{bmatrix} \exp( -\|X_{1,1} - X_{1,1}\|^2 / 2\sigma^2 ) & \cdots & \exp( -\|X_{1,1} - X_{L,M}\|^2 / 2\sigma^2 ) \\ \vdots & \ddots & \vdots \\ \exp( -\|X_{L,M} - X_{1,1}\|^2 / 2\sigma^2 ) & \cdots & \exp( -\|X_{L,M} - X_{L,M}\|^2 / 2\sigma^2 ) \end{bmatrix}. (3.12)

Step 4. The weight vector \mathbf{w}^2 = \Phi^{-1} \mathbf{t} is calculated.
Step 5. The output \mathbf{y} = \Phi \mathbf{w}^2 is calculated.
Step 6. The mean squared error E is evaluated by
E = \sum_{v,m} ( y(\mathbf{x}_{v,m}) - t_{v,m} )^2.
Step 7. If E > E_{threshold}, change \sigma and continue from Step 4; otherwise exit.

If the optimal weights are found in one epoch, the computational complexity of the
RBFNN is O(I_{RBFNN}); otherwise the complexity can be up to O(I^2_{RBFNN}).
I_{RBFNN} is the total number of neurons in the RBFNN and is equal to I_{GRNN}.
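Steps 3 to 5 can be sketched as follows. The 1-D training set is made up, and `np.linalg.solve` is used in place of an explicit inverse; with one Gaussian center per training sample, the network output reproduces the targets exactly.

```python
import numpy as np

def rbf_design_matrix(X, sigma):
    """Phi of Eq. 3.12: Phi[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Toy training set: 1-D inputs with alternating targets.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
t = np.array([1.0, -1.0, 1.0, -1.0])
Phi = rbf_design_matrix(X, sigma=0.7)   # Step 3: design matrix
w2 = np.linalg.solve(Phi, t)            # Step 4: w2 = Phi^{-1} t
y = Phi @ w2                            # Step 5: network output
# With one center per training sample, y matches t and the error E is ~0.
```

In practice σ would still be tuned by the validation loop of Steps 6 and 7, since a σ that interpolates the training targets perfectly may generalize poorly.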
3.6.7 Linear Discriminant Analysis (LDA)
LDA [34, 35, 37, 41, 46, 47, 65, 66] is a generalization of Fisher's linear
discriminant. It finds a linear combination of features which is used as a linear
classifier or for dimensionality reduction before later classification. LDA assumes that
the independent variables or features are normally distributed. Unlike PCA, LDA is a
supervised classification method. LDA explicitly attempts to model the difference
between the classes of data.
A set of training data samples (70%) is randomly chosen from matrix X,
and then a good predictor of the class of a new observation x is sought. The idea of
LDA is to find a projection along which the class separation is maximized. Given two
sets of labeled data, X_1 and X_2, the class mean vectors \mu_{X_1} and \mu_{X_2}
are defined as

\mu_{X_m} = \frac{1}{0.7L} \sum_{l=1}^{0.7L} \mathbf{x}_{l,m}, (3.13)

where 0.7L is the number of training examples of class X_m. The goal of linear
discriminant analysis is to maximize the ratio of the between-class variance to the
within-class variance, i.e., to maximize

J(\mathbf{w}) = \frac{ \mathbf{w}^T S_{between} \mathbf{w} }{ \mathbf{w}^T S_{within} \mathbf{w} }, (3.14)

where S_{between} is the between-class covariance matrix and S_{within} is the
average of the within-class covariance matrices of the corresponding classes (LDA
considers the within-class variances of all classes to be equal); they are defined as

S_{between} = ( \mu_{X_1} - \mu_{X_2} )( \mu_{X_1} - \mu_{X_2} )^T,
S_{within} = \frac{1}{0.7LM} \sum_{m=1}^{M} \sum_{l=1}^{0.7L} ( \mathbf{x}_{l,m} - \mu_{X_m} )( \mathbf{x}_{l,m} - \mu_{X_m} )^T, (3.15)

where M = 2 for LDA between two classes. For multiclass LDA, LDAs of all pairs of
classes are formed and their classification decisions are combined to assign a test
datum to a class. Differentiating J(\mathbf{w}) with respect to \mathbf{w}, setting
the result equal to zero, and simplifying gives

\mathbf{w} = S_{within}^{-1} ( \mu_{X_1} - \mu_{X_2} ). (3.16)

A new point is classified by projecting it onto the maximally separating direction and
assigning it to class X_1 (between X_1 and X_2) if

\mathbf{w}^T \mathbf{x}_{l,m} > \frac{1}{2} \mathbf{w}^T ( \mu_{X_1} + \mu_{X_2} ) - \log \frac{ p(X_1) }{ p(X_2) }. (3.17)

In the case where there are more than two classes, the classes are
partitioned and LDAs are used to classify the partitions. One way of partitioning is
"one against the rest", where the dataset of one class is put in one group and
everything else in the other; this results in M classifiers whose results are combined.
Another common method is pairwise classification, where a new classifier is created
for each pair of classes (giving M(M-1)/2 classifiers in total), with the individual
classifiers combined to produce the final classification. In this thesis LDA is trained
with two training classes for the false-classification and correct-classification analysis
in subsection 4.3.1.
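The two-class computation of Eqs. 3.13 to 3.17 can be sketched with NumPy. The data are toy Gaussian clusters, equal priors are assumed (so the log-prior term of Eq. 3.17 vanishes), and the within-class scatter is taken as the sum of the two sample covariances; the scale factor does not change the decision, since w and the threshold scale together.

```python
import numpy as np

def fisher_lda(X1, X2):
    """Projection direction w (Eq. 3.16) and the equal-prior decision
    threshold (Eq. 3.17) for two training classes."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)               # Eq. 3.13
    Sw = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)  # within-class scatter
    w = np.linalg.solve(Sw, mu1 - mu2)                        # Eq. 3.16
    threshold = 0.5 * w @ (mu1 + mu2)                         # Eq. 3.17, log term = 0
    return w, threshold

rng = np.random.default_rng(2)
X1 = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
X2 = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(50, 2))
w, c = fisher_lda(X1, X2)
print(w @ X1.mean(axis=0) > c)  # class-1 mean projects above the threshold -> True
```

A test point is assigned to class 1 when its projection `w @ x` exceeds `c`, and to class 2 otherwise.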
3.7 Sensor panel minimization techniques
The number of sensors in an E-Nose sensor panel needs to be minimized
to reduce the E-Nose design cost and complexity. This can be achieved by finding the
sensors that are less responsive to the target odours and excluding them from the
panel without affecting the E-Nose classification performance. In this section the
novel sensor array minimization techniques developed in this research for fruit odour
detection are discussed.
3.7.1 Exhaustive search method
All possible combinations of sensors are found first. The number of
combinations is 2^N - 1, where N is the total number of sensors [83]. For this
research the classification algorithms (multiclass SVM and k-NN) are then trained
with 70% randomly chosen samples from all the classes. The remaining 30% of the
samples are then used to verify the pattern recognition capability of each combination
of sensors. The sensor combinations with the minimum number of sensors and an
acceptable pattern recognition error level are the optimal sensor sets.
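The enumeration itself is straightforward to sketch. Here the per-subset error comes from a placeholder `evaluate_error` function of my own; in the thesis this would be the k-NN or multiclass SVM test error on the 30% held-out samples.

```python
from itertools import combinations

def optimal_sensor_sets(n_sensors, evaluate_error, max_error):
    """Enumerate all 2^N - 1 non-empty sensor subsets, smallest first, and
    return the subsets with acceptable error using the fewest sensors."""
    best = []
    for size in range(1, n_sensors + 1):
        for subset in combinations(range(n_sensors), size):
            if evaluate_error(subset) <= max_error:
                best.append(subset)
        if best:                       # smallest acceptable size found; stop
            break
    return best

# Placeholder error model: sensors 1, 4, and 6 are jointly sufficient.
def evaluate_error(subset):
    return 0.02 if {1, 4, 6} <= set(subset) else 0.30

print(optimal_sensor_sets(8, evaluate_error, max_error=0.05))  # -> [(1, 4, 6)]
```

Searching smallest subsets first lets the loop stop as soon as any acceptable set is found, although all 2^N - 1 subsets may be visited in the worst case.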
3.7.2 PCA loading and mutual information based approach [83]
PCA analysis of the training data over all the classes is performed. The
PC loading information is recorded in matrix A as in Eq. (3.18).
NNaNa
jia
Naa
,,1
,
1,1,1
A , (3.18)
where, ai,j indicates loading of i th sensor variable on PC j, and N is the number of
sensors. The columns of the matrix A define the PCs.
As the experimental data within each class are randomly distributed with
respective means and variances, the sensor data related to a class is considered to be
Gaussian distributed in this thesis. Entropy of a univariate Gaussian random variable
is given as in Eq. (3.19).
22ln
2
1)( zeZH , [85] (3.19)
where, e is a constant, 2z is the variance of random variable Z. Thus mutual
information between two random variables, Y and Z is given by,
2,1ln
2
1, ZYcorrZYI , [85] (3.20)
where, corr indicates correlation. From Eq. (3.20) when, 0, ZYcorr , the
mutual information will be large. This is possible when the loadings of the sensors under
consideration on any PC are close in magnitude and have the same sign, i.e. fall on
the same side (positive or negative axis) of a particular PC. Figure 3.8 shows a
loading plot of the experimental data on the first two PCs. For example, in Figure 3.8, as
TGS822 and TGS813 are inclined to the same (positive) side of PC 1, they are expected
to have large mutual information, with TGS822 having the larger loading on PC 1. Thus
TGS822 should be chosen of the two. On the other hand, TGS822 and TGS2602 should
have small mutual information. More detail is presented in the results and discussion
section. On inclusion of every new sensor in the optimal sensor group, the group's
error performance is verified. If the performance is not satisfactory, a new sensor is
chosen following a similar process, as per descending values of mutual information,
from higher-dimensional PCs. The mutual information between each pair of sensors
for the training samples over all classes is calculated as per Eq. (3.20) and stored in
matrix B as in Eq. (3.21):
\[
\mathbf{B} = \begin{bmatrix} s_{1,1} & \cdots & s_{1,N} \\ \vdots & s_{i,j} & \vdots \\ s_{N,1} & \cdots & s_{N,N} \end{bmatrix}, \qquad (3.21)
\]
where $s_{i,j}$ is the mutual information between sensors $i$ and $j$, B is an $N \times N$
symmetric matrix, and $N$ is the number of sensors. The diagonal elements
of matrix B, for which $i$ equals $j$, are self-information and are very large. These
elements are not required and are omitted, for simplicity of the algorithm, when finding the
elements with the maximum mutual information and the next values in descending order.
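Eqs. (3.20)–(3.21) can be sketched as below. The per-sensor sample layout is a hypothetical structure (one list of training responses per sensor, concatenated over all classes); the Pearson correlation is computed directly from those samples, and the diagonal is left as `None` to mirror the omitted self-information.

```python
import math

def gaussian_mutual_information(corr):
    """Eq. (3.20): I(Y, Z) = -0.5 * ln(1 - corr(Y, Z)^2)."""
    return -0.5 * math.log(1.0 - corr ** 2)

def mi_matrix(samples):
    """Build the symmetric matrix B of Eq. (3.21) from per-sensor samples.

    samples[i] is the list of training responses of sensor i over all
    classes; the diagonal (self-information) is set to None and skipped.
    """
    n = len(samples)

    def corr(x, y):
        # Pearson correlation of two equal-length sample lists.
        mx, my = sum(x) / len(x), sum(y) / len(y)
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                        sum((b - my) ** 2 for b in y))
        return num / den

    B = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            B[i][j] = B[j][i] = gaussian_mutual_information(
                corr(samples[i], samples[j]))
    return B
```

The off-diagonal entries of `B` can then be scanned in descending order to pick, from each high-MI pair, the sensor with the larger PC loading.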
Figure 3.8 Two dimensional PC loading of the fruits' odor data to explain the PC
loading and mutual information based approach. [83]
3.7.3 Threshold based approach
For any Gaussian variable, 99.7% of the data remain within three standard
deviations on both sides of the mean. The Gaussian curves of the TGS822 sensor response
for green mango, ripe banana, and ripe sapodilla are plotted with their respective means
and standard deviations. As shown in Figure 3.9, if the distance between the
means of two Gaussian random variables is large relative to the sum of three times the
standard deviations of the corresponding Gaussian random variables, then the bell
shapes of the Gaussian functions will be sufficiently apart or will overlap
negligibly. In Figure 3.9, the Gaussian curves for green mango and ripe
banana do not overlap, while the Gaussian curves for ripe banana and ripe sapodilla
overlap significantly, as their standard deviations are large compared to the distance
between their means. The sensors for which this overlap occurs are not good choices
for classifying the corresponding classes. Let this ratio be defined as
\[
\gamma_{i,j,n} = \frac{\left|\mu_{i,n} - \mu_{j,n}\right|}{3\sigma_{i,n} + 3\sigma_{j,n}} \ge \gamma_{th}, \qquad (3.22)
\]
where $\mu_{i,n}$ and $\mu_{j,n}$ are the means of classes $i$ and $j$, respectively, $\sigma_{i,n}$ and
$\sigma_{j,n}$ are the standard deviations of classes $i$ and $j$, respectively, $n$ is the corresponding
sensor, and $\gamma_{th}$ is the threshold. Larger values of $\gamma_{i,j,n}$ indicate that the means of the two
classes are sufficiently far apart and/or the variances are small enough, so that the
corresponding Gaussian functions are less likely to overlap. For each sensor,
the ratio $\gamma_{i,j,n}$ of all pairs of classes is calculated according to Eq. (3.22). The
class pairs and sensors for which $\gamma_{i,j,n}$ is greater than or equal to the threshold
$\gamma_{th}$ are recorded in an $N_P \times N$ dimensional matrix, where $N_P$ is the number of
pairs of classes and $N$ is the number of sensors. The sensor that classifies the highest
number of pairs is chosen first, and the detection error over all classes is calculated
by the classification algorithms. If the detection error is not acceptable, another sensor is
added such that the first and second sensors acting together classify a greater number of
class pairs. If the error performance is still not acceptable, one more sensor is added such
that the maximum number of class pairs is classified. This process is continued
until the error rate is acceptable. The sensor combination that classifies the classes
with the minimum number of classification errors is the optimal sensor set.
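The ratio test of Eq. (3.22) and the per-sensor pair count can be sketched as follows. The `stats` dictionary layout is a hypothetical structure, and the default threshold of 1 is an illustrative choice: with the factor of 3 already in the denominator of this form of the ratio, a threshold of 1 corresponds to non-overlapping three-sigma ranges.

```python
def separation_ratio(mu_i, sigma_i, mu_j, sigma_j):
    """Eq. (3.22): distance between class means over the sum of three
    standard deviations of each class, for one sensor."""
    return abs(mu_i - mu_j) / (3.0 * sigma_i + 3.0 * sigma_j)

def separable_pairs(stats, sensor, gamma_th=1.0):
    """Return the class pairs that one sensor separates at threshold gamma_th.

    stats[c][sensor] = (mean, std) of class c on that sensor.
    """
    classes = sorted(stats)
    pairs = []
    for a in range(len(classes)):
        for b in range(a + 1, len(classes)):
            mi, si = stats[classes[a]][sensor]
            mj, sj = stats[classes[b]][sensor]
            if separation_ratio(mi, si, mj, sj) >= gamma_th:
                pairs.append((classes[a], classes[b]))
    return pairs
```

Calling `separable_pairs` for every sensor yields the pair-count matrix of the method; the sensor covering the most pairs is picked first, then sensors are added greedily as described above.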
Figure 3.9 Gaussian functions with means µ = 0.61, µ = 1.30, µ = 1.90 and standard
deviations σ = 0.04, σ = 0.22, σ = 0.34 for green mango, ripe banana, and ripe sapodilla,
respectively. [83]
3.8 Hyperplane versus Hyperspheric Classification
Sensors convert odour data to electronic signals. These signals are then pre-
processed to make them suitable for the pattern recognition algorithms used for
training. Meaningful extraction of information from sensed data, with a high correct
classification rate and few false alarms, necessitates a careful choice of pattern recognition
method. Classification methods based on hyperspheric classification boundaries are potential
candidates for reducing the false classification rate and thereby improving the "correct
rejection" rate in E-Nose applications.
Figure 3.10 A PCA of the dataset to explain false classification problems of k-NN,
SVM, LDA, and MLPNN. The figure is a PCA scores plot of the four fruit types,
each at three ripeness states. In the figure GB is unripe banana, RB is ripe banana,
RtB is rotten banana, GM is unripe mango, RM is ripe mango, RtM is rotten mango, GS
is unripe sapodilla, RS is ripe sapodilla, RtS is rotten sapodilla, GP is unripe
pineapple, RP is ripe pineapple, and RtP is rotten pineapple.
The data point "R" shown in Figure 3.10 does not belong to any class: although
it is situated in the middle of the dataset, it is reasonably far from all the classes, and
an E-Nose should not assign it to any odour class, as doing so causes false
classification. However, k-NN will classify it to the nearest class. With the SVM
and LDA classification methods it will fall on one side of a decision boundary and
thereby be falsely classified to a class. An MLPNN classification boundary, although
it encloses the inner classes, remains open for the outer classes, like a Voronoi diagram.
In addition, the enclosed inner classes contain empty spaces (as seen in Figure 3.10)
within the classes which should not belong to them. Thus any pattern recognition
method that classifies by hyperplanes will assign such unwanted data to a class between any
two classes and cause false classification. To overcome this false classification issue,
algorithms which classify data by hyperspheres should be considered. In Figure 3.11,
closed ellipsoidal boundaries are shown around the classes. This can be achieved with
RBFNN and GRNN using Gaussian activation functions in the hidden layer. In this
process, stray data and empty spaces around the relevant classes are excluded and
false classification does not occur.
As research shows GRNN to perform better than RBFNN,
GRNN should be chosen to overcome false classification in E-Nose applications. As
the number of neurons required by GRNN is high, an approximate GRNN [86] is also
presented in this thesis. The number of hidden layer neurons needed by the approximate
GRNN equals the number of classes. In the next subsection, a novel hyperspheric
classification method based on the minimum, maximum, and mean of each feature for all
the classes is presented, which is also capable of reducing false classification error.
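A minimal sketch of the rejection idea, assuming the approximate-GRNN case of one Gaussian neuron per class: a test point is scored by a Gaussian kernel against each class centre, and a point whose best activation is negligible is "correctly rejected" rather than forced into the nearest class. The centre layout, spread, and rejection threshold are illustrative assumptions, not the thesis's trained network.

```python
import math

def grnn_classify(centres, x, spread=0.1, reject_below=1e-3):
    """Score x against one Gaussian neuron per class (approximate-GRNN case).

    centres: {class_label: centre_vector}. Returns the best-matching class,
    or None ("correct rejection") when every kernel activation is negligible.
    """
    def kernel(c):
        # Gaussian activation: falls off with squared distance to the centre.
        d2 = sum((a - b) ** 2 for a, b in zip(x, c))
        return math.exp(-d2 / (2.0 * spread ** 2))

    scores = {label: kernel(c) for label, c in centres.items()}
    label, best = max(scores.items(), key=lambda kv: kv[1])
    return label if best >= reject_below else None
```

Because the Gaussian kernel decays in every direction, the effective decision region around each centre is closed, which is exactly why data far from all trained classes cannot be falsely classified.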
Figure 3.11 A PCA plot with closed boundary to explain the effect of hyperspheric
closed boundary to avoid false classification. In the figure GB is unripe banana, RB is
ripe banana, RtB rotten banana, GM is unripe mango, RM is ripe mango, RtM is
rotten mango, GS is unripe sapodilla, RS is ripe sapodilla, RtS is rotten sapodilla, GP
is unripe pineapple, RP is ripe pineapple, and RtP is rotten pineapple.
3.8.1 Minimum-maximum-mean (MMM) hyperspheric classification method
In this thesis a simple classification method, named the MMM method [87],
is proposed. The method is based on the maximum, minimum, and mean responses of the
sensors for the different classes. The maximum vectors $\mathbf{q}_m$, minimum vectors $\mathbf{v}_m$, and
mean vectors $\mathbf{u}_m$ are calculated during training for each class from the
training data samples and are stored in matrices Q, V, and U, respectively, as
\[
\mathbf{Q} = \begin{bmatrix} \mathbf{q}_1 \\ \vdots \\ \mathbf{q}_M \end{bmatrix} = \begin{bmatrix} q_{1,1} & \cdots & q_{1,N} \\ \vdots & \ddots & \vdots \\ q_{M,1} & \cdots & q_{M,N} \end{bmatrix}, \qquad (3.23)
\]
\[
\mathbf{V} = \begin{bmatrix} \mathbf{v}_1 \\ \vdots \\ \mathbf{v}_M \end{bmatrix} = \begin{bmatrix} v_{1,1} & \cdots & v_{1,N} \\ \vdots & \ddots & \vdots \\ v_{M,1} & \cdots & v_{M,N} \end{bmatrix}, \qquad \text{and} \quad (3.24)
\]
\[
\mathbf{U} = \begin{bmatrix} \mathbf{u}_1 \\ \vdots \\ \mathbf{u}_M \end{bmatrix} = \begin{bmatrix} u_{1,1} & \cdots & u_{1,N} \\ \vdots & \ddots & \vdots \\ u_{M,1} & \cdots & u_{M,N} \end{bmatrix}. \qquad (3.25)
\]
In Eqs. (3.23)-(3.25), $N$ is the number of sensors and $M$ is the number of classes. The
rows of Q, V, and U are the maximum, minimum, and mean vectors of the classes,
respectively, while the columns represent the sensor variables. The features of a test
data vector are compared element by
element to the minimum and maximum vectors of each class. A test data vector is classified
to a class for which the following two criteria are satisfied: (i) every feature variable of the
test data vector is less than or equal to the corresponding element of the maximum vector of
the class, and (ii) every feature of the test data vector is greater than or
equal to the corresponding element of the minimum vector of the class. Criteria
(i) and (ii) ensure the minimum-maximum (min-max) range assessment for each
sensor, i.e., each feature variable of the test data. As shown in Figure 3.11, a test data vector
might be assigned to multiple classes due to minor overlapping between classes. If any odor
data vector is classified to multiple classes, the Euclidean distances between the test data
vector and the class mean vectors of the corresponding tied classes in matrix U are calculated.
The test data vector is then assigned to the class whose mean vector is the closest. The
algorithm is given as follows:
Step 1. Compute the maximum, minimum, and mean matrices Q, V, and U of the
training dataset.
Step 2. Compare each feature of a test data vector to the corresponding feature in
the minimum and maximum vectors of each class.
Step 3. Assign a test data to a class if every feature of the test data is within the
min-max limits of a class. If the test data does not fall within the min-max range of
any of the classes, then label it as “unclassified” or “correctly rejected,” and stop.
Step 4. If any test data are within min-max limits of multiple classes, then the
case of a tie occurs. To break this tie the test data are assigned to the class whose
mean vector (measured by Euclidean distance metric) is the closest to the test data
vector. Once the test data are assigned to a class, exit the program.
Step 5. Run Steps 1 to 4 and calculate the classification error as
\[
E_{MMM} = \frac{1}{2}\sum_{m=1}^{M}\sum_{l=1}^{L}\left(y_{l,m} - t_{l,m}\right)^2,
\]
where $y_{l,m}$ and $t_{l,m}$ are the classifier output and the target for test sample $l$ of class $m$,
and $L$ is the number of test samples per class. If the error $E_{MMM} \le E_{threshold}$, exit; else add
$\eta\,\boldsymbol{\sigma}_m$ to the corresponding rows of Q to expand the hyperspheric classification
boundary and continue to Step 1, where $\eta$ is the learning rate and $\boldsymbol{\sigma}_m$ is the standard
deviation vector of class m.
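Steps 2 to 4 above can be sketched as follows. The dictionary layout of Q, V, and U (one vector per class) and the `None` return for a correct rejection are illustrative assumptions, and the Step 5 boundary-expansion loop is omitted for brevity.

```python
import math

def mmm_classify(q, v, u, x):
    """Steps 2-4 of the MMM method.

    q, v, u: {class: max_vector}, {class: min_vector}, {class: mean_vector},
    i.e. the rows of Q, V, and U. Returns the matched class, or None when
    the test vector x lies outside every class's min-max box
    ("correctly rejected").
    """
    # Steps 2-3: keep the classes whose min-max range contains every feature.
    ties = [c for c in q
            if all(v[c][i] <= x[i] <= q[c][i] for i in range(len(x)))]
    if not ties:
        return None                      # outside all boxes: correct rejection
    # Step 4: break a tie by Euclidean distance to the class mean vectors.
    return min(ties, key=lambda c: math.dist(x, u[c]))
```

Because the min-max boxes are bounded in every feature, data far from all trained classes falls through to the rejection branch instead of being forced into a class, mirroring the hyperspheric behaviour discussed above.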
All sensors in the sensor array are required to function properly. A
malfunctioning sensor should be detected and replaced, and the E-Nose should be
trained again to achieve good classification performance.
Each of the mean, minimum, and maximum matrices has a computational
cost of O(M). If a tie occurs, the complexity becomes O(M + Ities(N + k)), where Ities is the
number of tied classes. The term Ities(N + k) indicates the complexity of finding the
nearest class among the means of the tied classes, and k is taken as 1 for tie breaking.
Due to the E-Nose data pattern, ties are unlikely to occur, so the computational
complexity remains low.
Simulation and analytical results of the sensor panel minimization
methods, and hyperspheric classification methods presented in this chapter are shown
in the next results and discussions chapter.
Chapter 4
Results and Discussions
4.1 Data Preprocessing
Time versus voltage responses of the sensors from one measurement of the
VOCs of four fruit types at each ripeness state, namely green mango, ripe mango,
rotten mango, green banana, ripe banana, rotten banana, green sapodilla, ripe
sapodilla, rotten sapodilla, green pineapple, ripe pineapple, and rotten pineapple, are
shown in Figure 4.1. The response curves have two segments, namely transient and
steady state. The average value of the steady-state segment for every sensor is calculated;
the combination of these average values forms a signature pattern for an experiment.
The training data are a collection of signatures from all the experiments. The experiments
for each fruit type at the three ripeness states are repeated 20 times, forming a total
of 240 samples of experimental data for twelve classes.
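The steady-state averaging above can be sketched as follows. The `steady_start` index is a hypothetical parameter: in practice the boundary between the transient and steady-state segments would be detected from the response curve itself.

```python
def signature(responses, steady_start):
    """Form one signature pattern from a single measurement.

    responses: {sensor_name: list of sampled voltages over time}. The
    steady-state segment is assumed to begin at index steady_start; the
    signature is the per-sensor average over that segment.
    """
    return {s: sum(r[steady_start:]) / len(r[steady_start:])
            for s, r in responses.items()}
```

Stacking one such signature per experiment (here, 20 repeats per class over 12 classes) yields the 240-sample training set described above.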
Figure 4.2 shows a three dimensional PCA scores plot of the training data
of four types of fruits (pineapple, banana, sapodilla, and mango) at three different
ripeness states. The labels G, R, Rt indicate unripe, ripe, and rotten states of fruits,
respectively, while B, M, S, and P represent banana, mango, sapodilla, and pineapple,
respectively. For example, RtB stands for rotten banana.
It is seen from Figure 4.2 that the scores of ripe banana and ripe pineapple
do not overlap with any other classes and are fully classifiable, while the scores of the
other fruits at different ripeness states have minor overlap. This overlap indicates
that minor misclassification is likely to occur among the corresponding classes.
Figure 4.1 Sensor responses from eight sensors to four kinds of fruits at three
ripeness states (continued to next page). [83]
Figure 4.1 Sensor responses from eight sensors to four kinds of fruits at three
ripeness states (continued from previous page).
Figure 4.2 PCA scores plot of training data of four fruit types at three ripeness states.
4.2 Sensor panel minimization methods
The performance of two existing sensor panel minimization techniques,
the between-to-within variance based method [56-57] and the exhaustive search
method, is presented first. The performance of the two proposed novel methods of
sensor panel minimization is then shown.
4.2.1 Between to within variance based method
In Figure 4.3 the between to within [56-57] class variances for each
sensor are shown. To find the optimal set of sensors, the sensors are sorted as, S3, S4,
S5, S6, S7, S8, S1, and S2, as per their descending values of between to within class
variances. As S3 has the highest ratio of between to within variance, it is picked first.
Next, two sensors S3 and S4 are picked, and then three sensors S3, S4, and S5 are
picked, and so on. The percentage of pattern recognition error caused by the MLPNN,
RBFNN, k-NN, and SVM classification algorithms for different combinations of
sensors is listed in Table 4.1. The MLPNN and RBFNN show very high classification
errors compared to k-NN and SVM. The pattern recognition errors with the k-NN
and SVM algorithms are less than 10% for combinations of three or more sensors.
Thus the three-sensor combination S3, S4, and S5 is considered the optimal set of
sensors, with k-NN or SVM chosen as the classification algorithm. For improved
performance, combinations with more sensors should be preferred,
as shown in Table 4.1.
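One plausible formulation of this ranking is sketched below, assuming population variances and equally weighted classes; the exact definition in [56-57] may differ in normalisation.

```python
def between_to_within(class_samples):
    """Rank sensors by between-class to within-class variance ratio.

    class_samples: {class: list of sample vectors}. Returns sensor indices
    sorted by descending ratio, as used to pick S3 first, then S4, and so on.
    """
    classes = list(class_samples)
    n = len(next(iter(class_samples.values()))[0])     # number of sensors
    ratios = []
    for s in range(n):
        # Responses of sensor s, grouped by class.
        col = {c: [vec[s] for vec in class_samples[c]] for c in classes}
        means = {c: sum(col[c]) / len(col[c]) for c in classes}
        grand = sum(means.values()) / len(classes)
        # Variance of the class means around the grand mean (between-class).
        between = sum((means[c] - grand) ** 2 for c in classes) / len(classes)
        # Average within-class variance of the sensor responses.
        within = sum(sum((x - means[c]) ** 2 for x in col[c]) / len(col[c])
                     for c in classes) / len(classes)
        ratios.append(between / within)
    return sorted(range(n), key=lambda s: ratios[s], reverse=True)
```

A large ratio means the sensor's class means are well separated relative to its noise, which is exactly the property the sorted list above exploits.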
Figure 4.3 Between to within variance of all the classes for each sensor. [83]
Table 4.1 Pattern recognition error rates of test data for the between-to-within variance
method. [83]

Sensor combination                    Percentage error
                                      MLPNN    RBFNN    k-NN     SVM
(S3)                                  88.89    87.50    20.83    37.50
(S3, S4)                              94.44    91.67    11.11    11.11
(S3, S4, S5)                          80.56    69.44     6.94     4.17
(S3, S4, S5, S6)                      72.22    80.56     6.94     2.78
(S3, S4, S5, S6, S7)                  72.22    54.17     5.56     2.78
(S3, S4, S5, S6, S7, S8)              55.56    63.89     4.17     2.78
(S3, S4, S5, S6, S7, S8, S1)          55.56    33.33     0.00     1.39
(S3, S4, S5, S6, S7, S8, S1, S2)      19.44    27.78     0.00     0.00

4.2.2 Exhaustive search method
The pattern recognition capability of all (2^8 − 1 = 255) combinations of
sensors is evaluated by MLPNN, RBFNN, k-NN, and SVM. The sensor
combinations with minimum pattern recognition errors for different numbers of
sensors are recorded in Table 4.2 and Table 4.3. It is observed from Table 4.2 and
Table 4.3 that MLPNN and RBFNN show higher errors compared to k-NN and SVM.
The sensor combinations with three or more sensors show less than 10% error for k-NN
and SVM. For the three-sensor combinations, k-NN and SVM show 9.72% and
2.78% classification errors, respectively. It is observed that the three-sensor
combination S3, S5, and S8 is common to both cases. For more than three sensors
the pattern recognition error decreases more for SVM than for k-NN.
In the next subsections the results of the proposed methods to reduce the
number of sensors for an E-Nose are presented.
Table 4.2 Classification error by MLPNN and RBFNN methods. For different numbers of sensors, the combinations
with minimum pattern recognition errors are recorded. [83]

MLPNN (pattern recognition error, %):
  Single sensor:  (S7), (S8)                                                           80.56
  Two sensors:    (S2, S8), (S5, S8), (S6, S8)                                         66.67
  Three sensors:  (S3, S6, S8), (S5, S6, S8)                                           44.44
  Four sensors:   (S1, S2, S5, S8), (S2, S5, S6, S8), (S3, S5, S6, S8)                 52.78
  Five sensors:   (S1, S2, S4, S5, S7), (S1, S2, S4, S5, S8), (S2, S3, S5, S7, S8)     38.89
  Six sensors:    (S1, S2, S3, S5, S6, S7)                                             38.89
  Seven sensors:  (S1, S2, S3, S4, S6, S7, S8)                                         27.78
  Eight sensors:  (S1, S2, S3, S4, S5, S6, S7, S8)                                     19.44

RBFNN (pattern recognition error, %):
  Single sensor:  (S8)                                                                 79.17
  Two sensors:    (S4, S5), (S6, S8)                                                   60.06
  Three sensors:  (S2, S4, S5)                                                         62.50
  Four sensors:   (S1, S2, S5, S7), (S1, S2, S5, S8)                                   41.67
  Five sensors:   (S1, S2, S3, S4, S5), (S1, S2, S4, S5, S8)                           33.33
  Six sensors:    (S1, S2, S3, S5, S6, S8), (S1, S3, S4, S5, S7, S8),
                  (S2, S3, S5, S6, S7, S8)                                             31.94
  Seven sensors:  (S1, S2, S3, S4, S5, S6, S7), (S2, S3, S4, S5, S6, S7, S8)           31.94
  Eight sensors:  (S1, S2, S3, S4, S5, S6, S7, S8)                                     27.78
Table 4.3 Classification error by k-NN and SVM methods. For different numbers of sensors, the combinations with
minimum pattern recognition errors are recorded. [83]

k-NN (pattern recognition error, %):
  Single sensor:  (S8)                                                                 27.78
  Two sensors:    (S3, S8), (S4, S6), (S5, S7), (S6, S7), (S6, S8)                     15.28
  Three sensors:  (S3, S5, S8), (S4, S5, S7), (S4, S5, S8)                              9.72
  Four sensors:   (S3, S4, S5, S6), (S3, S4, S5, S8), (S4, S5, S6, S7),
                  (S4, S5, S7, S8)                                                      8.33
  Five sensors:   (S3, S4, S5, S6, S7), (S3, S4, S5, S6, S8), (S3, S4, S5, S7, S8)      8.33
  Six sensors:    (S3, S4, S5, S6, S7, S8)                                              9.72
  Seven sensors:  (S1, S2, S3, S4, S5, S6, S8), (S1, S2, S3, S4, S5, S7, S8)            8.33
  Eight sensors:  (S1, S2, S3, S4, S5, S6, S7, S8)                                      5.56

SVM (pattern recognition error, %):
  Single sensor:  (S5)                                                                 30.56
  Two sensors:    (S5, S8)                                                              4.17
  Three sensors:  (S3, S5, S7), (S3, S5, S8), (S5, S6, S7)                              2.78
  Four sensors:   (S3, S5, S6, S7), (S3, S5, S6, S8)                                    1.39
  Five sensors:   (S3, S4, S5, S6, S7), (S3, S4, S5, S7, S8), (S3, S5, S6, S7, S8)      2.78
  Six sensors:    (S3, S4, S5, S6, S7, S8)                                              2.78
  Seven sensors:  (S2, S3, S4, S5, S6, S7, S8)                                          0.00
  Eight sensors:  (S1, S2, S3, S4, S5, S6, S7, S8)                                      0.00
4.2.3 PCA loading and mutual information based approach
Table 4.4 and Table 4.5 show the PC loadings and mutual information
between each pair of sensor data, respectively. The diagonal elements of Table 4.5 are
self-information hence large and are ignored for simplicity of the algorithm to find the
sensor pairs with high mutual information. The sensor pair S3 and S7 has the largest
mutual information (Table 4.5) and S3 has higher loading on PC1 (Table 4.4). Thus,
sensor S3 is chosen. The sensor pair S7 and S8 has the second largest mutual
information (Table 4.5) and S8 has higher loading on the negative PC2 axis (Table
4.4). Thus, S8 is chosen from the pair S7 and S8. The sensor pair S5 and S6 has the
third largest mutual information (Table 4.5) and S5 has higher loading on the positive
PC2 axis (Table 4.4). Thus, S5 is chosen from the pair S5 and S6. In this way, the
minimal set of sensors is S3, S5, and S8, with 9.72% pattern recognition error for k-NN
and 2.78% for SVM (Table 4.3). For the sensor combination S3, S5, and S8, the
pattern recognition errors with MLPNN and RBFNN are 77.78% and 87.50%,
respectively (Table 4.2). The high errors with MLPNN and RBFNN indicate that they are
not good choices when the number of sensors is reduced.
Table 4.4 Principal component loadings. [83]
Sensors
Principal Component (PC)
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
S1 0.0954 –0.0401 0.1541 –0.2580 0.3402 0.8735 0.1418 –0.0113
S2 0.0307 0.1511 0.6316 0.4201 0.5535 –0.1618 –0.2200 0.1411
S3 0.5877 –0.2413 –0.3490 0.6504 –0.0319 0.2050 –0.0893 –0.0231
S4 0.4995 –0.2156 0.6289 –0.1967 –0.5099 –0.0467 0.0684 –0.0543
S5 0.3328 0.6816 –0.1111 –0.2193 –0.0844 0.0747 –0.5773 –0.1353
S6 0.3689 0.4790 –0.0792 –0.0222 0.1852 –0.2032 0.7426 0.0259
S7 0.2974 –0.2336 –0.1785 –0.3790 0.2305 –0.1776 –0.1599 0.7558
S8 0.2444 –0.3497 –0.0976 –0.3210 0.4716 –0.2969 –0.0930 –0.6215
Table 4.5 Mutual information between pairs of sensors. The self-information in
diagonal cells is omitted. [83]

Sensor   S1       S2       S3       S4       S5       S6       S7       S8
S1       -        0.0516   0.7881   0.9682   0.3070   0.4602   0.8097   0.6833
S2       0.0516   -        0.0197   0.0555   0.1382   0.1151   0.0061   0.0005
S3       0.7881   0.0197   -        1.4419   0.4015   0.6539   1.6569   1.0766
S4       0.9682   0.0555   1.4419   -        0.3783   0.6104   1.3474   1.0017
S5       0.3070   0.1382   0.4015   0.3783   -        1.5397   0.2889   0.1517
S6       0.4602   0.1151   0.6539   0.6104   1.5397   -        0.4842   0.2879
S7       0.8097   0.0061   1.6569   1.3474   0.2889   0.4842   -        1.5770
S8       0.6833   0.0005   1.0766   1.0017   0.1517   0.2879   1.5770   -

4.2.4 Threshold based method
For any Gaussian variable, 99.7% of the data remain within three standard
deviations on both sides of the mean. Thus, according to Eq. (3.22), a choice of
threshold γth equal to 3 will not pick any sensor for which class overlapping occurs.
This confirms that the algorithm picks the sensors which cause less overlapping
and thereby reduces the number of classification errors. A smaller threshold raises the
acceptable error limit, and a larger threshold decreases it, with 3
as the optimal threshold. Four kinds of fruits at three ripeness states make 12 classes,
giving a total of 66 class pairs. Each pair is classified sequentially with each
sensor and the corresponding errors are recorded. The smallest group of sensors that
meets the desired error limit is chosen. The three- and four-sensor combinations, the
number of class pairs they classify, and the corresponding classification errors are listed
in Table 4.6 and Table 4.7. It is seen from Table 4.6 that the three-sensor combination
S3, S5, and S8 classifies the maximum number of class pairs, compared to the other
three-sensor combinations. From Table 4.7, the four-sensor combinations (S3, S4, S5, S8),
(S3, S5, S6, S8), and (S3, S5, S7, S8) classify more class pairs than the
other four-sensor combinations. The classification performance of the three- and
four-sensor combinations noted above is verified by MLPNN,
RBFNN, k-NN, and SVM. For both the three- and four-sensor combinations, MLPNN and
RBFNN show large classification errors. It is found that for k-NN and SVM the three-sensor
combination S3, S5, and S8 has 9.72% and 2.78% pattern recognition errors,
respectively, as found by the exhaustive search method and by the PCA loading and mutual
information based approach. Among the four-sensor cases, (S3, S4, S5, S8) shows
8.33% and 4.17% errors for k-NN and SVM, respectively. The other sensor
combinations show higher errors (Table 4.6 and Table 4.7). Thus, with the
threshold based method, the three-sensor combination (S3, S5, S8) is found to be
the minimal sensor set.
The implementation complexities of the PC loading and mutual information
based approach and the threshold based approach are similar. With either approach the
classification algorithm needs to be simulated only a few times to find the minimal sensor
set. The total complexity of the PCA loading and mutual information based approach
is the sum of the PCA complexity, the complexity of finding the mutual information, and the
number of trials multiplied by the complexity of the classification algorithm. With
the threshold based approach, a few suboptimal combinations of sensors are found
first, and their pattern recognition performances are then analysed with a
classification algorithm. The complexities of the proposed methods are therefore low, as the
expensive classification algorithm needs to be simulated far fewer times than
with the exhaustive search or GA methods. Based on the complexity and
pattern recognition errors, the SVM algorithm, along with the PCA and mutual
information based method or the threshold based method, can be chosen to design an
E-Nose sensor panel with the minimum number of sensors at an acceptable error rate. It is
found that an E-Nose can be designed by picking only three sensors (S3, S5, S8) and
SVM as the classification algorithm to classify banana, mango, sapodilla, and
pineapple at three ripeness states with 2.78% possible misclassification error. This
result is also better than the between-to-within variance ratio method [56-57],
where (S3, S4, S5) is the optimal sensor set with 6.94% error with k-NN and 4.17%
error with the SVM algorithm.
Table 4.6 Maximum number of pairs of classes classifiable by combinations of three
sensors. [83]

Three sensor    Pairs classifiable     Detection errors (%)
combinations    (out of 66)            MLPNN    RBFNN    k-NN     SVM
S3, S4, S5      52                     80.56    69.44    11.11    4.17
S3, S5, S7      52                     80.56    91.67    12.50    2.78
S3, S5, S8      54                     77.78    87.50     9.72    2.78
S5, S6, S8      53                     44.44    98.61    13.89    4.17
S5, S7, S8      53                     72.22    94.44    13.89    5.56
Table 4.7 Maximum number of pairs of classes classifiable by four-sensor
combinations. [83]

Four sensor       Pairs classifiable     Detection errors (%)
combinations      (out of 66)            MLPNN    RBFNN    k-NN     SVM
S3, S4, S5, S8    55                     72.22    48.61     8.33    4.17
S3, S4, S6, S8    54                     63.89    62.50    12.50    6.94
S3, S5, S6, S8    55                     52.78    61.11    12.50    1.39
S3, S5, S7, S8    55                     72.22    77.78    12.50    5.56
S4, S5, S7, S8    54                     63.89    54.17     8.33    6.94
S5, S6, S7, S8    54                     63.89    88.89    12.50    2.78

4.3 False classification reduction: Hyperplane versus hyperspheric
In this section the ability of existing hyperplane and hyperspheric boundary
based classification algorithms to correctly reject irrelevant data, and thereby reduce
false classification, is analyzed. The classification performance of the MMM method
in terms of correct classification and false classification rates is then analyzed.
4.3.1 GRNN compared to SVM, LDA, k-NN, and MLPNN
The mean signature patterns of ripe mango, ripe sapodilla, and ripe
pineapple, as shown in Figure 4.4, are significantly different from each other.
Responses from the sensors S1 and S2 are small and do not vary significantly for
different fruits. The sensors S3 to S8 show good responses for different fruits and
produce distinct signature patterns. The signatures are eight dimensional as eight
sensors are chosen for this experiment. A PCA scores plot in Figure 4.5 shows that
the three classes of fruits are significantly separable in a two dimensional PC space.
Figure 4.4 Signature patterns for three types of fruits. (a) Ripe mango, (b) ripe
sapodilla and (c) ripe pineapple. [86]
Figure 4.5 Scores plot of three types of fruits. [86]
The odor data from ripe sapodilla (class 1) and ripe pineapple (class 2) are
considered as training classes and the data from ripe mango (class 3) are considered as
untrained class. The E-Nose is trained to classify ripe sapodilla and ripe pineapple
into class 1 and class 2, respectively. From each training class 70% data are randomly
chosen for training SVM, LDA and k-NN classification algorithms and remaining
30% are taken for testing. For GRNN and MLPNN 70% odor data are used for
training, 15% for validation, and 15% for testing. The data from ripe mango are
considered irrelevant, i.e., an untrained class, for this analysis and are not used to train the
E-Nose classification algorithms. The classification performance of SVM, LDA and
k-NN are tested with the whole dataset from ripe mango and 30% remaining data
from ripe sapodilla and ripe pineapple. For GRNN and MLPNN as 15% of the data
from ripe sapodilla and ripe pineapple are used for validation, remaining 15% from
each of them and whole dataset of ripe mango are used for testing classification
performances. Simulation results (Table 4.8) show that ripe sapodilla (class 1) and
ripe pineapple (class 2) are correctly classified by each of the algorithms except
MLPNN, which has 11% misclassification error. The ripe mango samples (the irrelevant
class, class 3) are falsely classified to either ripe sapodilla (class 1) or ripe
pineapple (class 2) by SVM, LDA, and k-NN, whereas MLPNN falsely classifies
1.67% of the ripe mango samples to class 1, 8.33% to class 2, and 90% to unknown
classes. The GRNN does not falsely classify the ripe mango samples to any training
class but correctly rejects them instead. This is expected, as GRNN creates bounded
hyperspheric classification boundaries around the classes on which it is trained. Thus
only GRNN correctly rejects data of irrelevant classes without false classification; in
other words, it does not produce any "false alarm" or "false
classification".
Table 4.8 Classification of ripe mango, ripe sapodilla, and ripe pineapple samples by
different algorithms. [86]

                       Ripe mango samples (irrelevant class 3) (%)                 Ripe sapodilla (class 1)
Classification         False classified as       Classified to      Correct       and ripe pineapple (class 2)
algorithm              Class 1     Class 2       unknown classes    rejection     misclassified (%)
SVM                    40.00       60.00          0.00                0.00         0.00
LDA                    21.25       78.75          0.00                0.00         0.00
k-NN                   11.00       89.00          0.00                0.00         0.00
MLPNN                   1.67        8.33         90.00                0.00        11.00
GRNN (at spread
= 0.09, bias = 6)       0.00        0.00          0.00              100.00         0.00

Exact versus approximate GRNN
For the exact and approximate GRNN [86], as shown in Figure 4.6, minimum
classification errors occur at spreading factors of 0.06 and 0.15, respectively. The exact
GRNN model shows zero classification error at a spreading factor of 0.06, while the
approximate GRNN shows a minimum error of 3.85% at a spreading factor of 0.15.
On reducing the spreading factor below 0.06 for the exact GRNN and below 0.15 for the
approximate GRNN, overfitting occurs and the training class classification performance
of both GRNN models degrades. It is seen in Figure 4.6 that classification errors
start to increase at a spreading factor of 0.05 for the exact GRNN and at 0.14 for the
approximate GRNN. On the other hand, on increasing the spreading factor beyond
its value at minimum classification error, the classification boundary of the
training classes widens for both GRNN models. Due to this widening, fractions of
the classification boundaries of the training classes overlap, which causes
misclassification of the test data of the training classes. A widening classification boundary
also causes data from the untrained class to fall within the training classes' boundaries and
causes false classification. As a result, the total classification error increases. At a spreading
factor of 0.1 for the exact GRNN and 0.16 for the approximate GRNN, the classification
error starts to increase from the minimum classification error level. It is also discerned from
Figure 4.6 that the approximate GRNN can be implemented to reduce implementation
complexity and cost with a small increase in classification error at a suitable
spreading factor.
Figure 4.6 Classification errors to classify ripe mango samples, and test samples of
ripe sapodilla and ripe pineapple by exact GRNN and approximate GRNN at different
spreading factors. [86]
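The rejection behaviour described above can be illustrated with a short sketch. The following is an illustrative Python sketch (the thesis implementation is in MATLAB; the spread and rejection threshold values here are arbitrary placeholders, not the tuned values reported above):

```python
import math
from collections import defaultdict

def grnn_classify(train, x, spread=0.1, reject_threshold=1e-3):
    """GRNN-style classification with a Gaussian kernel.

    Each training sample contributes exp(-||x - xi||^2 / (2 spread^2)).
    The class with the largest summed activation wins; if every
    activation falls below reject_threshold, the sample lies outside
    all hyperspheric boundaries and is rejected as unknown, which is
    how the GRNN avoids false alarms on irrelevant odors.
    """
    scores = defaultdict(float)
    best_w = 0.0
    for xi, label in train:
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        w = math.exp(-d2 / (2.0 * spread ** 2))
        best_w = max(best_w, w)
        scores[label] += w
    if best_w < reject_threshold:
        return None  # correctly rejected as an unknown class
    return max(scores, key=scores.get)

# two trained classes and one point far from both (an "irrelevant" odor)
train = [((0.0, 0.0), "class1"), ((0.1, 0.0), "class1"),
         ((1.0, 1.0), "class2"), ((1.1, 1.0), "class2")]
print(grnn_classify(train, (0.05, 0.0)))  # -> class1
print(grnn_classify(train, (5.0, 5.0)))   # -> None (rejected)
```

Reducing the spread narrows the hyperspheres (more rejections, risk of overfitting on the training classes); increasing it widens them until the training-class boundaries overlap, mirroring the behaviour shown in Figure 4.6.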
4.3.2 MMM versus k-NN, SVM, GRNN, RBFNN, and MLPNN
Correct classification rate and false classification rate of the proposed
MMM [87] are compared with that of k-NN, SVM, GRNN, RBFNN, and MLPNN in
this sub-section. 70% of the data samples from sapodilla, pineapple, and banana, each at three ripeness states, are used for training and 30% for validation and testing.
The signature patterns of the nine trained classes, i.e., green sapodilla, ripe sapodilla, rotten sapodilla, green banana, ripe banana, rotten banana, green pineapple, ripe pineapple, and rotten pineapple, are shown in Figure 4.7. The signature patterns shown are mean patterns, each composed of eight features corresponding to the eight sensors. It is visible from Figure 4.7 that the signatures are significantly different and distinguishable from each other. Sensors S1 and S2 show insignificant variation across classes, whereas sensors S3 to S8 show significant variation and produce distinct signature patterns for the different fruit-odor classes.
Figure 4.7 Signature patterns of the means of three types of fruits at three ripeness
states: (a) green banana, (b) ripe banana, (c) rotten banana, (d) green sapodilla, (e)
ripe sapodilla, (f) rotten sapodilla, (g) green pineapple, (h) ripe pineapple and (i)
rotten pineapple. [87]
Observation of any particular class across the sensors in the box plot in Figure 4.8 reveals the classification process of the MMM method. Suppose a test datum belonging to the RS class is to be classified. According to the MMM method, the test datum is assigned to the class for which all of its features fall within the min-max ranges of the corresponding features of that class. From the box plot in Figure 4.8 it is observed that the min-max range of sensor S1 for class RS partially overlaps with those of RtB, RtS, and RtP; for S2, the min-max range of RS partially overlaps with RtB, RtS, RP, and RtP; for S3, with GS; for S4, with RtB, GS, and RtS; for S5, with RB, RtB, RtS, and RtP; for S6, with RB, RtS, and RtP; for S7, with RB, GS, and RtS; and for S8, with RB, GS, and RtS. Thus there exists no class all of whose features partially or fully overlap with the corresponding features of class RS. As a result, data of class RS will be correctly classified to that class and will not produce false classification. The same holds for the other classes, and thus the MMM method is capable of reducing false classification as well as false alarms.
Figure 4.8 Box plots showing the min-max ranges of the sensor variables for the training classes, i.e., green banana (GB), ripe banana (RB), rotten banana (RtB), green sapodilla (GS), ripe sapodilla (RS), rotten sapodilla (RtS), green pineapple (GP), ripe pineapple (RP), and rotten pineapple (RtP).
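The min-max classification process just described can be summarized in a short sketch (illustrative Python; the thesis implementation is in MATLAB, and the k-NN tie-break of the MMM method is replaced here by a simpler nearest-range-centre rule):

```python
def train_mmm(samples):
    """Record per-class, per-feature (min, max) ranges from training data.

    samples: dict mapping class label -> list of feature vectors.
    """
    return {label: [(min(col), max(col)) for col in zip(*vectors)]
            for label, vectors in samples.items()}

def classify_mmm(ranges, x):
    """Assign x to the class whose min-max ranges contain every feature.

    If no class contains x, it is rejected (None), which suppresses
    false alarms for untrained odors.  If several classes contain x
    (a tie), the thesis breaks the tie with k-NN; here we use a
    simplified nearest-range-centre rule instead.
    """
    matches = [label for label, rng in ranges.items()
               if all(lo <= xi <= hi for xi, (lo, hi) in zip(x, rng))]
    if not matches:
        return None  # correctly rejected as an unknown class
    if len(matches) == 1:
        return matches[0]
    def dist(label):  # squared distance to the centre of each range
        return sum((xi - (lo + hi) / 2) ** 2
                   for xi, (lo, hi) in zip(x, ranges[label]))
    return min(matches, key=dist)

# toy two-sensor data, hypothetical values for two of the nine classes
samples = {"RS":  [(0.2, 0.5), (0.3, 0.6)],
           "RtS": [(0.7, 0.1), (0.8, 0.2)]}
ranges = train_mmm(samples)
print(classify_mmm(ranges, (0.25, 0.55)))  # inside the RS ranges -> RS
print(classify_mmm(ranges, (0.9, 0.9)))    # outside all ranges -> None
```

Training only records per-feature extrema, which is why the method's cost is essentially O(M) in the number of features.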
Table 4.9 Time taken by the algorithms to train and test with the data samples of
pineapple, sapodilla, and banana, each at three ripeness states. [87]
Algorithm Train time (sec.) Test time (sec.)
k-NN 0.2175 0.2175
SVM 0.7452 0.0461
GRNN 0.6922 0.0946
RBFNN 0.9922 0.0230
MLPNN 0.8652 0.0153
MMM 0.1874 0.0047
The MLPNN, GRNN, RBFNN, and MMM classification methods are trained, validated, and tested with 70%, 15%, and 15% of randomly chosen data from each training class. The SVM and k-NN algorithms are trained and tested with 70% and 30% of the data, as they do not require a separate validation step.
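A split of this kind can be sketched as follows (illustrative Python; the thesis uses MATLAB and its own dataset, so the sample list here is a stand-in):

```python
import random

def split_70_15_15(samples, seed=0):
    """Randomly partition one class's samples into train/validation/test.

    Proportions follow the 70%/15%/15% split used for MLPNN, GRNN,
    RBFNN, and MMM; for SVM and k-NN the last two parts would simply
    be merged into a single 30% test set.
    """
    rng = random.Random(seed)       # fixed seed for a reproducible split
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.70 * n)
    n_val = int(0.15 * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_70_15_15(list(range(100)))
print(len(train), len(val), len(test))  # -> 70 15 15
```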
Table 4.9 shows the time consumed by the different classification algorithms for training and testing. The proposed MMM algorithm consumes the least training time, followed by the k-NN, GRNN, SVM, MLPNN, and RBFNN algorithms, respectively. The testing time of the MMM method is also the smallest, followed by the MLPNN, RBFNN, SVM, GRNN, and k-NN algorithms, respectively. Thus both the training and testing times of the proposed MMM method are lower than those of the other
algorithms. The big-O complexities of the algorithms given in Section 2.6 are O(LM(N+k)) for k-NN, O(I_GRNN) for GRNN, O(I^2_MLPNN) for MLPNN, between O(N L^2 M^2) and O(N L^3 M^3) for SVM, between O(M) and O(M + I_ties(N+k)) for MMM, and between O(I_RBFNN) and O(I^2_RBFNN) for RBFNN. It is found that SVM and MLPNN require more computations and are thereby more complex. Although GRNN and RBFNN have a low order of complexity, their actual complexities are high due to the large number of hidden-layer neurons they require. The MMM method incorporates k-NN to break possible ties; however, ties are unlikely to occur, so the O(M + I_ties(N+k)) term is usually insignificant and the complexity of the MMM method is approximately O(M). The comparison between MMM, k-NN, MLPNN, GRNN, RBFNN, and SVM is presented in Table 4.9.
Table 4.10 Misclassification error and correct classification rate of the classification
algorithms while testing with test data set from training classes i.e. banana, sapodilla,
and pineapple each at three ripeness states. [87]
Algorithm Misclassification error (%) Correct classification (%)
k-NN 1.8519 98.1481
SVM 1.8519 98.1481
GRNN 1.8519 98.1481
RBFNN 24.0740 75.9260
MLPNN 29.6296 70.3704
MMM 1.8519 98.1481
Table 4.10 summarizes the correct classification and misclassification performances. The misclassification error is 24.0740% with the RBFNN and 29.6296% with the MLPNN. The MMM, SVM, GRNN, and k-NN classification methods each show a 1.8519% misclassification error. The misclassification and correct classification results listed in Table 4.10 are consistent with previous works: classification accuracy ranges from 82.4% to 100% with k-NN [19, 67-69], 86% to 98.66% with SVM [67-72], 100% with GRNN [13], 88% to 100% with RBFNN [73-77], and 68% to 100% with MLPNN [63, 64, 75-79].
Table 4.11 False classification performance of the algorithms with irrelevant data (i.e.
mango odor data at three ripeness states). [87]
Algorithm    False classification (%)    Misclassification to unknown irrelevant classes (%)    Unclassified or correctly rejected (%)
k-NN 100 0 0
SVM 100 0 0
GRNN 0 0 100
RBFNN 15 85 0
MLPNN 35 65 0
MMM 0 0 100
The false classification and correct rejection performances of the classification algorithms analyzed in this thesis are summarized in Table 4.11. To the best of our knowledge, these analyses have not yet been reported in the literature. Odor data from the irrelevant classes (i.e., mango odor data at three ripeness states) should be correctly rejected by an E-Nose and should thereby produce no false classification errors. The k-NN and SVM algorithms falsely classify all data of the irrelevant classes and produce false alarms. The RBFNN algorithm falsely classifies 15% of the mango data to trained classes and misclassifies 85% to unknown extraneous classes. The MLPNN algorithm falsely classifies 35% of the data samples and misclassifies 65%. The MMM and GRNN methods show no false classification error, and all irrelevant data samples are correctly rejected.
Chapter 5
Conclusions and Recommendations
5.1 Conclusions
In this thesis, an E-Nose is designed which comprises a sensor panel built with MOG sensors, a data acquisition device, a sample chamber, a measurement chamber, and a computer for data storage, preprocessing, training, and implementation of pattern recognition algorithms. Four kinds of fruits, namely banana, mango, sapodilla, and pineapple, each at three ripeness states (unripe, ripe, and rotten), are chosen as samples. The acquired data are preprocessed to make them suitable for the classification algorithms, namely PCA, k-NN, SVM, LDA, MLPNN, RBFNN, and GRNN. The performance of these classification algorithms is compared in terms of speed, correct classification rate, and false classification rate.
The higher the number of sensors (i.e., the dimensionality), the higher the sensor panel design cost and complexity, and the higher the classification algorithm complexity. To reduce dimensionality by excluding sensors carrying insignificant or redundant information, two new approaches are proposed: one based on PCA loadings and mutual information, and another based on thresholds. They optimize the number of sensors, and their pattern classification performances are analyzed in contrast to the between-to-within variance ratio based method and the exhaustive search method. It is seen that the number of sensors in an E-Nose sensor panel can be reduced with the methods proposed in this thesis while keeping pattern classification errors low. Reducing the number of sensors in an E-Nose sensor panel decreases data dimensionality as well as design cost and complexity. The classification performance of the minimized sensor panel is observed with k-NN, SVM, MLPNN, and RBFNN. The classification errors with MLPNN and RBFNN are found to be high for this application. The pattern recognition error performance is found to be better with SVM than with k-NN, MLPNN, and RBFNN. With the proposed sensor reduction techniques, the number of sensors is reduced to three with only 2.78% classification error for SVM and 9.72% for k-NN. It is also noted that both proposed sensor array minimization methods result in the same combination of optimal sensors with the k-NN and SVM pattern classification algorithms.
Odor data from an irrelevant class should not be classified to any training class by an E-Nose, and should thereby produce neither false classification errors nor false alarms. It is shown that hyperspheric classification algorithms such as GRNN and RBFNN with a Gaussian activation function are capable of reducing false alarms, with GRNN showing the better performance. However, as the GRNN comprises neurons on the order of the number of training samples, its complexity becomes high for large training datasets. A new hyperspheric classification method, the MMM method, is therefore proposed in this thesis, which is less complex and fast. In addition, the MMM method shows correct classification of training data and false classification performance similar to those of GRNN. The MMM method can be a prominent method for E-Nose applications as it is simple to implement and shows small misclassification and false classification errors.
5.2 Future works
In this research, the ripeness states of four types of fruits are explored and classified by the designed E-Nose. This research could be extended to analyze the capability of the E-Nose to classify additional fruits. The classification algorithms used for the analyses are the ones most commonly used in the literature. However, the classification performance of other algorithms, namely the Bayesian classifier, PNN, PLS discriminant analysis, and quadratic discriminant analysis, could be explored with the designed E-Nose and compared with the classification methods applied and proposed in this thesis.
References
1. Persaud, K. & George, D. (1982). Analysis of discrimination mechanisms in
the mammalian olfactory system using a model nose. Nature, 299(5881), 352-355.
2. Exporting Fresh Fruit and Vegetables to China: A Market Overview and
Guide for Foreign Suppliers. (2016). China: Produce Marketing Association.
3. Leffingwell, J. C. (2002). Olfaction. Retrieved September 29, 2014, from
http://www.leffingwell.com/olfaction.htm
4. Wikipedia. 2016. Olfaction. Retrieved June 11, 2014, from
https://en.wikipedia.org/wiki/Olfaction
5. Mamat, M., Samad, S. A. & Hannan, M. A. (2011). An electronic nose for
reliable measurement and correct classification of beverages. Sensors, 2011(11),
6435-6453. Retrieved March 16, 2014, from http://www.mdpi.com/1424-
8220/11/6/6435/pdf
6. Giordani, D. S., Castro, H. F., Oliveira, P. C., & Siqueira, A. F. (2007,
September). Biodiesel characterization using electronic nose and artificial neural
network. Paper presented at the Proceedings of the European Congress of Chemical
Engineering, Copenhagen, Denmark. Retrieved August 17, 2014, from
http://folk.ntnu.no/skoge/prost/proceedings/ecce6_sep07/upload/348.pdf
7. Chowdhury, S. S., Tudu, B., Bandyopadhyay, R., & Bhattacharyya, N. (2008,
December). Portable electronic nose system for aroma classification of black tea.
Paper presented at the Proceedings of the IEEE Region 10 and the Third Int. Conf. on
Industrial and Information Systems, Kharagpur, India. Retrieved December 1, 2014,
from http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4798403
8. Brezmes, J., Llobet, E., Vilanova, X., Saiz, G. & Correig. X. (2000). Fruit
ripeness monitoring using an Electronic Nose. Sensors and Actuators B: Chemical,
69(3), 223-229.
9. Fang, X., Guo, X., Shi, H., & Cai, Q. (2010, June). Determination of Ammonia
Nitrogen in Wastewater Using Electronic Nose. Paper presented at the Proceedings of
the 4th Int. Conf. on Bioinformatics and Biomedical Engineering (iCBBE), Chengdu,
China. Retrieved September 11, 2014, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5515426
10. Omatu, S., Araki, H., Fujinaka, T., & Yano, M. (2012). Intelligent
classification of odor data using neural networks. Paper presented at the Proceedings
of the Sixth International Conference on Advanced Engineering Computing and
Applications in Sciences. Barcelona, Spain. Retrieved August 29, 2012 from
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.677.544&rep=rep1&type=p
df
11. Soh, A. C., Chow, K. K., Yusuf, U. M., Ishak, A. J., Hassan, M. K., &
Khamis, S. (2014). Development of neural network-based electronic nose for herbs
recognition. International Journal on Smart Sensing and intelligent Systems, 7(2),
584–609.
12. Omatu, S. (2013, September). Odor classification by neural networks. Paper
presented at the Proceedings of the 2013 IEEE 7th International Conference on
Intelligent Data Acquisition and Advanced Computing Systems, Berlin, Germany.
Retrieved September 2, 2014, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6662695
13. Kurup, P. U. (2008, May). An electronic nose for detecting hazardous
chemicals and explosives. Paper presented at the Proceedings of the IEEE Conference
on Technologies for Homeland Security, Waltham, Massachusetts, USA. Retrieved
September 1, 2014, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4534439
14. Feldhoff, R., Bernadet, P., & Saby, C. A. (1999). Discrimination of diesel
fuels with chemical sensors and mass spectrometry based electronic noses. Analyst,
124(8), 1167-1173.
15. Sobański, T., Szczurek, A., Nitsch, K., Licznerski, B. W., & Radwan, W.
(2006). Electronic nose applied to automotive fuel qualification. Sensors and
Actuators B: Chemical, 116(1), 207-212.
16. Berna, A. (2010). Metal oxide sensors for electronic noses and their
application to food analysis. Sensors, 10(4), 3882-3910. Retrieved December 29,
2014, from http://www.mdpi.com/1424-8220/10/4/3882/pdf
17. Mamat, M., & Samad, S. A. (2010, December). The design and testing of an
Electronic Nose prototype for classification problem. Paper presented at the
Proceedings of the 2010 International Conference on Computer Applications and
Industrial Electronics (ICCAIE), Kuala Lumpur, Malaysia. Retrieved November 12,
2014, from http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5735108
18. Zhang, Z., Tong, J., Chen, D., & Lan, Y. (2008). Electronic nose
with an air sensor matrix for detecting beef freshness [Electronic version]. Journal of
Bionic Engineering, 5(1), 67-73.
19. Tang, K. T., Chiu, S. W., Pan, C. H., Hsieh, H. Y., Liang, Y. S., & Liu, S. C.
(2010). Development of a portable electronic nose system for the detection and
classification of fruity odors. Sensors, 10(10), 9179-9193. Retrieved January 11,
2015, from http://www.mdpi.com/1424-8220/10/10/9179/pdf
20. Dutta, R., Hines, E. L., Gardner, J. W., Kashwan, K. R., & Bhuyan, M. (2003,
July). Determination of tea quality by using a neural network based electronic nose.
Paper presented at the Proceedings of the International Joint Conference on Neural
Networks, Portland, Oregon, USA. Retrieved October 5, 2014, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1223380
21. Concina, I., Falasconi, M., & Sberveglieri, V. (2012). Electronic noses as
flexible tools to assess food quality and safety: Should we trust them? IEEE Sensors
Journal, 12(11), 3232-3237.
22. Brezmes, J., Fructuoso, M. L., Llobet, E., Vilanova, X., Recasens, I., Orts, J.
et al. (2005). Evaluation of an electronic nose to assess fruit ripeness. IEEE Sensors
Journal, 5(1), 97-108.
23. Hines, E. L., Llobet, E., & Gardner, J. W. (1999). Neural network based
electronic nose for apple ripeness determination. Electronics Letters, 35(10), 821-823.
24. Xiaobo, Z., & Jiewen, Z. (2005, October). Apple quality assessment by fusion
three sensors. Paper presented at the Proceedings of the IEEE Sensors, Irvine,
California, USA. Retrieved November 16, 2014, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1597717
25. Zhang, L., & Tian, F. (2012, November). A novel chaotic sequence
optimization neural network for concentration estimation of formaldehyde by an
electronic nose. Paper presented at 2012 Fourth International Conference on
Computational Intelligence and Communication Networks (CICN), Uttar Pradesh,
India. Retrieved September 16, 2015, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6375235
26. Abdullah, A. H., Shakaff, A. M., Zakaria, A., Saad, F. S. A., Shukor, S. A., &
Mat, A. (2014, August). Application Specific Electronic Nose (ASEN) for Ganoderma
boninense detection using artificial neural network. Paper presented at the
Proceedings of the 2nd International Conference on Electronic Design (ICED),
Penang, Malaysia. Retrieved August 19, 2014, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7015788
27. Pardo, M., Faglia, G., Sberveglieri, G., & Quercia, L. (2001, May). Electronic
nose for coffee quality control. Paper presented at the Proceedings of the IEEE
Instrumentation and Measurement Technology Conference, Budapest, Hungary.
Retrieved November 29, 2014, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=928799
28. Singh, R. (2002, January). An intelligent system for odour discrimination.
Paper presented at the Proceedings of the First IEEE International workshop on
Electronic Design, Test and Applications (DELTA '02), Christchurch, New Zealand.
Retrieved December 18, 2014, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=994681
29. Bhattacharyya, N., Tudu, B., Bandyopadhyay, R., Bhuyan, M., & Mudi, R.
(2004, November). Aroma characterization of orthodox black tea with electronic
nose. Paper presented at the Proceedings of the IEEE Region 10 Conference
TENCON 2004, Chiang Mai, Thailand. Retrieved March 2, 2015, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1414623
30. García-Cortés, A., Martí, J., Sayago, I., Santos, J. P., Gutiérrez, J., & Horrillo,
M. C. (2009, February). Detection of stress through sweat analysis with an electronic
nose. Paper presented at the Proceedings of the Spanish conf. on electronic devices,
Santiago de Compostela, Spain. Retrieved December 4, 2014, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4800501
31. Sanaeifar, A., Mohtasebi, S. S., Ghasemi-Varnamkhasti, M., & Siadat, M.
(2014, November). Application of an electronic nose system coupled with artificial
neural network for classification of banana samples during shelf-life process. Paper
presented at the Proceedings of the 2014 International Conference on Control,
Decision and Information Technologies (CoDIT), Metz, France. Retrieved November
16, 2015, from http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6996991
32. Wang, X. D., Zhang, H. R., & Zhang, C. J. (2005, August). Signals
recognition of electronic nose based on support vector machines. Paper presented at
the Proceedings of the 2005 International Conference on Machine Learning and
Cybernetics, Guangzhou, China. Retrieved January 12, 2016, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1527528
33. ul Hasan, N., Ejaz, N., Ejaz, W., & Kim, H. S. (2012, October). Malicious
odor item identification using an electronic nose based on support vector machine
classification. Paper presented at the Proceedings of the 2012 IEEE 1st Global
Conference on Consumer Electronics (GCCE), Tokyo, Japan. Retrieved November 18
2015, from http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6379638
34. Perera, A., Gomez-Baena, A., Sundic, T., Pardo, T., & Marco, S. (2002).
Machine olfaction: pattern recognition for the identification of aromas. 16th
International Conference on Pattern Recognition, 2, 410-413.
35. Hassan, M., & Bermak, A. (2014, November). Threshold detection of
carcinogenic odor of formaldehyde with wireless electronic nose. Paper presented at
the Proceedings of the 2014 IEEE Sensors, Valencia, Spain. Retrieved June 17, 2015,
from http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6985266
36. Romani, S., Cevoli, C., Fabbri, A., Alessandrini, L., & Dalla Rosa, M. (2012).
Evaluation of Coffee Roasting Degree by Using Electronic Nose and Artificial Neural
Network for Off‐line Quality Control. Journal of Food Science, 77(9), 960–965.
37. Hai, Z., & Wang, J. (2006). Electronic nose and data analysis for detection of
maize oil adulteration in sesame oil. Sensors and Actuators B: Chemical, 119(2), 449-
455.
38. Brezmes, J., Llobet, E., Vilanova, X., Saiz, G., & Correig, X. (2000). Fruit
ripeness monitoring using an electronic nose. Sensors and Actuators B: Chemical,
69(3), 223-229.
39. Berna, A. Z., Lammertyn, J., Saevels, S., Di Natale, C., & Nicolaï, B. M.
(2004). Electronic nose systems to study shelf life and cultivar effect on tomato aroma
profile. Sensors and Actuators B: Chemical, 97(2), 324-333.
40. Gómez, A. H., Hu, G., Wang, J., & Pereira, A. G. (2006). Evaluation of
tomato maturity by electronic nose. Computers and Electronics in Agriculture, 54(1),
44-52.
41. Gómez, A. H., Wang, J., Hu, G., & Pereira, A. G. (2008). Monitoring storage
shelf life of tomato using electronic nose technique. Journal of Food Engineering,
85(4), 625-631.
42. Brezmes, J., Llobet, E., Vilanova, X., Orts, J., Saiz, G., & Correig, X. (2001).
Correlation between electronic nose signals and fruit quality indicators on shelf-life
measurements with pinklady apples. Sensors and Actuators B: Chemical, 80(1), 41-
50.
43. Saevels, S., Lammertyn, J., Berna, A. Z., Veraverbeke, E. A., Di Natale, C., &
Nicolaï, B. M. (2003). Electronic nose as a non-destructive tool to evaluate the
optimal harvest date of apples. Postharvest Biology and Technology, 30(1), 3-14.
44. Saevels, S., Lammertyn, J., Berna, A. Z., Veraverbeke, E. A., Di Natale, C., &
Nicolaï, B. M. (2004). An electronic nose and a mass spectrometry-based electronic
nose for assessing apple quality during shelf life. Postharvest Biology and
Technology, 31(1), 9-19.
45. Xiaobo, Z., & Jiewen, Z. (2005, October). Apple quality assessment by fusion
three sensors. Paper presented at the Proceedings of the IEEE Sensors, Irvine, USA.
Retrieved October 9, 2014, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1597717
46. Gómez, A. H., Wang, J., Hu, G., & Pereira, A. G. (2006). Electronic nose
technique potential monitoring mandarin maturity. Sensors and Actuators B:
Chemical, 113(1), 347-353.
47. Benedetti, S., Buratti, S., Spinardi, A., Mannino, S., & Mignani, I. (2008).
Electronic nose as a non-destructive tool to characterise peach cultivars and to
monitor their ripening stage during shelf-life. Postharvest Biology and Technology,
47(2), 181-188.
48. Di Natale, C., Macagnano, A., Martinelli, E., Paolesse, R., Proietti, E., &
D'Amico, A. (2001). The evaluation of quality of post-harvest oranges and apples by
means of an electronic nose. Sensors and Actuators B: Chemical, 78(1), 26-31.
49. Kumbhar, A., Gharpure, D. C., Botre, B. A., & Sadistap, S. S. (2012, March).
Embedded e-nose for food inspection. Paper presented at the Proceedings of the 2012
1st International Symposium on Physics and Technology of Sensors (ISPTS-1), Pune,
India. Retrieved November 19, 2014, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6260955
50. Adak, M. F., & Yumusak, N. (2016). Classification of E-nose aroma data of
four fruit types by ABC-based neural network. Sensors, 16(3), 304. Retrieved July 24,
2016, from http://www.mdpi.com/1424-8220/16/3/304/pdf
51. Zhang, S., Xie, C., Zeng, D., Li, H., Liu, Y., & Cai, S. (2009). A sensor array
optimization method for electronic noses with sub-arrays. Sensors and Actuators B:
Chemical, 142(1), 243-252.
52. Zhao, L., Shi, B. L., Wang, H. Y., & Li, Z. (2009). Combination Optimization
Method for Screening Sensor Array of Electronic Nose. Journal of Food Science, 20,
087.
53. Hongmei, Z., & Jun, W. (2006). Optimization of sensor array of electronic
nose and its application to detection of storage age of wheat grain. Transactions of the
Chinese Society of Agricultural Engineering, 22(12), 164-167.
54. Shi, B., Zhao, L., Zhi, R. & Xi. X. (2013). Optimization of electronic nose
sensor array by genetic algorithms in Xihu-Longjing Tea quality analysis.
Mathematical and Computer Modelling, 58(3), 752-758.
55. Tabassum, M., & Mathew, K. (2014). A genetic algorithm analysis towards
optimization solutions. International Journal of Digital Information and Wireless
Communications (IJDIWC), 4(1), 124-142. Retrieved July 21, 2015, from
http://sdiwc.net/digital-
library/request.php?article=860cd5681b5f2bd16486ee6f367b2437
56. Duda, R. O., Stork, D. G., & Hart, P. E. (2001). Pattern classification. New
York: Wiley.
57. Sysoev, V. V., Musatov, V. Y., Silaev, A. V., & Zalyalov, T. R. (2007, April).
The Optimization of Number of Sensors in One-Chip Electronic Nose Microarrays
with the Help of 3-Layered Neural Network. Paper presented at the Proceedings of the
Siberian Conference on Control and Communications (SIBCON-2007), Tomsk,
Russia. Retrieved June 18, 2015, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4233301
58. Omatu, S. & Yoshioka, M. (2009). Electronic Nose for a Fire Detection
System by Neural Networks. 2nd IFAC Conference on Intelligent Control Systems
and Signal Processing, 42(19), 209-214.
59. Scorsone, E., Pisanelli, A. M., & Persaud, K. C. (2006). Development of an
electronic nose for fire detection. Sensors and Actuators B: Chemical, 116(1), 55-61.
Retrieved January 14, 2016, from http://ac.els-cdn.com/S0925400506001626/1-s2.0-
S0925400506001626-main.pdf?_tid=40c47c82-01c6-11e7-946e-
00000aacb362&acdnat=1488733846_f1895146c7332e9219f4aacb0b755afa
60. Reimann, P., & Schütze, A. (2012). Fire detection in coal mines based on
semiconductor gas sensors. Sensor Review, 32(1), 47-58. Retrieved August 18, 2015,
from http://www.emeraldinsight.com/doi/pdfplus/10.1108/02602281211197143
61. Kwan, C., Schmera, G., Smulko, J. M., Kish, L. B., Heszler, P., & Granqvist,
C. G. (2008). Advanced agent identification with fluctuation-enhanced sensing. IEEE
Sensors Journal, 8(6), 706-713. Retrieved July 11, 2015, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4529196
62. Gómez, A. H., Wang, J., Hu, G., & Pereira, A. G. (2007). Discrimination of
storage shelf-life for mandarin by electronic nose technique. LWT-Food Science and
Technology, 40(4), 681-689.
63. Yu, H., & Wang, J. (2007). Discrimination of LongJing green-tea grade by
electronic nose. Sensors and Actuators B: Chemical, 122(1), 134-140.
64. Kiani, S., Minaei, S., & Ghasemi-Varnamkhasti, M. (2016). A portable
electronic nose as an expert system for aroma-based classification of saffron.
Chemometrics and Intelligent Laboratory Systems, 156, 148-156.
65. Rahman, M. M., Charoenlarpnopparut, C., & Suksompong, P. (2015).
Classification and pattern recognition algorithms applied to E-Nose. Paper presented
at the 2015 2nd International Conference on Electrical Information and
Communication Technologies (EICT), pp. 44-48.
66. Rahman, M. M., Charoenlarpnopparut, C., & Suksompong, P. (2015). Signal
processing for multi-sensor E-nose system: Acquisition and classification. Paper
presented at the 2015 10th International Conference on Information, Communications
and Signal Processing (ICICS), pp. 1-5.
67. Güney, S. & Atasoy, A. (2012). Multiclass classification of n-butanol
concentrations with k-nearest neighbor algorithm and support vector machine in an
electronic nose. Sensors and Actuators B: Chemical, 166, 721-725.
68. Shao, X., Li, H., Wang, N., & Zhang, Q. (2015). Comparison of different
classification methods for analyzing electronic nose data to characterize sesame oils
and blends. Sensors, 15(10), 26726-26742. Retrieved April 21, 2016, from
http://www.mdpi.com/1424-8220/15/10/26726/pdf
69. Güney, S., & Atasoy, A. (2011, June). Classification of n-butanol
concentrations with k-NN algorithm and ANN in electronic nose. Paper presented at
the Proceedings of the International Symposium on Innovations in Intelligent Systems
and Applications (INISTA), Istanbul, Turkey. Retrieved November 4, 2014, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5946057
70. Khalaf, W., Pace, C., & Gaudioso, M. (2008). Gas detection via machine
learning. World Academy Of Science, Engineering And Technology, 27, 139-143.
71. Amari, A., El Barbri, N., Llobet, E., El Bari, N., Correig, X., & Bouchikhi, B.
(2006). Monitoring the freshness of Moroccan sardines with a neural-network based
electronic nose. Sensors, 6(10), 1209-1223. Retrieved March 28, 2015,
http://www.mdpi.com/1424-8220/6/10/1209/pdf
72. Sanaeifar, A., Mohtasebi, S., Ghasemi-Varnamkhasti, M., Ahmadi, H., &
Lozano Rogado, J. S. (2014). Development and application of a new low cost
electronic nose for the ripeness monitoring of banana using computational techniques
(PCA, LDA, SIMCA, and SVM). Czech Journal of Food Sciences, 32(6), 538-548.
Retrieved January 7, 2016, from
http://dehesa.unex.es/bitstream/handle/10662/4367/1805-
9317_32_6_538.pdf?sequence=1
73. Xiong, Y., Xiao, X., Yang, X., Yan, D., Zhang, C., Zou, H. et al. (2014).
Quality control of Lonicera japonica stored for different months by electronic nose.
Journal of pharmaceutical and biomedical analysis, 91, 68-72. Retrieved February
11, 2016, from http://ac.els-cdn.com/S0731708513006018/1-s2.0-
S0731708513006018-main.pdf?_tid=7b11690c-0236-11e7-9e83-
00000aacb361&acdnat=1488782047_6de8a06b91ea78c0f85e84027c3f0936
74. Evans, P., Persaud, K. C., McNeish, A. S., Sneath, R. W., Hobson, N., &
Magan, N. (2000). Evaluation of a radial basis function neural network for the
determination of wheat quality from electronic nose data. Sensors and Actuators B:
Chemical, 69(3), 348-358.
75. Dutta, R., Hines, E. L., Gardner, J. W., Udrea, D. D., & Boilot, P. (2003).
Non-destructive egg freshness determination: an electronic nose based approach.
Measurement Science and Technology, 14(2), 190.
76. Dutta, R., Kashwan, K. R., Bhuyan, M., Hines, E. L., & Gardner, J. W. (2003).
Electronic nose based tea quality standardization. Neural Networks, 16(5), 847-853.
77. Borah, S., Hines, E. L., Leeson, M. S., Iliescu, D. D., Bhuyan, M., & Gardner,
J. W. (2008). Neural network based electronic nose for classification of tea aroma.
Sensing and Instrumentation for Food Quality and Safety, 2(1), 7-14.
78. Anjos, O., Iglesias, C., Peres, F., Martínez, J., García, Á., & Taboada, J.
(2015). Neural networks applied to discriminate botanical origin of honeys. Food
Chemistry, 175, 128-136.
79. Llobet, E., Hines, E. L., Gardner, J. W., Bartlett, P. N., & Mottram, T. T.
(1999). Fuzzy ARTMAP based electronic nose data analysis. Sensors and Actuators
B: Chemical, 61(1), 183-190.
80. Cooper, P. W. (1962). The hypersphere in pattern recognition. Information
and Control, 5(4), 324-346.
81. Hwang, J. N., Choi, J. J., Oh, S., & Marks, R. J. (1990, May). Classification
boundaries and gradients of trained multilayer perceptrons. Paper presented at the
Proceedings of the IEEE International Symposium on Circuits and Systems, New
Orleans, USA. Retrieved January 9, 2015, from
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=112706
82. Tax, D. M., & Duin, R. P. (1999). Support vector domain description. Pattern
Recognition Letters, 20(11), 1191-1199.
83. Rahman, M. M., Charoenlarpnopparut, C., Suksompong, P., & Toochinda, P. (2017). Sensor array optimization for complexity reduction in electronic nose system. ECTI Transactions on Electrical Engineering, Electronics, and Communications, 1, 49-59.
84. Kuhn, H. W., & Tucker, A. W. (1951). Nonlinear programming. In J. Neyman (Ed.), Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, CA: University of California Press.
85. Krafft, P. (2013). Building intelligent probabilistic systems. Retrieved January 15, 2016, from https://hips.seas.harvard.edu/blog/2013/02/13/correlation-and-mutual-information/
86. Rahman, M. M., Charoenlarpnopparut, C., Suksompong, P., Toochinda, P., & Taparugssanagorn, A. (2017). A false alarm reduction method for a gas sensor based electronic nose. Sensors, 17(9), 2089.
87. Rahman, M. M., Charoenlarpnopparut, C., Suksompong, P., & Taparugssanagorn, A. E-nose false alarm reduction: Hyperplane versus hyperspheric classification boundary. Walailak Journal of Science and Technology (WJST). [In review]
Appendices
Appendix A
Matlab Codes for Classification Methods
% PCA classification
clear;
tic
% data preparation
[cfruits,ctxt,craw] = xlsread('centralMatlabBMSP.xlsx'); % dataset import
TGS2612 = cfruits(:,2);
TGS821 = cfruits(:,3);
TGS822 = cfruits(:,4);
TGS813 = cfruits(:,5);
TGS2602 = cfruits(:,6);
TGS2603 = cfruits(:,7);
TGS2620 = cfruits(:,8);
TGS2610 = cfruits(:,9);
S = [TGS2612,TGS821,TGS822,TGS813,TGS2602,TGS2603,TGS2620,TGS2610];
% PCA
[coeffS,scoresS,latentS,tsquaredS,explainedS] = pca(S(:,:));
figure(1);
fmt = '\fontname{Times New Roman} \fontsize{20} ';
biplotLabels = {[fmt 'TGS 2612'],[fmt 'TGS 821'],[fmt 'TGS 822'],...
    [fmt 'TGS 813'],[fmt 'TGS 2602'],[fmt 'TGS 2603'],...
    [fmt 'TGS 2620'],[fmt 'TGS 2610']};
biplot(coeffS(:,1:2),'Linewidth',2,'VarLabels',biplotLabels)
xlabel('\fontname{Times New Roman} \fontsize{25} PC 1');
ylabel('\fontname{Times New Roman} \fontsize{25} PC 2');
set(gca,'FontSize',20)
figure(2);
pareto(explainedS(1:2,:));
xlabel('\fontname{Times New Roman} \fontsize{25} Principal Component');
ylabel('\fontname{Times New Roman} \fontsize{25} Variance Explained (%)')
title('\fontname{Times New Roman} \fontsize{25} Variance explained of PCA');
set(gca,'FontSize',25)
grid on
figure(3);
Nd = 20; % samples per class
markers = {'ko','kx','k+','k*','ks','kd','kv','k^','kh','k>','kp','k<'};
hold on
for c = 0:11 % 12 odor classes, 20 samples each
    idx = 20*c + (1:Nd);
    plot3(scoresS(idx,1),scoresS(idx,2),scoresS(idx,3),markers{c+1},...
        'LineWidth',2,'MarkerSize',15);
end
hold off
grid on
xlabel('\fontname{Times New Roman} \fontsize{30} PC 1');
ylabel('\fontname{Times New Roman} \fontsize{30} PC 2');
zlabel('\fontname{Times New Roman} \fontsize{30} PC 3');
title('\fontname{Times New Roman} \fontsize{30} Principal Component Analysis');
fmt = '\fontname{Times New Roman} \fontsize{25} ';
legend([fmt 'GB'],[fmt 'RB'],[fmt 'RtB'],[fmt 'GM'],[fmt 'RM'],[fmt 'RtM'],...
    [fmt 'GS'],[fmt 'RS'],[fmt 'RtS'],[fmt 'GP'],[fmt 'RP'],[fmt 'RtP']);
set(gca,'FontSize',30)
% Between to within covariance of the dataset
figure(4);
fcov = cov(S);
bar3(fcov);
legend('TGS2612','TGS821','TGS822','TGS813','TGS2602','TGS2603',...
    'TGS2620','TGS2610');
xlabel('\fontname{Times New Roman} \fontsize{16} Sensor');
ylabel('\fontname{Times New Roman} \fontsize{16} Sensor');
title('\fontname{Times New Roman} \fontsize{15} Between and within covariance of the sensors')
set(gca,'FontSize',15)
eTimePCA = toc
% PCA PLOT THREE CLASSES ONLY ripe (mango, sapodilla, pineapple)
figure;
Nd = 20;
plot(scoresS((20*4+1):20*4+Nd,1),scoresS((20*4+1):20*4+Nd,2),'k*',...
    scoresS((20*7+1):20*7+Nd,1),scoresS((20*7+1):20*7+Nd,2),'ks',...
    scoresS((20*10+1):20*10+Nd,1),scoresS((20*10+1):20*10+Nd,2),'k^',...
    'LineWidth',2,'MarkerSize',15);
grid on
xlabel('\fontname{Times New Roman} \fontsize{30} PC 1');
ylabel('\fontname{Times New Roman} \fontsize{30} PC 2');
title('\fontname{Times New Roman} \fontsize{30} Principal Component Analysis');
legend('\fontname{Times New Roman} \fontsize{30} Ripe mango',...
    '\fontname{Times New Roman} \fontsize{30} Ripe sapodilla',...
    '\fontname{Times New Roman} \fontsize{30} Ripe pineapple');
set(gca,'FontSize',30)
% k-NN classification
clc
tic % time count begin
format compact; % formatting
[cfruits,ctxt,craw] = xlsread('centralMatlabBMSP.xlsx'); % import dataset
TGS2612 = cfruits(:,2);
TGS821 = cfruits(:,3);
TGS822 = cfruits(:,4);
TGS813 = cfruits(:,5);
TGS2602 = cfruits(:,6);
TGS2603 = cfruits(:,7);
TGS2620 = cfruits(:,8);
TGS2610 = cfruits(:,9);
% Training and test data preparation
S = [TGS2612,TGS821,TGS822,TGS813,TGS2602,TGS2603,TGS2620,TGS2610];
rIndex = [];
for c = 0:11 % 12 classes, 20 samples each, shuffled within class
    rIndex = [rIndex; randsample(20*c+1:20*c+20,20)'];
end
trainIndex = []; testIndex = [];
for c = 0:11 % 16 training and 4 test samples per class
    trainIndex = [trainIndex; rIndex(20*c+(1:16))];
    testIndex = [testIndex; rIndex(20*c+(17:20))];
end
trainData = S(trainIndex,:);
target = cfruits(trainIndex,10);
testData = S(testIndex,:);
testTarget = cfruits(testIndex,10);
% kNN model
knnmodel = fitcknn(trainData,target,'NumNeighbors',13,'Standardize',1);
Odors = predict(knnmodel, testData)
% Counting number of errors
error = 0;
for i = 1:length(testData)
    if testTarget(i)~=Odors(i)
        error = error+1;
    end
end
error
eTimeKNN = toc % Time duration
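For cross-checking the k-NN step outside MATLAB, the same idea (majority vote among the k nearest training points) can be sketched in plain Python. The data below are synthetic stand-ins, not the thesis dataset; only the 8-feature shape and k = 13 mirror the listing above.

```python
import random
from collections import Counter

def knn_predict(train, labels, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    order = sorted(range(len(train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], x)))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Synthetic stand-in for the sensor data: two well-separated classes.
random.seed(0)
train = [[random.gauss(0, 0.1) for _ in range(8)] for _ in range(16)] + \
        [[random.gauss(3, 0.1) for _ in range(8)] for _ in range(16)]
labels = [1] * 16 + [2] * 16
test = [[0.05] * 8, [3.1] * 8]
preds = [knn_predict(train, labels, x, k=13) for x in test]
print(preds)  # [1, 2]
```

With well-separated clusters, each test point is assigned the class of its nearest cluster, matching what `fitcknn`/`predict` do in the listing.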
% SVM classification
% multiclass svm by fitcecoc (ecoc = error correction output code)
clear
tic
format compact;
% Import dataset
[cfruits,ctxt,craw] = xlsread('centralMatlabBMSP.xlsx');
% Prepare training and test data
TGS2612 = cfruits(:,2);
TGS821 = cfruits(:,3);
TGS822 = cfruits(:,4);
TGS813 = cfruits(:,5);
TGS2602 = cfruits(:,6);
TGS2603 = cfruits(:,7);
TGS2620 = cfruits(:,8);
TGS2610 = cfruits(:,9);
S = [TGS2612,TGS821,TGS822,TGS813,TGS2602,TGS2603,TGS2620,TGS2610];
rIndex = [];
for c = 0:11
    rIndex = [rIndex; randsample(20*c+1:20*c+20,20)'];
end
trainIndex = []; testIndex = [];
for c = 0:11 % 14 training and 6 test samples per class
    trainIndex = [trainIndex; rIndex(20*c+(1:14))];
    testIndex = [testIndex; rIndex(20*c+(15:20))];
end
trainData = S(trainIndex,:);
trainTarget = cfruits(trainIndex,10);
testData = S(testIndex,:);
testTarget = cfruits(testIndex,10);
% Prepare and train an SVM model
t = templateSVM('Standardize',1,'KernelFunction','Gaussian');
SVMMdl = fitcecoc(trainData,trainTarget,'Learners',t);
MultiSVMtrainTime = toc;
% test classification errors of the SVM model
tic
classTestData = predict(SVMMdl,testData)
SVMerror = 0;
for i = 1:length(testData)
    if testTarget(i)~=classTestData(i)
        SVMerror = SVMerror+1;
    end
end
disp(['SVM classification errors:', num2str(SVMerror)]);
disp(['SVM training time :', num2str(MultiSVMtrainTime)]);
MultiSVMtestTime = toc;
disp(['SVM testing time :', num2str(MultiSVMtestTime)]);
% MLPNN classification
tic
format compact;
% import dataset
[cfruits,ctxt,craw] = xlsread('centralMatlabBMSP.xlsx');
% extract sensor variables
TGS2612 = cfruits(:,2);
TGS821 = cfruits(:,3);
TGS822 = cfruits(:,4);
TGS813 = cfruits(:,5);
TGS2602 = cfruits(:,6);
TGS2603 = cfruits(:,7);
TGS2620 = cfruits(:,8);
TGS2610 = cfruits(:,9);
% prepare training, test, and validation data
S = [TGS2612,TGS821,TGS822,TGS813,TGS2602,TGS2603,TGS2620,TGS2610];
rIndex = [];
for c = 0:11
    rIndex = [rIndex; randsample(20*c+1:20*c+20,20)'];
end
trainIndex = []; valIndex = []; testIndex = [];
for c = 0:11 % 14 training, 3 validation, and 3 test samples per class
    trainIndex = [trainIndex; rIndex(20*c+(1:14))];
    valIndex = [valIndex; rIndex(20*c+(15:17))];
    testIndex = [testIndex; rIndex(20*c+(18:20))];
end
trainData = S(trainIndex,:);
trainTarget = cfruits(trainIndex,10);
testData = S(testIndex,:);
testTarget = cfruits(testIndex,10);
% designing the MLPNN
net = feedforwardnet(10); % feedforwardnet(hiddenSizes,trainFcn)
net.trainParam.min_grad = 1e-5;
net.trainParam.max_fail = 6; % early stopping on validation failures
% To have the training progress displayed in the command line, set
net.trainParam.showCommandLine = true;
net.divideParam.trainInd = trainIndex;
net.divideParam.valInd = valIndex;
net.divideParam.testInd = testIndex;
[net,tr] = train(net,S',cfruits(:,10)'); % 'train' initializes the weights and trains the network
FFBPNNtrainTime = toc;
disp(['MLPNN training time:', num2str(FFBPNNtrainTime)]);
tic
ar = round(net(testData')) % classify test data
% count errors
MLPNNerror = 0;
for i = 1:length(testData)
    if testTarget(i)~=ar(i)
        MLPNNerror = MLPNNerror+1;
    end
end
FFBPNNtestTime = toc;
disp(['MLPNN testing time :', num2str(FFBPNNtestTime)]);
disp(['MLPNN errors       :', num2str(MLPNNerror)]);
figure;
plotperform(tr); % plots number of epoch vs. mean squared error
set(gca,'FontSize',14,'FontWeight','bold');
figure;
plottrainstate(tr); % plot training state: validation fail, mu, gradient
figure
e = testTarget' - ar;
ploterrhist(e,'bins',20);
figure
plotregression(testTarget',ar,'regression');
% GRNN classification
clc
tic
format compact;
[cfruits,ctxt,craw] = xlsread('centralMatlabBMSP.xlsx');
% extract sensor responses from the complete dataset 'cfruits'
TGS2612 = cfruits(:,2);
TGS821 = cfruits(:,3);
TGS822 = cfruits(:,4);
TGS813 = cfruits(:,5);
TGS2602 = cfruits(:,6);
TGS2603 = cfruits(:,7);
TGS2620 = cfruits(:,8);
TGS2610 = cfruits(:,9);
S = [TGS2612,TGS821,TGS822,TGS813,TGS2602,TGS2603,TGS2620,TGS2610];
% random index matrix to select training, validation, and test data randomly
rIndex = [];
for c = 0:11
    rIndex = [rIndex; randsample(20*c+1:20*c+20,20)'];
end
trainIndex = []; valIndex = []; testIndex = [];
for c = 0:11 % 14 training, 3 validation, and 3 test samples per class
    trainIndex = [trainIndex; rIndex(20*c+(1:14))];
    valIndex = [valIndex; rIndex(20*c+(15:17))];
    testIndex = [testIndex; rIndex(20*c+(18:20))];
end
trainData = S(trainIndex,:); % training dataset of the sensor
trainTarget = cfruits(trainIndex,10); % prepare target matrix
valData = S(valIndex,:); % select data from full sensor data matrix
valTarget = cfruits(valIndex,10);
testData = S(testIndex,:); % select data from full sensor data matrix
testTarget = cfruits(testIndex,10); % targets of test data
valError = length(valData); % start above the threshold
gSig = 0.01; % initialize sigma
optSig = gSig;
while (valError > 1)
    valError = 0;
    % each experiment is a column, each target is a column
    grnnModel = newgrnn(trainData',trainTarget',gSig);
    valClass = round(grnnModel(valData')); % output class
    for i = 1:length(valData) % count validation errors
        if valTarget(i)~=valClass(i)
            valError = valError+1;
        end
    end
    optSig = gSig; % remember the last sigma tried
    gSig = gSig + 0.01;
end
grnnTraintime = toc; % Training time count end
grnnModel = newgrnn(trainData',trainTarget',optSig); % retrain with the selected sigma
tic % testing time count begin
oClass = grnnModel(testData'); % output class
detectClass = round(oClass)'; % class value rounded
error = 0;
for i = 1:length(testData) % count test errors
    if testTarget(i)~=detectClass(i)
        error = error+1;
    end
end
grnnTesttime = toc;
% Display results
disp(['Number of errors = ', num2str(error)]);
disp(['GRNN train time = ', num2str(grnnTraintime)]);
disp(['GRNN test time = ', num2str(grnnTesttime)]);
% RBFNN classification
clc
tic
format compact;
% import dataset
[cfruits,ctxt,craw] = xlsread('centralMatlabBMSP.xlsx');
TGS2612 = cfruits(:,2);
TGS821 = cfruits(:,3);
TGS822 = cfruits(:,4);
TGS813 = cfruits(:,5);
TGS2602 = cfruits(:,6);
TGS2603 = cfruits(:,7);
TGS2620 = cfruits(:,8);
TGS2610 = cfruits(:,9);
% Prepare training, validation, and test datasets
S = [TGS2612,TGS821,TGS822,TGS813,TGS2602,TGS2603,TGS2620,TGS2610];
rIndex = [];
for c = 0:11
    rIndex = [rIndex; randsample(20*c+1:20*c+20,20)'];
end
trainIndex = []; valIndex = []; testIndex = [];
for c = 0:11 % 14 training, 3 validation, and 3 test samples per class
    trainIndex = [trainIndex; rIndex(20*c+(1:14))];
    valIndex = [valIndex; rIndex(20*c+(15:17))];
    testIndex = [testIndex; rIndex(20*c+(18:20))];
end
trainData = S(trainIndex,:);
trainTarget = cfruits(trainIndex,10);
valData = S(valIndex,:);
valTarget = cfruits(valIndex,10);
testData = S(testIndex,:);
testTarget = cfruits(testIndex,10);
% calculate phi matrix
sqrError = 20;
errorThr = 0;
k = 0;
gma = 0.5; % spreading factor
phiMatrix = zeros(length(trainTarget),length(trainTarget));
while sqrError > errorThr
    for i = 1:size(trainData,1)
        for j = 1:size(trainData,1)
            phiMatrix(i,j) = exp(-gma*norm(trainData(i,:)-trainData(j,:)));
        end
    end
    W = (phiMatrix\trainTarget);
    % calculate the training error
    hx = phiMatrix*W;
    sqrError = norm(hx-trainTarget);
    if sqrError > errorThr
        gma = gma+0.01;
    else
        break
    end
    k = k+1;
    if k == 90 % maximum number of epochs
        break;
    end
end
RBFtrainTime = toc
% Classify test data
tic
phiMatrixd = zeros(size(testData,1),size(trainData,1)); % generate phi matrix
for i = 1:size(testData,1)
    for j = 1:size(trainData,1)
        phiMatrixd(i,j) = exp(-gma*norm(testData(i,:)-trainData(j,:)));
    end
end
hxd = phiMatrixd*W;
hxdr = round(hxd)
RBFtestTime = toc;
disp(['RBF training time: ', num2str(RBFtrainTime)]);
disp(['RBF testing time: ', num2str(RBFtestTime)]);
% Count RBF errors
RBFerror = 0;
for i = 1:length(testData)
    if testTarget(i)~=hxdr(i)
        RBFerror = RBFerror+1;
    end
end
disp(['RBF errors : ', num2str(RBFerror)]);
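The core of the RBF listing above is a Gaussian kernel matrix Phi with entries exp(-gamma * ||x_i - x_j||), whose weights solve Phi w = t. A minimal Python sketch of that computation, on a toy two-point problem (all values here are illustrative, not from the thesis data):

```python
import math

def rbf_matrix(A, B, gamma):
    """Phi[i][j] = exp(-gamma * ||A[i] - B[j]||), as in the listing."""
    return [[math.exp(-gamma * math.dist(a, b)) for b in B] for a in A]

# Two training points, targets 1 and 2; gamma fixed for the sketch.
train = [(0.0, 0.0), (3.0, 3.0)]
targets = [1.0, 2.0]
gamma = 0.5
P = rbf_matrix(train, train, gamma)
# Solve the 2x2 system P w = t directly (Cramer's rule).
det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
w = [(targets[0] * P[1][1] - targets[1] * P[0][1]) / det,
     (P[0][0] * targets[1] - P[1][0] * targets[0]) / det]
# A test point near the second centre should round to class 2.
phi_test = rbf_matrix([(2.9, 3.1)], train, gamma)[0]
pred = round(sum(p * wi for p, wi in zip(phi_test, w)))
print(pred)  # 2
```

The MATLAB code does the same with `phiMatrix\trainTarget` for the solve and `phiMatrixd*W` for prediction; rounding the real-valued output to the nearest class label matches the `round(hxd)` step.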
% LDA classification
clc
tic
format compact;
% import dataset
[cfruits,ctxt,craw] = xlsread('centralMatlabBMSP.xlsx');
% prepare training and test data
TGS2612 = cfruits(:,2);
TGS821 = cfruits(:,3);
TGS822 = cfruits(:,4);
TGS813 = cfruits(:,5);
TGS2602 = cfruits(:,6);
TGS2603 = cfruits(:,7);
TGS2620 = cfruits(:,8);
TGS2610 = cfruits(:,9);
S = [TGS2612,TGS821,TGS822,TGS813,TGS2602,TGS2603,TGS2620,TGS2610];
rIndex = [];
for c = 0:11
    rIndex = [rIndex; randsample(20*c+1:20*c+20,20)'];
end
% training with two classes (ripe sapodilla and ripe pineapple)
trainIndex = [rIndex(20*7+1:20*7+14,:); rIndex(20*10+1:20*10+14,:)];
% testing with three classes to measure false classification performance
testIndex = [rIndex(20*4+1:20*5,:); rIndex(20*7+15:20*8,:); rIndex(20*10+15:20*11,:)];
trainData = S(trainIndex,:);
trainTarget = [ones(14,1); ones(14,1)*2];
testData = S(testIndex,:);
testTarget = [ones(20,1)*3;ones(6,1);ones(6,1)*2];
% training an LDA
LinClassifier = fitcdiscr(trainData,trainTarget);
% retrieve the coefficients of the linear boundary between the two trained classes
wK12 = LinClassifier.Coeffs(1,2).Const;
wL12 = LinClassifier.Coeffs(1,2).Linear;
% Plot the curve K + [x1,x2]*L = 0:
f = @(x1,x2) wK12 + wL12(1)*x1 + wL12(2)*x2;
h3 = ezplot(f,[0 3.5 0.5 3.5]);
h3.Color = 'k';
h3.LineWidth = 2;
hold on
plot(trainData(1:14,3),trainData(1:14,4),'ko')
hold on
plot(trainData(15:28,3),trainData(15:28,4),'ks')
xlabel('TGS 2602')
ylabel('TGS 813')
title('{\bf Fisher Linear Classification}')
% Test data classification
LDAtrainTime = toc;
tic
g = zeros(size(testData,1),2);
falseDetection = 0;
LDAerrors = 0;
for i = 1:size(testData,1)
    g12(i) = wL12'*(testData(i,:) - (mean(trainData(1:14,:))+mean(trainData(15:28,:)))/2)';
    if g12(i) >= 0
        g(i,1) = g(i,1)+1;
    else
        g(i,2) = g(i,2)+2;
    end
    % count false detections and misclassification errors
    if (i <= 20) && ((g(i,1) == 1) || (g(i,2) == 2))
        falseDetection = falseDetection+1;
    elseif (i > 20) && (i <= 26) && (g(i,1) ~= 1)
        LDAerrors = LDAerrors + 1;
    elseif (i > 26) && (g(i,2) ~= 2)
        LDAerrors = LDAerrors + 1;
    end
end
disp(['LDA training time:', num2str(LDAtrainTime)])
LDAclassificationTime = toc;
disp(['LDA testing time :', num2str(LDAclassificationTime)]);
disp(['LDA false classification errors:',num2str(falseDetection)]);
disp(['LDA misclassification errors :',num2str(LDAerrors)]);
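The decision rule in the LDA listing reduces to the sign of w'(x - (m1+m2)/2), where m1 and m2 are the two class means. A small Python sketch of that rule, with hypothetical coefficients and means (not values from the thesis):

```python
def lda_side(x, w, m1, m2):
    """Sign of w . (x - midpoint) decides the class, as in the listing's g12."""
    mid = [(a + b) / 2 for a, b in zip(m1, m2)]
    g = sum(wi * (xi - mi) for wi, xi, mi in zip(w, x, mid))
    return 1 if g >= 0 else 2

w = [1.0, 1.0]            # hypothetical linear coefficients
m1, m2 = [2.0, 2.0], [0.0, 0.0]  # hypothetical class means
a = lda_side([1.8, 2.1], w, m1, m2)
b = lda_side([0.2, 0.1], w, m1, m2)
print(a, b)  # 1 2
```

Points on the m1 side of the midpoint hyperplane get class 1, the rest class 2; the MATLAB loop additionally counts samples from an untrained third class as false detections because a pure hyperplane always assigns one of the two trained classes.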
% MMM classification
% clear all
format compact;
tic
% import dataset
[cfruits,ctxt,craw] = xlsread('centralMatlabBMSP.xlsx');
% prepare training and test datasets
TGS2612 = cfruits(:,2);
TGS821 = cfruits(:,3);
TGS822 = cfruits(:,4);
TGS813 = cfruits(:,5);
TGS2602 = cfruits(:,6);
TGS2603 = cfruits(:,7);
TGS2620 = cfruits(:,8);
TGS2610 = cfruits(:,9);
S = [TGS2612,TGS821,TGS822,TGS813,TGS2602,TGS2603,TGS2620,TGS2610,cfruits(:,10)];
%% This section should be activated for outlier exclusion
ammend = 0;
nofClass = 12;
skp = [4 5 6]; % skip three classes for false classification testing
for i = 1:nofClass
    if ismember(i,skp)
        continue
    else
        X = S(20*(i-1)+1:20*i,1:9);
        pDistM = squareform(pdist(X(:,1:8))); % pairwise distances
        distM = sum(pDistM,2)/19; % the diagonal elements are zeros
        indices = distM < (mean(distM)+3*std(distM)); % keep points within 3 sigma
        remainingPoints = X(indices,:); % exclude outliers if needed
        classSizes(i) = size(remainingPoints,1);
        newS(ammend+1:ammend+size(remainingPoints,1),:) = remainingPoints;
        recordIndices(:,i) = indices;
        ammend = ammend + size(remainingPoints,1);
    end
end
%% training and test data
rIndex = [randsample(1:19,19)';randsample(20:38,19)';randsample(39:58,20)'; ...
randsample(59:77,19)';randsample(78:97,20)';randsample(98:117,20)'; ...
randsample(118:136,19)';randsample(137:155,19)';randsample(156:175,20)'];
trainIndex = [rIndex(1:13,:);rIndex(20:32,:);rIndex(39:52,:);...
rIndex(59:71,:);rIndex(78:91,:);rIndex(98:111,:);...
rIndex(118:130,:);rIndex(137:149,:);rIndex(156:169,:)];
testIndex = [rIndex(14:19,:);rIndex(33:38,:);rIndex(53:58,:);...
rIndex(72:77,:);rIndex(92:97,:);rIndex(112:117,:);...
rIndex(131:136,:);rIndex(150:155,:);rIndex(170:175,:)];
trainData = newS(trainIndex,1:8);
trainTarget = newS(trainIndex,9);
testData = newS(testIndex,1:8);
testTarget = newS(testIndex,9);
%% find the centers i.e. mean vectors, maximum, and minimum matrices
nofClass = 9;
nDimen = size(newS,2)-1
classMeans = zeros(nofClass,nDimen);
classMins = zeros(nofClass,nDimen);
classMaxs = zeros(nofClass,nDimen);
for m = 1:nofClass
switch m
case 1
clStart = 1; clEnd = 13;
case 2
clStart = 20; clEnd = 32;
case 3
clStart = 39; clEnd = 51;
case 4
clStart = 59; clEnd = 71;
case 5
clStart = 78; clEnd = 90;
case 6
clStart = 98; clEnd = 110;
case 7
clStart = 118; clEnd = 130;
case 8
clStart = 137; clEnd = 149;
case 9
clStart = 156; clEnd = 168;
otherwise disp('Error in number of classes.');
end
classMeans(m,:) = mean(newS(clStart:clEnd,1:8)); % mean vectors
classMins(m,:) = min(newS(clStart:clEnd,1:8)); % minimum vectors
classMaxs(m,:) = max(newS(clStart:clEnd,1:8))+std(newS(clStart:clEnd,1:8))*1.1; % maximum vectors
end
% find the centers i.e. mean vectors, maximum, and minimum matrices ends %
trainTime = toc;
disp(['MMM train time:',num2str(trainTime)]);
%% Classification of test data by max min boundary begins here %
tic
newData = testData;
newdataClass = zeros(nofClass,size(newData,1));
for i = 1:size(newData,1)
    for j = 1:nofClass % class index
        maxSum = sum(newData(i,:) <= classMaxs(j,:));
        minSum = sum(newData(i,:) >= classMins(j,:));
        if maxSum >= 8 && minSum >= 8 % inside the class box on all 8 sensors
            newdataClass(j,i) = j; % first index is class, i is test data index
        else
            newdataClass(j,i) = 0;
        end
    end
end
%% Break ties
for i = 1:size(newdataClass,2)
    nTie = nnz(newdataClass(:,i)); % number of candidate classes
    if nTie > 1
        [row,col,val] = find(newdataClass(:,i)); % find tied classes
        distance = zeros(size(val,1),1);
        for j = 1:size(val,1)
            distance(j) = norm(classMaxs(val(j),:)-newData(i,:));
        end
        [minDist,index] = min(distance);
        for j = 1:size(val,1) % keep only the nearest class
            if (minDist < distance(j))
                newdataClass(row(j),i) = 0;
            end
        end
    end
end
%% calculate errors
misClassError = 0;
falseError = 0;
for i = 1:size(newdataClass,1) % loop over classes
    for j = 6*(i-1)+1:6*i % six test samples per class
        if newdataClass(i,j) == 0
            misClassError = misClassError + 1;
        elseif newdataClass(i,j) ~= i
            falseError = falseError + 1;
        end
    end
end
testTime = toc;
disp(['MMM test time:',num2str(testTime)]);
disp(['MMM mis classification errors :',num2str(misClassError)]);
disp(['MMM false classification errors:',num2str(falseError)]);
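The max-min (MMM) classifier above accepts a sample for class j only when all 8 sensor readings fall inside that class's [min, max] box; a sample matching no box is rejected rather than forced into a class. A compact Python sketch of that membership test, with hand-picked illustrative bounds (not thesis values):

```python
def mmm_class(x, mins, maxs):
    """Return the classes whose [min, max] box contains x on every sensor."""
    hits = []
    for c, (lo, hi) in enumerate(zip(mins, maxs), start=1):
        if all(l <= v <= h for v, l, h in zip(x, lo, hi)):
            hits.append(c)
    return hits

# Two classes with illustrative per-sensor bounds (2 sensors for brevity).
mins = [[0.0, 0.0], [2.0, 2.0]]
maxs = [[1.0, 1.0], [3.0, 3.0]]
r1 = mmm_class([0.5, 0.5], mins, maxs)
r2 = mmm_class([2.5, 2.9], mins, maxs)
r3 = mmm_class([1.5, 1.5], mins, maxs)
print(r1, r2, r3)  # [1] [2] []
```

The empty result for the third point is the property the thesis exploits for false-alarm reduction: unknown odors fall outside every trained box. When a point lands in more than one box, the MATLAB listing breaks the tie by distance to the class maximum vector.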