DOCTORAL THESIS
Classification of materials through the integration of spectral and spatial features
from hyperspectral data.
Submitted by:
Mr. Artzai Picón Ruiz
In fulfillment of the degree of Doctor granted by the University of the Basque Country.
Directed by:
Dr. Pedro Mª Iriondo Bengoa.
Bilbao, 2008.
Acknowledgments:

The compilation and drafting of a Doctoral Thesis is a personal journey that begins with great joy and uncertainty, is not free of difficulties, and cannot be undertaken alone. Although not all of the people who have actively or passively contributed to the development of this Thesis can be named in this section, I would like to express my gratitude to those who have been key to its completion.

First, I would like to thank the Director of this Thesis, Doctor Pedro Mª Iriondo Bengoa, not only for all his support and help in its drafting, but also for his friendship and for having been, together with my school Physics teacher, Bro. Mariano Gutiérrez, and my father, the driving force that instilled in me a preference for science and research.

Second, I would like to express my gratitude to the members of CIPA (Centre for Image Processing and Analysis) of Dublin City University, especially to its Chair, Professor Paul F. Whelan, who allowed me to undertake a one-year research stay in the centre, and to Doctor Ovidiu Ghita for all his invaluable help.

I would also like to acknowledge the considerable support and commitment of Tecnalia Corporación Tecnológica, and specifically the director of the Tecnalia-Infotech unit, Mrs. Ana Ayerbe Fernández-Cuesta, and its technical director, Mrs. Silvia Rentería, for all the support, both logistical and personal, provided during the undertaking of this Thesis.

At the same time, I wish to thank the members of the consortium of the European project SORMEN, in which I participate as a researcher, for granting me the right to use the hyperspectral images employed in the experimental validation of the concepts and methodologies presented in this Thesis. I also want to thank the ETORTEK programme of the Basque Government for the funding granted for the research stay at CIPA and for the drafting of this Thesis.

Additionally, I would like to thank my departmental colleagues (Tecnalia-Infotech), as well as my colleagues in Dublin, for the backing and support provided during the drafting of this Thesis.

Last, I would like to thank my family and friends, especially my parents and brother, for all their support throughout my life and for having made me who I am, and Sonia, for her considerable support and understanding during the final stages of this Thesis, and because beside her, problems seem smaller.
Abstract - Greyscale images and those based on the standard RGB colour representation capture diverse properties of the objects or materials they contain, making their visual separation possible through image processing techniques. However, these materials sometimes bear such similarities in appearance, shape and/or colour that their visual classification becomes unfeasible. In contrast, hyperspectral images provide extensive information about the luminous spectrum reflected by each element of the image. This characterises their molecular properties and allows more elaborate models to be defined that provide greater precision in the classification. Despite these advantages, small variations in chemical composition and/or the high variability between materials belonging to the same class sometimes make it impossible to obtain a robust classification from spectral features used in a simplistic manner.

To address this problem, this work sets out a methodology which allows, in the first place, the optimal reduction of the high spectral dimensionality through the construction of spectral fuzzy sets bioinspired by the functioning of the cones of the human visual system. These fuzzy sets minimise the redundant information between adjacent bands of the spectrum while maximising its discriminating power, in a similar manner to a "multispectral eye". Additionally, the spectral and spatial features of the elements of the image are integrated, yielding a combined descriptor that characterises the properties of the elements contained in the image more precisely. The theoretical classification model has been validated using samples of materials for recycling from waste electrical and electronic equipment (WEEE). The results show an increase in the classification rate from 44% using only colour information, and 56% using spectral information through classical methods, up to 98% through the extraction and integration of the proposed spectral-spatial features.
Key Words - Hyperspectral image processing, image segmentation, image classification,
integration of spectral-spatial data, image processing, classification of materials, recycling,
bioinspired systems.
Index
Chapter I: Introduction
1. Aim of the Thesis
2. Content of the Thesis
Chapter II: Hyperspectral images for the classification of materials
1. Colour Theory
2. Acquisition and representation of hyperspectral images
3. Issues on the application of hyperspectral images for the classification of materials
4. Conclusion
Chapter III: Classification methods
1. Classical metrics
2. Classifiers
2.1. Similarity classifiers
2.1.1. Nearest Neighbour 1-NN
2.1.2. Nearest mean
2.1.3. Vector Quantization VQ
2.2. Statistical classifiers
2.2.1. Probability Theory
2.2.2. Parametric methods
2.2.2.1. Gaussian distribution
2.2.2.2. Estimation of the parameters of a Gaussian distribution from N observations
2.2.2.3. Application of Bayes' classifier to Gaussian distributions
2.2.2.4. Gaussian mixture
2.2.2.4.1. Estimation of the parameters of a Gaussian mixture distribution from N observations
2.2.3. Non-parametric methods
2.2.3.1. Partition models on histograms
2.2.3.2. K-nearest neighbour
2.3. Classifiers based on the calculation of decision boundaries
2.3.1. Perceptron
2.3.2. Multilayer perceptron
2.4. Combination of classifiers
3. Conclusions
Chapter IV: Feature extraction in hyperspectral vectors
1. Feature extraction
2. Feature selection
2.1. Automatic feature selection
2.2. Selection and extraction of known discriminant features
3. Conclusions
Chapter V: Extraction of spectral features based on fuzzy sets bioinspired by the human visual system
1. Definition of fuzzy sets
2. Spectral fuzzy sets
3. Multi-frequency spectral fuzzy sets
4. Conclusions
Chapter VI: Integration of spectral and spatial features
1. Fuzzy spatial histograms
1.1. Improvement in the quantization of the histogram
1.2. Definition of the fuzzy neighbourhood histogram
1.3. Definition of the fuzzy region histogram
2. Extension of the fuzzy spatial histograms to vectorial features. Spectral-spatial histograms
2.1. Quantization of the feature vector
2.2. Definition of fuzzy vectorial histograms
3. Conclusions
Chapter VII: Classification of spectral images and region analysis
1. General description of the classification process
2. Classification of hyperspectral images
2.1. Image acquisition and lighting correction
2.2. Independence from the lighting source
2.2.1. Independence from the geometric coefficient
2.3. Decorrelation of the luminous spectrum
2.3.1. RGB
2.3.2. RAW
2.3.3. Principal Component Analysis (PCA)
2.3.4. Fisher's linear discriminant
2.3.5. Spectral fuzzy sets
2.4. Integration of spectral-spatial features
2.5. Classification procedure
3. Analysis and merging of regions
3.1. Region of maximum likelihood
3.2. Normalised region histogram
4. Conclusions
Chapter VIII: Results
1. Description of the data sample
2. Background identification
3. Influence of lighting correction on the classification of materials
4. Decorrelation of the luminous spectrum
5. Integration of spectral and spatial features
6. Methods of region merging
7. Conclusions
Chapter IX: Conclusions, contributions and future work
1. Conclusions
2. Contributions
3. Future work
References
Chapter I
Introduction
Nowadays, sustainable development has become one of the most important goals of modern societies. Present-day society fabricates, uses and discards a great quantity of materials in their most diverse forms and varieties, thus generating great quantities of waste. This waste is often not biodegradable and can even be highly toxic. Despite institutional efforts, much of this waste cannot be efficiently sorted and, as a consequence, is deposited in traditional landfills without undergoing any type of recycling. The scientific community therefore has the task of providing solutions and alternatives that allow its correct sorting.

Within the great variety of existing waste, waste from electrical and electronic equipment (WEEE) deserves particular mention. This waste, which comes from very varied products, is made up of a great variety of materials with great potential for recycling. This has made the recovery of WEEE one of the most complex industrial tasks.

In recent years, environmental legislation relating to the recycling of WEEE has made finding new solutions for the recovery of these materials an obligation. For example, the Directive of the European Commission on the recycling of waste electrical and electronic equipment (WEEE) [CE_02] establishes that Member States shall recover between 70-80% of the weight of the waste produced and shall re-use between 50-70% of the recovered materials and components. This regulation underscores the need to devote greater efforts to the development of new techniques and technologies capable of improving the performance of the methods applied to waste sorting.

In order to achieve these objectives, the development of automated systems for the recycling of electrical and electronic waste becomes an economical and efficient alternative.

Specifically, the current WEEE recycling process subjects this electronic waste to crushing and to both mechanical and densimetric sorting. However, the fractions resulting from this separation still contain a mix of non-ferrous materials (for example, aluminium, copper, zinc, brass or lead) and austenitic stainless steel, which represents 13% of the total scrap from WEEE. It is important to highlight that this mix cannot be sorted using
current recycling methods [SORM_06] [BERE_07]. Moreover, the toxicity of some of these materials makes finding solutions for their correct sorting and recycling even more critical.
Figure 1. Set of non-ferrous materials from electronic waste.
The methods traditionally employed for sorting these materials rely on visual inspection by highly skilled operators [SPEN_05]. Building on these methods, Kutila et al. [KUTI_05] developed a colour-based inspection system applied to separating predominantly reddish metals from shinier metals, such as aluminium and zinc. The results indicated that materials with reddish properties could be separated from shiny materials; however, inadequate results were obtained when trying to sort materials within the same colour group.

Methods based on X-rays, on the other hand, have been widely used for sorting metallic scrap and separating plastics [SOMM_06], [WAHA_06]. However, they are not suited to sorting materials with similar properties because they only measure the density of the material [SPEN_05].

In fact, the only methods that have achieved greater efficiency in the sorting of metals are those based on detecting and analysing the spectrum emitted under thermal excitation. However, these approaches cannot be used in recycling processes, owing both to the slow spectrum-acquisition process and to the technical difficulty of exciting each and every fraction of scrap to be classified and sorted.
For their part, modern imaging spectrometers provide high-resolution images with detailed spectral information. This technology involves the acquisition and interpretation of multidimensional
digital images. Current systems are capable of acquiring multiple bands, from the ultraviolet to the very long-wave infrared, with good spectral resolution between bands. This versatility allows spectral imaging systems to be applied to the detection and sorting of several classes of materials, both natural and man-made, such as minerals, metals, plastics and vegetation, and even to tasks such as cell differentiation [WABA_06, CHANG_03, TSO_04, SPEC_08].
The main characteristic of hyperspectral images is that each pixel is defined by a vector whose elements correspond to the different spectral components (wavelengths) acquired from the scene. In this manner, the hyperspectral vector provides not only colour information associated with the scene, but also information related to the molecular behaviour of the materials present in it [CHANG_03, GRAH_07, SLAT_99, HEAL_99], thus facilitating their sorting.
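This pixel-as-vector structure can be pictured as a three-dimensional data cube. The short sketch below (Python with NumPy, using synthetic values rather than a real spectrometer capture) illustrates how each pixel of such a cube is a full spectral vector:

```python
import numpy as np

# A hyperspectral image can be stored as a 3-D array ("cube"):
# two spatial axes (rows, cols) plus one spectral axis (bands).
# The values here are synthetic; a real cube would come from an
# imaging spectrometer.
rows, cols, bands = 4, 5, 64
cube = np.random.default_rng(0).random((rows, cols, bands))

# Each pixel is a spectral vector: its reflectance at every
# acquired wavelength.
pixel_spectrum = cube[2, 3, :]

# An RGB image would keep only 3 values per pixel; the full
# 64-element vector also carries spectral structure that helps
# distinguish materials of similar colour.
assert pixel_spectrum.shape == (bands,)
```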
However, the high variability of these materials can render even spectral information insufficient for robust identification and classification. This makes it necessary to develop new methods that correctly characterise these materials so that they can be properly recycled.
1. Aim of the Thesis
To resolve the problem described above, one of the main aims of this Thesis is to establish a complete framework for the classification of materials which bear great similarity and variability. These properties mean that traditional colour-based methods are incapable of correctly classifying such materials, and even the use of spectral information for this task does not achieve the desired results.

Nonetheless, the approach to this problem is not focused solely on the analysis of the spectral properties of materials and their optimal classification, but also on the development of a generic theoretical framework that integrates the spectral and spatial features contained in hyperspectral images into a single mathematical descriptor.

By integrating the spectral and spatial properties of each element of an image into a single descriptor (feature vector), the aim is to improve the efficiency of the characterisation, segmentation and classification of these images compared with the isolated use of either set of properties.
In order to achieve an optimal spectral-spatial integration, the different existing spectral decorrelation techniques are studied in depth, thus providing an appropriate solution to the problem of feature extraction.

Additionally, using the generic spectral-spatial descriptor as a starting point, the aim is to create an adequate mathematical model that will detect and separate different materials while taking their inherent variability into account.

Finally, the aim is to integrate the above models into a complete framework that allows the acquisition, processing, detection and classification of materials in a robust and computationally efficient manner, in order to obtain a modular algorithm compatible with the constraints of an industrial application (robustness, tuning cost, speed). In this manner, the resulting algorithm can be integrated in a simple manner into industrial sorting processes.
In parallel with this Thesis, the European project SORMEN (Innovative Separation Method for Non-Ferrous Metal Waste from Electric and Electronic Equipment (WEEE) based on Multi- and Hyper-spectral Identification) [SORM_06] is developing an industrial system for the sorting of electronic waste (WEEE). Given the great difficulty of classifying these elements, the validation of the methodologies proposed in this Thesis is carried out through the classification of sets of electronic waste provided by recycling companies belonging to the project consortium.
2. Content of the Thesis
This Thesis is divided into nine chapters which detail the different aspects dealt with in this work.

After the introductory chapter, Chapter II describes the physical fundamentals and the methodology employed for obtaining hyperspectral images, highlighting the advantages of this type of image for the classification of materials over traditional colour techniques. This chapter also details the traditional classification procedure based on hyperspectral images and lists some of the limitations of current methods which must be improved.

Chapter III sets out the state of the art of existing classification techniques. First, the classical metrics used to quantify the differences between two given spectra are presented; next, the different classification techniques used to build robust classification models are described.
Chapter IV sets out the problems involved in obtaining an adequate classifier given the high dimensionality associated with luminous spectra, and describes the Hughes phenomenon caused by this high dimensionality. To reduce its effect, current feature-reduction techniques are analysed, both those that select and those that extract the features which allow an adequate reduction of data dimensionality while maintaining its discriminating power, and their limitations are studied.
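As an illustration of the kind of feature-extraction technique surveyed in that chapter, the sketch below applies principal component analysis (PCA) to reduce spectral dimensionality. This is a minimal Python/NumPy sketch of the classical method only, on synthetic data, not the approach proposed in this Thesis:

```python
import numpy as np

def pca_reduce(spectra: np.ndarray, n_components: int) -> np.ndarray:
    """Project spectra (n_samples, n_bands) onto their first principal
    components: a classical way to reduce the spectral dimensionality
    that gives rise to the Hughes phenomenon."""
    centred = spectra - spectra.mean(axis=0)
    # Eigen-decomposition of the band covariance matrix.
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]        # largest variance first
    basis = eigvecs[:, order[:n_components]]
    return centred @ basis

# Example: 200 synthetic spectra of 64 bands reduced to 8 features each.
rng = np.random.default_rng(3)
spectra = rng.random((200, 64))
reduced = pca_reduce(spectra, 8)
assert reduced.shape == (200, 8)
```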
Chapter V, in turn, looks at the limitations of current feature-extraction methods. It lists the desirable properties that optimal feature vectors should satisfy in order to build an adequate descriptor of the luminous spectrum. Based on these properties, a novel method is proposed for the extraction of spectral features through the fuzzification of the spectrum (its division into fuzzy sets) and the use of the Energy associated with each of the fuzzy sets defined in it. This approach extracts visual information in a manner similar to the human visual system.
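The idea can be sketched as follows. Evenly spaced triangular fuzzy sets stand in here for the bioinspired sets of Chapter V (which are modelled on the cone responses, not on the uniform spacing assumed below), and each set contributes one membership-weighted energy feature:

```python
import numpy as np

def triangular_memberships(n_bands: int, n_sets: int) -> np.ndarray:
    """Evenly spaced triangular fuzzy sets over the band axis.
    Returns an (n_sets, n_bands) matrix; row k gives the membership
    degree of each band in fuzzy set k. Illustrative only."""
    centres = np.linspace(0, n_bands - 1, n_sets)
    width = centres[1] - centres[0]
    band_idx = np.arange(n_bands)
    mu = 1.0 - np.abs(band_idx[None, :] - centres[:, None]) / width
    return np.clip(mu, 0.0, 1.0)

def fuzzy_spectral_features(spectrum: np.ndarray, mu: np.ndarray) -> np.ndarray:
    """Reduce a spectrum to one 'energy' value per fuzzy set: the
    membership-weighted energy of the spectrum under each set."""
    weighted = mu * spectrum[None, :] ** 2        # weight squared signal
    return weighted.sum(axis=1) / mu.sum(axis=1)  # normalise per set

# Example: a synthetic 64-band spectrum collapsed to 6 features.
rng = np.random.default_rng(1)
spectrum = rng.random(64)
mu = triangular_memberships(64, 6)
features = fuzzy_spectral_features(spectrum, mu)
assert features.shape == (6,)
```

Because adjacent triangles overlap, each feature pools information from neighbouring, highly correlated bands, which is the redundancy-reduction effect the chapter describes.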
In turn, Chapter VI proposes and theoretically describes a methodology that unifies spectral properties (the luminous spectrum) and spatial properties (the distribution of neighbouring features) in a single feature vector.
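A simplified, hard-histogram version of such a unified descriptor can be sketched as follows (the Thesis uses fuzzy histograms; here the per-pixel spectral features and quantization labels are synthetic, and the window size and bin count are arbitrary illustrative choices):

```python
import numpy as np

def spectral_spatial_descriptor(feat_img: np.ndarray, labels: np.ndarray,
                                r: int, c: int, win: int, n_bins: int) -> np.ndarray:
    """Single descriptor for pixel (r, c): its own spectral feature
    vector concatenated with the histogram of quantized features in a
    (2*win+1)^2 neighbourhood. Illustrative: a hard histogram, not the
    fuzzy histograms developed in Chapter VI."""
    h, w, _ = feat_img.shape
    r0, r1 = max(0, r - win), min(h, r + win + 1)
    c0, c1 = max(0, c - win), min(w, c + win + 1)
    neigh = labels[r0:r1, c0:c1].ravel()
    hist = np.bincount(neigh, minlength=n_bins).astype(float)
    hist /= hist.sum()                       # normalise the spatial part
    return np.concatenate([feat_img[r, c], hist])

# Example: 6 spectral features per pixel, quantized into 8 clusters.
rng = np.random.default_rng(2)
feat_img = rng.random((10, 10, 6))
labels = rng.integers(0, 8, size=(10, 10))   # e.g. from clustering the features
desc = spectral_spatial_descriptor(feat_img, labels, r=5, c=5, win=2, n_bins=8)
assert desc.shape == (6 + 8,)
```

The concatenation makes a pixel's descriptor depend both on what the pixel is (spectral part) and on what surrounds it (spatial part), which is the integration the chapter formalises.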
Chapter VII provides a complete algorithm that allows the optimum classification of materials
through the use of hyperspectral images while taking into account the proposed methodologies in
Chapter V and VI. At the same time, this chapter defines the statistical model that permits for the
characterisation of each of the elements to be classified. Additionally, methodologies that reduce
the dependence on the chromaticity of the material versus external factors of lighting and
geometry are integrated in this algorithm, proposing diverse options that reduce this dependence
under non-ideal conditions and that, in turn, include a process that is capable of unifying those
erroneously classified regions.
Chapter VIII then provides detailed results on the performance of the different proposed
methodologies, comparing the methods proposed in this Thesis with those of the state of the art.
The experimental verification is undertaken on a test set composed of electrical and electronic
waste that was previously selected and validated by the enterprises participating in the SORMEN
project [SORM_06].
Finally, Chapter IX contains the conclusions of the completed work, providing a summarised
listing of the main contributions and setting out the lines of future work that follow from this
Thesis.
Chapter II
Hyperspectral images for the classification of materials
Sight is one of the most developed senses in human beings. Both humans and animals are capable
of interpreting their environment through the stimuli they perceive, having developed a great
capacity for visual interpretation that allows them to cope freely in the environment that
surrounds them. Of all the senses available to the human being, sight is considered the most
important, allowing the most precise interpretation of one's surroundings, as has been set out by
great geniuses throughout time:
"In adequate circumstances and at the appropriate distance, the eye is tricked less than any other sense, as, I will later
show, it sees through straight lines that form a pyramid whose vertex points at the eye and whose base rests on the object
being watched. Hearing, on the other hand, is often tricked regarding the place and distance of the source of the sound,
as these do not reach it through straight lines, as those of the eyes do, but rather through broken and tortuous waves;
thus, often, some distant voices seem nearer than closer ones, owing to their trajectory; only the echo travels in a
straight line. With even greater difficulty does the smell find the source of a perfume. Taste and touch, for their part,
must touch an object in order to know it."
Leonardo Da Vinci, Treatise on Painting, 1651.
However, one becomes accustomed to visualising the world in the manner in which one does,
without being fully aware of the inherent limitations of the human eye. For example, one cannot
see objects below a certain scale without the use of microscopes or lenses. Additionally, one
cannot process visual information at high speeds.
These are not the only limitations of the human eye. When a ray of sunlight passes through a
prism, it is divided into a rainbow pattern that begins in violet and varies progressively towards
red. At the boundaries of this rainbow, light seems to disappear and fall into darkness. This
darkness is, in fact, only apparent, as there is a luminous emission beyond the boundaries of the
visible spectrum even though the human eye is not capable of detecting it. Although the human
being can only receive visual stimuli in a range of wavelengths between 400 and 700 nanometres,
other animals are capable of sight in the ultraviolet region of the spectrum, thus perceiving the
environment in a manner different to that of the human being.
This can be explained by the properties of light as an electromagnetic wave. Figure 2.1 shows
that solar irradiance as a function of wavelength is at its maximum in the region between 400 and
700 nm, which corresponds to the wavelengths detected by the human eye. This adaptation to the
dominant solar frequencies allows human beings and animals to cope comfortably and efficiently
in their environment.
Fig. 2.1 Spectrum of solar radiation.
However, the human eye is not capable of perceiving wavelengths outside this visible band.
Nonetheless, these non-visible wavelengths provide a great deal of information that would allow
a more detailed view of the environment and of the objects that surround us. High frequencies,
such as X-rays or gamma rays, provide information about the internal structure of different
objects, while infrared frequencies provide information on the molecular interactions that exist in
a specific material, or on the temperature and/or heat flux of a specific object. In the same
manner, many materials that have a similar appearance in the visible range can be distinguished
in other ranges of the spectrum because their luminous properties there are totally different.
Examples of this can be found in the book Alien Vision [RICH_01], which compiles different
perceptions of our surroundings at different wavelengths.
This chapter first sets out theoretically the process of image formation caused by the incidence of
a ray of light on an object, as well as the reasons why a specific material has a characteristic
reflected luminous spectrum. It then analyses the way in which these luminous spectra are
perceived by the human visual system and why it is not capable of capturing all the existing
spectral information.
Next, the different technologies that allow these luminous spectra to be captured, together with
their associated digital representation (the hyperspectral image), are set out in order to establish
the luminous spectrum associated with each point of the image.
Last, a description is provided of the process followed to classify materials through techniques
based on these hyperspectral images. Additionally, we show the different difficulties that must be
faced in order to exploit the information contained in these images, which shall provide a more
precise characterisation of the materials/objects that they represent.
1. Colour Theory
This section aims to provide a general idea of the process of image generation and its physical
properties. Understanding this process allows us to show the reasons why the luminous spectrum
reflected by a material can be an indication of its molecular properties. To this end, a brief review
of colour theory is made, establishing the theoretical basis that explains the particular chromatic
perception of a specific material.
The colour with which one perceives a specific object depends on several factors, not all of
which are dependent on the features of the object. In fact, the perceived colour, or chromatic
perception, depends not only on the chromatic properties of the object, but also on the nature,
intensity and position of the incident light, on the geometry of the object, and on the position and
features of the observing element.
The first factor that influences the perceived colour or spectrum is the nature of the luminous
source. Leonardo Da Vinci, in his Treatise on Painting [VIN_1651], noticed this behaviour,
establishing that the different positions of the sun caused changes in the chromatic perception of
objects and making reference to the blue hues of the morning shadows, which turn to warm
nuances in the evening. In a similar manner, lighting a scene with different types of artificial light
produces different hues depending on the type of lighting used. Thus, halogen or incandescent
lighting produces warmer hues than fluorescent or white-LED lighting, which tinges objects with
a bluish colour. Based on these observations, the nature and features of the luminous source
directly influence the chromatic perception of the observed object.
Another influential factor in the chromatic perception of an object is its geometric features. Its
geometry, as well as its reflective properties (specular, diffuse or matte object), influences the
chromatic perception of the object. These factors do not depend on its molecular composition,
but rather on the relative position between the incident luminous focus and the geometry of the
object; nevertheless, they define the intensity with which the colour is perceived.
In order to characterise an object based on its luminous properties, the most important property is
its chromaticity. It should be borne in mind that it is not the material that possesses the colour;
rather, it is the molecular composition of the material that causes certain wavelengths to be
reflected and absorbed differently. The percentage of absorption or reflection at each wavelength,
known as the chromaticity [TAB_04] or reflectivity [OHNO_00] of the material, depends solely
on the molecular properties of the object, thus allowing its identification based on its colour.
The third factor that influences the perception of colour is related to the sensory element that
receives the reflected light. In this sense, the capabilities of the visual sensor of the observer
significantly influence the quality of the compiled information. The human eye uses a type of
nerve ending known as cones, which are capable of converting the reflected luminous spectrum
into colour information [SANG_98]. There is another type of nerve ending, known as rods,
which allows colourless night vision but cannot capture colour information [SANG_98,
GUNT_92].
This way, for the human being, the capabilities of these photoreceptive cells determine which
features of the reflected luminous spectrum are extracted and interpreted. Specifically, the human
being has three different types of photoreceptive cones. These types of cones, known as S, M and
L, capture luminous information at different wavelengths, as can be seen in Fig. 2.2. By having
three types of sensors, the human eye is capable of perceiving colours made up by the
combination of the signals received by each of the types of colour receptors (cones) within it,
that is, by the combination of three basic colours taken as primary. Due to this, the different
colour systems that represent human vision (RGB, HSL, HSV, CIELab… [YOSH_00, SANG_98,
GUNT_82]) are based on the combination of three basic components.
Fig. 2.2 Absorption frequencies of the different types of cones in the human visual system.
However, the luminous information perceived by the human eye does not encompass all the
information contained in the spectrum reflected by the material. Rather, it only contains the
aggregate information from the spectral response caused by the absorption of the different cones.
In the same manner that a person with a dysfunction in his or her M cones will have difficulty
distinguishing between red and green hues (daltonism), the human eye does not capture all the
information contained in the reflected luminous spectrum, thus losing some of the information
that could be of interest for the characterisation of the perceived objects.
In conclusion, one can assert that the information contained in the spectrum reflected by a
material is related not only to its molecular properties, but also to the nature of the incident
lighting, the geometric properties of the material, and the capabilities of the receptor sensor.
In order to establish whether the information contained in the reflected spectrum of a material
offers more information than that perceived by the human visual system, we shall mathematically
describe the aforementioned model of colour formation and compare the information obtained by
the human visual system with that contained in the reflectance spectrum.
In order to do so, it is necessary to begin by describing the element that causes the existence of
the reflectance spectrum: light. The discussion on the corpuscular or undulatory nature of light
dates back to the 17th century, when Newton proposed the corpuscular theory of light based on
the rectilinear properties of its movement, its reflection and its behaviour when facing obstacles.
However, this theory could explain neither the absence of mass loss when emitting these
corpuscles nor the different refraction and reflection behaviour of different corpuscles. In the
same period, Huygens proposed an undulatory theory of light based on the observation of
different phenomena. However, this theory could not explain the propagation of light in a
vacuum, which gave rise to the theory of the existence of the ether.
In 1865, Maxwell published his mathematical theory of electromagnetism, which predicts the
existence of electromagnetic waves that propagate at the speed of light, the different
electromagnetic waves (light, radio, microwaves...) being considered of the same nature but of
different frequency.
Einstein broadened the available knowledge on light by considering that it is composed of
particles known as photons. These photons, in theory without any mass or electrical charge,
constitute indivisible packets of energy that depend on their frequency in accordance with the
equation:

E = h·ν (2.1)

Where h is Planck's constant and ν the frequency associated with that photon.
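As a purely illustrative numerical sketch of equation 2.1 (the constants are standard physical values; the chosen wavelength is an arbitrary example, not data from this Thesis), the energy of a photon can be computed from its wavelength using ν = c/λ:

```python
# Photon energy E = h·ν (eq. 2.1), with ν = c/λ.
PLANCK_H = 6.626e-34   # Planck's constant (J·s)
LIGHT_C = 2.998e8      # speed of light in vacuum (m/s)

def photon_energy(wavelength_nm):
    """Energy (in joules) of a photon of the given wavelength (nm)."""
    frequency = LIGHT_C / (wavelength_nm * 1e-9)  # ν = c/λ
    return PLANCK_H * frequency

# A 500 nm (green) photon carries roughly 4e-19 J; shorter wavelengths
# (higher frequencies) carry more energy per photon.
print(photon_energy(500.0))
```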
Taking as a basis this model of light, made up of a set of photons associated with different
frequencies, one can define the electromagnetic spectrum of an object as the distribution of the
emitted, reflected or absorbed intensity of energy (depending on the type of spectrum) over a
selected range of wavelengths.
In this manner, an incident ray of light that emits energy at different frequencies is defined by its
incident emission spectrum L_i(λ), which shows the intensity of this incident ray at each of the
associated wavelengths. Figure 2.3 shows a graphical representation of an emission spectrum.
Fig. 2.3 Graphical representation of a luminous spectrum.
First, we begin by explaining the simplest classical models that exist to describe the phenomenon
of the reflection of light. The simplest model for the formation of colour is that known as the
specular model [COOK_81]. In the specular model of reflection, the luminous spectrum is
reflected in a single direction defined by the angle of reflection. This phenomenon is caused by
the different electromagnetic properties of the media in which the luminous wave travels, which
cause a change of direction at the interface between both elements.
Fig. 2.4 Specular model of reflection
In this model, not all the light has to be reflected; part of it can be transmitted through the
material, in accordance with equation 2.2:

L_r(λ) = C_specular(λ)·L_i(λ) (2.2)

Where L_r(λ) corresponds to the luminous intensity reflected at a given wavelength λ, and
C_specular(λ) is the reflection coefficient for that wavelength. This reflection coefficient is
usually considered independent of the wavelength (the neutral interface model [TOMI_94]), so
the same percentage of intensity is reflected irrespective of the wavelength. In this way, the
reflection coefficient reduces to its scalar expression:

L_r(λ) = C_specular·L_i(λ) (2.3)
This reflection model reliably represents the behaviour of polished elements with specular-type
behaviour, in which the light reaching the object is directly reflected according to the laws of
reflection. In diffuse bodies, incident light returns to the initial medium after successive
interactions inside the object, interacting with its colourings. To represent this behaviour, the
Lambertian reflectance model [ANGE_99] is used to characterise the objects known as matte.
This model assumes perfect diffusion and a perfectly homogeneous material in which the
reflected light does not depend on the point of view of the observer, but only on the angle θ
formed by the incident ray of light and the normal to the object, as seen in figure 2.5.
Figure 2.5 Lambertian reflectance model.
Equation 2.4 provides a mathematical description of the features of the reflected light, whose
intensity is maximum for perpendicular incidence on the surface of the object:

L_r(λ) = cos(θ)·C_body(λ)·L_i(λ) (2.4)

In this model, the incident light enters the body of the object, where it is subject to reflection and
refraction phenomena, interacting with the colourings of the material before returning to the
surface. For this reason, the different wavelengths are re-emitted to the surface with different
intensities, which depend on the reflection coefficient or chromaticity of the material, C_body(λ).
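The Lambertian model of equation 2.4 can be sketched numerically as follows; the band chromaticities, the incident spectrum and the angles are invented for illustration and are not data from this Thesis:

```python
import math

def lambertian_reflection(theta_rad, c_body, l_incident):
    """L_r(λ) = cos(θ)·C_body(λ)·L_i(λ) (eq. 2.4), evaluated per wavelength."""
    cos_theta = max(0.0, math.cos(theta_rad))  # no reflection for grazing/back angles
    return [cos_theta * c * li for c, li in zip(c_body, l_incident)]

c_body = [0.9, 0.4, 0.1]   # illustrative chromaticity at three bands
l_inc = [1.0, 1.0, 1.0]    # flat (white) incident spectrum
print(lambertian_reflection(0.0, c_body, l_inc))          # perpendicular: maximum intensity
print(lambertian_reflection(math.pi / 3, c_body, l_inc))  # cos(60°) = 0.5: intensity halved
```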
Neither of the two earlier models provides an acceptable representation of the real reflection that
occurs in the majority of real materials. For this reason, Shafer [SHAF_84] proposed a
dichromatic model of reflection that expands on the previous models, assuming that light
interacts with the interface existing between the medium and the material, but also enters its
body, interacting with the colourings of the material before returning to the surface. This model,
therefore, combines the two earlier models in order to describe the interaction of light faithfully.
Fig. 2.6 Dichromatic model of reflection
The amount of light reflected at the interface is governed by the Fresnel laws, which relate the
reflectance produced in the specular reflection to the angle of incidence, the index of refraction
of the material, and the polarisation of the incident light.
The transmitted light passes through the material interacting with its colourings, with an
absorption probability that depends on the wavelength. The non-absorbed light is re-emitted
through the same interface, creating the diffuse reflection of the object. The geometrical
distribution of this reflection is considered isotropic, non-polarised and usually of a different
colour to the incident light due to the effect of the colourings.
In this manner, the reflected light spectrum perceived by the observer is made up of the sum of
two types of reflection: mirror-like (specular) and diffuse. The first is caused by the different
properties of the medium and the material; the latter is caused by the interaction of light with the
body of the material (equation 2.5). Figure 2.6 shows that the observed lighting vector depends
on several factors, such as the incident lighting vector, the angle of incidence, the geometry of
the object, the position of the observer and the chemical features of the material.
L_r(λ,i,e,g) = L_interface(λ,i,e,g) + L_body(λ,i,g) (2.5)

This equation is expanded in terms of the incident light (L_i), the geometric factors (m_interface,
m_body), and the factors that define the molecular properties of the object (C_interface, C_body):

L_r(λ,i,e,g) = m_interface(i,e,g)·C_interface(λ)·L_i(λ) + m_body(i,g)·C_body(λ)·L_i(λ) (2.6)

Using the neutral interface model, the interface coefficient C_interface does not depend on the
wavelength and provides no information on the chemical composition of the object, so it can be
merged with its geometric factor into a combined coefficient mc_interface:

L_r(λ,i,e,g) = m_interface(i,e,g)·C_interface·L_i(λ) + m_body(i,g)·C_body(λ)·L_i(λ) (2.7)

L_r(λ,i,e,g) = mc_interface(i,e,g)·L_i(λ) + m_body(i,g)·C_body(λ)·L_i(λ) (2.8)
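The dichromatic model of equation 2.6 can be sketched as follows; the numeric values are illustrative placeholders, and the geometric factors m_interface(i,e,g) and m_body(i,g) are assumed to have been already evaluated for one fixed lighting/viewing geometry:

```python
def dichromatic_reflection(m_interface, c_interface, m_body, c_body, l_incident):
    """L_r(λ) = m_interface·C_interface(λ)·L_i(λ) + m_body·C_body(λ)·L_i(λ) (eq. 2.6).

    m_interface and m_body are scalars: the geometric factors evaluated for a
    fixed geometry; the other arguments are per-wavelength lists.
    """
    return [m_interface * ci * li + m_body * cb * li
            for ci, cb, li in zip(c_interface, c_body, l_incident)]

# Neutral interface model (eq. 2.7): C_interface is constant over λ.
c_interface = [1.0, 1.0, 1.0]
c_body = [0.8, 0.3, 0.1]   # illustrative body chromaticity
l_inc = [1.0, 1.0, 1.0]
# A strong specular component (m_interface) washes out the body colour:
print(dichromatic_reflection(0.6, c_interface, 0.4, c_body, l_inc))
```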
The reflection model in conductor materials is not correctly represented by the earlier model. In
these materials, when an incident spectrum interacts with the interface, free electrons may absorb
part of the energy of the incident spectrum at different wavelengths. The non-reflected part of the
spectrum is absorbed and does not penetrate the body of the material to a depth greater than
10⁻⁷ m. This phenomenon explains why in conductor materials there is no reflection due to the
body (diffuse reflection).
The excited electrons release the previously acquired energy, which is re-emitted as photons. The
greater part of this energy is re-emitted as light of the same wavelength, and only a small part is
re-emitted as heat. The typical golden colour of some metals, such as copper and gold, is due to
the fact that part of the spectrum between the blue and ultraviolet ranges is not re-emitted
[MATA_94].
Fig. 2.7 Reflection model on conductor materials
Equation 2.9 defines the calculation of the reflected spectrum in conductor materials:

L_r(λ) = m_conductor(i,e,g)·C_conductor(λ)·L_i(λ) (2.9)
Here, one can notice significant differences between the models of colour formation for
dielectric and conductor materials. The use of diffuse lighting eliminates the specular component
of both incident and reflected rays, causing the same type of response in each of the analysed
materials. Under these conditions, specular reflection becomes diffuse reflection, independent of
the point of view of the observer:

L_r(λ) = m(x,y)·C_material(λ)·L_i(λ) (2.10)
Where m(x,y) is a factor dependent on the geometry of the material that defines the quantity of
light it reflects, C_material(λ) represents the chromaticity or reflectivity of the material, which
depends on its molecular properties, and L_i(λ) is the vector of incident lighting.
Bearing this model in mind, the features of the material defined in its chromaticity vector
C_material(λ) are intrinsically stored in the vector L_r(λ). As will be explained in later chapters,
there are methodologies to eliminate the influence of the incident lighting vector and of the
geometric factor m, so that the chromaticity vector characteristic of the material can be extracted.
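Under the diffuse-lighting model of equation 2.10, the simplest such methodology can be sketched as a normalisation: dividing the reflected spectrum by the known incident spectrum and geometric factor recovers the chromaticity vector. The values below are invented for illustration; this is a minimal sketch, not the procedure developed in later chapters:

```python
def estimate_chromaticity(l_reflected, l_incident, m_geom):
    """Invert eq. 2.10: C_material(λ) = L_r(λ) / (m(x,y)·L_i(λ))."""
    return [lr / (m_geom * li) for lr, li in zip(l_reflected, l_incident)]

c_true = [0.5, 0.25, 0.75]   # illustrative chromaticity of the material
l_inc = [2.0, 4.0, 1.0]      # non-flat incident spectrum
m = 0.8                      # illustrative geometric factor
l_ref = [m * c * li for c, li in zip(c_true, l_inc)]   # forward model (eq. 2.10)
print(estimate_chromaticity(l_ref, l_inc, m))          # recovers c_true
```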
In other words, the reflected vector L_r(λ) estimates the chromaticity vector of a point of an
image and provides us with information on its molecular properties. Fig. 2.8 shows several
reflected spectra L_r(λ) observed through a hyperspectral camera:
Fig. 2.8 Representation of the reflectance spectrum L_r(λ).

Going back to Figure 2.2, which illustrates the sensitivity ranges of the different types of cones in
the human visual system, one can notice that the human being can only make three readings of
the reflected spectrum L_r(λ). If, in addition, at least one degree of freedom must be sacrificed in
order to eliminate the variability of the intensity of the incident lighting vector, then the
chromaticity vector observed by the human being, which establishes the features of the observed
object, is composed of only 2 or 3 components.
In the RGB colour model [WILS_04, GONZ_08], which represents a perception model similar to
that of the human being, the observed reflected spectrum is reduced to the intensities produced at
the frequencies associated with the wavelengths of red, green and blue, as defined by equation
2.11:

L_RGB = [L_rR, L_rG, L_rB] = [R, G, B] (2.11)

As can be inferred, the RGB spectrum observed by the human being does not contain all the
discriminating information contained in the full chromaticity spectrum. This phenomenon has
been verified in numerous works that compare the precision of classification methodologies
based on spectroscopy against colour-based techniques [HOLL_03], showing the advantages of
spectrometric techniques for the classification of materials over those based simply on colour.
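This loss of discriminating information can be illustrated with a toy metamerism sketch: two clearly different reflectance spectra can integrate to exactly the same RGB triple. The block filters below are a crude, invented stand-in for the cone responses of Fig. 2.2, not a colorimetric model:

```python
# Model each RGB channel as the sum of the spectrum over a block of bands
# (a rough stand-in for the three cone/filter responses).
def to_rgb(spectrum):
    """Collapse a 30-band spectrum into [R, G, B] by block integration."""
    r = sum(spectrum[0:10])
    g = sum(spectrum[10:20])
    b = sum(spectrum[20:30])
    return [r, g, b]

flat = [0.5] * 30              # spectrally flat material
peaky = ([0.0, 1.0] * 5) * 3   # alternating 0/1 bands: very different shape
print(to_rgb(flat))            # [5.0, 5.0, 5.0]
print(to_rgb(peaky))           # [5.0, 5.0, 5.0] -- identical RGB triple
print(flat == peaky)           # False -- yet the full spectra clearly differ
```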
2. Acquisition and representation of hyperspectral images
In the previous section, the reflectance spectrum of a material was defined, which depends on the
specific molecular properties of the associated element. Classical spectrographs computed the
average spectrum of all the signals captured by the receptor; that is, the result was a "sum"
spectrum of all the elements present. Current spectral image sensors, however, obtain an image,
associating each spectrum with its corresponding pixel.
This section lists the methods used for the capture and representation of hyperspectral images. To
ease understanding, the section begins with the description of the acquisition and representation
of monochrome and colour images, and finishes with the description of these methods for
hyperspectral images.
Although the first attempts to process digital images were developed for the transfer of newspaper
photographs between London and New York in the 1920s, the first computer-processed digital
images were obtained from the lunar surface by NASA in 1964 in order to choose an adequate
landing area for the Apollo vehicles. Since then, vision sensors have evolved notably.
Currently, an image sensor consists of a matrix of small, perfectly aligned cells. Each cell is
composed of a photosensitive electronic element (CCD) that produces a specific electrical
voltage depending on the quantity of light it receives, and is assigned a specific (x,y) position.
Following the same architecture, a digital image is defined by a matrix of rows and columns that
stores, at each position, a value related to a grey level, as can be seen in Fig. 2.9.
Fig. 2.9. Representation of a digital image in grey levels.
Digital colour images are obtained from conventional CCD sensors. Incident light is filtered or
diffracted so that the R, G and B colour components reach the adequate sensor cells. One of the
methods that uses this technique is known as the Bayer filter [BAYE_76], which consists of a
layer of R, G and B filters covering the sensor (see Fig. 2.10). In this way, each sensor element
receives only one of the colour components, and a final interpolation produces the final RGB
pixel.
Fig. 2.10 Bayer filter over a CCD.
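The sampling performed by the Bayer filter can be sketched as follows, assuming a repeating 2×2 RGGB tile (an assumption for illustration; real sensors additionally interpolate the missing components, which is omitted here):

```python
# Each cell keeps only one component of the underlying RGB signal,
# following the repeating 2x2 RGGB Bayer tile.
BAYER_TILE = [["R", "G"],
              ["G", "B"]]

def bayer_sample(rgb_image):
    """rgb_image[y][x] = (r, g, b); returns the raw single-channel mosaic."""
    channel_index = {"R": 0, "G": 1, "B": 2}
    mosaic = []
    for y, row in enumerate(rgb_image):
        mosaic.append([pixel[channel_index[BAYER_TILE[y % 2][x % 2]]]
                       for x, pixel in enumerate(row)])
    return mosaic

img = [[(10, 20, 30), (11, 21, 31)],
       [(12, 22, 32), (13, 23, 33)]]
print(bayer_sample(img))   # [[10, 21], [22, 33]]
```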
These colour images are represented by three two-dimensional matrices, each representing the
sensor's response to one of the RGB colours. These three two-dimensional matrices form a
three-dimensional matrix in which the first two dimensions represent the position of the point in
the image and the third dimension represents each of the colour components, as can be seen in
figure 2.11.
Fig. 2.11 Representation of the RGB digital image.
Obtaining spectral images entails greater complexity than the capture of colour images, in which
only three wavelength bands corresponding to the colours red, green and blue are captured.
In order to capture spectral images there are two main alternatives. The first is based on
sequential acquisition, using a tunable filter or a rotary filter wheel placed in front of a
monochrome camera. This approach keeps the spatial resolution of the sensor, but requires the
object to remain perfectly static during the filter changes in order to avoid losing spectral
coherence between the different captures.
Fig. 2.12 RGB filter wheel.
The other method for acquiring hyperspectral images captures all spectral bands simultaneously.
In order to do so, it exploits the variation of the angle of refraction with the wavelength. Starting
with the capture of one line of the image, as in a line-scan camera, the spectral information is
extracted through a prism that refracts each of the wavelengths in the image. In this manner, the
obtained image contains, in abscissas, the position along the captured line and, in ordinates, each
of the spectral frequencies. In order to obtain the complete image, many snapshots are combined.
Fig. 2.13 Principle of hyperspectral image acquisition.
Figure 2.13 shows this principle. First, a line of the image is acquired, and the light from each
point in that line is spread vertically by the prism according to its wavelength. In this manner,
each line is captured on the CCD sensor as a two-dimensional image in which the horizontal axis
represents the position of the pixel in that line (X axis) and the λ axis represents the different
wavelengths spread by the prism.
By synchronising the capture of the camera with the Y movement produced between the camera
and the object, one obtains the different lines of the object, creating the associated hyperspectral
image (Fig. 2.14).
Fig. 2.14 Hyperspectral cube
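This synchronised line-by-line acquisition can be sketched as the stacking of successive (X × λ) frames into a cube; capture_line below is a hypothetical stand-in for the camera read-out, and the values it returns are invented:

```python
def capture_line(y, width=4, n_bands=3):
    """Hypothetical camera read-out: one (X x λ) frame for line y."""
    return [[float(y * 100 + x * 10 + band) for band in range(n_bands)]
            for x in range(width)]

def acquire_cube(n_lines, width=4, n_bands=3):
    """Stack the frames of successive lines into a (Y, X, λ) cube."""
    return [capture_line(y, width, n_bands) for y in range(n_lines)]

cube = acquire_cube(n_lines=2)
print(len(cube), len(cube[0]), len(cube[0][0]))   # 2 4 3  -> (Y, X, λ)
```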
Unlike the standard images observed by the human eye, hyperspectral images contain complete
spectral information for each spatial point of the image. This image is known as a hyperspectral
cube (Fig. 2.14) and consists of a three-dimensional matrix in which the first two dimensions
represent the spatial positions in the image and the third dimension represents each of the
spectral bands. From another perspective, one can simply consider a hyperspectral image as a
vectorial extension of a monochrome image. This last approach applies the same tools as in a
monochrome (grey) image, but from a vectorial perspective.
One of the features of hyperspectral images is that each pixel in the image is represented by a
vector whose components correspond to each of the captured wavelengths, thus providing not
only information on the colour associated with the scene, but also, as previously shown,
information on its molecular properties [CHAN_04] [GRAH_07].
In a similar manner, by selecting a specific wavelength, one can obtain the two-dimensional
image associated with that wavelength, making it possible to obtain spectral and spatial
information from the image simultaneously.
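Both read-outs, the full spectrum of one pixel and the image of one band, can be sketched directly on the cube structure; the tiny cube below is invented for illustration:

```python
# A tiny (Y=2, X=2, λ=3) hyperspectral cube stored as nested lists.
cube = [[[0.1, 0.5, 0.9], [0.2, 0.6, 1.0]],
        [[0.3, 0.7, 1.1], [0.4, 0.8, 1.2]]]

def pixel_spectrum(cube, y, x):
    """Full reflected spectrum of the pixel at position (x, y)."""
    return cube[y][x]

def band_image(cube, band):
    """Two-dimensional image associated with one wavelength index."""
    return [[pixel[band] for pixel in row] for row in cube]

print(pixel_spectrum(cube, 0, 1))   # [0.2, 0.6, 1.0]
print(band_image(cube, 2))          # [[0.9, 1.0], [1.1, 1.2]]
```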
3. Issues on the application of hyperspectral images for the classification of materials
The studies and forecasts provided by European research networks on hyperspectral matters
(Hyperspectral Imaging Network) [GAMB_07] indicate that the improved spectral resolution and
the high spatial resolution of contemporary spectral sensors open up numerous application
possibilities for this type of image, among which the following should be highlighted:
environmental modelling, detection of biological threats, monitoring of spills, detection of
camouflaged elements, estimation of chemical composition, detection of pathogens or tumour
cells, etc. [GRAH_07].
On the other hand, the preface of the 3rd International Workshop on Spectral Imaging, which
took place in Graz, Austria, in 2006, refers to image-based spectroscopy (Spectral Imaging) as
the science that combines the advantages of machine vision with the potential of traditional
optical spectroscopy.
This combination integrates the discriminating power inherent in the spectrum of a material with
the segmentation and classification techniques of conventional machine vision, based on
knowledge of the spatial distribution (position) of the spectrum. Therefore, one can significantly
increase the information that can be obtained from different images through the integration of
the available spectral and spatial information.
In a generic manner, the process for the classification of a specific element in a hyperspectral
image goes through the following phases, which are also listed in figure 2.15:
1. Image acquisition: Acquisition of a hyperspectral image through adequate capture
methods.
2. Selection of the spectrum to be analysed: Selection of a pixel from the acquired image
and of its associated luminous spectrum, represented by a vector with as many
components as captured wavelengths.
3. Extraction of spectral features: The selected spectrum has a high number of components,
which are, furthermore, highly correlated. Due to the Hughes phenomenon [HUGH_68],
as will be seen in Chapter IV, this makes classification more difficult and requires
processes that reduce the dimensionality and extract relevant features that correctly
characterise the elements to be classified.
4. Classification: Use of the features extracted in the earlier step for the identification of
the selected spectrum by means of mathematical classifiers.
5. Repetition of the earlier steps for the remaining pixels of the image.
Fig. 2.15 Classification process of elements based on hyperspectral images
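The five phases above can be sketched as a per-pixel loop over the hyperspectral cube. Here `extract_features` and `classify` are hypothetical placeholders standing in for the feature-extraction (step 3) and classification (step 4) stages discussed later; the mean-threshold rule in the usage example is purely illustrative.

```python
import numpy as np

def classify_cube(cube, extract_features, classify):
    """Classify every pixel of a hyperspectral cube (rows x cols x bands).

    `extract_features` reduces a spectrum to a low-dimensional descriptor
    (step 3) and `classify` maps that descriptor to a class label (step 4);
    both are supplied by the caller.
    """
    rows, cols, _ = cube.shape
    labels = np.empty((rows, cols), dtype=int)
    for r in range(rows):                          # step 5: repeat for every pixel
        for c in range(cols):
            spectrum = cube[r, c, :]               # step 2: pixel spectrum
            features = extract_features(spectrum)  # step 3
            labels[r, c] = classify(features)      # step 4
    return labels

# Toy usage: a 2x2 cube with 5 bands, thresholding the mean intensity.
cube = np.arange(20, dtype=float).reshape(2, 2, 5)
labels = classify_cube(cube, extract_features=np.mean,
                       classify=lambda f: int(f > 9))
```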
The use of this methodology associates each pixel of the acquired image with a specific class or
material. However, studies undertaken by the European network on spectral imaging
(Hyperspectral Imaging Network) [GAMB_07] emphasise the existence of several weak points
within the data-processing chain that still need to be addressed:
- Spectral correlation: stress is placed on the need for centralised models of materials
that do not depend on acquisition or lighting conditions.
- Classifiers: it is necessary to choose classifiers that are simple, robust and with great
generalisation capacity.
- Detection of features: it is necessary to define with precision the features that characterise
the elements to be classified.
- Hughes phenomenon: likewise, given the high dimensionality of the data, it is necessary
to develop unsupervised feature-extraction methods that reduce the dimensionality of the
data without reducing the information contained.
- Feature extraction: this reduction of features must be independent of the data set used to
train the system, so that changes in the training elements do not change their descriptor
variables.
- Spectral-spatial integration: the majority of hyperspectral techniques do not take
advantage of the information of spatially near points. However, it is necessary to
integrate spectral and spatial information in order to achieve better classification results
[GAMB_04, PLAZ_05].
- Computational cost: the great amount of information contained in a hyperspectral image,
as well as the computational cost associated with its processing, makes necessary new
feature-extraction and classification techniques that are computationally efficient.
4. Conclusion
This chapter has shown how the chromaticity spectrum of an element bears a direct relation to its
molecular properties, and how the acquisition of hyperspectral images provides spectral
information for each position of the image, making it possible to analyse the objects within it
from a spectrographic perspective.
The use of these hyperspectral images combines spectrographic techniques with machine vision
techniques for their processing, providing greater versatility when extracting information. The
traditional procedure for the classification of hyperspectral images has been illustrated
(Fig. 2.15) and the weak points of this classification process have been listed.
Among these weak points, the following are highlighted: the choice of classifier, the methodology
for the extraction of spectral features, the reduction of the Hughes phenomenon and the
integration of the spatial information of nearby pixels with their spectral information. This makes
it necessary to design a classification architecture that overcomes these limitations.
In order to do so, Chapter III will study the properties of the various existing classifiers in order
to select the one or ones most suitable for the generation of an adequate model of a material.
Chapter IV, in turn, will provide an in-depth analysis of different methods for feature extraction
and for the reduction of the Hughes phenomenon, placing special emphasis on the limitations of
these methods in order to define a set of desirable properties that an optimal descriptor should
satisfy to adequately describe a luminous spectrum.
Chapter III
Classification methods
The previous chapter detailed the process of formation of hyperspectral images, noting their
suitability for the characterisation of materials based on their luminous properties. However, the
great amount of information contained in the luminous spectrum makes its classification a far
from easy task.
First, a description is provided of the metrics proposed in the scientific literature to distinguish
between different spectra [KESH_04]. These classical approaches evaluate the distances between
spectra in a Euclidean space Rn or are based on the measurement of the spectral angle between
them, SAM (Spectral Angle Mapper) [WILL_04]. These methodologies offer an adequate
quantification of the similarity between two given spectra, but allow neither the analysis of the
correlation existing between adjacent spectral bands [CHAN_03] nor the correct resolution of the
problems caused by the similarity between spectra of the same group or class. In the case dealt
with in this Thesis, each of these classes represents the material associated with each of the
analysed elements. In this manner, a spectrum is acknowledged to belong to the aluminium class
when the point associated with that spectrum is composed of aluminium. The fact that spectra
associated with the same class bear a high degree of variation (intra-class variations) makes it
necessary to model this dispersion in order to correctly determine the class associated with a
given spectrum.
However, these intra-class variations can be adequately modelled through the use of pattern
recognition techniques that take the information contained in the luminous spectrum as an input
vector. These classification methods try to emulate the working of the human brain: input data
are modelled through diverse techniques, creating a mathematical model that relates them with
the desired output. These methods not only emulate the way human beings reach decisions or
analyse their surroundings, but can also perform tasks of inference or classification that cannot
be done by a human being.
The adequate use of these classifiers also allows new knowledge to be inferred from the results
obtained from the training of the classifier, uncovering underlying causes of the classification
process that were a priori unknown. Using this same approach, Tycho Brahe, without previous
knowledge of the physical causes underlying the geometry of planetary orbits, took precise
measurements of them, reaching the conclusion that they were elliptical. Using these
measurements, his disciple Johannes Kepler obtained the laws that bear his name and that
describe the physical phenomena which cause the elliptical shape of the orbits [DREY_13].
Analogously, the knowledge extracted by classifiers has made it possible to associate certain
genes or proteins with several diseases, to establish rules that predict the weather, to discover
which component of a medicine causes the improvement in the evolution of a disease or, in sum,
to extract rules that can later be used by human beings.
This chapter presents the metrics traditionally used to quantify the distance between two
luminous spectra, and undertakes a theoretical study of the different existing classifiers capable
of modelling the intrinsic variation and the discriminant properties of the materials to be
classified.
The problem related with the high dimensionality of data inherent to hyperspectral images is not
dealt with in this chapter and shall be fully covered in Chapter IV.
1. Classical metrics
Traditionally, areas such as multivariate analysis, signal processing or pattern matching have
used distance-based metrics in order to measure the differences between diverse input signals.
As mentioned in an earlier chapter, a spectrum or hyperspectral pixel, composed of the quantity
of reflected or emitted light at each of its wavelengths, is mathematically defined by a vector L of
m components, each component representing one of the wavelengths at which the spectrum has
been quantized.

\mathbf{L} = (L_1, L_2, \ldots, L_m) \qquad (3.1)
Let L_a and L_b be the vector representations of any two spectra, as shown in (3.1); diverse
distances can then be defined, derived from the norms l_1, l_2 and l_\infty:
- City block distance (CBD)

\mathrm{CBD}(\mathbf{L}_a, \mathbf{L}_b) = \sum_{i=1}^{m} \left| L_{ai} - L_{bi} \right| \qquad (3.2)
- Euclidean distance (ED)

\mathrm{ED}(\mathbf{L}_a, \mathbf{L}_b) = \left( \sum_{i=1}^{m} \left( L_{ai} - L_{bi} \right)^2 \right)^{1/2} \qquad (3.3)
- Tchebychev distance (TD)

\mathrm{TD}(\mathbf{L}_a, \mathbf{L}_b) = \max_{1 \le i \le m} \left| L_{ai} - L_{bi} \right| \qquad (3.4)
These measures represent the distance between two spectra according to the three most common
norms and establish, in a simple and intuitive manner, the degree of difference between two
specific vectors.
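The three distances (3.2)-(3.4) can be written directly; a minimal NumPy sketch, where the example spectra are arbitrary illustrative values:

```python
import numpy as np

def cbd(a, b):
    """City block distance, l1 norm (3.2)."""
    return np.sum(np.abs(a - b))

def ed(a, b):
    """Euclidean distance, l2 norm (3.3)."""
    return np.sqrt(np.sum((a - b) ** 2))

def td(a, b):
    """Tchebychev distance, l-infinity norm (3.4)."""
    return np.max(np.abs(a - b))

# Two toy three-band spectra.
La = np.array([0.2, 0.5, 0.9])
Lb = np.array([0.1, 0.7, 0.8])
```

Note that for any pair of vectors TD ≤ ED ≤ CBD, which follows from the ordering of the underlying norms.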
However, changes in the intensity of the reflected spectrum cause variations that are not correctly
absorbed by the similarity estimates produced by the metrics defined above. These intensity
changes can be due to different factors, such as small variations in the intensity of the incident
light source, the geometry of the object or its specular reflections, all of which were mentioned in
the previous chapter.
Let us suppose a spectral pixel composed of only two bands, so that the spectrum is fully
represented in a two-dimensional space with one axis per wavelength. Without loss of generality,
the distances calculated between different spectra can be interpreted geometrically in this
two-dimensional space (Fig. 3.1).
Let L_a and L_b be two spectra whose similarity is to be calculated, and let \hat{L}_a be the
vector L_a affected by a variation in the intensity of the reflected spectrum. One can observe that
the distance measurements are affected by this variation in the intensity of the lighting.
Fig. 3.1. Geometric representation of the Euclidean distance between La and Lb and the effect on
the value of this distance caused by intensity changes in the spectrum.
Other types of metrics, specifically designed for use with hyperspectral images, partially correct
this sensitivity to lighting changes. These are based on orthogonal projections of the two spectra
to be compared. Among them, the most widely used is based on the calculation of the spectral
angle between two given spectra and is known as SAM (Spectral Angle Mapper) [YUHA_92].
Based on the orthogonal projection between the two vectors, the angle between them is
calculated and used as the measure of similarity between the spectra:
\cos(\alpha) = \frac{\langle \mathbf{L}_a, \mathbf{L}_b \rangle}{\|\mathbf{L}_a\| \cdot \|\mathbf{L}_b\|} = \frac{\sum_{i=1}^{m} L_{ai} \, L_{bi}}{\sqrt{\sum_{i=1}^{m} L_{ai}^2} \cdot \sqrt{\sum_{i=1}^{m} L_{bi}^2}} \qquad (3.5)

\mathrm{SAM}(\mathbf{L}_a, \mathbf{L}_b) = \alpha = \cos^{-1}\left( \frac{\langle \mathbf{L}_a, \mathbf{L}_b \rangle}{\|\mathbf{L}_a\| \cdot \|\mathbf{L}_b\|} \right) \qquad (3.6)
Taking into account the earlier two-dimensional representation, in which the calculated angle
corresponds to the geometric angle between two straight lines, one can observe that changes in
the intensity of the spectrum do not change the angle between the spectra.
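A sketch of (3.5)-(3.6) illustrating this invariance; the spectra are arbitrary illustrative values:

```python
import numpy as np

def sam(a, b):
    """Spectral angle (radians) between two spectra, equations (3.5)-(3.6)."""
    cos_alpha = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip to guard against floating-point values marginally outside [-1, 1].
    return np.arccos(np.clip(cos_alpha, -1.0, 1.0))

La = np.array([0.2, 0.5, 0.9])
Lb = np.array([0.1, 0.7, 0.8])

# A global intensity change (a scaling of the spectrum) alters the Euclidean
# distance but leaves the spectral angle unchanged.
assert np.isclose(sam(La, Lb), sam(3.0 * La, Lb))
```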
Fig. 3.2. Geometrical representation of the Spectral Angle Mapper (SAM) and the effects caused
by changes in its intensity.
If both L_a and L_b have been previously normalised to unit length, a relation exists between the
Euclidean distance and SAM:

\mathrm{ED}(\hat{\mathbf{L}}_a, \hat{\mathbf{L}}_b) = 2 \sin\left( \frac{\mathrm{SAM}(\mathbf{L}_a, \mathbf{L}_b)}{2} \right) \qquad (3.7)

One can see that the Euclidean distance shows the same response as SAM when using normalised
vectors and small angle values.
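Relation (3.7) is in fact an exact identity for unit-length vectors, since ||â − b̂||² = 2 − 2 cos α = 4 sin²(α/2); a quick numerical check on random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(10)
b = rng.random(10)

# Normalise both spectra to unit length.
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)

# Spectral angle between them (3.6) and Euclidean distance between the
# normalised vectors.
angle = np.arccos(np.clip(np.dot(a_hat, b_hat), -1.0, 1.0))
dist = np.linalg.norm(a_hat - b_hat)
```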
Fig. 3.3. Relation between Euclidean distance and SAM.
Du Peijun et al. [DU_05] analyse cases in which the classic SAM algorithm does not respond
adequately when estimating the differences between spectra. As it is based on the calculation of
the angle between two spectra, it is not affected by small changes in specific bands. These
changes are oftentimes caused not by noise but by absorption bands due to chemical bonds
present in the spectrum, which truly constitute an essential feature for the discrimination between
different types of spectra.
In order to solve this problem, they propose several typologies of error and diverse methodologies
for the improvement of the algorithm which, though not validated by the results obtained,
highlight the limitations of SAM with respect to small variations in local ranges of the spectrum.
Other approaches, also based on the analysis of the complete spectrum, consider each spectrum
as a probability distribution. Under this view, the divergence defined by Kullback and Leibler
[KULL_87] measures the difference between two given probability distributions.
Although the Kullback-Leibler divergence is commonly referred to as a distance, it is not
symmetric, and therefore is not a true distance.
Let L_a and L_b be two spectra as previously defined. The Kullback-Leibler divergence of the
spectrum L_a from the spectrum L_b is defined as the quantity of additional information
necessary to represent the spectrum L_a taking the spectrum L_b as the model:

D_{KL}(\mathbf{L}_a \,\|\, \mathbf{L}_b) = \sum_{i=1}^{m} L_{ai} \log \frac{L_{ai}}{L_{bi}} \qquad (3.8)

In order to correct the non-symmetry of this measurement, Kullback and Leibler in fact define the
divergence as:

D_{KL}(\mathbf{L}_a, \mathbf{L}_b) = D_{KL}(\mathbf{L}_a \,\|\, \mathbf{L}_b) + D_{KL}(\mathbf{L}_b \,\|\, \mathbf{L}_a) \qquad (3.9)
This measure has been employed by [CHAN_04] to define SID (Spectral Information
Divergence). There, it is assumed that this measure better captures the spectral variability, as it
is not based on geometric features such as the angle (SAM) or the spatial distance (Euclidean
distance), but rather calculates the separation between two spectra based on the distance
between their probability distributions.
The divergence defined in this manner (3.8) shows greater sensitivity to local variations due to
small absorption bands than the earlier metrics, while at the same time keeping a more than
acceptable tolerance to noise. This is due to the use of the logarithmically weighted ratio between
the two components of each band, instead of the subtraction of the Euclidean distance or the
multiplication of SAM.
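A sketch of SID built from the symmetric divergence (3.8)-(3.9). Two implementation details here are assumptions not specified in the text: the spectra are normalised to sum to one so that they behave as probability distributions, and a small `eps` guards against zero components.

```python
import numpy as np

def sid(a, b, eps=1e-12):
    """Spectral Information Divergence: symmetric Kullback-Leibler
    divergence (3.8)-(3.9) between spectra treated as distributions."""
    p = a / np.sum(a)
    q = b / np.sum(b)
    d_pq = np.sum(p * np.log((p + eps) / (q + eps)))  # D_KL(p || q), eq. (3.8)
    d_qp = np.sum(q * np.log((q + eps) / (p + eps)))  # D_KL(q || p)
    return d_pq + d_qp                                # symmetrised, eq. (3.9)
```

By construction, sid is symmetric in its arguments and vanishes only when both normalised spectra coincide.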
In order to illustrate the behaviour of these metrics, figure 3.4 includes four spectra associated
with three different materials: spectrum A1, taken as the pattern; spectrum A2, belonging to the
same material as A1; spectrum B, belonging to a different material than A1 and with quite a
different spectrum; and spectrum C, with an appearance similar to A1.
Fig. 3.4. Spectral representation of materials A1, A2, B and C.
When comparing spectrum A1 with the rest of the materials using the earlier metrics, one can
observe that these metrics easily detect the difference between material B and spectrum A1.
However, there are difficulties when determining reliable
distances between spectra A1-A2 and A1-C, as in both cases similar similarity measures are
obtained, which do not reveal whether those spectra correspond to similar materials or not.
This is due to the fact that the previously defined metrics quantify, in an unsupervised manner,
the differences between two spectra taking into account their global similarity. This has the
advantage that no previous estimation or extraction of relevant features, nor any supervised
training, is necessary, which allows the use of unsupervised classification methods (K-means,
Fuzzy K-means) or of supervised classification methods based on examples (K-Nearest
Neighbours). However, the fact that they only take into account the global similarity of the
spectra, ignoring local features, means that they do not adequately detect the small
discriminating features that are necessary for a correct classification of the material.
2. Classifiers
Though the previously described methods establish, in an unsupervised manner, the differences
between two given spectra from the whole of their components, they do not directly extract or
select those features that could be relevant for establishing the difference between two specific
classes (as can be seen in table III.1).
Likewise, they are not capable of modelling the variations between spectra of the same class, nor
of maximising the differences between elements of different classes while minimising the
differences between elements belonging to the same class.
These intra-class variations can be modelled through the use of pattern recognition techniques
that boost the variations representing the differences between classes while reducing the
influence of the intra-class variations.
TABLE III.1 COMPARISON OF THE DISTANCES OBTAINED BETWEEN SPECTRUM A1 AND THE OTHER SPECTRA USING THE CLASSICAL
METRICS

Metric                    A2      B       C
City block distance       0.38    1.24    0.54
Euclidean distance        0.06    0.19    0.08
Tchebychev distance       0.03    0.05    0.03
SAM                       1.00    0.98    1.00
SID                       0.03    0.35    0.06
There are numerous approaches to designing a classifier, though classifiers designed using
different techniques may obtain identical solutions. This is the case of some classifiers based on
neural networks, whose results coincide with those of other classifiers based on a statistical
approach.
The choice of an adequate classifier is a complex problem that is usually related to the type of
problem to be resolved, the need to subsequently extract information from the classifier, or the
degree of knowledge of the designer.
According to [JAIN_00], there are three different approaches when designing a classifier:
− Similarity.
− Statistics.
− Calculation of decision boundaries.
This chapter will develop the principles and concepts of the different types of classifiers. In order
to ease the comprehension of the classifications undertaken by the different types of classifiers
and of the differences between them, two classes will be used, defined by two variables X, Y that
follow overlapping two-dimensional normal distributions, as shown in Fig. 3.5.
Fig. 3.5. Distribution of model classes to be classified using different types of classifiers.
For the sake of simplicity, twenty training elements will be used for each of the classes, in such a
manner that the influence of a small number of samples can be noticed. This way, the functioning
of each type of classifier will be seen in a more intuitive manner. The data used for comparing
different classifiers is shown in figure 3.6.
Fig. 3.6. Distribution of training elements of the aforementioned classes.
The points used for the training of each of these classes are represented by stars and solid
points, as shown in figure 3.6. Once the corresponding classifier is trained, a grid of control
points is classified in order to delimit the region of the feature space assigned to each of the
classes.
The chosen classes are defined by Gaussian distributions that follow the densities shown in
figure 3.7.
Fig. 3.7. Probability density functions of the two test classes.
As both classes follow known Gaussian statistical distributions, the optimal classifier minimising
Bayes' risk (assuming costs 0 and 1, as shall be seen later) is equivalent to a statistical classifier
that calculates the maximum a posteriori probability (MAP), assigning each element to the class
whose membership probability is greatest. As the distributions are Gaussian, equation (3.10)
defines the membership probability of a vector x for a class with mean µ and covariance
matrix Σ:
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{D/2} \, |\Sigma|^{1/2}} \; e^{-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})} \qquad (3.10)
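Equation (3.10), together with the MAP rule, can be sketched as follows; `map_classify` assumes equal priors (a simplification), so maximising the class-conditional density is equivalent to maximising the posterior.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate normal density N(x | mu, Sigma), equation (3.10)."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / (((2.0 * np.pi) ** (d / 2.0)) * np.sqrt(np.linalg.det(sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

def map_classify(x, params):
    """MAP rule with equal priors: pick the class whose density is largest.

    `params` maps each class label to a (mean, covariance) pair.
    """
    return max(params, key=lambda c: gaussian_pdf(x, *params[c]))

# Toy two-class problem with unit covariance matrices.
params = {
    0: (np.zeros(2), np.eye(2)),
    1: (np.array([3.0, 3.0]), np.eye(2)),
}
```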
Using this classifier, the optimal classification map for the two given classes is as follows:
Fig. 3.8. Optimal classification map
2.1. Similarity classifiers
This approach is the simplest and most intuitive of all. It is based on the fact that two vectors
with similar features will probably belong to the same class.
In order to apply it, a correct measure of dissimilarity or distance between the feature vectors
under study must be available. For the case of hyperspectral pixels, the earlier section described
different metrics that quantify the degree of difference between two spectra.
Based on these metrics, one can define several classifiers:
2.1.1. Nearest Neighbour 1-NN.
The nearest neighbour is one of the simplest classifiers that can be used. The input vector is
compared with all the training vectors, and it is assigned to the class of the nearest training
vector [DASA_91] [SHAK_05].
Despite obtaining reasonable results, this method requires the computation of the distance to
each of the training points, so its computational cost is very high [XIE_93] [FUKU_84]. Another
disadvantage is that it requires many training samples in order to achieve an adequate estimate
of the probability density at each point [XIE_93] [FUKU_90].
Despite these disadvantages, the nearest neighbour classifier is one of the most used as a
benchmark against other classifiers, for two main reasons. First, it requires no configuration
parameters other than the choice of a distance. Second, for an infinite number of samples, this
classifier has an error that is less than double the error that would be obtained with the optimal
classifier [COVE_67].
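A minimal 1-NN sketch following this description; the training points are toy values:

```python
import numpy as np

def nn_classify(x, train_X, train_y):
    """1-NN: assign x to the class of the nearest training vector."""
    dists = np.linalg.norm(train_X - x, axis=1)  # distance to every training point
    return train_y[np.argmin(dists)]

# Toy training set: two points of class 0, one of class 1.
X = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
y = np.array([0, 0, 1])
```

Note that the full training set must be kept and scanned for every query, which is precisely the computational cost mentioned above.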
Figure 3.9 shows the classification results using this classifier. One can notice that the points near
training vectors are assigned to the class to which the nearest vector belongs.
Fig. 3.9. Classification map based on 1-NN
Fukunaga [FUKU_90, FUKU_84] proposed data-reduction algorithms for non-parametric
classifiers based on the nearest neighbour. These algorithms reduce the number of elements used
by these classifiers, bearing in mind the difference between the probability density function
estimated with the complete data set and that estimated with a reduced number of vectors. In this
manner, the computational cost of the classifier is optimised and, at the same time, the space
required for its storage is minimised without damaging the performance of the classifier.
2.1.2. Nearest mean.
One of the simplest classifiers, commonly used to carry out a quick classification of highly
separable classes, is based on the nearest mean. This classifier assigns each point to the class
whose centre of mass is nearest to it.
For normalised Gaussian distributions whose variables are not correlated, that is, those whose
covariance matrix is the identity, the nearest-mean classifier corresponds to the Bayes' optimal
classifier.
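A minimal nearest-mean sketch: the centre of mass of each class is computed at training time and each point is assigned to the nearest centre; the data are toy values.

```python
import numpy as np

def nearest_mean_fit(X, y):
    """Compute the centre of mass (mean) of each class."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, means

def nearest_mean_classify(x, classes, means):
    """Assign x to the class whose mean is nearest."""
    return classes[np.argmin(np.linalg.norm(means - x, axis=1))]

# Toy training set with two classes.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])
classes, means = nearest_mean_fit(X, y)
```

Only one stored vector per class is needed at classification time, which is what makes the method so cheap compared with 1-NN.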
Fig. 3.10. Separation produced by the nearest-mean classifier
This classifier is not capable of modelling classes that do not follow normalised Gaussian
distributions with uncorrelated variables, as shown in figure 3.11. Because of this, despite its
simplicity and reduced computational cost, this classifier does not allow a precise classification
in the majority of cases.
Fig. 3.11. Classification map based on nearest mean
2.1.3. Vector Quantization VQ.
Non-parametric methods, such as those based on the nearest neighbour or the K-nearest
neighbours, require a very high number of samples in order to achieve a valid estimate of the
probability distribution function. Because of this, and in order to reduce the number of necessary
samples, methodologies have been proposed that reduce the size of the data set used for the
design and training of the system while maintaining the performance of the classifier.
In order to reduce the size of the required training set [HART_68] [DEVI_80], one must verify
the effect of eliminating or adding each of the samples used for its design, so that only "good"
samples are kept. However, for very large data sets these methods are difficult to implement and
bear a high computational cost, as they must be re-evaluated each time a vector is added to or
eliminated from the set. Another disadvantage is that, depending on the data, a high reduction is
not always achieved, and the result is highly dependent on the samples of the set.
In contrast, algorithms based on vector quantization represent the feature space through a set of
vectors that provide a simplified representation of it, each feature vector being encoded by the
nearest model vector from a set of previously selected model vectors [XIE_93].
Fig. 3.12. The two existing clusters are represented by the vectors VQ1 and VQ2.
These methods construct a vector quantization for each of the existing classes. Each of the
vectors to be quantized is assigned the nearest model vector, using either the Euclidean distance
or, more commonly, the smallest angle formed with the vector to be classified (3.11).

S(\mathbf{v}_1, \mathbf{v}_2) = \hat{\mathbf{v}}_1^{T} \cdot \hat{\mathbf{v}}_2 \qquad (3.11)
S(\mathbf{v}_1, \mathbf{v}_2) = \cos(\mathbf{v}_1, \mathbf{v}_2) \qquad (3.12)

Where \hat{\mathbf{v}}_1 and \hat{\mathbf{v}}_2 are normalised vectors, so that two vectors
with the same direction obtain a similarity measure S(\mathbf{v}_1, \mathbf{v}_2) = 1, and two
vectors at a ninety-degree angle obtain a similarity S(\mathbf{v}_1, \mathbf{v}_2) = 0.
Fig. 3.13. In a space represented by two features, the decision on which model vector quantizes
each input vector is defined by the bisectors between the different model vectors.
This way, each origin vector is represented by an index or a binary vector that indicates which
model vector, the nearest to the origin vector, represents it.
Let us suppose the existence of M model vectors VQ_1, VQ_2, \ldots, VQ_M. An input vector
\mathbf{v} shall be represented by a quantized vector \mathbf{Vq} of M components, whose
value is one at the index whose associated model vector has the greatest similarity with
\mathbf{v}, and zero in the rest of the components.

Vq_i = \begin{cases} 1, & \text{if } S(\mathbf{v}, \mathbf{VQ}_i) = \max_{j} S(\mathbf{v}, \mathbf{VQ}_j) \\ 0, & \text{otherwise} \end{cases} \qquad (3.13)
In this manner, for a set of two-dimensional data quantized by three vectors VQ_1, VQ_2, VQ_3,
the quantization of a vector \mathbf{v} whose nearest model vector is VQ_2 would be, according
to (3.13), \mathbf{Vq} = (0, 1, 0).
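Rule (3.13) with the cosine similarity (3.11)-(3.12) can be sketched as follows; the model vectors are toy values chosen to mirror a three-vector example:

```python
import numpy as np

def quantize(v, model_vectors):
    """Binary quantization (3.13): a one at the index of the most similar
    model vector (cosine similarity), zeros elsewhere."""
    v_hat = v / np.linalg.norm(v)
    m_hat = model_vectors / np.linalg.norm(model_vectors, axis=1, keepdims=True)
    similarities = m_hat @ v_hat          # cosine similarity, (3.11)-(3.12)
    vq = np.zeros(len(model_vectors), dtype=int)
    vq[np.argmax(similarities)] = 1       # winner takes the single one
    return vq

# Three toy model vectors in a two-feature space.
VQ = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```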
Once the vectors are quantized (and thus reduced), other classifiers can be used to establish to
which class the quantized vectors really belong. This approach reduces the noise present in the
training data, as well as the complexity required of a second classifier.
One of the methods based on vector quantization is Learning Vector Quantization (LVQ),
proposed by Kohonen [KOHO_88]. Through this technique, which combines supervised and
unsupervised learning under an approach based on artificial neural networks, the space is
divided into groups of similar features in an unsupervised manner (clustering). The number of
subclasses is equal to the number of neurons present in the first layer of the classifier.
This primary clustering is done through the learning rule for competitive networks: in the
training phase, starting from a set of random model vectors (Fig. 3.14a), each of the training
vectors is compared with each of the model vectors. Once the nearest (winning) vector is chosen,
it is modified so that it moves closer to the training vector used (Fig. 3.14b).
Fig. 3.14. a) Initial model vectors, b) training phase, c) set of training vectors, d) final model
vectors.
This way, model vectors are modified during their training stage in a way that each of them
represents a cluster or set of data that is present in the training data (Fig. 3.14c, Fig. 3.14d).
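The competitive (unsupervised) stage described above can be sketched as follows; the learning rate, epoch count and the initialisation from random data points are illustrative choices, not Kohonen's exact prescription:

```python
import numpy as np

def lvq_cluster(X, n_models, lr=0.1, epochs=50, seed=0):
    """Unsupervised (competitive) stage of LVQ: each training vector pulls
    the nearest model vector towards itself."""
    rng = np.random.default_rng(seed)
    # Initialise the model vectors at randomly chosen training points.
    models = X[rng.choice(len(X), size=n_models, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X:
            winner = np.argmin(np.linalg.norm(models - x, axis=1))
            models[winner] += lr * (x - models[winner])  # move winner towards x
    return models

# Two well-separated toy clusters; each model vector should settle in one cluster.
X = np.array([[0.0, 0.0], [0.0, 0.2], [0.2, 0.0],
              [5.0, 5.0], [5.0, 5.2], [5.2, 5.0]])
models = lvq_cluster(X, n_models=2)
```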
Once the input vectors have been quantized by the model vectors obtained in the unsupervised
training phase, represented with the binary vectorial notation of (3.13), each of the model
vectors is associated with its corresponding class in a supervised manner.
This approach efficiently resolves the classification of nonlinearly separable classes, in which the
classifier to be used would otherwise require a high number of samples and, at the same time,
sufficient complexity to model the non-linearity of the classes (Fig. 3.15).
Fig. 3.15. Nonlinearly separable classes.
Through unsupervised clustering, the feature space is divided into a certain number of
differentiated subclasses. Once these subclasses are defined, the problem lies in assigning each
of the subgroups to the class to which it truly belongs.
This mixed supervised/unsupervised classification, first, separates nonlinearly separable classes
into compact subgroups that can be modelled in a simpler manner than the original class and,
second, assigns each of the subgroups to its corresponding class through a supervised linear
classifier.
Fig. 3.16. Vector quantization obtaining four model vectors, one for each of the existing clusters.
Figure 3.16 shows the model vectors obtained to separate each of the classes. In this way, the
training points near VQ1 shall be defined by the vector (1,0,0,0), those near VQ2 by (0,1,0,0),
those near VQ3 by (0,0,1,0) and those near VQ4 by (0,0,0,1).
The supervised phase of the classifier then only has to assign the vectors (1,0,0,0) and
(0,0,1,0) to the black triangle class and the rest to the white triangle class.
Figure 3.17 shows the result of the LVQ algorithm clustering the data in two, three and six
elements.
Fig. 3.17. Classification map based on LVQ (S=2, 3, 6)
2.2. Statistical classifiers
Statistical classifiers are a second type of classifier. These are based on a probabilistic
approach, that is, on a reliable calculation of the probability that, given certain conditions (the
set of features that define an element), the element belongs to one class or another.
There are two main approaches to statistical methods. On the one hand, parametric approaches
assume a specific statistical distribution and try to obtain the parameters that define it, for
example the mean and variance in the case of a univariate Gaussian distribution. On the other
hand, non-parametric approaches try to estimate the probability density in any region of the
feature space without making any assumption about the type of distribution that generated the
data.
2.2.1. Probability Theory
A key concept in statistical classification methods in particular, and for any classifier in general, is that of uncertainty. Probability theory provides a fundamental framework for upholding the theory that underlies each classifier, given errors in data acquisition, the use of a reduced number of features and the use of a finite set of sample data.
In a given classification problem where the aim is to assign an element, defined by a feature vector, to one of the C possible classes, an adequate classifier is one that assigns the element to the most probable class, given the previously observed conditions.
In this manner, the probability that an element, given its feature vector x, belongs to a class C_i is defined by the conditional probability P(C_i|x); the element is assigned to the class for which P(C_i|x) is maximum [KASH_86] [MORR_76].
Given that P(C_i|x) is unknown, Bayes' theorem (3.14) will be used to calculate this probability indirectly, starting from the conditional probability of observing the vector x given the class C_i.
P(C_i \mid x) = \frac{P(x \mid C_i)\,P(C_i)}{P(x)}    (3.14)
Let us suppose two classes i and j; the classifier shall assign the element to the class with greater probability:
Class = \begin{cases} i & \text{if } P(C_i \mid x) > P(C_j \mid x) \\ j & \text{if } P(C_i \mid x) \le P(C_j \mid x) \end{cases}    (3.15)
Applying Bayes' theorem (3.14), P(C_i|x) is expressed in known terms as:
Class = \begin{cases} i & \text{if } \dfrac{P(x \mid C_i)\,P(C_i)}{P(x)} > \dfrac{P(x \mid C_j)\,P(C_j)}{P(x)} \\[6pt] j & \text{if } \dfrac{P(x \mid C_i)\,P(C_i)}{P(x)} \le \dfrac{P(x \mid C_j)\,P(C_j)}{P(x)} \end{cases}    (3.16)
As P(x) is constant for all classes, it can be eliminated:
Class = \begin{cases} i & \text{if } P(x \mid C_i)\,P(C_i) > P(x \mid C_j)\,P(C_j) \\ j & \text{if } P(x \mid C_i)\,P(C_i) \le P(x \mid C_j)\,P(C_j) \end{cases}    (3.17)
Rearranging (3.17), we obtain the likelihood ratio l_r(x) (3.18), which can be estimated directly, as the decision rule depends only on the a priori probabilities of the classes and on the probability densities of the data for each class.
l_r(x) = \frac{P(x \mid C_i)}{P(x \mid C_j)}    (3.18)
Representing the earlier decision rule in terms of the likelihood ratio, one obtains:
Class = \begin{cases} i & \text{if } l_r(x) > \dfrac{P(C_j)}{P(C_i)} \\[6pt] j & \text{if } l_r(x) \le \dfrac{P(C_j)}{P(C_i)} \end{cases}    (3.19)
This classifier (3.19) is known as the Maximum a Posteriori (MAP) classifier. It minimises the Bayes error and therefore produces the smallest number of incorrect classifications.
If, additionally, equiprobability of the classes is assumed, that is, P(C_i) = P(C_j), one obtains the maximum likelihood classifier (3.20):
Class = \begin{cases} i & \text{if } l_r(x) > 1 \\ j & \text{if } l_r(x) \le 1 \end{cases}    (3.20)
The previous classifiers assume that the consequences of erroneously classifying an element of one class into another are equally important. However, this condition is highly dependent on the use or application that is going to be given to the classifier. For example, when separating radioactive materials, including non-radioactive material among the radioactive is not as serious as labelling a radioactive material as non-radioactive.
In this manner, K_ij is defined as the cost of classifying an element as class i when it really belongs to class j. Defining these costs, Bayes' rule is the classifier that minimises the Bayes risk [BERG_85]:
Class = \begin{cases} i & \text{if } l_r(x) > \dfrac{(K_{ij}-K_{jj})\,P(C_j)}{(K_{ji}-K_{ii})\,P(C_i)} \\[6pt] j & \text{if } l_r(x) \le \dfrac{(K_{ij}-K_{jj})\,P(C_j)}{(K_{ji}-K_{ii})\,P(C_i)} \end{cases}    (3.21)
Assuming a symmetric cost function in which the cost of incorrectly classifying an element is one (K_12 = K_21 = 1) and the cost of correctly classifying an element is zero (K_11 = K_22 = 0), Bayes' rule (3.21) is simplified, becoming the maximum a posteriori classifier (3.19).
Furthermore, if the classes were equiprobable, one would obtain the maximum likelihood classifier (3.20).
Let us suppose C1 and C2 are two differentiated classes defined by a single variable x. Without loss of generality, let us assume a Gaussian distribution for each of the classes, as shown in figure 3.18.
Fig. 3.18. Representation of the probability distributions of two classes C1 and C2.
The classifier based on Bayes' rule (3.21) shall classify elements according to:
Class = \begin{cases} C_1 & \text{if } (K_{21}-K_{11})\,P(C_1)\,P(x \mid C_1) > (K_{12}-K_{22})\,P(C_2)\,P(x \mid C_2) \\ C_2 & \text{otherwise} \end{cases}    (3.22)
Let us assume that class C2 is twice as probable as class C1 (P(C_2)/P(C_1) = 2), that the cost of correctly classifying each of the classes is zero (K_11 = 0, K_22 = 0), and that the cost of erroneously classifying an element of class C1 as a member of C2 is 4, while the cost of classifying an element of C2 as C1 is only 1 (K_21 = 4, K_12 = 1).
Applying the three previously described classifiers, maximum likelihood (3.20), maximum a posteriori (3.19) and Bayes' rule (3.21), to this case, one can see the different decision maps obtained for each of the classifiers (figure 3.19).
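The worked example can be checked numerically. In the following sketch the class means, the common variance and the test point are illustrative choices (not taken from the thesis); the priors and costs are those of the example above.

```python
# Comparing the three decision rules for two 1-D Gaussian classes.
from math import exp, pi, sqrt

def gaussian(x, mu, sigma):
    # univariate normal density
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / sqrt(2 * pi * sigma ** 2)

P1, P2 = 1 / 3, 2 / 3              # P(C2) = 2 * P(C1)
K11, K22, K21, K12 = 0, 0, 4, 1    # K_ij: cost of deciding i when truth is j

def decide(x, mu1=0.0, mu2=2.0, sigma=1.0):
    lr = gaussian(x, mu1, sigma) / gaussian(x, mu2, sigma)  # likelihood ratio
    ml = 1 if lr > 1 else 2                                 # maximum likelihood
    map_ = 1 if lr > P2 / P1 else 2                         # MAP rule
    # minimum-risk (Bayes) rule with the costs defined above
    bayes = 1 if lr > ((K12 - K22) * P2) / ((K21 - K11) * P1) else 2
    return ml, map_, bayes

print(decide(1.2))  # → (2, 2, 1)
```

At x = 1.2 the three rules disagree: the high cost of labelling a true C1 element as C2 (K_21 = 4) pushes the minimum-risk decision towards C1, while the other two rules choose C2.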
Fig. 3.19. Classification based on a) maximum likelihood classifier, b) MAP classifier, c) Bayes'
law.
In practice, statistical classifiers estimate the probability distributions of each of the classes that are present. Once this probability density is estimated, a selection of the target class is made in accordance with any of the three aforementioned decision rules. Depending on the way that these densities are estimated, these classifiers are categorised as either parametric or non-parametric.
2.2.2. Parametric methods.
Statistical classifiers based on parametric methods initially assume a certain distribution for each
of the classes and make an estimate of the probability density function by calculating the
parameters which define these distributions.
For the case of binary variables, a binomial or multinomial distribution is assumed, while in the case of continuous variables a Gaussian distribution is assumed. Given that the problem at hand does not involve multinomial variables, their development is set aside; the reader can find more information in the associated bibliography [BISH_06] [KACH_86] [MORR_76].
Another widely used parametric method, as an extension of the Gaussian distribution, is to model the distribution of a class as a mixture of several Gaussians. This approach allows a more complex and exact modelling of classes that are composed of several Gaussian clusters.
2.2.2.1. Gaussian distribution
The Gaussian model, also known as the normal distribution, is widely used to describe the distribution of continuous variables. For the case of a single variable x, the Gaussian distribution is defined as:
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}    (3.23)
Where µ and σ are the mean and standard deviation that define the distribution. For the multivariate case, the multivariate Gaussian distribution takes the form:
\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}    (3.24)
Where µ is the multidimensional mean vector and Σ the covariance matrix of the associated class.
The selection of the Gaussian distribution to model a class defined by a set of variables is not a fortuitous decision: the Gaussian distribution models many phenomena in nature, as it is the distribution that maximises the entropy for a given mean and variance. Additionally, the sum of a set of random variables is another random variable whose distribution approaches a Gaussian shape as the number of terms in the sum increases.
Analysing the equation of the multivariate normal distribution (3.24), one can notice that only one term depends on the position of the feature vector x, shown in (3.25):
\Delta^2 = (x-\mu)^T \Sigma^{-1} (x-\mu)    (3.25)
This expression, known as the squared Mahalanobis distance, decreases as the probability of membership of a point in space to the Gaussian class defined by µ and Σ increases.
One can verify that, in a two-dimensional space, the locus of equiprobable points corresponds to an ellipse centred on the vector µ, with axes in the direction of the eigenvectors of Σ, the length of these axes being proportional to the square root of the associated eigenvalues [BISH_05].
Fig. 3.20 Representation of an equiprobable surface for a two-dimensional Gaussian distribution.
Analogously, for spaces of higher dimension, the locus of equiprobable points is a hyperellipsoid whose axes are defined by the directions of the eigenvectors of the covariance matrix and whose elongations correspond to the eigenvalues associated with each eigenvector.
2.2.2.2. Estimate of the parameters of a Gaussian distribution from N observations
Usually, the parameters µ and Σ of the Gaussian distribution are unknown and must be estimated from examples. Given a set of N observations X = \{x_1, x_2, \ldots, x_N\} belonging to a Gaussian distribution, one can estimate the most probable Gaussian distribution from the observed data.
Given this set of vectors X, one can define the log-likelihood of having obtained the set X, conditioned on its belonging to a Gaussian distribution defined by µ and Σ, as:
\ln p(X \mid \mu, \Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(x_n-\mu)^T \Sigma^{-1} (x_n-\mu)    (3.26)
In order to obtain the parameters that maximise this likelihood, the function is differentiated and set equal to zero, yielding the most probable parameters µ and Σ.
Differentiating with respect to the mean µ and setting this derivative to zero, one obtains the most probable value of the parameter µ (3.28):
\frac{\partial}{\partial \mu} \ln p(X \mid \mu, \Sigma) = \sum_{n=1}^{N} \Sigma^{-1}(x_n-\mu)    (3.27)
\mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n    (3.28)
Similarly, one can calculate the maximum likelihood covariance matrix [MAGN_99], obtaining:
\Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N}(x_n-\mu_{ML})(x_n-\mu_{ML})^T    (3.29)
Examining the mathematical expectation of each maximum likelihood estimate, one observes that the expectation of the maximum likelihood mean corresponds with the real mean (3.30). However, the maximum likelihood estimate of the covariance does not correspond with the real covariance (3.31):
\mathbb{E}[\mu_{ML}] = \mu    (3.30)
\mathbb{E}[\Sigma_{ML}] = \frac{N-1}{N}\,\Sigma    (3.31)
In order to correct this bias, the covariance estimator is re-calculated according to the following formula (3.32):
\tilde{\Sigma} = \frac{1}{N-1}\sum_{n=1}^{N}(x_n-\mu_{ML})(x_n-\mu_{ML})^T    (3.32)
In this manner, starting from a set of data, one can calculate the parameters that define the most probable Gaussian distribution for that set through the application of formulas (3.28) and (3.32).
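These two estimators amount to the sample mean and the unbiased sample covariance. A minimal sketch (the data matrix below is an illustrative example, not thesis data):

```python
import numpy as np

# N observations of a D=2 feature vector (illustrative values)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
N = len(X)

mu_ml = X.mean(axis=0)                  # sample mean
diff = X - mu_ml
cov_unbiased = diff.T @ diff / (N - 1)  # unbiased covariance, divides by N-1
print(mu_ml)                            # → [2.5 2.5]
```

The unbiased estimator coincides with `np.cov(X.T)`, which also uses the N−1 denominator.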
2.2.2.3. Application of Bayes' classifier to Gaussian distributions
Once the probability density function of each class is estimated through the calculation of the parameters that define the most probable Gaussian distribution from a set of data (3.28), (3.32), one can substitute the probabilities in Bayes' rule (3.21) with the estimated Gaussian probability density function (3.24).
For the sake of simplicity and without loss of generality, let us assume the maximum likelihood classifier defined in (3.20).
Substituting the Gaussian probability function (3.24) into the equation that defines the likelihood ratio (3.18), there remains:
l_r(x) = \frac{\mathcal{N}(x \mid \mu_i, \Sigma_i)}{\mathcal{N}(x \mid \mu_j, \Sigma_j)}    (3.33)
l_r(x) = \frac{\dfrac{1}{(2\pi)^{D/2}|\Sigma_i|^{1/2}} \, e^{-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1}(x-\mu_i)}}{\dfrac{1}{(2\pi)^{D/2}|\Sigma_j|^{1/2}} \, e^{-\frac{1}{2}(x-\mu_j)^T \Sigma_j^{-1}(x-\mu_j)}}    (3.34)
Simplifying (the normalisation factors cancel when the determinants of both covariance matrices are equal), there remains:
l_r(x) = e^{-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1}(x-\mu_i) + \frac{1}{2}(x-\mu_j)^T \Sigma_j^{-1}(x-\mu_j)}    (3.35)
Therefore, equation (3.20) can be re-written in the following manner:
Class = \begin{cases} i & \text{if } e^{-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1}(x-\mu_i) + \frac{1}{2}(x-\mu_j)^T \Sigma_j^{-1}(x-\mu_j)} > 1 \\ j & \text{otherwise} \end{cases}    (3.36)
Taking logarithms of both sides of the inequality, there remains:
Class = \begin{cases} i & \text{if } (x-\mu_j)^T \Sigma_j^{-1}(x-\mu_j) - (x-\mu_i)^T \Sigma_i^{-1}(x-\mu_i) > 0 \\ j & \text{otherwise} \end{cases}    (3.37)
Bearing in mind the Mahalanobis distance defined in (3.25), one can notice that maximum likelihood classification of Gaussian classes is equivalent to assigning each element to the class with the smallest Mahalanobis distance:
Class = \begin{cases} i & \text{if } (x-\mu_j)^T \Sigma_j^{-1}(x-\mu_j) > (x-\mu_i)^T \Sigma_i^{-1}(x-\mu_i) \\ j & \text{otherwise} \end{cases}    (3.38)
Note that, in the case where the covariance matrices of both classes are identical and equal to the identity matrix, expression (3.38) simplifies to the same formulation as the Euclidean distance.
Where the covariances of all classes are identical, the functions that define the decision map of this classifier are linear. Otherwise, they are quadratic functions obtained at the intersection of two or more ellipses with non-parallel axes.
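The minimum-Mahalanobis-distance rule can be sketched directly; the class parameters below are illustrative choices, not thesis data.

```python
import numpy as np

def mahalanobis_sq(x, mu, cov):
    # squared Mahalanobis distance of x to the class defined by (mu, cov)
    d = x - mu
    return float(d @ np.linalg.inv(cov) @ d)

def classify(x, params):
    # params: list of (mu, cov) per class; assigns the nearest class
    return int(np.argmin([mahalanobis_sq(x, mu, cov) for mu, cov in params]))

params = [(np.array([0.0, 0.0]), np.eye(2)),          # class 0
          (np.array([4.0, 4.0]), np.diag([4.0, 4.0]))]  # class 1, wider
print(classify(np.array([1.0, 1.0]), params))  # → 0
print(classify(np.array([3.0, 3.0]), params))  # → 1
```

Because the two covariance matrices differ, the implied decision boundary between the classes is quadratic rather than linear.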
Figure 3.21 shows the classification by a maximum likelihood classifier through the estimate of
the Gaussian distribution parameters from a set of sample data. One can notice that the
classification is not exactly the same as that of the optimum classifier. This is due to the fact that
this classifier has estimated the necessary parameters based on a small number of training vectors.
Fig. 3.21 Classification map based on an estimate of Gaussian distributions.
2.2.2.4. Gaussian mixture
Although the Gaussian distribution has important analytic properties, it presents important limitations when modelling real data in which the studied classes cannot be modelled as a single Gaussian, as each class may be made up of several separate clusters. These clusters can, however, be modelled through the combination of several Gaussians [MCLA_88]. Such cases arise when elements belonging to the same class come from different Gaussian subgroups, that is, they belong to the same class but form part of differentiated subclasses.
The following figure (Fig. 3.22) shows two classes: the first defined by two elongated Gaussian subclasses (A1, A2) (Fig. 3.22a), and the second defined by another two, more compact subclasses (B1, B2) (Fig. 3.22b).
Fig. 3.22 a) Distribution of points belonging to two classes whose probability function follows a
distribution each of them based on the sum of two Gaussian distributions. b) Probability function
assuming that each of the classes follows a Gaussian distribution.
If one attempts to fit each of these classes with a single Gaussian model, as shown in figure 3.22b, the model that is obtained is not capable of correctly representing the probability function of each class, thereby producing considerable classification error. This error is shown in figure 3.23, which presents a high number of incorrect classifications.
Fig. 3.23. Classification map based on a single Gaussian estimate for each class, for classes with a probability density function based on Gaussian mixture models.
In the Gaussian Mixture Model (GMM) [MCLA_00], each density function is defined as the weighted sum of the K Gaussians of which it is composed, thus defining the probability of each class as:
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)    (3.39)
where µ_k, Σ_k are the parameters that define the Gaussian components of which the class is made, and π_k are the weights of each component.
In this manner, each of the classes present is defined by the sum of several Gaussian distributions,
as shown in figure 3.24.
Fig. 3.24 a) Probability function of class 1 as defined by a mixture of two Gaussian functions. b)
Probability function of class 2 as defined by a mixture of two Gaussian functions.
In accordance with the previous distributions, a classifier based on a Gaussian mixture allows an accurate modelling of the probability density function that defines each of the classes, yielding a minimal error rate, as can be seen in figure 3.25.
Figure 3.25. Bayes' optimal classification map for data based on models of Gaussian mixture.
2.2.2.4.1. Estimate of the parameters of a Gaussian mixture distribution from N observations
Let X = \{x_1, x_2, \ldots, x_N\} be a set of observations to be modelled as a Gaussian mixture. If one assumes that these points come from a Gaussian mixture distribution, the log-likelihood that the points X belong to that distribution, for a set of parameters π_k, µ_k, Σ_k of each Gaussian, is given by:
\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)    (3.40)
Where N is the number of observed elements, K the number of Gaussian functions that define the mixture distribution, π_k the weights associated with each of the distributions, and µ_k and Σ_k the mean vector and covariance matrix that define each Gaussian function.
Maximising this function, one obtains the parameters that most reliably describe the density function of the input data. There are several approaches to calculating the optimal parameters. Methods based on gradient descent [NOCE_99] [FLET_87] are capable of finding them.
Despite the advantages of these methods, as well as of other methods based on derivative-free searches such as genetic algorithms, the method most commonly used for this search is the Expectation-Maximization (EM) algorithm [DEMP_77], [MCLA_97].
The basis of the EM algorithm is detailed in [BISH_06]. Next, a summary of the implementation of the EM algorithm for the calculation of the parameters that define a Gaussian mixture distribution is included [BISH_06].
1. Initialisation: The means µk, covariances Σk and the weights πk that define the
distribution are initialised and the likelihood function is evaluated (3.40).
2. Expectation: The responsibility functions γ(z_nk) (3.41) are evaluated using the current values of the parameters.
\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}    (3.41)
3. Maximisation: The parameters π_k, µ_k and Σ_k are recalculated using the responsibilities calculated in the previous step.
\mu_k^{new} = \frac{1}{N_k}\sum_{n=1}^{N} \gamma(z_{nk}) \, x_n    (3.42)
\Sigma_k^{new} = \frac{1}{N_k}\sum_{n=1}^{N} \gamma(z_{nk}) \, (x_n-\mu_k^{new})(x_n-\mu_k^{new})^T    (3.43)
\pi_k^{new} = \frac{N_k}{N}    (3.44)
Where N_k is defined by:
N_k = \sum_{n=1}^{N} \gamma(z_{nk})    (3.45)
4. Evaluate the likelihood function (3.40) and verify that the convergence criteria are met.
Otherwise repeat step 2.
If one estimates the probability density function using the Expectation-Maximization method, the parametric model obtained from the training points shown in Fig. 3.26 is very similar to the classification map of the Bayes optimal classifier.
Fig. 3.26. Classification map based on the estimate of the GMM from the Expectation-
Maximization algorithm
2.2.3. Non-parametric methods
The greatest problem of parametric methods is that the shape of the probability distribution of the sample to be modelled must be known a priori. This limits these methods, since some of the assumed statistical models will not be capable of modelling all the complexity of the shape of the probability density function of the distribution.
Unlike the former, non-parametric methods do not assume a specific distribution or form of the probability density function; rather, they estimate the density function in each region of the feature space based on its local behaviour, without assuming any specific statistical behaviour.
In exchange for the advantage of not having to assume the shape of the probability distribution, a greater number of training data is necessary in order to undertake a precise estimation, as well as considerable storage space to contain the data of the estimated probability function.
2.2.3.1. Partition models on histograms
The simplest model for the estimation of the probability density function through non-parametric methods is the one based on histograms. For this, the feature space is divided into a set of partitions, in such a manner that each observation belongs to one of the resulting groups. The density function is then defined, for each partition, as the number of observations belonging to that partition over the total number of observations:
p_i = \frac{n_i}{N \, \Delta_i}    (3.45)
where p_i is the probability density of each partition, n_i the number of observations within partition i, N the total number of observations and ∆_i the size of the analysed partition.
This method can be extended to multivariate distributions by substituting the term ∆_i with the hypervolume of each partition in the feature space.
The selection of the size ∆_i influences the estimate of the probability density function. High values of this parameter make the curve excessively smooth, failing to correctly represent the probability function (Fig. 3.27a). Low values, on the other hand, offer a more precise estimate of the probability density function, but require a greater number of samples for training; otherwise a bad estimate is obtained, as shown in figure 3.27c. As a general rule, a compromise is reached between the precision of the curve and the number of samples necessary for a correct estimate, as can be seen in figure 3.27b.
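The histogram estimator can be sketched in a few lines; the sample values and the partition count are illustrative.

```python
import numpy as np

def histogram_density(samples, n_bins, lo, hi):
    # equal-width partitions: Delta_i = (hi - lo) / n_bins
    width = (hi - lo) / n_bins
    counts, _ = np.histogram(samples, bins=n_bins, range=(lo, hi))
    return counts / (len(samples) * width)  # p_i = n_i / (N * Delta_i)

samples = np.array([0.1, 0.2, 0.25, 0.6, 0.7, 0.9])
p = histogram_density(samples, n_bins=2, lo=0.0, hi=1.0)
print(p)  # → [1. 1.]  (three of six samples in each half-interval)
```

Note that the estimated densities integrate to one over the analysed interval, as required of a probability density function.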
Fig. 3.27. Estimate of the probability density function: a) two partitions, b) four partitions, c)
twenty-four partitions
In practice, the method based on histogram partitions is useful for estimating density functions and visualising data in one or two dimensions, but it becomes unfeasible for the majority of applications that require the estimation of probability density functions.
On the one hand, this is due to the fact that dividing the feature space into a set number of partitions makes the probability density function discontinuous along the partition boundaries [BISH_06].
On the other hand, obtaining a density of data sufficient to make a precise estimate of the probability density function in each partition of a high-dimensional space becomes absolutely unfeasible. In order to create a histogram with M partitions in each of the variables, given a variable x of D components, the total number of partitions would be M^D, and an adequate density of observations would have to be obtained in each of them.
Despite its limitations, the model based on histogram partitions can be used as a basis for the rest of the non-parametric methods, which estimate the density function while correcting its limitations.
The Parzen window and, in general, kernel density estimators [PARZ_62], [DUDA_73] eliminate some of the problems of basic histogram-based methods by assuming a binomial distribution of the number of points belonging to each partition and by substituting the hypercube partition with Gaussian kernels that eliminate the discontinuities in the probability function.
2.2.3.2. K-nearest neighbour
One of the problems of the earlier methods is that the size of the partition, or of the chosen kernel, is the same regardless of the region of the feature space. A large size is useful when estimating the probability density function in sparsely populated regions, but it does not provide the necessary detail on the shape of the probability density function. A small value of this parameter, on the other hand, yields a probability density function with great detail in the regions of high probability, but noisy in regions of low density, due to the lack of training samples in those regions.
If, instead of fixing the size of the partition a priori, one centres a sphere on the point x where the probability density is to be calculated and grows its radius until it contains K elements, the estimate of the probability p(x) at that point is:
p(x) = \frac{K}{N \, V}    (3.46)
Where N is the total number of observations and V the volume of the hypersphere used.
This method is known as K-nearest neighbours [SHAK_05] and solves some of the limitations of the fixed-window-size methods seen previously, since the window size is adjusted depending on the density of each region of the feature space. A point to bear in mind is that the value of K determines the smoothness of the density function; intermediate values of K are generally optimal.
Applying this density estimate to each of the classes C_i to be classified, one obtains the estimate of the probability density for each class:
p(x \mid C_i) = \frac{K_i}{N_i \, V}    (3.47)
And the a priori probability of each class:
P(C_i) = \frac{N_i}{N}    (3.48)
Applying Bayes' theorem (3.14) and substituting each of the terms with (3.46), (3.47) and (3.48), one obtains the a posteriori probability for the classifier based on K-nearest neighbours:
P(C_i \mid x) = \frac{p(x \mid C_i)\,P(C_i)}{p(x)} = \frac{K_i}{K}    (3.49)
In this manner, the Bayes optimal classifier assigns each element to the class with the most elements among its K nearest neighbours.
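The resulting majority-vote rule can be sketched as follows; the training points and the value of K are illustrative choices.

```python
import numpy as np

def knn_classify(x, X_train, y_train, k):
    # distances from x to every training sample
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dist)[:k]              # indices of the K nearest
    # majority class among the K nearest neighbours
    return int(np.bincount(y_train[nearest]).argmax())

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(np.array([0.2, 0.2]), X_train, y_train, k=3))  # → 0
print(knn_classify(np.array([0.8, 0.9]), X_train, y_train, k=3))  # → 1
```

An odd K avoids ties in the two-class case, which is one reason intermediate odd values are a common choice.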
Figure 3.28. Effect of number K in the classification map obtained through the classifier based on
K-nearest neighbours, for K=1, K=2 and K=4.
2.3. Classifiers based on the calculation of boundaries of decision
A third category of classifiers is based on the empirical calculation of decision boundaries, in such a way that these boundaries minimise a specific criterion imposed in the design phase.
Some of the most commonly minimised criteria are the apparent classification error (classification error rate), or the mean squared error between the output value of the classifier and the numeric value (usually vectorial) associated with the correct class.
A classical example of these methods is Fisher's discriminant analysis which, as mentioned in the previous chapter, maximises a function that measures the separability between classes.
Other examples of these types of classifiers are neural networks such as the perceptron [RAUD_98] or the multilayer perceptron. The perceptron calculates a hyperplane that separates the classes, modifying the value of its internal weights until the calculated plane minimises the classification error.
The multilayer perceptron acquires the capacity to model nonlinear decision boundaries. However, this property can cause overtraining in the classifier as its complexity (number of neurons and layers) increases, which is corrected by means of regularisation methods [BISH_95, CHENG_94, RIPL_96].
Other methods, such as Support Vector Machines (SVM) [VAPN_98, BURG_98, SCHO_97], project the feature vector through an adequate kernel into a space of higher dimension in which the data are more easily separated. In that higher-dimensional space, a linear classification is made, maximising the margin of separability between classes [VAPN_06].
In this section, given its importance in the field of classification, a brief description is provided on
the functioning and the theoretical basis of the perceptron and the multilayer perceptron.
2.3.1. Perceptron
The perceptron is a linear classifier invented by Frank Rosenblatt in 1958 [ROSE_58]. His work was highly criticised by Marvin Minsky [MINSK_69], who stated that these neural networks would only be able to solve problems that were linearly separable. Although Minsky was only able to demonstrate this limitation for single-layer networks, his work caused a great decline in the funds invested in research in this field, which did not recover until the mid-80s, when the backpropagation algorithm was created.
The perceptron was created in the likeness of the human visual system. Each perceived element is projected onto an area known as the projection area, which in turn is linked through neural connections to another area, known as the association area, whose connections determine the response given by the network, as shown in figure 3.29.
Figure 3.29. Biological basis of the classical perceptron.
The mathematical formulation of the perceptron is based on a feature vector x that can be transformed in a nonlinear manner to obtain a transformed feature vector Ф(x) (projection area). This vector is multiplied by a set of associated weights w (association area), so that the response of the network is a function of a linear combination of the input vector weighted by w. For simplicity's sake and without loss of generality, let us consider Ф(x) = x, extracting the transformation stage from the neural network model; the transformation Ф(x) can be considered outside the network model.
Fig. 3.30. Mathematical representation of the perceptron.
Figure 3.30 shows that each input vector is weighted by a weight w_i and by an offset b.
y(x) = w^T x + b    (3.50)
f(y(x)) = \begin{cases} +1 & \text{if } y(x) \ge 0 \\ -1 & \text{if } y(x) < 0 \end{cases}    (3.51)
Where the output of the function y(x) is positive, the feature vector is assigned to class A, and where negative, to class B. Setting y(x) equal to zero, the decision boundary between both classes corresponds to the hyperplane w^T x + b = 0. The perceptron therefore separates the classes through a hyperplane, so this classifier is useful for separating classes that are linearly separable.
The weight vector w that defines the discriminating hyperplane is established by minimising an error function that is zero when an observed feature vector is correctly classified and nonzero when an element is incorrectly classified. The offset b is included as a weight within the vector, w = [w; b], with the input extended as x = [x; 1], so that the offset is treated as the weight of an input that always has the value one.
E(w) = -\sum_{n=1}^{N} \left( \delta_n - f(y(x_n)) \right) (w^T x_n)    (3.52)
Where N is the number of elements in the sample set and δ_n the label (±1) of the class to which vector n belongs. If an element is correctly classified, it contributes no error; otherwise, its error is proportional to w^T x_n.
In order to find those weights w that minimise this error function, one applies an iterative
algorithm based on a descending gradient which modifies the association weights, thus
minimising the error produced in the classification. In this manner, the weights are interactively
modified throughout the training process according to the opposite direction of the error gradient
Chapter III Classification methods
Page 74
in accordance with the learning function (3.53), converging where the problem is linearly
separable in a finite number of iterations [ROSE_62].
w_{t+1} = w_t − η·∇E(w) = w_t + η·(δ_n − f(y(x_n)))·x_n        (3.53)
The fact that the problem is linearly separable does not imply the existence of a unique solution: the solution obtained shall depend on the initialisation of the weights [HERT_91].
Other training algorithms for the perceptron [FREU_98] have been developed, based both on the use of a kernel for the projection of the data into other dimensional spaces with greater separability, and on changes in the learning algorithm that allow the perceptron to converge to the hyperplane that maximises the margin of separation between classes.
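The training procedure (3.51)-(3.53) can be sketched in a few lines; the following Python/NumPy illustration uses hypothetical linearly separable data, with the offset folded into the weight vector as described above:

```python
import numpy as np

def step(z):
    """Step activation f from (3.51): +1 if z >= 0, otherwise -1."""
    return 1.0 if z >= 0 else -1.0

def train_perceptron(X, delta, eta=0.1, max_iter=100):
    """Perceptron training via the learning rule (3.53).

    X     : (N, D) array of feature vectors.
    delta : (N,) array of class labels in {-1, +1}.
    Returns w with the offset b folded in as its last component.
    """
    Xb = np.hstack([X, np.ones((len(X), 1))])  # constant input of value 1
    w = np.zeros(Xb.shape[1])
    for _ in range(max_iter):
        errors = 0
        for x, d in zip(Xb, delta):
            y = step(w @ x)
            if y != d:                 # misclassified: apply update (3.53)
                w += eta * (d - y) * x
                errors += 1
        if errors == 0:                # a whole epoch without updates
            break
    return w

# Two linearly separable point clouds (hypothetical toy data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
delta = np.array([-1.0] * 20 + [1.0] * 20)
w = train_perceptron(X, delta)
```

With linearly separable data the loop stops as soon as an epoch produces no updates, in line with the convergence result of [ROSE_62].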
If one observes the classification results of the classic perceptron on the test classes, one can see that there is no hyperplane (line) capable of discriminating between both classes, given that they are not linearly separable. In this case, the training of the perceptron calculates the hyperplane that minimises the error function.
Figure 3.31. Representation of the classification map of the perceptron a) 0 iterations, b) one
iteration, c) ten iterations.
2.3.2. Multilayer perceptron
As previously seen, one of the main limitations of the perceptron is that it is only capable of
modelling decision maps that are linearly separable. In order to correct this limitation, the
multilayer perceptron has two main peculiarities:
• This network has a hidden intermediate layer, which makes it possible to model decision regions as complex as desired by adding new neurons to this intermediate layer or even adding new hidden layers.
• The step function (3.51) implemented in the classic perceptron is substituted by continuous and differentiable functions implemented in each of the neurons. In this manner, the adjustment of the weights of the neural network during the training phase is made possible via methods based on gradient descent.
Fig. 3.32. Architecture of a multilayer perceptron or a backpropagation network.
Analogously to the classic perceptron, the output of each of the neurons of the intermediate layer
is defined as:
a_i(x, W_1) = f_i( Σ_{j=1}^{M} w_j·x_j )        (3.54)
Where f_i is the activation function of neuron i in the intermediate layer and w_j each of the j weights that connect the inputs with neuron i of the first layer.
Similarly, the output of the intermediate layer that is projected towards the output layer is defined
as (3.55):
y_i(a, W_2) = f_i( Σ_{j=1}^{M} w_j·a_j )        (3.55)
Where f_i is the activation function of neuron i in the last layer and w_j each of the j weights that connect the intermediate layer with neuron i of the final layer.
Minsky and Papert [MINSK_69] verified that a two-layer network could overcome many of the restrictions of the single-layer perceptron; however, they did not provide any solution to the problem of adjusting the weights through the hidden layers of the network in order to minimise its final error.
The solution to this problem did not come until the mid-80's when Rumelhart [RUME_86]
offered a solution to this problem. The main idea of this method consists in propagating the error
obtained in the final layer of the network through hidden intermediate layers towards the input
layer. In this manner, the weights of all layers are adjusted during the learning phase. Different
solutions to this learning method have been implemented in order to accelerate and strengthen the
convergence of the network, from classical solutions based on the gradient descent to other more
advanced that use the information of the second derivative and of the Hessian to accelerate
convergence [SNYM_05]. Other novel methods to train a multilayer perceptron are detailed in
[BISH_08].
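As an illustration of (3.54)-(3.55) and of the backpropagation of the output error towards the input layer, the following Python/NumPy sketch trains a one-hidden-layer network on the XOR problem; the sigmoid activations, layer sizes, learning rate and toy data are hypothetical choices, not taken from this Thesis:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    a = sigmoid(W1 @ np.append(x, 1.0))   # hidden layer, as in (3.54), bias folded in
    y = sigmoid(W2 @ np.append(a, 1.0))   # output layer, as in (3.55)
    return a, y

def train(X, T, hidden=4, eta=0.5, epochs=5000):
    W1 = rng.normal(0, 1, (hidden, X.shape[1] + 1))
    W2 = rng.normal(0, 1, (1, hidden + 1))
    for _ in range(epochs):
        for x, t in zip(X, T):
            a, y = forward(x, W1, W2)
            # Backpropagation: the output error is propagated back through
            # the hidden layer; both weight matrices descend the gradient
            # of the squared error.
            d_out = (y - t) * y * (1.0 - y)
            d_hid = (W2[:, :-1].T @ d_out) * a * (1.0 - a)
            W2 -= eta * np.outer(d_out, np.append(a, 1.0))
            W1 -= eta * np.outer(d_hid, np.append(x, 1.0))
    return W1, W2

def mse(X, T, W1, W2):
    return np.mean([(forward(x, W1, W2)[1][0] - t) ** 2 for x, t in zip(X, T)])

# XOR is not linearly separable, so a single perceptron fails; one hidden
# layer suffices for this problem (hypothetical toy data).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([0., 1., 1., 0.])
W1, W2 = train(X, T)
err = mse(X, T, W1, W2)
```

With a favourable initialisation the four XOR patterns are typically classified correctly; as noted above, plain gradient descent can also settle in a local minimum, which is precisely what the more advanced training methods try to avoid.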
The universal approximation theorem for neural networks shows that every continuous function that relates a set of real numbers in a given interval with a continuous output value in an interval of real numbers can be approximated by a multilayer perceptron composed of a single hidden layer, with a precision that grows with the number of neurons in that intermediate layer [BAKE_98], for certain types of activation functions. This makes the multilayer perceptron a universal approximator.
Given the increase in the complexity of the classifier and in the number of parameters that define it, over-training of the classifier can occur when the necessary number of learning elements is not available. Because of this, regularisation methods have been created that avoid the over-training of the system. Some of these methods are implicit in the network itself, such as reducing the complexity of the network, slow training, or reducing the number of iterations. Other methods include the addition of noise and weight decay, as in
this manner one avoids, on the one hand, falling into local minima while, on the other, the magnitude of the weights, which is associated with over-training phenomena, is reduced [BISH_95], [CHENG_94], [RIPL_96].
One of the main advantages of multilayer perceptrons lies in that, if they are correctly designed, they provide an estimate of the reliability of the undertaken classification, making it possible to discard those cases that are doubtful or borderline [JAIN_00].
If one observes the behaviour of this classifier, one can notice the decision maps calculated for
different numbers of neurons in the intermediate layers. One can see that the estimates of the
decision maps are more similar to those obtained by an optimal classifier (Fig. 3.8) when the
number of neurons in the intermediate layer is greater. However, one can begin to notice the
effect of over-training when having a high number of neurons.
Figure 3.33. Representation of the classification map of the multilayer perceptron a) two
intermediate neurons, b) ten intermediate neurons, c) thirty intermediate neurons.
In the same manner, taking advantage of the capacity of the multilayer perceptron as a universal
approximator, one can classify the set of data defined in figure 3.22, obtaining the following
results:
Figure 3.34. Representation of the classification map of the multilayer perceptron a) four
intermediate neurons, b) ten intermediate neurons, c) thirty intermediate neurons.
In this case, if one compares it with the classification map of the optimal classifier (Fig. 3.25),
one notices that a reduced number of intermediate neurons does not model the real probability
density function well. However, for a high number of neurons, given the small number of
elements used for training, an over-training occurs due to the fact that one does not have the
necessary examples for the correct adjustment of the classifier. The calculated decision map does
not correspond with the optimal classifier in areas where there is no presence of samples.
However, it achieves a very good classification of training samples.
2.4. Combination of classifiers
At times, it is necessary or advisable to use data from diverse classifiers. Some of the reasons that might lead to combining the information from several of these classifiers in order to achieve greater precision are listed in [JAIN_00]:
− The same problem can be tackled based on different classifiers, each one representing the
same problem through a totally different representation of it. This could be the case in the
identification of persons using a combination of classifiers that combine different sources
of biometric information.
− Where the available data samples are obtained under different conditions, cannot be taken simultaneously, or involve different variables, it may not be possible to train a single joint classifier that combines both data types at the same time.
− Classifiers using the same input data and obtaining a similar classification performance do not necessarily model the data in the same manner, and may model each of the classes in a totally different way.
In summary, one can have available different groups of features, training groups, different
classification methods and even different training methods, whose outputs can be combined in a
classifier which optimises the global performance of the classification.
There are two main ways of combining several classifiers:
− Parallel.
− Series or cascade.
Those techniques based on parallel processing assume that all the classifiers act at the same level,
being combined at a later time by a combining classifier that can even modify the weights of each
classifier.
Cascade based techniques execute the classifiers in a sequential manner, in such a way that as
they advance, the classifiers differentiate between a smaller number of classes, therefore being
much more specific.
A novel technique for the combination of classifiers is that based on boosting [FREU_96], [SCHA_90]. This technique is based on an adequate combination of weak classifiers, that is, classifiers whose results are not very different from those that would be obtained through a random classification choice. When they are combined, a very high classification performance is obtained. The advantage of these methods lies in that, as they are based on a very high number of weak classifiers, the effect produced by noise is diminished, as it only affects a certain number of weak classifiers and does not affect the final precision of the system.
As for combinators, there are basically two types:
• Those considered static, in which little or no training is necessary [TRES_95], such as those based on the mode, mean, median, sum, weighted sum, etc.
• Those that require training in order to tune the relation between the different classifiers and which allow a better performance [JACO_91]. The design of these adaptive combinators follows the same principles as a normal classifier, as stated in this chapter.
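As a minimal illustration of a static parallel combinator, the following Python sketch combines the label outputs of three hypothetical classifiers by majority vote (the mode):

```python
import numpy as np

def majority_vote(predictions):
    """Static parallel combination by the mode of the classifier outputs.

    predictions : (n_classifiers, n_samples) array of integer class labels.
    """
    preds = np.asarray(predictions)
    # For each sample (column), count the votes per label and keep the mode.
    return np.array([np.bincount(col).argmax() for col in preds.T])

# Label outputs of three hypothetical classifiers for five samples.
votes = np.array([[0, 1, 1, 0, 1],
                  [0, 1, 0, 0, 1],
                  [1, 1, 1, 0, 0]])
combined = majority_vote(votes)   # -> [0, 1, 1, 0, 1]
```

An adaptive combinator would replace the fixed mode with a trained rule, for example a weighted sum whose weights are themselves fitted as described above.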
3. Conclusions
This chapter has shown different metrics that measure, in a quantitative manner, the difference that exists between two given spectra. Modelling the different classes from a different perspective is necessary due to the inability of these measures to adapt to the changes that these spectra can undergo due to the lighting conditions, the geometry of the object
which they represent, small chemical changes in the materials or the variability of the class which
they represent.
Additionally, in this chapter a description has been provided on the mathematical basis and the
principles of different mathematical classifiers as an alternative to traditional metrics in order to
efficiently model the variability and discriminating features of the spectra. The use of these
classifiers is going to allow the creation of a suitable model that represents the inherent properties
of the materials to be classified, thus increasing their separability with elements that belong to
other classes and decreasing the separability between elements of the same class.
The Gaussian classifier stands apart from all the classifiers presented in this chapter. Its simplicity, together with the possibility of knowing its precise mathematical parametrisation, makes it possible to estimate the overlap between different classes, obtain a good generalisation and have available the necessary metrics to calculate the degree of similarity between the different Gaussian models.
These properties are key when selecting the Gaussian classifier as the most adequate classifier with which to validate the set of optimal descriptors and the methodology that maximises the classification.
Chapter IV
Feature extraction in hyperspectral vectors
One of the main problems that must be faced by the classification methods mentioned in the previous chapter is related to the high dimensionality inherent to hyperspectral data when used for classification tasks [FEATH_05, KUAN_05, PERK_05]. In order to solve this problem, different techniques are used that reduce the number of features employed as inputs to these classifiers, reducing to a great extent the Hughes phenomenon [MANO_04]. In order to do so, the information contained in the spectral bands must be decorrelated using traditional methods such as Principal Component Analysis (PCA) or others [FEAT_05, TATZ_05, RAJP_03, WANG_06], or through the selection of the bands that best discriminate the elements to be classified [WILL_04, MERC_02, RELL_02, RAMA_05].
Methods based on the decorrelation of the spectrum and on feature extraction try to achieve the greatest possible reduction of the feature space through mathematical transformations. These transformations, at times orthogonal, cause the transformed vectors to not be physically plausible or, even when they have an associated physical meaning, to be hard to interpret.
In order to avoid this problem, other methodologies based on the selection of features try to select the most relevant features of the spectrum without affecting their interpretability. Within this group, there are some algorithms based on expert systems, which find and characterise different absorption bands and differentiate diverse materials based on previously established tables. Nonetheless, this has the disadvantage that, in order to include new materials, these must be manually tabulated. Therefore, other methodologies select the subset of features that maximises the classification or the separability of the sample.
This chapter shows the different current techniques that extract adequate features for a correct spectral characterisation. Although the complete feature vector of the spectrum can be used to model a classifier without a previous feature extraction process, it is not the most efficient manner to do so.
In fact, the use of a high number of components and complex classifiers for the resolution of a
classification problem generates inferior results to those obtained via the use of an adequate
number of variables [PAI_07].
This is due, on the one hand, to the fact that the complexity of the distribution of the data generally grows exponentially with the dimensionality of the space, which makes the objective function to be modelled bear a greater complexity.
On the other hand, due to the increase in data dimensionality, there is an exponential increase of
the number of necessary samples in order to maintain the density of samples in the dimension of
the chosen space, a necessary parameter in order to achieve a correct estimate of the classifier's
parameters.
Hughes [HUGH_68] explains this phenomenon when comparing the existing relation between the
expected success of a classifier with its own complexity and with that of the number of samples
used for its training.
Fig. 4.1. Evolution of the performance of the classifier based on the number of features
In this way, a greater number of samples are necessary in order to obtain a correct classification
as the complexity of the classifier and the dimensionality of feature space increases. This
exponential growth in the number of necessary features to maintain the density of samples within
the space is known as the "Curse of Dimensionality” [BELL_61], and is the main cause for the
Hughes Phenomenon.
[Plot: classification ratio (%) versus number of components, for 2 to 40 components.]
In this manner, for a certain number of samples, the performance of a classifier increases at first
with the increase in features. Later, it decreases abruptly if its number is high due to not being
able to correctly estimate the probability distribution for the given dimension of feature space and
the number of samples.
On the other hand, spaces of high dimension have peculiarities that are difficult to grasp at first; for instance, the volume of a hypercube concentrates in its extremes as its dimension increases. This can be demonstrated by studying the ratio between the volume of a hypercube of side L and one of side L − ε: figure 4.2 shows that this fraction (4.2) tends towards one when the dimension of the feature space tends towards infinity.
V_hypercube = L^d        (4.1)

Ratio = (L^d − (L − ε)^d) / L^d        (4.2)
Fig. 4.2. Volume ratio between two hypercubes of length L and L- ε in relation to their
dimensions.
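The behaviour of (4.1) and (4.2) can be verified numerically; a short Python sketch for a unit side L = 1 and a shell of thickness ε = 0.01:

```python
import numpy as np

# Fraction of the hypercube volume (4.2) lying in the outer shell of
# thickness eps, for increasing dimension d.
L, eps = 1.0, 0.01
dims = np.array([1, 10, 100, 1000])
ratio = (L**dims - (L - eps)**dims) / L**dims
for d, r in zip(dims, ratio):
    print(f"d = {d:4d}  shell fraction = {r:.4f}")
```

Even though ε is only 1 % of the side, for d = 1000 almost the whole volume lies in the outer shell, which is precisely the concentration effect discussed in the text.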
This indicates that, for high dimensions, the greater part of the volume is concentrated on the surface of the hypercube. Due to these properties, a space of high dimension tends to be empty inside and can therefore be represented by a space of smaller dimension. The effect of the curse of dimensionality is limited through a correct reduction of features, thus reducing the number of representative samples needed to train the classifier. Furthermore, the complexity of the classifier is simplified, yielding a greater efficiency in generalisation, and the computational cost of the process is optimised by using a smaller number of features.
Therefore, a reduction in the number of highly correlated features that are present in
hyperspectral data is necessary. Along this line, there are two main methodologies to reduce the
set of features of a set of multidimensional data and to find a subspace of features that allows for
their better separability:
− Feature extraction: Extracting new features from the combination of existing features.
− Feature selection: Selecting those features that appear as most adequate for the tasks of
classification without modifying them.
1. Feature extraction
The process of feature extraction generally consists in projecting the data contained in the original feature space (the spectral vector L) onto another feature subspace of equal or smaller dimension.
Independently of the transformation method, the transformed subspace must be capable of maintaining the separability of the classes, reducing the noise or, in certain cases and depending on the needs, compressing the data while keeping in the reduced space the majority of the information contained in the original space.
Most of the techniques for feature extraction applied to hyperspectral data are based on the
premise that the observed spectrum is a consequence of the sum of several underlying physical
processes caused by the set of particles which shape the object. With this premise, the observed spectrum can be represented as the weighted sum of the pattern spectra associated to these underlying physical processes.
Generically, when the vectors that are going to make up the axes of the reduced subspace are known, one can express this linear transformation through a matrix A of [M×N] dimensions. Each column n contains one of the N vectors that form the basis of the transformation system; these vectors are going to be the basis of the representation of the system.
L_transformed = A^T·L        (4.3)
A compression in the transformation occurs if the number of base vectors in matrix A is less than
the original number of components. This compression must optimise some of the previously
mentioned criteria: separability, reduction of noise or compression.
Generally speaking, these base spectra, which are going to be used to represent the original
spectrum, are unknown both in number as well as in value. As mentioned earlier, the objective of
this transformation is to achieve a representation of the information contained in the higher
dimension space, in a subspace of smaller dimensionality, in a manner that it maintains or
increases the separability of classes, or that it maintains the maximum possible information of the
signal.
The most common approach to the problem of the feature extraction based on linear
transformation is PCA or Principal Component Analysis [HOTT_33]. This technique, also known
as the Karhunen-Loève transform, has been widely used in applications for dimensionality
reduction, compression with losses, feature extraction and visualisation, among others.
This method consists in an orthogonal transformation that calculates a new sequence of
uncorrelated vectors known as principal components. This transformation diagonalises the
covariance matrix in such a way that each of the obtained features does not bear a correlation
with the rest of the features in the transformed space.
This, to a certain extent, estimates those components that are not influenced by any other and that represent an adequate estimate of the principal vectors from which the observed spectra are formed.
The problem of principal components can be considered via different equivalent approaches, all
reaching the same mathematical formulation [GOME_02]:
− Search for an orthogonal transformation which provides a set of variables maximising the
variance of the transformed sample (Formulation of maximum variance).
− Search for an orthogonal transformation that provides a set of uncorrelated variables.
− Search for a straight line for which the square sum of the perpendicular distances to the
data is minimal (Formulation of minimum error).
Let us consider L as a vector of D components represented in the non-transformed space, and L_n a set of N such spectra taken as a sample. The aim is to project the data contained in the original space of dimension D into a subspace of dimension M in such a way that the variance in the transformed space is maximised.
L̄ = (1/N)·Σ_{n=1}^{N} L_n        (4.4)
Where L̄ is the mean of the set of spectra L_n and û₁ the unit vector that defines the direction of projection, one can define the variance of the data sample over û₁ as:
(1/N)·Σ_{n=1}^{N} (û₁^T·L_n − û₁^T·L̄)² = û₁^T·S·û₁        (4.5)
Where S is the covariance matrix of the set of sample data defined by:
S = (1/N)·Σ_{n=1}^{N} (L_n − L̄)·(L_n − L̄)^T        (4.6)
One maximises the projected variance (4.5) with respect to the vector û₁, under the restriction that the norm of that vector is one (4.7), in order to avoid equivalent solutions with different norms:
û₁^T·û₁ = 1        (4.7)
Maximising this term based on Lagrange multipliers, one has the following target function:
f = û₁^T·S·û₁ + λ₁·(1 − û₁^T·û₁)        (4.8)
Differentiating the target function with respect to û₁ and setting it equal to zero, one finds that the maximum is obtained for:
S·û₁ = λ₁·û₁        (4.9)

û₁^T·S·û₁ = λ₁        (4.10)
In this way, the variance is maximised when û₁ is the eigenvector associated with the highest eigenvalue of the covariance matrix. Defining successive directions that maximise the variance, orthogonal to those previously defined, one obtains a set of eigenvectors û₁, û₂, û₃, ..., û_M associated with the first M eigenvalues of the covariance matrix in descending order, λ₁, λ₂, λ₃, ..., λ_M. The transformation to this new feature space is given by:
L_transformed = V^T·L = [û₁, û₂, û₃, ..., û_M]^T·(L − L̄)        (4.11)
S_transformed = | λ₁   0   ...   0  |
                |  0   λ₂  ...   0  |
                | ...  ... ...  ... |
                |  0   0   ...  λ_M |        (4.12)
It is noteworthy that the covariance matrix in the transformed space is the diagonal matrix formed by the different eigenvalues, which, besides maximising the variance of the data, achieves the extraction of a set of uncorrelated features.
On the other hand, the variance λ associated with each eigenvector û defines the quantity of information contained in that vector. As a general rule, the eigenvectors associated with higher values of λ correspond to discriminating features, and the eigenvectors of lesser variance are associated with noise.
A criterion for selecting the number of eigenvectors that are going to form part of the reduced space is that of keeping the M < D eigenvectors associated with the M highest eigenvalues. The percentage of information contained in those eigenvectors is defined through τ as:
τ = Σ_{i=1}^{M} λ_i / Σ_{i=1}^{D} λ_i        (4.13)
Therefore, the first components of the transformed space represent the greater part of the
variability of the system, thus being able to reconstruct, in a precise manner, each original vector
dispensing with the components associated to the eigenvectors which contain less information.
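The derivation (4.4)-(4.13) can be condensed into a short numerical sketch; the correlated toy spectra below are hypothetical, and NumPy's symmetric eigensolver handles the eigenproblem (4.9):

```python
import numpy as np

rng = np.random.default_rng(2)
# 200 hypothetical spectra with three strongly correlated bands.
Lns = rng.normal(size=(200, 3)) @ np.array([[1.0, 0.9, 0.10],
                                            [0.0, 0.3, 0.05],
                                            [0.0, 0.0, 0.01]])

L_bar = Lns.mean(axis=0)                      # mean spectrum (4.4)
S = np.cov(Lns, rowvar=False, bias=True)      # covariance matrix (4.6)
lam, U = np.linalg.eigh(S)                    # solves S·u = lambda·u (4.9)
order = np.argsort(lam)[::-1]                 # eigenvalues in descending order
lam, U = lam[order], U[:, order]

M = 2                                         # keep M < D components
V = U[:, :M]
L_t = (Lns - L_bar) @ V                       # transformed data (4.11)
tau = lam[:M].sum() / lam.sum()               # retained information (4.13)
```

For these strongly correlated bands, the first two components retain nearly all the variability, and the covariance of the transformed data is diagonal as in (4.12).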
However, the analysis of principal components has several limitations [CHERI_03]. The first lies in that this method is oriented towards a feature reduction that maximises the variance of the transformed data, aligning the transformed axes for its maximisation. At times, these axes do not carry the discriminating information between classes; that is, PCA maximises the variance rather than the separability.
Fig. 4.3 Axis defined by the principal eigenvector calculated via the PCA method.
For certain distributions, such as that seen in figure 4.3, the transformation made by the PCA does not maximise the separability between the two classes present.
In contrast with the Principal Component Analysis, the Linear Discriminant Analysis, also known
as Canonical Analysis or Fisher's discriminant [BISH_07], creates a set of transformation vectors
which maximise this separability.
For this, the measure of separation between classes is defined as the distance between means,
corrected with the value of the variances of the classes in the transformed space. In this way, and
intuitively, one can observe that the separability of classes shall be greater as there is more
separation between the means of the classes, weighted by the inverse value of the variances of the
classes in that transformed space (4.14).
J(w) = (m₂ − m₁)² / (s₁² + s₂²)        (4.14)
Where s_k² is the variance calculated for each of the classes in the transformed space:
s_k² = Σ_{n∈C_k} (y_n − m_k)²        (4.15)
And w the transformation vector that converts the original vectors x_n into transformed vectors y_n:
y_n = w^T·x_n        (4.16)
Therefore, the aim is to find the transformation vector w that maximises J(w) and, with this, maximises the separability between classes. Rewriting J(w) in terms of w:
J(w) = (w^T·S_B·w) / (w^T·S_w·w)        (4.17)
Where S_B is the inter-class covariance matrix, defined by:
S_B = (m₂ − m₁)·(m₂ − m₁)^T        (4.18)
And S_w is the total intra-class covariance matrix, defined by the sum of the covariance matrices of each of the classes:
S_w = Σ_{k=1}^{2} Σ_{n∈C_k} (y_n − m_k)·(y_n − m_k)^T        (4.19)
Differentiating J(w) with respect to w, one obtains that J(w) is maximised when:
(w^T·S_B·w)·S_w·w = (w^T·S_w·w)·S_B·w        (4.20)
Fig. 4.4. Axis defined by the vector w obtained via the LDA method.
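For the two-class case, a known consequence of the stationarity condition (4.20) is that, since S_B·w is always proportional to (m₂ − m₁), the maximising direction can be taken as w ∝ S_w⁻¹·(m₂ − m₁). A Python/NumPy sketch with hypothetical two-dimensional data:

```python
import numpy as np

rng = np.random.default_rng(3)
X1 = rng.normal([0.0, 0.0], [1.0, 0.3], (100, 2))   # class 1 (hypothetical)
X2 = rng.normal([2.0, 1.0], [1.0, 0.3], (100, 2))   # class 2 (hypothetical)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
# Intra-class scatter summed over both classes (the original-space
# analogue of (4.19)).
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w = np.linalg.solve(Sw, m2 - m1)    # direction maximising J(w) in (4.17)
w /= np.linalg.norm(w)

y1, y2 = X1 @ w, X2 @ w             # projections y_n = w^T·x_n (4.16)
J = (y2.mean() - y1.mean()) ** 2 / (((y1 - y1.mean()) ** 2).sum()
                                    + ((y2 - y2.mean()) ** 2).sum())
```

The value J attained by this direction is at least as large as that of any other projection direction, such as the coordinate axes of figure 4.4.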
This method can be generalised for K > 2 classes. In this generalisation, two new concepts are included: instead of the variance of each class, a mixture of dispersion matrices within the classes is used, and instead of the separation of means, a dispersion matrix between classes, which takes into account the dispersion of all the classes among themselves, is used. Thus S_B and S_w become:
S_B = Σ_{k=1}^{K} N_k·(m_k − m)·(m_k − m)^T        (4.21)
S_w = Σ_{k=1}^{K} Σ_{n∈C_k} (y_n − m_k)·(y_n − m_k)^T        (4.22)
From these expressions, a scalar is constructed that grows when the inter-class covariance is high and the intra-class covariance is low, so that its maximisation maximises the separability of the classes. There are numerous approaches that obtain these properties; one example is that proposed by Fukunaga [FUKU_90]:
J(w) = Tr(S_w⁻¹·S_B)        (4.23)
Expressing J(w) as a function of the projection matrix w formed by the direction vectors that define the transformation, one would have:
J(w) = Tr( (w·S_w·w^T)⁻¹·(w·S_B·w^T) )        (4.24)
The transformation matrix w can be calculated by solving the generalised eigenvalue problem. The matrix w obtained in this manner maximises the expression J(w) and contains K − 1 non-orthogonal vectors that optimise the separability between classes.
Through this method, one can extract features that efficiently discriminate between the different classes. On the other hand, one can only extract K − 1 features, K being the number of classes. Although these features may be optimal if one later wants to use a linear classifier, the method does not extract additional features that could be discriminating when using other types of classifiers.
Another additional problem lies in the high number of samples necessary to correctly calculate the intra-class and inter-class covariance matrices, S_w and S_B respectively. Unlike the PCA case, it is necessary to estimate the covariance matrix of each of the classes, and therefore a high number of samples must be available for each of the existing classes.
The subspace calculated by these methods does not allow the linear separation of certain data distributions, as for example those whose shape is clearly nonlinear or those which have similar means. Figure 4.5 shows cases in which the axes obtained through LDA do not allow an optimal linear discrimination of the classes. A detailed description of the limitations of LDA in the domain of hyperspectral imaging can be found in [PRAS_07].
Fig. 4.5. Examples of LDA limitations
Other methods, such as the Projection Pursuit [FRIE_87] and the Independent Component
Analysis [COMO_94][BELL_95][DJOU_97] are appropriate for the feature extraction in non-
Gaussian data distributions. These techniques have been used in processes of blind extraction of
components.
Other methodologies extract nonlinear features from data. One of these methods that is directly
based on the Principal Component Analysis (PCA) is that known as Kernel PCA [HAYK_99]
[SCHO_98]. The basic idea of this method is to project the input data in a new space of features
F, modelled via a nonlinear function Ф (kernel), that is usually defined by a polynomial of order
p or by a Gaussian kernel. In this way, the eigenvectors and eigenvalues of the projection space are calculated instead of those of the original space. The selection of the function that models the kernel depends on the application and is still an object of study [JAIN_00].
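A sketch of this idea with a Gaussian kernel follows; rather than computing Ф explicitly, the centred Gram matrix of the samples is diagonalised (the toy data and the kernel width gamma are hypothetical choices):

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Kernel PCA sketch with a Gaussian kernel; gamma is a free choice."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)               # Gram matrix K_ij = k(x_i, x_j)
    N = len(X)
    one = np.full((N, N), 1.0 / N)
    # Centre the data in the feature space F without computing Phi explicitly.
    Kc = K - one @ K - K @ one + one @ K @ one
    # The eigen-decomposition of the centred Gram matrix replaces that of
    # the covariance matrix in the original space.
    lam, A = np.linalg.eigh(Kc)
    order = np.argsort(lam)[::-1][:n_components]
    lam, A = lam[order], A[:, order]
    A = A / np.sqrt(np.maximum(lam, 1e-12))   # normalise projection vectors
    return Kc @ A                              # projections of the samples

# Hypothetical data: a ring around a central blob, which no linear
# transformation of the original space can separate.
rng = np.random.default_rng(5)
theta = rng.uniform(0.0, 2.0 * np.pi, 30)
ring = 2.0 * np.c_[np.cos(theta), np.sin(theta)]
blob = rng.normal(0.0, 0.3, (30, 2))
Z = kernel_pca(np.vstack([ring, blob]), n_components=2, gamma=0.5)
```

In the projected space the two groups become much easier to separate, which is the motivation for working in the feature space F.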
Neural networks, for their part, integrate mechanisms for feature extraction and classification [JAIN_00]. In fact, each of the outputs of the hidden layer of the network can be interpreted as a new and, usually, nonlinear feature. In this manner, multilayer perceptrons can be used as optimal feature extractors [LOWE_91]. The networks used in [FUKU_83] [CUN_89] are in fact feature extraction filters for two-dimensional images, adjusted via the training of the network with the existing data in order to maximise the classification.
The self-organising maps, or Kohonen maps [KOHO_95], can also be used for the extraction of nonlinear features. The functioning of these networks implies the presentation of
Class 1
Class 2
X1
X2 Class 1
Class 2
X2
X1
different patterns to the network, in such a way that it adapts to the presented patterns and generates a set of characteristic nodes or vectors.
Through this learning, data can be categorised (clustered), automatically generating data clusters that are capable of quantizing the input vectors. After training, the weights of each of the network's neurons tend to represent those input patterns that are near them in the original feature space.
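The adaptation process described above can be sketched as a minimal one-dimensional Kohonen map; the training schedule (decaying learning rate and shrinking Gaussian neighbourhood) follows common practice, and all names and parameter values are illustrative assumptions.

```python
import numpy as np

def train_som(data, n_nodes=10, epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Train a 1-D self-organising map: each node's weight vector is
    pulled toward the patterns for which it (or a grid neighbour) is
    the best-matching unit."""
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((n_nodes, data.shape[1]))
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)              # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood width
        for x in data[rng.permutation(len(data))]:
            # best-matching unit: the node closest to the input pattern
            bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
            # Gaussian neighbourhood over the 1-D grid of node indices
            grid_dist = np.arange(n_nodes) - bmu
            h = np.exp(-(grid_dist ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights
```

After training, the node weights act as the characteristic vectors mentioned above, quantizing the input space.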
2. Feature selection
One of the main problems of feature transformation lies in that the employed techniques transform the features in such a way that the physical meaning of the different variables is hidden.
The techniques based on feature selection reduce the redundancy existing in the original data without modifying the physical meaning of the variables. They also reduce the amount of data to be used by the system, thus considerably reducing the computational cost of the process and the acquisition time, as well as the cost of data storage or the price of the sensor to be used.
By maintaining the physical meaning of the variables, the later extraction of knowledge from the classifier is made possible. This converts the knowledge extracted by the classifier into useful information for increasing the understanding of the problem at hand, or for generating logical and comprehensible rules based on the results obtained by the classifier.
The techniques for feature selection can be divided into two groups: those which make the selection by estimating the variables that provide the greatest advantages in the classification, and those which use previous knowledge to extract the most relevant features.
2.1. Automatic feature selection.
These methods try to select those spectral bands which maximise previously defined criteria. Generally, these criteria are based on one of two premises:
− An increase in the classification rate.
− The obtaining of a greater separability between classes.
This allows for the selection of the subset of variables that most efficiently improves the separability between classes or that yields the greatest increase in the classification rate. Obtaining this optimal subset of features allows the later extraction of information on the importance of the selected variables in relation to the problem to be resolved, thus inferring new knowledge on the causes that can produce a certain response.
Given the great quantity of existing variables, it is not feasible to search for the combination that maximises the classification performance. For this reason, exhaustive and systematic search methods that analyse each and every one of the possible feature subsets are inappropriate for this task [GOME_02]. The search cost is enormous, even for a small number of features [COVE_77].
Diverse non-exhaustive search techniques (exponential, sequential and random algorithms) are used to reduce this high computational cost while locating the subset of variables that optimises the chosen criterion. However, the only alternative search method that remains optimal while eliminating the need for an exhaustive search is that known as branch and bound [NARE_77], provided that the criterion function to be maximised is monotonically increasing, that is, it must always grow when a new feature is added.
Other methodologies do not find the optimal subset, but increase the speed at which a subset is calculated, which is important in applications with a great number of features [JAIN_97]. The fact that a feature obtains the best classification ratios in an individual manner [COVE_74] does not mean that the optimal group of features has to include those features that obtain the best individual results.
For this, methods are needed that take into account the dependencies between the different variables. Different algorithms are proposed for this:
− Forward selection: In a sequential manner, starting from an empty set, the features that most increase the target function are added. The limitation of this method is the inability to eliminate variables that become obsolete after the addition of new features.
− Backward selection: In a sequential manner, starting from the complete set of features, the features whose elimination causes the smallest decrease in the target function are progressively removed. The limitation of this method, analogous to the previous one, lies in the impossibility of re-including previously eliminated variables.
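A minimal sketch of the forward selection procedure just described, assuming a simple two-class separability criterion as the target function; both the function names and the criterion are illustrative choices, not taken from the cited references.

```python
import numpy as np

def forward_selection(X, y, score, k):
    """Greedy forward selection: starting from the empty set, repeatedly
    add the feature that most increases the target function `score`."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        best_f, best_s = None, -np.inf
        for f in remaining:
            s = score(X[:, selected + [f]], y)
            if s > best_s:
                best_f, best_s = f, s
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

def fisher_score(Xs, y):
    """Toy criterion: between-class over within-class scatter
    for a two-class problem (labels 0 and 1)."""
    a, b = Xs[y == 0], Xs[y == 1]
    between = np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2)
    within = a.var(axis=0).sum() + b.var(axis=0).sum()
    return between / (within + 1e-12)
```

Backward selection is the mirror image: start from the full feature set and repeatedly remove the feature whose removal hurts the criterion least.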
Other more sophisticated techniques include algorithmic improvements that combine both aforementioned methods in order to eliminate their limitations. These methods re-add previously eliminated variables, or eliminate previously added variables that no longer improve the current feature subset [PUDI_94].
− Selection plus l minus r: This technique sequentially adds l features and later eliminates r features. Analogously, one can begin with the complete set of features, eliminate l features and later add r. A drawback of this method is that r and l must be chosen in advance.
− Floating search methods: An extension of the earlier method that eliminates the limitation of having to choose the values of r and l. For each added feature, this method eliminates as many features as necessary, as long as each elimination improves the target function. Similarly, starting from the whole set of features, as many features as necessary are added for each eliminated feature, so long as the classification improves.
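The floating search idea can be sketched as follows, again with an assumed two-class separability criterion as the target function; this is a simplified variant of sequential floating forward selection, not the exact algorithm of [PUDI_94].

```python
import numpy as np

def class_separation(Xs, y):
    """Toy criterion: squared distance between class means over the
    summed within-class variances (two classes assumed)."""
    a, b = Xs[y == 0], Xs[y == 1]
    between = np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2)
    within = a.var(axis=0).sum() + b.var(axis=0).sum()
    return between / (within + 1e-12)

def floating_forward_selection(X, y, score, k):
    """Simplified SFFS sketch: after each forward inclusion, features
    other than the one just added are conditionally excluded while the
    exclusion improves the criterion, so l and r never need fixing."""
    selected = []
    while len(selected) < k:
        # forward step: add the single best feature
        candidates = [f for f in range(X.shape[1]) if f not in selected]
        best = max(candidates, key=lambda f: score(X[:, selected + [f]], y))
        selected.append(best)
        # backward (floating) steps: drop features while that helps
        improved = True
        while improved and len(selected) > 2:
            improved = False
            current = score(X[:, selected], y)
            for f in selected[:-1]:  # never drop the feature just added
                trial = [g for g in selected if g != f]
                if score(X[:, trial], y) > current:
                    selected = trial
                    improved = True
                    break
    return selected
```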
As a general conclusion, one can state that this last method is almost as effective as the branch and bound method while requiring a smaller computational cost [JAIN_00].
Irrespective of the chosen search criterion, one must quantitatively evaluate the improvement obtained by using the chosen subset of features.
For criteria based on the improvement of the classification rate, a classifier is built with the subset of chosen features. The effect of this feature reduction is evaluated in relation to other subsets, in order to select the subset which obtains the best rate of correct classifications.
The methods based on separability criteria presuppose a statistical model to which the different classes fit for each of the feature subsets. Once these models are created, the statistical separation between the classes which compose them is calculated. These methods have the advantage of a smaller computational cost, but require a correct estimation of the distribution or statistical model of the data in order to obtain efficient results.
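As an example of such a separability criterion under an assumed Gaussian class model, the following sketch computes the Bhattacharyya distance between two classes estimated from their samples; the choice of this particular measure is illustrative.

```python
import numpy as np

def bhattacharyya(Xa, Xb):
    """Bhattacharyya distance between two classes, each modelled as a
    multivariate Gaussian estimated from its samples."""
    ma, mb = Xa.mean(axis=0), Xb.mean(axis=0)
    Ca = np.cov(Xa, rowvar=False)
    Cb = np.cov(Xb, rowvar=False)
    C = 0.5 * (Ca + Cb)          # pooled covariance
    d = mb - ma
    # Mean-separation term
    term1 = 0.125 * d @ np.linalg.solve(C, d)
    # Covariance-mismatch term (via log-determinants for stability)
    _, logdet_C = np.linalg.slogdet(C)
    _, logdet_a = np.linalg.slogdet(Ca)
    _, logdet_b = np.linalg.slogdet(Cb)
    term2 = 0.5 * (logdet_C - 0.5 * (logdet_a + logdet_b))
    return term1 + term2
```

A feature subset that yields a larger distance between the class models is, under this criterion, preferred, without ever training a classifier.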
2.2. Selection and extraction of known discriminant features.
Within this group, one can find those techniques based on the selection of known spectral features, either caused by some type of known physical phenomenon or because they constitute a discriminant feature tabulated for certain types of materials (e.g. the absorption bands of the C-H bond, the chlorophyll absorption bands...).
The specific absorption in a band (or neighbourhood of bands, where the absorption is produced in several consecutive bands) can be caused by the presence of certain chemical elements, ions, their ionic charges, or partly by the crystalline structure of the components.
The algorithms shown in this section use the information on these absorption phenomena and their causes to directly extract the characteristics that are most suitable for solving a specific problem. It should be noted that these methods do not correspond to pure feature selection, as the features are not directly selected, but rather extracted using previous empirical knowledge that entails a previous transformation of the spectrum.
There are different approaches to these methods; some of them reduce the spectrum to a certain number of features, each representing one of the chemical components present in the spectra to be analysed [WIN_07]. In this manner, a spectral image that contains elements made up of a mixture of three pure polymers is reduced to three components, each containing the signature of one of the polymers.
When no information is available on the pure variables, information is extracted from the second derivative of the reflectance spectrum. This provides information on the different absorptions that are produced throughout the spectrum, obtaining the absorption signatures associated with it.
The absorption spectra of different materials have been tabulated in different databases [ASTER_98] [CLARK_93] that include the model reflectance spectra of different materials in a pure chemical state. In this manner, information is available on the absorption bands of the different chemical components that are of interest when classifying a material.
Fig. 4.6. Characterisation of an absorption band
Though this approach requires detailed knowledge of the elements to be analysed, it makes possible the selection of those features which are most suited to resolving a specific problem [CLARK_95]. In this manner, one can manually select the features which best discriminate between the elements to be classified, bearing in mind the previous existing knowledge. Knowing the characterisation of these absorptions through diverse parameters (wavelength, absorption intensity, absorption width), one can detect their presence in the spectrum and use them as descriptive characteristics of the material to be classified.
However, although these methodologies do not destroy, but rather foster, the physical interpretation of the spectral features, they require previous knowledge of the spectra to be analysed. Therefore, they can not be used as blind classifiers that establish which are the
important characteristics, nor as classifiers from which later knowledge can be extracted based on the additional information they provide.
3. Conclusions
The use of statistical or other types of classifiers to model the classes defined by these spectra can not be undertaken directly, due both to the great quantity of training data necessary and to the high complexity of the classifier required by the high dimensionality of the input data.
This chapter has reviewed classical methods of feature extraction, such as those based on Principal Component Analysis and Linear Discriminant Analysis, observing some of their limitations and stressing that these methods obscure the information contained in the transformed variables, not allowing for an easy analysis of the obtained results.
The advantages of the methods of automatic feature selection have been highlighted, in so far as they select those variables which improve the classification and discrimination between classes without obscuring the physical meaning of the extracted features, also mentioning the limitation that each of the bands can only be selected separately.
Lastly, the advantages of the methods based on previous knowledge of the spectrum have been shown, which extract and model certain previously known absorption features. However, these methods have the disadvantage of requiring previous knowledge of the materials to be classified, something that is not always available.
It is therefore necessary to have a feature extraction/selection method which does not blur the physical meaning of the extracted variables and which automatically models and selects the absorption features present in the spectra, thus allowing for their correct classification.
Chapter V
Extraction of spectral features based on fuzzy sets
bioinspired by the human visual system
Chapter V Extraction of spectral features based on fuzzy sets
Page 102
Earlier chapters have described the problem of classifying luminous spectra of high dimensionality. Feature reduction processes are necessary in order to minimise the problems related to the high complexity of the classifier and to the not always sufficient number of training elements (the Hughes phenomenon).
For the case of hyperspectral images, feature extraction techniques efficiently extract, from a set of data, new features that have a high discriminating power. However, these methods require previous training, and the extracted discriminating features are dependent on the training data set.
Features extracted in this manner, despite their discriminating power, do not represent specific physical variables and therefore can not be easily interpreted. Furthermore, as these features depend on the training data set, they vary when any new element or class is added to the sample. This effect makes these variables inadequate for cases in which new classes are going to be added, or in which the existing ones are going to change progressively over time.
On the other hand, an automatic selection of features allows the selection of those variables which provide the best classification results without obscuring the physical meaning of the variables used. However, this methodology is also dependent on the classes to be modelled, and can eliminate variables which, although lacking discriminating power for the studied classes, are crucial for discrimination when new classes are added.
Methods based on previous knowledge of the classes to be analysed can localise and quantify previously known absorption bands that identify those classes. However, these methods bear the same disadvantages as the earlier ones, as they require previous knowledge of the classes to be modelled. Furthermore, the incorporation of new classes into the system creates the possibility that the detected and modelled absorption bands may not be suitable for their discrimination.
Given these limitations, it is necessary to choose a set of characteristics that comply with the
following conditions:
− That they reduce the original feature space.
− That they have an adequate discriminating power.
− That they do not depend on a previous training, so that they do not vary in their definition
when adding new classes to the system or when modifying the existing classes.
− That their discriminating power is based on a physical basis.
− That they maintain the physical meaning of the variables.
− That they include the advantages of the methods based on the localisation of absorption
bands without requiring a previous knowledge of the classes to be modelled.
− That they be generic, without depending on either the type of application or the training set.
In this chapter, a definition and a proposal are made for the adaptation of the theory of fuzzy sets, as proposed by Zadeh in the formulation of his theory of fuzzy logic [ZADE_65], to the extraction of discriminating features from hyperspectral pixels.
This proposed adaptation behaves similarly to the feature extraction process performed by the human eye. The defined fuzzy sets correspond to a "virtual cone", sensitive to a specific area of the luminous spectrum in a manner similar to the human eye, taking advantage of the existing correlation between near spectral bands.
1. Definition of fuzzy sets
The fuzzy sets defined by Zadeh [ZADE_65] offer an extension of the classical definition of a set. In the classical definition, an element can either belong or not belong to a specific set. The fuzzy extension of set theory adds the concept of grade of membership to a specific set. In this manner, the issue is not whether an element belongs or does not belong to a set; rather, the grade of membership to a set is expressed in the interval [0,1], the unit indicating full membership to the set and the zero value its non-membership.
In this manner, one can define fuzzy features capable of expressing concepts which are not correctly expressed through the classical definition of sets, such as "slow", "agile", "quick", "young"...
Fig. 5.1. Separation of the different stages of life through classical sets.
If one needs to express concepts such as child, adult or elder based on age, and these are defined through classical sets, as shown in figure 5.1, one notices that there are no differences between elements of the same group: a month-old newborn is classified in the same manner as an eighteen-year-old, and a person of seventy-five is considered just as elderly as one of ninety-five. Likewise, elements situated on the border between two groups could change their membership abruptly, without their defining feature (in this case, age) changing considerably.
However, the use of fuzzy sets to define these concepts takes into account the grade of membership to each of the sets. Figure 5.2 shows the grade of membership expressed through triangular functions. These membership functions represent the grade of membership to each of the sets: a ninety-year-old person will have a greater grade of membership to the concept "elder" than one of sixty years.
Fig. 5.2. Separation of the different life stages through fuzzy sets.
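The triangular membership functions of figure 5.2 can be sketched as follows; the breakpoint ages used here for "child", "adult" and "elder" are illustrative assumptions, not values taken from the figure.

```python
def triangular(x, a, b, c):
    """Triangular membership: rises from a to a peak at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Illustrative breakpoints for the three life-stage fuzzy sets
def child(age):
    return triangular(age, -1.0, 0.0, 25.0)

def adult(age):
    return triangular(age, 15.0, 40.0, 70.0)

def elder(age):
    return triangular(age, 60.0, 100.0, 140.0)
```

With these definitions, a ninety-year-old indeed has a greater grade of membership to "elder" than a sixty-year-old, while elements near a border share graded membership in two sets instead of switching abruptly.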
In the same manner, one can define membership functions with different geometries, so as to represent the desired concept adequately. Examples of such functions are the Gaussian, the trapezoidal...
2. Spectral fuzzy sets
The theory of fuzzy sets can be applied to the extraction of discriminating features from hyperspectral pixels. Following the same philosophy, one can divide a spectrum into a specific number of sets, so that each of these sets represents a specific range of the spectrum.
Assuming that the elements to be measured are not in an excited gaseous state, which would cause emission or absorption in very specific bands of the spectrum, one can state that there is a high correlation between adjacent bands. This correlation entails that the absorption bands that define the materials are not represented by a single band, but rather defined by a set of bands which are adjacent to one another.
In order to model this correlation, each wavelength of the spectrum must be associated with its neighbouring wavelengths, so as to take into account the values of the adjacent frequencies. A way to do this is to associate each element of the spectrum with the diverse ranges into which the spectrum can be divided. In order to avoid that close elements belong to two different groups, a proposal is made to define the different sets in a fuzzy manner, as described in the following.
Let L be the vector representation of a spectrum defined by the intensity response at M wavelengths, as shown in figure 5.3.
L = [L1, L2, ..., LM]^T    (5.1)
Fig. 5.3. Graphical representation of a hyperspectral pixel based on its wavelength.
The different ranges of this spectrum are represented by K fuzzy sets. In this way, each wavelength of the spectrum has a specific grade of membership to one or several fuzzy sets.
Fig. 5.4. Representation of the different fuzzy sets into which the spectrum is divided.
Figure 5.4 shows the division of the spectrum into K fuzzy sets, showing the grade of membership to each of them through their respective K triangular membership functions. Although for simplicity's sake triangular membership functions are assumed, these can have other shapes, the triangular and the Gaussian being the most common; in a generic manner, a spectrum can be represented via any type of shape.
Observing the sensitivity of the cones of the human visual system (see Chapter II), one notices that this sensitivity can be approximated through the definition of Gaussian or triangular fuzzy sets at the appropriate wavelengths, as shown in figure 5.5. In this manner, using three fuzzy sets sensitive to red, green and blue, one can emulate the way in which the human eye sees.
Fig. 5.5. Absorption frequencies of the different types of cones present in the human visual
system and their comparison with the sensitivity of different triangular fuzzy sets.
The division of the spectrum into equally spaced triangular fuzzy sets, as shown in figure 5.6,
generically associates a wavelength with a specific region of the spectrum. In this manner, a sort
of multi-spectral eye is generated in which each defined fuzzy set has a similar sensitivity to that
which a human ocular cone sensitive to those wavelengths would have.
Fig. 5.6. Definition of fuzzy sets based on triangular shapes.
Figure 5.6 shows the division of a spectrum into K fuzzy sets defined by triangular membership functions Mfi. These functions are triangularly shaped in the proximity of their central wavelength λCi and have null value in other areas of the spectrum, as defined in equation (5.2), where λCi is the central wavelength and D is the distance between two consecutive central wavelengths.
Mfi(λ) = 1 − |λ − λCi| / D,    if λCi − D < λ < λCi + D
Mfi(λ) = 0,    otherwise    (5.2)
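Equation (5.2) and the equal spacing of the central wavelengths can be sketched as follows for a sampled spectrum; the function names are illustrative.

```python
import numpy as np

def membership(wavelengths, centre, D):
    """Equation (5.2): triangular membership of each wavelength to the
    fuzzy set centred at `centre` with half-width D."""
    m = 1.0 - np.abs(wavelengths - centre) / D
    return np.where(m > 0.0, m, 0.0)

def fuzzy_centres(lam_min, lam_max, K):
    """K equally spaced central wavelengths; with D equal to the
    spacing, each wavelength gets a non-zero membership in at most
    two adjacent sets, and the memberships sum to one."""
    centres = np.linspace(lam_min, lam_max, K)
    D = centres[1] - centres[0]
    return centres, D
```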
This way, each point in the spectrum has a grade of membership associated with each of the fuzzy sets into which the spectrum is divided: each element λi has a grade of membership for each set, defined by the value at that point of the membership function Mfi associated with that set.
By defining the membership functions in this manner, the intensity value L(λi), associated with wavelength λi, has a non-zero membership to the two adjacent fuzzy sets and a grade of membership equal to zero in the rest, as shown in figure 5.7. This way, one can establish that a certain wavelength belongs to each of the fuzzy sets of the spectrum with a certain grade.
Fig. 5.7. Membership of the wavelength λi to the different fuzzy sets.
Depending on their size and position, the fuzzy sets so described define different parts of the spectrum. Broad fuzzy sets (high D) can define spectral ranges such as "visible", "near-infrared" or "mid-infrared", while sets with a smaller D parameter define other more specific ranges, such as "oranges", "yellows" or "violets", depending on the wavelengths that they
represent. When a hyperspectral pixel in the visible range is available, selecting three fuzzy sets to define it would be the equivalent of taking its RGB image.
Given the high correlation between adjacent wavelengths, one can represent the original spectrum and describe it based on the behaviour of the different fuzzy sets.
In order to model this behaviour, the Energy of each of the fuzzy sets into which the spectrum is divided is defined by weighting the intensity of each element L(λ) of the spectrum with the membership function associated with each set. This way, the Energy of each fuzzy set for a given spectrum is defined by:
Ei = ∫ Mfi(λ) · L(λ) · dλ,    integrated over λ from 0 to λM    (5.3)
The Energy so defined indicates the intensity of the spectrum in that fuzzy set, in the same manner as perceived by a human cone. Depending on its position and size, one can measure concepts such as the intensity in the "visible" or "near-ultraviolet" ranges, or the intensity in hues such as "red" or "violet"...
Earlier methodologies [CLARK_95] manually compiled and parametrised the absorptions that allowed discriminating between the different materials (wavelength, intensity of absorption, width of absorption), which were later tabulated into databases [CLARK_93], [ASTER_98]. These absorptions are related to the chemical composition of the material, as they respond to the vibration frequencies of its chemical bonds, and are considered one of the fundamental elements in the discrimination between different materials [WIN_07].
One of the main advantages of the proposed method over earlier ones lies in the capacity to
capture relevant information on the absorption bands that characterise a material in an automatic
manner and without the need to have previous knowledge about it.
Since the absorption phenomenon happens at consecutive wavelengths, the analysis of the Energy in the adequate fuzzy set provides us with information that a certain absorption phenomenon has happened. At the same time, it provides us with information as to its position
(fuzzy set in which it happens) as well as the intensity of the absorption (area of absorption). This Energy integrates the parameters of the absorption phenomenon within it by combining the levels of absorption at each of its near wavelengths, while considerably reducing the noise of the parameters.
In this manner, the absorptions present in the spectrum are directly parametrised by the Energy of the associated fuzzy set or sets and the Energy of the adjacent fuzzy sets.
Fig. 5.8. Calculation of the Energy of each of the fuzzy sets for a given spectrum.
This way, a hyperspectral pixel can be represented as the vector which contains the values of the Energy of each of the fuzzy sets into which it is divided, thus reducing the feature space to K elements, K being the number of fuzzy sets used.
LTrans = [E1, E2, ..., EK]^T    (5.4)
The representation of the spectrum based on the Energy of the different associated fuzzy sets characterises these absorptions in a more efficient manner than the earlier parametrisation, as the presence of a certain absorption is associated with the values of the Energy of the fuzzy sets. This
creates a unique and universal feature vector that is capable of defining the absorptions present in different materials without the need for previous training or its modification. To a certain extent, this approach is similar to that used by the cones of the human visual system, which associate certain wavelengths with a specific visual sensation.
The concept of the Energy of a certain range of the spectrum keeps the physical meaning of the variables, as it represents the response of the spectrum in each of the different fuzzy sets that, in turn, represent conceptual ranges of it. In this manner, a high Energy level for a fuzzy set in the wavelength range of the colour red corresponds with the perception that the human type L cones make of red.
On the other hand, this methodology reduces the dimensionality of the feature space in an efficient manner, as it is based on the physical properties of spectra, making use of the discriminating capacity of the absorption bands and their existing correlation. In order to show these properties, figure 5.9 shows, on the one hand, the raw representation of the spectra A1, A2, B and C and, on the other, their representation based on fuzzy sets. Of these, spectra A1 and A2 belong to the same class, while spectrum B represents a different class, differentiated from class A. Spectrum C belongs to a class of spectral properties similar to B.
Fig. 5.9. Spectral representation of materials A1, A2, B and C.
a) Raw representation, b) Representation based on fuzzy sets.
When observing (Table V.1) the differences obtained when comparing the fuzzy-set representation of spectrum A1 with the representation of the rest of the spectra through the classic metrics defined in equations (3.2), (3.3), (3.4), (3.6) and (3.9), one can notice an increase in the separation distance between spectra of different classes with respect to the distance obtained for spectra belonging to the same class.
In order to analyse whether there is an increase in the separability associated with the use of the spectral representation based on fuzzy sets, the separation distances obtained from the fuzzy representation (Table V.1) are compared with the distances obtained from the raw representation of the spectrum (Table III.1).
The results of this comparison are listed in Table V.2 and show the increase in separability caused by the use of the fuzzy spectral representation, except for the SAM metric. This allows us to state that, a priori, this bioinspired technique has a positive effect on the separability of materials.
TABLE V.1. COMPARISON OF THE DISTANCES OBTAINED FOR SPECTRUM A1 BASED ON CLASSICAL METRICS FROM THEIR CHARACTERISATION BASED ON FUZZY SETS

                        A2      B       C
City Block              0.10    0.48    0.22
Euclidean Distance      0.04    0.20    0.09
Tchebychev Distance     0.03    0.14    0.06
SAM                     1.00    0.98    1.00
SID                     0.00    0.14    0.03

TABLE V.2. INCREASE OF THE SEPARABILITY IN THE REPRESENTATION BASED ON FUZZY SETS VERSUS THE SEPARABILITY BASED ON THE RAW SPECTRUM

                        Raw separability    Separability based on fuzzy sets    Improvement in the separability
City Block              144.40%             229.15%                             1.59
Euclidean Distance      135.37%             229.70%                             1.70
Tchebychev Distance     115.02%             212.57%                             1.85
SAM                     100.15%             100.34%                             1.00
SID                     182.83%             538.12%                             2.94
3. Multi-frequency spectral fuzzy sets
The representation of the spectrum based on fuzzy sets obtains information on the behaviour of the spectrum in each of the ranges represented by these sets.
In most cases, dividing the spectrum into an adequate number of fuzzy sets will reduce the features efficiently: it adequately models the absorption bands present in the elements to be classified and maintains a classification rate similar to that obtained through Principal Component Analysis (PCA), but without its disadvantages, as shall be shown in the results chapter.
In other cases, the defined fuzzy sets may be too large to model narrow absorption bands that might be present in these materials.
In these cases, an extension of the earlier method is proposed, based on a multi-frequency definition of the fuzzy sets that define each of the ranges of the spectrum. As in the earlier method, triangular membership functions are created. Unlike the previous case, N collections are created, each composed of a set of triangular membership functions defined by a spacing parameter Dj associated with that collection.
In this manner, a membership function Mfij, associated with a central wavelength λCij and a spacing parameter Dj, is given by the following expression.
Mfij(λ) = 1 − |λ − λCij| / Dj,    if λCij − Dj < λ < λCij + Dj
Mfij(λ) = 0,    otherwise    (5.5)
This way, by defining the different values of Dj, one can create different collections of fuzzy sets
as shown in figure 5.10.
Fig. 5.10 Different collections of fuzzy sets with different Dj spacing parameter.
Applying (5.3) to each of the collections, the Energy of each of the fuzzy sets present in them is extracted. In this way, collections with a greater spacing parameter will capture absorptions of a lower frequency, while collections with a smaller spacing parameter will capture absorptions of a higher frequency, simultaneously capturing information related to different frequencies.
The feature vector associated with each collection j is given by:

        L_Trans,Dj = [E_1j, E_2j, ..., E_Kj,j]^T                               (5.6)

where D_j is the spacing parameter associated with that collection and K_j the number of fuzzy sets in the collection.
Combining the feature vectors associated with all the existing collections, one obtains the final feature vector:

        L_Trans = [L_Trans,D1 ; L_Trans,D2 ; ... ; L_Trans,DN]                 (5.7)
This approach gathers information on the different absorptions and behaviours present in the spectrum at different frequencies, so that the behaviour of each of the ranges of the spectrum is modelled more precisely. In this manner, one can obtain more detailed information on the type of absorption. However, this approach considerably increases the size of the feature vector if very-high-frequency information is to be added.
Likewise, if a single collection of fuzzy sets is defined with a spacing parameter D small enough to capture the desired frequency information, the information contained in the lower-frequency collections (greater D) can easily be inferred by the classifier from this single collection, which makes the pyramidal, multi-frequency approach contain redundant information.
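The full multi-frequency feature vector of eqs. (5.6)-(5.7) can be sketched as follows. Since eq. (5.3) appears earlier in the thesis and is not reproduced in this chapter, the Energy of a fuzzy set is assumed here, for illustration only, to be the membership-weighted mean of the spectrum; the synthetic spectrum and the collection spacings are likewise assumptions:

```python
import numpy as np

def fuzzy_energies(spectrum, memberships):
    """Energy of each fuzzy set of one collection. Eq. (5.3) is defined
    earlier in the thesis; a membership-weighted mean of the spectrum is
    assumed here for illustration."""
    return memberships @ spectrum / memberships.sum(axis=1)

def multifrequency_features(spectrum, collections):
    """Final feature vector L_Trans (eqs. 5.6-5.7): the energy vectors of
    all N collections concatenated together."""
    return np.concatenate([fuzzy_energies(spectrum, m) for m in collections])

# Illustrative triangular collections over an assumed 400-1000 nm axis:
wl = np.linspace(400.0, 1000.0, 121)
def collection(spacing):
    centers = np.arange(400.0, 1000.0 + spacing, spacing)
    return np.clip(1.0 - np.abs(wl[None, :] - centers[:, None]) / spacing, 0.0, None)

# Synthetic reflectance with one absorption band at 650 nm (an assumption):
spectrum = 1.0 - 0.6 * np.exp(-((wl - 650.0) / 30.0) ** 2)
L_trans = multifrequency_features(spectrum, [collection(150.0), collection(50.0)])
```

The fine collection's set centred on the absorption shows the lowest Energy, which is exactly the discriminating behaviour the multi-frequency extension targets.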
4. Conclusions
This chapter has described the application of fuzzy set theory to the optimal modelling of a hyperspectral vector. Given the high correlation between adjacent bands, the high dimensionality of the data and the discriminating power of the absorption bands, the separation of the spectrum into different fuzzy sets and the extraction of features based on the Energy of each of them becomes an efficient method that meets the requirements desired for the set of features describing that spectrum.
The use of fuzzy sets for the separation of the spectrum into different ranges takes advantage of the high correlation between adjacent bands while avoiding the sharp separation between ranges that a partition based on the classical definition of sets would cause. The sensitivity of each of the fuzzy sets defined over a spectrum is similar to that of a cone of the human visual system. In this way, the features extracted through this methodology would correspond to those acquired by a multi-spectral eye with as many types of cones as there are defined fuzzy sets.
The representation of the spectrum based on the Energy of different fuzzy sets allows the efficient
and practical modelling of the parameters that define the absorptions present in that range of the
spectrum, allowing for the creation of a universal feature vector.
This approach reduces the size of the information stored in the spectrum, as it intrinsically stores those features necessary for the classification. Therefore, the computational cost necessary for its calculation is much smaller than that of other methods of feature extraction/reduction. Given its universality and the fact that its extraction does not vary, it can be implemented in hardware platforms, thus speeding up the extraction process for very high resolution spectra.
Unlike other methods of feature extraction, such as PCA or LDA, the features extracted by this method do not lose their physical meaning. The Energy associated with each of the sets into which the spectrum is divided represents the degree of reflectivity in that part of the spectrum, which is equivalent to the information extracted by the human visual system.
Likewise, it does not require previous training in order to obtain the fuzzy sets that define a material, as happens with PCA or LDA, in which the addition of a new element to the classification changes the extracted variables.
The multi-frequency approach could offer promising results. However, the problem is that it adds
redundant information to the extracted features which can needlessly increase the size of the
feature vector.
On the other hand, the Hughes Phenomenon [HUGH_68] is reduced because the feature space is reduced in an efficient manner. This is due to the fact that the proposed method imitates the human eye in exploiting the discriminatory properties existing in the different absorption bands and the correlation between adjacent bands. In this manner, the number of necessary training elements is reduced and the complexity of the classifier is simplified.
The efficient reduction of features achieved by this method reduces the representative size of a spectrum. This reduction permits the extraction of information on the spatial distribution of different spectra, creating feature vectors that remain tractable in terms of the Hughes Phenomenon and the complexity of the classifier.
Chapter VI
Integration of spectral and spatial features
Earlier chapters described different approaches to feature extraction that make it possible to define the spectral properties of materials, so that these features can be used to discriminate between different materials. At times, due to diverse causes (cost of spectrometers, acquisition speed, spectral similarity of the objects to be analysed...), this information may not be enough to obtain a classification with adequate discriminatory capacity.
However, the fact that two objects can have similar spectral features (colour) does not entail that
one has the same visual perception of them, as one can observe elements of a similar colour, such
as a piece of tree bark and an object of clay, and perceive that they are different. One notices that
the distribution of those colours presents a different pattern in both objects. This spatial
distribution of the different chromatic tonalities of an object is known as texture.
The discriminatory power of the spatial distribution makes it appropriate to include this
information in the model of the material in order to increase the discriminating power when
dealing with different materials.
In order to illustrate how the inclusion of this spatial information increases the discriminating power between different classes, figure 6.1 shows the statistical distribution of two spectral variables that represent two different classes, class 1 and class 2. As can be seen in this figure, the point P represented within it has a similar membership probability for class 1 and class 2. Therefore, taking into account only the information contained in the spectral variables of the point, it cannot be assigned to one class or the other with the desired certainty.
Fig. 6.1. Statistical distribution of the elements of two classes with an area of overlapping.
However, by studying the spectral features of the points situated in the spatial neighbourhood of point P, one can estimate the statistical distribution of the features associated with the points in that neighbourhood. In figure 6.2, X marks the spectral features of the points that belong to the neighbourhood of P; the analysis of this statistical distribution assigns point P to class 1.
Fig. 6.2. Classification of a set of elements based on its statistical distribution.
Bearing in mind the earlier example, observing the statistical distribution of the spectral features of a set of spatially near pixels allows a more precise characterisation of the class to which the central pixel belongs. Suppose one has to classify the element associated with pixel P as belonging to either a light or a dark object. The example in figure 6.3 shows the impossibility of undertaking a precise classification using only the feature values that define a single pixel. However, through the analysis of the statistical distribution of nearby points, one can infer that the object to which they belong is predominantly dark and characterise it more precisely.
Fig. 6.3. Use of spatial distribution of features for a classification. Above) Use of the single
element of the pixel, Below) Use of spatial features in the classification.
In this chapter, a new methodology is proposed to create a descriptor which includes spatial information within the spectral model that defines each element of the image. To do so, the use of neighbourhood histograms that compile the statistical distribution of each of the extracted spectral features is proposed, integrating this information in a single feature vector.
The proposed discriminating features will not only increase the separability between different materials through the integration of spectral-spatial information but, due to this increase in separability, will also allow the use of simpler classifiers.
Specifically, the use of fuzzy neighbourhood histograms is suggested in order to model the spatial properties of the discriminating features of the materials. First, an explanation is provided of the need for fuzzy discretisation in the estimation of the different groups that make up the histogram; this fuzzy discretisation avoids erratic changes in the created histogram due to small variations in the intensity of the variables. Second, a theoretical description is provided of the implementation of the concept of the fuzzy neighbourhood histogram. This concept is broadened throughout the chapter for its application to spectral and vectorial images. Analogously, the concept of the fuzzy region histogram is introduced, replacing the neighbourhood by a region as the support over which the histogram is calculated.
Last, a brief listing is provided of the contributions made in this chapter on the integration of spectral and spatial features.
1. Fuzzy spatial histograms
In order to include spatial information within the model, we propose the use of spatial histograms
that capture the existing variability in a specific spatial region of an image.
By definition, a histogram represents the frequency with which a studied variable is found within
each of the previously defined intervals, providing information on the statistical distribution that
the variable follows in the sample. Chapter IV, section 4.3.1, details the use of the histogram
calculation for the estimate of the probability functions.
First, the variable to be represented is quantized into a number of intervals that cover its possible range of values. For each element of the sample space, the interval to which it belongs is determined and that interval's element count is increased. In this manner, each interval of the histogram stores the number of times that the value of an element of the sample space has fallen within that interval.
This way, the histogram calculated in such manner is a reflection of the probability density
function that the analysed variable follows. Therefore, the variables that define this histogram are,
in turn, a reflection of that probability function. This way, the study of these variables
corresponds with the study of the probability distribution function of the elements in the sample
space that have generated the histogram.
Fig. 6.4 Histograms associated to different types of distributions of the values of variable X. a)
Constant distribution, b) Slightly darkened constant distribution c) Chess distribution.
(Histogram vectors shown in the figure: a) Hx = [0, 0, 0, 15, 0, 0], b) Hx = [0, 0, 15, 0, 0, 0], c) Hx = [7, 0, 0, 8, 0, 0].)
Figure 6.4 shows the spatial histogram calculated for the images in its upper part. For its calculation, the range of possible intensities has been quantized, so that each intensity is associated with one of the groups into which the histogram is divided. Once quantized, the number of elements belonging to each of the groups is counted, thus obtaining the histogram.
1.1. Improvement in the quantization of the histogram.
Figure 6.4 shows the division of the range of values that a variable X can take into a set of equally spaced intervals, in the classical manner. This associates each of the possible values of the variable with a specific membership group.
In this way, the histogram is mathematically defined by a vector whose number of components is equal to the number of divisions of the range of values, each component of the vector containing the number of elements that belong to that interval (6.1).
        H_X(x, y) = [N_1, N_2, ..., N_M]                                       (6.1)

where N_i is the number of elements that belong to interval i.
However, this way of quantizing creates discontinuities at the borders of the intervals. Elements situated at these limits can jump from one group to another, making the shape of the histogram vary due to small noise-induced variations. This is shown in figures 6.4a and 6.4b, in which a slight change in intensity generates a totally different histogram descriptor (vector H(x, y)). Additionally, this quantization does not faithfully reflect the degree of membership of the variable to each of the groups into which the histogram is divided.
Because of this, and in order to overcome these limitations, the discretisation and quantization of the studied variable into fuzzy sets is proposed. In this manner, each element of the sample space of variable X contributes to each interval of the histogram according to its degree of membership. Figure 6.5 shows these two types of discretisation: figure 6.5a shows the classical discretisation process, where each element of the sample space is absolutely and exclusively associated with a single group, while figure 6.5b shows the fuzzy discretisation of the variable, where a different degree of membership is associated with each of the groups.
Fig. 6.5. Quantization of variables a) Classical quantization b) Fuzzy quantization.
Using as a basis the previously mentioned approaches, let us consider a set of elements
represented by a feature X, where X is a continuous variable with values in the interval [0, A],
where high values represent a greater intensity and low values represent a lower intensity. By
analogy with the images represented in grey levels, the high values of variable X are represented
with light hues and low values with dark hues.
As previously mentioned, and in order to proceed to the histogram generation, this X feature is
discretised through the division of the range [0, A] into several intervals.
Performing this discretisation through a classical approach, each value of X is exclusively assigned to a unique group. Mathematically, a rectangular membership function can be defined for each of the discretised intervals, in such a way that the function takes a unit value for the X values that belong to the set and zero for those that do not, as shown in figure 6.6.
Fig. 6.6. Histogram membership function HMf_i(x).
This function, which defines the membership of variable X within group i, is given by the following rectangular function:

        HMf_i(x) = 1 ,   if X_central_i - D/2 < x < X_central_i + D/2
        HMf_i(x) = 0 ,   otherwise                                             (6.2)
where X_central_i is the central value of each of the groups and D the width of each of the groups, defined by D = A / N, A being the highest considered limit and N the number of groups into which the variable has been quantized. The function HMf_i(x) defined in (6.2) shall be named the histogram membership function.
An effective way to quantize this variable for the calculation of the histogram is through its representation via a vector Qx with the same number of components as the number of groups (N) into which the feature has been divided. Each component of this vector is defined by the value of the function HMf_i(x), which gives the degree of membership of the specific value of X to each of the groups.

        Q_x(x) = [HMf_1(x), HMf_2(x), ..., HMf_N(x)]                           (6.3)
Fig. 6.7. Classical quantization of a variable.
(Example quantization vectors shown: Qx = [0, 0, 0, 1, 0, 0] and Qx = [0, 1, 0, 0, 0, 0].)
Chapter VI Integration of spectral and spatial features
Page 127
Figure 6.7 shows two examples of quantization of variables through this method as well as the
associated quantization vector Qx.
However, this function HMf_i(x) can be generalised in such a way that membership of each group is no longer the all-or-nothing of classical discretisation. The membership functions of the intervals can be defined through the use of triangular functions (6.4), which implement the aforementioned fuzzy discretisation.
        HMf_i(x) = 1 - |x - X_central_i| / D ,   if X_central_i - D < x < X_central_i + D
        HMf_i(x) = 0 ,                           otherwise                     (6.4)
These triangular membership functions, shown in figure 6.8, associate each value of the feature X with a specific degree of membership for each of the groups of the histogram. In this manner, the membership of a certain value of X is not exclusive to a single group; rather, the membership value is shared between several groups.
Fig. 6.8. Fuzzy quantization of a variable
This way, by substituting the membership function (6.4) into equation (6.3), one obtains the quantization vector associated with the fuzzy sets into which the variable X has been quantized. Thus, a specific value of X is quantized by a vector Qx established by the values of the different membership functions in the different sets, as shown in figure 6.8.
(Example fuzzy quantization vector shown in figure 6.8: Qx = [0, 0, 0, 0.8, 0.2, 0].)
In this manner, the definition of the vector Qx provides a robust mathematical representation for the discretisation of the histogram, allowing either its classical quantization or its fuzzy quantization through the use of different membership functions. This quantized vector Qx, in either of its forms, will serve as the basis for the mathematical definition of the histogram.
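The two quantization schemes of eqs. (6.2)-(6.4) can be sketched as follows. This is a minimal illustration (the value 0.34 and the six groups are assumptions); the triangular sets are given support ±D so that neighbouring memberships overlap and sum to one, matching the shared-membership example of figure 6.8:

```python
import numpy as np

def quantize(x, A, N, fuzzy=True):
    """Quantization vector Qx (eq. 6.3) of a scalar x in [0, A] over N
    groups. fuzzy=False uses the rectangular HMf_i of eq. (6.2);
    fuzzy=True the triangular HMf_i of eq. (6.4), whose memberships
    overlap and sum to one."""
    D = A / N
    centers = D / 2 + D * np.arange(N)          # X_central_i
    d = np.abs(x - centers)
    if fuzzy:
        return np.clip(1.0 - d / D, 0.0, None)
    return (d < D / 2).astype(float)

# A value near a group border: the classical vector jumps between groups
# under small perturbations, the fuzzy one degrades gracefully.
q_hard = quantize(0.34, A=1.0, N=6, fuzzy=False)
q_soft = quantize(0.34, A=1.0, N=6, fuzzy=True)
```

Here q_hard places all the weight in a single group, while q_soft splits it between the two neighbouring groups, so a small noise-induced change in x only shifts weight gradually between them.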
1.2. Definition of the fuzzy neighbourhood histogram.
Once the variable X is quantized via the quantization vector Qx, one can create a histogram that
characterises the distribution of the different intensities of variable X. The final objective of the
creation of this histogram is the estimate of the probability density function of the spatial
distribution of feature X in the surroundings of that point.
For this, let I be a two-dimensional image containing a value of variable X in each of its pixels; the value of X at a point P(x, y) of the image is given by I(x, y). In the same manner, Qx(x, y) is the vector that represents the quantization of variable X at each point (x, y), as defined in equation (6.3).
Figure 6.9 shows an example of an image I(x, y) as well as its different quantizations, using the
triangular membership function described in equation (6.4).
Fig. 6.9. Example of the quantization of different points of image I(x, y).
In order to include in the model the statistical information of the spatial distribution of variable X
in a neighbourhood of the point P(x,y), we define an associated neighbourhood for each point
P(x,y) to be analysed and we calculate the histogram defined by variable X in that
neighbourhood.
In order to do so, we define a neighbourhood centred on P(x, y) of previously established dimensions [Width, Height]. The use of large neighbourhoods produces a better estimate of the spatial distribution of the variable; on the other hand, they require a greater calculation time and increase the possibility that the neighbourhood contains more than one class of material.
The representation employed for the quantization vector Qx allows a direct calculation of the histogram of the neighbourhood associated with a point, through the sum of the vectors Qx associated with each of the points (i, j) belonging to the neighbourhood of P(x, y), as defined in equation (6.5).

        H_X(x, y) = Σ_{i=x-A}^{x+A} Σ_{j=y-B}^{y+B} Q_X(i, j)                  (6.5)

where A and B determine the size of the neighbourhood to be analysed.
Please note that in the case of a classical discretisation (rectangular membership function, equation (6.2)), the histogram vector obtained through equation (6.5) corresponds to the vector that would have been obtained by counting the number of elements belonging to each of the intervals that make up the histogram.
Figure 6.10 graphically shows the calculation of the function HX(x, y) for a neighbourhood of size
3 x 3. One can notice that the calculation of the function of the histogram HX(x,y) is done in a
direct manner via the sum of the quantization vectors. This vector HX(x, y) represents the level of
presence of the different intensities that variable X can have in the neighbourhood of point P(x,y).
Fig. 6.10. Graphical representation of the calculation of the fuzzy neighbourhood histogram
In order to truly obtain an estimate of the probability function of the intensities of X in the neighbourhood, the sum of the elements of the vector H_X(x, y) that defines the histogram must equal unity. The normalised neighbourhood histogram is obtained by dividing this vector by its L1 norm, as described in (6.6).
        Ĥ_X(x, y) = H_X(x, y) / ||H_X(x, y)||_1                                (6.6)

where

        ||H_X(x, y)||_1 = Σ_i H_X(x, y)(i)                                     (6.7)

Ĥ_X(x, y) represents the estimate of the probability density function of the behaviour of the variable X in the described neighbourhood, which captures the spatial behaviour of this variable in its neighbourhood.
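A minimal sketch of the fuzzy neighbourhood histogram of eqs. (6.5)-(6.6); the 3 x 3 window, the wrap-around border treatment and the chessboard test image are illustrative assumptions:

```python
import numpy as np

def quantize_image(img, A, N):
    """Per-pixel fuzzy quantization vectors Qx(x, y) (eq. 6.3, with the
    triangular memberships of eq. 6.4); output shape (H, W, N)."""
    D = A / N
    centers = D / 2 + D * np.arange(N)
    return np.clip(1.0 - np.abs(img[..., None] - centers) / D, 0.0, None)

def neighbourhood_histogram(q, A=1, B=1):
    """Fuzzy neighbourhood histogram: sum of Qx over a (2A+1) x (2B+1)
    window centred on each pixel (eq. 6.5), L1-normalised (eq. 6.6).
    Borders are wrapped here for brevity - a simplifying assumption."""
    h = np.zeros_like(q)
    for dy in range(-B, B + 1):
        for dx in range(-A, A + 1):
            h += np.roll(q, (dy, dx), axis=(0, 1))
    return h / h.sum(axis=-1, keepdims=True)

# Chessboard image (cf. fig. 6.4c): the value of a single pixel is
# ambiguous, but the neighbourhood histogram reveals the bimodal
# light/dark distribution around it.
chess = (np.indices((6, 6)).sum(axis=0) % 2).astype(float)
h = neighbourhood_histogram(quantize_image(chess, A=1.0, N=4))
```

For the interior pixel (2, 2), the resulting histogram has weight only in the darkest and lightest groups, exactly the bimodal signature discussed above.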
1.3. Definition of the fuzzy region histogram
The earlier definition estimates the probability density function of a variable in a neighbourhood
of a point. However, this concept can be generalised to any region of the image. This fact
generates a histogram which captures the spatial information of the variable in a specific region
of the image.
),( yxxQy
x
),( yxX
H
Chapter VI Integration of spectral and spatial features
Page 131
A region R of the image is defined as a set of points belonging to the image, normally connected, which represent a specific area of it. An example of a region is shown in figure 6.11.
Fig. 6.11. Representation of a region of an image.
As with a neighbourhood, one can calculate the histogram of variable X for a region ℜ_i from the quantized vector Qx, through the sum of the quantization vectors of each of the points belonging to that region:

        H_X(ℜ_i) = Σ_{(u,v)∈ℜ_i} Q_X(u, v)                                     (6.8)
Analogously to the earlier case, one can proceed to its normalisation:
        Ĥ_X(ℜ_i) = H_X(ℜ_i) / ||H_X(ℜ_i)||_1                                   (6.9)
Both the definition of the neighbourhood histogram (6.6) and the definition of the region histogram (6.9) represent the estimated probability function of the values that variable X can take in the associated area of the image. This histogram vector (6.6, 6.9) directly reflects the variability of variable X in the associated region or neighbourhood, and can therefore be used directly as a feature vector that includes the spatial variability of X.
In this manner, the study of the shape of the H_X vector, as shown in figure 6.12, uses the spatial information included in it to achieve a more precise discrimination of the elements to be classified. Figure 6.12 additionally shows that the four types of distribution can be clearly differentiated through the use of spatial features; it is not possible to classify them using only the value of the central point, and a spatial study is required in order to obtain an optimal discrimination.
Fig. 6.12. Histograms associated to different types of distributions of the values of variable X. a)
Constantly light distribution, b) Constantly dark distribution, c) Chess distribution, d) Variable
distribution
Thus, the use of the histogram vector Ĥ_X(x, y) as a feature vector includes the spatial distribution of the elements and exploits spatial variability in their discrimination.
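The region variant of eqs. (6.8)-(6.9) differs from the neighbourhood case only in that the sum runs over an arbitrary set of pixels. A sketch, where the gradient image and the boolean mask selecting the region are assumptions:

```python
import numpy as np

def quantize_image(img, A, N):
    """Per-pixel fuzzy quantization Qx (eqs. 6.3-6.4); shape (H, W, N)."""
    D = A / N
    centers = D / 2 + D * np.arange(N)
    return np.clip(1.0 - np.abs(img[..., None] - centers) / D, 0.0, None)

def region_histogram(q, mask):
    """Fuzzy region histogram (eqs. 6.8-6.9): sum the quantization
    vectors of the pixels selected by the boolean mask of the region,
    then L1-normalise the result."""
    h = q[mask].sum(axis=0)
    return h / h.sum()

# Region: the upper (darker) half of a small gradient image.
img = np.linspace(0.0, 1.0, 16).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[:2] = True
hist = region_histogram(quantize_image(img, A=1.0, N=4), mask)
```

Because the region contains only low intensities, the resulting histogram concentrates its weight in the dark groups and leaves the lightest group empty.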
2. Extension of the fuzzy spatial histograms to vectorial features.
Spectral-spatial histograms
The previous section defined spatial histograms capable of including spatial information of the neighbourhoods and regions contained in images defined by a single scalar variable. In this section, the earlier spatial histogram model is extended to vectorial images, that is, those defined by a feature vector, as is the case of spectral images.
The method proposed in the following is valid for any type of multivariate image, in which every
point is defined by a vector irrespective of its origin. In this manner, one can directly apply it to
spectral images, in which the feature vector defines its texture, or any type of combination or
representation of data or features that could be represented in a vectorial manner, such as features
extracted from hyperspectral images that were mentioned in earlier chapters.
Analogous to the previous section, a two-dimensional vectorial image is one represented at every point P(x, y) by a feature vector I(x, y) = [X_1(x, y), X_2(x, y), ..., X_M(x, y)]^T, each element of vector I being the value of a feature X_j associated with that point P(x, y).
Fig. 6.13. Representation of a vectorial image by the vectorial function I(x, y).
In this section, the two types of histograms which were proposed earlier (neighbourhood and
region) are going to be re-defined in order to apply them to vectorial images. In this manner, one
can achieve the integration of the spectral information contained in the image with the spatial
variability in the associated neighbourhood or region.
For this, we propose to extend the method of discretisation of the histogram to vectorial variables,
as well as the generation, from this vector, of a spectral histogram vector which simultaneously
contains information that is spectral as well as spatial.
2.1. Quantization of the feature vector.
This section details the quantization of each of the components X_j of the vector I(x, y). This quantization is performed by quantizing each of the scalar components X_j of the vector I(x, y) separately.
Analogous to the scalar case, in which a quantization vector Q_X (equation 6.3) was obtained for the studied feature X, in the vectorial case a quantization vector Q_Xj(x, y) is obtained for each of the features. Each vector Q_Xj(x, y) has N components, corresponding to the groups into which the histogram has been discretised for that variable X_j, so that one vector Q_Xj(x, y) is obtained for each of the M components X_j of the vector I(x, y). The calculation of this vector for each of the components is shown in equation (6.10).
        Q_Xj(x, y) = [HMf_1(X_j), HMf_2(X_j), ..., HMf_N(X_j)]                 (6.10)
Taking the quantization vectors Q_Xj(x, y) of all the components, one can create an aggregate quantization vector Q(x, y) that contains the quantizations of all the components of I(x, y) in a single vector:

        Q(x, y) = [Q_X1(x, y), Q_X2(x, y), ..., Q_XM(x, y)]                    (6.11)
Figure 6.14 illustrates the generation of this aggregate quantization vector from the quantization of each of the components of I(x, y). Note that the dimensions of this vector are [M · N, 1], where M is the number of components of the vector I(x, y) and N the number of groups into which the histogram has been discretised.
Fig. 6.14. Quantization of vector I(x, y).
The generation of the aggregate vector Q(x, y) for the quantization of vector I(x, y) will allow a simple and robust calculation of the spatial histograms associated with vectorial images, as shall be seen in the following section.
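The aggregate quantization of eqs. (6.10)-(6.11) can be sketched as follows; the toy 2 x 2 cube with M = 3 features and its values are assumptions:

```python
import numpy as np

def quantize_cube(cube, A, N):
    """Aggregate quantization vector Q(x, y) (eqs. 6.10-6.11): each of
    the M components of I(x, y) is fuzzily quantized into N groups and
    the M resulting vectors are stacked into one [M * N] vector."""
    D = A / N
    centers = D / 2 + D * np.arange(N)
    q = np.clip(1.0 - np.abs(cube[..., None] - centers) / D, 0.0, None)
    H, W, M, _ = q.shape                 # q has shape (H, W, M, N)
    return q.reshape(H, W, M * N)

# A toy "spectral" image with M = 3 features per pixel:
cube = np.zeros((2, 2, 3))
cube[0, 0] = [0.2, 0.5, 0.8]
Q = quantize_cube(cube, A=1.0, N=4)
```

Each pixel's aggregate vector has M · N components, the first N corresponding to the quantization of X_1, the next N to X_2, and so on, as in figure 6.14.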
2.2. Definition of fuzzy vectorial histograms.
Setting aside for a moment the aggregate quantization vector described in the previous section, let us suppose that each of the M features X_j is quantized separately, thereby obtaining separately the histogram associated with each of them for a specific neighbourhood or region.
Each of these histograms would be defined by the following equation in the neighbourhood histogram case:

        H_Xj(x, y) = Σ_{u=x-A}^{x+A} Σ_{v=y-B}^{y+B} Q_Xj(u, v)                (6.12)
And by the following when dealing with region histograms:

        H_Xj(ℜ_i) = Σ_{(u,v)∈ℜ_i} Q_Xj(u, v)                                   (6.13)
The normalised histogram, for either of the two types, would be defined as:

        Ĥ_Xj = H_Xj / ||H_Xj||_1                                               (6.14)
In this manner, each histogram represents the spatial variability of each of the components of the
vector ),( yxI for that region or neighbourhood.
In the vectorial case there are M histograms, one associated with each of the components of the vector I(x, y). Each vector Ĥ_Xj represents the spatial variability of the variable to which it corresponds. These vectors can be grouped into a single aggregate histogram vector Ĥ(x, y), which represents the spatial variability of the components of the vector I(x, y) in a joint manner:

        Ĥ(x, y) = [Ĥ_1(x, y), Ĥ_2(x, y), ..., Ĥ_M(x, y)]                       (6.15)
Figure 6.15 graphically shows the construction of this aggregate histogram vector Ĥ(x, y). Note that it simultaneously includes the information on the spatial variability of each of the variables X_j, and that it characterises in a single vector the spectral and spatial properties of the image in a specific neighbourhood or region.
Fig. 6.15. Graphical representation of the aggregate histogram vector Ĥ(x, y).
Returning to the definition of the aggregate quantization vector Q(x, y), it is possible to obtain the aggregate histogram vector H(x, y) directly from Q(x, y). By substituting Q_Xj(x, y) by its aggregate equivalent, one achieves in a simple and elegant manner the definition and calculation of the aggregate histogram vector H(x, y), which speeds up its calculation while reducing its computational cost.
Through the use of Q(x, y), the calculation of the fuzzy histogram vector becomes:

        H(x, y) = Σ_{i=x-A}^{x+A} Σ_{j=y-B}^{y+B} Q(i, j)                      (6.16)

where A and B establish the neighbourhood limits.
In the case of an aggregate histogram for a region, one obtains:
…
X1
Quantization
…
X2
XM
Quantization
Quantization
),(ˆ yxH
),(ˆ yxMH
),(ˆ2 yxH
),(ˆ1 yxH
\mathbf{H}(\Re) = \sum_{(i,j) \in \Re} \mathbf{Q}(i,j)    (6.17)
The aggregate histogram vectors defined in equations (6.16) and (6.17) compile the spatial
variability of the vectorial (spectral) features of the image, as shown in figure 6.15. In this way, the
discriminating capacities of the spectral and spatial information that characterise the associated
neighbourhood or region can be elegantly combined in a single vector.
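Equations (6.16) and (6.17) can be illustrated with a minimal numpy sketch. The function and variable names are illustrative, not taken from the thesis implementation; Q is assumed to be a (rows, cols, bins) array whose last axis holds the (fuzzy) membership vector of each pixel.

```python
import numpy as np

def neighbourhood_histogram(Q, x, y, A, B):
    """Eq. (6.16): sum the quantization vectors Q(i, j) over the
    (2A+1) x (2B+1) neighbourhood centred on pixel (x, y)."""
    return Q[x - A:x + A + 1, y - B:y + B + 1].sum(axis=(0, 1))

def region_histogram(Q, mask):
    """Eq. (6.17): sum the quantization vectors over an arbitrary
    region given by a boolean mask."""
    return Q[mask].sum(axis=0)

# Toy example: 5x5 image, 3 quantization bins, every pixel fully in bin 0.
Q = np.zeros((5, 5, 3))
Q[..., 0] = 1.0
H = neighbourhood_histogram(Q, 2, 2, 1, 1)   # 3x3 neighbourhood
```

Because the per-pixel quantization vectors are simply summed, the aggregate histogram of a region is the sum of the histograms of any partition of that region, which is what makes the later region-merging step cheap.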
3. Conclusions
This chapter has introduced the concept of fuzzy neighbourhood histograms and fuzzy region
histograms, which have the property of being able to represent the spatial behaviour of a certain
variable.
This definition has been extended to any type of vectorial image. In this manner, one can extract
an aggregate histogram vector that joins, in a single feature vector, the spatial and spectral
properties of a certain region of a vectorial image.
The fuzzy quantization of spatial histograms allows for a better representation, avoiding sharp
changes in the shape of the histogram due to noise, to the inherent variability of the acquisition and
to values close to the quantization borders.
The combination of spectral and spatial features in a single vector allows for a better separability of
overlapping classes. One can obtain greater information on the statistical distribution of the
discriminating variables through the study of the spatial and spectral properties of the neighbourhood
or region.
Using this approach, each point of the vectorial image is defined not only by its spectral features,
but also by the spectral-spatial histogram defined in the neighbourhood of the point, thus
increasing the information describing that point.
The inclusion of spatial features not only increases the separability of classes that have
uncertainty areas, but also models objects composed of different materials, since the presence of
the diverse materials is captured in the histogram without distorting it.
Another characteristic of these spectral histograms is that each of their components is a sum of
random variables, each indicating membership or non-membership to
a specific group present in the vector \mathbf{Q}(x,y). Applying the central limit theorem [BISH_06] for
large regions, the sum of these random components will tend towards a Gaussian behaviour. For
this reason, the use of large neighbourhoods in the calculation of the histogram causes each of the
variables of the histogram to tend towards a Gaussian behaviour. This can
simplify the use of classifiers based on Gaussians or Gaussian mixtures for classification purposes.
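This tendency can be checked numerically. The sketch below (illustrative, not from the thesis) treats a histogram bin as a sum of Bernoulli membership indicators over the pixels of a neighbourhood, and compares the skewness of the resulting distribution for a small and a large neighbourhood; by the central limit theorem the larger sum should be far more symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)

def bin_count_samples(n_pixels, p, n_samples=20000):
    """Each histogram bin is a sum of n_pixels Bernoulli(p) membership
    indicators; draw many such sums to inspect their distribution."""
    return rng.binomial(n_pixels, p, size=n_samples)

def skewness(x):
    """Sample skewness: third central moment over variance^(3/2)."""
    x = x - x.mean()
    return (x ** 3).mean() / (x ** 2).mean() ** 1.5

small = bin_count_samples(9, 0.2)     # 3x3 neighbourhood
large = bin_count_samples(2500, 0.2)  # 50x50 neighbourhood
# The larger neighbourhood yields a far more symmetric, Gaussian-like bin.
```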
Chapter VII
Classification of spectral images and region analysis
Earlier chapters have presented different techniques for feature extraction, aiming to
define and model the properties, both spectral and spatial, of each of the elements of a
hyperspectral image within a single feature vector. However, these methods have been described
independently, without going into much detail on their integration within the complete
process for the classification of hyperspectral images.
This chapter describes the proposed methodology for the classification of hyperspectral images
from a theoretical perspective encompassing the entire process, from the initial acquisition of the
image to the proposed methods of re-classification.
In order to do so, the methodology is first described in a global manner, clarifying the proposed
classification process. Subsequently, further detail is provided on the different modules
which shape the process. The different possible alternatives for their implementation are introduced
in the description of each of the modules.
1. General description of the process of classification
This section briefly outlines the proposed methodology for the extraction of feature vectors and their
use for the classification of materials in hyperspectral images.
The proposed method is mainly divided into two stages. In the first, the background of the image is
extracted, segmenting those elements which are of interest; these are then subjected to a
normalisation and spectral decorrelation process that
extracts a feature vector capable of representing the spectral-spatial features of each of the image
points, which is finally assigned to a class by a statistical classifier.
The different modules contained in this first phase are the following:
1. Image capture.
2. Background extraction and segmentation.
3. Normalisation of the lighting conditions.
4. Spectral decorrelation.
5. Integration of the spectral-spatial features.
6. Preliminary classification of the image.
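The six modules above can be sketched as a pipeline skeleton. This is a hypothetical scaffold under stated simplifying assumptions (a toy background rule, white-reference normalisation, identity decorrelation); every function and parameter name is illustrative, not the thesis implementation.

```python
import numpy as np

def classify_hyperspectral(cube, white_ref, classifier):
    """Hypothetical first-phase pipeline for a (rows, cols, bands) cube.
    Stages mirror the numbered modules: background extraction, lighting
    normalisation, decorrelation, and per-pixel classification."""
    mask = cube.sum(axis=2) > 0.0      # 2. background extraction (toy rule)
    norm = cube / white_ref            # 3. lighting normalisation (eq. 7.9 style)
    feats = norm                       # 4. spectral decorrelation (identity here)
    labels = np.zeros(mask.shape, dtype=int)   # label 0 reserved for background
    labels[mask] = classifier(feats[mask])     # 5-6. classify foreground pixels
    return labels

# Toy run: flat cube, flat white reference, classifier that labels all as 1.
cube = np.ones((4, 4, 5))
white_ref = np.full(5, 2.0)
dummy = lambda feats: np.ones(len(feats), dtype=int)
labels = classify_hyperspectral(cube, white_ref, dummy)
```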
Figure 7.1 is a graphical description of the procedure employed. First, a hyperspectral image is
obtained, on which the segmentation of the background is performed. Subsequently, the
illumination of each of the spectral vectors associated with each pixel is normalised in order to
avoid variations due to lighting. Once normalised, the
discriminant spectral features of the spectrum are extracted through a process of decorrelation
(Chapters IV and V). Spatial information is then added to them through the use of neighbourhood
histograms (Chapter VI). Finally, the obtained feature vector is assigned to its corresponding
class through the use of statistical classifiers (Chapter III).
Fig. 7.1 Detailed process of the first classification phase.
As seen in figure 7.1, the final result of this phase is an image that labels each of its pixels with
one of the modelled classes. However, due to the great dispersion of the materials to be classified,
some of these regions will not be correctly classified; we therefore propose a second
phase that corrects the erroneously classified regions.
Bearing in mind that erroneously classified regions are usually connected with other correctly
classified regions, this second phase corrects the earlier classification
results through the use of region histograms (Chapter VI) that combine the spectral and spatial
features of each of the classification regions obtained in the previous phase. Based on these
histograms, contiguous regions can be checked in order to estimate their
possible merging and reclassification, as follows:
7. Merging of regions.
8. Reclassification of regions.
Figure 7.2 graphically shows the proposed procedure for this second phase:
Fig. 7.2 Region histogram and reclassification of regions.
Through the use of this second phase, we achieve the re-classification of those elements that,
despite having an acceptable similarity with the adjacent regions, were incorrectly classified in
the previous phase.
2. Classification of hyperspectral images
This section provides details on the first part of the previously described process, which includes
the image acquisition, the correction of lighting, the spectral decorrelation and the inclusion of
spatial features in the feature vector. Subsequently, each image pixel is classified based on its
extracted feature vector through the use of a statistical classifier.
The following sections provide step-by-step details of each of the modules of this phase, as shown
in figure 7.1.
2.1. Acquisition of an image and the correction of lighting.
As mentioned in Chapter II, the acquired spectral image is represented by a three-dimensional
matrix. The first two dimensions of the matrix represent the (x,y) position of each of the points in
the image. The third dimension represents each of the wavelengths reflected by each pixel, as
shown in figure 7.3.
Fig. 7.3 Representation of the hyperspectral image
In this manner, each point (x,y) of the image L is represented by a vector L(x,y), whose
components correspond to each of the K intensity responses at the wavelengths in which the
spectrum is discretised, that is, the quantity of light reflected at that (x,y) pixel as a function of its
wavelength. This way, each point in the image is represented by the vector L(x,y), which is
associated with the spectral response at that point, as defined in equation (7.1).
\mathbf{L} = [L_1, L_2, \ldots, L_K]^{T}    (7.1)
However, the appearance of this spectral vector L depends on several factors: the spectrum of the
incident light, the composition of the material, the external geometry of the material, and the
reflections and interactions of the lighting with the surrounding elements, among other factors. Chapter
II described the process of the generation of the reflected spectrum which, under diffuse
lighting conditions, takes the following form for every type of material:
L(\lambda) = m(\Omega) \cdot C(\lambda) \cdot L_{light}(\lambda)    (7.2)
Where m(\Omega) is a geometric coefficient that defines the percentage of the spectrum of incident
light that is observed by the sensor; it depends on the relative position between the
illumination, the camera and the 3D geometry of the object, as well as on the interactions due to
its possible rugosity, all encoded by \Omega.
C(\lambda) indicates the reflectivity or chromaticity of the material and represents the percentage
of light reflected by the material as a function of its wavelength. L_{light}(\lambda) represents the
incident lighting spectrum.
Therefore, one can observe that, of the three components that define vector L, only the
reflectivity or chromaticity of the material C(\lambda) is useful for its characterisation, given that the
other factors do not depend on the composition of the object. The
incident lighting spectrum depends on the illumination used, and the geometric
coefficient m(\Omega) depends on the geometry between the different elements; neither takes into
account the composition of the material.
Therefore, it is necessary to transform vector L in order to make it independent both from the
geometric variables that are modelled by m(\Omega) and from the lighting source
used, defined by the incident lighting spectrum L_{light}(\lambda).
2.2. Independence from the lighting source
Figure 7.4 shows the different types of lighting spectra )(λlightL that are appropriate for their use
in hyperspectral applications. Figure 7.4a shows continuous lighting, which is that most
commonly used. This lighting has an adequate emittance for all wavelengths that are going to be
used to characterise the different materials. The second type of lighting (Fig. 7.4b), corresponds
to a white lighting, which has the particularity of having a similar emissivity in all the spectrum
range, providing a sensation of white hue to the human eye. The third type of lighting (Fig. 7.4c)
shows a light with the same emissivity value for each of the spectrum regions. This lighting is
only used in applications of great precision, due to the difficulty in obtaining a source of lighting
that has these features of equal emissivity in a wide range of wavelengths.
Fig. 7.4 Representation of different types of lighting emission spectra. a) Continuous spectral
lighting, b) White spectral lighting, c) Ideal spectral lighting
As can be seen in (7.2), changes in the incident lighting spectrum cause a direct change
in the observed spectrum L, making its characterisation difficult or impossible when facing
different or variable incident lighting spectra.
In order to achieve the invariance of vector L with respect to the incident lighting spectrum
L_{light}(\lambda), we start from the methodology proposed by Tan et al. [TAN_04]. The aim of this
method is to obtain the reflectance spectrum that would have been obtained under an ideal
incident white lighting with identical emittance at every wavelength
(Fig. 7.4c), thus making vector L independent of the incident light spectrum.
In order to achieve this correction, one first calculates the chromaticity of the incident light (7.3):
C_{light}(\lambda) = \frac{L_{light}(\lambda)}{\sum_{n=1}^{N} L_{light}(n)}    (7.3)
Where N is the number of components of the spectrum.
This chromaticity value represents the percentage of intensity of each of the wavelengths in the
lighting spectrum. That is, it expresses the percentage of intensity emitted at each wavelength
relative to the total emitted intensity, rather than the absolute intensity of the lighting
spectrum.
Dividing vector L by the chromaticity of the light, the result is a reflectance spectrum
that is independent of the spectral appearance of the lighting source used, and thus equivalent
to the spectrum that would be obtained under ideal white lighting conditions.
\hat{L}(\lambda) = \frac{L(\lambda)}{C_{light}(\lambda)} = \frac{m(\Omega) \cdot C(\lambda) \cdot L_{light}(\lambda)}{C_{light}(\lambda)} = m(\Omega) \cdot C(\lambda) \cdot \sum_{n=1}^{N} L_{light}(n)    (7.4)
\lVert \mathbf{L}_{light} \rVert = \sum_{n=1}^{N} L_{light}(n)    (7.5)
The modulus of the lighting spectrum (7.5) does not depend on the wavelength and represents the
luminous intensity of the light source. Since this intensity does not depend on the
luminous characteristics of the object, it can be integrated into the geometric coefficient m(\Omega)
(where \Omega comprises all the variables that influence that coefficient), which indicates the quantity of
light that reaches that specific point of the material.
By integrating this modulus into the coefficient, the following definition for the reflectivity vector
is obtained:
\hat{L}(\lambda) = m_2(\Omega) \cdot C(\lambda)    (7.6)
In this manner, the vector \hat{\mathbf{L}} so obtained does not depend on the appearance of the
incident lighting spectrum, but only on a scalar geometric parameter m_2(\Omega), which defines the intensity
with which the incident lighting at that point of the image is observed by the sensor, and on the
chromaticity of the material itself, C(\lambda).
Given that it is usually not possible to directly obtain the chromaticity of the lighting C_{light}, it can
be estimated through the reflectance spectrum obtained from a white reference body, as shown
in figure 7.5.
Fig. 7.5 Representation of the hyperspectral vector L and the white reference body spectrum
(L_{WhiteBody}).
In this manner, the chromaticity of the incident lighting can be estimated through the spectral
vector reflected by a white body (7.7), as shown in equation (7.8).
\mathbf{L}_{WhiteBody} = [L_{WhiteBody,1}, L_{WhiteBody,2}, \ldots, L_{WhiteBody,K}]^{T}    (7.7)
C_{WhiteBody}(\lambda) = \frac{L_{WhiteBody}(\lambda)}{\sum_{n=1}^{N} L_{WhiteBody}(n)} \approx C_{light}(\lambda)    (7.8)
The equation for the calculation of the reflection spectrum invariant to the lighting source
(7.9) is obtained by replacing the chromaticity of the light in (7.4) by its estimate based on the
chromaticity of a white reference body:
\hat{L}(\lambda) = \frac{L(\lambda)}{L_{WhiteBody}(\lambda)}    (7.9)
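The white-reference correction of equation (7.9) can be sketched in a few lines of numpy. This is a minimal illustration under the diffuse-lighting assumption of (7.2); names are illustrative.

```python
import numpy as np

def correct_lighting(L, L_whitebody):
    """Eq. (7.9): divide each observed spectrum by the spectrum reflected
    by a white reference body, cancelling the incident-light chromaticity."""
    return L / L_whitebody

# If the observed spectrum is the lighting spectrum scaled by a flat 50%
# reflectance, the corrected spectrum is flat, whatever the lighting was.
light = np.array([1.0, 2.0, 4.0, 2.0])   # arbitrary incident spectrum
L = 0.5 * light                          # material with flat C(lambda) = 0.5
L_hat = correct_lighting(L, light)
```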
2.2.1. Independence from the geometric coefficient
The spectrum calculated in (7.9) is invariant to the type of lighting employed. It depends only
on two factors, as shown in (7.6): the chromaticity spectrum of the material
C(\lambda), and a scalar coefficient m_2(\Omega) that depends on the geometric parameters of the point and
that indicates the intensity with which the sensor perceives that reflection.
In order to isolate the chromaticity vector of the material from the geometric variables and shines,
diffuse lighting conditions are assumed, which allow the application of the
model defined in (7.2). Given that this diffusivity of the lighting is not perfect and that objects need
not have a Lambertian or dichromatic behaviour, three different methods are proposed which
reduce the influence of the geometric factor m_2(\Omega) in the estimation of the chromaticity of the
material.
2.2.1.1. Normalisation of the spectrum
The simplest method to reduce these phenomena is based on the normalisation of the spectrum
through division by its L1 norm, which reduces the dimensionality of the vector \hat{\mathbf{L}} by one
degree of freedom. Despite this loss of information, the normalised invariant spectrum obtained in (7.10) is
invariant to geometric changes, depending solely on the chromaticity spectrum of the
material, as shown in (7.11).
\hat{L}_{Norm}(\lambda) = \frac{\hat{L}(\lambda)}{\sum_{n=1}^{K} \hat{L}(n)}    (7.10)
\hat{L}_{Norm}(\lambda) = \frac{m_2(\Omega) \cdot C(\lambda)}{\sum_{n=1}^{K} m_2(\Omega) \cdot C(n)} = \frac{C(\lambda)}{\sum_{n=1}^{K} C(n)}    (7.11)
2.2.1.2. Montoliu's Invariant
However, the earlier method assumes diffuse lighting conditions that do not hold in
practice. For this reason, we propose to use the invariant proposed by Montoliu et al.
[MONT_05] for Shafer's dichromatic model [SHAF_84].
Chapter VII Classification of spectral images and region analysis
Page 152
Montoliu's method is based on the observation, explained in the section on lighting of Chapter
II, that the subtraction and subsequent division between two bands of the spectrum yields invariance to
shines, lighting intensity and object geometry under Shafer's dichromatic model.
In this manner, the minimum value of the spectrum \hat{L}(\lambda) is subtracted from each of its
components, and the resulting vector is then normalised:
\hat{L}_{Montoliu}(\lambda) = \frac{\hat{L}(\lambda) - \min\left(\hat{L}(1), \hat{L}(2), \ldots, \hat{L}(K)\right)}{\sum_{n=1}^{K} \left(\hat{L}(n) - \min\left(\hat{L}(1), \hat{L}(2), \ldots, \hat{L}(K)\right)\right)}    (7.12)
2.2.1.3. Stockman's Invariant
Another technique that eliminates the influence of the geometric factor is that proposed by
Stockman [STOCK_99]. He defined the spectral hue as a scalar invariant describing the
hue of a spectrum, analogous to the hue obtained from an RGB image. However, this hue
describes each spectrum with a single value, excessively decreasing the discriminating power of
that feature.
For the calculation of this hyperspectral hue, Stockman creates what he calls a desaturated
spectrum. This spectrum is calculated in two steps. First, it is made independent of the lighting
through the normalisation proposed in (7.10).
In a second step, this spectrum is desaturated through the subtraction of the minimum value
contained in that spectrum, as shown in (7.13).
\hat{L}_{Stockman}(\lambda) = \hat{L}_{Norm}(\lambda) - \min\left(\hat{L}_{Norm}(1), \hat{L}_{Norm}(2), \ldots, \hat{L}_{Norm}(K)\right)    (7.13)
Although Stockman does not use this spectrum as a discriminating feature, but only for
obtaining the spectral hue, we propose to use this method in order to achieve the invariance of the
spectrum to geometric changes.
All the previously proposed methods obtain a vector that is invariant to the geometry of the
object, provided that the ideal conditions defined in equation (7.6) are met. However, given that
these ideal conditions do not hold in practice, we propose to use these three methods
in order to achieve the invariance of the extracted spectrum to the geometric and lighting
variables. Chapter VIII will analyse the effectiveness of each of these methods and select
the one which achieves this invariance in the most efficient manner.
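The three invariants of equations (7.10), (7.12) and (7.13) can be written side by side in a short numpy sketch (illustrative names, not the thesis code). All three cancel the scalar geometric gain m_2; Montoliu's invariant additionally cancels an additive offset, matching the shine term of the dichromatic model.

```python
import numpy as np

def norm_l1(L):
    """Eq. (7.10): L1-normalised spectrum."""
    return L / L.sum()

def montoliu(L):
    """Eq. (7.12): subtract the spectrum minimum, then L1-normalise."""
    d = L - L.min()
    return d / d.sum()

def stockman(L):
    """Eq. (7.13): L1-normalise first, then subtract the minimum."""
    n = norm_l1(L)
    return n - n.min()

# Scaling the spectrum by a geometric gain leaves every invariant unchanged.
L = np.array([2.0, 5.0, 3.0, 6.0])
```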
2.3. Decorrelation of the luminous spectrum
As mentioned in earlier chapters, due to both the high dimensionality of the luminous spectrum
and the great redundancy of the existing data, it is necessary to reduce its dimensionality. The
objectives of this reduction are two-fold:
- To reduce the amount of redundant information present in the spectrum.
- To transform the data in order to produce a representation where the separation between
materials is maximal.
In order to achieve this reduction of features, we propose several feature extraction methods
set out in Chapter IV, as well as the method based on the division of the spectrum into fuzzy sets
proposed in Chapter V.
First, we propose the use of the raw spectrum (RAW) and of the wavelengths corresponding to the
red, green and blue colours of the spectrum (RGB), in order to establish the classification rate that
would be obtained using a normal colour image or the complete spectrum. In this
manner, the efficiency of the proposed algorithms can be evaluated.
The use of Principal Component Analysis (PCA) or of Fisher's Discriminant (LDA) extracts
highly discriminant features, thus allowing for an adequate compression of the data and
effectively reducing the Hughes phenomenon in the classification. On the other hand, the use of
these methods implies a previous training and, furthermore, the obtained features depend on
this training set.
As explained in Chapter V, when classes vary over time or when new classes are
added to the system, the discriminating power of the obtained features will vary, making
re-training necessary. For this reason, features based on PCA or LDA are not
optimal when the classes vary over time or when new classes are added to the system.
Chapter VII Classification of spectral images and region analysis
Page 154
Chapter V showed the necessary conditions for solving these limitations. In this manner, the
extracted feature vector should comply with the following conditions:
− Reduction of dimensionality.
− Adequate discriminatory power.
− Independence from training and from the variability of classes.
− Discriminating power based on the underlying physical properties.
− Generic features that do not depend on the application or the class.
− Maintaining the physical meaning of the features.
Because of all this, we also propose the use of the method based on spectral fuzzy sets described
in Chapter V. This method is designed to optimise the discriminating power of the
different absorption bands of the spectrum and complies with the previously stated conditions.
The following sections define the different proposed decorrelation methods used to reduce the
dimensionality of the spectrum and to increase its separability.
2.3.1. RGB.
In order to compare the goodness of hyperspectral techniques against classical colour-based
processing, a spectral vector of three components is created, each associated with
the wavelength of the red (650 nm), green (510 nm) and blue (475 nm) colour components.
Using this decorrelation, one can estimate the amount of additional discriminating information
offered by the proposed methods compared with the same methods based on colour.
Chapter VII Classification of spectral images and region analysis
Page 155
Fig. 7.6 Reduction of features based on RGB
In this manner, the feature vector X, which defines the luminous spectrum, is given by:

\mathbf{X}_{RGB} = [\hat{L}(\lambda_{RED}), \hat{L}(\lambda_{GREEN}), \hat{L}(\lambda_{BLUE})] = [R, G, B]    (7.14)

Where \hat{L}(\lambda) is the normalised luminance vector, as described in equation (7.9).
2.3.2. RAW.
The second proposed decorrelation method does not apply any transformation to the spectrum
other than the previously defined lighting corrections and normalisation. In this
manner, neither its high dimensionality nor its redundancy is reduced.
This raw spectrum can be used as a baseline against which to compare the effects on the
classification produced by the feature reduction of the following methods.
The feature vector X corresponds to the normalised luminance vector, as defined in the
following equation:

\mathbf{X}_{RAW} = \hat{\mathbf{L}}    (7.14)
Where \hat{\mathbf{L}} is the luminous spectrum under one of the previously defined normalisations
(equation 7.9).
2.3.3. Principal Component Analysis (PCA).
The first proposed feature reduction method is based on the widely known Karhunen-Loève
transform [HOTT_33]. Principal Component Analysis, thoroughly detailed in
Chapter IV, obtains a projection of the data onto a subspace of smaller dimension in which the
variance of the projected data is maximised.
In order to calculate that subspace, a set of spectra \hat{\mathbf{L}} covering the
set of classes to be modelled is selected as training data. From these training elements, the
eigenvalues and eigenvectors of the associated covariance matrix are calculated, which define the
vectors that establish the projection subspace.
The first M < K eigenvectors of the calculated covariance matrix are selected, those associated
with the M largest eigenvalues, and they form the transformation matrix V, as detailed in Chapter
IV.
\mathbf{X}_{PCA} = \mathbf{V}^{t}(\hat{\mathbf{L}} - \bar{\hat{\mathbf{L}}}) = [\hat{\mathbf{u}}_1, \hat{\mathbf{u}}_2, \hat{\mathbf{u}}_3, \ldots, \hat{\mathbf{u}}_M]^{T} \cdot (\hat{\mathbf{L}} - \bar{\hat{\mathbf{L}}})    (7.15)
The transformed vector X which represents the spectrum is calculated through equation (7.15),
where \bar{\hat{\mathbf{L}}} is the mean vector of all the spectra \hat{\mathbf{L}} used for the calculation of the transformation
matrix. For additional information on the application of this methodology, see Chapter IV.
Once the transformation is done, the luminous spectrum is represented by vector X in the
subspace defined by the previously calculated eigenvectors. This representation reduces the
dimensionality of the spectrum, thus reducing the Hughes phenomenon, while at the same time
decorrelating and compressing the information contained in it.
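Equation (7.15) can be sketched with numpy's eigendecomposition of the sample covariance matrix. This is a generic textbook PCA under random toy data, not the thesis implementation; function names are illustrative.

```python
import numpy as np

def fit_pca(spectra, M):
    """Return the mean spectrum and the M eigenvectors of the covariance
    matrix with the largest eigenvalues (as the columns of V)."""
    mean = spectra.mean(axis=0)
    cov = np.cov(spectra, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:M]          # pick the M largest
    return mean, vecs[:, order]

def project(L, mean, V):
    """Eq. (7.15): X = V^T (L - mean)."""
    return V.T @ (L - mean)

# Toy training set: 100 spectra with K = 6 bands, reduced to M = 3 features.
rng = np.random.default_rng(1)
spectra = rng.normal(size=(100, 6))
mean, V = fit_pca(spectra, M=3)
X = project(spectra[0], mean, V)
```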
2.3.4. Fisher's linear discriminant.
Another proposed decorrelation method is based on the linear discriminant analysis
proposed by Fisher. This technique, also thoroughly detailed in Chapter IV, does not transform
the coordinate system by maximising the variance of the training points; instead, it optimises
the separability between the classes that make up the sample, selecting a non-orthogonal set of
axes that maximises this separability. The number of these axes depends on the number of classes
present in the classification and is equal to the number of existing classes minus one.
In the case of Gaussian classes, this method obtains axes whose separability corresponds
to the optimal separability that would be achieved by the Bayes optimal
classifier.
Vector X, which represents the spectrum, is defined in (7.16), w being the Fisher
transformation, obtained as described in Chapter IV.

\mathbf{X}_{FISHER} = \mathbf{w}^{t}(\hat{\mathbf{L}})    (7.16)
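A standard textbook formulation of Fisher's discriminant solves the eigenproblem of S_w^{-1} S_b, with S_w and S_b the within-class and between-class scatter matrices. The sketch below follows that generic formulation on toy data and is not the Chapter IV implementation; all names are illustrative.

```python
import numpy as np

def fit_lda(X, y):
    """Fisher axes: eigenvectors of pinv(Sw) @ Sb with the largest
    eigenvalues; at most (n_classes - 1) axes are useful."""
    classes = np.unique(y)
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))   # within-class scatter
    Sb = np.zeros((d, d))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1][:len(classes) - 1]
    return vecs.real[:, order]

# Two well-separated Gaussian classes in 4 dimensions.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
w = fit_lda(X, y)          # two classes -> a single discriminant axis
```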
2.3.5. Spectral fuzzy sets.
The last proposed decorrelation method takes advantage of the correlation between
adjacent bands in order to characterise the absorption bands that differentiate the materials.
Chapter V gave a thorough explanation of the fuzzification of the spectrum. This method
divides the spectrum into fuzzy sets, each representing a range of
the spectrum, and measures the Energy of each fuzzy set. The energies associated
with these fuzzy sets form a feature vector that describes the electromagnetic spectrum
in a manner similar to the way the human eye characterises the red, green and blue colours.
In this way, the characterisation of the spectrum behaves like an eye that is sensitive
over a wide range of the spectrum (a "hyperspectral eye").
As detailed in Chapter V, the Energy associated with each of the fuzzy sets represents the
intensity of the spectrum for that set, and can be defined as the discrete convolution of the
spectrum with the fuzzy membership function located at the central point that defines the set.
E_i = \int_{\lambda=0}^{\lambda=K} f_{M_i}(\lambda) \cdot L_{norm}(\lambda) \, d\lambda    (7.17)
The characteristic vector of each spectrum is the vector composed of the Energies
associated with each of the fuzzy sets (7.18).

\mathbf{X}_{FUZZYSETS} = [E_1, E_2, \ldots, E_M]^{T}    (7.18)
This description of the spectrum characterises the spectral absorptions efficiently without
requiring previous training. Additionally, the dimensionality of the spectrum is
reduced while keeping its discriminating features.
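The energies of equations (7.17) and (7.18) can be sketched in discrete form as inner products between membership functions and the normalised spectrum. The triangular membership shapes below are an assumption for illustration; the thesis uses the fuzzy sets defined in Chapter V, and all names here are hypothetical.

```python
import numpy as np

def triangular_memberships(K, M):
    """M overlapping triangular membership functions over K bands
    (illustrative shapes, not the Chapter V definitions)."""
    centres = np.linspace(0, K - 1, M)
    width = centres[1] - centres[0]
    lam = np.arange(K)
    f = np.clip(1.0 - np.abs(lam[None, :] - centres[:, None]) / width, 0.0, None)
    return f  # shape (M, K)

def fuzzy_energies(L_norm, f):
    """Discrete eq. (7.17)/(7.18): energy of each fuzzy set as the inner
    product of its membership function with the normalised spectrum."""
    return f @ L_norm

K, M = 16, 4
f = triangular_memberships(K, M)
X = fuzzy_energies(np.ones(K) / K, f)   # flat spectrum -> M energies
```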
2.4. Integration of spectral-spatial features.
The feature vector X that defines the spectrum efficiently characterises each of the hyperspectral
pixels. However, this approach does not include spatial information from the neighbouring elements.
Spatial information is complementary, and its inclusion
in the model allows a better separability between the different classes.
As explained in Chapter VI, we propose the use of fuzzy neighbourhood histograms in
order to efficiently integrate the spectral and spatial features of the spectrum, taking into
account the variability of the spectral features in a given neighbourhood or region (spatial
variation).
In order to construct this spatial histogram, a neighbourhood is defined around each
pixel. Once defined, M independent histograms are calculated, one for each of the M components
of vector X. In order to create the spectral-spatial vector, these M histograms are concatenated as
follows:
\mathbf{H}(x,y) = [\mathbf{H}_1(x,y), \mathbf{H}_2(x,y), \ldots, \mathbf{H}_M(x,y)]    (7.19)
The histogram in equation (7.19) encodes the spatial distribution of all hyperspectral vectors
in the selected neighbourhood. Figure 7.7 shows the calculation of the histogram vector from the
vectorial representation of the image in a pre-established neighbourhood. For more detail on the
calculation of histograms, see Chapter VI.
Figure 7.7 Construction of the spectral-spatial vector H(x,y)
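The concatenation of equation (7.19) can be sketched with plain (crisp) histograms; the thesis uses the fuzzy histograms of Chapter VI, so this is a simplified illustration and every name is hypothetical.

```python
import numpy as np

def spectral_spatial_vector(Xmap, x, y, r, bins, value_range):
    """Eq. (7.19), crisp variant: one histogram per feature component over
    the (2r+1)^2 neighbourhood of (x, y), concatenated into one vector."""
    patch = Xmap[x - r:x + r + 1, y - r:y + r + 1]   # (h, w, M)
    lo, hi = value_range
    hists = [np.histogram(patch[..., m], bins=bins, range=(lo, hi))[0]
             for m in range(patch.shape[2])]
    return np.concatenate(hists)

# Toy feature map: 7x7 image, M = 2 feature components, all values zero.
Xmap = np.zeros((7, 7, 2))
H = spectral_spatial_vector(Xmap, 3, 3, r=1, bins=4, value_range=(0.0, 1.0))
```

The resulting vector has M times `bins` components, and its total mass equals the neighbourhood size times M, which is what allows the normalisation of equation (6.14).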
2.5. Classification Procedure.
Earlier sections have described the procedure used to obtain the feature vectors that merge
the spectral and spatial characteristics. The aggregate histogram defined in (7.19) captures, in a
single vector, the spatial distribution of the spectral information in a pre-established neighbourhood
surrounding pixel (x,y).
This vector H is used as the input vector for classification when the aim is to integrate the
spectral-spatial information. In cases where the spectral information alone is sufficient, any of the
implementations of the vector X defined in section 2.3 can be used as the input vector.
In order to evaluate and compare the goodness of the obtained feature vector, we propose the use
of a classifier based on multivariate Gaussian distributions. This classifier is used,
instead of the more complex classifiers described in Chapter III, because of its good
interpretability as well as the good generalisation obtained with this type of classifier. Its
simplicity makes it suitable for evaluating the goodness of the different features used for the
characterisation of the different materials.
In this manner, let C_i be each of the classes to be classified, defined by a set of N_i training
vectors H(x,y) (or X(x,y) if spatial features are not included) belonging to that class. From
this set of vectors, a Gaussian model is created for each class from the associated training
vectors, as defined in equation (7.20) for the calculation of the mean vector associated with
each class:
\mu_{C_i} = \frac{1}{N_i} \sum_{n=1}^{N_i} \mathbf{H}(n) \qquad (7.20)
and equation (7.21) gives the covariance matrix estimated from the training vectors that
belong to each class:

\Sigma_{C_i} = \frac{1}{N_i - 1} \sum_{n=1}^{N_i} \left( \mathbf{H}(n) - \mu_{C_i} \right) \left( \mathbf{H}(n) - \mu_{C_i} \right)^T \qquad (7.21)
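The model-fitting step of equations (7.20)-(7.21) can be sketched as follows, assuming the training vectors of one class are stacked as the rows of a matrix:

```python
import numpy as np

def fit_class_model(H_train):
    """Mean vector (7.20) and covariance matrix (7.21) of one class
    from an (N_i x D) array of training vectors H(n)."""
    mu = H_train.mean(axis=0)
    diff = H_train - mu
    sigma = diff.T @ diff / (len(H_train) - 1)
    return mu, sigma

# Synthetic training vectors for one class (100 samples, 3 features).
rng = np.random.default_rng(0)
H_train = rng.normal(size=(100, 3))
mu, sigma = fit_class_model(H_train)
```

The covariance matches `np.cov(H_train, rowvar=False)`, which uses the same N-1 normalisation.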
From the mean vector and covariance matrix of each class Ci, a Gaussian model is obtained for
each of them (7.22). This model gives the probability of membership of a feature vector in each
class. Equation (7.22) shows the probability of membership of the feature vector H in each of the
previously modelled classes:
\mathrm{N}(\mathbf{H};\, \mu_{C_i}, \Sigma_{C_i}) = \frac{1}{(2\pi)^{D/2}\, |\Sigma_{C_i}|^{1/2}}\, e^{-\frac{1}{2} (\mathbf{H} - \mu_{C_i})^T \Sigma_{C_i}^{-1} (\mathbf{H} - \mu_{C_i})} \qquad (7.22)
Chapter VII Classification of spectral images and region analysis
Page 161
Assuming Gaussian class distributions and applying Bayes' theorem, the most plausible class is
the one with the smallest cost. Analysing equation (7.22), the most plausible class is the one
whose Mahalanobis distance to the class model is minimal, as shown in (7.23).
\text{Class} = i \quad \text{if} \quad (\mathbf{H} - \mu_j)^T \Sigma_j^{-1} (\mathbf{H} - \mu_j) > (\mathbf{H} - \mu_i)^T \Sigma_i^{-1} (\mathbf{H} - \mu_i), \quad \forall j \neq i \qquad (7.23)
In this manner, each feature vector H(x,y) associated with an image pixel (x,y) is labelled with
the class of least Bayes cost, thus relating each point of the image to its most probable
membership class.
In this way, a label image B(x,y) is obtained, whose values range from zero to the number of
existing classes, the zero value being assigned by convention to the class representing the
background of the image, as shown in figure 7.8.
Figure 7.8 Labelling of the regions after the classification process
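A sketch of the decision rule (7.23): each feature vector is assigned to the class whose Gaussian model is at the smallest Mahalanobis distance (the class indices here are arbitrary; in the text, index 0 is reserved for the background):

```python
import numpy as np

def mahalanobis2(H, mu, sigma_inv):
    """Squared Mahalanobis distance of H to a class model."""
    d = H - mu
    return d @ sigma_inv @ d

def classify(H, models):
    """Assign H to the class with minimal Mahalanobis distance (7.23).
    `models` is a list of (mu, sigma) pairs."""
    dists = [mahalanobis2(H, mu, np.linalg.inv(sig)) for mu, sig in models]
    return int(np.argmin(dists))

# Two toy 2-D class models.
models = [(np.zeros(2), np.eye(2)), (np.full(2, 5.0), np.eye(2))]
print(classify(np.array([4.5, 5.2]), models))  # -> 1
```

Applying `classify` to every pixel's feature vector produces the label image B(x,y) described above.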
3. Analysis and merging of regions.
The previous section detailed how the label image B(x,y) is obtained. This image divides the
scene into several regions, each associated with a specific class. However, the considerable
overlap between the different class models, the presence of specular highlights and the
dispersion among the elements of each class cause some of these regions to be erroneously
classified.
However, in the great majority of cases these erroneously classified regions are small and are
generally connected to larger regions that are correctly classified, as shown in
figure 7.9. Based on this observation, a procedure is proposed for merging connected regions
according to their statistical features.
Using this approach, all connected regions are statistically analysed and compared with the
different class models in order to decide whether a specific region should be reclassified or
unified with one or several of the regions connected to it.
Figure 7.9 a) Initial classification of regions, b) Real classification of regions.
In order to achieve this, two distinct grouping and reclassification procedures are proposed:
one based on the calculation of the Region of maximum likelihood, and one that describes each
region by its Normalised region histogram.
Both methodologies reclassify each region using a more precise, region-wide estimate of its
membership in a given class, thereby increasing the statistical significance of the
classification.
Subsequently, the connected regions are analysed in order to estimate statistically whether they
should be grouped together and what the final classification of the aggregate supra-region
should be, as shown in figure 7.10.
Figure 7.10 Estimate of the probability of membership to each of the regions and the membership
probability of the aggregate supra-region.
The two proposed methodologies for the reclassification and merging of the regions are
presented below.
3.1. Region of maximum likelihood.
The first approach uses the membership probabilities obtained by evaluating the neighbourhood
histograms H(x,y), described in equation (7.19), against each of the defined Gaussian models, as
detailed in equation (7.22), to calculate a single membership probability for each region.
This probability is computed as the sum, over all points of the region, of the membership
probabilities of the associated neighbourhood histograms with respect to each existing class:
P_{\mathrm{membership}}(C_i, \Re_j) = \sum_{\forall (x,y) \in \Re_j} \mathrm{N}\big(\mathbf{H}(x,y);\, \mu_{C_i}, \Sigma_{C_i}\big) \qquad (7.24)
In the same manner, one calculates the probability that defines the statistical cost of unifying both
regions:
P_{\mathrm{membership}}(C_i, \Re_A \cup \Re_B) = \sum_{\forall (x,y) \in \Re_A \cup \Re_B} \mathrm{N}\big(\mathbf{H}(x,y);\, \mu_{C_i}, \Sigma_{C_i}\big) \qquad (7.25)
In the great majority of cases, the probability of the unified region is lower than the
probability of keeping both regions separate. The reason is that each independent region bears a
greater similarity to the class with which it was originally classified, which causes a
decrease in the membership probability of the aggregate region.
The decision to unify both regions is therefore made by applying an empirical correction factor
µ, which models the expected degree of decrease in the probability: the regions are merged
whenever the corrected probability of the unified region exceeds the weighted probability that
the regions remain separate, so as to obtain an optimal cost:
\mu \cdot P_{\mathrm{membership}}(\Re_{A \cup B}) \;\geq\; \frac{N_A}{N}\, P_{\mathrm{membership}}(\Re_A) + \frac{N_B}{N}\, P_{\mathrm{membership}}(\Re_B) \qquad (7.26)
where NA and NB are the numbers of elements belonging to regions A and B respectively, and N is
the number of elements belonging to the aggregate region ℜA∪B.
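The unification test of equation (7.26) can be sketched as follows, assuming the per-pixel membership probabilities of each region's points for every class have already been evaluated with (7.22); the array shapes and the default value of µ are illustrative choices, not taken from the thesis:

```python
import numpy as np

def region_prob(pix_probs):
    """(7.24): per-class membership probability of a region, summed
    over its points; pix_probs has shape (n_points, n_classes)."""
    return pix_probs.sum(axis=0)

def should_merge(probs_A, probs_B, mu_factor=0.9):
    """(7.26): merge if the corrected probability of the union exceeds
    the size-weighted probabilities of the separate regions."""
    N_A, N_B = len(probs_A), len(probs_B)
    N = N_A + N_B
    P_A = region_prob(probs_A).max()          # best class for region A
    P_B = region_prob(probs_B).max()          # best class for region B
    P_AB = region_prob(np.vstack([probs_A, probs_B])).max()  # (7.25)
    return mu_factor * P_AB >= (N_A / N) * P_A + (N_B / N) * P_B

probs_A = np.array([[0.9, 0.1], [0.8, 0.2]])   # 2 points, 2 classes
probs_B = np.array([[0.85, 0.15]])             # same dominant class
print(should_merge(probs_A, probs_B))
```

With a small µ and regions dominated by different classes, the same test rejects the merge.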
3.2. Normalised region histogram
The previous method estimates the reclassification and merging of the regions by weighting the
membership probabilities of each point of the region. This approach does not provide a single
feature vector capable of describing the associated region by itself.
In order to overcome these limitations, the extraction of a characteristic region descriptor is
proposed, defining specific region models that take into account the different degrees of
variability of the region (specular highlights, oxidation, etc.), thus capturing and modelling
variations that cannot be completely represented by the neighbourhood histograms defined in (7.19).
One of the fundamental advantages of this approach is that it does not require the prior
calculation of a neighbourhood histogram for every point; instead, each region is classified
according to a single histogram (vector) that represents it. Avoiding the per-point
neighbourhood histograms greatly increases, under certain conditions, the computational speed.
In order to obtain this feature vector, it is proposed to extend the concept of the fuzzy
neighbourhood histogram (explained in Chapter VI) to regions of any shape. In this manner, the
neighbourhood histogram is calculated for the complete region instead of for the neighbourhood
of a single point. This histogram characterises the whole region through a single feature vector
of spatial and spectral properties.
In order to make the region histogram vector (6.17) independent of the number of elements
contained in each region, it is normalised by dividing it by the number N of elements of that
region:
\hat{\mathbf{H}}(\Re_i) = \frac{\mathbf{H}(\Re_i)}{N} \qquad (7.27)
In this manner, a normalised region histogram (7.27) is calculated for each of the previously
classified regions, except those classified as background. Since these histograms have been
normalised, they can be compared directly with the Gaussian models associated with each of the
classes.
Fig. 7.11. Extraction of region histograms in each of the previously extracted regions.
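A sketch of the normalised region histogram (7.27), again assuming a pre-quantised bin map rather than the fuzzy assignment used in Chapter VI:

```python
import numpy as np

def region_histogram(bin_map, region_mask, n_bins):
    """Region histogram over all pixels of a region, normalised by
    the region size N, as in (7.27)."""
    values = bin_map[region_mask]
    return np.bincount(values, minlength=n_bins) / len(values)

# Toy 2x2 image with a 3-pixel region.
bin_map = np.array([[0, 1], [1, 1]])
mask = np.array([[True, True], [True, False]])
h = region_histogram(bin_map, mask, 2)
print(h)  # -> approximately [0.333, 0.667]
```

Because the result sums to one, histograms of regions of very different sizes become directly comparable.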
Each pair of connected regions is analysed for a possible merging. In order to carry out this
operation, the region histogram is calculated for each of the regions (Ha,Hb) and another region
histogram is calculated for the aggregate region obtained from the two connected regions (Hab)
(figure 7.12).
Fig. 7.12 Region histograms of the two candidate regions and of the aggregate region.
Each histogram is evaluated against the models that define the existing classes, and each region
is assigned to the class with the highest membership probability.
If all three histograms are assigned to the same class, the candidate regions are merged and
that class is kept for the resulting region. Otherwise, the regions are statistically analysed
in order to decide whether the two regions should merge. The merging probabilities are
calculated from the membership probabilities associated with each region histogram for each of
the different classes, as defined in equation (7.28).
P(\mathbf{H}_i \in C_i) = \mathrm{N}\big(\mathbf{H}_i;\, \mu_{C_i}, \Sigma_{C_i}\big) \qquad (7.28)
The likelihood of the aggregate region (7.29) is contrasted with the likelihood of both regions
remaining separate (7.30):
P(\mathbf{H}_{ab} \in C_{ab}) = \mathrm{N}\big(\mathbf{H}_{ab};\, \mu_{C_{ab}}, \Sigma_{C_{ab}}\big) \qquad (7.29)

P(\mathbf{H}_a \in C_a, \mathbf{H}_b \in C_b) = P(\mathbf{H}_a \in C_a) \cdot \frac{N_a}{N_{merged}} + P(\mathbf{H}_b \in C_b) \cdot \frac{N_b}{N_{merged}} \qquad (7.30)
In order to quantify the contribution of each region to equation (7.30), each term is weighted
by the number of elements belonging to the associated region (Na and Nb) relative to the number
of elements contained in the merged region (Nmerged).
In this way, the hypothesis with the greater probability is chosen, and the regions are merged
or kept separate according to the hypothesis with minimum cost.
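This second merging criterion, equations (7.28)-(7.30), can be sketched as follows. It relies on the fact that the normalised histogram of the aggregate region is the size-weighted average of the two normalised histograms; the Gaussian model used below is illustrative:

```python
import numpy as np

def gaussian_score(h, mu, sigma):
    """Density N(h; mu, Sigma) used as the membership probability (7.28)."""
    d = h - mu
    D = len(mu)
    norm = 1.0 / (((2 * np.pi) ** (D / 2)) * np.sqrt(np.linalg.det(sigma)))
    return norm * np.exp(-0.5 * d @ np.linalg.solve(sigma, d))

def merge_by_histograms(h_a, h_b, n_a, n_b, models):
    """(7.29) vs (7.30): likelihood of the aggregate region against the
    size-weighted likelihood of keeping the regions separate."""
    h_ab = (n_a * h_a + n_b * h_b) / (n_a + n_b)   # aggregate histogram
    best = lambda h: max(gaussian_score(h, mu, sig) for mu, sig in models)
    n = n_a + n_b
    p_separate = best(h_a) * n_a / n + best(h_b) * n_b / n   # (7.30)
    return best(h_ab) >= p_separate                          # merge?

models = [(np.array([0.5, 0.5]), 0.01 * np.eye(2))]
h_a, h_b = np.array([0.6, 0.4]), np.array([0.4, 0.6])
print(merge_by_histograms(h_a, h_b, 10, 10, models))  # -> True
```

Here the aggregate histogram lies closer to the class model than either part, so the merge hypothesis wins.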
4. Conclusions.
This chapter has presented a complete procedure for classifying the elements contained in
hyperspectral images. Each phase of this process has been described, and the possible approaches
for each of its constituent modules have been detailed in depth.
First, the problem of the dependence of the chromaticity of the material on external lighting
and geometry factors has been tackled, and several options have been proposed that reduce this
dependence under non-ideal conditions.
Second, the spectral decorrelation techniques presented in Chapters IV and V have been
integrated within the proposed classification framework, addressing the use of several
decorrelation methods within the complete classification procedure.
On the other hand, the methodology based on fuzzy neighbourhood histograms defined in Chapter VI
has been placed within the general classification framework, achieving the integration of
spectral and spatial features within a single feature vector.
Furthermore, the classification of materials through Gaussian models has been proposed. The
properties of these models not only allow for the correct classification of the materials, but
also for the efficient evaluation of the different alternatives proposed.
Additionally, in order to increase the classification rate of the system, a subsequent
reclassification has been proposed, based on merging connected regions with similar properties
through the optimisation of a unification cost function.
This approach describes each region through a single vector that integrates its spectral and
spatial features, without needing to obtain the spectral and spatial features of each point
separately.
Chapter VIII
Results
The previous chapters theoretically described diverse methodologies for the classification of
different materials or elements in hyperspectral images, with the aim of obtaining a robust and
efficient classification algorithm.
This chapter describes the different tests undertaken to validate these methodologies in the
field of hyperspectral image classification, and specifically for the classification of
materials from waste electrical and electronic equipment (WEEE).
First, the nature and acquisition conditions of the image sets used are described. Subsequently,
the tests that establish the efficiency of each of the proposed methodologies are detailed, and
the results obtained are analysed in order to confirm or reject the hypotheses made.
1. Description of the data sample
The validation of the different proposed classification algorithms is carried out in the context
of the classification of non-ferrous materials for recycling. Specifically, the following
materials from electrical-electronic waste are evaluated: white copper, aluminium, stainless
steel, brass, copper and lead (Figure 8.1). An additional class is added to these materials: the
image background, formed by the conveyor belt on which the capture is made. This background
element (conveyor belt) was chosen beforehand for specific luminous properties that reduce its
reflected intensity.
These samples consist of non-ferrous materials obtained after chopping, magnetic sorting and
densimetric sorting. The resulting set is composed of a mixture of non-ferrous materials
(aluminium, copper, zinc, brass and lead) and of austenitic stainless steel, which together
represent 13% of all scrap from waste electrical and electronic equipment.
The great similarity in the chromaticity, shape and weight of these materials makes their
separation impossible with current methods other than manual sorting. The samples used were
provided, and previously classified, by expert operators from the companies Indumetal Recycling
S.A. and IGE Hennemann Recycling GmbH, both participants in the European project SORMEN
[SORM_06]. For this selection, the
great variability in the appearance of the different materials to be classified has been taken into
account, as well as those real problems that exist in their classification.
Figure 8.1 Materials analysed in this study.
The hyperspectral images evaluated in this study have been acquired using a hyperspectral line
camera PHL Fast10 CL made by Specim Ltd. [SPEC_08]. Although this camera can capture 1024
wavelengths, due to the high correlation of adjacent wavelengths, only 80 wavelengths, equally
spaced, have been selected covering a spectral range between 384.05 and 1008.10 nm.
For the acquisition of these materials, a machine vision system (Figure 8.2) has been used. This
system integrates the lighting, data acquisition and synchronisation devices required to
correctly illuminate the wavelengths to be acquired by the spectral camera at a reasonable
speed.
Fig. 8.2 Image acquisition system
In this manner, a set of hyperspectral images is acquired that associates each pixel with an
eighty-component spectral vector representing the luminous spectrum of each material. Figure 8.3
shows the pseudo-colour visualisation of a set of materials acquired by this system.
Fig. 8.3 Acquisition samples of diverse materials
These materials are not only similar in their luminous appearance (colour), but also present a
high scattering in the luminous spectra associated with each class (figure 8.4). As a
consequence, these materials cannot be classified according to their luminous spectrum using the
classical techniques of spectral classification.
Fig. 8.4 Spectral scattering between the different materials (blue: aluminium, red: copper,
green: brass, cyan: lead, magenta: steel, yellow: white copper).
To perform the following tests, the set of images has been divided into two groups: the first is
used for training and the second as a test set. The number of available elements, in both the
training set and the test set, is close to half a million per set.
From the training set, 8,000 representative pixels were chosen to generate the models of the
materials to be classified. In the real tests, a first classification was performed using these
8,000 representative pixels, a second using the rest of the pixels contained in the training
set, and a third phase used the elements contained in the test set.
In order to keep the presentation simple and clear, the results analysed in this chapter
correspond to those obtained on the test set, as this is the most restrictive set and the one
that yields the worst classification rates.
Figure 8.5 shows the set of images used to illustrate the tests undertaken, detailing the type
of material and the correct classification associated with each pixel.
Fig. 8.5 Description of the test images. Left: correct classification; right: original image.
Classes: stainless steel, lead, aluminium, brass, white copper, copper.
The following sections evaluate the different processing alternatives previously defined in
Chapter VII: lighting correction, spectrum decorrelation for background segmentation and for
material classification, the efficiency of including spatial characteristics in the spectral
model, and the efficiency of the region merging techniques, among others.
2. Background identification
Figure 8.6 shows the great luminous scattering between the different materials and the
background element (in black). This discriminating value makes it possible to separate the image
background from the rest of the materials in a simple and computationally efficient manner
through a background subtraction process.
Fig. 8.6 Scattering of the different spectra (steel, brass and aluminium); the background is
shown in black.
The objectives of this preliminary background subtraction are twofold. On the one hand, the
average intensity of the spectrum provides a very important discriminating value for
differentiating the background element from the diverse materials, given the matte,
black-body-like behaviour of the former.
However, this same feature is harmful when discriminating between the remaining materials:
owing to their specular physical properties, the average intensity is affected by the lighting
conditions and by the geometry of the object itself (highlights, shadows, etc.) [SHAF_84]. This
makes a preliminary background subtraction based on average intensity features suitable, as it
both eliminates this variable and optimises the subsequent classification of the diverse
materials through lighting normalisation, without affecting the background segmentation.
On the other hand, the high degree of discrimination between background and material yields a
much more efficient classification algorithm of low computational cost, which uses a reduced
number of components and increases the real speed of the whole system.
In order to proceed with the background segmentation, the discriminating value of the spectrum's
intensity must first be analysed. For this, the raw spectrum vector XRAW is used as the feature
vector, as defined in equation 7.14 of Chapter VII.
The lighting corrections proposed by Montoliu (7.12) and Stockman (7.13), which are designed to
eliminate the influence of the illumination, are applied to this vector in order to evaluate
their effect on background discrimination. These corrections cancel the discriminating power of
the spectrum's average intensity, owing to its dependence on the geometric factors that define
the lighting (see Chapter VII, section 3.1). This suggests that applying such a correction will
be detrimental to the segmentation of the background, which is characterised by a dark
intensity.
Table VIII.1 shows the results obtained, in which it can be observed that the average intensity
of the spectrum does indeed provide discriminant information for the background segmentation.
However, the use of the vector XRAW as a feature vector is not the best suited, for several
reasons. Chief among them is its high dimensionality, which leads to poor and inadequate
training due to the Hughes phenomenon and, in addition, entails a high computational cost owing
to the large number of components involved in the calculation.

TABLE VIII.1 EFFECTS OF LIGHTING CORRECTION IN THE BACKGROUND SEGMENTATION

Without correction: 81.92%   Stockman: 59.15%   Montoliu: 69.10%
In order to select the feature vector best suited to the requirements of the
background-material separation, the precision of the different spectrum decorrelation variants
defined in section VII.3.2 is analysed.
Bearing in mind the great similarity of the results, and in order to enhance processing speed,
note that the method based on XRGB obtains a precision of over 97% when discriminating between
background elements and those corresponding to materials. Table VIII.3 details the quality of
the classification obtained when applying it to all image pixels: of over 600,000 background
elements, only some 31,000 were classified as material, and of some 400,000 material pixels,
fewer than 20,000 were erroneously classified as background, giving a global precision of over
95%.
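The global precision quoted can be checked directly from the confusion-matrix counts of Table VIII.3:

```python
# Pixel counts from Table VIII.3 (classified as x real element).
bg_as_bg, mat_as_bg = 575_126, 19_149     # classified as background
bg_as_mat, mat_as_mat = 31_734, 396_797   # classified as material

total = bg_as_bg + mat_as_bg + bg_as_mat + mat_as_mat
precision = (bg_as_bg + mat_as_mat) / total
print(round(100 * precision, 2))  # -> 95.03 (just over 95%)
```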
TABLE VIII.2 COMPARISON OF THE DIFFERENT ELEMENTS OF SPECTRAL DECORRELATION FOR THE DIFFERENTIATION
BETWEEN THE BACKGROUND AND THE MATERIAL

Number of components   XRAW     XRGB     XFISHER   XPCA     XFuzzysets
2                      -        -        -         98.32%   97.19%
3                      -        97.15%   -         -        -
4                      -        -        -         97.13%   96.67%
6                      -        -        97.36%    -        -
8                      -        -        -         88.94%   86.63%
16                     -        -        -         89.39%   88.95%
24                     -        -        -         87.70%   90.14%
80                     81.92%   -        -         -        -

TABLE VIII.3 CONFUSION MATRIX FOR A CLASSIFICATION BASED ON XRGB (PIXEL COUNTS)

                     Real element
Classified as:    Background    Material
Background          575,126      19,149
Material             31,734     396,797
Figure 8.7 shows real images together with the background/material classification obtained.
These results can be improved with morphological erosion techniques [GONZ_08] in order to
eliminate erroneously classified elements. Applying these techniques does not affect the
subsequent global classification between the different materials, since the clusters of
erroneously classified pixels are several orders of magnitude smaller than the objects. These
morphological operations eliminate small spurious objects and yield a background/material
segmentation close to 100%. Figure 8.7 shows the classification obtained without the use of
morphological techniques.
Fig. 8.7 Background classification based on XRGB spectral feature.
Above: original image, Below: classification mask.
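A minimal numpy-only sketch of such a morphological clean-up (a 4-neighbour opening; [GONZ_08] describes the general operators, the structuring element here is an assumption):

```python
import numpy as np

def binary_open(mask, it=1):
    """Minimal 4-neighbour binary opening (erosion then dilation) to
    remove small misclassified specks."""
    def erode(m):
        p = np.pad(m, 1, constant_values=False)
        return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
                & p[1:-1, :-2] & p[1:-1, 2:])
    def dilate(m):
        p = np.pad(m, 1, constant_values=False)
        return (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
                | p[1:-1, :-2] | p[1:-1, 2:])
    for _ in range(it):
        mask = erode(mask)
    for _ in range(it):
        mask = dilate(mask)
    return mask

mask = np.zeros((7, 7), bool)
mask[1:6, 1:6] = True   # a large correctly classified object
mask[0, 6] = True       # an isolated misclassified pixel
cleaned = binary_open(mask)
```

After the opening, the isolated speck disappears while the large object's interior is preserved.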
TABLE VIII.4 CONFUSION MATRIX FOR A CLASSIFICATION BASED ON XRGB (PERCENTAGES)

                     Real element
Classified as:    Background    Material
Background           94.77%       4.60%
Material              5.23%      95.40%
A conclusion worth emphasising in this section is the great separability that exists between the
background and the rest of the materials. Background and materials can be correctly separated at
a reduced computational cost by Gaussian modelling of the background and of each material using
only the colour wavelengths (R, G, B).
The use of XPCA and of XFuzzy sets is discarded because, although two components would suffice,
both require a prior calculation step that involves the complete spectrum. Using the colour
wavelengths directly takes advantage of the discriminating power of the spectrum's average
intensity while reducing the total computation time. Those pixels classified as material are
then re-classified by the techniques described in the following sections.
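The conclusion above can be sketched as follows: two multivariate Gaussian models fitted on the three colour channels alone separate a dark background from brighter materials (the sample values below are synthetic, not the thesis data):

```python
import numpy as np

def fit(X):
    """Mean and covariance of a set of 3-component (R,G,B) samples."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def log_gauss(x, mu, sigma):
    """Log-density up to a constant shared by both classes."""
    d = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (d @ np.linalg.solve(sigma, d) + logdet)

# Synthetic RGB samples: dark conveyor belt vs brighter materials.
rng = np.random.default_rng(2)
bg = rng.normal(0.05, 0.02, size=(500, 3))
mat = rng.normal(0.5, 0.15, size=(500, 3))
models = [fit(bg), fit(mat)]

def label(x):  # 0 = background, 1 = material
    return int(np.argmax([log_gauss(x, m, s) for m, s in models]))

x_dark = np.array([0.04, 0.06, 0.05])
x_bright = np.array([0.6, 0.45, 0.5])
print(label(x_dark), label(x_bright))  # -> 0 1
```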
3. Influence of the correction of lighting for the classification of
materials
The previous section has shown the discriminating capacity of the spectrum's average intensity
for differentiating between materials and background. However, as detailed in Chapter VII,
section 3.1, this average intensity is influenced by factors unrelated to the composition of the
materials, such as the level of incident light, the specular nature of the object and its
geometric orientation.
This section presents the results obtained in the classification of materials when varying the
lighting correction method employed. Such a correction removes the influence of specular regions
and of the incident light on the classification, while maintaining the discriminating power
contained in the spectral vector.
Although the different methods of spectrum decorrelation (section 4) and the results obtained by
fusing the spectral and spatial characteristics have not yet been reviewed, the efficiency of
the different lighting correction methodologies is estimated here, in order to present the
factors that influence the classification in an orderly manner, using the decorrelation method
based on the Energy of fuzzy sets (XFuzzy sets). As shown in the following sections of this
chapter, this is the decorrelation method that offers the best results. The classification is
carried out in two phases in order to correctly assess the merit of the different lighting
correction methods: the first uses only the spectral information, while the second uses a
combination of spectral and spatial information. The results obtained are presented in table
VIII.5.
These results show an improvement in the image classification due to the lighting correction
performed with the methods proposed by Stockman [STOC_99] and Montoliu [MONT_05]. Figure 8.8
shows how both the Stockman and the Montoliu methods reduce the effect of existing highlights
and shadows.
Figure 8.8 Visualisation of the effects of the application of correction algorithms in an image's Xj.
a) Without correction, b) Stockman's correction, c) Montoliu's correction.
Although both methods correct the effects of lighting, Stockman's method most effectively
preserves the intensity differences in the components Xj that are related to the composition of
the materials. In this way the discriminating power between classes is not reduced, which
corroborates the better results obtained by Stockman's method; it will therefore be used as the
standard lighting correction method in the following sections.
TABLE VIII.5 EFFECTS OF LIGHTING CORRECTION IN THE CLASSIFICATION OF MATERIALS

Decorrelation method            Without correction   Stockman   Montoliu
FuzzySets-8                          50.29%           71.52%     55.45%
FuzzySets-8 + Neighbourhood          70.74%           90.22%     84.46%

4. Decorrelation of the luminous spectrum
This section compares the merits of the different methodologies employed to decorrelate the
spectral vector and extract its discriminating features. In this context, the classification
results obtained using the RGB components, the whole spectrum (RAW), methodologies based on
Principal Component Analysis, and those based on the Energy of a fuzzy spectrum are evaluated,
all of them described in section VII.3.2.
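As a sketch of the PCA-based alternative (a standard SVD projection; the number of components k mirrors the experiments below):

```python
import numpy as np

def pca_project(X, k):
    """Project spectra (rows of X) onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: axes
    return Xc @ Vt[:k].T

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 80))   # 200 synthetic 80-band spectra
Z = pca_project(X, 8)            # 8 decorrelated components per spectrum
```

By construction the projected components are mutually uncorrelated, which is the property exploited here.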
The influence of the number of components on the precision of the classification is also
evaluated for those methods that involve selecting a certain number of components, such as
Principal Component Analysis and the method based on the Energy of fuzzy triangles.
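As an illustration of the idea behind the Energy of fuzzy triangles (the exact formulation is given in the earlier chapters; this sketch makes its own simplifying assumptions), the spectrum can be projected onto k overlapping triangular membership functions and the energy under each one accumulated:

```python
import numpy as np

def fuzzy_energy(spectrum, k):
    """Project a spectrum onto k overlapping triangular fuzzy sets and
    return the energy captured by each (illustrative sketch, not the
    thesis's exact formulation)."""
    n = len(spectrum)
    centres = np.linspace(0, n - 1, k)
    width = (n - 1) / (k - 1)
    idx = np.arange(n)
    out = np.empty(k)
    for j, c in enumerate(centres):
        w = np.clip(1 - np.abs(idx - c) / width, 0, None)  # triangle at c
        out[j] = np.sum(w * spectrum ** 2)
    return out

spec = np.ones(80)          # flat 80-band spectrum
e = fuzzy_energy(spec, 8)   # 8-component descriptor
print(e.shape)
```

This reduces an 80-band spectrum to a small, smoothed descriptor without any training step, in contrast to PCA or Fisher projections.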
These X vectors are used as feature vectors to generate the Gaussian models that define each
material, as described in Chapter VII, section 3.4. To obtain these results, the Stockman
lighting correction has been applied after background subtraction, as it is the method that
yields the best classification results, as seen in the previous section.
The classification obtained using the RGB-based components XRGB (42.26%) was surpassed by the
use of the complete spectral vector XRAW (55.67%). The high dimensionality of this vector,
together with its high correlation, means that the classification improves further when
decorrelation methods are applied.
TABLE VIII.6 COMPARISON OF THE DIFFERENT ELEMENTS OF SPECTRAL DECORRELATION FOR THE DIFFERENTIATION OF
DIVERSE MATERIALS

Number of components   XRAW     XRGB     XFISHER   XPCA     XFuzzysets
2                      -        -        -         53.08%   52.76%
3                      -        43.83%   -         -        -
4                      -        -        -         61.40%   63.10%
5                      -        -        62.86%    -        -
8                      -        -        -         66.43%   71.52%
16                     -        -        -         64.11%   71.43%
24                     -        -        -         67.95%   71.67%
80                     55.67%   -        -         -        -
Although the methods based on prior training (XPCA, XFISHER) provide better results than the
previous methods, the best results are obtained with the method based on the Energy of fuzzy
sets, XFuzzy sets, which achieves rates above 70%.

The experimental results described in Table VIII.6 and illustrated in figure 8.9 indicate that
both the PCA-based methods and the methods based on fuzzification of the spectrum provide
promising results when used for the decorrelation of hyperspectral data. In the experiments
performed, the use of fuzzy sets surpassed the classification rate obtained with PCA while at
the same time avoiding the complications associated with the training process that PCA requires.
For these reasons, we conclude that the technique based on fuzzy sets appears to be the most
appropriate decorrelation method.

Table VIII.7 shows the confusion matrix of the different materials:

TABLE VIII.7 CONFUSION MATRIX FOR THE CLASSIFICATION OF MATERIALS USING SPECTRAL INFORMATION EXTRACTED FROM
SPECTRAL FUZZY SETS OF 8 COMPONENTS

                             Real material
Classified as     Aluminium  Copper   Brass    Lead     Stainless steel  White copper
Aluminium         75.73%     0.79%    1.95%    1.65%    22.40%           0.85%
Copper            0.22%      94.44%   4.01%    0.71%    5.81%            2.84%
Brass             2.54%      2.38%    70.37%   1.42%    10.12%           13.08%
Lead              8.91%      0.00%    8.13%    86.52%   25.49%           3.98%
Stainless steel   11.23%     0.00%    7.03%    8.75%    29.14%           6.35%
White copper      1.37%      2.38%    8.50%    0.95%    7.05%            72.89%
Average: 71.52%
Figure 8.9 Classification of diverse materials through the use of fuzzy sets.
Both figure 8.9 and table VIII.7 show a high level of statistical overlap between the different
class models, which prevents the classification from being precise enough. To illustrate this
phenomenon, figure 8.10 shows the degree of statistical overlap between the different classes.
Figure 8.10 Graph of the overlap of the classes using 3 of Fisher's projections (blue: aluminium, red: copper, green: brass, cyan: lead, magenta: steel, yellow: white copper).
The high overlap between the different materials can thus be seen, which in turn explains the
low classification rate obtained.
Figure 8.11 shows the feature vectors created after decorrelating the spectrum. These graphs
show that the level of overlap has been reduced with respect to that existing before
decorrelation (figure 8.4). Furthermore, this effect is visually more evident with fuzzy sets
than with PCA.
Figure 8.11 Spectral scattering between the different materials (blue: aluminium, red: copper,
green: brass, cyan: lead, magenta: steel, yellow: white copper). a) PCA, b) Fuzzy sets
In order to correct the remainig overlapping, we propose to include those additional features that
increase the separability between materials in order to obtain an improvement in the achieved
rates of classification.
5. Integration of spectral and spatial features
The experiments described in the previous section did not include spatial information in the
feature modelling of non-ferrous materials. Including this information in the generation of the
feature vectors can increase the statistical separation between the different materials and thus
improve the classification. To this end, neighbourhood histograms are calculated for each pixel
in the image, as described in Chapter VII, section 3.3.
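A minimal sketch of such per-pixel neighbourhood histograms is shown below. It is illustrative only: each pixel is assumed to carry a hard integer bin index (e.g. its dominant fuzzy set), and the fuzzy weighting of Chapter VII, section 3.3 is omitted.

```python
import numpy as np

def neighbourhood_histograms(label_map, n_bins, w):
    """Normalised histogram of bin indices in a (w x w) window per pixel.

    label_map: 2-D array of integer bin indices, one per pixel.
    Border pixels use a window clipped to the image bounds.
    """
    h, wd = label_map.shape
    r = w // 2
    out = np.zeros((h, wd, n_bins))
    for i in range(h):
        for j in range(wd):
            win = label_map[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            counts = np.bincount(win.ravel(), minlength=n_bins)
            out[i, j] = counts / counts.sum()  # normalised histogram
    return out

labels = np.random.randint(0, 8, size=(20, 20))   # toy 8-bin label image
hists = neighbourhood_histograms(labels, n_bins=8, w=5)
```

Growing `w` mixes in more spatial context, which is exactly the window-size sweep reported in table VIII.8.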
The size of the neighbourhood used for the calculation of the fuzzy histograms is varied in
order to evaluate the efficiency of the fusion of spatial and spectral information. This makes it
possible to control the amount of spatial information contained in the vector and thus to
estimate its influence on the classification. The experimental data obtained are shown in
table VIII.8.
The results shown in table VIII.8 indicate that a feature vector that simultaneously includes
spectral and spatial information generates more robust image descriptors, which produce more
precise models of the materials to be classified. The low classification rate obtained with RGB
image features indicates that using spatial information without taking spectral information into
account does not efficiently increase the separability of the materials.
The results of table VIII.8 also indicate that the decorrelation technique based on the
fuzzification of the spectrum provides more consistent results than the one based on Principal
Component Analysis. These experimental data lead to the conclusion that the inclusion of
spectral-spatial features is adequate for the classification of materials, as shown by a
classification rate of over 86%. The fact that these fuzzy sets are inspired by the functioning
of the human eye makes it possible to represent, in a near-optimal manner, the spatial and
texture properties contained in the images, much as a human being does.
TABLE VIII.8 RATES OF CLASSIFICATION THROUGH THE INTEGRATION OF SPECTRAL AND SPATIAL FEATURES

Window size   8-PCA    8-Fuzzy sets   RGB
3×3           67.19%   77.30%         53.03%
5×5           73.77%   81.48%         56.14%
7×7           76.42%   83.77%         56.46%
11×11         78.85%   85.62%         57.76%
15×15         79.34%   86.45%         59.21%
19×19         80.41%   86.63%         59.83%
23×23         81.32%   86.42%         59.90%
Taking a closer look at the results for each material (table VIII.9), one can see that the number
of erroneous classifications has been considerably reduced, and that two materials (stainless
steel and white copper) account for the majority of the remaining errors.
If the separation of the different material classes is analysed by visualising the Fisher
components extracted from the histogram feature vector (figure 8.12), one can observe that the
separability of the classes has increased considerably compared with that obtained using only
spectral information.
Figure 8.12 Graph of class overlap using Fisher's projections. a) Visualisation of the 1st, 2nd
and 3rd components. b) Visualisation of the 3rd, 4th and 5th components. (aluminium: blue,
copper: red, brass: green, lead: cyan, steel: magenta, white copper: yellow).
TABLE VIII.9 CONFUSION MATRIX FOR THE CLASSIFICATION OF MATERIALS THROUGH SPECTRAL-SPATIAL INFORMATION
EXTRACTED FROM SPECTRAL FUZZY SETS OF 8 COMPONENTS AND A 19 X 19 NEIGHBOURHOOD

                                        Real Material
Classified        Aluminium  Copper   Brass    Lead     Stainless Steel  White Copper
Aluminium         87.25%     0.06%    1.24%    0.14%    11.24%           0.29%
Copper            0.00%      98.32%   2.34%    0.00%    0.01%            2.87%
Brass             1.61%      0.00%    90.61%   5.51%    8.98%            15.69%
Lead              0.16%      0.00%    0.32%    90.79%   2.46%            0.00%
Stainless Steel   10.97%     1.61%    5.48%    3.56%    75.90%           4.25%
White Copper      0.00%      0.00%    0.00%    0.00%    1.41%            76.91%
Average: 86.63%
Looking at figure 8.12 in greater detail, one notices that, as shown in table VIII.9, a high
overlap occurs between the white copper (yellow) and brass (green) classes, which causes over
15% of the white copper to be classified as brass. One can also notice the great scattering of
stainless steel (magenta), which causes classification errors across the remaining classes.
Figure 8.13 shows the feature vectors that are generated. An in-depth analysis shows that each
material presents a consistent signature, while the separability between the different materials
has increased. This is consistent with the results of table VIII.9 and the scatter graph of
figure 8.12.
Figure 8.13 Spectral-spatial scattering (fuzzy histograms) between the different materials
(aluminium: blue, copper: red, brass: green, lead: cyan, steel: magenta, white copper: yellow).
Figure 8.14 shows the results of this classification and the effects that these overlaps cause
in it.
Figure 8.14 Classification of the diverse materials through the use of fuzzy sets and
neighbourhood histograms of size 19 x 19 pixels.
The results in figure 8.14 show a correct classification for most materials. The main cause of
incorrect classification is the overlap of the stainless steel and white copper classes with the
rest of the classes. Figure 8.14 also shows that the majority of the erroneously classified areas
are connected to regions that are correctly classified. This leads us to evaluate techniques that
allow the re-classification and merging of interconnected regions in order to obtain a higher
classification precision.
6. Methods of region merging
The regions identified after the application of the neighbourhood histograms (see section 5) are
subjected to the process of re-classification and aggregation of regions described in section
VII.4. In this procedure, the merging cost associated with each pair of regions is calculated and
the corresponding aggregation decisions are taken based on it.
Section VII.4 proposes two methodologies for implementing this region merging. The technique
based on the region of maximum likelihood uses the distances from each element of the region to a
previously established region model in order to obtain the most probable classification. The
method based on normalised region histograms extracts a signature from the histogram of each
region and, based on this signature, analyses whether grouping the regions is suitable.
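The histogram-signature variant can be sketched as follows. The Bhattacharyya-style dissimilarity and the merge threshold are illustrative assumptions; the actual cost function of section VII.4 may differ.

```python
import numpy as np

def region_histogram(hist_stack, mask):
    """Normalised histogram signature of a region (mean of pixel histograms)."""
    sig = hist_stack[mask].mean(axis=0)
    return sig / sig.sum()

def merge_cost(sig_a, sig_b):
    """Bhattacharyya-style distance between two region signatures (in [0, 1])."""
    return 1.0 - np.sum(np.sqrt(sig_a * sig_b))

# Toy example: two regions drawn from the same 8-bin distribution
rng = np.random.default_rng(0)
hists = rng.dirichlet(np.ones(8), size=(10, 10))  # per-pixel histograms
mask_a = np.zeros((10, 10), bool); mask_a[:5] = True
mask_b = ~mask_a
cost = merge_cost(region_histogram(hists, mask_a),
                  region_histogram(hists, mask_b))
merge = cost < 0.1  # aggregate the regions when the cost is low (threshold assumed)
```

Because the signature describes a whole region at once, it can be compared directly, without recomputing neighbourhood histograms point by point, which matches the computation-time advantage noted below.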
Table VIII.10 shows that both methodologies offer high levels of precision, surpassing 98%
reliability, which allows one to state that the region merging process greatly reduces the number
of erroneously classified regions.
One also notices that spectral fuzzification (8-Fuzzy sets) continues to provide better results
than principal components, which supports the conclusion that decorrelation based on the
fuzzification of the spectrum is preferable to decorrelation based on PCA.
Both region merging methodologies offer similar results, which does not allow one to state which
of them behaves better in the classification of these materials. However, the region histograms
provide a signature that can be used to analyse a region directly, without extracting
neighbourhood histograms for each of its points, which in certain situations yields a better
computation time.
The detailed results of this classification are shown in table VIII.11, calculated using spectral
fuzzy sets on an 11 x 11 neighbourhood and reprocessed with the region-of-maximum-likelihood
technique. The result shows that the overlap levels have been considerably reduced; the only
remaining classification errors are due to the great scattering present in the stainless steel
class, as shown in figure 8.12.
TABLE VIII.10 CLASSIFICATION RATES THROUGH THE APPLICATION OF FUZZY REGION METHODOLOGIES

Window size used for the    Region of maximum likelihood   Normalised region histogram
previous region calculation 8-PCA      8-Fuzzy sets        8-PCA      8-Fuzzy sets
3×3                         86.16%     96.92%              75.16%     96.92%
5×5                         93.34%     97.42%              93.17%     96.67%
7×7                         94.44%     97.52%              94.44%     98.36%
11×11                       95.55%     98.47%              92.84%     98.36%
15×15                       94.57%     98.20%              92.59%     98.36%
19×19                       94.13%     96.78%              92.89%     96.94%
23×23                       91.93%     96.96%              92.51%     96.94%
The final results of the classification are shown in figure 8.15 where the effects of the processes
of region merging can be observed.
TABLE VIII.11 CONFUSION MATRIX FOR THE CLASSIFICATION OF MATERIALS THROUGH THE USE OF SPECTRAL-SPATIAL
INFORMATION FROM SPECTRAL FUZZY SETS OF 8 COMPONENTS AND A NEIGHBOURHOOD OF 11 X 11 AND REGION MERGING BASED ON MAXIMUM LIKELIHOOD

                                        Real Material
Classified        Aluminium  Copper   Brass    Lead     Stainless Steel  White Copper
Aluminium         97.61%     0.00%    0.03%    0.00%    0.00%            0.00%
Copper            0.00%      98.39%   0.00%    0.00%    0.00%            0.00%
Brass             0.00%      0.00%    97.51%   0.00%    0.00%            0.00%
Lead              0.00%      0.00%    0.00%    98.89%   0.00%            0.00%
Stainless Steel   2.39%      1.61%    2.45%    1.11%    98.41%           0.00%
White Copper      0.00%      0.00%    0.00%    0.00%    1.59%            100.00%
Average: 98.47%
Figure 8.15 Classification of several materials through the use of fuzzy sets, neighbourhood
histograms of size 11 x 11 pixels and subsequent reclassification based on the calculation of
the region of maximum likelihood.
Figure 8.15 shows the correct classification of the majority of regions. However, the effect
caused by the dispersion of stainless steel can still be seen and accounts for the remaining
erroneous classifications.
Despite this, excellent results are achieved in the classification of the materials, with rates
of over 98%, which validates the use of region merging criteria to improve the classification.
7. Conclusions
This chapter has verified, based on experimental results, the theoretical hypotheses proposed in
earlier chapters.
First, a computationally efficient methodology that subtracts the image background with great
precision has been established, allowing for an effective background segmentation.
Second, the lighting-correction technique that behaves most efficiently has been selected in
order to reduce the effects of lighting variation and the specularity inherent to these
materials. In this manner, the effects caused by shine, shadow areas and spectrum intensity are
greatly reduced, thus increasing the classification rate.
Furthermore, as mentioned in Chapters IV and V, we have verified that the decorrelation of the
spectrum based on the fuzzy sets proposed in Chapter V obtains better results than classical
techniques, such as those based on principal component analysis. Additionally, the use of these
fuzzy sets for spectral decorrelation avoids the problems of traditional methods: the need for
correct training, and the loss of representation, loss of discriminating power and chaotic
behaviour that arise when new, untrained materials are introduced into the system.
The statistical overlap of the analysed materials has been resolved by incorporating spatial
information into the spectral feature vector through the fuzzy neighbourhood histograms proposed
in Chapter VI, thus considerably increasing the separability of the different materials.
The results also show that combining spectral and spatial information simultaneously is
necessary. Using only spectral information does not obtain classification rates above 72%.
Likewise, using spatial information on the colour features of the image (RGB components) does
not reach 60%. However, the combined use of both obtains classification rates of over 86%.
Last, we have verified that the methodology of reclassification and merging of regions proposed
in Chapter VII reduces the number of erroneously classified regions based on the calculation of
region-unification cost criteria, obtaining a success rate greater than 98%. Figure 8.16 shows
in detail the final classification obtained for the data set presented in this document.
Figure 8.16 Classification of diverse materials through the use of fuzzy sets, neighbourhood
histograms of size 7 x 7 pixels and subsequent reclassification based on the calculation of
normalised region histograms.
This figure shows that the remaining erroneous classifications are due to the high degree of
dispersion in the stainless steel model. This fact highlights the need to work on the study and
development of a more advanced model that allows a more precise estimate of stainless steel.
In summary, this chapter has presented a large number of experiments demonstrating that the
independent use of spectral or spatial techniques does not yield an adequate model for the
classification of the materials under study, making the joint use of these techniques necessary.
Furthermore, the improvement in the precision of the system provided by the different
methodologies proposed in earlier chapters has been measured: the initial classification rate of
43.83% has increased to 98.47% through the use of the proposed methodologies.
Chapter IX
Conclusions, contributions and future work
This Thesis has dealt with different key aspects related to the characterisation, segmentation
and classification of hyperspectral images.
The studies undertaken in the different sections have led to specific conclusions for each of
the matters addressed. Based on the knowledge acquired in this research, this chapter sets out a
series of general conclusions that summarise the work undertaken in this Thesis.
The in-depth study of each of the themes that make up this Thesis, together with a review of the
state of the art and of the physical properties underlying the creation of hyperspectral images,
has made it possible to delimit and improve certain aspects where conventional techniques did
not produce the desired results. This has led to new approaches for tackling existing problems.
These new approaches, validated both theoretically and experimentally, have produced a series of
contributions that are listed in this chapter.
Additionally, during the development and evaluation of the different proposed solutions,
numerous areas of work have been opened that have not been analysed in depth in this Thesis,
mainly in order to avoid excessive dispersion in the themes dealt with. The work presented in
this Thesis is the starting point for new lines of research that shall be the object of further
study; these future works are listed in condensed form at the end of this chapter.
1. Conclusions
First, it has been verified that the proposed methodology leads to a dramatic improvement in
classification rates versus traditional techniques, with an improvement of over 100% in the
classification rate when these methods are used in the context of the classification of metallic
materials. It also reduces the error obtained by traditional techniques (greater than 56%) in a
considerable manner, to less than 2%.
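These figures can be checked against the rates reported in Chapter VIII (43.83% initially, 98.47% with the full methodology):

```python
initial, final = 43.83, 98.47          # classification rates from Chapter VIII
relative_gain = (final - initial) / initial * 100
error_before, error_after = 100 - initial, 100 - final
assert relative_gain > 100             # improvement of over 100%
assert error_before > 56               # initial error greater than 56%
assert error_after < 2                 # final error less than 2%
```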
This methodology not only classifies diverse materials, but also constitutes a theoretical
framework that integrates spectral and spatial features in a single mathematical descriptor for
the characterisation of elements and regions contained in hyperspectral images, irrespective of
their nature.
The use of this spectral-spatial descriptor has been verified to achieve classification rates
(87% without applying region merging techniques) that are much greater than those obtained
through spatial techniques based on RGB colour (60%) and those obtained through the isolated use
of the spectral information of a pixel without taking spatial information into account (56%).
The use of a bioinspired feature extraction method based on fuzzy sets (the hyperspectral eye)
takes advantage of the correlation between neighbouring bands to improve the compression and
extraction of spectral descriptors. This allows for a better spectral characterisation while
simultaneously avoiding the Hughes phenomenon. When applied to the classification of metals,
this technique obtains classification rates of 72% using only spectral information, versus 56%
for the raw spectrum and 68% for PCA and other classical decorrelation methods.
The use of Gaussian models for defining materials from the extracted descriptors behaves
efficiently and directly improves the classification. Furthermore, the simplicity and
mathematical properties of these Gaussian models have led to the development of a region
reclassification method that increases the precision of the system in a very efficient manner
(98%).
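A minimal sketch of such a Gaussian material model follows, assuming a Mahalanobis-distance decision rule over the descriptor space; the material names and synthetic data are illustrative only.

```python
import numpy as np

class GaussianMaterialModel:
    """Gaussian model of one material in descriptor space (illustrative)."""
    def __init__(self, samples):
        self.mean = samples.mean(axis=0)
        self.cov_inv = np.linalg.inv(np.cov(samples, rowvar=False))

    def mahalanobis(self, x):
        d = x - self.mean
        return float(np.sqrt(d @ self.cov_inv @ d))

def classify(x, models):
    """Assign the descriptor to the closest material model."""
    return min(models, key=lambda name: models[name].mahalanobis(x))

rng = np.random.default_rng(1)
models = {
    "copper": GaussianMaterialModel(rng.normal(0.0, 1.0, (200, 8))),
    "brass":  GaussianMaterialModel(rng.normal(5.0, 1.0, (200, 8))),
}
label = classify(np.full(8, 5.0), models)  # a clearly brass-like descriptor
```

The mean and inverse covariance summarise each material compactly, which is what makes the subsequent region reclassification mathematically simple.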
To allow an easier in-depth analysis of the results of this Thesis, the following paragraphs
summarise the conclusions reached in each of its chapters.
First, Chapter II lists the different types of existing images, including hyperspectral images, as
well as the physical properties that intervene in their creation.
On the other hand, Chapter III analyses the classical metrics used to quantify the dissimilarity
between two given spectra. This analysis shows that these metrics efficiently capture global
changes in the spectrum but often fail to capture the discriminating features between the
different classes. Because of this, Chapter IV analyses the different approaches for the
extraction of descriptors from high-dimensional data, together with state-of-the-art methods
that reduce the Hughes phenomenon while keeping the discriminating information between the
different classes; these methods model the local features that ease discrimination.
The analysed methods (PCA, LDA, ...) that correct the Hughes phenomenon most efficiently
simultaneously keep the discriminating power between the different classes. However, they also
have a set of associated disadvantages: the need for prior training and the distortion of the
physical meaning of the transformed variables make the analysis of the extracted descriptors
more difficult. Additionally, changes in the composition of the classes to be analysed, or the
addition of new classes to the system, change the optimal feature subspace, making it unsuitable
and requiring a retraining that causes uncontrollable modifications in the descriptors defined
by the previous subspace.
The limitations of these and other methods are set out in Chapter IV. Next, Chapter V presents
the requirements that the extracted descriptors must satisfy: reduction of the Hughes
phenomenon, universality and independence from the training data set, and a theoretical-physical
basis that upholds the discriminating power of the desired optimal descriptor.
Taking the previous requirements as a basis, Chapter V additionally proposes a methodology based
on the fuzzification of the spectrum. This method, bioinspired by the functioning of the cones
of the human visual system, takes advantage of the high correlation between adjacent bands of
the spectrum to define a feature extraction method based on the Energy of the spectral fuzzy
sets.
Chapter VIII then verifies through experimentation the better performance of this method versus
the feature extraction methods described in Chapter IV. It also confirms that the fuzzification
of the spectrum retains the associated spatial information more efficiently, due to its
similarity with the human visual system. Nonetheless, despite the better performance of the
proposed feature extraction method, the discriminating power of the spectral vector alone does
not achieve an adequate classification rate. For this reason, Chapter VI proposes a methodology
that combines the spectral and spatial features of an image in a single feature vector through
the use of fuzzy neighbourhood histograms and fuzzy region histograms.
Moreover, Chapter VI shows that this combination of spectral and spatial features in a single
vector provides greater separability for statistically overlapped classes, and that the best
classification results are provided by combining the method based on the Energy of the spectral
fuzzy sets with the fuzzy spatial histograms.
Chapter VII offers a global theoretical vision of the complete process that allows the optimal
classification of different materials from hyperspectral images. This chapter analyses the
dependence of material chromaticity on external factors of illumination and geometry, proposing
different options that reduce this dependence under non-ideal conditions. On the other hand, the
decorrelation techniques proposed in Chapters IV and V are integrated into the classification
process together with the methodologies for the integration of spatial and spectral features
proposed in Chapter VI. This integration makes it possible to define a model for each material
based on a Gaussian distribution, which not only evaluates the goodness of the different
proposed classification methods, but also creates a mathematically simple, easily analysed and
extremely robust model for the classification of materials.
Chapter VII also shows the usefulness of a subsequent re-classification based on the merging of
connected regions through the optimisation of a unification cost function. By analysing the
Gaussian distributions of the materials, this unifies physically connected regions that have
been erroneously assigned to different materials.
Finally, Chapter VIII experimentally verifies the methodologies proposed in earlier chapters and
validates the hypotheses made.
2. Contributions
This work actively contributes to the advance of the state of the art in the field of processing
and extraction of discriminant features from hyperspectral images. Specifically, four main
contributions have been made:
A) The definition of a generic framework that can classify based on the information contained in
hyperspectral images and which covers the whole process, integrating spectral and spatial
features simultaneously, optimising the extraction of the discriminating spectral features based
on a bioinspired model, and creating a region model that reclassifies erroneously classified
regions.
B) A classification system for metallic materials based on the previous generic framework, which
advances the state of the art in the classification of materials.
C) A spectral-spatial descriptor that captures the spectral and spatial features in a single
feature vector, thus characterising the content of hyperspectral images more effectively.
D) The definition and development of a system, bioinspired by the human eye, that processes a
spectrum through fuzzy logic techniques, allowing for a better extraction of the discriminating
features contained in that spectrum.
The following paragraphs provide a more exhaustive, detailed list of these contributions.
First, Chapters II and III offer a global vision of the principles of the generation of
hyperspectral images, a review of the difficulties involved in the classification of materials
based on them, and a detailed state of the art of the different classification techniques. The
main contributions are:
i. Conceptual review of the physical basis of the formation of hyperspectral images.
ii. In-depth study of the different problems associated with the classification of materials.
iii. Presentation of the classical methodology of classification for hyperspectral images.
iv. Review of the classical metrics for the differentiation of different spectra.
Chapter IV analyses the problems inherent to the extraction of discriminating features that define
the spectral vectors. The following contribution is provided:
v. Analysis of the problems associated with classical methods for feature extraction.
On the other hand, Chapter V provides an innovative method for the extraction of spectral
features, bioinspired by the human visual system:
vi. Listing of the properties that must be met by an optimal feature vector: dimensionality
reduction, discriminatory power, independence from the sample set, a basis in differentiating
physical properties, universality, generality and preservation of the physical meaning of the
variable.
vii. Description of the fuzzy spectrum concept, conceptually inspired by the human visual
system, as an ideal element for the extraction of spectral features for the separation of
materials, in conformity with the earlier contribution.
viii. Mathematical definition of the Energy of the fuzzy set as a characteristic measure that
allows for the ideal discrimination between different spectra.
ix. Expansion of the fuzzy set concept to the multi-frequency fuzzy set in order to permit the
extraction of discriminating information belonging to several frequencies.
In turn, Chapter VI is dedicated to the integration of spectral and spatial features, and offers the
following contributions:
x. Integration of spectral and spatial features in a single feature vector through the use of
spatial histograms.
xi. Definition of the quantisation of the fuzzy histogram, which reduces the problems associated
with the classical discretisation of histograms.
xii. Definition of fuzzy neighbourhood and region histograms, which characterise the
spectral-spatial information associated with a point and its neighbourhood, or with a region of
the image.
On the other hand, Chapter VII integrates the earlier methodologies into a complete framework
for the classification of materials, its main contributions being:
xiii. Global description of the complete process for the classification of materials.
xiv. Study of the influence of lighting on the classification process.
xv. Integration of contributions vii-xii within the process of classification of materials.
xvi. Mathematical description of the material model based on a Gaussian statistical model.
xvii. Description of a process for merging regions according to their Gaussian statistics, based
on the fuzzy region histogram.
Finally, the following are the contributions set out in Chapter VIII on the results obtained and the
description of the validation samples:
xviii. Experimental verification of the efficiency of the different proposed methodologies for
background subtraction.
xix. Experimental verification of the influence of the method of lighting correction in the
classification.
xx. Experimental verification of the inherent advantages of the use of fuzzy spectra for the
extraction of spectral features versus classical techniques, including a more precise
spatial representation.
xxi. Experimental verification of the improvement in the separability of different materials
that happens when integrating spectral and spatial features.
xxii. Experimental verification of the improvement in the classification caused by the use of
region merging techniques.
xxiii. Experimental verification of the validity of the proposed classification framework.
In view of the above, it is noteworthy that contributions i, ii, v, vii, viii, x-xii, xiii,
xv-xvii and xx-xxiii are pending acceptance in the scientific journal IEEE Transactions on Fuzzy
Systems, in an exhaustive article entitled "Spectral and Spatial Feature Integration for
Classification of Non-ferrous Materials in Hyper-spectral Data".
At the same time, the bioinspired feature extraction algorithm (the hyperspectral eye) has been
developed further; its results are pending acceptance in an article entitled "Bio-inspired Data
Decorrelation Methodology for Hyperspectral Imaging", submitted to Pattern Recognition Letters,
Elsevier.
Furthermore, both the methodology for bioinspired spectral feature extraction and the
methodology for the integration of spectral-spatial features have been the subject of European
patent applications ("Methodology for modeling electromagnetic spectra", EU Patent 08380314, and
"Methodology for integration of spectral and spatial features for material classification", EU
Patent 08380315).
3. Future work
The research presented here aims to provide contributions on a specific matter and can never be
considered finished, as every aspect of it is always capable of improvement. This section lists
the tasks that have not been undertaken within the scope of this Thesis and which are being
addressed now or in the near future.
At this time, within the European project SORMEN [SORM_06], the results of this Thesis are being
applied in a prototype for the classification of materials from waste electrical and electronic
equipment (WEEE). To this end, the conditions of the industrial process are being taken into
account, and the proposed classification algorithm is being transformed and adapted for use in
the recycling of these materials. Special emphasis is being placed on factors such as process
speed and algorithmic simplification, taking into account the particularities of this specific
application. This will give rise to an additional patent on the recycling system, as well as
several scientific publications dealing with, among other matters, the real-time classification
algorithm.
In any case, both the fuzzy set based feature extraction techniques and those used for the
integration of spectral and spatial features can be improved through multi-frequency or
multi-spatial approaches that include information from diverse frequencies or from different
neighbourhood sizes. This could lead to an additional increase in separability.
However, the two most promising fields opened by the results of this Thesis are the use of these
methodologies in image segmentation and their application to other areas of knowledge.
Thus, taking as a basis the Gaussian model that defines a spectral region, the aim is to develop advanced segmentation techniques based on active contours and snakes [LEE_05] that track the evolution of the region's Gaussian model in order to determine its correct segmentation in both two-dimensional and three-dimensional hyperspectral images.
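As a sketch of the kind of region statistic such an evolving contour could track (an assumed minimal realisation, not the segmentation algorithm itself), the code below fits a Gaussian model to the spectra inside a region and scores candidate pixels by their Mahalanobis distance to it; re-fitting the model as the contour deforms would supply the region term driving the snake. The regularisation term `eps` and the threshold value are illustrative assumptions.

```python
import numpy as np

class GaussianRegionModel:
    """Gaussian model of the spectra inside a region: mean vector and
    covariance matrix, with Mahalanobis distance as membership score."""

    def __init__(self, spectra, eps=1e-6):
        # spectra: (n_pixels, n_bands) array of spectra inside the region.
        self.mean = spectra.mean(axis=0)
        cov = np.cov(spectra, rowvar=False)
        # Small ridge keeps the covariance invertible for small regions.
        self.cov_inv = np.linalg.inv(cov + eps * np.eye(spectra.shape[1]))

    def mahalanobis(self, spectrum):
        d = spectrum - self.mean
        return float(np.sqrt(d @ self.cov_inv @ d))

    def belongs(self, spectrum, threshold=3.0):
        # A candidate pixel is attributed to the region when its distance
        # to the region's Gaussian falls below the threshold.
        return self.mahalanobis(spectrum) < threshold
```

In a contour-evolution loop, the model would be re-estimated each time pixels are added to or removed from the region, so the Gaussian tracks the region's spectral statistics as segmentation proceeds.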
Lastly, given the generality of the proposed methodology, the aim is to apply these segmentation and classification technologies within different projects to other fields, such as functional magnetic resonance imaging for the segmentation and classification of regions of interest, biological analysis, material identification, and the advanced segmentation and quality control of fruit.
References
[ANGE_99] E. Angelopoulou et al., "The Spectral Gradient: A Material Descriptor Invariant to Geometry and Incident Illumination", Proc. 7th IEEE Int. Conf. Computer Vision, pp. 861-867, 1999.
[ASTER_98] ASTER Spectral Library, California Institute of Technology. http://speclib.jpl.nasa.gov
[BAKE_98] Baker M.R, “Universal Approximation Theorem for Interval Neural Networks”, Reliable Computing,
Volume 4, Number 3, pp. 235-239(5), Springer, 1998.
[BAYE_76] B.E. Bayer, "Color imaging array", US Patent 3,971,065, July 20, 1976.
[BELL_61] R.E. Bellman, "Adaptive Control Processes", Princeton University Press, 1961.
[BELL_95] A. Bell and T. Sejnowski, “An Information-Maximization Approach to Blind Separation,” Neural
Computation, vol. 7, pp. 1,004-1,034, 1995.
[BERE_07] A. Bereciartua and J. Echazarra, "Sistema basado en identificación multiespectral para la separación de metales no férricos en WEEE en logística inversa", 1er Congreso de Logística y Gestión de la Cadena de Suministro, 2007.
[BERG_85] J.O. Berger, "Statistical Decision Theory and Bayesian Analysis", Second Edition, Springer Verlag, New York, ISBN 0-387-96098-8, 1985.
[BISH_06] Christopher M. Bishop, "Pattern Recognition and Machine Learning", Springer, 2006, ISBN-10: 0-387-31073-8.
[BISH_08] C. Bishop and I.T. Nabney, "Pattern Recognition and Machine Learning: A MATLAB Companion", Springer, 2008.
[BISH_95] C.M. Bishop, Neural Networks for Pattern Recognition. Oxford: Clarendon Press, 1995.
[BLINN_77] J.F. Blinn, "Models of Light Reflection for Computer Synthesized Pictures", Computer Graphics, vol. 11, no. 2, pp. 192-198, 1977.
[BURG_98] C.J.C. Burges, “A Tutorial on Support Vector Machines for Pattern Recognition,” Data Mining and
Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
[CE_02] Directive 2002/96/EC of the European Parliament and of the Council of 27 January 2003 on Waste Electrical
and Electronic Equipment (WEEE) - Joint declaration of the European Parliament, the Council and the Commission
relating to Article 9.
[CHAN_03] C.I. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification, Kluwer
Academic Publishers Group, ISBN:0-306-47483-5, 2003.
[CHAN_04] C.I. Chang, "Hyperspectral Imaging: Techniques for Spectral Detection and Classification", 2004, ISBN: 0-306-47483-2.
[CHENG_94] B. Cheng and D.M. Titterington, "Neural Networks: A Review from a Statistical Perspective", Statistical Science, vol. 9, no. 1, pp. 2-54, 1994.
[CHERI_03] A. Cheriyadat and L.M. Bruce, "Why principal component analysis is not an appropriate feature extraction method for hyperspectral data", IEEE Geoscience and Remote Sensing Symposium, IGARSS 2003.
[CLAR_90] Clark, R.N., A.J. Gallagher, and G.A. Swayze, Material Absorption Band Depth Mapping of Imaging
Spectrometer Data Using a Complete Band Shape Least-Squares Fit with Library Reference Spectra, Proceedings of
the Second Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) Workshop. JPL Publication 90-54, 176-186,
1990
[CLARK_93] Clark, R.N., G.A. Swayze, A.J. Gallagher, T.V.V. King, and W.M. Calvin, 1993, The U. S. Geological
Survey, Digital Spectral Library: Version 1: 0.2 to 3.0 microns, U.S. Geological Survey Open File Report 93-592, 1340
pages, http://speclab.cr.usgs.gov.
[CLARK_95] Clark, R.N. and Swayze, G.A., Mapping Minerals, Amorphous Materials, Environmental Materials,
Vegetation, Water, Ice and Snow, and Other Materials: The USGS Tricorder Algorithm. Summaries of the Fifth
Annual JPL Airborne Earth Science Workshop, January 23- 26, R.O. Green, Ed., JPL Publication 95-1, p. 39-40, 1995.
[COMO_94] P. Comon, "Independent Component Analysis, a New Concept?", Signal Processing, vol. 36, no. 3, pp. 287-314, 1994.
[COVE_67] T. Cover and P. Hart, "Nearest neighbor pattern classification", IEEE Transactions on Information Theory, vol. IT-13, pp. 21-27, 1967.
[COVE_74] T.M. Cover, “The Best Two Independent Measurements are not the Two Best”, IEEE Trans. Systems,
Man, and Cybernetics, vol. 4, pp. 116-117, 1974.
[COVE_77] T.M. Cover and J.M. Van Campenhout, “On the Possible Orderings in the Measurement Selection
Problem,” IEEE Trans. Systems, Man, and Cybernetics, vol. 7, no. 9, pp. 657-661, Sept. 1977.
[CUN_89] Y. Le Cun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, and L.D. Jackel,
“Backpropagation Applied to Handwritten Zip Code Recognition,” Neural Computation, vol. 1,pp. 541-551, 1989.
[DASA_91] Belur V. Dasarathy, “Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques”, 1991, ISBN
0-8186-8930-7.
[DEMP_77] A.P. Dempster et al., "Maximum likelihood from incomplete data via the EM algorithm", Journal of the Royal Statistical Society B, vol. 39, no. 1, pp. 1-38, 1977.
[DEVI_80] P. A. Devijver and J. Kittler, “On the edited nearest neighbor rule,” in Proc. 5th Int. Conf Pattern
Recogn., 1980, pp. 72-80.
[DEVI_82] P.A. Devijver and J. Kittler, "Pattern Recognition: A Statistical Approach", London, Prentice-Hall, 1982.
[DJOU_97] A. Djouadi and E. Bouktache, “A Fast Algorithm for the Nearest Neighbor Classifier,” IEEE Trans.
Pattern Analysis and Machine Intelligence, vol. 19, no. 3, pp. 277-282, 1997.
[DREY_13] J.L.E. Dreyer (Ed.), Tychonis Brahe Dani Opera Omnia (in Latin), Vols. 1-15, 1913-1929.
[DU_05] Du Peijun et al., "Error Analysis and Improvements of Spectral Angle Mapper (SAM) Model", MIPPR 2005: SAR and Multispectral Image Processing, Proc. of SPIE, vol. 6043, 60430L, 2005.
[DUDA_73] R. Duda and P. Hart, "Pattern Classification and Scene Analysis", John Wiley & Sons, ISBN 0-471-22361-1, 1973.
[EINS_1905] A. Einstein, "Über einen die Erzeugung und Verwandlung des Lichtes betreffenden heuristischen Gesichtspunkt" (On a Heuristic Viewpoint Concerning the Production and Transformation of Light), Annalen der Physik, vol. 17, pp. 132-148, 1905.
[WIKI_08] Wikipedia contributors, "Specular reflection", Wikipedia, The Free Encyclopedia, August 4, 2008, 11:27 UTC. Available at: http://en.wikipedia.org/w/index.php?title=Specular_reflection&oldid=229755522. Accessed August 13, 2008.
[FEATH_05] B.K. Feather, S.A. Fulkerson, J.H Jones, R.A. Reed, M. Simmons, D. Swann, W.E. Taylor, and L.S.
Bernstein, “Compression technique for plume hyperspectral images”, Algorithms and Technologies for Multispectral,
Hyperspectral and Ultraspectral Imagery XI, SPIE, 2005
[FLET_87] R. Fletcher, "Practical Methods of Optimization", Second ed., Wiley, 1987.
[FREU_96] Y. Freund and R. Schapire, “Experiments with a New Boosting Algorithm,” Proc. 13th Int'l Conf. Machine
Learning, pp. 148-156, 1996.
[FREU_98] Freund, Y. and Schapire, R. E. Large margin classification using the perceptron algorithm. In Proceedings
of the 11th Annual Conference on Computational Learning Theory, 1998.
[FRIE_87] J.H. Friedman, "Exploratory Projection Pursuit", J. Am. Statistical Assoc., vol. 82, pp. 249-266, 1987.
[FUKU_83] K. Fukushima, S. Miyake, and T. Ito, “Neocognitron: A Neural Network Model for a Mechanism of
Visual Pattern Recognition”, IEEE Trans. Systems, Man, and Cybernetics, vol. 13, pp. 826-834, 1983.
[FUKU_84] K. Fukunaga and J. M. Mantock, “Nonparametric data reduction,” IEEE Trans. Pattern Anal. Machine
Intell., vol. PAMI-6, pp. 115-118, Jan. 1984.
[FUKU_89] K. Fukunaga and R.R. Hayes, "The reduced Parzen classifier", IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-11, pp. 423-425, Apr. 1989.
[FUKU_90] K. Fukunaga, “Introduction to Statistical Pattern Recognition, 2nd ed”. New York Academic, 1990.
[GAMB_04] P. Gamba et al., "Exploiting spectral and spatial information in hyperspectral urban data with high resolution", IEEE Geoscience and Remote Sensing Letters, vol. 1, no. 4, pp. 322-326, 2004.
[GAMB_07] P. Gamba, A.J. Plaza, J.A. Benediktsson, and J. Chanussot, "European perspectives in hyperspectral data analysis", Recent Advances in Techniques for Hyperspectral Image Processing, Remote Sensing of Environment, 2007.
[GESI_06] A.J. Gesing, “ELVs: How they fit in the global material recycling system and with technologies developed
for production or recycling of other products and materials”, 6th International Automobile Recycling Congress,
Amsterdam, Netherlands, 2006
[GEUS_01] “Jal-Mark Geusebroek et Al.”, “Color Invariance”, IEEE Trans. Pattern Anal. Mach. Intell. Vol 23 Nº 12,
2001.
[GOME_02] Luis Gómez Chova, “Pattern Recognition Methods for Crop Classification from Hyperspectral Remote
Sensing Images”, Master Thesis, 2002.
[GONZ_08] R.C. Gonzalez and R.E. Woods, "Digital Image Processing", 3rd Edition, ISBN: 978-0-13-168728-8, Pearson, 2008.
[GORB_07] A. Gorban, B. Kegl, D. Wunsch, and A. Zinovyev (Eds.), Principal Manifolds for Data Visualization and
Dimension Reduction, LNCS 58, Springer, ISBN 978-3-540-73749-0, 2007.
[GRAH_07] H. Grahn and P. Geladi (Eds.), Techniques and Applications of Hyperspectral Image Analysis, Wiley,
ISBN-10: 0-470-01086-X, 2007
[GUNT_82] Wyszecki, Günther; Stiles, W.S. (1982). Color Science: Concepts and Methods, Quantitative Data and
Formulae, 2nd ed., New York: Wiley Series in Pure and Applied Optics. ISBN 0-471-02106-7.
[HART_68] P.E. Hart, "The condensed nearest neighbor rule", IEEE Trans. Inform. Theory, vol. IT-14, pp. 515-516, 1968.
[HAYK_99] S. Haykin, "Neural Networks: A Comprehensive Foundation", Second ed., Englewood Cliffs, N.J.: Prentice Hall, 1999.
[HEAL_99] G. Healey and D. Slater, "Models and methods for automated material identification in hyperspectral
imagery acquired under unknown illumination and atmospheric conditions”, IEEE Transactions on Geoscience and
Remote Sensing, vol. 37, no. 6, pp. 2706-2717, 1999.
[HERT_91] Hertz, J. A. Krogh and R.G. Palmer, “Introduction to the theory of Neural Computation”, Addison Wesley,
1991.
[HOLL_03] Michael Hollas, Modern Spectroscopy, 4th Edition, ISBN: 978-0-470-84416-8, 2003
[HOTT_33] H. Hotelling, "Analysis of a complex of statistical variables into principal components", Journal of Educational Psychology, vol. 24, pp. 417-441, 1933.
[HUGH_68] G.F. Hughes, "On the Mean Accuracy of Statistical Pattern Recognizers", IEEE Transactions on Information Theory, vol. 14, no. 1, pp. 55-63, 1968.
[JACO_91] R.A. Jacobs et al., "Adaptive Mixtures of Local Experts", Neural Computation, vol. 3, pp. 79-87, 1991.
[JAIN_00] Anil K. Jain et al., "Statistical Pattern Recognition: A Review", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, January 2000.
[JAIN_97] A.K. Jain and D. Zongker, “Feature Selection: Evaluation, Application, and Small Sample Performance,”
IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 2, pp. 153-158, Feb. 1997.
[KASH_86] S.K. Kachigan, "Statistical Analysis", Radius Press, New York, 1986.
[KEEN_07] Michael R. Keenan, “Multivariate Analysis of Spectral Images Composed of Count Data”, Techniques and
Applications of Hyperspectral Image Analysis: 2007, ISBN-10: 0-470-01086-X, 2007.
[KESH_04] N. Keshava, “Distance metrics and band selection in hyperspectral processing with application to material
classification and spectral libraries”, IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 7, pp. 1552-
1565, 2004.
[KLIN_90] Gudrun J. Klinker, Steven A. Shafer, and Takeo Kanade, "A Physical Approach to Color Image Understanding", International Journal of Computer Vision, vol. 4, pp. 7-38, 1990.
[KOHO_88] T. Kohonen, “Learning vector quantization,” Neural Net., vol. 1, pp. 303, 1988, Supplement 1
[KOHO_95] T. Kohonen, “Self-Organizing Maps. Springer Series in Information Sciences, vol. 30”, Berlin, 1995.
[KUAN_05] C.Y. Kuan, and G. Healey, “Band selection for recognition using moment invariants”, Algorithms and
Technologies for Multispectral, Hyperspectral and Ultraspectral Imagery XI, SPIE, 2005.
[KULL_87] S. Kullback (1987) The Kullback-Leibler distance, The American Statistician 41:340-341.
[KUTI_05] M. Kutila, J. Viitanen, and A. Vattulainen, “Scrap metal sorting with colour vision and inductive sensor
array”, Computational Intelligence for Modelling, Control and Automation, pp. 725-729, Vienna, Austria, 2005.
[KWON_99] H. Kwon, S.Z. Der, N.M. Nasrabadi, and H. Moon, “Use of hyperspectral imagery for material
classification in outdoor scenes”, SPIE Proceedings Series, Algorithms, Devices, and Systems for Optical Information
Processing III, vol. 3804, Denver, USA, pp. 104-115, 1999.
[LEE_05] C.P. Lee, W. Snyder, C. Wang, “Supervised Multispectral Image Segmentation using Active Contours”,
Proceedings of the 2005 IEEE International Conference on robotics and Automation, Barcelona, 2005.
[LEVR_96] L. Devroye et al., "A Probabilistic Theory of Pattern Recognition", New York: Springer, 1996.
[LOWE_91] D. Lowe and A.R. Webb, "Optimized Feature Extraction and the Bayes Decision in Feed-Forward Classifier Networks", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, no. 4, pp. 355-364, Apr. 1991.
[MAGN_99] Magnus, J.R. and H. Neudecker, “Matrix differential Calculus with Applications in Statistics and
Econometrics”. Wiley 1999.
[MALO_86] L.T. Maloney and B.A. Wandell, "A Computational Model of Color Constancy", Journal of the Optical Society of America A, vol. 3, no. 1, 1986.
[MANO_04] D. Manolakis, and D. Marden, “Dimensionality reduction of hyperspectral imaging data using local
principal component transforms”, Algorithms and Technologies for Multispectral, Hyperspectral and Ultraspectral
Imagery X, SPIE, 2004.
[MATA_94] J. Matas, R. Marik, and J. Kittler. Illumination invariant colour recognition. In E. Hancock, editor, British
Machine Vision Conference. BMVA Press, 1994.
[MCLA_00] McLachlan G.J. and D. Peel “Finite Mixture Models”. Wiley, 2000.
[MCLA_88] G.J. McLachlan and K.E. Basford, "Mixture Models: Inference and Applications to Clustering", Marcel Dekker, 1988.
[MCLA_97] McLachlan G.J. and T. Krishnan, “The EM algorithm and its extensions” Wiley, 1997.
[MERC_02] G. Mercier, and M. Lennon, “On the characterization of hyperspectral texture”, IEEE International
Geoscience and Remote Sensing Symposium (IGARSS '02), vol. 5, pp. 2584-2586, 2002.
[MINSK_69] M.L. Minsky and S.A. Papert, "Perceptrons", Cambridge, MA: MIT Press, 1969.
[MONT_05] R. Montoliu, F. Pla, and A.C. Klaren, "Illumination Intensity, Object Geometry and Highlights Invariance in Multispectral Imaging", IbPRIA05, pp. I:36, 2005.
[MORR_76] D.F. Morrison, "Multivariate Statistical Methods", 2nd ed., McGraw-Hill, New York, 1976.
[NARE_77] P.M. Narendra and K. Fukunaga, "A Branch and Bound Algorithm for Feature Subset Selection", IEEE Transactions on Computers, vol. 26, no. 9, pp. 917-922, 1977.
[NOCE_99] Nocedal J. and S.J. Wright. “Numerical Optimization”, Springer, 1999.
[OEHL_95] K.L. Oehler and R.M. Gray, "Combining Image Compression and Classification Using Vector Quantization", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, no. 5, pp. 461-473, 1995.
[PAI_07] Pai-Hui Hsu, "Feature extraction of hyperspectral images using wavelet and matching pursuit", ISPRS Journal of Photogrammetry and Remote Sensing, vol. 62, no. 2, pp. 78-92, June 2007.
[PAL_01] A. Pal and S.K. Pal. “Pattern Recognition: Evolution of Methodologies and Data Mining”, “Pattern
Recognition. From Classical to Modern Approaches” , World Scientific, 2001
[PARZ_62] Parzen E. “On estimation of a probability density function and mode”, Ann. Math. Stat. 33, pp. 1065-1076.
1962
[PEAR_1901] K. Pearson, "On lines and planes of closest fit to systems of points in space", Philosophical Magazine, vol. 2, pp. 559-572, 1901.
[PERK_05] S. Perkins, K. Edlund, D. Esch-Mosher, D. Eads, N. Harvey, and S. Brumby, “Genie Pro: Robust image
classification using shape, texture and spectral information”, Algorithms and Technologies for Multispectral,
Hyperspectral and Ultraspectral Imagery XI, SPIE, 2005.
[PLAZ_05] A. Plaza et al., "Dimensionality reduction and classification of hyperspectral image data using sequences of extended morphological transformations", IEEE Transactions on Geoscience and Remote Sensing, vol. 43, no. 3, pp. 466-479, 2005.
[PUDI_94] P. Pudil, J. Novovicova, and J. Kittler, “Floating Search Methods in Feature Selection” Pattern Recognition
Letters, vol. 15, no. 11, pp. 1,119-1,125, 1994.
[RAJP_03] K.M. Rajpoot, and N.M. Rajpoot, “Wavelet based segmentation of hyperspectral colon tissue imagery”, 7th
International Multi Topic Conference (INMIC 2003), pp. 38-43, Islamabad, Pakistan, 2003.
[RAMA_05] B. Ramakrishna, J. Wang, C. Chang, A. Plaza, H. Ren, C.C. Chang, J.L. Jensen, and J.O. Jensen,
“Spectral/spatial hyperspectral image compression in conjunction with virtual dimensionality”, Algorithms and
Technologies for Multispectral, Hyperspectral and Ultraspectral Imagery XI, SPIE, 2005.
[RAUD_98] S. Raudys, "Evolution and Generalization of a Single Neuron: Single-Layer Perceptron as Seven Statistical Classifiers", Neural Networks, vol. 11, no. 2, pp. 283-296, 1998.
[RELL_02] G. Rellier, X. Descombes, J. Zerubia, and F. Falzon, “A Gauss-Markov Model for hyperspectral texture
analysis of urban areas”, 16th International Conference on Pattern Recognition (ICPR’02), vol. 1, pp. 692-695 2002.
[RICH_01] Austin Richards, "Alien Vision: Exploring the Electromagnetic Spectrum with Imaging Technology", SPIE, The International Society for Optical Engineers, 2001.
[RIPL_96] B. Ripley, Pattern Recognition and Neural Networks. Cambridge, Mass.: Cambridge Univ. Press, 1996.
[ROHD_97] Robert A. Rohde. Image originally created for Global Warming Art. GNU Licence.
[ROJA_96] R. Rojas, "Neural Networks", Springer-Verlag, Berlin, 1996.
[ROSE_58] Rosenblatt, Frank, The Perceptron: A Probabilistic Model for Information Storage and Organization in the
Brain, Cornell Aeronautical Laboratory, Psychological Review, v65, No. 6, pp. 386-408. 1958.
[RUME_86] D.E. Rumelhart et al., "Learning representations by back-propagating errors", Nature, vol. 323, pp. 533-536, 1986.
[SANG_98] Stephen J. Sangwine, Robin E. N. Horne, The Colour Image Processing Handbook, ISBN 0412806207
Springer 1998.
[SCHA_90] R.E. Schapire, “The Strength of Weak Learnability,” Machine Learning, vol. 5, pp. 197-227, 1990.
[SCHO_97] B. Schölkopf, "Support Vector Learning", Ph.D. thesis, Technische Universität Berlin, 1997.
[SCHO_98] B. Schölkopf, A. Smola, and K.R. Müller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem", Neural Computation, vol. 10, no. 5, pp. 1299-1319, 1998.
[SHAF_84] S.A. Shafer, "Using color to separate reflection components", J. Opt. Soc. Am. A, vol. 1, p. 1248, 1984.
[SHAF_85] S. Shafer, “Using color to separate reflection components”, Color Research and Applications, vol. 10, pp.
210-218, 1985
[SHAK_05] G. Shakhnarovich, T. Darrell, and P. Indyk (Eds.), "Nearest-Neighbor Methods in Learning and Vision", The MIT Press, ISBN 0-262-19547-X, 2005.
[SLAT_99] D. Slater, and G. Healey, “Material classification for 3D objects in aerial hyperspectral images”, IEEE
Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’99), vol. 2, pp. 2262-2267, 1999.
[SNYM_05] Jan A. Snyman, "Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms", Springer Publishing, 2005.
[SOME_04] P. Somervuo and T. Kohonen, "Self-Organizing Maps and Learning Vector Quantization for Feature Sequences", 2004.
[SOMM_06] E.J. Sommer, C.E. Ross, and D.B. Spencer, Method and apparatus for sorting materials according to
relative composition, US Patent 7,099,433, 2006.
[SORM_06] SORMEN - Innovative Separation Method for Non Ferrous Metal Waste from Electric and Electronic
Equipment (WEEE) based on Multi- and Hyperspectral Identification project, Sixth Framework Programme Horizontal
Research Activities Involving SMES Co-Operative Research, 2006, http://www.sormen.org/
[SPEC_08] Specim Spectral Imaging Ltd. http://www.specim.fi/.
[SPEN_05] D.B. Spencer, “The high-speed identification and sorting of nonferrous scrap”, JOM Journal of the
Minerals, Metals and Materials Society, vol. 57, no. 4, pp. 46-51, 2005.
[STOC_99] Harro Stockman and Theo Gevers, "Detection and Classification of Hyper-Spectral Edges", Proc. 10th British Machine Vision Conf., pp. 643-651, 1999.
[TAN_04] Robby T. Tan, Ko Nishino, and Katsushi Ikeuchi, "Separating Reflection Components Based on Chromaticity and Noise Analysis", IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 10, 2004.
[TATZ_05] P. Tatzer, M. Wolf, and T. Panner, “Industrial application for inline material sorting using hyperspectral
imaging in the NIR range”, Real-Time Imaging, vol. 11, no. 2, Spectral Imaging II, pp. 99-107, 2005
[TOMI_94] Shoji Tominaga, "Dichromatic reflection models for a variety of materials", Color Research & Application, vol. 19, no. 5, pp. 277-285, 1994.
[TRES_95] V. Tresp and M. Taniguchi, “Combining Estimators Using Non-Constant Weighting Functions,” Advances
in Neural Information Processing Systems, MIT Press, 1995.
[TSO_04] B. Tso and R.C. Olsen, "Scene Classification Using Combined Spectral, Textural and Contextual Information", SPIE, ATMHUI X, 2004.
[VAPN_06] Vladimir Vapnik and S. Kotz, "Estimation of Dependences Based on Empirical Data", Springer, 2006.
[VAPN_98] V.N. Vapnik, Statistical Learning Theory. New York: John Wiley & Sons, 1998.
[WAHA_06] D.A. Wahab, A. Hussain, E. Scavino, M. Mustafa, and H. Basri, “Development of a prototype automated
sorting system for plastic recycling”, American Journal of Applied Sciences, vol. 3, no. 7, pp. 1924-1928, 2006.
[WANG_06] J. Wang, and C.I. Chang, “Independent component analysis-based dimensionality reduction with
applications in hyperspectral image analysis”, IEEE Transactions on Geoscience and Remote Sensing, vol. 44, no. 6,
pp. 1586-1600, 2006.
[WILL_04] C. Willis, “Hyperspectral image classification with limited training data samples using feature subspaces”,
Algorithms and Technologies for Multispectral, Hyperspectral and Ultraspectral Imagery X, SPIE, 2004.
[WILS_04] David B. Wilson, "Red-Green-Blue model", Phys. Rev. E, vol. 69, no. 3, 2004.
[WIND_07] Willem Windig et al., "Self-modeling Image Analysis with SIMPLISMA", in "Techniques and Applications of Hyperspectral Image Analysis", John Wiley & Sons, 2007.
[XIE_93] Q.B. Xie, C.A. Laszlo, and R.K. Ward, "Vector Quantization Technique for Nonparametric Classifier Design", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 12, pp. 1326-1330, 1993.
[YOSH_00] Yoshi Ohno, "CIE Fundamentals for Color Measurements", IS&T NIP16 Conference, Vancouver, Canada, pp. 540-545, Oct. 16-20, 2000.
[YUHA_92] Yuhas, R.H., Goetz, A. F. H., and Boardman, J. W., 1992, Discrimination between semi-arid landscape
endmembers using the spectral angle mapper (SAM) algorithm. In Summaries of the Third Annual JPL Airborne
Geoscience Workshop, JPL Publication 92-14, vol. 1, pp. 147-149.
[ZADE_65] L.A. Zadeh, "Fuzzy sets", Information and Control, vol. 8, pp. 338-353, 1965.