DOCTORAL THESIS
Classification of materials through the integration of spectral and spatial features
from hyperspectral data.
Submitted by:
Mr. Artzai Picón Ruiz
In fulfillment of the degree of Doctor granted by the University of the Basque Country.
Directed by:
Dr. Pedro Mª Iriondo Bengoa.
Bilbao, 2008.
Acknowledgments:

The compilation and drafting of a Doctoral Thesis is a personal journey that begins with great joy and uncertainty, is not free of difficulties, and cannot be undertaken alone. Although not all of the people who have actively or passively contributed to the development of this Thesis can be named in this section, I would like to express my gratitude to those who have been key to its completion.

First, I would like to thank the Director of this Thesis, Doctor Pedro Mª Iriondo Bengoa, not only for all his support and help in its drafting, but also for his friendship and for having been, together with my school Physics teacher, Bro. Mariano Gutiérrez, and my father, the driving force that instilled in me a preference for science and research.

Second, I would like to express my gratitude to the members of CIPA (Centre for Image Processing and Analysis) of Dublin City University, especially to its Chair, Professor Paul F. Whelan, who allowed me to undertake a one-year research stay in the centre, and to Doctor Ovidiu Ghita for all his invaluable help.

I would also like to acknowledge the considerable support and commitment of Tecnalia Corporación Tecnológica, and specifically the director of the Tecnalia-Infotech unit, Mrs. Ana Ayerbe Fernández-Cuesta, and its technical director, Mrs. Silvia Rentería, for all the support, both logistical and personal, provided during the undertaking of this Thesis.

At the same time, I wish to thank the members of the consortium of the European project SORMEN, in which I participate as a researcher, for granting me the right to use the hyperspectral images employed in the experimental validation of the concepts and methodologies presented in this Thesis. I also want to thank the ETORTEK programme of the Basque Government for the funding granted for the research stay at CIPA and for the drafting of this Thesis.

Additionally, I would like to thank my departmental colleagues (Tecnalia-Infotech), as well as my colleagues in Dublin, for the backing and support provided during the drafting of this Thesis.

Last, I would like to thank my family and friends, especially my parents and brother, for all their support throughout my life and for having made me who I am, and Sonia, for her considerable support and understanding during the final stages of this Thesis, and because beside her, problems seem smaller.
Abstract - Greyscale images and those based on the standard RGB colour representation capture diverse properties of the objects or materials they contain, making their visual separation possible through image processing techniques. However, these materials sometimes bear such similarities in appearance, shape and/or colour that their visual classification becomes unfeasible. In contrast, hyperspectral images provide extensive information about the luminous spectrum reflected by each element of the image. This characterises their molecular properties and allows more elaborate models to be defined that provide greater precision in the classification. Despite these advantages, small variations in chemical composition and/or the high variability between materials belonging to the same class sometimes make it impossible to obtain a robust classification from spectral features used in a simplistic manner.

To address this problem, this work sets out a methodology which allows, in the first place, the optimal reduction of the high spectral dimensionality through the construction of spectral fuzzy sets bioinspired by the functioning of the cones of the human visual system. These fuzzy sets minimise the redundant information between adjacent bands of the spectrum while maximising its discriminating power, in a similar manner to a "multispectral eye". Additionally, the spectral and spatial features of the elements of the image are integrated, yielding a combined descriptor that characterises the properties of the elements contained in the image more precisely. The theoretical classification model has been validated using samples of materials for recycling from waste electrical and electronic equipment (WEEE). The results show an increase in the classification rate from 44% using only colour information, and 56% using spectral information through classical methods, up to 98% through the extraction and integration of the proposed spectral-spatial features.
Key Words - Hyperspectral image processing, image segmentation, image classification,
integration of spectral-spatial data, image processing, classification of materials, recycling,
bioinspired systems.
Index
Chapter I: Introduction
1. Aim of the Thesis
2. Content of the Thesis
Chapter II: Hyperspectral images for the classification of materials
1. Colour Theory
2. Acquisition and representation of hyperspectral images
3. Issues on the application of hyperspectral images for the classification of materials
4. Conclusion
Chapter III: Classification methods
1. Classical metrics
2. Classifiers
2.1. Similarity classifiers
2.1.1. Nearest Neighbour 1-NN
2.1.2. Nearest mean
2.1.3. Vector Quantization VQ
2.2. Statistical classifiers
2.2.1. Probability Theory
2.2.2. Parametric methods
2.2.2.1. Gaussian distribution
2.2.2.2. Estimation of the parameters of a Gaussian distribution from N observations
2.2.2.3. Application of Bayes' classifier to Gaussian distributions
2.2.2.4. Gaussian mixture
2.2.2.4.1. Estimation of the parameters of a Gaussian mixture distribution from N observations
2.2.3. Non-parametric methods
2.2.3.1. Partition models on histograms
2.2.3.2. K-nearest neighbour
2.3. Classifiers based on the calculation of decision boundaries
2.3.1. Perceptron
2.3.2. Multilayer perceptron
2.4. Combination of classifiers
3. Conclusions
Chapter IV: Feature extraction in hyperspectral vectors
1. Feature extraction
2. Feature selection
2.1. Automatic feature selection
2.2. Selection and extraction of known discriminant features
3. Conclusions
Chapter V: Extraction of spectral features based on fuzzy sets bioinspired by the human visual system
1. Definition of fuzzy sets
2. Spectral fuzzy sets
3. Multi-frequency spectral fuzzy sets
4. Conclusions
Chapter VI: Integration of spectral and spatial features
1. Fuzzy spatial histograms
1.1. Improvement in the quantization of the histogram
1.2. Definition of the fuzzy neighbourhood histogram
1.3. Definition of the fuzzy region histogram
2. Extension of the fuzzy spatial histograms to vectorial features. Spectral-spatial histograms
2.1. Quantization of the feature vector
2.2. Definition of fuzzy vectorial histograms
3. Conclusions
Chapter VII: Classification of spectral images and region analysis
1. General description of the classification process
2. Classification of hyperspectral images
2.1. Image acquisition and lighting correction
2.2. Independence from the lighting source
2.2.1. Independence from the geometric coefficient
2.3. Decorrelation of the luminous spectrum
2.3.1. RGB
2.3.2. RAW
2.3.3. Principal Component Analysis (PCA)
2.3.4. Fisher's linear discriminant
2.3.5. Spectral fuzzy sets
2.4. Integration of spectral-spatial features
2.5. Classification procedure
3. Analysis and merging of regions
3.1. Region of maximum likelihood
3.2. Normalised region histogram
4. Conclusions
Chapter VIII: Results
1. Description of the data sample
2. Background identification
3. Influence of lighting correction on the classification of materials
4. Decorrelation of the luminous spectrum
5. Integration of spectral and spatial features
6. Methods of region merging
7. Conclusions
Chapter IX: Conclusions, contributions and future work
1. Conclusions
2. Contributions
3. Future work
References
Chapter I
Introduction
Nowadays, sustainable development has become one of the most important goals of modern societies. Present-day society fabricates, uses and discards a great quantity of materials in their most diverse forms and varieties, thus generating great quantities of waste. This waste is often not biodegradable and can even be highly toxic. Despite institutional efforts, much of this waste cannot be efficiently sorted and, as a consequence, is deposited in traditional landfills without undergoing any type of recycling. The scientific community therefore has the task of providing solutions and alternatives that allow its correct sorting.

Within the great variety of existing waste, waste from electrical and electronic equipment (WEEE) deserves particular mention. This waste, which comes from very varied products, is made up of a great variety of materials with great potential for recycling. This has made the recovery of WEEE one of the most complex industrial tasks.

In recent years, environmental legislation relating to the recycling of WEEE has made finding new solutions for the recovery of these materials an obligation. For example, the Directive of the European Commission on the recycling of waste electrical and electronic equipment (WEEE) [CE_02] establishes that Member States shall recover between 70-80% of the weight of the waste produced and shall re-use between 50-70% of the recovered materials and components. This regulation underscores the need to devote greater efforts to the development of new techniques and technologies capable of improving the performance of the methods applied to waste sorting.

In order to achieve these objectives, the development of automated systems for the recycling of electrical and electronic waste becomes an economical and efficient alternative.

Specifically, the current WEEE recycling process subjects this electronic waste to crushing and to both mechanical and densimetric sorting. However, the fractions resulting from this separation still contain a mix of non-ferrous materials (for example, aluminium, copper, zinc, brass or lead) and austenitic stainless steel, which represents 13% of the total scrap from WEEE. It is important to highlight that this mix cannot be sorted using
current recycling methods [SORM_06] [BERE_07]. Moreover, the toxicity of some of these materials makes finding solutions for their correct sorting and recycling even more critical.
Figure 1. Set of non-ferrous materials from electronic waste.
The methods traditionally employed for sorting these materials rely on visual inspection by highly skilled operators [SPEN_05]. Building on these methods, Kutila et al. [KUTI_05] developed a colour-based inspection system applied to separating predominantly reddish metals from shinier metals, such as aluminium and zinc. The results indicated that materials with reddish properties could be separated from shiny materials; however, inadequate results were obtained when trying to sort materials within the same colour group.

Methods based on X-rays, on the other hand, have been widely used for sorting metallic scrap and separating plastics [SOMM_06], [WAHA_06]. However, they are not suited to sorting materials with similar properties because they only measure the density of the material [SPEN_05].

In fact, the only methods that have achieved greater efficiency in the sorting of metals are those based on detecting and analysing the spectrum emitted under thermal excitation. However, these approaches cannot be used in recycling processes, owing both to the slow spectrum-acquisition process and to the technical difficulty of exciting each and every fraction of scrap to be classified and sorted.
For their part, modern imaging spectrometers provide high-resolution images with detailed spectral information. This technology involves the acquisition and interpretation of multidimensional
digital images. Current systems are capable of acquiring multiple bands, from the ultraviolet to the very long-wave infrared, with good spectral resolution between bands. This versatility allows spectral imaging systems to be applied to the detection and sorting of several classes of materials, both natural and man-made, such as minerals, metals, plastics and vegetation, and even to tasks such as cell differentiation [WABA_06, CHANG_03, TSO_04, SPEC_08].
The main characteristic of hyperspectral images is that each pixel is defined by a vector whose elements correspond to the different spectral components (wavelengths) acquired from the scene. In this manner, the hyperspectral vector provides not only colour information associated with the scene, but also information related to the molecular behaviour of the materials present in it [CHANG_03, GRAH_07, SLAT_99, HEAL_99], thus facilitating their sorting.
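This pixel-as-vector structure can be pictured as a three-dimensional data cube. The short sketch below (Python with NumPy, using synthetic values rather than a real spectrometer capture) illustrates how each pixel of such a cube is a full spectral vector:

```python
import numpy as np

# A hyperspectral image can be stored as a 3-D array ("cube"):
# two spatial axes (rows, cols) plus one spectral axis (bands).
# The values here are synthetic; a real cube would come from an
# imaging spectrometer.
rows, cols, bands = 4, 5, 64
cube = np.random.default_rng(0).random((rows, cols, bands))

# Each pixel is a spectral vector: its reflectance at every
# acquired wavelength.
pixel_spectrum = cube[2, 3, :]

# An RGB image would keep only 3 values per pixel; the full
# 64-element vector also carries spectral structure that helps
# distinguish materials of similar colour.
assert pixel_spectrum.shape == (bands,)
```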
However, the high variability of these materials can render even spectral information insufficient for robust identification and classification. This makes it necessary to develop new methods that correctly characterise these materials so that they can be properly recycled.
1. Aim of the Thesis
To resolve the problem described above, one of the main aims of this Thesis is to establish a complete framework for the classification of materials which bear great similarity and variability. These properties mean that traditional colour-based methods are incapable of correctly classifying such materials, and even the use of spectral information for this task does not achieve the desired results.

Nonetheless, the approach to this problem is not focused solely on the analysis of the spectral properties of materials and their optimal classification, but also on the development of a generic theoretical framework that integrates the spectral and spatial features contained in hyperspectral images into a single mathematical descriptor.

By integrating the spectral and spatial properties of each element of an image into a single descriptor (feature vector), the aim is to improve the efficiency of the characterisation, segmentation and classification of these images compared with the isolated use of either set of properties.
In order to achieve an optimal spectral-spatial integration, the different existing spectral decorrelation techniques are studied in depth, thus providing an appropriate solution to the problem of feature extraction.

Additionally, using the generic spectral-spatial descriptor as a starting point, the aim is to create an adequate mathematical model that will detect and separate different materials while taking their inherent variability into account.

Finally, the aim is to integrate the above models into a complete framework that allows the acquisition, processing, detection and classification of materials in a robust and computationally efficient manner, in order to obtain a modular algorithm compatible with the constraints of an industrial application (robustness, tuning cost, speed). In this manner, the resulting algorithm can be integrated in a simple manner into industrial sorting processes.
In parallel with this Thesis, the European project SORMEN (Innovative Separation Method for Non-Ferrous Metal Waste from Electric and Electronic Equipment (WEEE) based on Multi- and Hyper-spectral Identification) [SORM_06] is developing an industrial system for the sorting of electronic waste (WEEE). Given the great difficulty of classifying these elements, the validation of the methodologies proposed in this Thesis is carried out through the classification of sets of electronic waste provided by recycling companies belonging to the project consortium.
2. Content of the Thesis
This Thesis is divided into nine chapters which detail the different aspects dealt with in this work.

After the introductory chapter, Chapter II describes the physical fundamentals and the methodology employed for obtaining hyperspectral images, highlighting the advantages of this type of image for the classification of materials over traditional colour techniques. This chapter also details the traditional classification procedure based on hyperspectral images and lists some of the limitations of current methods which must be improved.

Chapter III sets out the state of the art of existing classification techniques. First, the classical metrics used to quantify the differences between two given spectra are presented; next, the different classification techniques used to build robust classification models are described.
Chapter IV sets out the problems involved in obtaining an adequate classifier given the high dimensionality associated with luminous spectra, and describes the Hughes phenomenon caused by this high dimensionality. To reduce its effect, current feature-reduction techniques are analysed, both those that select and those that extract the features which allow an adequate reduction of data dimensionality while maintaining its discriminating power, and their limitations are studied.
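As an illustration of the kind of feature-extraction technique surveyed in that chapter, the sketch below applies principal component analysis (PCA) to reduce spectral dimensionality. This is a minimal Python/NumPy sketch of the classical method only, on synthetic data, not the approach proposed in this Thesis:

```python
import numpy as np

def pca_reduce(spectra: np.ndarray, n_components: int) -> np.ndarray:
    """Project spectra (n_samples, n_bands) onto their first principal
    components: a classical way to reduce the spectral dimensionality
    that gives rise to the Hughes phenomenon."""
    centred = spectra - spectra.mean(axis=0)
    # Eigen-decomposition of the band covariance matrix.
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]        # largest variance first
    basis = eigvecs[:, order[:n_components]]
    return centred @ basis

# Example: 200 synthetic spectra of 64 bands reduced to 8 features each.
rng = np.random.default_rng(3)
spectra = rng.random((200, 64))
reduced = pca_reduce(spectra, 8)
assert reduced.shape == (200, 8)
```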
Chapter V, in turn, looks at the limitations of current feature-extraction methods. It lists the desirable properties that optimal feature vectors should satisfy in order to build an adequate descriptor of the luminous spectrum. Based on these properties, a novel method is proposed for the extraction of spectral features through the fuzzification of the spectrum (its division into fuzzy sets) and the use of the Energy associated with each of the fuzzy sets defined in it. This approach extracts visual information in a manner similar to the human visual system.
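The idea can be sketched as follows. Evenly spaced triangular fuzzy sets stand in here for the bioinspired sets of Chapter V (which are modelled on the cone responses, not on the uniform spacing assumed below), and each set contributes one membership-weighted energy feature:

```python
import numpy as np

def triangular_memberships(n_bands: int, n_sets: int) -> np.ndarray:
    """Evenly spaced triangular fuzzy sets over the band axis.
    Returns an (n_sets, n_bands) matrix; row k gives the membership
    degree of each band in fuzzy set k. Illustrative only."""
    centres = np.linspace(0, n_bands - 1, n_sets)
    width = centres[1] - centres[0]
    band_idx = np.arange(n_bands)
    mu = 1.0 - np.abs(band_idx[None, :] - centres[:, None]) / width
    return np.clip(mu, 0.0, 1.0)

def fuzzy_spectral_features(spectrum: np.ndarray, mu: np.ndarray) -> np.ndarray:
    """Reduce a spectrum to one 'energy' value per fuzzy set: the
    membership-weighted energy of the spectrum under each set."""
    weighted = mu * spectrum[None, :] ** 2        # weight squared signal
    return weighted.sum(axis=1) / mu.sum(axis=1)  # normalise per set

# Example: a synthetic 64-band spectrum collapsed to 6 features.
rng = np.random.default_rng(1)
spectrum = rng.random(64)
mu = triangular_memberships(64, 6)
features = fuzzy_spectral_features(spectrum, mu)
assert features.shape == (6,)
```

Because adjacent triangles overlap, each feature pools information from neighbouring, highly correlated bands, which is the redundancy-reduction effect the chapter describes.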
In turn, Chapter VI proposes and theoretically describes a methodology that unifies spectral properties (the luminous spectrum) and spatial properties (the distribution of neighbouring features) in a single feature vector.
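A simplified, hard-histogram version of such a unified descriptor can be sketched as follows (the Thesis uses fuzzy histograms; here the per-pixel spectral features and quantization labels are synthetic, and the window size and bin count are arbitrary illustrative choices):

```python
import numpy as np

def spectral_spatial_descriptor(feat_img: np.ndarray, labels: np.ndarray,
                                r: int, c: int, win: int, n_bins: int) -> np.ndarray:
    """Single descriptor for pixel (r, c): its own spectral feature
    vector concatenated with the histogram of quantized features in a
    (2*win+1)^2 neighbourhood. Illustrative: a hard histogram, not the
    fuzzy histograms developed in Chapter VI."""
    h, w, _ = feat_img.shape
    r0, r1 = max(0, r - win), min(h, r + win + 1)
    c0, c1 = max(0, c - win), min(w, c + win + 1)
    neigh = labels[r0:r1, c0:c1].ravel()
    hist = np.bincount(neigh, minlength=n_bins).astype(float)
    hist /= hist.sum()                       # normalise the spatial part
    return np.concatenate([feat_img[r, c], hist])

# Example: 6 spectral features per pixel, quantized into 8 clusters.
rng = np.random.default_rng(2)
feat_img = rng.random((10, 10, 6))
labels = rng.integers(0, 8, size=(10, 10))   # e.g. from clustering the features
desc = spectral_spatial_descriptor(feat_img, labels, r=5, c=5, win=2, n_bins=8)
assert desc.shape == (6 + 8,)
```

The concatenation makes a pixel's descriptor depend both on what the pixel is (spectral part) and on what surrounds it (spatial part), which is the integration the chapter formalises.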
Chapter VII provides a complete algorithm that allows the optimum classification of materials
through the use of hyperspectral images while taking into account the proposed methodologies in
Chapter V and VI. At the same time, this chapter defines the statistical model that permits for the
characterisation of each of the elements to be classified. Additionally, methodologies that reduce
the dependence on the chromaticity of the material versus external factors of lighting and
geometry are integrated in this algorithm, proposing diverse options that reduce this dependence
under non-ideal conditions and that, in turn, include a process that is capable of unifying those
erroneously classified regions.
Chapter VIII then provides detailed results on the performance of the different proposed
methodologies, comparing the methods proposed in this Thesis with those of the state of the art.
The experimental verification is undertaken on a test set composed of electrical and electronic
waste that was previously selected and validated by the enterprises participating in the SORMEN
project [SORM_06].
Finally, Chapter IX contains the conclusions of the completed work, providing a summarised
listing of the main contributions and setting out the lines of future work that follow from this
Thesis.
Chapter II
Hyperspectral images for the classification of materials
Sight is one of the most developed senses in human beings. Both humans and animals are capable
of interpreting their environment through the stimuli they perceive, having developed a great
capacity for visual interpretation that allows them to cope freely in the environment that
surrounds them. Of all the senses available to the human being, sight is considered the most
important, allowing the most precise interpretation of one's surroundings, as has been set out by
great geniuses throughout time:
"In adequate circumstances and at the appropriate distance, the eye is tricked less than any other sense, as, I will later
show, it sees through straight lines that form a pyramid whose vertex points at the eye and whose base rests on the object
being watched. Hearing, on the other hand, is often tricked regarding the place and distance of the source of the sound,
as these do not reach it through straight lines, as those of the eyes do, but rather through broken and tortuous waves;
thus, often, some distant voices seem nearer than closer ones, owing to their trajectory; only the echo travels in a
straight line. With even greater difficulty does the smell find the source of a perfume. Taste and touch, for their part,
must touch an object in order to know it."
Leonardo Da Vinci, Treatise on Painting, 1651.
However, one becomes accustomed to visualising the world in the manner in which one does,
without being fully aware of the inherent limitations of the human eye. For example, one cannot
see objects below a certain scale without the use of microscopes or lenses. Additionally, one
cannot process visual information at high speeds.
These are not the only limitations of the human eye. When a ray of sunlight passes through a
prism, it is divided into a rainbow pattern that begins in violet and varies progressively towards
red. At the boundaries of this rainbow, light seems to disappear and fall into darkness. This
darkness is, in fact, only apparent, as there is a luminous emission beyond the boundaries of the
visible spectrum even though the human eye is not capable of detecting it. Although the human
being can only receive visual stimuli in a range of wavelengths between 400 and 700 nanometres,
other animals are capable of sight in the ultraviolet region of the spectrum, thus perceiving the
environment in a manner different to that of the human being.
This can be explained by the properties of light as an electromagnetic wave. Figure 2.1 shows
that solar irradiance as a function of wavelength is at its maximum in the region between 400 and
700 nm, which corresponds to the wavelengths detected by the human eye. This adaptation to the
dominant solar frequencies allows human beings and animals to cope comfortably and efficiently
in their environment.
Fig. 2.1 Spectrum of solar radiation.
However, the human eye is not capable of perceiving wavelengths outside this visible band.
Nonetheless, these non-visible wavelengths provide a great deal of information that would allow
a more detailed view of the environment and of the objects that surround us. High frequencies,
such as X-rays or gamma rays, provide information about the internal structure of different
objects, while infrared frequencies provide information on the molecular interactions that exist in
a specific material, or on the temperature and/or heat flux of a specific object. In the same
manner, many materials that have a similar appearance in the visible range can be distinguished
in other ranges of the spectrum because their luminous properties there are totally different.
Examples of this can be found in the book Alien Vision [RICH_01], which compiles different
perceptions of our surroundings at different wavelengths.
This chapter first sets out theoretically the process of image formation caused by the incidence of
a ray of light on an object, as well as the reasons why a specific material has a characteristic
reflected luminous spectrum. It then analyses the way in which these luminous spectra are
perceived by the human visual system and why it is not capable of capturing all the existing
spectral information.
Next, the different technologies that allow these luminous spectra to be captured, together with
their associated digital representation (the hyperspectral image), are set out in order to establish
the luminous spectrum associated with each point of the image.
Last, a description is provided of the process followed to classify materials through techniques
based on these hyperspectral images. Additionally, we show the different difficulties that must be
faced in order to exploit the information contained in these images, which shall provide a more
precise characterisation of the materials/objects that they represent.
1. Colour Theory
This section aims to provide a general idea of the process of image generation and its physical
properties. Understanding this process allows us to show the reasons why the luminous spectrum
reflected by a material can be an indication of its molecular properties. To this end, a brief review
of colour theory is made, establishing the theoretical basis that explains the particular chromatic
perception of a specific material.
The colour with which one perceives a specific object depends on several factors, not all of
which are dependent on the features of the object. In fact, the perceived colour, or chromatic
perception, depends not only on the chromatic properties of the object, but also on the nature,
intensity and position of the incident light, on the geometry of the object, and on the position and
features of the observing element.
The first factor that influences the perceived colour or spectrum is the nature of the luminous
source. Leonardo Da Vinci, in his Treatise on Painting [VIN_1651], noticed this behaviour,
establishing that the different positions of the sun caused changes in the chromatic perception of
objects and making reference to the blue hues of the morning shadows, which turn to warm
nuances in the evening. In a similar manner, lighting a scene with different types of artificial light
produces different hues depending on the type of lighting used. Thus, halogen or incandescent
lighting produces warmer hues than fluorescent or white-LED lighting, which tinges objects with
a bluish colour. Based on these observations, the nature and features of the luminous source
directly influence the chromatic perception of the observed object.
Another influential factor in the chromatic perception of an object is its geometric features. Its
geometry, as well as its reflective properties (specular, diffuse or matte object), influences the
chromatic perception of the object. These factors do not depend on its molecular composition,
but rather on the relative position between the incident luminous focus and the geometry of the
object; nevertheless, they define the intensity with which the colour is perceived.
In order to characterise an object based on its luminous properties, the most important property is
its chromaticity. It should be borne in mind that it is not the material that possesses the colour;
rather, it is the molecular composition of the material that causes certain wavelengths to be
reflected and absorbed differently. The percentage of absorption or reflection at each wavelength,
known as the chromaticity [TAB_04] or reflectivity [OHNO_00] of the material, depends solely
on the molecular properties of the object, thus allowing its identification based on its colour.
The third factor that influences the perception of colour is related to the sensory element that
receives the reflected light. In this sense, the capabilities of the visual sensor of the observer
significantly influence the quality of the compiled information. The human eye uses a type of
nerve ending known as cones, which are capable of converting the reflected luminous spectrum
into colour information [SANG_98]. There is another type of nerve ending, known as rods,
which allows colourless night vision but cannot capture colour information [SANG_98,
GUNT_92].
This way, for the human being, the capabilities of these photoreceptive cells determine which
features of the reflected luminous spectrum are extracted and interpreted. Specifically, the human
being has three different types of photoreceptive cones. These types of cones, known as S, M and
L, capture luminous information at different wavelengths, as can be seen in Fig. 2.2. By having
three types of sensors, the human eye is capable of perceiving colours made up by the
combination of the signals received by each of the types of colour receptors (cones) within it,
that is, by the combination of three basic colours taken as primary. Due to this, the different
colour systems that represent human vision (RGB, HSL, HSV, CIELab… [YOSH_00, SANG_98,
GUNT_82]) are based on the combination of three basic components.
Fig. 2.2 Absorption frequencies of the different types of cones in the human visual system.
However, the luminous information perceived by the human eye does not encompass all the
information contained in the spectrum reflected by the material. Rather, it only contains the
aggregate information from the spectral response caused by the absorption of the different cones.
In the same manner that a person with a dysfunction in his or her M cones will have difficulty
distinguishing between red and green hues (daltonism), the human eye does not capture all the
information contained in the reflected luminous spectrum, thus losing some of the information
that could be of interest for the characterisation of the perceived objects.
In conclusion, one can assert that the information contained in the spectrum reflected by a
material is related not only to its molecular properties, but also to the nature of the incident
lighting, the geometric properties of the material, and the capabilities of the receptor sensor.
In order to establish whether the information contained in the reflected spectrum of a material
offers more information than that perceived by the human visual system, we shall mathematically
describe the aforementioned model of colour formation and compare the information obtained by
the human visual system with that contained in the reflectance spectrum.
In order to do so, it is necessary to begin by describing the element that causes the existence of
the reflectance spectrum: light. The discussion on the corpuscular or undulatory nature of light
dates back to the 17th century, when Newton proposed the corpuscular theory of light based on
the rectilinear properties of its movement, its reflection and its behaviour when facing obstacles.
However, this theory could explain neither the absence of mass loss when emitting these
corpuscles nor the different refraction and reflection behaviour of different corpuscles. In the
same period, Huygens proposed an undulatory theory of light based on the observation of
different phenomena. However, this theory could not explain the propagation of light in a
vacuum, which gave rise to the theory of the existence of the ether.
In 1865, Maxwell published his mathematical theory of electromagnetism, which predicts the
existence of electromagnetic waves that propagate at the speed of light, the different
electromagnetic waves (light, radio, microwaves...) being considered of the same nature but of
different frequency.
Einstein broadened the available knowledge on light by considering that it is composed of
particles known as photons. These photons, in theory without any mass or electrical charge,
constitute indivisible packets of energy that depend on their frequency in accordance with the
equation:

E = h·ν (2.1)

Where h is Planck's constant and ν the frequency associated with that photon.
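As a purely illustrative numerical sketch of equation 2.1 (the constants are standard physical values; the chosen wavelength is an arbitrary example, not data from this Thesis), the energy of a photon can be computed from its wavelength using ν = c/λ:

```python
# Photon energy E = h·ν (eq. 2.1), with ν = c/λ.
PLANCK_H = 6.626e-34   # Planck's constant (J·s)
LIGHT_C = 2.998e8      # speed of light in vacuum (m/s)

def photon_energy(wavelength_nm):
    """Energy (in joules) of a photon of the given wavelength (nm)."""
    frequency = LIGHT_C / (wavelength_nm * 1e-9)  # ν = c/λ
    return PLANCK_H * frequency

# A 500 nm (green) photon carries roughly 4e-19 J; shorter wavelengths
# (higher frequencies) carry more energy per photon.
print(photon_energy(500.0))
```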
Taking as a basis this model of light, made up of a set of photons associated with different
frequencies, one can define the electromagnetic spectrum of an object as the distribution of the
emitted, reflected or absorbed intensity of energy (depending on the type of spectrum) over a
selected range of wavelengths.
In this manner, an incident ray of light that emits energy at different frequencies is defined by its
incident emission spectrum L_i(λ), which shows the intensity of this incident ray at each of the
associated wavelengths. Figure 2.3 shows a graphical representation of an emission spectrum.
Fig. 2.3 Graphical representation of a luminous spectrum.
First, we begin by explaining the simplest classical models that exist to describe the phenomenon
of the reflection of light. The simplest model for the formation of colour is that known as the
specular model [COOK_81]. In the specular model of reflection, the luminous spectrum is
reflected in a single direction defined by the angle of reflection. This phenomenon is caused by
the different electromagnetic properties of the media in which the luminous wave travels, which
cause a change of direction at the interface between both elements.
Fig. 2.4 Specular model of reflection
In this model, not all the light has to be reflected; part of it can be transmitted through the
material, in accordance with equation 2.2:

L_r(λ) = C_specular(λ)·L_i(λ) (2.2)

Where L_r(λ) corresponds to the luminous intensity reflected at a given wavelength λ, and
C_specular(λ) is the reflection coefficient for that wavelength. This reflection coefficient is
usually considered independent of the wavelength (the neutral interface model [TOMI_94]), so
the same percentage of intensity is reflected irrespective of the wavelength. In this way, the
reflection coefficient reduces to its scalar expression:

L_r(λ) = C_specular·L_i(λ) (2.3)
This reflection model reliably represents the behaviour of polished elements with specular-type
behaviour, in which the light reaching the object is directly reflected according to the laws of
reflection. In diffuse bodies, incident light returns to the initial medium after successive
interactions inside the object, interacting with its colourings. To represent this behaviour, the
Lambertian reflectance model [ANGE_99] is used to characterise the objects known as matte.
This model assumes perfect diffusion and a perfectly homogeneous material in which the
reflected light does not depend on the point of view of the observer, but only on the angle θ
formed by the incident ray of light and the normal to the object, as seen in figure 2.5.
Figure 2.5 Lambertian reflectance model.
Equation 2.4 provides a mathematical description of the features of the reflected light, whose
intensity is maximum for perpendicular incidence on the surface of the object:

L_r(λ) = cos(θ)·C_body(λ)·L_i(λ) (2.4)

In this model, the incident light enters the body of the object, where it is subject to reflection and
refraction phenomena, interacting with the colourings of the material before returning to the
surface. For this reason, the different wavelengths are re-emitted to the surface with different
intensities, which depend on the reflection coefficient or chromaticity of the material, C_body(λ).
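The Lambertian model of equation 2.4 can be sketched numerically as follows; the band chromaticities, the incident spectrum and the angles are invented for illustration and are not data from this Thesis:

```python
import math

def lambertian_reflection(theta_rad, c_body, l_incident):
    """L_r(λ) = cos(θ)·C_body(λ)·L_i(λ) (eq. 2.4), evaluated per wavelength."""
    cos_theta = max(0.0, math.cos(theta_rad))  # no reflection for grazing/back angles
    return [cos_theta * c * li for c, li in zip(c_body, l_incident)]

c_body = [0.9, 0.4, 0.1]   # illustrative chromaticity at three bands
l_inc = [1.0, 1.0, 1.0]    # flat (white) incident spectrum
print(lambertian_reflection(0.0, c_body, l_inc))          # perpendicular: maximum intensity
print(lambertian_reflection(math.pi / 3, c_body, l_inc))  # cos(60°) = 0.5: intensity halved
```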
Neither of the two earlier models provides an acceptable representation of the real reflection that
occurs in the majority of real materials. For this reason, Shafer [SHAF_84] proposed a
dichromatic model of reflection that expands on the previous models, assuming that light
interacts with the interface existing between the medium and the material, but also enters its
body, interacting with the colourings of the material before returning to the surface. This model,
therefore, combines the two earlier models in order to describe the interaction of light faithfully.
Fig. 2.6 Dichromatic model of reflection
The amount of light reflected at the interface is governed by the Fresnel laws, which relate the
reflectance produced in the specular reflection to the angle of incidence, the index of refraction
of the material, and the polarisation of the incident light.
The transmitted light passes through the material interacting with its colourings, with an
absorption probability that depends on the wavelength. The non-absorbed light is re-emitted
through the same interface, creating the diffuse reflection of the object. The geometrical
distribution of this reflection is considered isotropic, non-polarised and usually of a different
colour to the incident light due to the effect of the colourings.
In this manner, the reflected light spectrum perceived by the observer is made up of the sum of
two types of reflection: mirror-like (specular) and diffuse. The first is caused by the different
properties of the medium and the material; the latter is caused by the interaction of light with the
body of the material (equation 2.5). Figure 2.6 shows that the observed lighting vector depends
on several factors, such as the incident lighting vector, the angle of incidence, the geometry of
the object, the position of the observer and the chemical features of the material.
L_r(λ,i,e,g) = L_interface(λ,i,e,g) + L_body(λ,i,g) (2.5)

This equation is expanded in terms of the incident light (L_i), the geometric factors (m_interface,
m_body), and the factors that define the molecular properties of the object (C_interface, C_body):

L_r(λ,i,e,g) = m_interface(i,e,g)·C_interface(λ)·L_i(λ) + m_body(i,g)·C_body(λ)·L_i(λ) (2.6)

Using the neutral interface model, the interface coefficient C_interface does not depend on the
wavelength and provides no information on the chemical composition of the object, so it can be
merged with its geometric factor into a combined coefficient mc_interface:

L_r(λ,i,e,g) = m_interface(i,e,g)·C_interface·L_i(λ) + m_body(i,g)·C_body(λ)·L_i(λ) (2.7)

L_r(λ,i,e,g) = mc_interface(i,e,g)·L_i(λ) + m_body(i,g)·C_body(λ)·L_i(λ) (2.8)
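The dichromatic model of equation 2.6 can be sketched as follows; the numeric values are illustrative placeholders, and the geometric factors m_interface(i,e,g) and m_body(i,g) are assumed to have been already evaluated for one fixed lighting/viewing geometry:

```python
def dichromatic_reflection(m_interface, c_interface, m_body, c_body, l_incident):
    """L_r(λ) = m_interface·C_interface(λ)·L_i(λ) + m_body·C_body(λ)·L_i(λ) (eq. 2.6).

    m_interface and m_body are scalars: the geometric factors evaluated for a
    fixed geometry; the other arguments are per-wavelength lists.
    """
    return [m_interface * ci * li + m_body * cb * li
            for ci, cb, li in zip(c_interface, c_body, l_incident)]

# Neutral interface model (eq. 2.7): C_interface is constant over λ.
c_interface = [1.0, 1.0, 1.0]
c_body = [0.8, 0.3, 0.1]   # illustrative body chromaticity
l_inc = [1.0, 1.0, 1.0]
# A strong specular component (m_interface) washes out the body colour:
print(dichromatic_reflection(0.6, c_interface, 0.4, c_body, l_inc))
```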
The reflection model in conductor materials is not correctly represented by the earlier model. In
these materials, when an incident spectrum interacts with the interface, free electrons may absorb
part of the energy of the incident spectrum at different wavelengths. The non-reflected part of the
spectrum is absorbed and does not penetrate the body of the material to a depth greater than
10⁻⁷ m. This phenomenon explains why in conductor materials there is no reflection due to the
body (diffuse reflection).
The excited electrons release the previously acquired energy, which is re-emitted as photons. The
greater part of this energy is re-emitted as light of the same wavelength, and only a small part is
re-emitted as heat. The typical golden colour of some metals, such as copper and gold, is due to
the fact that part of the spectrum between the blue and ultraviolet ranges is not re-emitted
[MATA_94].
Fig. 2.7 Reflection model on conductor materials
Equation 2.9 defines the calculation of the reflected spectrum in conductor materials:

L_r(λ) = m_conductor(i,e,g)·C_conductor(λ)·L_i(λ) (2.9)
Here, one can notice significant differences between the models of colour formation for
dielectric and conductor materials. The use of diffuse lighting eliminates the specular component
of both incident and reflected rays, causing the same type of response in each of the analysed
materials. Under these conditions, specular reflection becomes diffuse reflection, independent of
the point of view of the observer:

L_r(λ) = m(x,y)·C_material(λ)·L_i(λ) (2.10)
Where m(x,y) is a factor dependent on the geometry of the material that defines the quantity of
light it reflects, C_material(λ) represents the chromaticity or reflectivity of the material, which
depends on its molecular properties, and L_i(λ) is the vector of incident lighting.
Bearing this model in mind, the features of the material defined in its chromaticity vector
C_material(λ) are intrinsically stored in the vector L_r(λ). As will be explained in later chapters,
there are methodologies to eliminate the influence of the incident lighting vector and of the
geometric factor m, so that the chromaticity vector characteristic of the material can be extracted.
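Under the diffuse-lighting model of equation 2.10, the simplest such methodology can be sketched as a normalisation: dividing the reflected spectrum by the known incident spectrum and geometric factor recovers the chromaticity vector. The values below are invented for illustration; this is a minimal sketch, not the procedure developed in later chapters:

```python
def estimate_chromaticity(l_reflected, l_incident, m_geom):
    """Invert eq. 2.10: C_material(λ) = L_r(λ) / (m(x,y)·L_i(λ))."""
    return [lr / (m_geom * li) for lr, li in zip(l_reflected, l_incident)]

c_true = [0.5, 0.25, 0.75]   # illustrative chromaticity of the material
l_inc = [2.0, 4.0, 1.0]      # non-flat incident spectrum
m = 0.8                      # illustrative geometric factor
l_ref = [m * c * li for c, li in zip(c_true, l_inc)]   # forward model (eq. 2.10)
print(estimate_chromaticity(l_ref, l_inc, m))          # recovers c_true
```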
In other words, the reflected vector L_r(λ) estimates the chromaticity vector of a point of an
image and provides us with information on its molecular properties. Fig. 2.8 shows several
reflected spectra L_r(λ) observed through a hyperspectral camera:
Fig. 2.8 Representation of the reflectance spectrum L_r(λ).

Going back to Figure 2.2, which illustrates the sensitivity ranges of the different types of cones in
the human visual system, one can notice that the human being can only make three readings of
the reflected spectrum L_r(λ). If, in addition, at least one degree of freedom must be sacrificed in
order to eliminate the variability of the intensity of the incident lighting vector, then the
chromaticity vector observed by the human being, which establishes the features of the observed
object, is composed of only 2 or 3 components.
In the RGB colour model [WILS_04, GONZ_08], which represents a perception model similar to
that of the human being, the observed reflected spectrum is reduced to the intensities produced at
the frequencies associated with the wavelengths of red, green and blue, as defined by equation
2.11:

L_RGB = [L_rR, L_rG, L_rB] = [R, G, B] (2.11)

As can be inferred, the RGB spectrum observed by the human being does not contain all the
discriminating information contained in the full chromaticity spectrum. This phenomenon has
been verified in numerous works that compare the precision of classification methodologies
based on spectroscopy against colour-based techniques [HOLL_03], showing the advantages of
spectrometric techniques for the classification of materials over those based simply on colour.
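This loss of discriminating information can be illustrated with a toy metamerism sketch: two clearly different reflectance spectra can integrate to exactly the same RGB triple. The block filters below are a crude, invented stand-in for the cone responses of Fig. 2.2, not a colorimetric model:

```python
# Model each RGB channel as the sum of the spectrum over a block of bands
# (a rough stand-in for the three cone/filter responses).
def to_rgb(spectrum):
    """Collapse a 30-band spectrum into [R, G, B] by block integration."""
    r = sum(spectrum[0:10])
    g = sum(spectrum[10:20])
    b = sum(spectrum[20:30])
    return [r, g, b]

flat = [0.5] * 30              # spectrally flat material
peaky = ([0.0, 1.0] * 5) * 3   # alternating 0/1 bands: very different shape
print(to_rgb(flat))            # [5.0, 5.0, 5.0]
print(to_rgb(peaky))           # [5.0, 5.0, 5.0] -- identical RGB triple
print(flat == peaky)           # False -- yet the full spectra clearly differ
```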
2. Acquisition and representation of hyperspectral images
In the previous section, the reflectance spectrum of a material was defined, which depends on the
specific molecular properties of the associated element. Classical spectrographs computed the
average spectrum of all the signals captured by the receptor; that is, the result was a "sum"
spectrum of all the elements present. Current spectral image sensors, however, obtain an image,
associating each spectrum with its corresponding pixel.
This section lists the methods used for the capture and representation of hyperspectral images. To
ease understanding, the section begins with the description of the acquisition and representation
of monochrome and colour images, and finishes with the description of these methods for
hyperspectral images.
Although the first attempts to process digital images were developed for the transfer of newspaper
photographs between London and New York in the 1920s, the first computer-processed digital
images were obtained from the lunar surface by NASA in 1964 in order to choose an adequate
landing area for the Apollo vehicles. Since then, vision sensors have evolved notably.
Currently, an image sensor consists of a matrix of small, perfectly aligned cells. Each cell is
composed of a photosensitive electronic element (CCD) that produces a specific electrical
voltage depending on the quantity of light it receives, and is assigned a specific (x,y) position.
Following the same architecture, a digital image is defined by a matrix of rows and columns that
stores, at each position, a value related to a grey level, as can be seen in Fig. 2.9.
Fig. 2.9. Representation of a digital image in grey levels.
Digital colour images are obtained from conventional CCD sensors. Incident light is filtered or
diffracted so that the R, G and B colour components reach the adequate sensor cells. One of the
methods that uses this technique is known as the Bayer filter [BAYE_76], which consists of a
layer of R, G and B filters covering the sensor (see Fig. 2.10). In this way, each sensor element
receives only one of the colour components, and a final interpolation produces the final RGB
pixel.
Fig. 2.10 Bayer filter over a CCD.
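The sampling performed by the Bayer filter can be sketched as follows, assuming a repeating 2×2 RGGB tile (an assumption for illustration; real sensors additionally interpolate the missing components, which is omitted here):

```python
# Each cell keeps only one component of the underlying RGB signal,
# following the repeating 2x2 RGGB Bayer tile.
BAYER_TILE = [["R", "G"],
              ["G", "B"]]

def bayer_sample(rgb_image):
    """rgb_image[y][x] = (r, g, b); returns the raw single-channel mosaic."""
    channel_index = {"R": 0, "G": 1, "B": 2}
    mosaic = []
    for y, row in enumerate(rgb_image):
        mosaic.append([pixel[channel_index[BAYER_TILE[y % 2][x % 2]]]
                       for x, pixel in enumerate(row)])
    return mosaic

img = [[(10, 20, 30), (11, 21, 31)],
       [(12, 22, 32), (13, 23, 33)]]
print(bayer_sample(img))   # [[10, 21], [22, 33]]
```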
These colour images are represented by three two-dimensional matrices, each representing the
sensor's response to one of the RGB colours. These three two-dimensional matrices form a
three-dimensional matrix in which the first two dimensions represent the position of the point in
the image and the third dimension represents each of the colour components, as can be seen in
figure 2.11.
Fig. 2.11 Representation of the RGB digital image.
Obtaining spectral images entails greater complexity than the capture of colour images, in which
only three wavelength bands corresponding to the colours red, green and blue are captured.
In order to capture spectral images there are two main alternatives. The first is based on
sequential acquisition, using a tunable filter or a rotary filter wheel placed in front of a
monochrome camera. This approach keeps the spatial resolution of the sensor, but requires the
object to remain perfectly static during the filter changes in order to avoid losing spectral
coherence between the different captures.
Fig. 2.12 RGB filter wheel.
The other method for acquiring hyperspectral images captures all spectral bands simultaneously.
In order to do so, it exploits the variation of the angle of refraction with the wavelength. Starting
with the capture of one line of the image, as in a line-scan camera, the spectral information is
extracted through a prism that refracts each of the wavelengths in the image. In this manner, the
obtained image contains, in abscissas, the position along the captured line and, in ordinates, each
of the spectral frequencies. In order to obtain the complete image, many snapshots are combined.
Fig. 2.13 Principle of hyperspectral image acquisition.
Figure 2.13 shows this principle. First, a line of the image is acquired, and the light from each
point in that line is spread vertically by the prism according to its wavelength. In this manner,
each line is captured on the CCD sensor as a two-dimensional image in which the horizontal axis
represents the position of the pixel in that line (X axis) and the λ axis represents the different
wavelengths spread by the prism.
By synchronising the capture of the camera with the Y movement produced between the camera
and the object, one obtains the different lines of the object, creating the associated hyperspectral
image (Fig. 2.14).
Fig. 2.14 Hyperspectral cube
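This synchronised line-by-line acquisition can be sketched as the stacking of successive (X × λ) frames into a cube; capture_line below is a hypothetical stand-in for the camera read-out, and the values it returns are invented:

```python
def capture_line(y, width=4, n_bands=3):
    """Hypothetical camera read-out: one (X x λ) frame for line y."""
    return [[float(y * 100 + x * 10 + band) for band in range(n_bands)]
            for x in range(width)]

def acquire_cube(n_lines, width=4, n_bands=3):
    """Stack the frames of successive lines into a (Y, X, λ) cube."""
    return [capture_line(y, width, n_bands) for y in range(n_lines)]

cube = acquire_cube(n_lines=2)
print(len(cube), len(cube[0]), len(cube[0][0]))   # 2 4 3  -> (Y, X, λ)
```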
Unlike the standard images observed by the human eye, hyperspectral images contain complete
spectral information for each spatial point of the image. This image is known as a hyperspectral
cube (Fig. 2.14) and consists of a three-dimensional matrix in which the first two dimensions
represent the spatial positions in the image and the third dimension represents each of the
spectral bands. From another perspective, one can simply consider a hyperspectral image as a
vectorial extension of a monochrome image. This last approach applies the same tools as in a
monochrome (grey) image, but from a vectorial perspective.
One of the features of hyperspectral images is that each pixel in the image is represented by a
vector whose components correspond to each of the captured wavelengths, thus providing not
only information on the colour associated with the scene, but also, as previously shown,
information on its molecular properties [CHAN_04] [GRAH_07].
In a similar manner, by selecting a specific wavelength, one can obtain the two-dimensional
image associated with that wavelength, making it possible to obtain spectral and spatial
information from the image simultaneously.
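Both read-outs, the full spectrum of one pixel and the image of one band, can be sketched directly on the cube structure; the tiny cube below is invented for illustration:

```python
# A tiny (Y=2, X=2, λ=3) hyperspectral cube stored as nested lists.
cube = [[[0.1, 0.5, 0.9], [0.2, 0.6, 1.0]],
        [[0.3, 0.7, 1.1], [0.4, 0.8, 1.2]]]

def pixel_spectrum(cube, y, x):
    """Full reflected spectrum of the pixel at position (x, y)."""
    return cube[y][x]

def band_image(cube, band):
    """Two-dimensional image associated with one wavelength index."""
    return [[pixel[band] for pixel in row] for row in cube]

print(pixel_spectrum(cube, 0, 1))   # [0.2, 0.6, 1.0]
print(band_image(cube, 2))          # [[0.9, 1.0], [1.1, 1.2]]
```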
3. Issues on the application of hyperspectral images for the classification of materials
The studies and forecasts provided by European research networks on hyperspectral matters
(Hyperspectral Imaging Network) [GAMB_07] indicate that the improved spectral resolution and
the high spatial resolution of contemporary spectral sensors open up numerous application
possibilities for this type of image, among which the following should be highlighted:
environmental modelling, detection of biological threats, monitoring of spills, detection of
camouflaged elements, estimation of chemical composition, detection of pathogens or tumour
cells, etc. [GRAH_07].
On the other hand, the preface of the 3rd International Workshop on Spectral Imaging, which
took place in Graz, Austria, in 2006, refers to image-based spectroscopy (Spectral Imaging) as
the science that combines the advantages of machine vision with the potential of traditional
optical spectroscopy.
This combination integrates the discriminating power inherent in the spectrum of a material with
the segmentation and classification techniques of conventional machine vision, based on
knowledge of the spatial distribution (position) of the spectrum. Therefore, one can significantly
increase the information that can be obtained from different images through the integration of
the available spectral and spatial information.
In a generic manner, the process for the classification of a specific element in a hyperspectral
image goes through the following phases, which are also listed in figure 2.15:
1. Image acquisition: Acquisition of a hyperspectral image through adequate capture
methods.
2. Selection of the spectrum to be analysed: Selection of a pixel from the acquired image
and of its associated luminous spectrum, represented by a vector with as many
components as captured wavelengths.
3. Extraction of spectral features: The selected spectrum has a high number of components,
which are, furthermore, highly correlated. Due to the Hughes phenomenon [HUGH_68],
as will be seen in Chapter IV, this makes classification more difficult and requires
processes that reduce the dimensionality and extract relevant features that correctly
characterise the elements to be classified.
4. Classification: Use of the features extracted in the earlier step for the identification of
the selected spectrum by means of mathematical classifiers.
5. Repetition of the earlier steps for the remaining pixels of the image.
Fig. 2.15 Classification process of elements based on hyperspectral images
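The five phases above can be sketched as a per-pixel loop over the hyperspectral cube. Here `extract_features` and `classify` are hypothetical placeholders standing in for the feature-extraction (step 3) and classification (step 4) stages discussed later; the mean-threshold rule in the usage example is purely illustrative.

```python
import numpy as np

def classify_cube(cube, extract_features, classify):
    """Classify every pixel of a hyperspectral cube (rows x cols x bands).

    `extract_features` reduces a spectrum to a low-dimensional descriptor
    (step 3) and `classify` maps that descriptor to a class label (step 4);
    both are supplied by the caller.
    """
    rows, cols, _ = cube.shape
    labels = np.empty((rows, cols), dtype=int)
    for r in range(rows):                          # step 5: repeat for every pixel
        for c in range(cols):
            spectrum = cube[r, c, :]               # step 2: pixel spectrum
            features = extract_features(spectrum)  # step 3
            labels[r, c] = classify(features)      # step 4
    return labels

# Toy usage: a 2x2 cube with 5 bands, thresholding the mean intensity.
cube = np.arange(20, dtype=float).reshape(2, 2, 5)
labels = classify_cube(cube, extract_features=np.mean,
                       classify=lambda f: int(f > 9))
```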
The use of this methodology associates each pixel of the acquired image with a specific class or
material. However, studies undertaken by the European network on spectral imaging
(Hyperspectral Imaging Network) [GAMB_07] emphasise the existence of several weak points
within the data-processing chain that still need to be addressed:
- Spectral correlation: stress is placed on the need for centralised models of materials
that do not depend on acquisition or lighting conditions.
- Classifiers: it is necessary to choose classifiers that are simple, robust and with great
generalisation capacity.
- Detection of features: it is necessary to define with precision the features that characterise
the elements to be classified.
- Hughes phenomenon: likewise, given the high dimensionality of the data, it is necessary
to develop unsupervised feature-extraction methods that reduce the dimensionality of the
data without reducing the information contained.
- Feature extraction: this reduction of features must be independent of the data set used to
train the system, so that changes in the training elements do not change their descriptor
variables.
- Spectral-spatial integration: the majority of hyperspectral techniques do not take
advantage of the information of spatially near points. However, it is necessary to
integrate spectral and spatial information in order to achieve better classification results
[GAMB_04, PLAZ_05].
- Computational cost: the great amount of information contained in a hyperspectral image,
as well as the computational cost associated with its processing, makes necessary new
feature-extraction and classification techniques that are computationally efficient.
4. Conclusion
This chapter has shown how the chromaticity spectrum of an element bears a direct relation to its
molecular properties, and how the acquisition of hyperspectral images provides spectral
information for each position of the image, making it possible to analyse the objects within it
from a spectrographic perspective.
The use of these hyperspectral images combines spectrographic techniques with machine vision
techniques for their processing, providing greater versatility when extracting information. The
traditional procedure for the classification of hyperspectral images has been illustrated
(Fig. 2.15) and the weak points of this classification process have been listed.
Among these weak points, the following are highlighted: the choice of classifier, the methodology
for the extraction of spectral features, the reduction of the Hughes phenomenon and the
integration of the spatial information of nearby pixels with their spectral information. This makes
it necessary to design a classification architecture that overcomes these limitations.
In order to do so, Chapter III will study the properties of the various existing classifiers in order
to select the one or ones most suitable for the generation of an adequate model of a material.
Chapter IV, in turn, will provide an in-depth analysis of different methods for feature extraction
and for the reduction of the Hughes phenomenon, placing special emphasis on the limitations of
these methods in order to define a set of desirable properties that an optimal descriptor should
satisfy to adequately describe a luminous spectrum.
Chapter III
Classification methods
The previous chapter detailed the process of formation of hyperspectral images, noting their
suitability for the characterisation of materials based on their luminous properties. However, the
great amount of information contained in the luminous spectrum makes its classification a far
from easy task.
First, a description is provided of the metrics proposed in the scientific literature to distinguish
between different spectra [KESH_04]. These classical approaches evaluate the distances between
spectra in a Euclidean space Rn or are based on the measurement of the spectral angle between
them, SAM (Spectral Angle Mapper) [WILL_04]. These methodologies offer an adequate
quantification of the similarity between two given spectra, but allow neither the analysis of the
correlation existing between adjacent spectral bands [CHAN_03] nor the correct resolution of the
problems caused by the similarity between spectra of the same group or class. In the case dealt
with in this Thesis, each of these classes represents the material associated with each of the
analysed elements. In this manner, a spectrum is acknowledged to belong to the aluminium class
when the point associated with that spectrum is composed of aluminium. The fact that spectra
associated with the same class bear a high degree of variation (intra-class variations) makes it
necessary to model this dispersion in order to correctly determine the class associated with a
given spectrum.
However, these intra-class variations can be adequately modelled through the use of pattern
recognition techniques that take the information contained in the luminous spectrum as an input
vector. These classification methods try to emulate the working of the human brain: input data
are modelled through diverse techniques, creating a mathematical model that relates them with
the desired output. These methods not only emulate the way human beings reach decisions or
analyse their surroundings, but can also perform tasks of inference or classification that cannot
be done by a human being.
The adequate use of these classifiers also allows new knowledge to be inferred from the results
obtained from the training of the classifier, uncovering underlying causes of the classification
process that were a priori unknown. Using this same approach, Tycho Brahe, without previous
knowledge of the physical causes underlying the geometry of planetary orbits, took precise
measurements of them, reaching the conclusion that they were elliptical. Using these
measurements, his disciple Johannes Kepler obtained the laws that bear his name and that
describe the physical phenomena which cause the elliptical shape of the orbits [DREY_13].
Analogously, the knowledge extracted by classifiers has made it possible to associate certain
genes or proteins with several diseases, to establish rules that predict the weather, to discover
which component of a medicine causes the improvement in the evolution of a disease or, in sum,
to extract rules that can later be used by human beings.
This chapter presents the metrics traditionally used to quantify the distance between two
luminous spectra, and undertakes a theoretical study of the different existing classifiers capable
of modelling the intrinsic variation and the discriminant properties of the materials to be
classified.
The problem related with the high dimensionality of data inherent to hyperspectral images is not
dealt with in this chapter and shall be fully covered in Chapter IV.
1. Classical metrics
Traditionally, areas such as multivariate analysis, signal processing or pattern matching have
used distance-based metrics in order to measure the differences between diverse input signals.
As mentioned in an earlier chapter, a spectrum or hyperspectral pixel, composed of the quantity
of reflected or emitted light at each of its wavelengths, is mathematically defined by a vector L of
m components, each component representing one of the wavelengths at which the spectrum has
been quantized.

\mathbf{L} = (L_1, L_2, \ldots, L_m) \qquad (3.1)
Let L_a and L_b be the vector representations of any two spectra, as shown in (3.1); diverse
distances can then be defined, derived from the norms l_1, l_2 and l_\infty:
- City block distance (CBD)

\mathrm{CBD}(\mathbf{L}_a, \mathbf{L}_b) = \sum_{i=1}^{m} \left| L_{ai} - L_{bi} \right| \qquad (3.2)
- Euclidean distance (ED)

\mathrm{ED}(\mathbf{L}_a, \mathbf{L}_b) = \left( \sum_{i=1}^{m} \left( L_{ai} - L_{bi} \right)^2 \right)^{1/2} \qquad (3.3)
- Tchebychev distance (TD)

\mathrm{TD}(\mathbf{L}_a, \mathbf{L}_b) = \max_{1 \le i \le m} \left| L_{ai} - L_{bi} \right| \qquad (3.4)
These measures represent the distance between two spectra according to the three most common
norms and establish, in a simple and intuitive manner, the degree of difference between two
specific vectors.
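The three distances (3.2)-(3.4) can be written directly; a minimal NumPy sketch, where the example spectra are arbitrary illustrative values:

```python
import numpy as np

def cbd(a, b):
    """City block distance, l1 norm (3.2)."""
    return np.sum(np.abs(a - b))

def ed(a, b):
    """Euclidean distance, l2 norm (3.3)."""
    return np.sqrt(np.sum((a - b) ** 2))

def td(a, b):
    """Tchebychev distance, l-infinity norm (3.4)."""
    return np.max(np.abs(a - b))

# Two toy three-band spectra.
La = np.array([0.2, 0.5, 0.9])
Lb = np.array([0.1, 0.7, 0.8])
```

Note that for any pair of vectors TD ≤ ED ≤ CBD, which follows from the ordering of the underlying norms.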
However, changes in the intensity of the reflected spectrum cause variations that are not correctly
absorbed by the similarity estimates produced by the metrics defined above. These intensity
changes can be due to different factors, such as small variations in the intensity of the incident
light source, the geometry of the object or its specular reflections, all of which were mentioned in
the previous chapter.
Let us suppose a spectral pixel composed of only two bands, so that the spectrum is fully
represented in a two-dimensional space with one axis per wavelength. Without loss of generality,
the distances calculated between different spectra can be interpreted geometrically in this
two-dimensional space (Fig. 3.1).
Let L_a and L_b be two spectra whose similarity is to be calculated, and let \hat{L}_a be the
vector L_a affected by a variation in the intensity of the reflected spectrum. One can observe that
the distance measurements are affected by this variation in the intensity of the lighting.
Fig. 3.1. Geometric representation of the Euclidean distance between La and Lb and the effect on
the value of this distance caused by intensity changes in the spectrum.
Other types of metrics, specifically designed for use with hyperspectral images, partially correct
this sensitivity to lighting changes. These are based on orthogonal projections of the two spectra
to be compared. Among them, the most widely used is based on the calculation of the spectral
angle between two given spectra and is known as SAM (Spectral Angle Mapper) [YUHA_92].
Based on the orthogonal projection between the two vectors, the angle between them is
calculated and used as the measure of similarity between the spectra:
\cos(\alpha) = \frac{\langle \mathbf{L}_a, \mathbf{L}_b \rangle}{\|\mathbf{L}_a\| \cdot \|\mathbf{L}_b\|} = \frac{\sum_{i=1}^{m} L_{ai} \, L_{bi}}{\sqrt{\sum_{i=1}^{m} L_{ai}^2} \cdot \sqrt{\sum_{i=1}^{m} L_{bi}^2}} \qquad (3.5)

\mathrm{SAM}(\mathbf{L}_a, \mathbf{L}_b) = \alpha = \cos^{-1}\left( \frac{\langle \mathbf{L}_a, \mathbf{L}_b \rangle}{\|\mathbf{L}_a\| \cdot \|\mathbf{L}_b\|} \right) \qquad (3.6)
Taking into account the earlier two-dimensional representation, in which the calculated angle
corresponds to the geometric angle between two straight lines, one can observe that changes in
the intensity of the spectrum do not change the angle between the spectra.
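A sketch of (3.5)-(3.6) illustrating this invariance; the spectra are arbitrary illustrative values:

```python
import numpy as np

def sam(a, b):
    """Spectral angle (radians) between two spectra, equations (3.5)-(3.6)."""
    cos_alpha = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip to guard against floating-point values marginally outside [-1, 1].
    return np.arccos(np.clip(cos_alpha, -1.0, 1.0))

La = np.array([0.2, 0.5, 0.9])
Lb = np.array([0.1, 0.7, 0.8])

# A global intensity change (a scaling of the spectrum) alters the Euclidean
# distance but leaves the spectral angle unchanged.
assert np.isclose(sam(La, Lb), sam(3.0 * La, Lb))
```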
Fig. 3.2. Geometrical representation of the Spectral Angle Mapper (SAM) and the effects caused
by changes in its intensity.
If both L_a and L_b have been previously normalised to unit length, a relation exists between the
Euclidean distance and SAM:

\mathrm{ED}(\hat{\mathbf{L}}_a, \hat{\mathbf{L}}_b) = 2 \sin\left( \frac{\mathrm{SAM}(\mathbf{L}_a, \mathbf{L}_b)}{2} \right) \qquad (3.7)

One can see that the Euclidean distance shows the same response as SAM when using normalised
vectors and small angle values.
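Relation (3.7) is in fact an exact identity for unit-length vectors, since ||â − b̂||² = 2 − 2 cos α = 4 sin²(α/2); a quick numerical check on random vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(10)
b = rng.random(10)

# Normalise both spectra to unit length.
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)

# Spectral angle between them (3.6) and Euclidean distance between the
# normalised vectors.
angle = np.arccos(np.clip(np.dot(a_hat, b_hat), -1.0, 1.0))
dist = np.linalg.norm(a_hat - b_hat)
```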
Fig. 3.3. Relation between Euclidean distance and SAM.
Du Peijun et al. [DU_05] analyse cases in which the classic SAM algorithm does not respond
adequately when estimating the differences between spectra. As it is based on the calculation of
the angle between two spectra, it is not affected by small changes in specific bands. These
changes are oftentimes caused not by noise but by absorption bands due to chemical bonds
present in the spectrum, which truly constitute an essential feature for the discrimination between
different types of spectra.
In order to solve this problem, they propose several typologies of error and diverse methodologies
for the improvement of the algorithm which, though not validated by the results obtained,
highlight the limitations of SAM with respect to small variations in local ranges of the spectrum.
Other approaches, also based on the analysis of the complete spectrum, consider each spectrum
as a probability distribution. Under this view, the divergence defined by Kullback and Leibler
[KULL_87] measures the difference between two given probability distributions.
Although the Kullback-Leibler divergence is commonly referred to as a distance, it is not
symmetric, and therefore is not a true distance.
Let L_a and L_b be two spectra as previously defined. The Kullback-Leibler divergence of the
spectrum L_a from the spectrum L_b is defined as the quantity of additional information
necessary to represent the spectrum L_a taking the spectrum L_b as the model:

D_{KL}(\mathbf{L}_a \,\|\, \mathbf{L}_b) = \sum_{i=1}^{m} L_{ai} \log \frac{L_{ai}}{L_{bi}} \qquad (3.8)

In order to correct the non-symmetry of this measurement, Kullback and Leibler in fact define the
divergence as:

D_{KL}(\mathbf{L}_a, \mathbf{L}_b) = D_{KL}(\mathbf{L}_a \,\|\, \mathbf{L}_b) + D_{KL}(\mathbf{L}_b \,\|\, \mathbf{L}_a) \qquad (3.9)
This measure has been employed by [CHAN_04] to define SID (Spectral Information
Divergence). There, it is assumed that this measure better captures the spectral variability, as it
is not based on geometric features such as the angle (SAM) or the spatial distance (Euclidean
distance), but rather calculates the separation between two spectra based on the distance
between their probability distributions.
The divergence defined in this manner (3.8) shows greater sensitivity to local variations due to
small absorption bands than the earlier metrics, while at the same time keeping a more than
acceptable tolerance to noise. This is due to the use of the logarithmically weighted ratio between
the two components of each band, instead of the subtraction of the Euclidean distance or the
multiplication of SAM.
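A sketch of SID built from the symmetric divergence (3.8)-(3.9). Two implementation details here are assumptions not specified in the text: the spectra are normalised to sum to one so that they behave as probability distributions, and a small `eps` guards against zero components.

```python
import numpy as np

def sid(a, b, eps=1e-12):
    """Spectral Information Divergence: symmetric Kullback-Leibler
    divergence (3.8)-(3.9) between spectra treated as distributions."""
    p = a / np.sum(a)
    q = b / np.sum(b)
    d_pq = np.sum(p * np.log((p + eps) / (q + eps)))  # D_KL(p || q), eq. (3.8)
    d_qp = np.sum(q * np.log((q + eps) / (p + eps)))  # D_KL(q || p)
    return d_pq + d_qp                                # symmetrised, eq. (3.9)
```

By construction, sid is symmetric in its arguments and vanishes only when both normalised spectra coincide.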
In order to illustrate the behaviour of these metrics, figure 3.4 includes four spectra associated
with three different materials: spectrum A1, taken as the pattern; spectrum A2, belonging to the
same material as A1; spectrum B, belonging to a different material than A1 and with quite a
different spectrum; and spectrum C, with an appearance similar to A1.
Fig. 3.4. Spectral representation of materials A1, A2, B and C.
When comparing spectrum A1 with the rest of the materials using the earlier metrics, one can
observe that these metrics easily detect the difference between material B and spectrum A1.
However, there are difficulties when determining reliable
distances between spectra A1-A2 and A1-C, as in both cases similar similarity measures are
obtained, which do not reveal whether those spectra correspond to similar materials or not.
This is due to the fact that the previously defined metrics quantify, in an unsupervised manner,
the differences between two spectra taking into account their global similarity. This has the
advantage that no previous estimation or extraction of relevant features, nor any supervised
training, is necessary, which allows the use of unsupervised classification methods (K-means,
Fuzzy K-means) or of supervised classification methods based on examples (K-Nearest
Neighbours). However, the fact that they only take into account the global similarity of the
spectra, ignoring local features, means that they do not adequately detect the small
discriminating features that are necessary for a correct classification of the material.
2. Classifiers
Though the previously described methods establish, in an unsupervised manner, the differences
between two given spectra from the whole of their components, they do not directly extract or
select those features that could be relevant for establishing the difference between two specific
classes (as can be seen in table III.1).
Likewise, they are not capable of modelling the variations between spectra of the same class, nor
of maximising the differences between elements of different classes while minimising the
differences between elements belonging to the same class.
These intra-class variations can be modelled through the use of pattern recognition techniques
that boost the variations representing the differences between classes while reducing the
influence of the intra-class variations.
TABLE III.1 COMPARISON OF THE DISTANCES OBTAINED BETWEEN SPECTRUM A1 AND THE OTHER SPECTRA USING THE CLASSICAL
METRICS

Metric                    A2      B       C
City block distance       0.38    1.24    0.54
Euclidean distance        0.06    0.19    0.08
Tchebychev distance       0.03    0.05    0.03
SAM                       1.00    0.98    1.00
SID                       0.03    0.35    0.06
There are numerous approaches to designing a classifier, though classifiers designed using
different techniques may obtain identical solutions. This is the case of some classifiers based on
neural networks, whose results coincide with those of other classifiers based on a statistical
approach.
The choice of an adequate classifier is a complex problem that is usually related to the type of
problem to be resolved, the need to subsequently extract information from the classifier, or the
degree of knowledge of the designer.
According to [JAIN_00], there are three different approaches when designing a classifier:
− Similarity.
− Statistics.
− Calculation of decision boundaries.
This chapter will develop the principles and concepts of the different types of classifiers. In order
to ease the comprehension of the classifications undertaken by the different types of classifiers
and of the differences between them, two classes will be used, defined by two variables X, Y that
follow overlapping two-dimensional normal distributions, as shown in Fig. 3.5.
Fig. 3.5. Distribution of model classes to be classified using different types of classifiers.
For the sake of simplicity, twenty training elements will be used for each of the classes, in such a
manner that the influence of a small number of samples can be noticed. This way, the functioning
of each type of classifier will be seen in a more intuitive manner. The data used for comparing
different classifiers is shown in figure 3.6.
Fig. 3.6. Distribution of training elements of the aforementioned classes.
The points used for the training of each of these classes are represented by stars and solid
points, as shown in figure 3.6. Once the corresponding classifier is trained, a grid of control
points is classified in order to delimit the region of the feature space assigned to each of the
classes.
The chosen classes are defined by Gaussian distributions that follow the densities shown in
figure 3.7.
Fig. 3.7. Probability density functions of the two test classes.
As both classes follow known Gaussian statistical distributions, the optimal classifier minimising
Bayes' risk (assuming costs 0 and 1, as shall be seen later) is equivalent to a statistical classifier
that calculates the maximum a posteriori probability (MAP), assigning each element to the class
whose membership probability is greatest. As the distributions are Gaussian, equation (3.10)
defines the membership probability of a vector x for a class with mean µ and covariance
matrix Σ:
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{D/2} \, |\Sigma|^{1/2}} \; e^{-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})} \qquad (3.10)
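Equation (3.10), together with the MAP rule, can be sketched as follows; `map_classify` assumes equal priors (a simplification), so maximising the class-conditional density is equivalent to maximising the posterior.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate normal density N(x | mu, Sigma), equation (3.10)."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / (((2.0 * np.pi) ** (d / 2.0)) * np.sqrt(np.linalg.det(sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

def map_classify(x, params):
    """MAP rule with equal priors: pick the class whose density is largest.

    `params` maps each class label to a (mean, covariance) pair.
    """
    return max(params, key=lambda c: gaussian_pdf(x, *params[c]))

# Toy two-class problem with unit covariance matrices.
params = {
    0: (np.zeros(2), np.eye(2)),
    1: (np.array([3.0, 3.0]), np.eye(2)),
}
```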
Using this classifier, the optimal classification map for the two given classes is as follows:
Fig. 3.8. Optimal classification map
2.1. Similarity classifiers
This approach is the simplest and most intuitive of all. It is based on the fact that two vectors
with similar features will probably belong to the same class.
In order to apply it, a correct measure of dissimilarity or distance between the feature vectors
under study must be available. For the case of hyperspectral pixels, the earlier section described
different metrics that quantify the degree of difference between two spectra.
Based on these metrics, one can define several classifiers:
2.1.1. Nearest Neighbour 1-NN.
The nearest neighbour is one of the simplest classifiers that can be used. The input vector is
compared with all the training vectors, and it is assigned to the class of the nearest training
vector [DASA_91] [SHAK_05].
Despite obtaining reasonable results, this method requires the computation of the distance to
each of the training points, so its computational cost is very high [XIE_93] [FUKU_84]. Another
disadvantage is that it requires many training samples in order to achieve an adequate estimate
of the probability density at each point [XIE_93] [FUKU_90].
Despite these disadvantages, the nearest neighbour classifier is one of the most used as a
benchmark against other classifiers, for two main reasons. First, it requires no configuration
parameters other than the choice of a distance. Second, for an infinite number of samples, this
classifier has an error that is less than double the error that would be obtained with the optimal
classifier [COVE_67].
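A minimal 1-NN sketch following this description; the training points are toy values:

```python
import numpy as np

def nn_classify(x, train_X, train_y):
    """1-NN: assign x to the class of the nearest training vector."""
    dists = np.linalg.norm(train_X - x, axis=1)  # distance to every training point
    return train_y[np.argmin(dists)]

# Toy training set: two points of class 0, one of class 1.
X = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
y = np.array([0, 0, 1])
```

Note that the full training set must be kept and scanned for every query, which is precisely the computational cost mentioned above.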
Figure 3.9 shows the classification results using this classifier. One can notice that the points near
training vectors are assigned to the class to which the nearest vector belongs.
Fig. 3.9. Classification map based on 1-NN
Fukunaga [FUKU_90, FUKU_84] proposed data-reduction algorithms for non-parametric
classifiers based on the nearest neighbour. These algorithms reduce the number of elements used
by these classifiers, bearing in mind the difference between the probability density function
estimated with the complete data set and that estimated with a reduced number of vectors. In this
manner, the computational cost of the classifier is optimised and, at the same time, the space
required for its storage is minimised without damaging the performance of the classifier.
2.1.2. Nearest mean.
One of the simplest classifiers, commonly used to carry out a quick classification of highly
separable classes, is based on the nearest mean. This classifier assigns each point to the class
whose centre of mass is nearest to it.
For normalised Gaussian distributions whose variables are not correlated, that is, those whose
covariance matrix is the identity, the nearest-mean classifier corresponds to the Bayes' optimal
classifier.
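A minimal nearest-mean sketch: the centre of mass of each class is computed at training time and each point is assigned to the nearest centre; the data are toy values.

```python
import numpy as np

def nearest_mean_fit(X, y):
    """Compute the centre of mass (mean) of each class."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, means

def nearest_mean_classify(x, classes, means):
    """Assign x to the class whose mean is nearest."""
    return classes[np.argmin(np.linalg.norm(means - x, axis=1))]

# Toy training set with two classes.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])
classes, means = nearest_mean_fit(X, y)
```

Only one stored vector per class is needed at classification time, which is what makes the method so cheap compared with 1-NN.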
Fig. 3.10. Separation produced by the nearest-mean classifier
This classifier is not capable of modelling classes that do not follow normalised Gaussian
distributions with uncorrelated variables, as shown in figure 3.11. Because of this, despite its
simplicity and reduced computational cost, this classifier does not allow a precise classification
in the majority of cases.
Fig. 3.11. Classification map based on nearest mean
2.1.3. Vector Quantization VQ.
Non-parametric methods, such as those based on the nearest neighbour or the K-nearest
neighbours, require a very high number of samples in order to achieve a valid estimate of the
probability distribution function. Because of this, and in order to reduce the number of necessary
samples, methodologies have been proposed that reduce the size of the data set used for the
design and training of the system while maintaining the performance of the classifier.
In order to reduce the size of the required training set [HART_68] [DEVI_80], one must verify
the effect of eliminating or adding each of the samples used for its design, so that only "good"
samples are kept. However, for very large data sets these methods are difficult to implement and
bear a high computational cost, as they must be re-evaluated each time a vector is added to or
eliminated from the set. Another disadvantage is that, depending on the data, a high reduction is
not always achieved, and the result is highly dependent on the samples of the set.
In contrast, algorithms based on vector quantization represent the feature space through a set of
vectors that provide a simplified representation of it, each feature vector being encoded by the
nearest model vector from a set of previously selected model vectors [XIE_93].
Fig. 3.12. The two existing clusters are represented by the vectors VQ1 and VQ2.
These methods construct a vector quantization for each of the existing classes. Each of the
vectors to be quantized is assigned the nearest model vector, using either the Euclidean distance
or, more commonly, the smallest angle formed with the vector to be classified (3.11).

S(\mathbf{v}_1, \mathbf{v}_2) = \hat{\mathbf{v}}_1^{T} \cdot \hat{\mathbf{v}}_2 \qquad (3.11)
S(\mathbf{v}_1, \mathbf{v}_2) = \cos(\mathbf{v}_1, \mathbf{v}_2) \qquad (3.12)

Where \hat{\mathbf{v}}_1 and \hat{\mathbf{v}}_2 are normalised vectors, so that two vectors
with the same direction obtain a similarity measure S(\mathbf{v}_1, \mathbf{v}_2) = 1, and two
vectors at a ninety-degree angle obtain a similarity S(\mathbf{v}_1, \mathbf{v}_2) = 0.
Fig. 3.13. In a space represented by two features, the decision on which model vector quantizes
each input vector is defined by the bisectors between the different model vectors.
This way, each origin vector is represented by an index or a binary vector that indicates which
model vector, the nearest to the origin vector, represents it.
Let us suppose the existence of M model vectors VQ_1, VQ_2, \ldots, VQ_M. An input vector
\mathbf{v} shall be represented by a quantized vector \mathbf{Vq} of M components, whose
value is one at the index whose associated model vector has the greatest similarity with
\mathbf{v}, and zero in the rest of the components.

Vq_i = \begin{cases} 1, & \text{if } S(\mathbf{v}, \mathbf{VQ}_i) = \max_{j} S(\mathbf{v}, \mathbf{VQ}_j) \\ 0, & \text{otherwise} \end{cases} \qquad (3.13)
In this manner, for a set of two-dimensional data quantized by three vectors VQ_1, VQ_2, VQ_3,
the quantization of a vector \mathbf{v} whose nearest model vector is VQ_2 would be, according
to (3.13), \mathbf{Vq} = (0, 1, 0).
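Rule (3.13) with the cosine similarity (3.11)-(3.12) can be sketched as follows; the model vectors are toy values chosen to mirror a three-vector example:

```python
import numpy as np

def quantize(v, model_vectors):
    """Binary quantization (3.13): a one at the index of the most similar
    model vector (cosine similarity), zeros elsewhere."""
    v_hat = v / np.linalg.norm(v)
    m_hat = model_vectors / np.linalg.norm(model_vectors, axis=1, keepdims=True)
    similarities = m_hat @ v_hat          # cosine similarity, (3.11)-(3.12)
    vq = np.zeros(len(model_vectors), dtype=int)
    vq[np.argmax(similarities)] = 1       # winner takes the single one
    return vq

# Three toy model vectors in a two-feature space.
VQ = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```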
Once the vectors are quantized (and thus reduced), other classifiers can be used to establish to
which class the quantized vectors really belong. This approach reduces the noise present in the
training data, as well as the complexity required of a second classifier.
One of the methods based on vector quantization is Learning Vector Quantization (LVQ),
proposed by Kohonen [KOHO_88]. Through this technique, which combines supervised and
unsupervised learning under an approach based on artificial neural networks, the space is
divided into groups of similar features in an unsupervised manner (clustering). The number of
subclasses is equal to the number of neurons present in the first layer of the classifier.
This primary clustering is done through the learning rule for competitive networks: in the
training phase, starting from a set of random model vectors (Fig. 3.14a), each of the training
vectors is compared with each of the model vectors. Once the nearest (winning) vector is chosen,
it is modified so that it moves closer to the training vector used (Fig. 3.14b).
Fig. 3.14. a) Initial model vectors, b) training phase, c) set of training vectors, d) final model
vectors.
This way, model vectors are modified during their training stage in a way that each of them
represents a cluster or set of data that is present in the training data (Fig. 3.14c, Fig. 3.14d).
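The competitive (unsupervised) stage described above can be sketched as follows; the learning rate, epoch count and the initialisation from random data points are illustrative choices, not Kohonen's exact prescription:

```python
import numpy as np

def lvq_cluster(X, n_models, lr=0.1, epochs=50, seed=0):
    """Unsupervised (competitive) stage of LVQ: each training vector pulls
    the nearest model vector towards itself."""
    rng = np.random.default_rng(seed)
    # Initialise the model vectors at randomly chosen training points.
    models = X[rng.choice(len(X), size=n_models, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X:
            winner = np.argmin(np.linalg.norm(models - x, axis=1))
            models[winner] += lr * (x - models[winner])  # move winner towards x
    return models

# Two well-separated toy clusters; each model vector should settle in one cluster.
X = np.array([[0.0, 0.0], [0.0, 0.2], [0.2, 0.0],
              [5.0, 5.0], [5.0, 5.2], [5.2, 5.0]])
models = lvq_cluster(X, n_models=2)
```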
Once the input vectors have been quantized by the model vectors obtained in the unsupervised
training phase, represented with the binary vectorial notation of (3.13), each of the model
vectors is associated with its corresponding class in a supervised manner.
This approach efficiently resolves the classification of nonlinearly separable classes, in which the
classifier to be used would otherwise require a high number of samples and, at the same time,
sufficient complexity to model the non-linearity of the classes (Fig. 3.15).
Fig. 3.15. Nonlinearly separable classes.
Through unsupervised clustering, the feature space is divided into a certain number of
differentiated subclasses. Once these subclasses are defined, the problem lies in assigning each
of the subgroups to the class to which it truly belongs.
This mixed supervised/unsupervised classification, first, separates nonlinearly separable classes
into compact subgroups that can be modelled in a simpler manner than the original class and,
second, assigns each of the subgroups to its corresponding class through a supervised linear
classifier.
Fig. 3.16. Vector quantization obtaining four model vectors, one for each of the existing clusters.
Figure 3.16 shows the model vectors obtained to separate each of the classes. In this way, the
training points near VQ1 shall be defined by the vector (1,0,0,0), those near VQ2 by (0,1,0,0),
those near VQ3 by (0,0,1,0) and those near VQ4 by (0,0,0,1).
The supervised phase of the classifier then only has to assign the vectors (1,0,0,0) and
(0,0,1,0) to the black triangle class and the rest to the white triangle class.
Figure 3.17 shows the result of the LVQ algorithm clustering the data in two, three and six
elements.
Fig. 3.17. Classification map based on LVQ (S=2, 3, 6)
2.2. Statistical classifiers
Statistical classifiers are a second type of classifier. These are based on a probabilistic
approach, that is, on a reliable calculation of the probability that, given certain conditions (the
set of features that define an element), the element belongs to one class or another.
There are two main approaches to statistical methods. On the one hand, parametric approaches
assume a specific statistical distribution and try to obtain the parameters that define it, for
example the mean and variance in the case of a univariate Gaussian distribution. On the other
hand, non-parametric approaches try to estimate the probability density in any region of the
feature space without making any assumption about the type of distribution that generated the
data.
2.2.1. Probability Theory
A key concept in statistical classification methods in particular, and for any classifier in general, is that of uncertainty. Probability theory provides a fundamental framework for upholding the theory that underlies each classifier, given errors in data acquisition, the use of a reduced number of features and the use of a finite set of sample data.
In a given classification problem where the aim is to assign an element, defined by a feature vector, to one of the C possible classes, an adequate classifier is one that assigns the element to the most probable class, given the previously observed conditions.
In this manner, the probability that an element, given its feature vector x, belongs to a class C_i is defined by the conditional probability P(C_i|x); the element is assigned to the class for which P(C_i|x) is maximum [KASH_86] [MORR_76].
Given that P(C_i|x) is unknown, Bayes' theorem (3.14) will be used to calculate this probability indirectly, starting from the conditional probability of observing the vector x given the class C_i.
P(C_i \mid x) = \frac{P(x \mid C_i)\,P(C_i)}{P(x)}    (3.14)
Let us suppose two classes i and j; the classifier shall assign the element to the class with greater probability:
Class = \begin{cases} i & \text{if } P(C_i \mid x) > P(C_j \mid x) \\ j & \text{if } P(C_i \mid x) \le P(C_j \mid x) \end{cases}    (3.15)
Applying Bayes' theorem (3.14), P(C_i|x) is expressed in known terms as:
Class = \begin{cases} i & \text{if } \dfrac{P(x \mid C_i)\,P(C_i)}{P(x)} > \dfrac{P(x \mid C_j)\,P(C_j)}{P(x)} \\[6pt] j & \text{if } \dfrac{P(x \mid C_i)\,P(C_i)}{P(x)} \le \dfrac{P(x \mid C_j)\,P(C_j)}{P(x)} \end{cases}    (3.16)
As P(x) is constant for all classes, it can be eliminated:
Class = \begin{cases} i & \text{if } P(x \mid C_i)\,P(C_i) > P(x \mid C_j)\,P(C_j) \\ j & \text{if } P(x \mid C_i)\,P(C_i) \le P(x \mid C_j)\,P(C_j) \end{cases}    (3.17)
Rearranging (3.17), we obtain the likelihood ratio l_r(x) (3.18), which can be estimated directly, as the decision rule depends only on the a priori probabilities of the classes and on the probability densities of the data for each class.
l_r(x) = \frac{P(x \mid C_i)}{P(x \mid C_j)}    (3.18)
Representing the earlier decision rule in terms of the likelihood ratio, one obtains:
Class = \begin{cases} i & \text{if } l_r(x) > \dfrac{P(C_j)}{P(C_i)} \\[6pt] j & \text{if } l_r(x) \le \dfrac{P(C_j)}{P(C_i)} \end{cases}    (3.19)
This classifier (3.19) is known as the Maximum a Posteriori (MAP) classifier. It minimises the Bayes error and therefore produces the smallest number of incorrect classifications.
If, additionally, equiprobability of the classes is assumed, that is, P(C_i) = P(C_j), one obtains the maximum likelihood classifier (3.20):
Class = \begin{cases} i & \text{if } l_r(x) > 1 \\ j & \text{if } l_r(x) \le 1 \end{cases}    (3.20)
The previous classifiers assume that the consequences of erroneously classifying an element of one class into another are equally important. However, this condition is highly dependent on the use or application that is going to be given to the classifier. For example, when separating radioactive materials, including non-radioactive material among the radioactive is not as serious as labelling a radioactive material as non-radioactive.
In this manner, K_ij is defined as the cost of classifying an element as class i when it really belongs to class j. Defining these costs, Bayes' rule is the classifier that minimises the Bayes risk [BERG_85]:
Class = \begin{cases} i & \text{if } l_r(x) > \dfrac{(K_{ij}-K_{jj})\,P(C_j)}{(K_{ji}-K_{ii})\,P(C_i)} \\[6pt] j & \text{if } l_r(x) \le \dfrac{(K_{ij}-K_{jj})\,P(C_j)}{(K_{ji}-K_{ii})\,P(C_i)} \end{cases}    (3.21)
Assuming a symmetric cost function in which the cost of incorrectly classifying an element is one (K_12 = K_21 = 1) and the cost of correctly classifying an element is zero (K_11 = K_22 = 0), Bayes' rule (3.21) is simplified, becoming the maximum a posteriori classifier (3.19).
Furthermore, if the classes were equiprobable, one would obtain the maximum likelihood classifier (3.20).
Let us suppose C1 and C2 are two differentiated classes defined by a single variable x. Without loss of generality, let us assume a Gaussian distribution for each of the classes, as shown in figure 3.18.
Fig. 3.18. Representation of the probability distributions of two classes C1 and C2.
The classifier based on Bayes' rule (3.21) shall classify elements according to:
Class = \begin{cases} C_1 & \text{if } (K_{21}-K_{11})\,P(C_1)\,P(x \mid C_1) > (K_{12}-K_{22})\,P(C_2)\,P(x \mid C_2) \\ C_2 & \text{otherwise} \end{cases}    (3.22)
Let us assume that class C2 is twice as probable as class C1 (P(C_2)/P(C_1) = 2), that the cost of correctly classifying each of the classes is zero (K_11 = 0, K_22 = 0), and that the cost of erroneously classifying an element of class C1 as a member of C2 is 4, while the cost of classifying an element of C2 as C1 is only 1 (K_21 = 4, K_12 = 1).
Applying the three previously described classifiers, maximum likelihood (3.20), maximum a posteriori (3.19) and Bayes' rule (3.21), to this case, one can see the different decision maps obtained for each of the classifiers (figure 3.19).
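The worked example can be checked numerically. In the following sketch the class means, the common variance and the test point are illustrative choices (not taken from the thesis); the priors and costs are those of the example above.

```python
# Comparing the three decision rules for two 1-D Gaussian classes.
from math import exp, pi, sqrt

def gaussian(x, mu, sigma):
    # univariate normal density
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / sqrt(2 * pi * sigma ** 2)

P1, P2 = 1 / 3, 2 / 3              # P(C2) = 2 * P(C1)
K11, K22, K21, K12 = 0, 0, 4, 1    # K_ij: cost of deciding i when truth is j

def decide(x, mu1=0.0, mu2=2.0, sigma=1.0):
    lr = gaussian(x, mu1, sigma) / gaussian(x, mu2, sigma)  # likelihood ratio
    ml = 1 if lr > 1 else 2                                 # maximum likelihood
    map_ = 1 if lr > P2 / P1 else 2                         # MAP rule
    # minimum-risk (Bayes) rule with the costs defined above
    bayes = 1 if lr > ((K12 - K22) * P2) / ((K21 - K11) * P1) else 2
    return ml, map_, bayes

print(decide(1.2))  # → (2, 2, 1)
```

At x = 1.2 the three rules disagree: the high cost of labelling a true C1 element as C2 (K_21 = 4) pushes the minimum-risk decision towards C1, while the other two rules choose C2.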
Fig. 3.19. Classification based on a) maximum likelihood classifier, b) MAP classifier, c) Bayes'
law.
In practice, statistical classifiers estimate the probability distributions of each of the classes that are present. Once this probability density is estimated, a selection of the target class is made in accordance with any of the three aforementioned decision rules. Depending on the way that these densities are estimated, these classifiers are categorised as either parametric or non-parametric.
2.2.2. Parametric methods.
Statistical classifiers based on parametric methods initially assume a certain distribution for each
of the classes and make an estimate of the probability density function by calculating the
parameters which define these distributions.
For the case of binary variables, a binomial or multinomial distribution is assumed, while in the case of continuous variables a Gaussian distribution is assumed. Given that the problem at hand does not involve multinomial variables, their development is set aside; the reader can find more information in the associated bibliography [BISH_06] [KACH_86] [MORR_76].
Another widely used parametric method, as an extension of the Gaussian distribution, is to model the distribution of a class as a mixture of several Gaussians. This approach allows a more complex and exact modelling of classes that are composed of several Gaussian clusters.
2.2.2.1. Gaussian distribution
The Gaussian model, also known as the normal distribution, is widely used to describe the distribution of continuous variables. For the case of a single variable x, the Gaussian distribution is defined as:
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}    (3.23)
Where µ and σ are the mean and standard deviation that define the distribution. For the multivariate case, the multivariate Gaussian distribution takes the form:
\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}    (3.24)
Where µ is the multidimensional mean vector and Σ the covariance matrix of the associated class.
The selection of the Gaussian distribution to model a class defined by a set of variables is not a fortuitous decision: the Gaussian distribution models many phenomena in nature, as it is the distribution that maximises the entropy for a given mean and variance. Additionally, the sum of a set of random variables is another random variable whose distribution approaches a Gaussian shape as the number of terms in the sum increases.
Analysing the equation of the multivariate normal distribution (3.24), one can notice that only one term depends on the position of the feature vector x, shown in (3.25):
\Delta^2 = (x-\mu)^T \Sigma^{-1} (x-\mu)    (3.25)
This expression, known as the squared Mahalanobis distance, decreases as the probability of membership of a point in space to the Gaussian class defined by µ and Σ increases.
One can verify that, in a two-dimensional space, the locus of equiprobable points corresponds to an ellipse centred on the vector µ, with axes in the direction of the eigenvectors of Σ, the length of these axes being proportional to the square root of the associated eigenvalues [BISH_05].
Fig. 3.20 Representation of an equiprobable surface for a two-dimensional Gaussian distribution.
Analogously, for spaces of higher dimension, the locus of equiprobable points is a hyperellipsoid whose axes are defined by the directions of the eigenvectors of the covariance matrix and whose elongations correspond to the eigenvalues associated with each eigenvector.
2.2.2.2. Estimate of the parameters of a Gaussian distribution from N observations
Usually, the parameters µ and Σ of the Gaussian distribution are unknown and must be estimated from examples. Given a set of N observations X = \{x_1, x_2, \ldots, x_N\} belonging to a Gaussian distribution, one can estimate the most probable Gaussian distribution from the observed data.
Given this set of vectors X, one can define the log-likelihood of having obtained the set X, conditioned on its belonging to a Gaussian distribution defined by µ and Σ, as:
\ln p(X \mid \mu, \Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(x_n-\mu)^T \Sigma^{-1} (x_n-\mu)    (3.26)
In order to obtain the parameters that maximise this likelihood, the function is differentiated and set equal to zero, yielding the most probable parameters µ and Σ.
Differentiating with respect to the mean µ and setting this derivative to zero, one obtains the most probable value of the parameter µ (3.28):
\frac{\partial}{\partial \mu} \ln p(X \mid \mu, \Sigma) = \sum_{n=1}^{N} \Sigma^{-1}(x_n-\mu)    (3.27)
\mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n    (3.28)
Similarly, one can calculate the maximum likelihood covariance matrix [MAGN_99], obtaining:
\Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N}(x_n-\mu_{ML})(x_n-\mu_{ML})^T    (3.29)
Examining the mathematical expectation of each maximum likelihood estimate, one observes that the expectation of the maximum likelihood mean corresponds with the real mean (3.30). However, the maximum likelihood estimate of the covariance does not correspond with the real covariance (3.31):
\mathbb{E}[\mu_{ML}] = \mu    (3.30)
\mathbb{E}[\Sigma_{ML}] = \frac{N-1}{N}\,\Sigma    (3.31)
In order to correct this bias, the covariance estimator is re-calculated according to the following formula (3.32):
\tilde{\Sigma} = \frac{1}{N-1}\sum_{n=1}^{N}(x_n-\mu_{ML})(x_n-\mu_{ML})^T    (3.32)
In this manner, starting from a set of data, one can calculate the parameters that define the most probable Gaussian distribution for that set through the application of formulas (3.28) and (3.32).
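These two estimators amount to the sample mean and the unbiased sample covariance. A minimal sketch (the data matrix below is an illustrative example, not thesis data):

```python
import numpy as np

# N observations of a D=2 feature vector (illustrative values)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
N = len(X)

mu_ml = X.mean(axis=0)                  # sample mean
diff = X - mu_ml
cov_unbiased = diff.T @ diff / (N - 1)  # unbiased covariance, divides by N-1
print(mu_ml)                            # → [2.5 2.5]
```

The unbiased estimator coincides with `np.cov(X.T)`, which also uses the N−1 denominator.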
2.2.2.3. Application of Bayes' classifier to Gaussian distributions
Once the probability density function of each class is estimated through the calculation of the parameters that define the most probable Gaussian distribution from a set of data (3.28), (3.32), one can substitute the probabilities in Bayes' rule (3.21) with the estimated Gaussian probability density function (3.24).
For the sake of simplicity and without loss of generality, let us assume the maximum likelihood classifier defined in (3.20).
Substituting the Gaussian probability function (3.24) into the equation that defines the likelihood ratio (3.18), there remains:
l_r(x) = \frac{\mathcal{N}(x \mid \mu_i, \Sigma_i)}{\mathcal{N}(x \mid \mu_j, \Sigma_j)}    (3.33)
l_r(x) = \frac{\dfrac{1}{(2\pi)^{D/2}|\Sigma_i|^{1/2}} \, e^{-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1}(x-\mu_i)}}{\dfrac{1}{(2\pi)^{D/2}|\Sigma_j|^{1/2}} \, e^{-\frac{1}{2}(x-\mu_j)^T \Sigma_j^{-1}(x-\mu_j)}}    (3.34)
Simplifying (the normalisation factors cancel when the determinants of both covariance matrices are equal), there remains:
l_r(x) = e^{-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1}(x-\mu_i) + \frac{1}{2}(x-\mu_j)^T \Sigma_j^{-1}(x-\mu_j)}    (3.35)
Therefore, equation (3.20) can be re-written in the following manner:
Class = \begin{cases} i & \text{if } e^{-\frac{1}{2}(x-\mu_i)^T \Sigma_i^{-1}(x-\mu_i) + \frac{1}{2}(x-\mu_j)^T \Sigma_j^{-1}(x-\mu_j)} > 1 \\ j & \text{otherwise} \end{cases}    (3.36)
Taking logarithms of both sides of the inequality, there remains:
Class = \begin{cases} i & \text{if } (x-\mu_j)^T \Sigma_j^{-1}(x-\mu_j) - (x-\mu_i)^T \Sigma_i^{-1}(x-\mu_i) > 0 \\ j & \text{otherwise} \end{cases}    (3.37)
Bearing in mind the Mahalanobis distance defined in (3.25), one can notice that maximum likelihood classification of Gaussian classes is equivalent to assigning each element to the class with the smallest Mahalanobis distance:
Class = \begin{cases} i & \text{if } (x-\mu_j)^T \Sigma_j^{-1}(x-\mu_j) > (x-\mu_i)^T \Sigma_i^{-1}(x-\mu_i) \\ j & \text{otherwise} \end{cases}    (3.38)
Note that, in the case where the covariance matrices of both classes are identical and equal to the identity matrix, expression (3.38) simplifies to the same formulation as the Euclidean distance.
Where the covariances of all classes are identical, the functions that define the decision map of this classifier are linear. Otherwise, they are quadratic functions obtained at the intersection of two or more ellipses with non-parallel axes.
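The minimum-Mahalanobis-distance rule can be sketched directly; the class parameters below are illustrative choices, not thesis data.

```python
import numpy as np

def mahalanobis_sq(x, mu, cov):
    # squared Mahalanobis distance of x to the class defined by (mu, cov)
    d = x - mu
    return float(d @ np.linalg.inv(cov) @ d)

def classify(x, params):
    # params: list of (mu, cov) per class; assigns the nearest class
    return int(np.argmin([mahalanobis_sq(x, mu, cov) for mu, cov in params]))

params = [(np.array([0.0, 0.0]), np.eye(2)),          # class 0
          (np.array([4.0, 4.0]), np.diag([4.0, 4.0]))]  # class 1, wider
print(classify(np.array([1.0, 1.0]), params))  # → 0
print(classify(np.array([3.0, 3.0]), params))  # → 1
```

Because the two covariance matrices differ, the implied decision boundary between the classes is quadratic rather than linear.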
Figure 3.21 shows the classification by a maximum likelihood classifier through the estimate of
the Gaussian distribution parameters from a set of sample data. One can notice that the
classification is not exactly the same as that of the optimum classifier. This is due to the fact that
this classifier has estimated the necessary parameters based on a small number of training vectors.
Fig. 3.21 Classification map based on an estimate of Gaussian distributions.
2.2.2.4. Gaussian mixture
Although the Gaussian distribution has important analytic properties, it presents important limitations when modelling real data in which the studied classes cannot be modelled as a single Gaussian, as each class may be made up of several separate clusters. These clusters can, however, be modelled through the combination of several Gaussians [MCLA_88]. Such cases arise when elements belonging to the same class come from different Gaussian subgroups, that is, they belong to the same class but form part of differentiated subclasses.
The following figure (Fig. 3.22) shows two classes: the first defined by two elongated Gaussian subclasses (A1, A2) (Fig. 3.22a), and the second defined by another two, more compact subclasses (B1, B2) (Fig. 3.22b).
Fig. 3.22 a) Distribution of points belonging to two classes whose probability function follows a
distribution each of them based on the sum of two Gaussian distributions. b) Probability function
assuming that each of the classes follows a Gaussian distribution.
If one attempts to fit each of these classes with a single Gaussian model, as shown in figure 3.22b, the model that is obtained is not capable of correctly representing the probability function of each class, thereby producing considerable classification error. This error is shown in figure 3.23, which presents a high number of incorrect classifications.
Fig. 3.23. Classification map based on a single Gaussian estimate for each class, for classes with a probability density function based on Gaussian mixture models.
In the Gaussian Mixture Model (GMM) [MCLA_00], each density function is defined as the weighted sum of the K Gaussians of which it is composed, thus defining the probability of each class as:
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)    (3.39)
where µ_k, Σ_k are the parameters that define the Gaussian components of which the class is made, and π_k are the weights of each component.
In this manner, each of the classes present is defined by the sum of several Gaussian distributions,
as shown in figure 3.24.
Fig. 3.24 a) Probability function of class 1 as defined by a mixture of two Gaussian functions. b)
Probability function of class 2 as defined by a mixture of two Gaussian functions.
In accordance with the previous distributions, a classifier based on a Gaussian mixture allows an accurate modelling of the probability density function that defines each of the classes, yielding a minimal error rate, as can be seen in figure 3.25.
Figure 3.25. Bayes' optimal classification map for data based on models of Gaussian mixture.
2.2.2.4.1. Estimate of the parameters of a Gaussian mixture distribution from N observations
Let X = \{x_1, x_2, \ldots, x_N\} be a set of observations to be modelled as a Gaussian mixture. If one assumes that these points come from a Gaussian mixture distribution, the log-likelihood that the points X belong to that distribution, for a set of parameters π_k, µ_k, Σ_k of each Gaussian, is given by:
\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left( \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)    (3.40)
Where N is the number of observed elements, K the number of Gaussian functions that define the mixture distribution, π_k the weights associated with each of the distributions, and µ_k and Σ_k the mean vector and covariance matrix that define each Gaussian function.
Maximising this function, one obtains the parameters that most reliably describe the density function of the input data. There are several approaches to calculating the optimal parameters. Methods based on gradient descent [NOCE_99] [FLET_87] are capable of finding them.
Despite the advantages of these methods, as well as of other methods based on derivative-free searches such as genetic algorithms, the method most commonly used for this search is the Expectation-Maximization (EM) algorithm [DEMP_77], [MCLA_97].
The basis of the EM algorithm is detailed in [BISH_06]. Next, a summary of the implementation of the EM algorithm for the calculation of the parameters that define a Gaussian mixture distribution is included [BISH_06].
1. Initialisation: The means µk, covariances Σk and the weights πk that define the
distribution are initialised and the likelihood function is evaluated (3.40).
2. Expectation: The responsibility functions γ(z_nk) (3.41) are evaluated using the current values of the parameters.
\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}    (3.41)
3. Maximisation: The parameters π_k, µ_k and Σ_k are recalculated using the responsibilities calculated in the previous step.
\mu_k^{new} = \frac{1}{N_k}\sum_{n=1}^{N} \gamma(z_{nk}) \, x_n    (3.42)
\Sigma_k^{new} = \frac{1}{N_k}\sum_{n=1}^{N} \gamma(z_{nk}) \, (x_n-\mu_k^{new})(x_n-\mu_k^{new})^T    (3.43)
\pi_k^{new} = \frac{N_k}{N}    (3.44)
Where N_k is defined by:
N_k = \sum_{n=1}^{N} \gamma(z_{nk})    (3.45)
4. Evaluate the likelihood function (3.40) and verify that the convergence criteria are met.
Otherwise repeat step 2.
If one estimates the probability density function using the Expectation-Maximization method, the parametric model obtained from the training points shown in Fig. 3.26 is very similar to the classification map of the Bayes optimal classifier.
Fig. 3.26. Classification map based on the estimate of the GMM from the Expectation-
Maximization algorithm
2.2.3. Non-parametric methods
The greatest problem of parametric methods is that the shape of the probability distribution of the sample to be modelled must be known a priori. This limits these methods, since some of the assumed statistical models will not be capable of modelling all the complexity of the shape of the probability density function of the distribution.
Unlike the former, non-parametric methods do not assume a specific distribution or form of the probability density function; rather, they estimate the density function in each region of the feature space based on its local behaviour, without assuming any specific statistical behaviour.
In exchange for the advantage of not having to assume the shape of the probability distribution, a greater number of training data is necessary in order to undertake a precise estimation, as well as considerable storage space to contain the data of the estimated probability function.
2.2.3.1. Partition models on histograms
The simplest model for the estimation of the probability density function through non-parametric methods is the one based on histograms. For this, the feature space is divided into a set of partitions, in such a manner that each observation belongs to one of the resulting groups. The density function is then defined, for each partition, as the number of observations belonging to that partition over the total number of observations:
p_i = \frac{n_i}{N \, \Delta_i}    (3.45)
where p_i is the probability density of each partition, n_i the number of observations within partition i, N the total number of observations and ∆_i the size of the analysed partition.
This method can be extended to multivariate distributions by substituting the term ∆_i with the hypervolume of each partition in the feature space.
The selection of the size ∆_i influences the estimate of the probability density function. High values of this parameter make the curve excessively smooth, failing to correctly represent the probability function (Fig. 3.27a). Low values, on the other hand, offer a more precise estimate of the probability density function, but require a greater number of samples for training; otherwise a bad estimate is obtained, as shown in figure 3.27c. As a general rule, a compromise is reached between the precision of the curve and the number of samples necessary for a correct estimate, as can be seen in figure 3.27b.
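The histogram estimator can be sketched in a few lines; the sample values and the partition count are illustrative.

```python
import numpy as np

def histogram_density(samples, n_bins, lo, hi):
    # equal-width partitions: Delta_i = (hi - lo) / n_bins
    width = (hi - lo) / n_bins
    counts, _ = np.histogram(samples, bins=n_bins, range=(lo, hi))
    return counts / (len(samples) * width)  # p_i = n_i / (N * Delta_i)

samples = np.array([0.1, 0.2, 0.25, 0.6, 0.7, 0.9])
p = histogram_density(samples, n_bins=2, lo=0.0, hi=1.0)
print(p)  # → [1. 1.]  (three of six samples in each half-interval)
```

Note that the estimated densities integrate to one over the analysed interval, as required of a probability density function.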
Fig. 3.27. Estimate of the probability density function: a) two partitions, b) four partitions, c)
twenty-four partitions
In practice, the method based on histogram partitions is useful for estimating density functions and visualising data in one or two dimensions, but it becomes unfeasible for the majority of applications that require the estimation of probability density functions.
On the one hand, this is due to the fact that dividing the feature space into a set number of partitions makes the probability density function discontinuous along the partition boundaries [BISH_06].
On the other hand, obtaining a density of data sufficient to make a precise estimate of the probability density function in each partition of a high-dimensional space becomes absolutely unfeasible. In order to create a histogram with M partitions in each of the variables, given a variable x of D components, the total number of partitions would be M^D, and an adequate density of observations would have to be obtained in each of them.
Despite its limitations, the model based on histogram partitions can be used as a basis for the rest of the non-parametric methods, which estimate the density function while correcting its limitations.
The Parzen window and, in general, kernel density estimators [PARZ_62], [DUDA_73] eliminate some of the problems of basic histogram-based methods by assuming a binomial distribution of the number of points belonging to each partition and by substituting the hypercube partition with Gaussian kernels that eliminate the discontinuities in the probability function.
2.2.3.2. K-nearest neighbour
One of the problems of the earlier methods is that the size of the partition, or of the chosen kernel, is the same regardless of the region of the feature space. A large size is useful when estimating the probability density function in sparsely populated regions, but it does not provide the necessary detail on the shape of the probability density function. A small value of this parameter, on the other hand, yields a probability density function with great detail in the regions of high probability, but noisy in regions of low density, due to the lack of training samples in those regions.
If, instead of fixing the size of the partition a priori, one centres a sphere on the point x where the probability density is to be calculated and grows its radius until it contains K elements, the estimate of the probability p(x) at that point is:
p(x) = \frac{K}{N \, V}    (3.46)
Where N is the total number of observations and V the volume of the hypersphere used.
This method is known as K-nearest neighbours [SHAK_05] and solves some of the limitations of the fixed-window-size methods seen previously, since the window size is adjusted depending on the density of each region of the feature space. A point to bear in mind is that the value of K determines the smoothness of the density function; intermediate values of K are generally optimal.
Applying this density estimate to each of the classes C_i to be classified, one obtains the estimate of the probability density for each class:
p(x \mid C_i) = \frac{K_i}{N_i \, V}    (3.47)
And the a priori probability of each class:
P(C_i) = \frac{N_i}{N}    (3.48)
Applying Bayes' theorem (3.14) and substituting each of the terms with (3.46), (3.47) and (3.48), one obtains the a posteriori probability for the classifier based on K-nearest neighbours:
P(C_i \mid x) = \frac{p(x \mid C_i)\,P(C_i)}{p(x)} = \frac{K_i}{K}    (3.49)
In this manner, the Bayes optimal classifier assigns each element to the class with the most elements among its K nearest neighbours.
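The resulting majority-vote rule can be sketched as follows; the training points and the value of K are illustrative choices.

```python
import numpy as np

def knn_classify(x, X_train, y_train, k):
    # distances from x to every training sample
    dist = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dist)[:k]              # indices of the K nearest
    # majority class among the K nearest neighbours
    return int(np.bincount(y_train[nearest]).argmax())

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(np.array([0.2, 0.2]), X_train, y_train, k=3))  # → 0
print(knn_classify(np.array([0.8, 0.9]), X_train, y_train, k=3))  # → 1
```

An odd K avoids ties in the two-class case, which is one reason intermediate odd values are a common choice.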
Figure 3.28. Effect of number K in the classification map obtained through the classifier based on
K-nearest neighbours, for K=1, K=2 and K=4.
2.3. Classifiers based on the calculation of boundaries of decision
A third category of classifiers is based on the empirical calculation of decision boundaries, in such a way that these boundaries minimise a specific criterion imposed in the design phase.
Some of the most commonly minimised criteria are the apparent classification error (classification error rate), or the mean squared error between the output value of the classifier and the numeric value (usually vectorial) associated with the correct class.
A classical example of these methods is Fisher's discriminant analysis which, as mentioned in the previous chapter, maximises a function that measures the separability between classes.
Other examples of these types of classifiers are neural networks such as the perceptron [RAUD_98] or the multilayer perceptron. The perceptron calculates a hyperplane that separates the classes, modifying the value of its internal weights until the calculated plane minimises the classification error.
The multilayer perceptron acquires the capacity to model nonlinear decision boundaries. However, this property can cause overtraining in the classifier as its complexity (number of neurons and layers) increases, which is corrected by means of regularisation methods [BISH_95, CHENG_94, RIPL_96].
Other methods, such as Support Vector Machines (SVM) [VAPN_98, BURG_98, SCHO_97], project the feature vector through an adequate kernel into a space of higher dimension in which the data are more easily separated. In that higher-dimensional space, a linear classification is made, maximising the margin of separability between classes [VAPN_06].
In this section, given its importance in the field of classification, a brief description is provided on
the functioning and the theoretical basis of the perceptron and the multilayer perceptron.
2.3.1. Perceptron
The perceptron is a linear classifier invented by Frank Rosenblatt in 1958 [ROSE_58]. His work was highly criticised by Marvin Minsky [MINSK_69], who stated that these neural networks would only be able to solve problems that were linearly separable. Although Minsky was only able to demonstrate this limitation for single-layer networks, his work caused a great decline in the funds invested in research in this field, which did not recover until the mid-80s, when the backpropagation algorithm was created.
The perceptron was created in the likeness of the human visual system. Each perceived element is projected onto an area known as the projection area, which in turn is linked through neural connections to another area, known as the association area, whose connections determine the response given by the network, as shown in figure 3.29.
Figure 3.29. Biological basis of the classical perceptron.
The mathematical formulation of the perceptron is based on a feature vector x that can be transformed in a nonlinear manner to obtain a transformed feature vector Ф(x) (projection area). This vector is multiplied by a set of associated weights w (association area), so that the response of the network is a function of a linear combination of the input vector weighted by w. For simplicity's sake and without loss of generality, let us consider Ф(x) = x, extracting the transformation stage from the neural network model; the transformation Ф(x) can be considered outside the network model.
Fig. 3.30. Mathematical representation of the perceptron.
Figure 3.30 shows that each input vector is weighted by a weight w_i and by an offset b.
y(x) = w^T x + b    (3.50)
f(y(x)) = \begin{cases} +1 & \text{if } y(x) \ge 0 \\ -1 & \text{if } y(x) < 0 \end{cases}    (3.51)
Where the output of the function y(x) is positive, the feature vector is assigned to class A, and where negative, to class B. Setting y(x) equal to zero, the decision boundary between both classes corresponds to the hyperplane w^T x + b = 0. The perceptron therefore separates the classes through a hyperplane, so this classifier is useful for separating classes that are linearly separable.
The weight vector w that defines the discriminating hyperplane is established by minimising an error function that is zero when an observed feature vector is correctly classified and nonzero when an element is incorrectly classified. The offset b is included as a weight within the vector, w = [w; b], with the input extended as x = [x; 1], so that the offset is treated as the weight of an input that always has the value one.
E(w) = -\sum_{n=1}^{N} \left( \delta_n - f(y(x_n)) \right) (w^T x_n)    (3.52)
Where N is the number of elements in the sample set and δ_n the label (±1) of the class to which vector n belongs. If an element is correctly classified, it contributes no error; otherwise, its error is proportional to w^T x_n.
In order to find those weights w that minimise this error function, one applies an iterative
algorithm based on a descending gradient which modifies the association weights, thus
minimising the error produced in the classification. In this manner, the weights are interactively
modified throughout the training process according to the opposite direction of the error gradient
Chapter III Classification methods
Page 74
in accordance with the learning function (3.53), converging where the problem is linearly
separable in a finite number of iterations [ROSE_62].
w_{t+1} = w_t − η·∇E(w) = w_t + η·(δ_n − f(y(x_n)))·x_n        (3.53)
The fact that the problem is linearly separable does not imply the existence of a unique solution: the solution obtained shall depend on the initialisation of the weights [HERT_91].
Other training algorithms for the perceptron [FREU_98] have been developed, based both on the use of a kernel for the projection of the data into other dimensional spaces with greater separability, and on changes in the learning algorithm that allow the perceptron to converge to the hyperplane that maximises the margin of separation between classes.
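The training procedure (3.51)-(3.53) can be sketched in a few lines; the following Python/NumPy illustration uses hypothetical linearly separable data, with the offset folded into the weight vector as described above:

```python
import numpy as np

def step(z):
    """Step activation f from (3.51): +1 if z >= 0, otherwise -1."""
    return 1.0 if z >= 0 else -1.0

def train_perceptron(X, delta, eta=0.1, max_iter=100):
    """Perceptron training via the learning rule (3.53).

    X     : (N, D) array of feature vectors.
    delta : (N,) array of class labels in {-1, +1}.
    Returns w with the offset b folded in as its last component.
    """
    Xb = np.hstack([X, np.ones((len(X), 1))])  # constant input of value 1
    w = np.zeros(Xb.shape[1])
    for _ in range(max_iter):
        errors = 0
        for x, d in zip(Xb, delta):
            y = step(w @ x)
            if y != d:                 # misclassified: apply update (3.53)
                w += eta * (d - y) * x
                errors += 1
        if errors == 0:                # a whole epoch without updates
            break
    return w

# Two linearly separable point clouds (hypothetical toy data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
delta = np.array([-1.0] * 20 + [1.0] * 20)
w = train_perceptron(X, delta)
```

With linearly separable data the loop stops as soon as an epoch produces no updates, in line with the convergence result of [ROSE_62].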
If one observes the classification results of the classic perceptron on the test classes, one can see that there is no hyperplane (line) capable of discriminating between both classes, given that they are not linearly separable. In this case, the training of the perceptron calculates the hyperplane that minimises the error function.
Figure 3.31. Representation of the classification map of the perceptron a) 0 iterations, b) one
iteration, c) ten iterations.
2.3.2. Multilayer perceptron
As previously seen, one of the main limitations of the perceptron is that it is only capable of
modelling decision maps that are linearly separable. In order to correct this limitation, the
multilayer perceptron has two main peculiarities:
• This network has a hidden intermediate layer, which makes it possible to model decision regions as complex as desired by adding new neurons to this intermediate layer or even adding new hidden layers.
• The step function (3.51) implemented in the classic perceptron is substituted by continuous and differentiable functions implemented in each of the neurons. In this manner, the adjustment of the weights of the neural network during the training phase is made possible via methods based on gradient descent.
Fig. 3.32. Architecture of a multilayer perceptron or a backpropagation network.
Analogously to the classic perceptron, the output of each of the neurons of the intermediate layer
is defined as:
a_i(x, W_1) = f_i( Σ_{j=1}^{M} w_j·x_j )        (3.54)
Where f_i is the activation function of neuron i in the intermediate layer and w_j each of the j weights that connect the inputs with neuron i of the first layer.
Similarly, the output of the intermediate layer that is projected towards the output layer is defined
as (3.55):
y_i(a, W_2) = f_i( Σ_{j=1}^{M} w_j·a_j )        (3.55)
Where f_i is the activation function of neuron i in the last layer and w_j each of the j weights that connect the intermediate layer with neuron i of the final layer.
Minsky and Papert [MINSK_69] verified that a two-layer network could overcome many of the restrictions of the single-layer perceptron; however, they did not provide any solution to the problem of adjusting the weights through the hidden layers of the network in order to minimise its final error.
The solution to this problem did not come until the mid-80's when Rumelhart [RUME_86]
offered a solution to this problem. The main idea of this method consists in propagating the error
obtained in the final layer of the network through hidden intermediate layers towards the input
layer. In this manner, the weights of all layers are adjusted during the learning phase. Different
solutions to this learning method have been implemented in order to accelerate and strengthen the
convergence of the network, from classical solutions based on the gradient descent to other more
advanced that use the information of the second derivative and of the Hessian to accelerate
convergence [SNYM_05]. Other novel methods to train a multilayer perceptron are detailed in
[BISH_08].
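As an illustration of (3.54)-(3.55) and of the backpropagation of the output error towards the input layer, the following Python/NumPy sketch trains a one-hidden-layer network on the XOR problem; the sigmoid activations, layer sizes, learning rate and toy data are hypothetical choices, not taken from this Thesis:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    a = sigmoid(W1 @ np.append(x, 1.0))   # hidden layer, as in (3.54), bias folded in
    y = sigmoid(W2 @ np.append(a, 1.0))   # output layer, as in (3.55)
    return a, y

def train(X, T, hidden=4, eta=0.5, epochs=5000):
    W1 = rng.normal(0, 1, (hidden, X.shape[1] + 1))
    W2 = rng.normal(0, 1, (1, hidden + 1))
    for _ in range(epochs):
        for x, t in zip(X, T):
            a, y = forward(x, W1, W2)
            # Backpropagation: the output error is propagated back through
            # the hidden layer; both weight matrices descend the gradient
            # of the squared error.
            d_out = (y - t) * y * (1.0 - y)
            d_hid = (W2[:, :-1].T @ d_out) * a * (1.0 - a)
            W2 -= eta * np.outer(d_out, np.append(a, 1.0))
            W1 -= eta * np.outer(d_hid, np.append(x, 1.0))
    return W1, W2

def mse(X, T, W1, W2):
    return np.mean([(forward(x, W1, W2)[1][0] - t) ** 2 for x, t in zip(X, T)])

# XOR is not linearly separable, so a single perceptron fails; one hidden
# layer suffices for this problem (hypothetical toy data).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([0., 1., 1., 0.])
W1, W2 = train(X, T)
err = mse(X, T, W1, W2)
```

With a favourable initialisation the four XOR patterns are typically classified correctly; as noted above, plain gradient descent can also settle in a local minimum, which is precisely what the more advanced training methods try to avoid.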
The universal approximation theorem for neural networks shows that every continuous function that relates a set of real numbers in a given interval with a continuous output value in an interval of real numbers can be approximated by a multilayer perceptron composed of a single hidden layer, with a precision that grows with the number of neurons in that intermediate layer [BAKE_98], for certain types of activation functions. This makes the multilayer perceptron a universal approximator.
Given the increase in the complexity of the classifier and in the number of parameters that define it, over-training of the classifier can occur when the necessary number of learning elements is not available. Because of this, regularisation methods have been created that avoid the over-training of the system. Some of these methods are implicit in the network itself, such as reducing the complexity of the network, slow training, or reducing the number of iterations. Other methods include the addition of noise and weight decay, as in
this manner one avoids, on the one hand, falling into local minima while, on the other, the magnitude of the weights, which is associated with over-training phenomena, is reduced [BISH_95], [CHENG_94], [RIPL_96].
One of the main advantages of multilayer perceptrons lies in that, if they are correctly designed, they provide an estimate of the reliability of the undertaken classification, making it possible to discard those cases that are doubtful or borderline [JAIN_00].
If one observes the behaviour of this classifier, one can notice the decision maps calculated for
different numbers of neurons in the intermediate layers. One can see that the estimates of the
decision maps are more similar to those obtained by an optimal classifier (Fig. 3.8) when the
number of neurons in the intermediate layer is greater. However, one can begin to notice the
effect of over-training when having a high number of neurons.
Figure 3.33. Representation of the classification map of the multilayer perceptron a) two
intermediate neurons, b) ten intermediate neurons, c) thirty intermediate neurons.
In the same manner, taking advantage of the capacity of the multilayer perceptron as a universal
approximator, one can classify the set of data defined in figure 3.22, obtaining the following
results:
Figure 3.34. Representation of the classification map of the multilayer perceptron a) four
intermediate neurons, b) ten intermediate neurons, c) thirty intermediate neurons.
In this case, if one compares it with the classification map of the optimal classifier (Fig. 3.25),
one notices that a reduced number of intermediate neurons does not model the real probability
density function well. However, for a high number of neurons, given the small number of
elements used for training, an over-training occurs due to the fact that one does not have the
necessary examples for the correct adjustment of the classifier. The calculated decision map does
not correspond with the optimal classifier in areas where there is no presence of samples.
However, it achieves a very good classification of training samples.
2.4. Combination of classifiers
At times, it is necessary or advisable to use data from diverse classifiers. Some of the reasons that might lead to combining the information from several of these classifiers in order to achieve greater precision are listed in [JAIN_00]:
− The same problem can be tackled based on different classifiers, each one representing the
same problem through a totally different representation of it. This could be the case in the
identification of persons using a combination of classifiers that combine different sources
of biometric information.
− Where the available data samples are obtained under different conditions, cannot be taken simultaneously, or involve different variables, it may not be possible to train a single joint classifier that combines both data types at the same time.
− Classifiers using the same input data and obtaining a similar classification performance do not necessarily model the data in the same manner, and may model each of the classes in a totally different way.
In summary, one can have available different groups of features, training groups, different
classification methods and even different training methods, whose outputs can be combined in a
classifier which optimises the global performance of the classification.
There are two main ways of combining several classifiers:
− Parallel.
− Series or cascade.
Those techniques based on parallel processing assume that all the classifiers act at the same level,
being combined at a later time by a combining classifier that can even modify the weights of each
classifier.
Cascade based techniques execute the classifiers in a sequential manner, in such a way that as
they advance, the classifiers differentiate between a smaller number of classes, therefore being
much more specific.
A novel technique for the combination of classifiers is that based on boosting [FREU_96], [SCHA_90]. This technique is based on an adequate combination of weak classifiers, that is, classifiers whose results are not very different from those that would be obtained through a random classification choice. When they are combined, a very high classification performance is obtained. The advantage of these methods lies in that, as they are based on a very high number of weak classifiers, the effect produced by noise is diminished, as it only affects a certain number of weak classifiers and does not affect the final precision of the system.
As for combinators, there are basically two types:
• Those considered static, in which little or no training is necessary [TRES_95], such as those based on the mode, mean, median, sum, weighted sum, etc.
• Those that require training in order to tune the relation between the different classifiers and which allow a better performance [JACO_91]. The design of these adaptive combinators follows the same principles as a normal classifier, as stated in this chapter.
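As a minimal illustration of a static parallel combinator, the following Python sketch combines the label outputs of three hypothetical classifiers by majority vote (the mode):

```python
import numpy as np

def majority_vote(predictions):
    """Static parallel combination by the mode of the classifier outputs.

    predictions : (n_classifiers, n_samples) array of integer class labels.
    """
    preds = np.asarray(predictions)
    # For each sample (column), count the votes per label and keep the mode.
    return np.array([np.bincount(col).argmax() for col in preds.T])

# Label outputs of three hypothetical classifiers for five samples.
votes = np.array([[0, 1, 1, 0, 1],
                  [0, 1, 0, 0, 1],
                  [1, 1, 1, 0, 0]])
combined = majority_vote(votes)   # -> [0, 1, 1, 0, 1]
```

An adaptive combinator would replace the fixed mode with a trained rule, for example a weighted sum whose weights are themselves fitted as described above.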
3. Conclusions
This chapter has shown different metrics that measure, in a quantitative manner, the difference that exists between two given spectra. Modelling the different classes from a different perspective is necessary due to the inability of these measures to adapt to the changes that these spectra can undergo due to the lighting conditions, the geometry of the object
which they represent, small chemical changes in the materials or the variability of the class which
they represent.
Additionally, in this chapter a description has been provided on the mathematical basis and the
principles of different mathematical classifiers as an alternative to traditional metrics in order to
efficiently model the variability and discriminating features of the spectra. The use of these
classifiers is going to allow the creation of a suitable model that represents the inherent properties
of the materials to be classified, thus increasing their separability with elements that belong to
other classes and decreasing the separability between elements of the same class.
The Gaussian classifier stands apart from all the classifiers presented in this chapter. Its simplicity, together with the possibility of knowing its precise mathematical parametrisation, makes it possible to estimate the overlap between different classes, obtain a good generalisation and have available the necessary metrics to calculate the degree of similarity between the different Gaussian models.
These properties are key when selecting the Gaussian classifier as the most adequate classifier with which to validate the set of optimal descriptors and the methodology that maximises the classification.
Chapter IV
Feature extraction in hyperspectral vectors
One of the main problems that must be faced by the classification methods mentioned in the previous chapter is related to the high dimensionality inherent to hyperspectral data when used for classification tasks [FEATH_05, KUAN_05, PERK_05]. In order to solve this problem, different techniques are used that reduce the number of features employed as inputs to these classifiers, reducing to a great extent the Hughes phenomenon [MANO_04]. In order to do so, the information contained in the spectral bands must be decorrelated using traditional methods such as Principal Component Analysis (PCA) or others [FEAT_05, TATZ_05, RAJP_03, WANG_06], or through the selection of the bands that best discriminate the elements to be classified [WILL_04, MERC_02, RELL_02, RAMA_05].
Methods based on the decorrelation of the spectrum and on feature extraction try to achieve the greatest possible reduction of the feature space through mathematical transformations. These transformations, at times orthogonal, cause the transformed vectors to not be physically plausible or, even when they have an associated physical meaning, to be hard to interpret.
In order to avoid this problem, other methodologies based on the selection of features try to select the most relevant features of the spectrum without affecting their interpretability. Within this group, there are some algorithms based on expert systems, which find and characterise different absorption bands and differentiate diverse materials based on previously established tables. Nonetheless, this has the disadvantage that, in order to include new materials, these must be manually tabulated. Therefore, other methodologies select the subset of features that maximises the classification or the separability of the sample.
This chapter shows the different current techniques that extract adequate features for a correct spectral characterisation. Although the complete feature vector of the spectrum can be used to model a classifier without a previous feature extraction process, it is not the most efficient manner to do so.
In fact, the use of a high number of components and complex classifiers for the resolution of a
classification problem generates inferior results to those obtained via the use of an adequate
number of variables [PAI_07].
This is due, on the one hand, to the fact that the complexity of the distribution of the data generally grows exponentially with the dimensionality of the space, which makes the objective function to be modelled bear a greater complexity.
On the other hand, due to the increase in data dimensionality, there is an exponential increase of
the number of necessary samples in order to maintain the density of samples in the dimension of
the chosen space, a necessary parameter in order to achieve a correct estimate of the classifier's
parameters.
Hughes [HUGH_68] explains this phenomenon when comparing the existing relation between the
expected success of a classifier with its own complexity and with that of the number of samples
used for its training.
Fig. 4.1. Evolution of the performance of the classifier based on the number of features
In this way, a greater number of samples are necessary in order to obtain a correct classification
as the complexity of the classifier and the dimensionality of feature space increases. This
exponential growth in the number of necessary features to maintain the density of samples within
the space is known as the "Curse of Dimensionality” [BELL_61], and is the main cause for the
Hughes Phenomenon.
[Plot: classification ratio (%) versus number of components, for 2 to 40 components.]
In this manner, for a certain number of samples, the performance of a classifier increases at first
with the increase in features. Later, it decreases abruptly if its number is high due to not being
able to correctly estimate the probability distribution for the given dimension of feature space and
the number of samples.
On the other hand, spaces of high dimension have peculiarities that are difficult to grasp at first; for instance, the volume of a hypercube concentrates in its extremes as its dimension increases. This can be demonstrated by studying the ratio between the volume of a hypercube of side L and one of side L − ε: figure 4.2 shows that this fraction (4.2) tends towards one when the dimension of the feature space tends towards infinity.
V_hypercube = L^d        (4.1)

Ratio = (L^d − (L − ε)^d) / L^d        (4.2)
Fig. 4.2. Volume ratio between two hypercubes of length L and L- ε in relation to their
dimensions.
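The behaviour of (4.1) and (4.2) can be verified numerically; a short Python sketch for a unit side L = 1 and a shell of thickness ε = 0.01:

```python
import numpy as np

# Fraction of the hypercube volume (4.2) lying in the outer shell of
# thickness eps, for increasing dimension d.
L, eps = 1.0, 0.01
dims = np.array([1, 10, 100, 1000])
ratio = (L**dims - (L - eps)**dims) / L**dims
for d, r in zip(dims, ratio):
    print(f"d = {d:4d}  shell fraction = {r:.4f}")
```

Even though ε is only 1 % of the side, for d = 1000 almost the whole volume lies in the outer shell, which is precisely the concentration effect discussed in the text.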
This indicates that, for high dimensions, the greater part of the volume is concentrated on the surface of the hypercube. Due to these properties, a space of high dimension tends to be empty inside and can therefore be represented by a space of smaller dimension. The effect of the curse of dimensionality is limited through a correct reduction of features, thus reducing the number of representative samples needed to train the classifier. Furthermore, the complexity of the classifier is simplified, yielding a greater efficiency in generalisation, and the computational cost of the process is optimised by using a smaller number of features.
Therefore, a reduction in the number of highly correlated features that are present in
hyperspectral data is necessary. Along this line, there are two main methodologies to reduce the
set of features of a set of multidimensional data and to find a subspace of features that allows for
their better separability:
− Feature extraction: Extracting new features from the combination of existing features.
− Feature selection: Selecting those features that appear as most adequate for the tasks of
classification without modifying them.
1. Feature extraction
The process of feature extraction generally consists in projecting the data contained in the original feature space (the spectral vector L) onto another feature subspace of equal or smaller dimension.
Independently of the transformation method, the transformed subspace must be capable of maintaining the separability of the classes, reducing the noise or, in certain cases and depending on the needs, compressing the data while keeping in the reduced space the majority of the information contained in the original space.
Most of the techniques for feature extraction applied to hyperspectral data are based on the
premise that the observed spectrum is a consequence of the sum of several underlying physical
processes caused by the set of particles which shape the object. With this premise, the observed spectrum can be represented as the weighted sum of the pattern spectra associated to these underlying physical processes.
Generically, when the vectors that are going to make up the axes of the reduced subspace are known, one can express this linear transformation through a matrix A of [M×N] dimensions. Each column n contains one of the N vectors that form the basis of the transformation system; these vectors are going to be the basis of the representation of the system.
L_transformed = A^T·L        (4.3)
A compression in the transformation occurs if the number of base vectors in matrix A is less than
the original number of components. This compression must optimise some of the previously
mentioned criteria: separability, reduction of noise or compression.
Generally speaking, these base spectra, which are going to be used to represent the original
spectrum, are unknown both in number as well as in value. As mentioned earlier, the objective of
this transformation is to achieve a representation of the information contained in the higher
dimension space, in a subspace of smaller dimensionality, in a manner that it maintains or
increases the separability of classes, or that it maintains the maximum possible information of the
signal.
The most common approach to the problem of the feature extraction based on linear
transformation is PCA or Principal Component Analysis [HOTT_33]. This technique, also known
as the Karhunen-Loève transform, has been widely used in applications for dimensionality
reduction, compression with losses, feature extraction and visualisation, among others.
This method consists in an orthogonal transformation that calculates a new sequence of
uncorrelated vectors known as principal components. This transformation diagonalises the
covariance matrix in such a way that each of the obtained features does not bear a correlation
with the rest of the features in the transformed space.
This, to a certain extent, estimates those components that are not influenced by any other and that represent an adequate estimate of the principal vectors from which the observed spectra are formed.
The problem of principal components can be considered via different equivalent approaches, all
reaching the same mathematical formulation [GOME_02]:
− Search for an orthogonal transformation which provides a set of variables maximising the
variance of the transformed sample (Formulation of maximum variance).
− Search for an orthogonal transformation that provides a set of uncorrelated variables.
− Search for a straight line for which the square sum of the perpendicular distances to the
data is minimal (Formulation of minimum error).
Let us consider L as a vector of D components represented in the non-transformed space, and L_n a set of N such spectra taken as a sample. The aim is to project the data contained in the original space of dimension D into a subspace of dimension M in such a way that the variance in the transformed space is maximised.
L̄ = (1/N)·Σ_{n=1}^{N} L_n        (4.4)
Where L̄ is the mean of the set of spectra L_n and û₁ the unit vector that defines the direction of projection, one can define the variance of the data sample over û₁ as:
(1/N)·Σ_{n=1}^{N} (û₁^T·L_n − û₁^T·L̄)² = û₁^T·S·û₁        (4.5)
Where S is the covariance matrix of the set of sample data defined by:
S = (1/N)·Σ_{n=1}^{N} (L_n − L̄)·(L_n − L̄)^T        (4.6)
One maximises the projected variance (4.5) with respect to the vector û₁, under the restriction that the norm of that vector is one (4.7), in order to avoid equivalent solutions with different norms:
û₁^T·û₁ = 1        (4.7)
Maximising this term based on Lagrange multipliers, one has the following target function:
f = û₁^T·S·û₁ + λ₁·(1 − û₁^T·û₁)        (4.8)
Differentiating the target function with respect to û₁ and setting it equal to zero, one finds that the maximum is obtained for:
S·û₁ = λ₁·û₁        (4.9)

û₁^T·S·û₁ = λ₁        (4.10)
In this way, the variance is maximised when û₁ is the eigenvector associated with the highest eigenvalue of the covariance matrix. Defining successive directions that maximise the variance, orthogonal to those previously defined, one obtains a set of eigenvectors û₁, û₂, û₃, ..., û_M associated with the first M eigenvalues of the covariance matrix in descending order, λ₁, λ₂, λ₃, ..., λ_M. The transformation to this new feature space is given by:
L_transformed = V^T·L = [û₁, û₂, û₃, ..., û_M]^T·(L − L̄)        (4.11)
S_transformed = | λ₁   0   ...   0  |
                |  0   λ₂  ...   0  |
                | ...  ... ...  ... |
                |  0   0   ...  λ_M |        (4.12)
It is noteworthy that the covariance matrix in the transformed space is the diagonal matrix formed by the different eigenvalues, which, besides maximising the variance of the data, achieves the extraction of a set of uncorrelated features.
On the other hand, the variance λ associated with each eigenvector û defines the quantity of information contained in that vector. As a general rule, the eigenvectors associated with higher values of λ correspond to discriminating features, and the eigenvectors of lesser variance are associated with noise.
A criterion for selecting the number of eigenvectors that are going to form part of the reduced space is that of keeping the M < D eigenvectors associated with the M highest eigenvalues. The percentage of information contained in those eigenvectors is defined through τ as:
τ = Σ_{i=1}^{M} λ_i / Σ_{i=1}^{D} λ_i        (4.13)
Therefore, the first components of the transformed space represent the greater part of the
variability of the system, thus being able to reconstruct, in a precise manner, each original vector
dispensing with the components associated to the eigenvectors which contain less information.
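The derivation (4.4)-(4.13) can be condensed into a short numerical sketch; the correlated toy spectra below are hypothetical, and NumPy's symmetric eigensolver handles the eigenproblem (4.9):

```python
import numpy as np

rng = np.random.default_rng(2)
# 200 hypothetical spectra with three strongly correlated bands.
Lns = rng.normal(size=(200, 3)) @ np.array([[1.0, 0.9, 0.10],
                                            [0.0, 0.3, 0.05],
                                            [0.0, 0.0, 0.01]])

L_bar = Lns.mean(axis=0)                      # mean spectrum (4.4)
S = np.cov(Lns, rowvar=False, bias=True)      # covariance matrix (4.6)
lam, U = np.linalg.eigh(S)                    # solves S·u = lambda·u (4.9)
order = np.argsort(lam)[::-1]                 # eigenvalues in descending order
lam, U = lam[order], U[:, order]

M = 2                                         # keep M < D components
V = U[:, :M]
L_t = (Lns - L_bar) @ V                       # transformed data (4.11)
tau = lam[:M].sum() / lam.sum()               # retained information (4.13)
```

For these strongly correlated bands, the first two components retain nearly all the variability, and the covariance of the transformed data is diagonal as in (4.12).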
However, the analysis of principal components has several limitations [CHERI_03]. The first lies in that this method is oriented towards a feature reduction that maximises the variance of the transformed data, aligning the transformed axes for its maximisation. At times, these axes do not carry the discriminating information between classes; that is, PCA maximises the variance rather than the separability.
Fig. 4.3 Axis defined by the principal eigenvector calculated via the PCA method.
For certain distributions, such as that seen in figure 4.3, the transformation made by the PCA does not maximise the separability between the two classes present.
In contrast with the Principal Component Analysis, the Linear Discriminant Analysis, also known
as Canonical Analysis or Fisher's discriminant [BISH_07], creates a set of transformation vectors
which maximise this separability.
For this, the measure of separation between classes is defined as the distance between means,
corrected with the value of the variances of the classes in the transformed space. In this way, and
intuitively, one can observe that the separability of classes shall be greater as there is more
separation between the means of the classes, weighted by the inverse value of the variances of the
classes in that transformed space (4.14).
J(w) = (m₂ − m₁)² / (s₁² + s₂²)        (4.14)
Where s_k² is the variance calculated for each of the classes in the transformed space:
s_k² = Σ_{n∈C_k} (y_n − m_k)²        (4.15)
And w the transformation vector that converts the original vectors x_n into transformed vectors y_n:
y_n = w^T·x_n        (4.16)
Therefore, the aim is to find the transformation vector w that maximises J(w) and, with this, maximises the separability between classes. Rewriting J(w) in terms of w:
J(w) = (w^T·S_B·w) / (w^T·S_w·w)        (4.17)
Where S_B is the inter-class covariance matrix, defined by:
S_B = (m₂ − m₁)·(m₂ − m₁)^T        (4.18)
And S_w is the total intra-class covariance matrix, defined by the sum of the covariance matrices of each of the classes:
S_w = Σ_{k=1}^{2} Σ_{n∈C_k} (y_n − m_k)·(y_n − m_k)^T        (4.19)
Differentiating J(w) with respect to w, one obtains that J(w) is maximised when:
(w^T·S_B·w)·S_w·w = (w^T·S_w·w)·S_B·w        (4.20)
Fig. 4.4. Axis defined by the vector w obtained via the LDA method.
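For the two-class case, a known consequence of the stationarity condition (4.20) is that, since S_B·w is always proportional to (m₂ − m₁), the maximising direction can be taken as w ∝ S_w⁻¹·(m₂ − m₁). A Python/NumPy sketch with hypothetical two-dimensional data:

```python
import numpy as np

rng = np.random.default_rng(3)
X1 = rng.normal([0.0, 0.0], [1.0, 0.3], (100, 2))   # class 1 (hypothetical)
X2 = rng.normal([2.0, 1.0], [1.0, 0.3], (100, 2))   # class 2 (hypothetical)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
# Intra-class scatter summed over both classes (the original-space
# analogue of (4.19)).
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w = np.linalg.solve(Sw, m2 - m1)    # direction maximising J(w) in (4.17)
w /= np.linalg.norm(w)

y1, y2 = X1 @ w, X2 @ w             # projections y_n = w^T·x_n (4.16)
J = (y2.mean() - y1.mean()) ** 2 / (((y1 - y1.mean()) ** 2).sum()
                                    + ((y2 - y2.mean()) ** 2).sum())
```

The value J attained by this direction is at least as large as that of any other projection direction, such as the coordinate axes of figure 4.4.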
This method can be generalised for K > 2 classes. In this generalisation, two new concepts are included: instead of the variance of each class, a mixture of dispersion matrices within the classes is used, and instead of the separation of means, a dispersion matrix between classes, which takes into account the dispersion of all the classes among themselves, is used. Thus S_B and S_w become:
S_B = Σ_{k=1}^{K} N_k·(m_k − m)·(m_k − m)^T        (4.21)
S_w = Σ_{k=1}^{K} Σ_{n∈C_k} (y_n − m_k)·(y_n − m_k)^T        (4.22)
From these expressions, a scalar is constructed that grows when the inter-class covariance is high and the intra-class covariance is low, so that its maximisation maximises the separability of the classes. There are numerous approaches that obtain these properties; one example is that proposed by Fukunaga [FUKU_90]:
J(w) = Tr(S_w⁻¹·S_B)        (4.23)
Expressing J(w) as a function of the projection matrix w formed by the direction vectors that define the transformation, one would have:
J(w) = Tr( (w·S_w·w^T)⁻¹·(w·S_B·w^T) )        (4.24)
The transformation matrix w can be calculated by solving the generalised eigenvalue problem. The matrix w obtained in this manner maximises the expression J(w) and contains K − 1 non-orthogonal vectors that optimise the separability between classes.
Through this method, one can extract features that efficiently discriminate between the different classes. On the other hand, one can only extract K − 1 features, K being the number of classes. Although these features may be optimal if one later wants to use a linear classifier, the method does not extract additional features that could be discriminating when using other types of classifiers.
Another additional problem lies in the high number of samples necessary to correctly calculate the intra-class and inter-class covariance matrices, S_w and S_B respectively. Unlike the PCA case, it is necessary to estimate the covariance matrix of each of the classes, and therefore a high number of samples must be available for each of the existing classes.
The subspace calculated by these methods does not allow the linear separation of certain data distributions, as for example those whose shape is clearly nonlinear or those which have similar means. Figure 4.5 shows cases in which the axes obtained through LDA do not allow an optimal linear discrimination of the classes. A detailed description of the limitations of LDA in the domain of hyperspectral imaging can be found in [PRAS_07].
Fig. 4.5. Examples of LDA limitations
Other methods, such as the Projection Pursuit [FRIE_87] and the Independent Component
Analysis [COMO_94][BELL_95][DJOU_97] are appropriate for the feature extraction in non-
Gaussian data distributions. These techniques have been used in processes of blind extraction of
components.
Other methodologies extract nonlinear features from data. One of these methods that is directly
based on the Principal Component Analysis (PCA) is that known as Kernel PCA [HAYK_99]
[SCHO_98]. The basic idea of this method is to project the input data in a new space of features
F, modelled via a nonlinear function Ф (kernel), that is usually defined by a polynomial of order
p or by a Gaussian kernel. In this way, the eigenvectors and eigenvalues of the projection space are calculated instead of those of the original space. The selection of the function that models the kernel depends on the application and is still an object of study [JAIN_00].
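A sketch of this idea with a Gaussian kernel follows; rather than computing Ф explicitly, the centred Gram matrix of the samples is diagonalised (the toy data and the kernel width gamma are hypothetical choices):

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Kernel PCA sketch with a Gaussian kernel; gamma is a free choice."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)               # Gram matrix K_ij = k(x_i, x_j)
    N = len(X)
    one = np.full((N, N), 1.0 / N)
    # Centre the data in the feature space F without computing Phi explicitly.
    Kc = K - one @ K - K @ one + one @ K @ one
    # The eigen-decomposition of the centred Gram matrix replaces that of
    # the covariance matrix in the original space.
    lam, A = np.linalg.eigh(Kc)
    order = np.argsort(lam)[::-1][:n_components]
    lam, A = lam[order], A[:, order]
    A = A / np.sqrt(np.maximum(lam, 1e-12))   # normalise projection vectors
    return Kc @ A                              # projections of the samples

# Hypothetical data: a ring around a central blob, which no linear
# transformation of the original space can separate.
rng = np.random.default_rng(5)
theta = rng.uniform(0.0, 2.0 * np.pi, 30)
ring = 2.0 * np.c_[np.cos(theta), np.sin(theta)]
blob = rng.normal(0.0, 0.3, (30, 2))
Z = kernel_pca(np.vstack([ring, blob]), n_components=2, gamma=0.5)
```

In the projected space the two groups become much easier to separate, which is the motivation for working in the feature space F.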
Neural networks, for their part, integrate mechanisms for feature extraction and classification [JAIN_00]. In fact, each of the outputs of the hidden layer of the network can be interpreted as a new and, usually, nonlinear feature. In this manner, multilayer perceptrons can be used as optimal feature extractors [LOWE_91]. The networks used in [FUKU_83] [CUN_89] are in fact feature extraction filters for two-dimensional images, adjusted via the training of the network with the existing data in order to maximise the classification.
The self-organising maps, or Kohonen maps [KOHO_95], can also be used for the extraction of nonlinear features. The functioning of these networks implies the presentation of
Class 1
Class 2
X1
X2 Class 1
Class 2
X2
X1
different patterns to the network, in such a way that it adapts to the presented patterns and generates a set of characteristic nodes or vectors.
Through this learning, data can be categorised (clustered), automatically generating data clusters that are capable of quantizing the input vectors. After training, the weights of each of the network's neurons tend to represent those input patterns that are near them in the original feature space.
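The adaptation process described above can be sketched as a minimal one-dimensional Kohonen map; the training schedule (decaying learning rate and shrinking Gaussian neighbourhood) follows common practice, and all names and parameter values are illustrative assumptions.

```python
import numpy as np

def train_som(data, n_nodes=10, epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Train a 1-D self-organising map: each node's weight vector is
    pulled toward the patterns for which it (or a grid neighbour) is
    the best-matching unit."""
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((n_nodes, data.shape[1]))
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)              # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighbourhood width
        for x in data[rng.permutation(len(data))]:
            # best-matching unit: the node closest to the input pattern
            bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
            # Gaussian neighbourhood over the 1-D grid of node indices
            grid_dist = np.arange(n_nodes) - bmu
            h = np.exp(-(grid_dist ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights
```

After training, the node weights act as the characteristic vectors mentioned above, quantizing the input space.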
2. Feature selection
One of the main problems of feature transformation lies in that the employed techniques transform the features in such a way that the physical meaning of the different variables is hidden.
The techniques based on feature selection reduce the redundancy existing in the original data without modifying the physical meaning of the variables. They also reduce the amount of data to be used by the system, thus considerably reducing the computational cost of the process and the acquisition time, as well as the cost of data storage or the price of the sensor to be used.
By maintaining the physical meaning of the variables, the later extraction of knowledge from the classifier is made possible. This converts the knowledge extracted by the classifier into useful information for increasing the understanding of the problem at hand, or for generating logical and comprehensible rules based on the results obtained by the classifier.
The techniques for feature selection can be divided into two groups: those which make the selection by estimating the variables that provide the greatest advantages in the classification, and those which use previous knowledge to extract the most relevant features.
2.1. Automatic feature selection.
These methods try to select those spectral bands which maximise previously defined criteria. Generally, these criteria are based on one of two premises:
− An increase in the classification rate.
− The obtaining of a greater separability between classes.
This allows for the selection of the subset of variables that most efficiently improves the separability between classes or that yields the greatest increase in the classification rate. Obtaining this optimal subset of features allows the later extraction of information on the importance of the selected variables in relation to the problem to be resolved, thus inferring new knowledge on the causes that can produce a certain response.
Given the great quantity of existing variables, it is not feasible to search for the combination that maximises the classification performance. For this reason, exhaustive and systematic search methods that analyse each and every one of the possible feature subsets are inappropriate for this task [GOME_02]. The search cost is enormous, even for a small number of features [COVE_77].
Diverse non-exhaustive search techniques (exponential, sequential and random algorithms) are used to reduce this high computational cost while locating the subset of variables that optimises the chosen criterion. However, the only alternative search method that remains optimal while eliminating the need for an exhaustive search is that known as branch and bound [NARE_77], provided that the criterion function to be maximised is monotonically increasing, that is, it must always grow when a new feature is added.
Other methodologies do not find the optimal subset, but increase the speed at which a subset is calculated, which is important in applications with a great number of features [JAIN_97]. The fact that a feature obtains the best classification ratios in an individual manner [COVE_74] does not mean that the optimal group of features has to include those features that obtain the best individual results.
For this, methods are needed that take into account the dependencies between the different variables. Different algorithms are proposed for this:
− Forward selection: In a sequential manner, starting from an empty set, the features that most increase the target function are added. The limitation of this method is the inability to eliminate variables that become obsolete after the addition of new features.
− Backward selection: In a sequential manner, starting from the complete set of features, the features whose elimination causes the smallest decrease in the target function are progressively removed. The limitation of this method, analogous to the previous one, lies in the impossibility of re-including previously eliminated variables.
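A minimal sketch of the forward selection procedure just described, assuming a simple two-class separability criterion as the target function; both the function names and the criterion are illustrative choices, not taken from the cited references.

```python
import numpy as np

def forward_selection(X, y, score, k):
    """Greedy forward selection: starting from the empty set, repeatedly
    add the feature that most increases the target function `score`."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        best_f, best_s = None, -np.inf
        for f in remaining:
            s = score(X[:, selected + [f]], y)
            if s > best_s:
                best_f, best_s = f, s
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

def fisher_score(Xs, y):
    """Toy criterion: between-class over within-class scatter
    for a two-class problem (labels 0 and 1)."""
    a, b = Xs[y == 0], Xs[y == 1]
    between = np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2)
    within = a.var(axis=0).sum() + b.var(axis=0).sum()
    return between / (within + 1e-12)
```

Backward selection is the mirror image: start from the full feature set and repeatedly remove the feature whose removal hurts the criterion least.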
Other more sophisticated techniques include algorithmic improvements that combine both aforementioned methods in order to eliminate their limitations. These methods re-add previously eliminated variables, or eliminate previously added variables that no longer improve the current feature subset [PUDI_94].
− Selection plus l minus r: This technique sequentially adds l features and later eliminates r features. Analogously, one can begin with the complete set of features, eliminate l features and later add r. A drawback of this method is that r and l must be chosen in advance.
− Floating search methods: An extension of the earlier method that eliminates the limitation of having to choose the values of r and l. For each added feature, this method eliminates as many features as necessary, as long as each elimination improves the target function. Similarly, starting from the whole set of features, as many features as necessary are added for each eliminated feature, so long as the classification improves.
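The floating search idea can be sketched as follows, again with an assumed two-class separability criterion as the target function; this is a simplified variant of sequential floating forward selection, not the exact algorithm of [PUDI_94].

```python
import numpy as np

def class_separation(Xs, y):
    """Toy criterion: squared distance between class means over the
    summed within-class variances (two classes assumed)."""
    a, b = Xs[y == 0], Xs[y == 1]
    between = np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2)
    within = a.var(axis=0).sum() + b.var(axis=0).sum()
    return between / (within + 1e-12)

def floating_forward_selection(X, y, score, k):
    """Simplified SFFS sketch: after each forward inclusion, features
    other than the one just added are conditionally excluded while the
    exclusion improves the criterion, so l and r never need fixing."""
    selected = []
    while len(selected) < k:
        # forward step: add the single best feature
        candidates = [f for f in range(X.shape[1]) if f not in selected]
        best = max(candidates, key=lambda f: score(X[:, selected + [f]], y))
        selected.append(best)
        # backward (floating) steps: drop features while that helps
        improved = True
        while improved and len(selected) > 2:
            improved = False
            current = score(X[:, selected], y)
            for f in selected[:-1]:  # never drop the feature just added
                trial = [g for g in selected if g != f]
                if score(X[:, trial], y) > current:
                    selected = trial
                    improved = True
                    break
    return selected
```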
As a general conclusion, one can state that this last method is almost as effective as the branch and bound method while requiring a smaller computational cost [JAIN_00].
Irrespective of the chosen search criterion, one must quantitatively evaluate the improvement obtained by using the chosen subset of features.
For criteria based on the improvement of the classification rate, a classifier is built with the subset of chosen features. The effect of this feature reduction is evaluated in relation to other subsets, in order to select the subset which obtains the best rate of correct classifications.
The methods based on separability criteria presuppose a statistical model to which the different classes fit for each of the feature subsets. Once these models are created, the statistical separation between the classes which compose them is calculated. These methods have the advantage of a smaller computational cost, but require a correct estimation of the distribution or statistical model of the data in order to obtain efficient results.
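As an example of such a separability criterion under an assumed Gaussian class model, the following sketch computes the Bhattacharyya distance between two classes estimated from their samples; the choice of this particular measure is illustrative.

```python
import numpy as np

def bhattacharyya(Xa, Xb):
    """Bhattacharyya distance between two classes, each modelled as a
    multivariate Gaussian estimated from its samples."""
    ma, mb = Xa.mean(axis=0), Xb.mean(axis=0)
    Ca = np.cov(Xa, rowvar=False)
    Cb = np.cov(Xb, rowvar=False)
    C = 0.5 * (Ca + Cb)          # pooled covariance
    d = mb - ma
    # Mean-separation term
    term1 = 0.125 * d @ np.linalg.solve(C, d)
    # Covariance-mismatch term (via log-determinants for stability)
    _, logdet_C = np.linalg.slogdet(C)
    _, logdet_a = np.linalg.slogdet(Ca)
    _, logdet_b = np.linalg.slogdet(Cb)
    term2 = 0.5 * (logdet_C - 0.5 * (logdet_a + logdet_b))
    return term1 + term2
```

A feature subset that yields a larger distance between the class models is, under this criterion, preferred, without ever training a classifier.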
2.2. Selection and extraction of known discriminant features.
Within this group, one can find those techniques based on the selection of known spectral features, either caused by some type of known physical phenomenon or because they constitute a discriminant feature tabulated for certain types of materials (e.g. the absorption bands of the C-H bond, the chlorophyll absorption bands...).
The specific absorption in a band (or neighbourhood of bands, where the absorption is produced in several consecutive bands) can be caused by the presence of certain chemical elements, ions, their ionic charges, or partly by the crystalline structure of the components.
The algorithms shown in this section use the information on these absorption phenomena and their causes to directly extract the characteristics that are most suitable for solving a specific problem. It should be noted that these methods do not correspond to pure feature selection, as the features are not directly selected, but rather extracted using previous empirical knowledge that entails a previous transformation of the spectrum.
There are different approaches to these methods; some of them reduce the spectrum to a certain number of features, each representing one of the chemical components present in the spectra to be analysed [WIN_07]. In this manner, a spectral image that contains elements made up of a mixture of three pure polymers is reduced to three components, each containing the signature of one of the polymers.
When no information is available on the pure variables, information is extracted from the second derivative of the reflectance spectrum. This provides information on the different absorptions that are produced throughout the spectrum, obtaining the absorption signatures associated with it.
The absorption spectra of different materials have been tabulated in different databases [ASTER_98] [CLARK_93] that include the model reflectance spectra of different materials in a pure chemical state. In this manner, information is available on the absorption bands of the different chemical components that are of interest when classifying a material.
Fig. 4.6. Characterisation of an absorption band
Though this approach requires detailed knowledge of the elements to be analysed, it makes possible the selection of those features which are most suited to resolving a specific problem [CLARK_95]. In this manner, one can manually select the features which best discriminate between the elements to be classified, bearing in mind the previous existing knowledge. Knowing the characterisation of these absorptions through diverse parameters (wavelength, absorption intensity, absorption width), one can detect their presence in the spectrum and use them as descriptive characteristics of the material to be classified.
However, although these methodologies do not destroy, but rather foster, the physical interpretation of the spectral features, they require previous knowledge of the spectra to be analysed. Therefore, they can not be used as blind classifiers that establish which are the
important characteristics, nor as classifiers from which later knowledge can be extracted based on the additional information they provide.
3. Conclusions
The use of statistical or other types of classifiers to model the classes defined by these spectra can not be undertaken directly, due both to the great quantity of training data necessary and to the high complexity of the classifier required by the high dimensionality of the input data.
This chapter has reviewed classical methods of feature extraction, such as those based on Principal Component Analysis and Linear Discriminant Analysis, observing some of their limitations and stressing that these methods obscure the information contained in the transformed variables, not allowing for an easy analysis of the obtained results.
The advantages of the methods of automatic feature selection have been highlighted, in so far as they select those variables which improve the classification and discrimination between classes without obscuring the physical meaning of the extracted features, also mentioning the limitation that each of the bands can only be selected separately.
Lastly, the advantages of the methods based on previous knowledge of the spectrum have been shown, which extract and model certain previously known absorption features. However, these methods have the disadvantage of requiring previous knowledge of the materials to be classified, something that is not always available.
It is therefore necessary to have a feature extraction/selection method which does not blur the physical meaning of the extracted variables and which automatically models and selects the absorption features present in the spectra, thus allowing for their correct classification.
Chapter V
Extraction of spectral features based on fuzzy sets
bioinspired by the human visual system
Chapter V Extraction of spectral features based on fuzzy sets
Page 102
Earlier chapters have described the problem of classifying luminous spectra of high dimensionality. Feature reduction processes are necessary in order to minimise the problems related to the high complexity of the classifier and to the not always sufficient number of training elements (the Hughes phenomenon).
For the case of hyperspectral images, feature extraction techniques efficiently extract, from a set of data, new features that have a high discriminating power. However, these methods require previous training, and the extracted discriminating features are dependent on the training data set.
Features extracted in this manner, despite their discriminating power, do not represent specific physical variables and therefore can not be easily interpreted. Furthermore, as these features depend on the training data set, they vary when any new element or class is added to the sample. This effect makes these variables inadequate for cases in which new classes are going to be added, or in which the existing ones are going to change progressively over time.
On the other hand, an automatic selection of features allows the selection of those variables which provide the best classification results without obscuring the physical meaning of the variables used. However, this methodology is also dependent on the classes to be modelled, and can eliminate variables which, although lacking discriminating power for the studied classes, are crucial for discrimination when new classes are added.
Methods based on previous knowledge of the classes to be analysed can localise and quantify previously known absorption bands that identify those classes. However, these methods bear the same disadvantages as the earlier ones, as they require previous knowledge of the classes to be modelled. Furthermore, the incorporation of new classes into the system creates the possibility that the detected and modelled absorption bands may not be suitable for their discrimination.
Given these limitations, it is necessary to choose a set of characteristics that comply with the
following conditions:
− That they reduce the original feature space.
− That they have an adequate discriminating power.
− That they do not depend on a previous training, so that they do not vary in their definition
when adding new classes to the system or when modifying the existing classes.
− That their discriminating power is based on a physical basis.
− That they maintain the physical meaning of the variables.
− That they include the advantages of the methods based on the localisation of absorption
bands without requiring a previous knowledge of the classes to be modelled.
− That they be generic, without depending on either the type of application or the training set.
In this chapter, a definition and a proposal are made for the adaptation of the theory of fuzzy sets, as proposed by Zadeh in the formulation of his theory of fuzzy logic [ZADE_65], to the extraction of discriminating features from hyperspectral pixels.
This proposed adaptation behaves similarly to the feature extraction process performed by the human eye. The defined fuzzy sets correspond to a "virtual cone", sensitive to a specific area of the luminous spectrum in a manner similar to the human eye, taking advantage of the existing correlation between near spectral bands.
1. Definition of fuzzy sets
The fuzzy sets defined by Zadeh [ZADE_65] offer an extension of the classical definition of a set. In the classical definition, an element can either belong or not belong to a specific set. The fuzzy extension of set theory adds the concept of grade of membership to a specific set. In this manner, the issue is not whether an element belongs or does not belong to a set; rather, the grade of membership to a set is expressed in the interval [0,1], the unit indicating full membership to the set and the zero value its non-membership.
In this manner, one can define fuzzy features capable of expressing concepts which are not correctly expressed through the classical definition of sets, such as "slow", "agile", "quick", "young"...
Fig. 5.1. Separation of the different stages of life through classical sets.
If one needs to express concepts such as child, adult or elder based on age, and these are defined through classical sets, as shown in figure 5.1, one notices that there are no differences between elements of the same group: a month-old newborn is classified in the same manner as an eighteen-year-old, and a person of seventy-five is considered just as elderly as one of ninety-five. Likewise, elements situated on the border between two groups could change their membership abruptly, without their defining feature (in this case, age) changing considerably.
However, the use of fuzzy sets to define these concepts takes into account the grade of membership to each of the sets. Figure 5.2 shows the grade of membership expressed through triangular functions. These membership functions represent the grade of membership to each of the sets: a ninety-year-old person will have a greater grade of membership to the concept "elder" than one of sixty years.
Fig. 5.2. Separation of the different life stages through fuzzy sets.
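The triangular membership functions of figure 5.2 can be sketched as follows; the breakpoint ages used here for "child", "adult" and "elder" are illustrative assumptions, not values taken from the figure.

```python
def triangular(x, a, b, c):
    """Triangular membership: rises from a to a peak at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Illustrative breakpoints for the three life-stage fuzzy sets
def child(age):
    return triangular(age, -1.0, 0.0, 25.0)

def adult(age):
    return triangular(age, 15.0, 40.0, 70.0)

def elder(age):
    return triangular(age, 60.0, 100.0, 140.0)
```

With these definitions, a ninety-year-old indeed has a greater grade of membership to "elder" than a sixty-year-old, while elements near a border share graded membership in two sets instead of switching abruptly.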
In the same manner, one can define membership functions with different geometries, so as to represent the desired concept adequately. Examples of such functions are the Gaussian, the trapezoidal...
2. Spectral fuzzy sets
The theory of fuzzy sets can be applied to the extraction of discriminating features from hyperspectral pixels. Following the same philosophy, one can divide a spectrum into a specific number of sets, so that each of these sets represents a specific range of the spectrum.
Assuming that the elements to be measured are not in an excited gaseous state, which would cause emission or absorption in very specific bands of the spectrum, one can state that there is a high correlation between adjacent bands. This correlation entails that the absorption bands that define the materials are not represented by a single band, but rather defined by a set of bands which are adjacent to one another.
In order to model this correlation, each wavelength of the spectrum must be associated with its neighbouring wavelengths, so as to take into account the values of the adjacent frequencies. A way to do this is to associate each element of the spectrum with the diverse ranges into which the spectrum can be divided. In order to avoid that close elements belong to two different groups, a proposal is made to define the different sets in a fuzzy manner, as described in the following.
Let L be the vector representation of a spectrum defined by the intensity response at M wavelengths, as shown in figure 5.3.
L = [L1, L2, ..., LM]^T    (5.1)
Fig. 5.3. Graphical representation of a hyperspectral pixel based on its wavelength.
The different ranges of this spectrum are represented by K fuzzy sets. In this way, each wavelength of the spectrum has a specific grade of membership to one or several fuzzy sets.
Fig. 5.4. Representation of the different fuzzy sets into which the spectrum is divided.
Figure 5.4 shows the division of the spectrum into K fuzzy sets, showing the grade of membership to each of them through their respective K triangular membership functions. Although for simplicity's sake triangular membership functions are assumed, these can have other shapes, the triangular and the Gaussian being the most common; in a generic manner, a spectrum can be represented via any type of shape.
Observing the sensitivity of the cones of the human visual system (see Chapter II), one notices that this sensitivity can be approximated through the definition of Gaussian or triangular fuzzy sets at the appropriate wavelengths, as shown in figure 5.5. In this manner, using three fuzzy sets sensitive to red, green and blue, one can emulate the way in which the human eye sees.
Fig. 5.5. Absorption frequencies of the different types of cones present in the human visual
system and their comparison with the sensitivity of different triangular fuzzy sets.
The division of the spectrum into equally spaced triangular fuzzy sets, as shown in figure 5.6,
generically associates a wavelength with a specific region of the spectrum. In this manner, a sort
of multi-spectral eye is generated in which each defined fuzzy set has a similar sensitivity to that
which a human ocular cone sensitive to those wavelengths would have.
Fig. 5.6. Definition of fuzzy sets based on triangular shapes.
Figure 5.6 shows the division of a spectrum into K fuzzy sets defined by triangular membership functions Mfi. These functions are triangularly shaped in the proximity of their central wavelength λCi and have null value in other areas of the spectrum, as defined in equation (5.2), where λCi is the central wavelength and D is the distance between two consecutive central wavelengths.
Mfi(λ) = 1 − |λ − λCi| / D,    if λCi − D < λ < λCi + D
Mfi(λ) = 0,    otherwise    (5.2)
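Equation (5.2) and the equal spacing of the central wavelengths can be sketched as follows for a sampled spectrum; the function names are illustrative.

```python
import numpy as np

def membership(wavelengths, centre, D):
    """Equation (5.2): triangular membership of each wavelength to the
    fuzzy set centred at `centre` with half-width D."""
    m = 1.0 - np.abs(wavelengths - centre) / D
    return np.where(m > 0.0, m, 0.0)

def fuzzy_centres(lam_min, lam_max, K):
    """K equally spaced central wavelengths; with D equal to the
    spacing, each wavelength gets a non-zero membership in at most
    two adjacent sets, and the memberships sum to one."""
    centres = np.linspace(lam_min, lam_max, K)
    D = centres[1] - centres[0]
    return centres, D
```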
This way, each point in the spectrum has a grade of membership associated with each of the fuzzy sets into which the spectrum is divided: each element λi has a grade of membership for each set, defined by the value at that point of the membership function Mfi associated with that set.
By defining the membership functions in this manner, the intensity value L(λi), associated with wavelength λi, has a non-zero membership to the two adjacent fuzzy sets and a grade of membership equal to zero in the rest, as shown in figure 5.7. This way, one can establish that a certain wavelength belongs to each of the fuzzy sets of the spectrum with a certain grade.
Fig. 5.7. Membership of the wavelength λi to the different fuzzy sets.
Depending on their size and position, the fuzzy sets so described define different parts of the spectrum. Broad fuzzy sets (high D) can define spectral ranges such as "visible", "near-infrared" or "mid-infrared", while sets with a smaller D parameter define other more specific ranges, such as "oranges", "yellows" or "violets", depending on the wavelengths that they
represent. When a hyperspectral pixel in the visible range is available, selecting three fuzzy sets to define it would be the equivalent of taking its RGB image.
Given the high correlation between adjacent wavelengths, one can represent the original spectrum and describe it based on the behaviour of the different fuzzy sets.
In order to model this behaviour, the Energy of each of the fuzzy sets into which the spectrum is divided is defined by weighting the intensity of each element L(λ) of the spectrum with the membership function associated with each set. This way, the Energy of each fuzzy set for a given spectrum is defined by:
Ei = ∫ Mfi(λ) · L(λ) · dλ,    integrated over λ from 0 to λM    (5.3)
The Energy so defined indicates the intensity of the spectrum in that fuzzy set, in the same manner as perceived by a human cone. Depending on its position and size, one can measure concepts such as the intensity in the "visible" or "near-ultraviolet" ranges, or the intensity in hues such as "red" or "violet"...
Earlier methodologies [CLARK_95] manually compiled and parametrised the absorptions that allowed discriminating between the different materials (wavelength, intensity of absorption, width of absorption), which were later tabulated into databases [CLARK_93], [ASTER_98]. These absorptions are related to the chemical composition of the material, as they respond to the vibration frequencies of its chemical bonds, and are considered one of the fundamental elements in the discrimination between different materials [WIN_07].
One of the main advantages of the proposed method over earlier ones lies in the capacity to
capture relevant information on the absorption bands that characterise a material in an automatic
manner and without the need to have previous knowledge about it.
Since the absorption phenomenon happens at consecutive wavelengths, the analysis of the Energy in the adequate fuzzy set provides us with information that a certain absorption phenomenon has happened. At the same time, it provides us with information as to its position
(fuzzy set in which it happens) as well as the intensity of the absorption (area of absorption). This Energy integrates the parameters of the absorption phenomenon within it by combining the levels of absorption at each of its near wavelengths, while considerably reducing the noise of the parameters.
In this manner, the absorptions present in the spectrum are directly parametrised by the Energy of the associated fuzzy set or sets and the Energy of the adjacent fuzzy sets.
Fig. 5.8. Calculation of the Energy of each of the fuzzy sets for a given spectrum.
This way, a hyperspectral pixel can be represented as the vector which contains the values of the Energy of each of the fuzzy sets into which it is divided, thus reducing the feature space to K elements, K being the number of fuzzy sets used.
LTrans = [E1, E2, ..., EK]^T    (5.4)
The representation of the spectrum based on the Energy of the different associated fuzzy sets characterises these absorptions in a more efficient manner than the earlier parametrisation, as the presence of a certain absorption is associated with the values of the Energy of the fuzzy sets. This
creates a unique and universal feature vector that is capable of defining the absorptions present in different materials without the need for previous training or its modification. To a certain extent, this approach is similar to that used by the cones of the human visual system, which associate certain wavelengths with a specific visual sensation.
The concept of the Energy of a certain range of the spectrum keeps the physical meaning of the variables, as it represents the response of the spectrum in each of the different fuzzy sets that, in turn, represent conceptual ranges of it. In this manner, a high Energy level for a fuzzy set in the wavelength range of the colour red corresponds with the perception that the human type L cones make of red.
On the other hand, this methodology reduces the dimensionality of the feature space in an efficient manner, as it is based on the physical properties of spectra, making use of the discriminating capacity of the absorption bands and their existing correlation. In order to show these properties, figure 5.9 shows, on the one hand, the raw representation of the spectra A1, A2, B and C and, on the other, their representation based on fuzzy sets. Of these, spectra A1 and A2 belong to the same class, while spectrum B represents a different class, differentiated from class A. Spectrum C belongs to a class of spectral properties similar to B.
Fig. 5.9. Spectral representation of materials A1, A2, B and C.
a) Raw representation, b) Representation based on fuzzy sets.
When observing (Table V.1) the differences obtained when comparing the fuzzy-set representation of spectrum A1 with the representation of the rest of the spectra through the classic metrics defined in equations (3.2), (3.3), (3.4), (3.6) and (3.9), one can notice an increase in the separation distance between spectra of different classes with respect to the distance obtained for spectra belonging to the same class.
In order to analyse whether there is an increase in the separability associated with the use of the spectral representation based on fuzzy sets, the separation distances obtained from the fuzzy representation (Table V.1) are compared with the distances obtained from the raw representation of the spectrum (Table III.1).
The results of this comparison are listed in Table V.2 and show the increase in separability caused by the use of the fuzzy spectral representation, except for the SAM metric. This allows us to state that, a priori, this bioinspired technique has a positive effect on the separability of materials.
TABLE V.1. COMPARISON OF THE DISTANCES OBTAINED FOR SPECTRUM A1 BASED ON CLASSICAL METRICS FROM THEIR CHARACTERISATION BASED ON FUZZY SETS

                        A2      B       C
City Block              0.10    0.48    0.22
Euclidean Distance      0.04    0.20    0.09
Tchebychev Distance     0.03    0.14    0.06
SAM                     1.00    0.98    1.00
SID                     0.00    0.14    0.03

TABLE V.2. INCREASE OF THE SEPARABILITY IN THE REPRESENTATION BASED ON FUZZY SETS VERSUS THE SEPARABILITY BASED ON THE RAW SPECTRUM

                        Raw separability    Separability based on fuzzy sets    Improvement in the separability
City Block              144.40%             229.15%                             1.59
Euclidean Distance      135.37%             229.70%                             1.70
Tchebychev Distance     115.02%             212.57%                             1.85
SAM                     100.15%             100.34%                             1.00
SID                     182.83%             538.12%                             2.94
3. Multi-frequency spectral fuzzy sets
The representation of the spectrum based on fuzzy sets obtains information on the behaviour of the spectrum in each of the ranges represented by these sets.
In most cases, dividing the spectrum into an adequate number of fuzzy sets will reduce the features efficiently: it adequately models the absorption bands present in the elements to be classified and maintains a classification rate similar to that obtained through Principal Component Analysis (PCA), but without its disadvantages, as shall be shown in the results chapter.
In other cases, the defined fuzzy sets may be too large to model narrow absorption bands that might be present in these materials.
In these cases, an extension of the earlier method is proposed, based on a multi-frequency definition of the fuzzy sets that define each of the ranges of the spectrum. As in the earlier method, triangular membership functions are created. Unlike the previous case, N collections are created, each composed of a set of triangular membership functions defined by a spacing parameter Dj associated with that collection.
In this manner, a membership function Mfij, associated with a central wavelength λCij and a spacing parameter Dj, is given by the following expression.
Mfij(λ) = 1 − |λ − λCij| / Dj,    if λCij − Dj < λ < λCij + Dj
Mfij(λ) = 0,    otherwise    (5.5)
This way, by defining the different values of Dj, one can create different collections of fuzzy sets
as shown in figure 5.10.
Fig. 5.10 Different collections of fuzzy sets with different Dj spacing parameter.
Applying (5.3) to each of the collections, the Energy of each of the fuzzy sets present in them is extracted. In this way, collections with a greater spacing parameter will capture absorptions of a lower frequency, while collections with a smaller spacing parameter will capture absorptions of a higher frequency, simultaneously capturing information related to different frequencies.
The feature vector associated with each collection j is given by:

        L_Trans,Dj = [E_1j, E_2j, ..., E_Kj,j]^T                               (5.6)

where D_j is the spacing parameter associated with that collection and K_j the number of fuzzy sets in the collection.
Combining the feature vectors associated with all the existing collections, one obtains the final feature vector:

        L_Trans = [L_Trans,D1 ; L_Trans,D2 ; ... ; L_Trans,DN]                 (5.7)
This approach gathers information on the different absorptions and behaviours present in the spectrum at different frequencies, so that the behaviour of each of the ranges of the spectrum is modelled more precisely. In this manner, one can obtain more detailed information on the type of absorption. However, this approach considerably increases the size of the feature vector if very-high-frequency information is to be added.
Likewise, if a single collection of fuzzy sets is defined with a spacing parameter D small enough to capture the desired frequency information, the information contained in the lower-frequency collections (greater D) can easily be inferred by the classifier from this single collection, which makes the pyramidal, multi-frequency approach contain redundant information.
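The full multi-frequency feature vector of eqs. (5.6)-(5.7) can be sketched as follows. Since eq. (5.3) appears earlier in the thesis and is not reproduced in this chapter, the Energy of a fuzzy set is assumed here, for illustration only, to be the membership-weighted mean of the spectrum; the synthetic spectrum and the collection spacings are likewise assumptions:

```python
import numpy as np

def fuzzy_energies(spectrum, memberships):
    """Energy of each fuzzy set of one collection. Eq. (5.3) is defined
    earlier in the thesis; a membership-weighted mean of the spectrum is
    assumed here for illustration."""
    return memberships @ spectrum / memberships.sum(axis=1)

def multifrequency_features(spectrum, collections):
    """Final feature vector L_Trans (eqs. 5.6-5.7): the energy vectors of
    all N collections concatenated together."""
    return np.concatenate([fuzzy_energies(spectrum, m) for m in collections])

# Illustrative triangular collections over an assumed 400-1000 nm axis:
wl = np.linspace(400.0, 1000.0, 121)
def collection(spacing):
    centers = np.arange(400.0, 1000.0 + spacing, spacing)
    return np.clip(1.0 - np.abs(wl[None, :] - centers[:, None]) / spacing, 0.0, None)

# Synthetic reflectance with one absorption band at 650 nm (an assumption):
spectrum = 1.0 - 0.6 * np.exp(-((wl - 650.0) / 30.0) ** 2)
L_trans = multifrequency_features(spectrum, [collection(150.0), collection(50.0)])
```

The fine collection's set centred on the absorption shows the lowest Energy, which is exactly the discriminating behaviour the multi-frequency extension targets.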
4. Conclusions
This chapter has described the application of fuzzy set theory to the optimal modelling of a hyperspectral vector. Given the high correlation between adjacent bands, the high dimensionality of the data and the discriminating power of the absorption bands, the separation of the spectrum into different fuzzy sets and the extraction of features based on the Energy of each of them becomes an efficient method that meets the requirements desired for the set of features describing that spectrum.
The use of fuzzy sets for the separation of the spectrum into different ranges takes advantage of the high correlation between adjacent bands while avoiding the sharp separation between ranges that a partition based on the classical definition of sets would cause. The sensitivity of each of the fuzzy sets defined over a spectrum is similar to that of a cone of the human visual system. In this way, the features extracted through this methodology would correspond to those acquired by a multi-spectral eye with as many types of cones as there are defined fuzzy sets.
The representation of the spectrum based on the Energy of different fuzzy sets allows the efficient
and practical modelling of the parameters that define the absorptions present in that range of the
spectrum, allowing for the creation of a universal feature vector.
This approach reduces the size of the information stored in the spectrum, as it intrinsically stores those features necessary for the classification. Therefore, the computational cost necessary for its calculation is much smaller than that of other methods of feature extraction/reduction. Given its universality and the fact that its extraction does not vary, it can be implemented in hardware platforms, thus speeding up the extraction process for very high resolution spectra.
Unlike other methods of feature extraction, such as PCA or LDA, the features extracted by this method do not lose their physical meaning. The Energy associated with each of the sets into which the spectrum is divided represents the degree of reflectivity in that part of the spectrum, which is equivalent to the information extracted by the human visual system.
Likewise, it does not require previous training in order to obtain the fuzzy sets that define a material, as happens with PCA or LDA, in which the addition of a new element to the classification changes the extracted variables.
The multi-frequency approach could offer promising results. However, the problem is that it adds
redundant information to the extracted features which can needlessly increase the size of the
feature vector.
On the other hand, the Hughes Phenomenon [HUGH_68] is reduced because the feature space is reduced in an efficient manner. This is due to the fact that the proposed method imitates the human eye in exploiting the discriminatory properties existing in the different absorption bands and the correlation between adjacent bands. In this manner, the number of necessary training elements is reduced and the complexity of the classifier is simplified.
The efficient reduction of features achieved by this method reduces the representative size of a spectrum. This reduction permits the extraction of information on the spatial distribution of different spectra, creating feature vectors that remain tractable in terms of the Hughes Phenomenon and the complexity of the classifier.
Chapter VI
Integration of spectral and spatial features
Earlier chapters described different approaches to feature extraction that make it possible to define the spectral properties of materials, so that these features can be used to discriminate between different materials. At times, due to diverse causes (cost of spectrometers, acquisition speed, spectral similarity of the objects to be analysed...), this information may not be enough to obtain a classification with adequate discriminatory capacity.
However, the fact that two objects can have similar spectral features (colour) does not entail that
one has the same visual perception of them, as one can observe elements of a similar colour, such
as a piece of tree bark and an object of clay, and perceive that they are different. One notices that
the distribution of those colours presents a different pattern in both objects. This spatial
distribution of the different chromatic tonalities of an object is known as texture.
The discriminatory power of the spatial distribution makes it appropriate to include this
information in the model of the material in order to increase the discriminating power when
dealing with different materials.
In order to illustrate how the inclusion of this spatial information increases the discriminating power between different classes, figure 6.1 shows the statistical distribution of two spectral variables that represent two different classes, class 1 and class 2. As can be seen in this figure, the point P represented within it has a similar membership probability for class 1 and class 2. Therefore, taking into account only the information contained in the spectral variables of the point, it cannot be assigned to one class or the other with the desired certainty.
Fig. 6.1. Statistical distribution of the elements of two classes with an area of overlapping.
However, by studying the spectral features of the points situated in the spatial neighbourhood of point P, one can estimate the statistical distribution of the features associated with the points in that neighbourhood. In figure 6.2, X marks the spectral features of the points that belong to the neighbourhood of P; the analysis of this statistical distribution assigns point P to class 1.
Fig. 6.2. Classification of a set of elements based on its statistical distribution.
Bearing in mind the earlier example, observing the statistical distribution of the spectral features of a set of spatially near pixels allows a more precise characterisation of the class to which the central pixel belongs. Suppose one has to classify the element associated with pixel P as belonging to either a light or a dark object. The example in figure 6.3 shows the impossibility of undertaking a precise classification using only the feature values that define a single pixel. However, through the analysis of the statistical distribution of nearby points, one can infer that the object to which they belong is predominantly dark and characterise it more precisely.
Fig. 6.3. Use of spatial distribution of features for a classification. Above) Use of the single
element of the pixel, Below) Use of spatial features in the classification.
In this chapter, a new methodology is proposed to create a descriptor which includes spatial information within the spectral model that defines each element of the image. To do so, the use of neighbourhood histograms that compile the statistical distribution of each of the extracted spectral features is proposed, integrating this information in a single feature vector.
The proposed discriminating features will not only increase the separability between different materials through the integration of spectral-spatial information but, due to this increase in separability, will also allow the use of simpler classifiers.
Specifically, the use of fuzzy neighbourhood histograms is suggested in order to model the spatial properties of the discriminating features of the materials. First, an explanation is provided of the need for fuzzy discretisation in the estimation of the different groups that make up the histogram; this fuzzy discretisation avoids erratic changes in the created histogram due to small variations in the intensity of the variables. Second, a theoretical description is provided of the implementation of the concept of the fuzzy neighbourhood histogram. This concept is broadened throughout the chapter for its application to spectral and vectorial images. Analogously, the concept of the fuzzy region histogram is introduced, replacing the neighbourhood by a region as the support over which the histogram is calculated.
Last, a brief listing is provided of the contributions made in this chapter on the integration of spectral and spatial features.
1. Fuzzy spatial histograms
In order to include spatial information within the model, we propose the use of spatial histograms
that capture the existing variability in a specific spatial region of an image.
By definition, a histogram represents the frequency with which a studied variable is found within
each of the previously defined intervals, providing information on the statistical distribution that
the variable follows in the sample. Chapter IV, section 4.3.1, details the use of the histogram
calculation for the estimate of the probability functions.
First, the variable to be represented is quantized into a number of intervals that cover its possible range of values. For each element of the sample space, the interval to which it belongs is determined and that interval's element count is increased. In this manner, each interval of the histogram stores the number of times that the value of an element of the sample space has fallen within that interval.
This way, the histogram calculated in such manner is a reflection of the probability density
function that the analysed variable follows. Therefore, the variables that define this histogram are,
in turn, a reflection of that probability function. This way, the study of these variables
corresponds with the study of the probability distribution function of the elements in the sample
space that have generated the histogram.
Fig. 6.4 Histograms associated to different types of distributions of the values of variable X. a)
Constant distribution, b) Slightly darkened constant distribution c) Chess distribution.
(Histogram vectors shown in the figure: a) Hx = [0, 0, 0, 15, 0, 0], b) Hx = [0, 0, 15, 0, 0, 0], c) Hx = [7, 0, 0, 8, 0, 0].)
Figure 6.4 shows the spatial histogram calculated for the images in its upper part. For its calculation, the range of possible intensities has been quantized, so that each intensity is associated with one of the groups into which the histogram is divided. Once quantized, the number of elements belonging to each of the groups is counted, thus obtaining the histogram.
1.1. Improvement in the quantization of the histogram.
Figure 6.4 shows the division of the range of values that a variable X can take into a set of equally spaced intervals, in the classical manner. This associates each of the possible values of the variable with a specific membership group.
In this way, the histogram is mathematically defined by a vector whose number of components is equal to the number of divisions of the range of values, each component of the vector containing the number of elements that belong to that interval (6.1).
        H_X(x, y) = [N_1, N_2, ..., N_M]                                       (6.1)

where N_i is the number of elements that belong to interval i.
However, this way of quantizing creates discontinuities at the borders of the intervals. Elements situated at these limits can jump from one group to another, making the shape of the histogram vary due to small noise-induced variations. This is shown in figures 6.4a and 6.4b, in which a slight change in intensity generates a totally different histogram descriptor (vector H(x, y)). Additionally, this quantization does not faithfully reflect the degree of membership of the variable to each of the groups into which the histogram is divided.
Because of this, and in order to overcome these limitations, the discretisation and quantization of the studied variable into fuzzy sets is proposed. In this manner, each element of the sample space of variable X contributes to each interval of the histogram according to its degree of membership. Figure 6.5 shows these two types of discretisation: figure 6.5a shows the classical discretisation process, where each element of the sample space is absolutely and exclusively associated with a single group, while figure 6.5b shows the fuzzy discretisation of the variable, where a different degree of membership is associated with each of the groups.
Fig. 6.5. Quantization of variables a) Classical quantization b) Fuzzy quantization.
Using as a basis the previously mentioned approaches, let us consider a set of elements
represented by a feature X, where X is a continuous variable with values in the interval [0, A],
where high values represent a greater intensity and low values represent a lower intensity. By
analogy with the images represented in grey levels, the high values of variable X are represented
with light hues and low values with dark hues.
As previously mentioned, and in order to proceed to the histogram generation, this X feature is
discretised through the division of the range [0, A] into several intervals.
Performing this discretisation through a classical approach, each value of X is exclusively assigned to a unique group. Mathematically, a rectangular membership function can be defined for each of the discretised intervals, in such a way that the function takes a unit value for the X values that belong to the set and zero for those that do not, as shown in figure 6.6.
Fig. 6.6. Histogram membership function HMf_i(x).
This function, which defines the membership of variable X within group i, is given by the following rectangular function:

        HMf_i(x) = 1 ,   if X_central_i - D/2 < x < X_central_i + D/2
        HMf_i(x) = 0 ,   otherwise                                             (6.2)
where X_central_i is the central value of each of the groups and D the width of each of the groups, defined by D = A / N, A being the highest considered limit and N the number of groups into which the variable has been quantized. The function HMf_i(x) defined in (6.2) shall be named the histogram membership function.
An effective way to quantize this variable for the calculation of the histogram is through its representation via a vector Qx with the same number of components as the number of groups (N) into which the feature has been divided. Each component of this vector is defined by the value of the function HMf_i(x), which gives the degree of membership of the specific value of X to each of the groups.

        Q_x(x) = [HMf_1(x), HMf_2(x), ..., HMf_N(x)]                           (6.3)
Fig. 6.7. Classical quantization of a variable.
(Example quantization vectors shown: Qx = [0, 0, 0, 1, 0, 0] and Qx = [0, 1, 0, 0, 0, 0].)
Chapter VI Integration of spectral and spatial features
Page 127
Figure 6.7 shows two examples of quantization of variables through this method as well as the
associated quantization vector Qx.
However, this function HMf_i(x) can be generalised in such a way that membership of each group is no longer the all-or-nothing of classical discretisation. The membership functions of the intervals can be defined through the use of triangular functions (6.4), which implement the aforementioned fuzzy discretisation.
        HMf_i(x) = 1 - |x - X_central_i| / D ,   if X_central_i - D < x < X_central_i + D
        HMf_i(x) = 0 ,                           otherwise                     (6.4)
These triangular membership functions, shown in figure 6.8, associate each value of the feature X with a specific degree of membership for each of the groups of the histogram. In this manner, the membership of a certain value of X is not exclusive to a single group; rather, the membership value is shared between several groups.
Fig. 6.8. Fuzzy quantization of a variable
This way, by substituting the membership function (6.4) into equation (6.3), one obtains the quantization vector associated with the fuzzy sets into which the variable X has been quantized. Thus, a specific value of X is quantized by a vector Qx established by the values of the different membership functions in the different sets, as shown in figure 6.8.
(Example fuzzy quantization vector shown in figure 6.8: Qx = [0, 0, 0, 0.8, 0.2, 0].)
In this manner, the definition of the vector Qx provides a robust mathematical representation for the discretisation of the histogram, allowing either its classical quantization or its fuzzy quantization through the use of different membership functions. This quantized vector Qx, in either of its forms, will serve as the basis for the mathematical definition of the histogram.
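The two quantization schemes of eqs. (6.2)-(6.4) can be sketched as follows. This is a minimal illustration (the value 0.34 and the six groups are assumptions); the triangular sets are given support ±D so that neighbouring memberships overlap and sum to one, matching the shared-membership example of figure 6.8:

```python
import numpy as np

def quantize(x, A, N, fuzzy=True):
    """Quantization vector Qx (eq. 6.3) of a scalar x in [0, A] over N
    groups. fuzzy=False uses the rectangular HMf_i of eq. (6.2);
    fuzzy=True the triangular HMf_i of eq. (6.4), whose memberships
    overlap and sum to one."""
    D = A / N
    centers = D / 2 + D * np.arange(N)          # X_central_i
    d = np.abs(x - centers)
    if fuzzy:
        return np.clip(1.0 - d / D, 0.0, None)
    return (d < D / 2).astype(float)

# A value near a group border: the classical vector jumps between groups
# under small perturbations, the fuzzy one degrades gracefully.
q_hard = quantize(0.34, A=1.0, N=6, fuzzy=False)
q_soft = quantize(0.34, A=1.0, N=6, fuzzy=True)
```

Here q_hard places all the weight in a single group, while q_soft splits it between the two neighbouring groups, so a small noise-induced change in x only shifts weight gradually between them.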
1.2. Definition of the fuzzy neighbourhood histogram.
Once the variable X is quantized via the quantization vector Qx, one can create a histogram that
characterises the distribution of the different intensities of variable X. The final objective of the
creation of this histogram is the estimate of the probability density function of the spatial
distribution of feature X in the surroundings of that point.
For this, let I be a two-dimensional image containing a value of variable X in each of its pixels; the value of X at a point P(x, y) of the image is given by I(x, y). In the same manner, Qx(x, y) is the vector that represents the quantization of variable X at each point (x, y), as defined in equation (6.3).
Figure 6.9 shows an example of an image I(x, y) as well as its different quantizations, using the
triangular membership function described in equation (6.4).
Fig. 6.9. Example of the quantization of different points of image I(x, y).
In order to include in the model the statistical information of the spatial distribution of variable X
in a neighbourhood of the point P(x,y), we define an associated neighbourhood for each point
P(x,y) to be analysed and we calculate the histogram defined by variable X in that
neighbourhood.
In order to do so, we define a neighbourhood centred on P(x, y) of previously established dimensions [Width, Height]. The use of large neighbourhoods produces a better estimate of the spatial distribution of the variable; on the other hand, they require a greater calculation time and increase the possibility that the neighbourhood contains more than one class of material.
The representation employed for the quantization vector Qx allows a direct calculation of the histogram of the neighbourhood associated with a point, through the sum of the vectors Qx associated with each of the points (i, j) belonging to the neighbourhood of P(x, y), as defined in equation (6.5).

        H_X(x, y) = Σ_{i=x-A}^{x+A} Σ_{j=y-B}^{y+B} Q_X(i, j)                  (6.5)

where A and B determine the size of the neighbourhood to be analysed.
Please note that in the case of a classical discretisation (rectangular membership function, equation (6.2)), the histogram vector obtained through equation (6.5) corresponds to the vector that would have been obtained by counting the number of elements belonging to each of the intervals that make up the histogram.
Figure 6.10 graphically shows the calculation of the function HX(x, y) for a neighbourhood of size
3 x 3. One can notice that the calculation of the function of the histogram HX(x,y) is done in a
direct manner via the sum of the quantization vectors. This vector HX(x, y) represents the level of
presence of the different intensities that variable X can have in the neighbourhood of point P(x,y).
Fig. 6.10. Graphical representation of the calculation of the fuzzy neighbourhood histogram
In order to truly obtain an estimate of the probability function of the intensities of X in the neighbourhood, the sum of the elements of the vector H_X(x, y) that defines the histogram must equal unity. The normalised neighbourhood histogram is obtained by dividing this vector by its L1 norm, as described in (6.6).
        Ĥ_X(x, y) = H_X(x, y) / ||H_X(x, y)||_1                                (6.6)

where

        ||H_X(x, y)||_1 = Σ_i H_X(x, y)(i)                                     (6.7)

Ĥ_X(x, y) represents the estimate of the probability density function of the behaviour of the variable X in the described neighbourhood, which captures the spatial behaviour of this variable in its neighbourhood.
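A minimal sketch of the fuzzy neighbourhood histogram of eqs. (6.5)-(6.6); the 3 x 3 window, the wrap-around border treatment and the chessboard test image are illustrative assumptions:

```python
import numpy as np

def quantize_image(img, A, N):
    """Per-pixel fuzzy quantization vectors Qx(x, y) (eq. 6.3, with the
    triangular memberships of eq. 6.4); output shape (H, W, N)."""
    D = A / N
    centers = D / 2 + D * np.arange(N)
    return np.clip(1.0 - np.abs(img[..., None] - centers) / D, 0.0, None)

def neighbourhood_histogram(q, A=1, B=1):
    """Fuzzy neighbourhood histogram: sum of Qx over a (2A+1) x (2B+1)
    window centred on each pixel (eq. 6.5), L1-normalised (eq. 6.6).
    Borders are wrapped here for brevity - a simplifying assumption."""
    h = np.zeros_like(q)
    for dy in range(-B, B + 1):
        for dx in range(-A, A + 1):
            h += np.roll(q, (dy, dx), axis=(0, 1))
    return h / h.sum(axis=-1, keepdims=True)

# Chessboard image (cf. fig. 6.4c): the value of a single pixel is
# ambiguous, but the neighbourhood histogram reveals the bimodal
# light/dark distribution around it.
chess = (np.indices((6, 6)).sum(axis=0) % 2).astype(float)
h = neighbourhood_histogram(quantize_image(chess, A=1.0, N=4))
```

For the interior pixel (2, 2), the resulting histogram has weight only in the darkest and lightest groups, exactly the bimodal signature discussed above.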
1.3. Definition of the fuzzy region histogram
The earlier definition estimates the probability density function of a variable in a neighbourhood
of a point. However, this concept can be generalised to any region of the image. This fact
generates a histogram which captures the spatial information of the variable in a specific region
of the image.
),( yxxQy
x
),( yxX
H
Chapter VI Integration of spectral and spatial features
Page 131
A region R of the image is defined as a set of points belonging to the image, normally connected, which represent a specific area of it. An example of a region is shown in figure 6.11.
Fig. 6.11. Representation of a region of an image.
As with a neighbourhood, one can calculate the histogram of variable X for a region ℜ_i from the quantized vector Qx, through the sum of the quantization vectors of each of the points belonging to that region:

        H_X(ℜ_i) = Σ_{(u,v)∈ℜ_i} Q_X(u, v)                                     (6.8)
Analogously to the earlier case, one can proceed to its normalisation:
        Ĥ_X(ℜ_i) = H_X(ℜ_i) / ||H_X(ℜ_i)||_1                                   (6.9)
Both the definition of the neighbourhood histogram (6.6) and the definition of the region histogram (6.9) represent the estimated probability function of the values that variable X can take in the associated area of the image. This histogram vector (6.6, 6.9) directly reflects the variability of variable X in the associated region or neighbourhood, and can therefore be used directly as a feature vector that includes the spatial variability of X.
In this manner, the study of the shape of the H_X vector, as shown in figure 6.12, uses the spatial information included in it to achieve a more precise discrimination of the elements to be classified. Figure 6.12 additionally shows that the four types of distribution can be clearly differentiated through the use of spatial features; it is not possible to classify them using only the value of the central point, and a spatial study is required in order to obtain an optimal discrimination.
Fig. 6.12. Histograms associated to different types of distributions of the values of variable X. a)
Constantly light distribution, b) Constantly dark distribution, c) Chess distribution, d) Variable
distribution
Thus, the use of the histogram vector Ĥ_X(x, y) as a feature vector includes the spatial distribution of the elements and exploits spatial variability in their discrimination.
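The region variant of eqs. (6.8)-(6.9) differs from the neighbourhood case only in that the sum runs over an arbitrary set of pixels. A sketch, where the gradient image and the boolean mask selecting the region are assumptions:

```python
import numpy as np

def quantize_image(img, A, N):
    """Per-pixel fuzzy quantization Qx (eqs. 6.3-6.4); shape (H, W, N)."""
    D = A / N
    centers = D / 2 + D * np.arange(N)
    return np.clip(1.0 - np.abs(img[..., None] - centers) / D, 0.0, None)

def region_histogram(q, mask):
    """Fuzzy region histogram (eqs. 6.8-6.9): sum the quantization
    vectors of the pixels selected by the boolean mask of the region,
    then L1-normalise the result."""
    h = q[mask].sum(axis=0)
    return h / h.sum()

# Region: the upper (darker) half of a small gradient image.
img = np.linspace(0.0, 1.0, 16).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[:2] = True
hist = region_histogram(quantize_image(img, A=1.0, N=4), mask)
```

Because the region contains only low intensities, the resulting histogram concentrates its weight in the dark groups and leaves the lightest group empty.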
2. Extension of the fuzzy spatial histograms to vectorial features.
Spectral-spatial histograms
The previous section defined spatial histograms capable of including spatial information of the neighbourhoods and regions contained in images defined by a single scalar variable. In this section, the earlier spatial histogram model is extended to vectorial images, that is, those defined by a feature vector, as is the case of spectral images.
The method proposed in the following is valid for any type of multivariate image, in which every
point is defined by a vector irrespective of its origin. In this manner, one can directly apply it to
spectral images, in which the feature vector defines its texture, or any type of combination or
representation of data or features that could be represented in a vectorial manner, such as features
extracted from hyperspectral images that were mentioned in earlier chapters.
Analogous to the previous section, a two-dimensional vectorial image is one represented at every point P(x, y) by a feature vector I(x, y) = [X_1(x, y), X_2(x, y), ..., X_M(x, y)]^T, each element of vector I being the value of a feature X_j associated with that point P(x, y).
Fig. 6.13. Representation of a vectorial image by the vectorial function I(x, y).
In this section, the two types of histograms which were proposed earlier (neighbourhood and
region) are going to be re-defined in order to apply them to vectorial images. In this manner, one
can achieve the integration of the spectral information contained in the image with the spatial
variability in the associated neighbourhood or region.
For this, we propose to extend the method of discretisation of the histogram to vectorial variables,
as well as the generation, from this vector, of a spectral histogram vector which simultaneously
contains information that is spectral as well as spatial.
2.1. Quantization of the feature vector.
This section details the quantization of each of the components X_j of the vector I(x, y). This quantization is performed by quantizing each of the scalar components X_j of the vector I(x, y) separately.
Analogous to the scalar case, in which a quantization vector Q_X (equation 6.3) was obtained for the studied feature X, in the vectorial case a quantization vector Q_Xj(x, y) is obtained for each of the features. Each vector Q_Xj(x, y) has N components, corresponding to the groups into which the histogram has been discretised for that variable X_j, so that one vector Q_Xj(x, y) is obtained for each of the M components X_j of the vector I(x, y). The calculation of this vector for each of the components is shown in equation (6.10).
        Q_Xj(x, y) = [HMf_1(X_j), HMf_2(X_j), ..., HMf_N(X_j)]                 (6.10)
Taking the quantization vectors Q_Xj(x, y) of all the components, one can create an aggregate quantization vector Q(x, y) that contains the quantizations of all the components of I(x, y) in a single vector:

        Q(x, y) = [Q_X1(x, y), Q_X2(x, y), ..., Q_XM(x, y)]                    (6.11)
Figure 6.14 illustrates the generation of this aggregate quantization vector from the quantization of each of the components of I(x, y). Note that the dimensions of this vector are [M · N, 1], where M is the number of components of the vector I(x, y) and N the number of groups into which the histogram has been discretised.
Fig. 6.14. Quantization of vector I(x, y).
The generation of the aggregate vector Q(x, y) for the quantization of vector I(x, y) will allow a simple and robust calculation of the spatial histograms associated with vectorial images, as shall be seen in the following section.
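The aggregate quantization of eqs. (6.10)-(6.11) can be sketched as follows; the toy 2 x 2 cube with M = 3 features and its values are assumptions:

```python
import numpy as np

def quantize_cube(cube, A, N):
    """Aggregate quantization vector Q(x, y) (eqs. 6.10-6.11): each of
    the M components of I(x, y) is fuzzily quantized into N groups and
    the M resulting vectors are stacked into one [M * N] vector."""
    D = A / N
    centers = D / 2 + D * np.arange(N)
    q = np.clip(1.0 - np.abs(cube[..., None] - centers) / D, 0.0, None)
    H, W, M, _ = q.shape                 # q has shape (H, W, M, N)
    return q.reshape(H, W, M * N)

# A toy "spectral" image with M = 3 features per pixel:
cube = np.zeros((2, 2, 3))
cube[0, 0] = [0.2, 0.5, 0.8]
Q = quantize_cube(cube, A=1.0, N=4)
```

Each pixel's aggregate vector has M · N components, the first N corresponding to the quantization of X_1, the next N to X_2, and so on, as in figure 6.14.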
2.2. Definition of fuzzy vectorial histograms.
Setting aside for a moment the aggregate quantization vector described in the previous section, let us suppose that each of the M features X_j is quantized separately, thereby obtaining separately the histogram associated with each of them for a specific neighbourhood or region.
Each of these histograms would be defined by the following equation in the neighbourhood histogram case:

        H_Xj(x, y) = Σ_{u=x-A}^{x+A} Σ_{v=y-B}^{y+B} Q_Xj(u, v)                (6.12)
And by the following when dealing with region histograms:

        H_Xj(ℜ_i) = Σ_{(u,v)∈ℜ_i} Q_Xj(u, v)                                   (6.13)
The normalised histogram, for either of the two types, would be defined as:

        Ĥ_Xj = H_Xj / ||H_Xj||_1                                               (6.14)
In this manner, each histogram represents the spatial variability of each of the components of the
vector ),( yxI for that region or neighbourhood.
In the vectorial case there are M histograms, one associated with each of the components of the vector I(x, y). Each vector Ĥ_Xj represents the spatial variability of the variable to which it corresponds. These vectors can be grouped into a single aggregate histogram vector Ĥ(x, y), which represents the spatial variability of the components of the vector I(x, y) in a joint manner:

        Ĥ(x, y) = [Ĥ_1(x, y), Ĥ_2(x, y), ..., Ĥ_M(x, y)]                       (6.15)
Figure 6.15 graphically shows the construction of this aggregate histogram vector Ĥ(x, y). Note that it simultaneously includes the information on the spatial variability of each of the variables X_j, and that it characterises in a single vector the spectral and spatial properties of the image in a specific neighbourhood or region.
Fig. 6.15. Graphical representation of the aggregate histogram vector Ĥ(x, y).
Returning to the definition of the aggregate quantization vector Q(x, y), it is possible to obtain the aggregate histogram vector H(x, y) directly from Q(x, y). By substituting Q_Xj(x, y) by its aggregate equivalent, one achieves in a simple and elegant manner the definition and calculation of the aggregate histogram vector H(x, y), which speeds up its calculation while reducing its computational cost.
Through the use of Q(x, y), the calculation of the fuzzy histogram vector becomes:

        H(x, y) = Σ_{i=x-A}^{x+A} Σ_{j=y-B}^{y+B} Q(i, j)                      (6.16)

where A and B establish the neighbourhood limits.
In the case of an aggregate histogram for a region, one obtains:
…
X1
Quantization
…
X2
XM
Quantization
Quantization
),(ˆ yxH
),(ˆ yxMH
),(ˆ2 yxH
),(ˆ1 yxH
\mathbf{H}(\Re) = \sum_{(i,j) \in \Re} \mathbf{Q}(i,j)    (6.17)
The aggregate histogram vectors defined in equations (6.16) and (6.17) compile the spatial
variability of the vectorial (spectral) features of the image, as shown in figure 6.15. In this way, the
discriminating capacities of the spectral and spatial information that characterise the associated
neighbourhood or region can be elegantly combined in a single vector.
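Equations (6.16) and (6.17) can be illustrated with a minimal numpy sketch. The function and variable names are illustrative, not taken from the thesis implementation; Q is assumed to be a (rows, cols, bins) array whose last axis holds the (fuzzy) membership vector of each pixel.

```python
import numpy as np

def neighbourhood_histogram(Q, x, y, A, B):
    """Eq. (6.16): sum the quantization vectors Q(i, j) over the
    (2A+1) x (2B+1) neighbourhood centred on pixel (x, y)."""
    return Q[x - A:x + A + 1, y - B:y + B + 1].sum(axis=(0, 1))

def region_histogram(Q, mask):
    """Eq. (6.17): sum the quantization vectors over an arbitrary
    region given by a boolean mask."""
    return Q[mask].sum(axis=0)

# Toy example: 5x5 image, 3 quantization bins, every pixel fully in bin 0.
Q = np.zeros((5, 5, 3))
Q[..., 0] = 1.0
H = neighbourhood_histogram(Q, 2, 2, 1, 1)   # 3x3 neighbourhood
```

Because the per-pixel quantization vectors are simply summed, the aggregate histogram of a region is the sum of the histograms of any partition of that region, which is what makes the later region-merging step cheap.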
3. Conclusions
This chapter has introduced the concept of fuzzy neighbourhood histograms and fuzzy region
histograms, which have the property of being able to represent the spatial behaviour of a certain
variable.
This definition has been extended to any type of vectorial image. In this manner, one can extract
an aggregate histogram vector that joins, in a single feature vector, the spatial and spectral
properties of a certain region of a vectorial image.
The fuzzy quantization of spatial histograms allows for a better representation, avoiding sharp
changes in the shape of the histogram due to noise, to the inherent variability of the acquisition and
to values close to the quantization borders.
The combination of spectral and spatial features in a single vector allows for a better separability of
overlapping classes. One can obtain greater information on the statistical distribution of the
discriminating variables through the study of the spatial and spectral properties of the neighbourhood
or region.
Using this approach, each point of the vectorial image is defined not only by its spectral features,
but also by the spectral-spatial histogram defined in the neighbourhood of the point, thus
increasing the information describing that point.
The inclusion of spatial features not only increases the separability of classes that have
uncertainty areas, but also models objects composed of different materials, since the presence of
the diverse materials is captured in the histogram without distorting it.
Another characteristic of these spectral histograms is that each of their components is a sum of
random variables, each indicating membership or non-membership to
a specific group present in the vector \mathbf{Q}(x,y). Applying the central limit theorem [BISH_06] for
large regions, the sum of these random components will tend towards a Gaussian behaviour. For
this reason, the use of large neighbourhoods in the calculation of the histogram causes each of the
variables of the histogram to tend towards a Gaussian behaviour. This can
simplify the use of classifiers based on Gaussians or Gaussian mixtures for classification purposes.
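This tendency can be checked numerically. The sketch below (illustrative, not from the thesis) treats a histogram bin as a sum of Bernoulli membership indicators over the pixels of a neighbourhood, and compares the skewness of the resulting distribution for a small and a large neighbourhood; by the central limit theorem the larger sum should be far more symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)

def bin_count_samples(n_pixels, p, n_samples=20000):
    """Each histogram bin is a sum of n_pixels Bernoulli(p) membership
    indicators; draw many such sums to inspect their distribution."""
    return rng.binomial(n_pixels, p, size=n_samples)

def skewness(x):
    """Sample skewness: third central moment over variance^(3/2)."""
    x = x - x.mean()
    return (x ** 3).mean() / (x ** 2).mean() ** 1.5

small = bin_count_samples(9, 0.2)     # 3x3 neighbourhood
large = bin_count_samples(2500, 0.2)  # 50x50 neighbourhood
# The larger neighbourhood yields a far more symmetric, Gaussian-like bin.
```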
Chapter VII
Classification of spectral images and region analysis
Earlier chapters have presented different techniques for feature extraction, aiming to
define and model the properties, both spectral and spatial, of each of the elements of a
hyperspectral image within a single feature vector. However, these methods have been described
independently, without going into much detail on their integration within the complete
process for the classification of hyperspectral images.
This chapter describes the proposed methodology for the classification of hyperspectral images
from a theoretical perspective encompassing the entire process, from the initial acquisition of the
image to the proposed methods of re-classification.
In order to do so, the methodology is first described in a global manner, clarifying the proposed
classification process. Subsequently, further detail is provided on the different modules
which shape the process. The different possible alternatives for their implementation are introduced
in the description of each of the modules.
1. General description of the process of classification
This section briefly outlines the proposed methodology for the extraction of feature vectors and their
use for the classification of materials in hyperspectral images.
The proposed method is mainly divided into two stages. In the first, the background of the image is
extracted, segmenting those elements which are of interest; these are then subjected to a
normalisation and spectral decorrelation process that
extracts a feature vector capable of representing the spectral-spatial features of each of the image
points, which is finally assigned to a class by a statistical classifier.
The different modules contained in this first phase are the following:
1. Image capture.
2. Background extraction and segmentation.
3. Normalisation of the lighting conditions.
4. Spectral decorrelation.
5. Integration of the spectral-spatial features.
6. Preliminary classification of the image.
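The six modules above can be sketched as a pipeline skeleton. This is a hypothetical scaffold under stated simplifying assumptions (a toy background rule, white-reference normalisation, identity decorrelation); every function and parameter name is illustrative, not the thesis implementation.

```python
import numpy as np

def classify_hyperspectral(cube, white_ref, classifier):
    """Hypothetical first-phase pipeline for a (rows, cols, bands) cube.
    Stages mirror the numbered modules: background extraction, lighting
    normalisation, decorrelation, and per-pixel classification."""
    mask = cube.sum(axis=2) > 0.0      # 2. background extraction (toy rule)
    norm = cube / white_ref            # 3. lighting normalisation (eq. 7.9 style)
    feats = norm                       # 4. spectral decorrelation (identity here)
    labels = np.zeros(mask.shape, dtype=int)   # label 0 reserved for background
    labels[mask] = classifier(feats[mask])     # 5-6. classify foreground pixels
    return labels

# Toy run: flat cube, flat white reference, classifier that labels all as 1.
cube = np.ones((4, 4, 5))
white_ref = np.full(5, 2.0)
dummy = lambda feats: np.ones(len(feats), dtype=int)
labels = classify_hyperspectral(cube, white_ref, dummy)
```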
Figure 7.1 is a graphical description of the procedure employed. First, a hyperspectral image is
obtained, on which the segmentation of the background is performed. Subsequently, the
illumination of each of the spectral vectors associated with each pixel is normalised in order to
avoid variations due to lighting. Once normalised, the
discriminant spectral features of the spectrum are extracted through a process of decorrelation
(Chapters IV and V). Spatial information is then added to them through the use of neighbourhood
histograms (Chapter VI). Finally, the obtained feature vector is assigned to its corresponding
class through the use of statistical classifiers (Chapter III).
Fig. 7.1 Detailed process of the first classification phase.
As seen in figure 7.1, the final result of this phase is an image that labels each of its pixels with
one of the modelled classes. However, due to the great dispersion of the materials to be classified,
some of these regions will not be correctly classified; we therefore propose a second
phase that corrects the erroneously classified regions.
Bearing in mind that erroneously classified regions are usually connected with other correctly
classified regions, this second phase corrects the earlier classification
results through the use of region histograms (Chapter VI) that combine the spectral and spatial
features of each of the classification regions obtained in the previous phase. Based on these
histograms, contiguous regions can be checked in order to estimate their
possible merging and reclassification, as follows:
7. Merging of regions.
8. Reclassification of regions.
Figure 7.2 graphically shows the proposed procedure for this second phase:
Fig. 7.2 Region histogram and reclassification of regions.
Through the use of this second phase, we achieve the re-classification of those elements that,
despite having an acceptable similarity with the adjacent regions, were incorrectly classified in
the previous phase.
2. Classification of hyperspectral images
This section provides details on the first part of the previously described process, which includes
the image acquisition, the correction of lighting, the spectral decorrelation and the inclusion of
spatial features in the feature vector. Subsequently, each image pixel is classified based on its
extracted feature vector through the use of a statistical classifier.
The following sections provide step-by-step details of each of the modules of this phase, as shown
in figure 7.1.
2.1. Acquisition of an image and the correction of lighting.
As mentioned in Chapter II, the acquired spectral image is represented by a three-dimensional
matrix. The first two dimensions of the matrix represent the (x,y) position of each of the points in
the image. The third dimension represents each of the wavelengths reflected by each pixel, as
shown in figure 7.3.
Fig. 7.3 Representation of the hyperspectral image
In this manner, each point (x,y) of the image L is represented by a vector L(x,y), whose
components correspond to each of the K intensity responses at the wavelengths in which the
spectrum is discretised, that is, the quantity of light reflected at that (x,y) pixel as a function of its
wavelength. This way, each point in the image is represented by the vector L(x,y), which is
associated with the spectral response at that point, as defined in equation (7.1).
\mathbf{L} = [L_1, L_2, \ldots, L_K]^{T}    (7.1)
However, the appearance of this spectral vector L depends on several factors: the spectrum of the
incident light, the composition of the material, the external geometry of the material, and the
reflections and interactions of the lighting with the surrounding elements, among other factors. Chapter
II described the process of the generation of the reflected spectrum which, under diffuse
lighting conditions, takes the following form for every type of material:
L(\lambda) = m(\Omega) \cdot C(\lambda) \cdot L_{light}(\lambda)    (7.2)
Where m(\Omega) is a geometric coefficient that defines the percentage of the spectrum of incident
light that is observed by the sensor; it depends on the relative position between the
illumination, the camera and the 3D geometry of the object, as well as on the interactions due to
its possible rugosity, all encoded by \Omega.
C(\lambda) indicates the reflectivity or chromaticity of the material and represents the percentage
of light reflected by the material as a function of its wavelength. L_{light}(\lambda) represents the
incident lighting spectrum.
Therefore, one can observe that, of the three components that define vector L, only the
reflectivity or chromaticity of the material C(\lambda) is useful for its characterisation, given that the
other factors do not depend on the composition of the object. The
incident lighting spectrum depends on the illumination used, and the geometric
coefficient m(\Omega) depends on the geometry between the different elements; neither takes into
account the composition of the material.
Therefore, it is necessary to transform vector L in order to make it independent both from the
geometric variables that are modelled by m(\Omega) and from the lighting source
used, defined by the incident lighting spectrum L_{light}(\lambda).
2.2. Independence from the lighting source
Figure 7.4 shows the different types of lighting spectra )(λlightL that are appropriate for their use
in hyperspectral applications. Figure 7.4a shows continuous lighting, which is that most
commonly used. This lighting has an adequate emittance for all wavelengths that are going to be
used to characterise the different materials. The second type of lighting (Fig. 7.4b), corresponds
to a white lighting, which has the particularity of having a similar emissivity in all the spectrum
range, providing a sensation of white hue to the human eye. The third type of lighting (Fig. 7.4c)
shows a light with the same emissivity value for each of the spectrum regions. This lighting is
only used in applications of great precision, due to the difficulty in obtaining a source of lighting
that has these features of equal emissivity in a wide range of wavelengths.
Fig. 7.4 Representation of different types of lighting emission spectra. a) Continuous spectral
lighting, b) White spectral lighting, c) Ideal spectral lighting
As can be seen in (7.2), changes in the incident lighting spectrum cause a direct change
in the observed spectrum L, making its characterisation difficult or impossible when facing
different or variable incident lighting spectra.
In order to achieve the invariance of vector L with respect to the incident lighting spectrum
L_{light}(\lambda), we start from the methodology proposed by Tan et al. [TAN_04]. The aim of this
method is to obtain the reflectance spectrum that would have been obtained under an ideal
incident white lighting with identical emittance at every wavelength
(Fig. 7.4c), thus making vector L independent of the incident light spectrum.
In order to achieve this correction, one first calculates the chromaticity of the incident light (7.3):
C_{light}(\lambda) = \frac{L_{light}(\lambda)}{\sum_{n=1}^{N} L_{light}(n)}    (7.3)
Where N is the number of components of the spectrum.
This chromaticity value represents the percentage of intensity of each of the wavelengths in the
lighting spectrum. That is, it expresses the percentage of intensity emitted at each wavelength
relative to the total emitted intensity, rather than the absolute intensity of the lighting
spectrum.
Dividing vector L by the chromaticity of the light, the result is a reflectance spectrum
that is independent of the spectral appearance of the lighting source used, and thus equivalent
to the spectrum that would be obtained under ideal white lighting conditions.
\hat{L}(\lambda) = \frac{L(\lambda)}{C_{light}(\lambda)} = \frac{m(\Omega) \cdot C(\lambda) \cdot L_{light}(\lambda)}{C_{light}(\lambda)} = m(\Omega) \cdot C(\lambda) \cdot \sum_{n=1}^{N} L_{light}(n)    (7.4)
\lVert \mathbf{L}_{light} \rVert = \sum_{n=1}^{N} L_{light}(n)    (7.5)
The modulus of the lighting spectrum (7.5) does not depend on the wavelength and represents the
luminous intensity of the light source. Since this intensity does not depend on the
luminous characteristics of the object, it can be integrated into the geometric coefficient m(\Omega)
(where \Omega comprises all the variables that influence that coefficient), which indicates the quantity of
light that reaches that specific point of the material.
By integrating this modulus into the coefficient, the following definition for the reflectivity vector
is obtained:
\hat{L}(\lambda) = m_2(\Omega) \cdot C(\lambda)    (7.6)
In this manner, the vector \hat{\mathbf{L}} so obtained does not depend on the appearance of the
incident lighting spectrum, but only on a scalar geometric parameter m_2(\Omega), which defines the intensity
with which the incident lighting at that point of the image is observed by the sensor, and on the
chromaticity of the material itself, C(\lambda).
Given that it is usually not possible to directly obtain the chromaticity of the lighting C_{light}, it can
be estimated through the reflectance spectrum obtained from a white reference body, as shown
in figure 7.5.
Fig. 7.5 Representation of the hyperspectral vector L and the white reference body spectrum
(L_{WhiteBody}).
In this manner, the chromaticity of the incident lighting can be estimated through the spectral
vector reflected by a white body (7.7), as shown in equation (7.8).
\mathbf{L}_{WhiteBody} = [L_{WhiteBody,1}, L_{WhiteBody,2}, \ldots, L_{WhiteBody,K}]^{T}    (7.7)
C_{WhiteBody}(\lambda) = \frac{L_{WhiteBody}(\lambda)}{\sum_{n=1}^{N} L_{WhiteBody}(n)} \approx C_{light}(\lambda)    (7.8)
The equation for the calculation of the reflection spectrum invariant to the lighting source
(7.9) is obtained by replacing the chromaticity of the light in (7.4) by its estimate based on the
chromaticity of a white reference body:
\hat{L}(\lambda) = \frac{L(\lambda)}{L_{WhiteBody}(\lambda)}    (7.9)
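The white-reference correction of equation (7.9) can be sketched in a few lines of numpy. This is a minimal illustration under the diffuse-lighting assumption of (7.2); names are illustrative.

```python
import numpy as np

def correct_lighting(L, L_whitebody):
    """Eq. (7.9): divide each observed spectrum by the spectrum reflected
    by a white reference body, cancelling the incident-light chromaticity."""
    return L / L_whitebody

# If the observed spectrum is the lighting spectrum scaled by a flat 50%
# reflectance, the corrected spectrum is flat, whatever the lighting was.
light = np.array([1.0, 2.0, 4.0, 2.0])   # arbitrary incident spectrum
L = 0.5 * light                          # material with flat C(lambda) = 0.5
L_hat = correct_lighting(L, light)
```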
2.2.1. Independence from the geometric coefficient
The spectrum calculated in (7.9) is invariant to the type of lighting employed. It depends only
on two factors, as shown in (7.6): the chromaticity spectrum of the material
C(\lambda), and a scalar coefficient m_2(\Omega) that depends on the geometric parameters of the point and
that indicates the intensity with which the sensor perceives that reflection.
In order to isolate the chromaticity vector of the material from the geometric variables and shines,
diffuse lighting conditions are assumed, which allow the application of the
model defined in (7.2). Given that this diffusivity of the lighting is not perfect and that objects need
not have a Lambertian or dichromatic behaviour, three different methods are proposed which
reduce the influence of the geometric factor m_2(\Omega) in the estimation of the chromaticity of the
material.
2.2.1.1. Normalisation of the spectrum
The simplest method to reduce these phenomena is based on the normalisation of the spectrum
through division by its L1 norm, which reduces the dimensionality of the vector \hat{\mathbf{L}} by one
degree of freedom. Despite this loss of information, the normalised invariant spectrum obtained in (7.10) is
invariant to geometric changes, depending solely on the chromaticity spectrum of the
material, as shown in (7.11).
\hat{L}_{Norm}(\lambda) = \frac{\hat{L}(\lambda)}{\sum_{n=1}^{K} \hat{L}(n)}    (7.10)
\hat{L}_{Norm}(\lambda) = \frac{m_2(\Omega) \cdot C(\lambda)}{\sum_{n=1}^{K} m_2(\Omega) \cdot C(n)} = \frac{C(\lambda)}{\sum_{n=1}^{K} C(n)}    (7.11)
2.2.1.2. Montoliu's Invariant
However, the earlier method assumes diffuse lighting conditions that do not hold in
practice. For this reason, we propose to use the invariant proposed by Montoliu et al.
[MONT_05] for Shafer's dichromatic model [SHAF_84].
Chapter VII Classification of spectral images and region analysis
Page 152
Montoliu's method is based on the observation, explained in the section on lighting of Chapter
II, that the subtraction and subsequent division between two bands of the spectrum yields invariance to
shines, lighting intensity and object geometry under Shafer's dichromatic model.
In this manner, the minimum value of the spectrum \hat{L}(\lambda) is subtracted from each of its
components, and the resulting vector is then normalised:
\hat{L}_{Montoliu}(\lambda) = \frac{\hat{L}(\lambda) - \min\left(\hat{L}(1), \hat{L}(2), \ldots, \hat{L}(K)\right)}{\sum_{n=1}^{K} \left(\hat{L}(n) - \min\left(\hat{L}(1), \hat{L}(2), \ldots, \hat{L}(K)\right)\right)}    (7.12)
2.2.1.3. Stockman's Invariant
Another technique that eliminates the influence of the geometric factor is that proposed by
Stockman [STOCK_99]. He defined the spectral hue as a scalar invariant describing the
hue of a spectrum, analogous to the hue obtained from an RGB image. However, this hue
describes each spectrum with a single value, excessively decreasing the discriminating power of
that feature.
For the calculation of this hyperspectral hue, Stockman creates what he calls a desaturated
spectrum. This spectrum is calculated in two steps. First, it is made independent of the lighting
through the normalisation proposed in (7.10).
In a second step, this spectrum is desaturated through the subtraction of the minimum value
contained in that spectrum, as shown in (7.13).
\hat{L}_{Stockman}(\lambda) = \hat{L}_{Norm}(\lambda) - \min\left(\hat{L}_{Norm}(1), \hat{L}_{Norm}(2), \ldots, \hat{L}_{Norm}(K)\right)    (7.13)
Although Stockman does not use this spectrum as a discriminating feature, but only for
obtaining the spectral hue, we propose to use this method in order to achieve the invariance of the
spectrum to geometric changes.
All the previously proposed methods obtain a vector that is invariant to the geometry of the
object, provided that the ideal conditions defined in equation (7.6) are met. However, given that
these ideal conditions do not hold in practice, we propose to use these three methods
in order to achieve the invariance of the extracted spectrum to the geometric and lighting
variables. Chapter VIII will analyse the effectiveness of each of these methods and select
the one which achieves this invariance in the most efficient manner.
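The three invariants of equations (7.10), (7.12) and (7.13) can be written side by side in a short numpy sketch (illustrative names, not the thesis code). All three cancel the scalar geometric gain m_2; Montoliu's invariant additionally cancels an additive offset, matching the shine term of the dichromatic model.

```python
import numpy as np

def norm_l1(L):
    """Eq. (7.10): L1-normalised spectrum."""
    return L / L.sum()

def montoliu(L):
    """Eq. (7.12): subtract the spectrum minimum, then L1-normalise."""
    d = L - L.min()
    return d / d.sum()

def stockman(L):
    """Eq. (7.13): L1-normalise first, then subtract the minimum."""
    n = norm_l1(L)
    return n - n.min()

# Scaling the spectrum by a geometric gain leaves every invariant unchanged.
L = np.array([2.0, 5.0, 3.0, 6.0])
```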
2.3. Decorrelation of the luminous spectrum
As mentioned in earlier chapters, due to both the high dimensionality of the luminous spectrum
and the great redundancy of the existing data, it is necessary to reduce its dimensionality. The
objectives of this reduction are two-fold:
- To reduce the amount of redundant information present in the spectrum.
- To transform the data in order to produce a representation where the separation between
materials is maximal.
In order to achieve this reduction of features, we propose several feature extraction methods
set out in Chapter IV, as well as the method based on the division of the spectrum into fuzzy sets
proposed in Chapter V.
First, we propose the use of the raw spectrum (RAW) and of the wavelengths corresponding to the
red, green and blue colours of the spectrum (RGB), in order to establish the classification rate that
would be obtained using a normal colour image or the complete spectrum. In this
manner, the efficiency of the proposed algorithms can be evaluated.
The use of Principal Component Analysis (PCA) or of Fisher's Discriminant (LDA) extracts
highly discriminant features, thus allowing for an adequate compression of the data and
effectively reducing the Hughes phenomenon in the classification. On the other hand, the use of
these methods implies a previous training and, furthermore, the obtained features depend on
this training set.
As explained in Chapter V, when classes vary over time or when new classes are
added to the system, the discriminating power of the obtained features will vary, making
re-training necessary. For this reason, features based on PCA or LDA are not
optimal when the classes vary over time or when new classes are added to the system.
Chapter VII Classification of spectral images and region analysis
Page 154
Chapter V showed the necessary conditions for solving these limitations. In this manner, the
extracted feature vector should comply with the following conditions:
− Reduction of dimensionality.
− Adequate discriminatory power.
− Independence from training and from the variability of classes.
− Discriminating power based on the underlying physical properties.
− Generic features that do not depend on the application or the class.
− Maintaining the physical meaning of the features.
Because of all this, we also propose the use of the method based on spectral fuzzy sets described
in Chapter V. This method is designed to optimise the discriminating power of the
different absorption bands of the spectrum and complies with the previously stated conditions.
The following sections define the different proposed decorrelation methods used to reduce the
dimensionality of the spectrum and to increase its separability.
2.3.1. RGB.
In order to compare the goodness of hyperspectral techniques against classical colour-based
processing, a spectral vector of three components is created, each associated with
the wavelength of the red (650 nm), green (510 nm) and blue (475 nm) colour components.
Using this decorrelation, one can estimate the amount of additional discriminating information
offered by the proposed methods compared with the same methods based on colour.
Chapter VII Classification of spectral images and region analysis
Page 155
Fig. 7.6 Reduction of features based on RGB
In this manner, the feature vector X, which defines the luminous spectrum, is given by:

\mathbf{X}_{RGB} = [\hat{L}(\lambda_{RED}), \hat{L}(\lambda_{GREEN}), \hat{L}(\lambda_{BLUE})] = [R, G, B]    (7.14)

Where \hat{L}(\lambda) is the normalised luminance vector, as described in equation (7.9).
2.3.2. RAW.
The second proposed decorrelation method does not apply any transformation to the spectrum
other than the previously defined lighting corrections and normalisation. In this
manner, neither its high dimensionality nor its redundancy is reduced.
This raw spectrum can be used as a baseline against which to compare the effects on the
classification produced by the feature reduction of the following methods.
The feature vector X corresponds to the normalised luminance vector, as defined in the
following equation:

\mathbf{X}_{RAW} = \hat{\mathbf{L}}    (7.14)
Where \hat{\mathbf{L}} is the luminous spectrum under one of the previously defined normalisations
(equation 7.9).
2.3.3. Principal Component Analysis (PCA).
The first proposed feature reduction method is based on the widely known Karhunen-Loève
transform [HOTT_33]. Principal Component Analysis, thoroughly detailed in
Chapter IV, obtains a projection of the data onto a subspace of smaller dimension in which the
variance of the projected data is maximised.
In order to calculate that subspace, a set of spectra \hat{\mathbf{L}} covering the
set of classes to be modelled is selected as training data. From these training elements, the
eigenvalues and eigenvectors of the associated covariance matrix are calculated, which define the
vectors that establish the projection subspace.
The first M < K eigenvectors of the calculated covariance matrix are selected, those associated
with the M largest eigenvalues, and they form the transformation matrix V, as detailed in Chapter
IV.
\mathbf{X}_{PCA} = \mathbf{V}^{t}(\hat{\mathbf{L}} - \bar{\hat{\mathbf{L}}}) = [\hat{\mathbf{u}}_1, \hat{\mathbf{u}}_2, \hat{\mathbf{u}}_3, \ldots, \hat{\mathbf{u}}_M]^{T} \cdot (\hat{\mathbf{L}} - \bar{\hat{\mathbf{L}}})    (7.15)
The transformed vector X which represents the spectrum is calculated through equation (7.15),
where \bar{\hat{\mathbf{L}}} is the mean vector of all the spectra \hat{\mathbf{L}} used for the calculation of the transformation
matrix. For additional information on the application of this methodology, see Chapter IV.
Once the transformation is done, the luminous spectrum is represented by vector X in the
subspace defined by the previously calculated eigenvectors. This representation reduces the
dimensionality of the spectrum, thus reducing the Hughes phenomenon, while at the same time
decorrelating and compressing the information contained in it.
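Equation (7.15) can be sketched with numpy's eigendecomposition of the sample covariance matrix. This is a generic textbook PCA under random toy data, not the thesis implementation; function names are illustrative.

```python
import numpy as np

def fit_pca(spectra, M):
    """Return the mean spectrum and the M eigenvectors of the covariance
    matrix with the largest eigenvalues (as the columns of V)."""
    mean = spectra.mean(axis=0)
    cov = np.cov(spectra, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:M]          # pick the M largest
    return mean, vecs[:, order]

def project(L, mean, V):
    """Eq. (7.15): X = V^T (L - mean)."""
    return V.T @ (L - mean)

# Toy training set: 100 spectra with K = 6 bands, reduced to M = 3 features.
rng = np.random.default_rng(1)
spectra = rng.normal(size=(100, 6))
mean, V = fit_pca(spectra, M=3)
X = project(spectra[0], mean, V)
```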
2.3.4. Fisher's linear discriminant.
Another proposed decorrelation method is based on the linear discriminant analysis
proposed by Fisher. This technique, also thoroughly detailed in Chapter IV, does not transform
the coordinate system by maximising the variance of the training points; instead, it optimises
the separability between the classes that make up the sample, selecting a non-orthogonal set of
axes that maximises this separability. The number of these axes depends on the number of classes
present in the classification and is equal to the number of existing classes minus one.
In the case of Gaussian classes, this method obtains axes whose separability corresponds
to the optimal separability that would be achieved by the Bayes optimal
classifier.
Vector X, which represents the spectrum, is defined in (7.16), w being the Fisher
transformation, obtained as described in Chapter IV.

\mathbf{X}_{FISHER} = \mathbf{w}^{t}(\hat{\mathbf{L}})    (7.16)
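A standard textbook formulation of Fisher's discriminant solves the eigenproblem of S_w^{-1} S_b, with S_w and S_b the within-class and between-class scatter matrices. The sketch below follows that generic formulation on toy data and is not the Chapter IV implementation; all names are illustrative.

```python
import numpy as np

def fit_lda(X, y):
    """Fisher axes: eigenvectors of pinv(Sw) @ Sb with the largest
    eigenvalues; at most (n_classes - 1) axes are useful."""
    classes = np.unique(y)
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))   # within-class scatter
    Sb = np.zeros((d, d))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1][:len(classes) - 1]
    return vecs.real[:, order]

# Two well-separated Gaussian classes in 4 dimensions.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
w = fit_lda(X, y)          # two classes -> a single discriminant axis
```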
2.3.5. Spectral fuzzy sets.
The last proposed decorrelation method takes advantage of the correlation between
adjacent bands in order to characterise the absorption bands that differentiate the materials.
Chapter V gave a thorough explanation of the fuzzification of the spectrum. This method
divides the spectrum into fuzzy sets, each representing a range of
the spectrum, and measures the Energy of each fuzzy set. The energies associated
with these fuzzy sets form a feature vector that describes the electromagnetic spectrum
in a manner similar to the way the human eye characterises the red, green and blue colours.
In this way, the characterisation of the spectrum behaves like an eye that is sensitive
over a wide range of the spectrum (a "hyperspectral eye").
As detailed in Chapter V, the Energy associated with each of the fuzzy sets represents the
intensity of the spectrum for that set, and can be defined as the discrete convolution of the
spectrum with the fuzzy membership function located at the central point that defines the set.
E_i = \int_{\lambda=0}^{\lambda=K} f_{M_i}(\lambda) \cdot L_{norm}(\lambda) \, d\lambda    (7.17)
The characteristic vector of each spectrum is the vector composed of the Energies
associated with each of the fuzzy sets (7.18).

\mathbf{X}_{FUZZYSETS} = [E_1, E_2, \ldots, E_M]^{T}    (7.18)
This description of the spectrum characterises the spectral absorptions efficiently without
requiring previous training. Additionally, the dimensionality of the spectrum is
reduced while keeping its discriminating features.
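The energies of equations (7.17) and (7.18) can be sketched in discrete form as inner products between membership functions and the normalised spectrum. The triangular membership shapes below are an assumption for illustration; the thesis uses the fuzzy sets defined in Chapter V, and all names here are hypothetical.

```python
import numpy as np

def triangular_memberships(K, M):
    """M overlapping triangular membership functions over K bands
    (illustrative shapes, not the Chapter V definitions)."""
    centres = np.linspace(0, K - 1, M)
    width = centres[1] - centres[0]
    lam = np.arange(K)
    f = np.clip(1.0 - np.abs(lam[None, :] - centres[:, None]) / width, 0.0, None)
    return f  # shape (M, K)

def fuzzy_energies(L_norm, f):
    """Discrete eq. (7.17)/(7.18): energy of each fuzzy set as the inner
    product of its membership function with the normalised spectrum."""
    return f @ L_norm

K, M = 16, 4
f = triangular_memberships(K, M)
X = fuzzy_energies(np.ones(K) / K, f)   # flat spectrum -> M energies
```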
2.4. Integration of spectral-spatial features.
The feature vector X that defines the spectrum efficiently characterises each of the hyperspectral
pixels. However, this approach does not include spatial information from the neighbouring elements.
Spatial information is complementary, and its inclusion
in the model allows a better separability between the different classes.
As explained in Chapter VI, we propose the use of fuzzy neighbourhood histograms in
order to efficiently integrate the spectral and spatial features of the spectrum, taking into
account the variability of the spectral features in a given neighbourhood or region (spatial
variation).
In order to construct this spatial histogram, a neighbourhood is defined around each
pixel. Once defined, M independent histograms are calculated, one for each of the M components
of vector X. In order to create the spectral-spatial vector, these M histograms are concatenated as
follows:
\mathbf{H}(x,y) = [\mathbf{H}_1(x,y), \mathbf{H}_2(x,y), \ldots, \mathbf{H}_M(x,y)]    (7.19)
The histogram in equation (7.19) encodes the spatial distribution of all hyperspectral vectors
in the selected neighbourhood. Figure 7.7 shows the calculation of the histogram vector from the
vectorial representation of the image in a pre-established neighbourhood. For more detail on the
calculation of histograms, see Chapter VI.
Figure 7.7 Construction of the spectral-spatial vector H(x,y)
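The concatenation of equation (7.19) can be sketched with plain (crisp) histograms; the thesis uses the fuzzy histograms of Chapter VI, so this is a simplified illustration and every name is hypothetical.

```python
import numpy as np

def spectral_spatial_vector(Xmap, x, y, r, bins, value_range):
    """Eq. (7.19), crisp variant: one histogram per feature component over
    the (2r+1)^2 neighbourhood of (x, y), concatenated into one vector."""
    patch = Xmap[x - r:x + r + 1, y - r:y + r + 1]   # (h, w, M)
    lo, hi = value_range
    hists = [np.histogram(patch[..., m], bins=bins, range=(lo, hi))[0]
             for m in range(patch.shape[2])]
    return np.concatenate(hists)

# Toy feature map: 7x7 image, M = 2 feature components, all values zero.
Xmap = np.zeros((7, 7, 2))
H = spectral_spatial_vector(Xmap, 3, 3, r=1, bins=4, value_range=(0.0, 1.0))
```

The resulting vector has M times `bins` components, and its total mass equals the neighbourhood size times M, which is what allows the normalisation of equation (6.14).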
2.5. Classification Procedure.
Earlier sections have described the procedure used to obtain the feature vectors that merge
the spectral and spatial characteristics. The aggregate histogram defined in (7.19) captures, in a
single vector, the spatial distribution of the spectral information in a pre-established neighbourhood
surrounding pixel (x,y).
This vector H is used as the input vector for classification when the aim is to integrate the
spectral-spatial information. In cases where the spectral information alone is sufficient, any of the
implementations of the vector X defined in section 2.3 can be used as the input vector.
In order to evaluate and compare the goodness of the obtained feature vector, we propose the use
of a classifier based on multivariate Gaussian distributions. This classifier is used,
instead of the more complex classifiers described in Chapter III, because of its good
interpretability as well as the good generalisation obtained with this type of classifier. Its
simplicity makes it suitable for evaluating the goodness of the different features used for the
characterisation of the different materials.
In this manner, let C_i be each of the classes to be classified, defined by a set of N_i training
vectors H(x,y) (or X(x,y) if spatial features are not included) belonging to that class. From
this set of vectors, a Gaussian model is created for each class from the associated training
vectors, as defined in equation (7.20) for the calculation of the mean vector associated with
each class:
\mu_{C_i} = \frac{1}{N_i} \sum_{n=1}^{N_i} \mathbf{H}(n) \qquad (7.20)
and equation (7.21) gives the covariance matrix estimated from the training vectors that
belong to each class:

\Sigma_{C_i} = \frac{1}{N_i - 1} \sum_{n=1}^{N_i} \left( \mathbf{H}(n) - \mu_{C_i} \right) \left( \mathbf{H}(n) - \mu_{C_i} \right)^T \qquad (7.21)
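The model-fitting step of equations (7.20)-(7.21) can be sketched as follows, assuming the training vectors of one class are stacked as the rows of a matrix:

```python
import numpy as np

def fit_class_model(H_train):
    """Mean vector (7.20) and covariance matrix (7.21) of one class
    from an (N_i x D) array of training vectors H(n)."""
    mu = H_train.mean(axis=0)
    diff = H_train - mu
    sigma = diff.T @ diff / (len(H_train) - 1)
    return mu, sigma

# Synthetic training vectors for one class (100 samples, 3 features).
rng = np.random.default_rng(0)
H_train = rng.normal(size=(100, 3))
mu, sigma = fit_class_model(H_train)
```

The covariance matches `np.cov(H_train, rowvar=False)`, which uses the same N-1 normalisation.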
From the mean vector and covariance matrix of each class Ci, a Gaussian model is obtained for
each of them (7.22). This model gives the probability of membership of a feature vector in each
class. Equation (7.22) shows the probability of membership of the feature vector H in each of the
previously modelled classes:
\mathrm{N}(\mathbf{H};\, \mu_{C_i}, \Sigma_{C_i}) = \frac{1}{(2\pi)^{D/2}\, |\Sigma_{C_i}|^{1/2}}\, e^{-\frac{1}{2} (\mathbf{H} - \mu_{C_i})^T \Sigma_{C_i}^{-1} (\mathbf{H} - \mu_{C_i})} \qquad (7.22)
Chapter VII Classification of spectral images and region analysis
Page 161
Assuming Gaussian class distributions and applying Bayes' theorem, the most plausible class is
the one with the smallest cost. Analysing equation (7.22), the most plausible class is the one
whose Mahalanobis distance to the class model is minimal, as shown in (7.23).
\text{Class} = i \quad \text{if} \quad (\mathbf{H} - \mu_j)^T \Sigma_j^{-1} (\mathbf{H} - \mu_j) > (\mathbf{H} - \mu_i)^T \Sigma_i^{-1} (\mathbf{H} - \mu_i), \quad \forall j \neq i \qquad (7.23)
In this manner, each feature vector H(x,y) associated with an image pixel (x,y) is labelled with
the class of least Bayes cost, thus relating each point of the image to its most probable
membership class.
In this way, a label image B(x,y) is obtained, whose values range from zero to the number of
existing classes, the zero value being assigned by convention to the class representing the
background of the image, as shown in figure 7.8.
Figure 7.8 Labelling of the regions after the classification process
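A sketch of the decision rule (7.23): each feature vector is assigned to the class whose Gaussian model is at the smallest Mahalanobis distance (the class indices here are arbitrary; in the text, index 0 is reserved for the background):

```python
import numpy as np

def mahalanobis2(H, mu, sigma_inv):
    """Squared Mahalanobis distance of H to a class model."""
    d = H - mu
    return d @ sigma_inv @ d

def classify(H, models):
    """Assign H to the class with minimal Mahalanobis distance (7.23).
    `models` is a list of (mu, sigma) pairs."""
    dists = [mahalanobis2(H, mu, np.linalg.inv(sig)) for mu, sig in models]
    return int(np.argmin(dists))

# Two toy 2-D class models.
models = [(np.zeros(2), np.eye(2)), (np.full(2, 5.0), np.eye(2))]
print(classify(np.array([4.5, 5.2]), models))  # -> 1
```

Applying `classify` to every pixel's feature vector produces the label image B(x,y) described above.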
3. Analysis and merging of regions.
The previous section detailed how the label image B(x,y) is obtained. This image divides the
scene into several regions, each associated with a specific class. However, the considerable
overlap between the different class models, the presence of specular highlights and the
dispersion among the elements of each class cause some of these regions to be erroneously
classified.
However, in the great majority of cases these erroneously classified regions are small and are
generally connected to larger regions that are correctly classified, as shown in
figure 7.9. Based on this observation, a procedure is proposed for merging connected regions
according to their statistical features.
Using this approach, all connected regions are statistically analysed and compared with the
different class models in order to decide whether a specific region should be reclassified or
unified with one or several of the regions connected to it.
Figure 7.9 a) Initial classification of regions, b) Real classification of regions.
In order to achieve this, two distinct grouping and reclassification procedures are proposed:
one based on the calculation of the Region of maximum likelihood, and one that describes each
region by its Normalised region histogram.
Both methodologies reclassify each region using a more precise, region-wide estimate of its
membership in a given class, thereby increasing the statistical significance of the
classification.
Subsequently, the connected regions are analysed in order to estimate statistically whether they
should be grouped together and what the final classification of the aggregate supra-region
should be, as shown in figure 7.10.
Figure 7.10 Estimate of the probability of membership to each of the regions and the membership
probability of the aggregate supra-region.
The two proposed methodologies for the reclassification and merging of the regions are
presented below.
3.1. Region of maximum likelihood.
The first approach uses the membership probabilities obtained by evaluating the neighbourhood
histograms H(x,y), described in equation (7.19), against each of the defined Gaussian models, as
detailed in equation (7.22), to calculate a single membership probability for each region.
This probability is computed as the sum, over all points of the region, of the membership
probabilities of the associated neighbourhood histograms with respect to each existing class:
P_{\mathrm{membership}}(C_i, \Re_j) = \sum_{\forall (x,y) \in \Re_j} \mathrm{N}\big(\mathbf{H}(x,y);\, \mu_{C_i}, \Sigma_{C_i}\big) \qquad (7.24)
In the same manner, one calculates the probability that defines the statistical cost of unifying both
regions:
P_{\mathrm{membership}}(C_i, \Re_A \cup \Re_B) = \sum_{\forall (x,y) \in \Re_A \cup \Re_B} \mathrm{N}\big(\mathbf{H}(x,y);\, \mu_{C_i}, \Sigma_{C_i}\big) \qquad (7.25)
In the great majority of cases, the probability of the unified region is lower than the
probability of keeping both regions separate. The reason is that each independent region bears a
greater similarity to the class with which it was originally classified, which causes a
decrease in the membership probability of the aggregate region.
The decision to unify both regions is therefore made by applying an empirical correction factor
µ, which models the expected degree of decrease in the probability: the regions are merged
whenever the corrected probability of the unified region exceeds the weighted probability that
the regions remain separate, so as to obtain an optimal cost:
\mu \cdot P_{\mathrm{membership}}(\Re_{A \cup B}) \;\geq\; \frac{N_A}{N}\, P_{\mathrm{membership}}(\Re_A) + \frac{N_B}{N}\, P_{\mathrm{membership}}(\Re_B) \qquad (7.26)
where NA and NB are the numbers of elements belonging to regions A and B respectively, and N is
the number of elements belonging to the aggregate region ℜA∪B.
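The unification test of equation (7.26) can be sketched as follows, assuming the per-pixel membership probabilities of each region's points for every class have already been evaluated with (7.22); the array shapes and the default value of µ are illustrative choices, not taken from the thesis:

```python
import numpy as np

def region_prob(pix_probs):
    """(7.24): per-class membership probability of a region, summed
    over its points; pix_probs has shape (n_points, n_classes)."""
    return pix_probs.sum(axis=0)

def should_merge(probs_A, probs_B, mu_factor=0.9):
    """(7.26): merge if the corrected probability of the union exceeds
    the size-weighted probabilities of the separate regions."""
    N_A, N_B = len(probs_A), len(probs_B)
    N = N_A + N_B
    P_A = region_prob(probs_A).max()          # best class for region A
    P_B = region_prob(probs_B).max()          # best class for region B
    P_AB = region_prob(np.vstack([probs_A, probs_B])).max()  # (7.25)
    return mu_factor * P_AB >= (N_A / N) * P_A + (N_B / N) * P_B

probs_A = np.array([[0.9, 0.1], [0.8, 0.2]])   # 2 points, 2 classes
probs_B = np.array([[0.85, 0.15]])             # same dominant class
print(should_merge(probs_A, probs_B))
```

With a small µ and regions dominated by different classes, the same test rejects the merge.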
3.2. Normalised region histogram
The previous method estimates the reclassification and merging of the regions by weighting the
membership probabilities of each point of the region. This approach does not provide a single
feature vector capable of describing the associated region by itself.
In order to overcome these limitations, the extraction of a characteristic region descriptor is
proposed, defining specific region models that take into account the different degrees of
variability of the region (specular highlights, oxidation, etc.), thus capturing and modelling
variations that cannot be completely represented by the neighbourhood histograms defined in (7.19).
One of the fundamental advantages of this approach is that it does not require the prior
calculation of a neighbourhood histogram for every point; instead, each region is classified
according to a single histogram (vector) that represents it. Avoiding the per-point
neighbourhood histograms greatly increases, under certain conditions, the computational speed.
In order to obtain this feature vector, it is proposed to extend the concept of the fuzzy
neighbourhood histogram (explained in Chapter VI) to regions of any shape. In this manner, the
neighbourhood histogram is calculated for the complete region instead of for the neighbourhood
of a single point. This histogram characterises the whole region through a single feature vector
of spatial and spectral properties.
In order to make the region histogram vector (6.17) independent of the number of elements
contained in each region, it is normalised by dividing it by the number N of elements of that
region:
\hat{\mathbf{H}}(\Re_i) = \frac{\mathbf{H}(\Re_i)}{N} \qquad (7.27)
In this manner, a normalised region histogram (7.27) is calculated for each of the previously
classified regions, except those classified as background. Since these histograms have been
normalised, they can be compared directly with the Gaussian models associated with each of the
classes.
Fig. 7.11. Extraction of region histograms in each of the previously extracted regions.
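A sketch of the normalised region histogram (7.27), again assuming a pre-quantised bin map rather than the fuzzy assignment used in Chapter VI:

```python
import numpy as np

def region_histogram(bin_map, region_mask, n_bins):
    """Region histogram over all pixels of a region, normalised by
    the region size N, as in (7.27)."""
    values = bin_map[region_mask]
    return np.bincount(values, minlength=n_bins) / len(values)

# Toy 2x2 image with a 3-pixel region.
bin_map = np.array([[0, 1], [1, 1]])
mask = np.array([[True, True], [True, False]])
h = region_histogram(bin_map, mask, 2)
print(h)  # -> approximately [0.333, 0.667]
```

Because the result sums to one, histograms of regions of very different sizes become directly comparable.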
Each pair of connected regions is analysed for a possible merging. In order to carry out this
operation, the region histogram is calculated for each of the regions (Ha,Hb) and another region
histogram is calculated for the aggregate region obtained from the two connected regions (Hab)
(figure 7.12).
Fig. 7.12 Region histograms of the two candidate regions and of the aggregate region.
Each histogram is evaluated against the models that define the existing classes, and each region
is assigned to the class with the highest membership probability.
If all three histograms are assigned to the same class, the candidate regions are merged and
that class is kept for the resulting region. Otherwise, the regions are statistically analysed
in order to decide whether the two regions should merge. The merging probabilities are
calculated from the membership probabilities associated with each region histogram for each of
the different classes, as defined in equation (7.28).
P(\mathbf{H}_i \in C_i) = \mathrm{N}\big(\mathbf{H}_i;\, \mu_{C_i}, \Sigma_{C_i}\big) \qquad (7.28)
The likelihood of the aggregate region (7.29) is contrasted with the likelihood of both regions
remaining separate (7.30):
P(\mathbf{H}_{ab} \in C_{ab}) = \mathrm{N}\big(\mathbf{H}_{ab};\, \mu_{C_{ab}}, \Sigma_{C_{ab}}\big) \qquad (7.29)

P(\mathbf{H}_a \in C_a, \mathbf{H}_b \in C_b) = P(\mathbf{H}_a \in C_a) \cdot \frac{N_a}{N_{merged}} + P(\mathbf{H}_b \in C_b) \cdot \frac{N_b}{N_{merged}} \qquad (7.30)
In order to quantify the contribution of each region to equation (7.30), each term is weighted
by the number of elements belonging to the associated region (Na and Nb) relative to the number
of elements contained in the merged region (Nmerged).
In this way, the hypothesis with the greater probability is chosen, and the regions are merged
or kept separate according to the hypothesis with minimum cost.
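This second merging criterion, equations (7.28)-(7.30), can be sketched as follows. It relies on the fact that the normalised histogram of the aggregate region is the size-weighted average of the two normalised histograms; the Gaussian model used below is illustrative:

```python
import numpy as np

def gaussian_score(h, mu, sigma):
    """Density N(h; mu, Sigma) used as the membership probability (7.28)."""
    d = h - mu
    D = len(mu)
    norm = 1.0 / (((2 * np.pi) ** (D / 2)) * np.sqrt(np.linalg.det(sigma)))
    return norm * np.exp(-0.5 * d @ np.linalg.solve(sigma, d))

def merge_by_histograms(h_a, h_b, n_a, n_b, models):
    """(7.29) vs (7.30): likelihood of the aggregate region against the
    size-weighted likelihood of keeping the regions separate."""
    h_ab = (n_a * h_a + n_b * h_b) / (n_a + n_b)   # aggregate histogram
    best = lambda h: max(gaussian_score(h, mu, sig) for mu, sig in models)
    n = n_a + n_b
    p_separate = best(h_a) * n_a / n + best(h_b) * n_b / n   # (7.30)
    return best(h_ab) >= p_separate                          # merge?

models = [(np.array([0.5, 0.5]), 0.01 * np.eye(2))]
h_a, h_b = np.array([0.6, 0.4]), np.array([0.4, 0.6])
print(merge_by_histograms(h_a, h_b, 10, 10, models))  # -> True
```

Here the aggregate histogram lies closer to the class model than either part, so the merge hypothesis wins.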
4. Conclusions.
This chapter has presented a complete procedure for classifying the elements contained in
hyperspectral images. Each phase of this process has been described, and the possible approaches
for each of its constituent modules have been detailed in depth.
First, the problem of the dependence of the chromaticity of the material on external lighting
and geometry factors has been tackled, and several options have been proposed that reduce this
dependence under non-ideal conditions.
Second, the spectral decorrelation techniques presented in Chapters IV and V have been
integrated within the proposed classification framework, addressing the use of several
decorrelation methods within the complete classification procedure.
On the other hand, the methodology based on fuzzy neighbourhood histograms defined in Chapter VI
has been placed within the general classification framework, achieving the integration of
spectral and spatial features within a single feature vector.
Furthermore, the classification of materials through Gaussian models has been proposed. The
properties of these models not only allow for the correct classification of the materials, but
also for the efficient evaluation of the different alternatives proposed.
Additionally, in order to increase the classification rate of the system, a subsequent
reclassification has been proposed, based on merging connected regions with similar properties
through the optimisation of a unification cost function.
This approach describes each region through a single vector that integrates its spectral and
spatial features, without needing to obtain the spectral and spatial features of each point
separately.
Chapter VIII
Results
The previous chapters theoretically described diverse methodologies for the classification of
different materials or elements in hyperspectral images, with the aim of obtaining a robust and
efficient classification algorithm.
This chapter describes the different tests undertaken to validate these methodologies in the
field of hyperspectral image classification, and specifically for the classification of
materials from waste electrical and electronic equipment (WEEE).
First, the nature and acquisition conditions of the image sets used are described. Subsequently,
the tests that establish the efficiency of each of the proposed methodologies are detailed, and
the results obtained are analysed in order to confirm or reject the hypotheses made.
1. Description of the data sample
The validation of the different proposed classification algorithms is carried out in the context
of the classification of non-ferrous materials for recycling. Specifically, the following
materials from electrical-electronic waste are evaluated: white copper, aluminium, stainless
steel, brass, copper and lead (Figure 8.1). An additional class is added to these materials: the
image background, formed by the conveyor belt on which the capture is made. This background
element (conveyor belt) was chosen beforehand for specific luminous properties that reduce its
reflected intensity.
These samples consist of non-ferrous materials obtained after chopping, magnetic sorting and
densimetric sorting. The resulting set is composed of a mixture of non-ferrous materials
(aluminium, copper, zinc, brass and lead) and of austenitic stainless steel, which together
represent 13% of all scrap from waste electrical and electronic equipment.
The great similarity in the chromaticity, shape and weight of these materials makes their
separation impossible with current methods other than manual sorting. The samples used were
provided, and previously classified, by expert operators from the companies Indumetal Recycling
S.A. and IGE Hennemann Recycling GmbH, both participants in the European project SORMEN
[SORM_06]. For this selection, the
great variability in the appearance of the different materials to be classified has been taken into
account, as well as those real problems that exist in their classification.
Figure 8.1 Materials analysed in this study.
The hyperspectral images evaluated in this study have been acquired using a hyperspectral line
camera PHL Fast10 CL made by Specim Ltd. [SPEC_08]. Although this camera can capture 1024
wavelengths, due to the high correlation of adjacent wavelengths, only 80 wavelengths, equally
spaced, have been selected covering a spectral range between 384.05 and 1008.10 nm.
For the acquisition of these materials, a machine vision system (Figure 8.2) has been used. This
system integrates the lighting, data acquisition and synchronisation devices required to
correctly illuminate the wavelengths to be acquired by the spectral camera at a reasonable
speed.
Fig. 8.2 Image acquisition system
In this manner, a set of hyperspectral images is acquired that associates each pixel with an
eighty-component spectral vector representing the luminous spectrum of each material. Figure 8.3
shows the pseudo-colour visualisation of a set of materials acquired by this system.
Fig. 8.3 Acquisition samples of diverse materials
These materials are not only similar in their luminous appearance (colour), but also present a
high scattering in the luminous spectra associated with each class (figure 8.4). As a
consequence, these materials cannot be classified according to their luminous spectrum using the
classical techniques of spectral classification.
Fig. 8.4 Spectral scattering between the different materials (blue: aluminium, red: copper,
green: brass, cyan: lead, magenta: steel, yellow: white copper).
To perform the following tests, the set of images has been divided into two groups: the first is
used for training and the second as a test set. The number of available elements, in both the
training set and the test set, is close to half a million per set.
From the training set, 8,000 representative pixels were chosen to generate the models of the
materials to be classified. In the real tests, a first classification was performed using these
8,000 representative pixels, a second using the rest of the pixels contained in the training
set, and a third phase used the elements contained in the test set.
In order to keep the presentation simple and clear, the results analysed in this chapter
correspond to those obtained on the test set, as this is the most restrictive set and the one
that yields the worst classification rates.
Figure 8.5 shows the set of images used to illustrate the tests undertaken, detailing the type
of material and the correct classification associated with each pixel.
Fig. 8.5 Description of the test images. Left: correct classification; right: original image.
Classes: stainless steel, lead, aluminium, brass, white copper, copper.
The following sections evaluate the different processing alternatives previously defined in
Chapter VII: lighting correction, spectrum decorrelation for background segmentation and for
material classification, the efficiency of including spatial characteristics in the spectral
model, and the efficiency of the region merging techniques, among others.
2. Background identification
Figure 8.6 shows the great luminous scattering between the different materials and the
background element (in black). This discriminating value makes it possible to separate the image
background from the rest of the materials in a simple and computationally efficient manner
through a background subtraction process.
Fig. 8.6 Scattering of the different spectra (steel, brass and aluminium); the background is
shown in black.
The objectives of this preliminary background subtraction are twofold. On the one hand, the
average intensity of the spectrum provides a very important discriminating value for
differentiating the background element from the diverse materials, given the matte,
black-body-like behaviour of the former.
However, this same feature is harmful when discriminating between the remaining materials:
owing to their specular physical properties, the average intensity is affected by the lighting
conditions and by the geometry of the object itself (highlights, shadows, etc.) [SHAF_84]. This
makes a preliminary background subtraction based on average intensity features suitable, as it
both eliminates this variable and optimises the subsequent classification of the diverse
materials through lighting normalisation, without affecting the background segmentation.
On the other hand, the high degree of discrimination between background and material yields a
much more efficient classification algorithm of low computational cost, which uses a reduced
number of components and increases the real speed of the whole system.
In order to proceed with the background segmentation, the discriminating value of the spectrum's
intensity must first be analysed. For this, the raw spectrum vector XRAW is used as the feature
vector, as defined in equation 7.14 of Chapter VII.
The lighting corrections proposed by Montoliu (7.12) and Stockman (7.13), which are designed to
eliminate the influence of the illumination, are applied to this vector in order to evaluate
their effect on background discrimination. These corrections cancel the discriminating power of
the spectrum's average intensity, owing to its dependence on the geometric factors that define
the lighting (see Chapter VII, section 3.1). This suggests that applying such a correction will
be detrimental to the segmentation of the background, which is characterised by a dark
intensity.
Table VIII.1 shows the results obtained, in which it can be observed that the average intensity
of the spectrum does indeed provide discriminant information for the background segmentation.
However, the use of the vector XRAW as a feature vector is not the best suited, for several
reasons. Chief among them is its high dimensionality, which leads to poor and inadequate
training due to the Hughes phenomenon and, in addition, entails a high computational cost owing
to the large number of components involved in the calculation.

TABLE VIII.1 EFFECTS OF LIGHTING CORRECTION IN THE BACKGROUND SEGMENTATION

Without correction: 81.92%   Stockman: 59.15%   Montoliu: 69.10%
In order to select the feature vector best suited to the requirements of the
background-material separation, the precision of the different spectrum decorrelation variants
defined in section VII.3.2 is analysed.
Bearing in mind the great similarity of the results, and in order to enhance processing speed,
note that the method based on XRGB obtains a precision of over 97% when discriminating between
background elements and those corresponding to materials. Table VIII.3 details the quality of
the classification obtained when applying it to all image pixels: of over 600,000 background
elements, only some 31,000 were classified as material, and of some 400,000 material pixels,
fewer than 20,000 were erroneously classified as background, giving a global precision of over
95%.
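The global precision quoted can be checked directly from the confusion-matrix counts of Table VIII.3:

```python
# Pixel counts from Table VIII.3 (classified as x real element).
bg_as_bg, mat_as_bg = 575_126, 19_149     # classified as background
bg_as_mat, mat_as_mat = 31_734, 396_797   # classified as material

total = bg_as_bg + mat_as_bg + bg_as_mat + mat_as_mat
precision = (bg_as_bg + mat_as_mat) / total
print(round(100 * precision, 2))  # -> 95.03 (just over 95%)
```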
TABLE VIII.2 COMPARISON OF THE DIFFERENT ELEMENTS OF SPECTRAL DECORRELATION FOR THE DIFFERENTIATION
BETWEEN THE BACKGROUND AND THE MATERIAL

Number of components   XRAW     XRGB     XFISHER   XPCA     XFuzzysets
2                      -        -        -         98.32%   97.19%
3                      -        97.15%   -         -        -
4                      -        -        -         97.13%   96.67%
6                      -        -        97.36%    -        -
8                      -        -        -         88.94%   86.63%
16                     -        -        -         89.39%   88.95%
24                     -        -        -         87.70%   90.14%
80                     81.92%   -        -         -        -

TABLE VIII.3 CONFUSION MATRIX FOR A CLASSIFICATION BASED ON XRGB (PIXEL COUNTS)

                     Real element
Classified as:    Background    Material
Background          575,126      19,149
Material             31,734     396,797
Figure 8.7 shows real images together with the background/material classification obtained.
These results can be improved with morphological erosion techniques [GONZ_08] in order to
eliminate erroneously classified elements. Applying these techniques does not affect the
subsequent global classification between the different materials, since the clusters of
erroneously classified pixels are several orders of magnitude smaller than the objects. These
morphological operations eliminate small spurious objects and yield a background/material
segmentation close to 100%. Figure 8.7 shows the classification obtained without the use of
morphological techniques.
Fig. 8.7 Background classification based on XRGB spectral feature.
Above: original image, Below: classification mask.
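A minimal numpy-only sketch of such a morphological clean-up (a 4-neighbour opening; [GONZ_08] describes the general operators, the structuring element here is an assumption):

```python
import numpy as np

def binary_open(mask, it=1):
    """Minimal 4-neighbour binary opening (erosion then dilation) to
    remove small misclassified specks."""
    def erode(m):
        p = np.pad(m, 1, constant_values=False)
        return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
                & p[1:-1, :-2] & p[1:-1, 2:])
    def dilate(m):
        p = np.pad(m, 1, constant_values=False)
        return (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
                | p[1:-1, :-2] | p[1:-1, 2:])
    for _ in range(it):
        mask = erode(mask)
    for _ in range(it):
        mask = dilate(mask)
    return mask

mask = np.zeros((7, 7), bool)
mask[1:6, 1:6] = True   # a large correctly classified object
mask[0, 6] = True       # an isolated misclassified pixel
cleaned = binary_open(mask)
```

After the opening, the isolated speck disappears while the large object's interior is preserved.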
TABLE VIII.4 CONFUSION MATRIX FOR A CLASSIFICATION BASED ON XRGB (PERCENTAGES)

                     Real element
Classified as:    Background    Material
Background           94.77%       4.60%
Material              5.23%      95.40%
A conclusion worth emphasising in this section is the great separability that exists between the
background and the rest of the materials. Background and materials can be correctly separated at
a reduced computational cost by Gaussian modelling of the background and of each material using
only the colour wavelengths (R, G, B).
The use of XPCA and of XFuzzy sets is discarded because, although two components would suffice,
both require a prior calculation step that involves the complete spectrum. Using the colour
wavelengths directly takes advantage of the discriminating power of the spectrum's average
intensity while reducing the total computation time. Those pixels classified as material are
then re-classified by the techniques described in the following sections.
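The conclusion above can be sketched as follows: two multivariate Gaussian models fitted on the three colour channels alone separate a dark background from brighter materials (the sample values below are synthetic, not the thesis data):

```python
import numpy as np

def fit(X):
    """Mean and covariance of a set of 3-component (R,G,B) samples."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def log_gauss(x, mu, sigma):
    """Log-density up to a constant shared by both classes."""
    d = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (d @ np.linalg.solve(sigma, d) + logdet)

# Synthetic RGB samples: dark conveyor belt vs brighter materials.
rng = np.random.default_rng(2)
bg = rng.normal(0.05, 0.02, size=(500, 3))
mat = rng.normal(0.5, 0.15, size=(500, 3))
models = [fit(bg), fit(mat)]

def label(x):  # 0 = background, 1 = material
    return int(np.argmax([log_gauss(x, m, s) for m, s in models]))

x_dark = np.array([0.04, 0.06, 0.05])
x_bright = np.array([0.6, 0.45, 0.5])
print(label(x_dark), label(x_bright))  # -> 0 1
```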
3. Influence of the correction of lighting for the classification of
materials
The previous section has shown the discriminating capacity of the spectrum's average intensity
for differentiating between materials and background. However, as detailed in Chapter VII,
section 3.1, this average intensity is influenced by factors unrelated to the composition of the
materials, such as the level of incident light, the specular nature of the object and its
geometric orientation.
This section presents the results obtained in the classification of materials when varying the
lighting correction method employed. Such a correction removes the influence of specular regions
and of the incident light on the classification, while maintaining the discriminating power
contained in the spectral vector.
Although the different methods of spectrum decorrelation (section 4) and the results obtained by
fusing the spectral and spatial characteristics have not yet been reviewed, the efficiency of
the different lighting correction methodologies is estimated here, in order to present the
factors that influence the classification in an orderly manner, using the decorrelation method
based on the Energy of fuzzy sets (XFuzzy sets). As shown in the following sections of this
chapter, this is the decorrelation method that offers the best results. The classification is
carried out in two phases in order to correctly assess the merit of the different lighting
correction methods: the first uses only the spectral information, while the second uses a
combination of spectral and spatial information. The results obtained are presented in table
VIII.5.
These results show an improvement in the image classification due to the lighting correction
performed with the methods proposed by Stockman [STOC_99] and Montoliu [MONT_05]. Figure 8.8
shows how both the Stockman and the Montoliu methods reduce the effect of existing highlights
and shadows.
Figure 8.8 Visualisation of the effects of the application of correction algorithms in an image's Xj.
a) Without correction, b) Stockman's correction, c) Montoliu's correction.
Although both methods correct the effects of lighting, Stockman's method most effectively
preserves the intensity differences in the components Xj that are related to the composition of
the materials. In this way the discriminating power between classes is not reduced, which
corroborates the better results obtained by Stockman's method; it will therefore be used as the
standard lighting correction method in the following sections.
TABLE VIII.5 EFFECTS OF LIGHTING CORRECTION IN THE CLASSIFICATION OF MATERIALS

Decorrelation method            Without correction   Stockman   Montoliu
FuzzySets-8                          50.29%           71.52%     55.45%
FuzzySets-8 + Neighbourhood          70.74%           90.22%     84.46%

4. Decorrelation of the luminous spectrum
This section compares the merits of the different methodologies employed to decorrelate the
spectral vector and extract its discriminating features. In this context, the classification
results obtained using the RGB components, the whole spectrum (RAW), methodologies based on
Principal Component Analysis, and those based on the Energy of a fuzzy spectrum are evaluated,
all of them described in section VII.3.2.
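As a sketch of the PCA-based alternative (a standard SVD projection; the number of components k mirrors the experiments below):

```python
import numpy as np

def pca_project(X, k):
    """Project spectra (rows of X) onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: axes
    return Xc @ Vt[:k].T

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 80))   # 200 synthetic 80-band spectra
Z = pca_project(X, 8)            # 8 decorrelated components per spectrum
```

By construction the projected components are mutually uncorrelated, which is the property exploited here.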
The influence of the number of components on the precision of the classification is also
evaluated for those methods that involve selecting a certain number of components, such as
Principal Component Analysis and the method based on the Energy of fuzzy triangles.
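As an illustration of the idea behind the Energy of fuzzy triangles (the exact formulation is given in the earlier chapters; this sketch makes its own simplifying assumptions), the spectrum can be projected onto k overlapping triangular membership functions and the energy under each one accumulated:

```python
import numpy as np

def fuzzy_energy(spectrum, k):
    """Project a spectrum onto k overlapping triangular fuzzy sets and
    return the energy captured by each (illustrative sketch, not the
    thesis's exact formulation)."""
    n = len(spectrum)
    centres = np.linspace(0, n - 1, k)
    width = (n - 1) / (k - 1)
    idx = np.arange(n)
    out = np.empty(k)
    for j, c in enumerate(centres):
        w = np.clip(1 - np.abs(idx - c) / width, 0, None)  # triangle at c
        out[j] = np.sum(w * spectrum ** 2)
    return out

spec = np.ones(80)          # flat 80-band spectrum
e = fuzzy_energy(spec, 8)   # 8-component descriptor
print(e.shape)
```

This reduces an 80-band spectrum to a small, smoothed descriptor without any training step, in contrast to PCA or Fisher projections.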
These X vectors are used as feature vectors to generate the Gaussian models that define each
material, as described in Chapter VII, section 3.4. To obtain these results, the Stockman
lighting correction has been applied after background subtraction, as it is the method that
yields the best classification results, as seen in the previous section.
The classification obtained using the RGB-based components XRGB (42.26%) was surpassed by the
use of the complete spectral vector XRAW (55.67%). The high dimensionality of this vector,
together with its high correlation, means that the classification improves further when
decorrelation methods are applied.
TABLE VIII.6 COMPARISON OF THE DIFFERENT ELEMENTS OF SPECTRAL DECORRELATION FOR THE DIFFERENTIATION OF
DIVERSE MATERIALS

Number of components   XRAW     XRGB     XFISHER   XPCA     XFuzzysets
2                      -        -        -         53.08%   52.76%
3                      -        43.83%   -         -        -
4                      -        -        -         61.40%   63.10%
5                      -        -        62.86%    -        -
8                      -        -        -         66.43%   71.52%
16                     -        -        -         64.11%   71.43%
24                     -        -        -         67.95%   71.67%
80                     55.67%   -        -         -        -
Although the methods based on prior training (XPCA, XFISHER) provide better results than the
previous methods, the best results are obtained with the method based on the Energy of fuzzy
sets, XFuzzy sets, which achieves rates above 70%.

The experimental results described in Table VIII.6 and illustrated in figure 8.9 indicate that
both the PCA-based methods and the methods based on fuzzification of the spectrum provide
promising results when used for the decorrelation of hyperspectral data. In the experiments
performed, the use of fuzzy sets surpassed the classification rate obtained with PCA while at
the same time avoiding the complications associated with the training process that PCA requires.
For these reasons, we conclude that the technique based on fuzzy sets appears to be the most
appropriate decorrelation method.

Table VIII.7 shows the confusion matrix of the different materials:

TABLE VIII.7 CONFUSION MATRIX FOR THE CLASSIFICATION OF MATERIALS USING SPECTRAL INFORMATION EXTRACTED FROM
SPECTRAL FUZZY SETS OF 8 COMPONENTS

                             Real material
Classified as     Aluminium  Copper   Brass    Lead     Stainless steel  White copper
Aluminium         75.73%     0.79%    1.95%    1.65%    22.40%           0.85%
Copper            0.22%      94.44%   4.01%    0.71%    5.81%            2.84%
Brass             2.54%      2.38%    70.37%   1.42%    10.12%           13.08%
Lead              8.91%      0.00%    8.13%    86.52%   25.49%           3.98%
Stainless steel   11.23%     0.00%    7.03%    8.75%    29.14%           6.35%
White copper      1.37%      2.38%    8.50%    0.95%    7.05%            72.89%
Average: 71.52%
Figure 8.9 Classification of diverse materials through the use of fuzzy sets.
Both figure 8.9 and table VIII.7 show a high level of statistical overlap between the different
class models, which prevents the classification from being precise enough. To illustrate this
phenomenon, figure 8.10 shows the degree of statistical overlap between the different classes.
Figure 8.10 Graph of the overlap of the classes using 3 of Fisher's projections (blue: aluminium, red: copper, green: brass, cyan: lead, magenta: steel, yellow: white copper).
The high overlap between the different materials can thus be seen, which in turn explains the
low classification rate obtained.
Figure 8.11 shows the feature vectors created after decorrelating the spectrum. These graphs
show that the level of overlap has been reduced with respect to that existing before
decorrelation (figure 8.4). Furthermore, this effect is visually more evident with fuzzy sets
than with PCA.
Figure 8.11 Spectral scattering between the different materials (blue: aluminium, red: copper,
green: brass, cyan: lead, magenta: steel, yellow: white copper). a) PCA, b) Fuzzy sets
In order to correct the remainig overlapping, we propose to include those additional features that
increase the separability between materials in order to obtain an improvement in the achieved
rates of classification.
5. Integration of spectral and spatial features
The experiments described in the previous section did not include spatial information in the
feature modelling of non-ferrous materials. Including this information in the generation of the
feature vectors can increase the statistical separation between the different materials and thus
improve the classification. To this end, neighbourhood histograms are calculated for each pixel
in the image, as described in Chapter VII, section 3.3.
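A minimal sketch of such per-pixel neighbourhood histograms is shown below. It is illustrative only: each pixel is assumed to carry a hard integer bin index (e.g. its dominant fuzzy set), and the fuzzy weighting of Chapter VII, section 3.3 is omitted.

```python
import numpy as np

def neighbourhood_histograms(label_map, n_bins, w):
    """Normalised histogram of bin indices in a (w x w) window per pixel.

    label_map: 2-D array of integer bin indices, one per pixel.
    Border pixels use a window clipped to the image bounds.
    """
    h, wd = label_map.shape
    r = w // 2
    out = np.zeros((h, wd, n_bins))
    for i in range(h):
        for j in range(wd):
            win = label_map[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            counts = np.bincount(win.ravel(), minlength=n_bins)
            out[i, j] = counts / counts.sum()  # normalised histogram
    return out

labels = np.random.randint(0, 8, size=(20, 20))   # toy 8-bin label image
hists = neighbourhood_histograms(labels, n_bins=8, w=5)
```

Growing `w` mixes in more spatial context, which is exactly the window-size sweep reported in table VIII.8.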
The size of the neighbourhood used for the calculation of the fuzzy histograms is varied in
order to evaluate the efficiency of the fusion of spatial and spectral information. This makes it
possible to control the amount of spatial information contained in the vector and thus to
estimate its influence on the classification. The experimental data obtained are shown in
table VIII.8.
The results shown in table VIII.8 indicate that a feature vector that simultaneously includes
spectral and spatial information generates more robust image descriptors, which produce more
precise models of the materials to be classified. The low classification rate obtained with RGB
image features indicates that using spatial information without taking spectral information into
account does not efficiently increase the separability of the materials.
The results of table VIII.8 also indicate that the decorrelation technique based on the
fuzzification of the spectrum provides more consistent results than the one based on Principal
Component Analysis. These experimental data lead to the conclusion that the inclusion of
spectral-spatial features is adequate for the classification of materials, as shown by a
classification rate of over 86%. The fact that these fuzzy sets are inspired by the functioning
of the human eye makes it possible to represent, in a near-optimal manner, the spatial and
texture properties contained in the images, much as a human being does.
TABLE VIII.8 RATES OF CLASSIFICATION THROUGH THE INTEGRATION OF SPECTRAL AND SPATIAL FEATURES

Window size   8-PCA    8-Fuzzy sets   RGB
3×3           67.19%   77.30%         53.03%
5×5           73.77%   81.48%         56.14%
7×7           76.42%   83.77%         56.46%
11×11         78.85%   85.62%         57.76%
15×15         79.34%   86.45%         59.21%
19×19         80.41%   86.63%         59.83%
23×23         81.32%   86.42%         59.90%
Taking a closer look at the results for each material (table VIII.9), one can see that the number
of erroneous classifications has been considerably reduced, and that two materials (stainless
steel and white copper) account for the majority of the remaining errors.
If the separation of the different material classes is analysed by visualising the Fisher
components extracted from the histogram feature vector (figure 8.12), one can observe that the
separability of the classes has increased considerably compared with that obtained using only
spectral information.
Figure 8.12 Graph of class overlap using Fisher's projections. a) Visualisation of the 1st, 2nd
and 3rd components. b) Visualisation of the 3rd, 4th and 5th components. (aluminium: blue,
copper: red, brass: green, lead: cyan, steel: magenta, white copper: yellow).
TABLE VIII.9 CONFUSION MATRIX FOR THE CLASSIFICATION OF MATERIALS THROUGH SPECTRAL-SPATIAL INFORMATION
EXTRACTED FROM SPECTRAL FUZZY SETS OF 8 COMPONENTS AND A 19 X 19 NEIGHBOURHOOD

                                        Real Material
Classified        Aluminium  Copper   Brass    Lead     Stainless Steel  White Copper
Aluminium         87.25%     0.06%    1.24%    0.14%    11.24%           0.29%
Copper            0.00%      98.32%   2.34%    0.00%    0.01%            2.87%
Brass             1.61%      0.00%    90.61%   5.51%    8.98%            15.69%
Lead              0.16%      0.00%    0.32%    90.79%   2.46%            0.00%
Stainless Steel   10.97%     1.61%    5.48%    3.56%    75.90%           4.25%
White Copper      0.00%      0.00%    0.00%    0.00%    1.41%            76.91%
Average: 86.63%
Looking at figure 8.12 in greater detail, one notices that, as shown in table VIII.9, a high
overlap occurs between the white copper (yellow) and brass (green) classes, which causes over
15% of the white copper to be classified as brass. One can also notice the great scattering of
stainless steel (magenta), which causes classification errors across the remaining classes.
Figure 8.13 shows the feature vectors that are generated. An in-depth analysis shows that each
material presents a consistent signature, while the separability between the different materials
has increased. This is consistent with the results of table VIII.9 and the scatter graph of
figure 8.12.
Figure 8.13 Spectral-spatial scattering (fuzzy histograms) between the different materials
(aluminium: blue, copper: red, brass: green, lead: cyan, steel: magenta, white copper: yellow).
Figure 8.14 shows the results of this classification and the effects that these overlaps cause
in it.
Figure 8.14 Classification of the diverse materials through the use of fuzzy sets and
neighbourhood histograms of size 19 x 19 pixels.
The results in figure 8.14 show a correct classification for most materials. The main cause of
incorrect classification is the overlap of the stainless steel and white copper classes with the
rest of the classes. Figure 8.14 also shows that the majority of the erroneously classified areas
are connected to regions that are correctly classified. This leads us to evaluate techniques that
allow the re-classification and merging of interconnected regions in order to obtain a higher
classification precision.
6. Methods of region merging
The regions identified after the application of the neighbourhood histograms (see section 5) are
subjected to the process of re-classification and aggregation of regions described in section
VII.4. In this procedure, the merging cost associated with each pair of regions is calculated and
the corresponding aggregation decisions are taken based on it.
Section VII.4 proposes two methodologies for implementing this region merging. The technique
based on the region of maximum likelihood uses the distances from each element of the region to a
previously established region model in order to obtain the most probable classification. The
method based on normalised region histograms extracts a signature from the histogram of each
region and, based on this signature, analyses whether grouping the regions is suitable.
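The histogram-signature variant can be sketched as follows. The Bhattacharyya-style dissimilarity and the merge threshold are illustrative assumptions; the actual cost function of section VII.4 may differ.

```python
import numpy as np

def region_histogram(hist_stack, mask):
    """Normalised histogram signature of a region (mean of pixel histograms)."""
    sig = hist_stack[mask].mean(axis=0)
    return sig / sig.sum()

def merge_cost(sig_a, sig_b):
    """Bhattacharyya-style distance between two region signatures (in [0, 1])."""
    return 1.0 - np.sum(np.sqrt(sig_a * sig_b))

# Toy example: two regions drawn from the same 8-bin distribution
rng = np.random.default_rng(0)
hists = rng.dirichlet(np.ones(8), size=(10, 10))  # per-pixel histograms
mask_a = np.zeros((10, 10), bool); mask_a[:5] = True
mask_b = ~mask_a
cost = merge_cost(region_histogram(hists, mask_a),
                  region_histogram(hists, mask_b))
merge = cost < 0.1  # aggregate the regions when the cost is low (threshold assumed)
```

Because the signature describes a whole region at once, it can be compared directly, without recomputing neighbourhood histograms point by point, which matches the computation-time advantage noted below.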
Table VIII.10 shows that both methodologies offer high levels of precision, surpassing 98%
reliability, which allows one to state that the region merging process greatly reduces the number
of erroneously classified regions.
One also notices that spectral fuzzification (8-Fuzzy sets) continues to provide better results
than principal components, which supports the conclusion that decorrelation based on the
fuzzification of the spectrum is preferable to decorrelation based on PCA.
Both region merging methodologies offer similar results, which does not allow one to state which
of them behaves better in the classification of these materials. However, the region histograms
provide a signature that can be used to analyse a region directly, without extracting
neighbourhood histograms for each of its points, which in certain situations yields a better
computation time.
The detailed results of this classification are shown in table VIII.11, calculated using spectral
fuzzy sets on an 11 x 11 neighbourhood and reprocessed with the region-of-maximum-likelihood
technique. The result shows that the overlap levels have been considerably reduced; the only
remaining classification errors are due to the great scattering present in the stainless steel
class, as shown in figure 8.12.
TABLE VIII.10 CLASSIFICATION RATES THROUGH THE APPLICATION OF FUZZY REGION METHODOLOGIES

Window size used for the    Region of maximum likelihood   Normalised region histogram
previous region calculation 8-PCA      8-Fuzzy sets        8-PCA      8-Fuzzy sets
3×3                         86.16%     96.92%              75.16%     96.92%
5×5                         93.34%     97.42%              93.17%     96.67%
7×7                         94.44%     97.52%              94.44%     98.36%
11×11                       95.55%     98.47%              92.84%     98.36%
15×15                       94.57%     98.20%              92.59%     98.36%
19×19                       94.13%     96.78%              92.89%     96.94%
23×23                       91.93%     96.96%              92.51%     96.94%
The final results of the classification are shown in figure 8.15 where the effects of the processes
of region merging can be observed.
TABLE VIII.11 CONFUSION MATRIX FOR THE CLASSIFICATION OF MATERIALS THROUGH THE USE OF SPECTRAL-SPATIAL
INFORMATION FROM SPECTRAL FUZZY SETS OF 8 COMPONENTS AND A NEIGHBOURHOOD OF 11 X 11 AND REGION MERGING BASED ON MAXIMUM LIKELIHOOD

                                        Real Material
Classified        Aluminium  Copper   Brass    Lead     Stainless Steel  White Copper
Aluminium         97.61%     0.00%    0.03%    0.00%    0.00%            0.00%
Copper            0.00%      98.39%   0.00%    0.00%    0.00%            0.00%
Brass             0.00%      0.00%    97.51%   0.00%    0.00%            0.00%
Lead              0.00%      0.00%    0.00%    98.89%   0.00%            0.00%
Stainless Steel   2.39%      1.61%    2.45%    1.11%    98.41%           0.00%
White Copper      0.00%      0.00%    0.00%    0.00%    1.59%            100.00%
Average: 98.47%
Figure 8.15 Classification of several materials through the use of fuzzy sets, neighbourhood
histograms of size 11 x 11 pixels and subsequent reclassification based on the calculation of
the region of maximum likelihood.
Figure 8.15 shows the correct classification of the majority of regions. However, the effect
caused by the dispersion of stainless steel can still be seen and accounts for the remaining
erroneous classifications.
Despite this, excellent results are achieved in the classification of the materials, with rates
of over 98%, which validates the use of region merging criteria to improve the classification.
7. Conclusions
This chapter has verified, based on experimental results, the theoretical hypotheses proposed in
earlier chapters.
First, a computationally efficient methodology that subtracts the image background with great
precision has been established, allowing for an effective background segmentation.
Second, the lighting-correction technique that behaves most efficiently has been selected in
order to reduce the effects of lighting variation and the specularity inherent to these
materials. In this manner, the effects caused by shine, shadow areas and spectrum intensity are
greatly reduced, thus increasing the classification rate.
Furthermore, as mentioned in Chapters IV and V, we have verified that the decorrelation of the
spectrum based on the fuzzy sets proposed in Chapter V obtains better results than classical
techniques, such as those based on principal component analysis. Additionally, the use of these
fuzzy sets for spectral decorrelation avoids the problems of traditional methods: the need for
correct training, and the loss of representation, loss of discriminating power and chaotic
behaviour that arise when new, untrained materials are introduced into the system.
The statistical overlap of the analysed materials has been resolved by incorporating spatial
information into the spectral feature vector through the fuzzy neighbourhood histograms proposed
in Chapter VI, thus considerably increasing the separability of the different materials.
The results also show that combining spectral and spatial information simultaneously is
necessary. Using only spectral information does not obtain classification rates above 72%.
Likewise, using spatial information on the colour features of the image (RGB components) does
not reach 60%. However, the combined use of both obtains classification rates of over 86%.
Last, we have verified that the methodology of reclassification and merging of regions proposed
in Chapter VII reduces the number of erroneously classified regions based on the calculation of
region-unification cost criteria, obtaining a success rate greater than 98%. Figure 8.16 shows
in detail the final classification obtained for the data set presented in this document.
Figure 8.16 Classification of diverse materials through the use of fuzzy sets, neighbourhood
histograms of size 7 x 7 pixels and subsequent reclassification based on the calculation of
normalised region histograms.
This figure shows that the remaining erroneous classifications are due to the high degree of
dispersion in the stainless steel model. This fact highlights the need to work on the study and
development of a more advanced model that allows a more precise estimate of stainless steel.
In summary, this chapter has presented a large number of experiments demonstrating that the
independent use of spectral or spatial techniques does not yield an adequate model for the
classification of the materials under study, making the joint use of these techniques necessary.
Furthermore, the improvement in the precision of the system provided by the different
methodologies proposed in earlier chapters has been measured: the initial classification rate of
43.83% has increased to 98.47% through the use of the proposed methodologies.
Chapter IX
Conclusions, contributions and future work
This Thesis has dealt with different key aspects related to the characterisation, segmentation
and classification of hyperspectral images.
The studies undertaken in the different sections have led to specific conclusions for each of
the matters addressed. Based on the knowledge acquired in this research, this chapter sets out a
series of general conclusions that summarise the work undertaken in this Thesis.
The in-depth study of each of the themes that make up this Thesis, together with a review of the
state of the art and of the physical properties underlying the creation of hyperspectral images,
has made it possible to delimit and improve certain aspects where conventional techniques did
not produce the desired results. This has led to new approaches for tackling existing problems.
These new approaches, validated both theoretically and experimentally, have produced a series of
contributions that are listed in this chapter.
Additionally, during the development and evaluation of the different proposed solutions,
numerous areas of work have been opened that have not been analysed in depth in this Thesis,
mainly in order to avoid excessive dispersion in the themes dealt with. The work presented in
this Thesis is the starting point for new lines of research that shall be the object of further
study; these future works are listed in condensed form at the end of this chapter.
1. Conclusions
First, it has been verified that the proposed methodology leads to a dramatic improvement in
classification rates versus traditional techniques, with an improvement of over 100% in the
classification rate when these methods are used in the context of the classification of metallic
materials. It also reduces the error obtained by traditional techniques (greater than 56%) in a
considerable manner, to less than 2%.
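These figures can be checked against the rates reported in Chapter VIII (43.83% initially, 98.47% with the full methodology):

```python
initial, final = 43.83, 98.47          # classification rates from Chapter VIII
relative_gain = (final - initial) / initial * 100
error_before, error_after = 100 - initial, 100 - final
assert relative_gain > 100             # improvement of over 100%
assert error_before > 56               # initial error greater than 56%
assert error_after < 2                 # final error less than 2%
```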
This methodology not only classifies diverse materials, but also constitutes a theoretical
framework that integrates spectral and spatial features in a single mathematical descriptor for
the characterisation of elements and regions contained in hyperspectral images, irrespective of
their nature.
The use of this spectral-spatial descriptor has been verified to achieve classification rates
(87% without applying region merging techniques) that are much greater than those obtained
through spatial techniques based on RGB colour (60%) and those obtained through the isolated use
of the spectral information of a pixel without taking spatial information into account (56%).
The use of a bioinspired feature extraction method based on fuzzy sets (the hyperspectral eye)
takes advantage of the correlation between neighbouring bands to improve the compression and
extraction of spectral descriptors. This allows for a better spectral characterisation while
simultaneously avoiding the Hughes phenomenon. When applied to the classification of metals,
this technique obtains classification rates of 72% using only spectral information, versus 56%
for the raw spectrum and 68% for PCA and other classical decorrelation methods.
The use of Gaussian models for defining materials from the extracted descriptors behaves
efficiently and directly improves the classification. Furthermore, the simplicity and
mathematical properties of these Gaussian models have led to the development of a region
reclassification method that increases the precision of the system in a very efficient manner
(98%).
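A minimal sketch of such a Gaussian material model follows, assuming a Mahalanobis-distance decision rule over the descriptor space; the material names and synthetic data are illustrative only.

```python
import numpy as np

class GaussianMaterialModel:
    """Gaussian model of one material in descriptor space (illustrative)."""
    def __init__(self, samples):
        self.mean = samples.mean(axis=0)
        self.cov_inv = np.linalg.inv(np.cov(samples, rowvar=False))

    def mahalanobis(self, x):
        d = x - self.mean
        return float(np.sqrt(d @ self.cov_inv @ d))

def classify(x, models):
    """Assign the descriptor to the closest material model."""
    return min(models, key=lambda name: models[name].mahalanobis(x))

rng = np.random.default_rng(1)
models = {
    "copper": GaussianMaterialModel(rng.normal(0.0, 1.0, (200, 8))),
    "brass":  GaussianMaterialModel(rng.normal(5.0, 1.0, (200, 8))),
}
label = classify(np.full(8, 5.0), models)  # a clearly brass-like descriptor
```

The mean and inverse covariance summarise each material compactly, which is what makes the subsequent region reclassification mathematically simple.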
To allow an easier in-depth analysis of the results of this Thesis, the following paragraphs
summarise the conclusions reached in each of its chapters.
First, Chapter II lists the different types of existing images, including hyperspectral images, as
well as the physical properties that intervene in their creation.
On the other hand, Chapter III analyses the classical metrics used to quantify the dissimilarity
between two given spectra. This analysis shows that these metrics efficiently capture global
changes in the spectrum but often fail to capture the discriminating features between the
different classes. Because of this, Chapter IV analyses the different approaches for the
extraction of descriptors from high-dimensional data, together with state-of-the-art methods
that reduce the Hughes phenomenon while keeping the discriminating information between the
different classes; these methods model the local features that ease discrimination.
The analysed methods (PCA, LDA, ...) that correct the Hughes phenomenon most efficiently
simultaneously keep the discriminating power between the different classes. However, they also
have a set of associated disadvantages: the need for prior training and the distortion of the
physical meaning of the transformed variables make the analysis of the extracted descriptors
more difficult. Additionally, changes in the composition of the classes to be analysed, or the
addition of new classes to the system, change the optimal feature subspace, making it unsuitable
and requiring a retraining that causes uncontrollable modifications in the descriptors defined
by the previous subspace.
The limitations of these and other methods are set out in Chapter IV. Next, Chapter V presents
the requirements that the extracted descriptors must satisfy: reduction of the Hughes
phenomenon, universality and independence from the training data set, and a theoretical-physical
basis that upholds the discriminating power of the desired optimal descriptor.
Taking the previous requirements as a basis, Chapter V additionally proposes a methodology based
on the fuzzification of the spectrum. This method, bioinspired by the functioning of the cones
of the human visual system, takes advantage of the high correlation between adjacent bands of
the spectrum to define a feature extraction method based on the Energy of the spectral fuzzy
sets.
Chapter VIII then verifies through experimentation the better performance of this method versus
the feature extraction methods described in Chapter IV. It also confirms that the fuzzification
of the spectrum retains the associated spatial information more efficiently, due to its
similarity with the human visual system. Nonetheless, despite the better performance of the
proposed feature extraction method, the discriminating power of the spectral vector alone does
not achieve an adequate classification rate. For this reason, Chapter VI proposes a methodology
that combines the spectral and spatial features of an image in a single feature vector through
the use of fuzzy neighbourhood histograms and fuzzy region histograms.
Moreover, Chapter VI shows that this combination of spectral and spatial features in a single
vector provides greater separability for statistically overlapped classes, and that the best
classification results are provided by combining the method based on the Energy of the spectral
fuzzy sets with the fuzzy spatial histograms.
Chapter VII offers a global theoretical vision of the complete process that allows the optimal
classification of different materials from hyperspectral images. This chapter analyses the
dependence of material chromaticity on external factors of illumination and geometry, proposing
different options that reduce this dependence under non-ideal conditions. On the other hand, the
decorrelation techniques proposed in Chapters IV and V are integrated into the classification
process together with the methodologies for the integration of spatial and spectral features
proposed in Chapter VI. This integration makes it possible to define a model for each material
based on a Gaussian distribution, which not only evaluates the goodness of the different
proposed classification methods, but also creates a mathematically simple, easily analysed and
extremely robust model for the classification of materials.
Chapter VII also shows the usefulness of a subsequent re-classification based on the merging of
connected regions through the optimisation of a unification cost function. By analysing the
Gaussian distributions of the materials, this unifies physically connected regions that have
been erroneously assigned to different materials.
Finally, Chapter VIII experimentally verifies the methodologies proposed in earlier chapters and
validates the hypotheses made.
2. Contributions
This work actively contributes to the advance of the state of the art in the field of processing
and extraction of discriminant features from hyperspectral images. Specifically, four main
contributions have been made:
A) The definition of a generic framework that can classify based on the information contained in
hyperspectral images and which covers the whole process, integrating spectral and spatial
features simultaneously, optimising the extraction of the discriminating spectral features based
on a bioinspired model, and creating a region model that reclassifies erroneously classified
regions.
B) A classification system for metallic materials based on the previous generic framework, which
advances the state of the art in the classification of materials.
C) A spectral-spatial descriptor that captures the spectral and spatial features in a single
feature vector, thus characterising the content of hyperspectral images more effectively.
D) The definition and development of a system, bioinspired by the human eye, that processes a
spectrum through fuzzy logic techniques, allowing for a better extraction of the discriminating
features contained in that spectrum.
The following paragraphs provide a more exhaustive, detailed list of these contributions.
First, Chapters II and III offer a global vision of the principles of the generation of
hyperspectral images, a review of the difficulties involved in the classification of materials
based on them, and a detailed state of the art of the different classification techniques. The
main contributions are:
i. Conceptual review of the physical basis of the formation of hyperspectral images.
ii. In-depth study of the different problems associated with the classification of materials.
iii. Presentation of the classical methodology of classification for hyperspectral images.
iv. Review of the classical metrics for the differentiation of different spectra.
Chapter IV analyses the problems inherent to the extraction of discriminating features that define
the spectral vectors. The following contribution is provided:
v. Analysis of the problems associated with classical methods for feature extraction.
On the other hand, Chapter V provides an innovative method for the extraction of spectral
features, bioinspired by the human visual system:
vi. Listing of the properties that must be met by an optimal feature vector: dimensionality
reduction, discriminatory power, independence from the sample set, a basis in differentiating
physical properties, universality, generality and preservation of the physical meaning of the
variable.
vii. Description of the fuzzy spectrum concept, conceptually inspired by the human visual
system, as an ideal element for the extraction of spectral features for the separation of
materials, in conformity with the earlier contribution.
viii. Mathematical definition of the Energy of the fuzzy set as a characteristic measure that
allows for the ideal discrimination between different spectra.
ix. Expansion of the fuzzy set concept to the multi-frequency fuzzy set in order to permit the
extraction of discriminating information belonging to several frequencies.
In turn, Chapter VI is dedicated to the integration of spectral and spatial features, and offers the
following contributions:
x. Integration of spectral and spatial features in a single feature vector through the use of
spatial histograms.
xi. Definition of the quantisation of the fuzzy histogram, which reduces the problems associated
with the classical discretisation of histograms.
xii. Definition of fuzzy neighbourhood and region histograms, which characterise the
spectral-spatial information associated with a point and its neighbourhood, or with a region of
the image.
On the other hand, Chapter VII integrates the earlier methodologies into a complete framework
for the classification of materials, its main contributions being:
xiii. Global description of the complete process for the classification of materials.
xiv. Study of the influence of lighting on the classification process.
xv. Integration of contributions vii-xii within the process of classification of materials.
xvi. Mathematical description of the material model based on a Gaussian statistical model.
xvii. Description of a process for merging regions according to their Gaussian statistics, based
on the fuzzy region histogram.
Finally, the following are the contributions set out in Chapter VIII on the results obtained and the
description of the validation samples:
xviii. Experimental verification of the efficiency of the different proposed methodologies for
background subtraction.
xix. Experimental verification of the influence of the method of lighting correction in the
classification.
xx. Experimental verification of the inherent advantages of the use of fuzzy spectra for the
extraction of spectral features versus classical techniques, including a more precise
spatial representation.
xxi. Experimental verification of the improvement in the separability of different materials
that happens when integrating spectral and spatial features.
xxii. Experimental verification of the improvement in the classification caused by the use of
region merging techniques.
xxiii. Experimental verification of the validity of the proposed classification framework.
In view of the above, it is noteworthy that contributions i, ii, v, vii, viii, x-xii, xiii,
xv-xvii and xx-xxiii are pending acceptance in the scientific journal IEEE Transactions on Fuzzy
Systems, in an exhaustive article entitled "Spectral and Spatial Feature Integration for
Classification of Non-ferrous Materials in Hyper-spectral Data".
At the same time, the bioinspired feature extraction algorithm (the hyperspectral eye) has been
developed further; its results are pending acceptance in an article entitled "Bio-inspired Data
Decorrelation Methodology for Hyperspectral Imaging", submitted to Pattern Recognition Letters,
Elsevier.
Furthermore, both the methodology for bioinspired spectral feature extraction and the
methodology for the integration of spectral-spatial features have been the subject of European
patent applications ("Methodology for modeling electromagnetic spectra", EU Patent 08380314, and
"Methodology for integration of spectral and spatial features for material classification", EU
Patent 08380315).
3. Future work
The research presented here aims to provide contributions on a specific matter and can never be
considered finished, as every aspect of it is always capable of improvement. This section lists
the tasks that have not been undertaken within the scope of this Thesis and which are being
addressed now or in the near future.
At this time, within the European project SORMEN [SORM_06], the results of this Thesis are being
applied in a prototype for the classification of materials from waste electrical and electronic
equipment (WEEE). To this end, the conditions of the industrial process are being taken into
account, and the proposed classification algorithm is being transformed and adapted for use in
the recycling of these materials. Special emphasis is being placed on factors such as process
speed and algorithmic simplification, taking into account the particularities of this specific
application. This will give rise to an additional patent on the recycling system, as well as
several scientific publications dealing with, among other matters, the real-time classification
algorithm.
In any case, both the fuzzy set based feature extraction techniques and those used for the
integration of spectral and spatial features can be improved through multi-frequency or
multi-spatial approaches that include information from diverse frequencies or from different
neighbourhood sizes. This could lead to an additional increase in separability.
However, the two most promising fields opened by the results of this Thesis are the use of these
methodologies in image segmentation and their application to other areas of knowledge.
Thus, taking as a basis the Gaussian model that defines a spectral region, the aim is to develop advanced segmentation techniques based on active contours and snakes [LEE_05] that track the evolution of the region's Gaussian model in order to determine its correct segmentation in both two-dimensional and three-dimensional hyperspectral images.
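As a sketch of the kind of region statistic such an evolving contour could track (an assumed minimal realisation, not the segmentation algorithm itself), the code below fits a Gaussian model to the spectra inside a region and scores candidate pixels by their Mahalanobis distance to it; re-fitting the model as the contour deforms would supply the region term driving the snake. The regularisation term `eps` and the threshold value are illustrative assumptions.

```python
import numpy as np

class GaussianRegionModel:
    """Gaussian model of the spectra inside a region: mean vector and
    covariance matrix, with Mahalanobis distance as membership score."""

    def __init__(self, spectra, eps=1e-6):
        # spectra: (n_pixels, n_bands) array of spectra inside the region.
        self.mean = spectra.mean(axis=0)
        cov = np.cov(spectra, rowvar=False)
        # Small ridge keeps the covariance invertible for small regions.
        self.cov_inv = np.linalg.inv(cov + eps * np.eye(spectra.shape[1]))

    def mahalanobis(self, spectrum):
        d = spectrum - self.mean
        return float(np.sqrt(d @ self.cov_inv @ d))

    def belongs(self, spectrum, threshold=3.0):
        # A candidate pixel is attributed to the region when its distance
        # to the region's Gaussian falls below the threshold.
        return self.mahalanobis(spectrum) < threshold
```

In a contour-evolution loop, the model would be re-estimated each time pixels are added to or removed from the region, so the Gaussian tracks the region's spectral statistics as segmentation proceeds.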
Lastly, given the generality of the proposed methodology, the aim is to apply these segmentation and classification technologies within different projects to other fields, such as functional magnetic resonance imaging for the segmentation and classification of regions of interest, biological analysis, material identification, and the advanced segmentation and quality control of fruit.
References
[ANGE_99] E. Angelopoulou et al., "The Spectral Gradient: A Material Descriptor Invariant to Geometry and Incident Illumination", Proc. 7th IEEE Int. Conf. Computer Vision, pp. 861-867, 1999.
[ASTER_98] ASTER Spectral Library, California Institute of Technology. http://speclib.jpl.nasa.gov
[BAKE_98] Baker M.R, “Universal Approximation Theorem for Interval Neural Networks”, Reliable Computing,
Volume 4, Number 3, pp. 235-239(5), Springer, 1998.
[BAYE_76] B.E. Bayer, "Color imaging array", US Patent 3,971,065, July 20, 1976.
[BELL_61] R.E. Bellman, "Adaptive Control Processes", Princeton University Press, 1961.
[BELL_95] A. Bell and T. Sejnowski, “An Information-Maximization Approach to Blind Separation,” Neural
Computation, vol. 7, pp. 1,004-1,034, 1995.
[BERE_07] A. Bereciartua and J. Echazarra, "Sistema basado en identificación multiespectral para la separación de metales no férricos en WEEE en logística inversa", 1er Congreso de Logística y Gestión de la Cadena de Suministro, 2007.
[BERG_85] J.O. Berger, "Statistical Decision Theory and Bayesian Analysis", Second Edition, Springer Verlag, New York, ISBN 0-387-96098-8, 1985.
[BISH_06] Christopher M. Bishop, "Pattern Recognition and Machine Learning", Springer, 2006, ISBN-10: 0-387-31073-8.
[BISH_08] C. Bishop and I.T. Nabney, "Pattern Recognition and Machine Learning: A MATLAB Companion", Springer, 2008.
[BISH_95] C.M. Bishop, Neural Networks for Pattern Recognition. Oxford: Clarendon Press, 1995.
[BLINN_77] J.F. Blinn, "Models of Light Reflection for Computer Synthesized Pictures", Computer Graphics, vol. 11, no. 2, pp. 192-198, 1977.
[BURG_98] C.J.C. Burges, “A Tutorial on Support Vector Machines for Pattern Recognition,” Data Mining and
Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
[CE_02] Directive 2002/96/EC of the European Parliament and of the Council of 27 January 2003 on Waste Electrical
and Electronic Equipment (WEEE) - Joint declaration of the European Parliament, the Council and the Commission
relating to Article 9.
[CHAN_03] C.I. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification, Kluwer
Academic Publishers Group, ISBN:0-306-47483-5, 2003.
[CHAN_04] C.I. Chang, "Hyperspectral Imaging: Techniques for Spectral Detection and Classification", 2004, ISBN: 0-306-47483-2.
[CHENG_94] B. Cheng and D.M. Titterington, "Neural Networks: A Review from a Statistical Perspective", Statistical Science, vol. 9, no. 1, pp. 2-54, 1994.
[CHERI_03] A. Cheriyadat and L.M. Bruce, "Why principal component analysis is not an appropriate feature extraction method for hyperspectral data", IEEE Geoscience and Remote Sensing Symposium, IGARSS 2003.
[CLAR_90] Clark, R.N., A.J. Gallagher, and G.A. Swayze, Material Absorption Band Depth Mapping of Imaging
Spectrometer Data Using a Complete Band Shape Least-Squares Fit with Library Reference Spectra, Proceedings of
the Second Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) Workshop. JPL Publication 90-54, 176-186,
1990
[CLARK_93] Clark, R.N., G.A. Swayze, A.J. Gallagher, T.V.V. King, and W.M. Calvin, 1993, The U. S. Geological
Survey, Digital Spectral Library: Version 1: 0.2 to 3.0 microns, U.S. Geological Survey Open File Report 93-592, 1340
pages, http://speclab.cr.usgs.gov.
[CLARK_95] Clark, R.N. and Swayze, G.A., Mapping Minerals, Amorphous Materials, Environmental Materials,
Vegetation, Water, Ice and Snow, and Other Materials: The USGS Tricorder Algorithm. Summaries of the Fifth
Annual JPL Airborne Earth Science Workshop, January 23- 26, R.O. Green, Ed., JPL Publication 95-1, p. 39-40, 1995.
[COMO_94] P. Comon, "Independent Component Analysis, a New Concept?", Signal Processing, vol. 36, no. 3, pp. 287-314, 1994.
[COVE_67] T. Cover and P. Hart, "Nearest neighbor pattern classification", IEEE Transactions on Information Theory, vol. IT-13, pp. 21-27, 1967.
[COVE_74] T.M. Cover, “The Best Two Independent Measurements are not the Two Best”, IEEE Trans. Systems,
Man, and Cybernetics, vol. 4, pp. 116-117, 1974.
[COVE_77] T.M. Cover and J.M. Van Campenhout, “On the Possible Orderings in the Measurement Selection
Problem,” IEEE Trans. Systems, Man, and Cybernetics, vol. 7, no. 9, pp. 657-661, Sept. 1977.
[CUN_89] Y. Le Cun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, and L.D. Jackel,
“Backpropagation Applied to Handwritten Zip Code Recognition,” Neural Computation, vol. 1,pp. 541-551, 1989.
[DASA_91] Belur V. Dasarathy, “Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques”, 1991, ISBN
0-8186-8930-7.
[DEMP_77] A.P. Dempster et al., "Maximum likelihood from incomplete data via the EM algorithm", Journal of the Royal Statistical Society B, vol. 39, no. 1, pp. 1-38, 1977.
[DEVI_80] P. A. Devijver and J. Kittler, “On the edited nearest neighbor rule,” in Proc. 5th Int. Conf Pattern
Recogn., 1980, pp. 72-80.
[DEVI_82] P.A. Devijver and J. Kittler, "Pattern Recognition: A Statistical Approach", London, Prentice-Hall, 1982.
[DJOU_97] A. Djouadi and E. Bouktache, “A Fast Algorithm for the Nearest Neighbor Classifier,” IEEE Trans.
Pattern Analysis and Machine Intelligence, vol. 19, no. 3, pp. 277-282, 1997.
[DREY_13] J.L.E. Dreyer (Ed.), Tychonis Brahe Dani Opera Omnia (in Latin), Vols. 1-15, 1913-1929.
[DU_05] Du Peijun et al., "Error Analysis and Improvements of Spectral Angle Mapper (SAM) Model", MIPPR 2005: SAR and Multispectral Image Processing, Proc. of SPIE, vol. 6043, 60430L, 2005.
[DUDA_73] R. Duda and P. Hart, "Pattern Classification and Scene Analysis", John Wiley & Sons, ISBN 0-471-22361-1, 1973.
[EINS_1905] A. Einstein, "Über einen die Erzeugung und Verwandlung des Lichtes betreffenden heuristischen Gesichtspunkt" (On a Heuristic Viewpoint Concerning the Production and Transformation of Light), Annalen der Physik, vol. 17, pp. 132-148, 1905.
[WIKI_08] Wikipedia contributors, "Specular reflection", Wikipedia, The Free Encyclopedia, August 4, 2008, 11:27 UTC. Available at: http://en.wikipedia.org/w/index.php?title=Specular_reflection&oldid=229755522. Accessed August 13, 2008.
[FEATH_05] B.K. Feather, S.A. Fulkerson, J.H Jones, R.A. Reed, M. Simmons, D. Swann, W.E. Taylor, and L.S.
Bernstein, “Compression technique for plume hyperspectral images”, Algorithms and Technologies for Multispectral,
Hyperspectral and Ultraspectral Imagery XI, SPIE, 2005
[FLET_87] R. Fletcher, "Practical Methods of Optimization", Second ed., Wiley, 1987.
[FREU_96] Y. Freund and R. Schapire, “Experiments with a New Boosting Algorithm,” Proc. 13th Int'l Conf. Machine
Learning, pp. 148-156, 1996.
[FREU_98] Freund, Y. and Schapire, R. E. Large margin classification using the perceptron algorithm. In Proceedings
of the 11th Annual Conference on Computational Learning Theory, 1998.
[FRIE_87] J.H. Friedman, "Exploratory Projection Pursuit", J. Am. Statistical Assoc., vol. 82, pp. 249-266, 1987.
[FUKU_83] K. Fukushima, S. Miyake, and T. Ito, “Neocognitron: A Neural Network Model for a Mechanism of
Visual Pattern Recognition”, IEEE Trans. Systems, Man, and Cybernetics, vol. 13, pp. 826-834, 1983.
[FUKU_84] K. Fukunaga and J. M. Mantock, “Nonparametric data reduction,” IEEE Trans. Pattern Anal. Machine
Intell., vol. PAMI-6, pp. 115-118, Jan. 1984.
[FUKU_89] K. Fukunaga and R.R. Hayes, "The reduced Parzen classifier", IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-11, pp. 423-425, Apr. 1989.
[FUKU_90] K. Fukunaga, “Introduction to Statistical Pattern Recognition, 2nd ed”. New York Academic, 1990.
[GAMB_04] P. Gamba et al., "Exploiting spectral and spatial information in hyperspectral urban data with high resolution", IEEE Geoscience and Remote Sensing Letters, vol. 1, no. 4, pp. 322-326, 2004.
[GAMB_07] P. Gamba, A.J. Plaza, J.A. Benediktsson, and J. Chanussot, "European perspectives in hyperspectral data analysis", Recent Advances in Techniques for Hyperspectral Image Processing, Remote Sensing of Environment, 2007.
[GESI_06] A.J. Gesing, “ELVs: How they fit in the global material recycling system and with technologies developed
for production or recycling of other products and materials”, 6th International Automobile Recycling Congress,
Amsterdam, Netherlands, 2006
[GEUS_01] “Jal-Mark Geusebroek et Al.”, “Color Invariance”, IEEE Trans. Pattern Anal. Mach. Intell. Vol 23 Nº 12,
2001.
[GOME_02] Luis Gómez Chova, “Pattern Recognition Methods for Crop Classification from Hyperspectral Remote
Sensing Images”, Master Thesis, 2002.
[GONZ_08] R.C. Gonzalez and R.E. Woods, "Digital Image Processing", 3rd Edition, ISBN: 978-0-13-168728-8, Pearson, 2008.
[GORB_07] A. Gorban, B. Kegl, D. Wunsch, and A. Zinovyev (Eds.), Principal Manifolds for Data Visualization and
Dimension Reduction, LNCS 58, Springer, ISBN 978-3-540-73749-0, 2007.
[GRAH_07] H. Grahn and P. Geladi (Eds.), Techniques and Applications of Hyperspectral Image Analysis, Wiley,
ISBN-10: 0-470-01086-X, 2007
[GUNT_82] Wyszecki, Günther; Stiles, W.S. (1982). Color Science: Concepts and Methods, Quantitative Data and
Formulae, 2nd ed., New York: Wiley Series in Pure and Applied Optics. ISBN 0-471-02106-7.
[HART_68] P.E. Hart, "The condensed nearest neighbor rule", IEEE Trans. Inform. Theory, vol. IT-14, pp. 515-516, 1968.
[HAYK_99] S. Haykin, "Neural Networks: A Comprehensive Foundation", Second ed., Englewood Cliffs, N.J.: Prentice Hall, 1999.
[HEAL_99] G. Healey and D. Slater, "Models and methods for automated material identification in hyperspectral
imagery acquired under unknown illumination and atmospheric conditions”, IEEE Transactions on Geoscience and
Remote Sensing, vol. 37, no. 6, pp. 2706-2717, 1999.
[HERT_91] Hertz, J. A. Krogh and R.G. Palmer, “Introduction to the theory of Neural Computation”, Addison Wesley,
1991.
[HOLL_03] Michael Hollas, Modern Spectroscopy, 4th Edition, ISBN: 978-0-470-84416-8, 2003
[HOTT_33] H. Hotelling, "Analysis of a complex of statistical variables into principal components", Journal of Educational Psychology, vol. 24, pp. 417-441, 1933.
[HUGH_68] G.F. Hughes, "On the Mean Accuracy of Statistical Pattern Recognizers", IEEE Transactions on Information Theory, vol. 14, no. 1, pp. 55-63, 1968.
[JACO_91] R.A. Jacobs et al., "Adaptive Mixtures of Local Experts", Neural Computation, vol. 3, pp. 79-87, 1991.
[JAIN_00] Anil K. Jain et al., "Statistical Pattern Recognition: A Review", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, January 2000.
[JAIN_97] A.K. Jain and D. Zongker, “Feature Selection: Evaluation, Application, and Small Sample Performance,”
IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 2, pp. 153-158, Feb. 1997.
[KASH_86] S.K. Kachigan, "Statistical Analysis", Radius Press, New York, 1986.
[KEEN_07] Michael R. Keenan, “Multivariate Analysis of Spectral Images Composed of Count Data”, Techniques and
Applications of Hyperspectral Image Analysis: 2007, ISBN-10: 0-470-01086-X, 2007.
[KESH_04] N. Keshava, “Distance metrics and band selection in hyperspectral processing with application to material
classification and spectral libraries”, IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 7, pp. 1552-
1565, 2004.
[KLIN_90] Gudrun J. Klinker, Steven A. Shafer, and Takeo Kanade, "A Physical Approach to Color Image Understanding", International Journal of Computer Vision, vol. 4, pp. 7-38, 1990.
[KOHO_88] T. Kohonen, “Learning vector quantization,” Neural Net., vol. 1, pp. 303, 1988, Supplement 1
[KOHO_95] T. Kohonen, “Self-Organizing Maps. Springer Series in Information Sciences, vol. 30”, Berlin, 1995.
[KUAN_05] C.Y. Kuan, and G. Healey, “Band selection for recognition using moment invariants”, Algorithms and
Technologies for Multispectral, Hyperspectral and Ultraspectral Imagery XI, SPIE, 2005.
[KULL_87] S. Kullback (1987) The Kullback-Leibler distance, The American Statistician 41:340-341.
[KUTI_05] M. Kutila, J. Viitanen, and A. Vattulainen, “Scrap metal sorting with colour vision and inductive sensor
array”, Computational Intelligence for Modelling, Control and Automation, pp. 725-729, Vienna, Austria, 2005.
[KWON_99] H. Kwon, S.Z. Der, N.M. Nasrabadi, and H. Moon, “Use of hyperspectral imagery for material
classification in outdoor scenes”, SPIE Proceedings Series, Algorithms, Devices, and Systems for Optical Information
Processing III, vol. 3804, Denver, USA, pp. 104-115, 1999.
[LEE_05] C.P. Lee, W. Snyder, C. Wang, “Supervised Multispectral Image Segmentation using Active Contours”,
Proceedings of the 2005 IEEE International Conference on robotics and Automation, Barcelona, 2005.
[LEVR_96] L. Devroye et al., "A Probabilistic Theory of Pattern Recognition", New York: Springer, 1996.
[LOWE_91] D. Lowe and A.R. Webb, "Optimized Feature Extraction and the Bayes Decision in Feed-Forward Classifier Networks", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, no. 4, pp. 355-364, Apr. 1991.
[MAGN_99] Magnus, J.R. and H. Neudecker, “Matrix differential Calculus with Applications in Statistics and
Econometrics”. Wiley 1999.
[MALO_86] L.T. Maloney and B.A. Wandell, "A Computational Model of Color Constancy", Journal of the Optical Society of America A, vol. 3, no. 1, 1986.
[MANO_04] D. Manolakis, and D. Marden, “Dimensionality reduction of hyperspectral imaging data using local
principal component transforms”, Algorithms and Technologies for Multispectral, Hyperspectral and Ultraspectral
Imagery X, SPIE, 2004.
[MATA_94] J. Matas, R. Marik, and J. Kittler. Illumination invariant colour recognition. In E. Hancock, editor, British
Machine Vision Conference. BMVA Press, 1994.
[MCLA_00] McLachlan G.J. and D. Peel “Finite Mixture Models”. Wiley, 2000.
[MCLA_88] G.J. McLachlan and K.E. Basford, "Mixture Models: Inference and Applications to Clustering", Marcel Dekker, 1988.
[MCLA_97] McLachlan G.J. and T. Krishnan, “The EM algorithm and its extensions” Wiley, 1997.
[MERC_02] G. Mercier, and M. Lennon, “On the characterization of hyperspectral texture”, IEEE International
Geoscience and Remote Sensing Symposium (IGARSS '02), vol. 5, pp. 2584-2586, 2002.
[MINSK_69] M.L. Minsky and S.A. Papert, "Perceptrons", Cambridge, MA: MIT Press, 1969.
[MONT_05] R. Montoliu, F. Pla, and A.C. Klaren, "Illumination Intensity, Object Geometry and Highlights Invariance in Multispectral Imaging", IbPRIA05, pp. I:36, 2005.
[MORR_76] D.F. Morrison, "Multivariate Statistical Methods", 2nd ed., McGraw-Hill, New York, 1976.
[NARE_77] P.M. Narendra and K. Fukunaga, "A Branch and Bound Algorithm for Feature Subset Selection", IEEE Transactions on Computers, vol. 26, no. 9, pp. 917-922, 1977.
[NOCE_99] Nocedal J. and S.J. Wright. “Numerical Optimization”, Springer, 1999.
[OEHL_95] K.L. Oehler and R.M. Gray, "Combining Image Compression and Classification Using Vector Quantization", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, no. 5, pp. 461-473, 1995.
[PAI_07] Pai-Hui Hsu, "Feature extraction of hyperspectral images using wavelet and matching pursuit", ISPRS Journal of Photogrammetry and Remote Sensing, vol. 62, no. 2, pp. 78-92, June 2007.
[PAL_01] A. Pal and S.K. Pal. “Pattern Recognition: Evolution of Methodologies and Data Mining”, “Pattern
Recognition. From Classical to Modern Approaches” , World Scientific, 2001
[PARZ_62] Parzen E. “On estimation of a probability density function and mode”, Ann. Math. Stat. 33, pp. 1065-1076.
1962
[PEAR_1901] K. Pearson, "On lines and planes of closest fit to systems of points in space", Philosophical Magazine, vol. 2, pp. 559-572, 1901.
[PERK_05] S. Perkins, K. Edlund, D. Esch-Mosher, D. Eads, N. Harvey, and S. Brumby, “Genie Pro: Robust image
classification using shape, texture and spectral information”, Algorithms and Technologies for Multispectral,
Hyperspectral and Ultraspectral Imagery XI, SPIE, 2005.
[PLAZ_05] A. Plaza et al., "Dimensionality reduction and classification of hyperspectral image data using sequences of extended morphological transformations", IEEE Transactions on Geoscience and Remote Sensing, vol. 43, no. 3, pp. 466-479, 2005.
[PUDI_94] P. Pudil, J. Novovicova, and J. Kittler, “Floating Search Methods in Feature Selection” Pattern Recognition
Letters, vol. 15, no. 11, pp. 1,119-1,125, 1994.
[RAJP_03] K.M. Rajpoot, and N.M. Rajpoot, “Wavelet based segmentation of hyperspectral colon tissue imagery”, 7th
International Multi Topic Conference (INMIC 2003), pp. 38-43, Islamabad, Pakistan, 2003.
[RAMA_05] B. Ramakrishna, J. Wang, C. Chang, A. Plaza, H. Ren, C.C. Chang, J.L. Jensen, and J.O. Jensen,
“Spectral/spatial hyperspectral image compression in conjunction with virtual dimensionality”, Algorithms and
Technologies for Multispectral, Hyperspectral and Ultraspectral Imagery XI, SPIE, 2005.
[RAUD_98] S. Raudys, "Evolution and Generalization of a Single Neuron: Single-Layer Perceptron as Seven Statistical Classifiers", Neural Networks, vol. 11, no. 2, pp. 283-296, 1998.
[RELL_02] G. Rellier, X. Descombes, J. Zerubia, and F. Falzon, “A Gauss-Markov Model for hyperspectral texture
analysis of urban areas”, 16th International Conference on Pattern Recognition (ICPR’02), vol. 1, pp. 692-695 2002.
[RICH_01] Austin Richards, "Alien Vision: Exploring the Electromagnetic Spectrum with Imaging Technology", SPIE, The International Society for Optical Engineers, 2001.
[RIPL_96] B. Ripley, Pattern Recognition and Neural Networks. Cambridge, Mass.: Cambridge Univ. Press, 1996.
[ROHD_97] Robert A. Rohde. Image originally created for Global Warming Art. GNU Licence.
[ROJA_96] R. Rojas, "Neural Networks", Springer-Verlag, Berlin, 1996.
[ROSE_58] Rosenblatt, Frank, The Perceptron: A Probabilistic Model for Information Storage and Organization in the
Brain, Cornell Aeronautical Laboratory, Psychological Review, v65, No. 6, pp. 386-408. 1958.
[RUME_86] D.E. Rumelhart et al., "Learning representations by back-propagating errors", Nature, vol. 323, pp. 533-536, 1986.
[SANG_98] Stephen J. Sangwine, Robin E. N. Horne, The Colour Image Processing Handbook, ISBN 0412806207
Springer 1998.
[SCHA_90] R.E. Schapire, “The Strength of Weak Learnability,” Machine Learning, vol. 5, pp. 197-227, 1990.
[SCHO_97] B. Schölkopf, "Support Vector Learning", Ph.D. thesis, Technische Universität Berlin, 1997.
[SCHO_98] B. Schölkopf, A. Smola, and K.R. Müller, "Nonlinear Component Analysis as a Kernel Eigenvalue Problem", Neural Computation, vol. 10, no. 5, pp. 1299-1319, 1998.
[SHAF_84] S.A. Shafer, "Using color to separate reflection components", J. Opt. Soc. Am. A, vol. 1, p. 1248, 1984.
[SHAF_85] S. Shafer, “Using color to separate reflection components”, Color Research and Applications, vol. 10, pp.
210-218, 1985
[SHAK_05] G. Shakhnarovich, T. Darrell, and P. Indyk (Eds.), "Nearest-Neighbor Methods in Learning and Vision", The MIT Press, ISBN 0-262-19547-X, 2005.
[SLAT_99] D. Slater, and G. Healey, “Material classification for 3D objects in aerial hyperspectral images”, IEEE
Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’99), vol. 2, pp. 2262-2267, 1999.
[SNYM_05] Jan A. Snyman, "Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms", Springer Publishing, 2005.
[SOME_04] P. Somervuo and T. Kohonen, "Self-Organizing Maps and Learning Vector Quantization for Feature Sequences", 2004.
[SOMM_06] E.J. Sommer, C.E. Ross, and D.B. Spencer, Method and apparatus for sorting materials according to
relative composition, US Patent 7,099,433, 2006.
[SORM_06] SORMEN - Innovative Separation Method for Non Ferrous Metal Waste from Electric and Electronic
Equipment (WEEE) based on Multi- and Hyperspectral Identification project, Sixth Framework Programme Horizontal
Research Activities Involving SMES Co-Operative Research, 2006, http://www.sormen.org/
[SPEC_08] Specim Spectral Imaging Ltd. http://www.specim.fi/.
[SPEN_05] D.B. Spencer, “The high-speed identification and sorting of nonferrous scrap”, JOM Journal of the
Minerals, Metals and Materials Society, vol. 57, no. 4, pp. 46-51, 2005.
[STOC_99] Harro Stockman and Theo Gevers, "Detection and Classification of Hyper-Spectral Edges", Proc. 10th British Machine Vision Conf., pp. 643-651, 1999.
[TAN_04] Robby T. Tan, Ko Nishino, and Katsushi Ikeuchi, "Separating Reflection Components Based on Chromaticity and Noise Analysis", IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 10, 2004.
[TATZ_05] P. Tatzer, M. Wolf, and T. Panner, “Industrial application for inline material sorting using hyperspectral
imaging in the NIR range”, Real-Time Imaging, vol. 11, no. 2, Spectral Imaging II, pp. 99-107, 2005
[TOMI_94] Shoji Tominaga, "Dichromatic reflection models for a variety of materials", Color Research & Application, vol. 19, no. 5, pp. 277-285, 1994.
[TRES_95] V. Tresp and M. Taniguchi, “Combining Estimators Using Non-Constant Weighting Functions,” Advances
in Neural Information Processing Systems, MIT Press, 1995.
[TSO_04] B. Tso and R.C. Olsen, "Scene Classification Using Combined Spectral, Textural and Contextual Information", SPIE, ATMHUI X, 2004.
[VAPN_06] Vladimir Vapnik and S. Kotz, "Estimation of Dependences Based on Empirical Data", Springer, 2006.
[VAPN_98] V.N. Vapnik, Statistical Learning Theory. New York: John Wiley & Sons, 1998.
[WAHA_06] D.A. Wahab, A. Hussain, E. Scavino, M. Mustafa, and H. Basri, “Development of a prototype automated
sorting system for plastic recycling”, American Journal of Applied Sciences, vol. 3, no. 7, pp. 1924-1928, 2006.
[WANG_06] J. Wang, and C.I. Chang, “Independent component analysis-based dimensionality reduction with
applications in hyperspectral image analysis”, IEEE Transactions on Geoscience and Remote Sensing, vol. 44, no. 6,
pp. 1586-1600, 2006.
[WILL_04] C. Willis, “Hyperspectral image classification with limited training data samples using feature subspaces”,
Algorithms and Technologies for Multispectral, Hyperspectral and Ultraspectral Imagery X, SPIE, 2004.
[WILS_04] David B. Wilson, "Red-Green-Blue model", Phys. Rev. E, vol. 69, no. 3, 2004.
[WIND_07] Willem Windig et al., "Self-modeling Image Analysis with SIMPLISMA", in "Techniques and Applications of Hyperspectral Image Analysis", John Wiley & Sons, 2007.
[XIE_93] Q.B. Xie, C.A. Laszlo, and R.K. Ward, "Vector Quantization Technique for Nonparametric Classifier Design", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 12, pp. 1326-1330, 1993.
[YOSH_00] Yoshi Ohno, "CIE Fundamentals for Color Measurements", IS&T NIP16 Conference, Vancouver, Canada, pp. 540-545, Oct. 16-20, 2000.
[YUHA_92] Yuhas, R.H., Goetz, A. F. H., and Boardman, J. W., 1992, Discrimination between semi-arid landscape
endmembers using the spectral angle mapper (SAM) algorithm. In Summaries of the Third Annual JPL Airborne
Geoscience Workshop, JPL Publication 92-14, vol. 1, pp. 147-149.
[ZADE_65] L.A. Zadeh, "Fuzzy sets", Information and Control, vol. 8, pp. 338-353, 1965.