
Proceedings of the 3rd IEEE International Symposium on Signal Processing and Information Technology, Darmstadt, Germany, 14-17 Dec. 2003.


A COMPARATIVE STUDY OF DENSITY MODELS FOR GAS IDENTIFICATION USING MICROELECTRONIC GAS SENSOR


Sofiane Brahim-Belhouari, Amine Bermak, Guangfen Wei and Philip C. H. Chan

Hong Kong University of Science and Technology, EEE Department, Clear Water Bay, Kowloon, Hong Kong.

e-mail : [email protected], [email protected], [email protected], [email protected]

ABSTRACT

The aim of this paper is to compare the accuracy of a range of advanced density models for gas identification from sensor array signals. Density estimation is applied in the construction of classifiers through the use of Bayes' rule. Experiments on real sensor data demonstrated the effectiveness of the approach, with excellent classification performance. We compare the classification accuracy of four density models: Gaussian mixture models, generative topographic mapping, probabilistic PCA mixture and K nearest neighbors. On our gas sensor data, the best performance was achieved by Gaussian mixture models.

1. INTRODUCTION

During the last decade, increasing attention has been given to the development of microelectronic gas sensors. Among the various types of sensors, micro-hotplate based SnO2 thin film sensors offer a number of interesting features and are particularly attractive for their practical interest [1]. Indeed, these devices feature high sensitivity, low power consumption, compactness and compatibility with semiconductor technology. Unfortunately, thin film sensors (as do all gas sensors) suffer from a number of shortcomings, such as non-selectivity and nonlinearities of the sensor's response. Pattern recognition algorithms combined with gas sensor arrays have traditionally been used to address these issues [2]. In fact, a gas sensor array improves the selectivity of a single gas sensor and shows the ability to classify different odors. An array of different gas sensors is used to generate a unique signature for each odor. After a preprocessing stage, the resulting feature vector is used to solve a given classification problem, which consists of identifying an unknown sample as one from a set of previously learned gases. Significant work has been devoted to designing a successful pattern analysis system for machine olfaction [2]. Various kinds of flexible pattern recognition algorithms have been used for classifying chemical sensor data. Most notably, neural networks have been exploited, in particular

multilayer perceptrons (MLP), radial basis functions (RBF) and self-organizing maps (SOM) [2]. Other methods based on class-conditional density estimation have also been used, such as quadratic and K nearest neighbors (KNN) classifiers. Recently, a new family of semiparametric methods based on mixture distributions has been successfully applied to a number of applications, such as speech recognition [3] and image retrieval [4]. Despite their great potential as classifiers, density-model based approaches have not been exploited for machine olfaction and electronic nose applications. In this paper we present a gas classification approach based on class-conditional density estimation using different density models. The ability of the proposed models to perform gas identification is compared using an experimentally obtained dataset. Data collected from the microelectronic gas sensor, consisting of a VLSI chip including 8 sensors, first undergo a preprocessing stage before being fed to the density-model classifier. A total of four density models operating on PCA, Neuroscale and LDA projections are compared. The Gaussian mixture model is shown to outperform the probabilistic PCA mixture, KNN and generative topographic mapping classifiers. LDA projection, by generating a good class separation, is shown to improve the classification accuracy.


2. DENSITY MODELS

The objective of pattern recognition is to set a decision rule which partitions, in an optimized way, the data space into c regions, one for each class C_k. A pattern classifier generates a class label for an unknown feature vector x ∈ R^d from a discrete set of previously learned classes. The most general classification approach is to use the posterior probability of class membership p(C_k|x). To minimize the probability of misclassification, one should consider the maximum a posteriori rule and assign x to class C_k*, where:

k* = argmax_k p(x|C_k) p(C_k)    (1)


where p(x|C_k) is the class-conditional density and p(C_k) is the prior probability. In the absence of prior knowledge, p(C_k) can be approximated by the relative frequency of examples in the dataset. One way to build a classifier is to estimate the class-conditional densities by using representation models for how each pattern class populates the feature space. In this approach, classifier systems are built by considering each of the classes in turn, and estimating the corresponding class-conditional densities p(x|C_k) from data.
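As a sketch of how such a classifier is assembled (not the authors' code; the single-Gaussian class-conditional form and the small regularization term added to the covariance are illustrative assumptions), one can estimate one density and one prior per class and apply the MAP rule of equation (1):

```python
import numpy as np

def fit_class_densities(X, y):
    """Estimate a Gaussian class-conditional density p(x|C_k) and a
    prior p(C_k) (relative class frequency) for each class label."""
    models = {}
    for k in np.unique(y):
        Xk = X[y == k]
        models[k] = {
            "prior": len(Xk) / len(X),
            "mean": Xk.mean(axis=0),
            # small diagonal term keeps the covariance invertible
            "cov": np.cov(Xk, rowvar=False) + 1e-6 * np.eye(X.shape[1]),
        }
    return models

def log_gaussian(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = len(mean)
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.solve(cov, diff))

def classify(x, models):
    """MAP rule: argmax_k [ log p(x|C_k) + log p(C_k) ]."""
    return max(models, key=lambda k: log_gaussian(x, models[k]["mean"],
                                                  models[k]["cov"])
               + np.log(models[k]["prior"]))
```

Working in log space avoids numerical underflow when the densities are small, and leaves the argmax unchanged.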

The most widely used nonparametric density estimation method is the K nearest neighbors (KNN). Despite the simplicity of the algorithm, it often performs very well and is an important benchmark method. However, one drawback of KNN is that all the training data must be stored, and a large amount of processing is needed to evaluate the density for a new input pattern. An alternative is to combine the advantages of both parametric and nonparametric methods, by allowing a very general class of functional forms in which the number of adaptive parameters can be increased to build more flexible models. This leads us to a powerful technique for density estimation, called the mixture model [5]. In our work we focus on semiparametric models based on mixture distributions. We briefly present three density models, namely: (i) Gaussian mixture models, (ii) generative topographic mapping and (iii) probabilistic PCA mixture.
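A minimal KNN classifier illustrating the storage drawback mentioned above (a hypothetical sketch, not the authors' implementation): every query must scan the entire stored training set.

```python
import numpy as np

def knn_classify(x, X_train, y_train, k=3):
    """K nearest neighbors classifier: the whole training set is kept,
    and each query computes a distance to every stored pattern."""
    dists = np.linalg.norm(X_train - x, axis=1)   # O(N d) per query
    nearest = y_train[np.argsort(dists)[:k]]      # labels of k nearest
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]              # majority vote
```

The O(Nd) cost per query is exactly the processing burden the text refers to; mixture models trade this for a fixed number of adaptive parameters.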


2.1. Gaussian mixture models

In a Gaussian mixture model, a probability density function is expressed as a linear combination of basis functions. A model with M components is described as a mixture distribution [5]:

p(x) = Σ_{j=1}^{M} p(j) p(x|j)    (2)

where p(j) are the mixing coefficients and the parameters of the component density functions p(x|j) vary with j. Each mixture component is defined by a Gaussian parametric distribution in d-dimensional space:

p(x|j) = (2π)^{-d/2} |Σ_j|^{-1/2} exp( -(1/2)(x - μ_j)^T Σ_j^{-1} (x - μ_j) )

The parameters to be estimated are the mixing coefficients p(j), the covariance matrices Σ_j and the mean vectors μ_j. The method for training a mixture model is based on maximizing the data likelihood. The negative log likelihood of the dataset (x_1, ..., x_n), which is treated as an error function, is defined by:

E = -ln L = -Σ_{i=1}^{n} ln p(x_i)    (3)

A specialized method, known as the expectation-maximization (EM) algorithm [6], is commonly used to find the optimal parameters.
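The per-class mixture training and the MAP decision can be sketched as follows, assuming scikit-learn's GaussianMixture as the EM implementation (the library choice and the GMMClassifier wrapper are illustrative, not from the paper):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

class GMMClassifier:
    """One Gaussian mixture per class, trained by EM; prediction applies
    the MAP rule of Section 2: argmax_k log p(x|C_k) + log p(C_k)."""
    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_, self.log_priors_ = {}, {}
        for k in self.classes_:
            Xk = X[y == k]
            # EM fit of an M-component full-covariance mixture, eq. (2)
            self.models_[k] = GaussianMixture(
                n_components=self.n_components, covariance_type="full",
                random_state=0).fit(Xk)
            self.log_priors_[k] = np.log(len(Xk) / len(X))
        return self

    def predict(self, X):
        # class-conditional log density plus log prior, per class
        scores = np.column_stack([
            self.models_[k].score_samples(X) + self.log_priors_[k]
            for k in self.classes_])
        return self.classes_[np.argmax(scores, axis=1)]
```

Each class is fitted independently, so adding a new gas class only requires training one additional mixture.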

2.2. Generative topographic mapping

In many classification problems we have to deal with high-dimensional data. Therefore we would like to model the distribution p(z) in terms of latent variables x in a low-dimensional space. After estimating p(z|x), the dependence on x has to be integrated out to obtain the density in data space p(z), where:

p(z) = ∫ p(z|x) p(x) dx    (4)

The generative topographic mapping (GTM) [7] is one of the more popular methods for dealing with this situation. It is a mixture model, which means that equation (4) is approximated by a sum over M Gaussians:

p(z) = (1/M) Σ_{j=1}^{M} p(z|x_j)    (5)

Here p(x) is assumed to be a uniform distribution over a grid of latent points x_j, and each mixture component is a spherical Gaussian with variance σ²,

and the jth centre is given by a parameterized mapping y(x_j, W). Equation (5) can then be rewritten as:

p(z) = (1/M) Σ_{j=1}^{M} (2πσ²)^{-d/2} exp( -||y(x_j, W) - z||² / (2σ²) )

It is a constrained mixture model because the centres are not independent but are related by the mapping y. In the GTM method, the mapping from x to z is modeled with an RBF network:

y(x, W) = W Φ(x)    (6)

where Φ(x) is a vector of K fixed basis functions φ_i(x) and W is a d × K matrix of adjustable network weights. The log likelihood for a dataset is then maximized in terms of W and σ using the expectation-maximization (EM) algorithm.
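A compact numerical sketch of the GTM training loop described above, with a one-dimensional latent grid, Gaussian RBF basis functions and a small ridge term for numerical stability (the grid size, basis width and ridge constant are illustrative assumptions; the E- and M-step equations follow [7]):

```python
import numpy as np

def gtm_fit(Z, M=10, K=5, n_iter=20, sigma2=1.0):
    """Minimal 1-D GTM: a constrained mixture of M spherical Gaussians
    whose centres y_j = W phi(x_j) lie on an RBF mapping of a regular
    latent grid, trained by EM."""
    N, d = Z.shape
    x = np.linspace(-1, 1, M)                      # latent grid points x_j
    mu = np.linspace(-1, 1, K)                     # RBF centres
    Phi = np.exp(-(x[:, None] - mu[None, :])**2 / (2 * 0.3**2))  # M x K
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(K, d))         # network weights
    for _ in range(n_iter):
        Y = Phi @ W                                # mixture centres, M x d
        sq = ((Y[:, None, :] - Z[None, :, :])**2).sum(-1)  # M x N
        # E-step: responsibility of each centre for each data point
        logR = -sq / (2 * sigma2)
        logR -= logR.max(axis=0, keepdims=True)    # stabilize the exp
        R = np.exp(logR)
        R /= R.sum(axis=0, keepdims=True)
        # M-step: solve (Phi^T G Phi) W = Phi^T R Z for W, then
        # re-estimate the common variance sigma^2
        G = np.diag(R.sum(axis=1))
        W = np.linalg.solve(Phi.T @ G @ Phi + 1e-6 * np.eye(K),
                            Phi.T @ R @ Z)
        Y = Phi @ W
        sq = ((Y[:, None, :] - Z[None, :, :])**2).sum(-1)
        sigma2 = (R * sq).sum() / (N * d)
    return W, sigma2, Phi
```

Because the centres move only through W, the mixture stays topographically ordered along the latent grid, which is the "constrained" property the text emphasizes.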

2.3. Probabilistic PCA mixture

Classical PCA is made into a density model by using a latent variable approach, in which the data z are generated by a linear combination of a number of latent variables x of low dimension (q < d). The mapping from x to z is given by:

y(x, W) = W x + μ    (7)

where μ represents the data mean. The probability model of PCA can be written as a combination of the conditional distribution [8]:

p(z|x) = (2πσ²)^{-d/2} exp( -||z - Wx - μ||² / (2σ²) )    (8)


and the latent variable distribution:

p(x) = (2π)^{-q/2} exp( -||x||² / 2 )    (9)

By integrating out the latent variables x, we obtain the distribution of the observed data, which is also Gaussian:

p(z) = (2π)^{-d/2} |C|^{-1/2} exp( -(1/2)(z - μ)^T C^{-1} (z - μ) )    (10)

where C = W W^T + σ²I. The covariance matrix is the sum of two terms: one is confined to the q-dimensional subspace spanned by the first q principal components and the other is spherical. A mixture of PPCA models has the same form as (2), where each component density function is given by a probabilistic PCA. Hence, the training of such a model can be done in the maximum likelihood framework using an EM algorithm.
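The claim that the marginal of z is Gaussian with covariance C = W W^T + σ²I can be checked numerically by sampling the generative model of equations (7)-(9); the dimensions and noise level below are arbitrary illustrative choices:

```python
import numpy as np

# PPCA generative model: latent x ~ N(0, I_q) is mapped linearly into
# data space and spherical noise is added; the marginal of z should then
# be Gaussian with covariance C = W W^T + sigma^2 I, as in eq. (10).
rng = np.random.default_rng(0)
d, q, sigma2, n = 4, 2, 0.1, 500_000
W = rng.normal(size=(d, q))
mu = rng.normal(size=d)

x = rng.normal(size=(n, q))                            # latents, eq. (9)
noise = rng.normal(scale=np.sqrt(sigma2), size=(n, d)) # spherical noise
z = x @ W.T + mu + noise                               # mapping, eq. (7)

C_model = W @ W.T + sigma2 * np.eye(d)                 # eq. (10) covariance
C_sample = np.cov(z, rowvar=False)                     # empirical estimate
```

For large n the empirical covariance converges to C_model entry by entry, which is exactly the structure (rank-q term plus spherical term) described in the text.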

3. EXPERIMENTAL RESULTS

The experimental setup used in our work is shown in Figure 1. It consists of a special sensor chamber equipped with gas pumps and mass flow controllers (MFC) as well as a data acquisition board (DAQ).

Fig. 1. Scheme of the experimental setup. The gas sensor array is placed inside a gas chamber in which the gas concentration is set using mass flow controllers (MFC). Data are acquired using a data acquisition board (DAQ).

Methane, carbon monoxide or their mixture vapours were injected into the gas chamber at a flow rate determined and accurately controlled by the mass flow controllers (MFC). Different concentrations were used, ranging from 25 to 200 ppm for CO and from 500 to 4000 ppm for CH4. The CO-CH4 mixture was also injected into the gas chamber, with different combinations of both gases' concentrations. A sensor array composed of 8 micro-hotplate based SnO2 thin film gas sensors was used. The sensor outputs are raw voltage measurements in the form of exponential-like curves. The steady state value was recorded for each concentration of the three gases. Figure 2 shows the typical steady state response of the sensor array exposed to the different gases. A gas dataset of 168 patterns was used for estimating the performance of the density classifiers.

Fig. 2. Histogram showing the response patterns of the eight gas sensors exposed to CH4, CO and their mixture.

For feature extraction, we used principal component analysis (PCA), and compared it with linear discriminant analysis (LDA) and Neuroscale [9]. PCA, a classical linear technique, is used as a preprocessing stage for redundancy removal and feature reduction before applying the density model classifier. Figure 3(a) presents the PCA scores for all the studied gas sensors' steady state voltages. We can note that the decision boundaries are not well defined due to strong overlapping. In contrast to PCA, Neuroscale projection is a non-linear topographic (i.e. distance preserving) projection that uses an RBF network. This method has the advantage of preserving the data structure, as well as the possibility of incorporating subjective information. In our case, Neuroscale was used with class information in order to generate a useful feature space that separates the classes. Figure 3(b) shows the Neuroscale plots for the studied gases. Compared to the PCA results, it is clear from Figure 3(b) that the Neuroscale method considerably reduces the overlapping between the classes and hence shows better separability. LDA provides a linear projection of the data onto (c-1) dimensions, by taking into account the scatter of the data within each class and across classes. Compared to the PCA and Neuroscale results, LDA presents the most discriminatory projection, as evidenced by Figure 3(c).

Let us now study the classification capabilities of each density model and compare their performances. The inputs to each classifier are the projections of the data using PCA, LDA or Neuroscale. The parameters of each mixture model were adapted to the training data in the maximum likelihood framework using the EM algorithm. Since the dataset we used was small, generalization performance was estimated using the 8-fold cross validation approach. For the PCA tool, the best performance is achieved when projecting onto five principal components; adding further components actually degrades the performance of the classifier.
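The evaluation protocol (project, classify, score by 8-fold cross validation) can be sketched as below. Since the authors' dataset is not available, a synthetic stand-in with 168 patterns from 8 simulated sensors and three classes is used, and scikit-learn supplies the PCA, LDA and KNN implementations; all of these are illustrative choices, not the paper's code:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Synthetic stand-in: 3 gas classes (CO, CH4, mixture) x 56 patterns
# of 8 steady-state sensor readings, 168 patterns in total.
rng = np.random.default_rng(0)
n_per_class, n_sensors = 56, 8
means = rng.normal(scale=3.0, size=(3, n_sensors))
X = np.vstack([m + rng.normal(size=(n_per_class, n_sensors)) for m in means])
y = np.repeat([0, 1, 2], n_per_class)

# Project (PCA to 5 components, or LDA to c-1 = 2), classify with
# KNN (K = 3), and average accuracy over 8 stratified folds.
cv = StratifiedKFold(n_splits=8, shuffle=True, random_state=0)
accs = {}
for name, proj in [("PCA", PCA(n_components=5)),
                   ("LDA", LinearDiscriminantAnalysis(n_components=2))]:
    pipe = make_pipeline(proj, KNeighborsClassifier(n_neighbors=3))
    accs[name] = cross_val_score(pipe, X, y, cv=cv).mean()
    print(f"{name}: {accs[name]:.4f}")
```

Fitting the projection inside the cross-validation pipeline, rather than once on all the data, keeps the test folds genuinely unseen.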
Table 1 reports the classification performance of the mixture models in comparison to that of the KNN (K = 3) classifier. GMM is shown to outperform the PPCA, KNN and GTM classifiers. It was also found that LDA projection improves the classification accuracy for all density model classifiers.


Fig. 3. (a) PCA results for the gas sensor array steady state voltage. Measurement type: CO (circles), CH4 (squares) and mixture (triangles). (b) Data projected using Neuroscale with class information. Measurement type: CO (circles), CH4 (squares) and mixture (triangles). (c) LDA results for the gas sensor array steady state voltage. Measurement type: CO (circles), CH4 (squares) and mixture (triangles).

89.88 91.10 91.10 93.45 90.47 90.47 89.28 91.67

Table 1. Classification performance expressed in terms of the 8-fold cross validation accuracy, using PCA, Neuroscale and LDA projections.

4. CONCLUDING REMARKS

In this paper we presented a gas identification method based on class-conditional density estimation. We conducted a comparative study of four density model classifiers applied to the problem of classifying combustion gases. Data were obtained from an array of 8 microelectronic gas sensors using an experimental setup. It was found that GMM offers the best performance as compared to GTM, PPCA mixture and KNN. Experiments also showed that LDA projection enhances the classification performance of all tested density model classifiers. It was also found that the performance of the density models is greatly affected by the number of principal components used. Work is underway to extend the experimental setup to include more combustion gases as well as their mixtures. More discriminant classifiers will be designed in order to classify and even quantify the different gases.

Acknowledgments

The work described in this paper was supported by a Direct Allocation Grant (project No. DAG02/03.EG05).

5. REFERENCES

[1] P. C. H. Chan, G. Yan, L. Sheng, R. K. Sharma, Z. Tang, J. K. O. Sin, I-M. Hsing and Y. Wang, "An integrated gas sensor technology using surface micro-machining," Sensors and Actuators B 82, pp. 277-283, 2002.

[2] R. Gutierrez-Osuna, "Pattern analysis for machine olfaction: a review," IEEE Sensors Journal, Vol. 2, No. 3, pp. 189-202, 2002.

[3] Y. Zhang, M. Alder and R. Togneri, "Using Gaussian mixture modeling in speech recognition," in Proc. ICASSP Conf., 1994, Vol. 1, pp. 613-616.

[4] N. Vasconcelos and A. Lippman, "Feature representations for image retrieval: beyond the color histogram," in Proc. IEEE ICME Conf., 2000, Vol. 2, pp. 899-902.

[5] D. M. Titterington, A. F. M. Smith and U. E. Makov, "Statistical Analysis of Finite Mixture Distributions," John Wiley, New York, 1985.

[6] C. M. Bishop, "Neural Networks for Pattern Recognition," Clarendon Press, Oxford, 1995.

[7] C. M. Bishop, M. Svensén and C. K. I. Williams, "GTM: the generative topographic mapping," Neural Computation, 10(1), pp. 215-235, 1998.

[8] M. E. Tipping and C. M. Bishop, "Probabilistic principal component analysis," J. Roy. Statist. Soc. B, 61, pp. 611-622, 1999.

[9] D. Lowe and M. E. Tipping, "Feed-forward neural networks and topographic mappings for exploratory data analysis," Neural Computing and Applications, 4, pp. 83-95, 1996.
