
Fuzzy Encoding For Image Classification Using Gustafson-Kessel Algorithm


Description

This paper presents a novel adaptation of fuzzy clustering and feature encoding for image classification. Visual word ambiguity has recently been successfully modeled by kernel codebooks, which provide an improvement in classification performance over the standard ‘Bag-of-Features’ (BoF) approach, which uses hard partitioning and crisp logic for the assignment of features to visual words. Motivated by this progress, we utilize fuzzy logic to model the ambiguity and combine it with clustering to discover fuzzy visual words. The feature descriptors of an image are encoded using the learned fuzzy membership function associated with each word. The codebook built using this fuzzy encoding technique is demonstrated to provide superior performance over BoF. We use the Gustafson-Kessel algorithm, which is an improvement over Fuzzy C-Means clustering and can adapt to local distributions. We evaluate our approach on several popular datasets and demonstrate that it consistently provides superior performance to the BoF approach.



Ashish Gupta, Richard Bowden

Centre for Vision, Speech, and Signal Processing, University of Surrey, Guildford, United Kingdom

Abstract

Feature vectors in visual descriptor space do not form naturally occurring clusters, but have been shown to exhibit visual ambiguity. ‘Bag-of-Features’ (BoF), the most popular approach, utilizes hard partitioning and so ignores the semantic continuum in this space. Kernel codebooks have been demonstrated to improve upon BoF by soft-partitioning descriptor space. Building on these results, this paper models visual ambiguity by formulating feature encoding as clustering with fuzzy logic. The Gustafson-Kessel algorithm computes hyper-ellipsoidal clusters, each with a learnt mean and covariance. The approach is demonstrated to provide better classification performance than BoF on several popular datasets.

Visual Word Ambiguity

Figure: Hard partitioning leads to: suitable (triangle); uncertain (square); and plausible (diamond) types of assignments. Image from [1]

Figure: Kernel codebook ameliorates issues with uncertainty and plausibility by weighted assignment to words. Image from [1]

The feature descriptor space is a continuum, which dictates that exclusive assignment of a descriptor to a single cluster prototype (crisp logic) is better modelled by soft assignment of the descriptor to multiple cluster prototypes [1].
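As a toy illustration of this soft-assignment idea (our sketch, not the authors' code; the Gaussian weighting and `sigma` are assumptions), each descriptor can spread unit mass over nearby words instead of voting for exactly one:

```python
import numpy as np

def encode_soft(image_descriptors, centres, sigma=1.0):
    """Kernel-codebook-style soft encoding: weighted assignment to all words."""
    # squared distance of every descriptor to every word centre
    d2 = ((image_descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2 * sigma ** 2))    # Gaussian weighting (assumed kernel)
    w /= w.sum(axis=1, keepdims=True)     # each descriptor spreads unit mass
    hist = w.sum(axis=0)                  # accumulate mass into a word histogram
    return hist / hist.sum()
```

Unlike hard assignment, every word receives some mass, so the uncertain and plausible cases in the figure contribute to several bins rather than being forced into one.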

Image Classification Pipeline

The training dataset, for visual category ‘car’, consists of positively and negatively labelled images. A sample of descriptors computed on these images is clustered using either K-Means (BoF), Fuzzy C-Means (FCM) [2], or Gustafson-Kessel (GK) [3]. The descriptors from each image are encoded according to weighted word assignment(s) to compute a histogram feature vector for each image. A classifier is trained using these feature vectors.
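A minimal sketch of the BoF end of this pipeline (ours, with assumed names; it takes already-extracted descriptors as input):

```python
import numpy as np

def build_codebook(descriptors, k, iters=25, seed=0):
    """Plain k-means codebook (the BoF baseline): returns k word centres."""
    rng = np.random.default_rng(seed)
    centres = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(descriptors[:, None] - centres[None], axis=2)
        labels = d.argmin(axis=1)            # hard assignment to nearest word
        for i in range(k):
            if np.any(labels == i):
                centres[i] = descriptors[labels == i].mean(axis=0)
    return centres

def encode_hard(image_descriptors, centres):
    """Hard-assignment histogram feature vector for one image."""
    d = np.linalg.norm(image_descriptors[:, None] - centres[None], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(centres))
    return hist / hist.sum()
```

The per-image histograms are what the classifier is trained on; swapping `encode_hard` for a fuzzy encoder is exactly the change this poster studies.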

Fuzzy Encoding: Methods

K-means computes words $V = \{v_1, v_2, \ldots, v_K\}$ and an assignment of descriptors $X = \{x_1, x_2, \ldots, x_N\}$ to words by minimizing:

$$J(X; V) = \sum_{i=1}^{K} \sum_{k=1}^{N} \mathbb{1}_{ik} \, \| x_k - v_i \|^2, \qquad \mathbb{1}_{ik} = \begin{cases} 1 & \text{if } x_k \in v_i \\ 0 & \text{otherwise} \end{cases}$$

The encoded feature for an image, $\nu$, is a discrete-valued histogram. The FCM algorithm attempts to minimize:

$$J(X; U, V) = \sum_{i=1}^{K} \sum_{k=1}^{N} \mu_{ik}^{m} \, \| x_k - v_i \|^2, \qquad 1 \leq m < \infty$$

where $U = [\mu_{ik}]$, and $\mu_{ik}$ is the degree of membership of $x_k$ to cluster $i$; $m$ is a measure of ‘fuzzification’. $\nu$ now reflects degrees of membership of descriptors to words. However, FCM is unable to adapt to the local distribution of descriptors. The GK algorithm extends FCM using a metric induced by a positive definite matrix $A$, learning hyper-ellipsoidal clusters instead of the hyper-spherical clusters of FCM. The objective function minimized in GK is:

$$J(X; U, V, A) = \sum_{i=1}^{K} \sum_{k=1}^{N} \mu_{ik}^{m} \, D^2_{ikA_i}$$

where

$$D^2_{ikA_i} = (x_k - v_i)^T A_i (x_k - v_i), \qquad 1 \leq i \leq K, \; 1 \leq k \leq N$$
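In code, the norm-inducing matrix $A_i = \rho_i \det(F_i)^{1/n} F_i^{-1}$ and the squared distance it induces might look like this (our sketch; function and variable names are assumptions):

```python
import numpy as np

def gk_sq_distance(X, v, F, rho=1.0):
    """Squared GK distance of descriptors X (N x n) to centre v under covariance F."""
    n = X.shape[1]
    A = rho * np.linalg.det(F) ** (1.0 / n) * np.linalg.inv(F)  # norm-inducing matrix
    diff = X - v
    return np.einsum('kd,de,ke->k', diff, A, diff)  # (x_k - v)^T A (x_k - v) per row
```

With $F$ the identity and $\rho = 1$ this reduces to the squared Euclidean distance used by FCM, i.e. the hyper-spherical special case.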

Gustafson-Kessel Algorithm

Algorithm 1 Gustafson-Kessel

$\tau \leftarrow 1$
repeat
    $v_i^{(\tau)} \leftarrow \frac{\sum_{k=1}^{N} (\mu_{ik}^{(\tau-1)})^m \, x_k}{\sum_{k=1}^{N} (\mu_{ik}^{(\tau-1)})^m}$
    $F_i \leftarrow \frac{\sum_{k=1}^{N} (\mu_{ik}^{(\tau-1)})^m (x_k - v_i^{(\tau)})(x_k - v_i^{(\tau)})^T}{\sum_{k=1}^{N} (\mu_{ik}^{(\tau-1)})^m}$
    $D^2_{ikA_i} = (x_k - v_i^{(\tau)})^T \left[\rho_i \det(F_i)^{1/n} F_i^{-1}\right] (x_k - v_i^{(\tau)})$
    $\varphi_k \leftarrow \{\, i \mid D_{ik} = 0 \,\}$
    for $k \leftarrow 1, N$ do
        if $\varphi_k = \emptyset$ then
            $\mu_{ik}^{(\tau)} \leftarrow \left( \sum_{j=1}^{K} \left( \frac{D_{ikA_i}}{D_{jkA_j}} \right)^{\frac{2}{m-1}} \right)^{-1}$
        else
            $\mu_{ik}^{(\tau)} \leftarrow \begin{cases} 0 & \text{if } D_{ikA_i} > 0 \\ \frac{1}{|\varphi_k|} & \text{if } D_{ikA_i} = 0 \end{cases}$
    $\tau \leftarrow \tau + 1$
until $\| U^{(\tau)} - U^{(\tau-1)} \| < \epsilon$
for $j \leftarrow 1, M$ do
    $\nu_j \leftarrow \sum \mu_{1jk}$
    $\gamma_j \leftarrow \begin{cases} 1 & \text{if } I_j \in \mathcal{C} \\ -1 & \text{if } I_j \notin \mathcal{C} \end{cases}$

Notation: $v_i$: centre of the $i$th cluster; $F_i$: covariance of the $i$th cluster; $x_k$: $k$th descriptor; $\mu_{ik}$: membership of $x_k$ to the $i$th cluster; $D_{ikA_i}$: inner-product norm; $\rho_i$: $\det(A_i)$; $m$: measure of fuzzification; $M$: no. of images; $\mathcal{C}$: category; $I_j$: $j$th image.
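One sweep of Algorithm 1 can be vectorized roughly as follows (a sketch under our naming assumptions; it omits the zero-distance special case handled by $\varphi_k$ above):

```python
import numpy as np

def gk_step(X, U, m=2.0, rho=None):
    """One Gustafson-Kessel update of centres, fuzzy covariances, and memberships.
    X: N x d descriptors; U: K x N membership matrix (columns sum to 1)."""
    K, N = U.shape
    d = X.shape[1]
    rho = np.ones(K) if rho is None else rho
    W = U ** m                                   # (mu_ik)^m
    V = (W @ X) / W.sum(axis=1, keepdims=True)   # new centres v_i
    D2 = np.empty((K, N))
    for i in range(K):
        diff = X - V[i]
        F = (W[i, :, None] * diff).T @ diff / W[i].sum()   # fuzzy covariance F_i
        A = rho[i] * np.linalg.det(F) ** (1.0 / d) * np.linalg.inv(F)
        D2[i] = np.einsum('kd,de,ke->k', diff, A, diff)    # squared GK distances
    # mu_ik = ( sum_j (D_ik / D_jk)^(2/(m-1)) )^-1, written with squared distances
    ratio = (D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1))
    return V, 1.0 / ratio.sum(axis=1)
```

Iterating `gk_step` until $\| U^{(\tau)} - U^{(\tau-1)} \| < \epsilon$ reproduces the repeat/until loop; the learnt $F_i$ is what lets each cluster stretch into a hyper-ellipsoid.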

Experiments

The comparative classification performance of the BoF, FCM, and GK algorithms is analysed across visual categories within a dataset; across a set of datasets; and for different codebook sizes. The feature descriptor used in all experiments is the popular local affine co-variant descriptor, SIFT. The classifier is an SVM with an RBF kernel. The datasets used are Caltech-101, Caltech-256, Pascal VOC 2006, Pascal VOC 2010, and Scene-15. These datasets vary in the number of categories, the number of images within each category, the visual domain of the categories, and the inherent difficulty of modelling a category.

Performance across categories

The graphs in the figures show the comparative mean classification accuracy of the BoF and GK approaches for each visual category in the datasets VOC2006, VOC2010, and Scene-15. The absolute and relative performance of both BoF and GK varies across the categories due to the variation in content and complexity of each category.

Performance across datasets

Comparison of the BoF, FCM, and GK approaches in terms of their classification performance on different datasets. The results in the figure show the mean accuracy averaged across all categories of each dataset.

Performance across codebook sizes

Analysis of the comparative mean classification accuracy of the BoF and GK approaches for different codebook sizes. The graph shows performance on the Caltech-101 dataset.

Summary

We have introduced a fuzzy encoding technique for image classification using Fuzzy C-Means (FCM) to compute a fuzzy membership function. We extended this work to the Gustafson-Kessel (GK) fuzzy clustering algorithm, which was shown to adapt to local distributions. We demonstrated empirically, on several popular datasets, that our fuzzy encoding approach is consistently better than the BoF model. The GK algorithm was shown to provide a marginal improvement over FCM, which we expect to improve with optimization of the covariance matrices in future work.

References

[1] J.C. van Gemert, C.J. Veenman, A.W.M. Smeulders, and J.-M. Geusebroek, “Visual word ambiguity,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 7, pp. 1271–1283, Jul. 2010.

[2] A. Baraldi and P. Blonda, “A survey of fuzzy clustering algorithms for pattern recognition. I,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 29, no. 6, pp. 778–785, Dec. 1999.

[3] D.E. Gustafson and W.C. Kessel, “Fuzzy clustering with a fuzzy covariance matrix,” in Proceedings of the 1978 IEEE Conference on Decision and Control including the 17th Symposium on Adaptive Processes, Jan. 1978, vol. 17, pp. 761–766.

Acknowledgement

This work is supported by the EPSRC project Making Sense (EP/H023135/1).

Centre for Vision, Speech, and Signal Processing - University of Surrey - Guildford, United Kingdom Mail: [email protected] WWW: http://www.ee.surrey.ac.uk/cvssp