Iterative Technique for Content-Based Image Retrieval using Multiple SVM Ensembles

Douglas Natan Meireles Cardoso, Dionei José Muller, Fellipe Alexandre,

Luiz Antônio Pereira Neves, Pedro Machado Guillen Trevisani.

Federal University of Paraná, Rua Dr. Alcides Vieira Arcoverde, 1225, Curitiba, PR, Brazil.

Gilson Antonio Giraldi, National Laboratory for Scientific Computing - LNCC

    Av. Getulio Vargas, 333, Quitandinha, Petrópolis, RJ, Brazil.

Abstract—This paper applies Support Vector Machine (SVM) ensembles, based on the “one-against-all” SVM multi-class approach, for Content-Based Image Retrieval (CBIR). Given a database previously divided into N classes, a first ensemble with N SVM machines is trained. Given a query image, this SVM ensemble is used to find the candidate classes for the query classification. Next, a new ensemble is constructed with the “one-against-all” strategy in order to improve the target search. The process stops when only one class is returned, which completes the query classification stage. This class is used in the final step for image similarity computation and retrieval. Before constructing the SVM ensembles, we pre-process the images with the Discrete Cosine Transform (DCT) for feature extraction. We present the accuracy of our approach for the Corel and Ground Truth image databases and show that its average accuracy outperforms a related work.

    I. INTRODUCTION

Nowadays, we observe a huge amount of images stored in electronic format, particularly in the case of biological and medical applications. Therefore, efficient content-based image retrieval (CBIR) techniques become a fundamental requirement for searching and retrieving images from a large digital image database [1].

CBIR is a class of techniques which uses visual contents (features) to search images from a database following a query image given by the user. In this process, visual contents such as shape descriptors and color features are extracted from each image of the database. The same is done for the user's request in the form of a query image. Then, some engine is used for feature comparison in order to get the target images; that is, the images of the database that are most similar to the query one. The whole pipeline for CBIR can be roughly divided into three modules [2]: (1) the feature extraction module; (2) the query module; (3) the retrieval module.

The first module includes techniques to convert an input image into a numerical array, normally called a feature vector. The idea of this step is to obtain a more compact representation of the image. Therefore, the feature space in general has a smaller dimension than the original image space. Feature spaces are usually composed of shape features, color, texture, histogram, edge features and image transform features [3], [4]. The latter encompasses linear operations, like the Fourier, Sine and Cosine transforms, as well as Wavelet approaches [5], [6], [7].

The query module takes the query image, performs its feature extraction and can provide resources to make modifications on the query image or even to integrate image keywords into the query [8]. Finally, the retrieval module computes some measure of similarity between the query and the database images. Then, the obtained quantities are sorted and the images with the highest similarities are returned as the targets.

One important point in this dataflow is the incorporation of prior information through some human interaction with the database. For example, in [9] the database is segmented into manually defined classes. In this case, the query module performs the query image classification; that is, it automatically labels the query image according to the class it belongs to. The retrieval module must then search for the targets only among the images that belong to the same class as the query.

In the case of the CBIR approach proposed in [10], two SVM ensembles based on the “one-against-all” SVM multi-class approach [11] are considered: one for dimensionality reduction and another one for classification, as parts of the feature and query modules, respectively. The feature module engine takes an input RGB image, resizes it to 128 × 128 resolution and performs a suitable color space transformation. Then, it applies the Daubechies wavelet transform and constructs a feature vector from the obtained low-pass image components. Next, the first SVM ensemble computes a reduced feature vector, with dimension equal to the number of pre-defined classes, which represents the input image in the further operations. Once a query image is presented, the second SVM ensemble performs its classification. Finally, the Euclidean distances from the query image to the images of its class are used as a similarity measure for image retrieval.

In this paper we also consider a multiple SVM ensemble for CBIR. In the feature extraction step we get a compact representation of the image by using the Discrete Cosine Transform (DCT) instead of the Daubechies wavelet applied in [10]. The DCT implementation is simpler than the Daubechies wavelet and we have obtained suitable results with it. Then, like in [10], we construct N SVM models, one for each class of the database.

Given the query image, this SVM ensemble is used to find the candidate classes for the query classification. Specifically, each SVM i returns a real number that is interpreted as the probability that the query belongs to the corresponding class Ci. So, we select only the classes whose probability is larger than the mean one. Next, a new SVM ensemble is constructed with the selected classes, using the same strategy as before, and applied to improve the target search. The process stops when only one class is returned, which completes the query classification stage. This class is used in the final step for image similarity computation and retrieval. Before constructing the SVM ensembles we pre-process the images with the Discrete Cosine Transform (DCT) for feature extraction. The method is “iterative” in the sense that each instance of the main loop takes the result of the previous one in order to refine the classification of the query. We present the accuracy of our approach for the Corel and Ground Truth image databases and show that its average accuracy outperforms the reference [10].

The paper is organized as follows. In Section II we review the basic elements of image processing and the SVM model. Section III discusses the details of our proposal for CBIR. The experimental results are presented in Section IV. Finally, we offer conclusions and perspectives for this work in Section V.

    II. BACKGROUND

In this presentation a true-color RGB image I is represented by a (generalized) matrix I ∈ ℝ^(M×N×3); that is, one M × N array of intensities for each of the three color channels.

Specifically, for training each SVM model i, we take all k images from class i and label them as 1. Then, using random sampling, we choose (2k)/(N − 1) images from the classes other than i and label them as −1 [10]. The obtained set of feature vectors xm and corresponding labels lm,

S = {(l1, x1), (l2, x2), · · ·}, xm ∈ ℝ^d, lm ∈ {−1, 1},

where d is the dimension of the DCT feature vector, is then used to train the SVM model i.

Given the query image z, each trained model i outputs a value yi, and we retain the classes whose output exceeds the mean ȳ. Let us suppose that there are L classes that satisfy this condition yi > ȳ. Then, we apply Algorithm 1, with N ← L and the image class set updated to Φ = {C1, C2, · · ·, CL}, to construct L new support vector machines. We then feed each new SVM model with the query image z and keep the classes such that yi > ȳ in order to get another subset of candidate classes. We repeat this process until only one class C satisfies the condition yi > ȳ. Algorithm 2 summarizes the whole process.
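For concreteness, the labeling rule above can be sketched in Java as follows. This is a minimal illustration, not the authors' code: the feature layout (plain double arrays of DCT coefficients) and the per-remaining-class reading of the (2k)/(N − 1) sampling (giving about 2k negatives in total) are assumptions.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    class LabeledVector {
        final int label;       // +1 for class i, -1 otherwise
        final double[] x;      // DCT-based feature vector
        LabeledVector(int label, double[] x) { this.label = label; this.x = x; }
    }

    class TrainingSetBuilder {
        // Builds S = {(l1, x1), (l2, x2), ...} for the one-against-all model i:
        // all k images of class i are labeled +1; (2k)/(N - 1) images randomly
        // sampled from each remaining class are labeled -1.
        static List<LabeledVector> build(List<List<double[]>> classes, int i) {
            int N = classes.size();
            int k = classes.get(i).size();
            int perClass = (2 * k) / (N - 1);
            List<LabeledVector> S = new ArrayList<>();
            for (double[] x : classes.get(i)) S.add(new LabeledVector(+1, x));
            for (int j = 0; j < N; j++) {
                if (j == i) continue;
                List<double[]> pool = new ArrayList<>(classes.get(j));
                Collections.shuffle(pool);   // random sampling without replacement
                for (double[] x : pool.subList(0, Math.min(perClass, pool.size())))
                    S.add(new LabeledVector(-1, x));
            }
            return S;
        }
    }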

Then, as in [10], the Euclidean distances between the query image and all the images that belong to the same class are calculated and sorted. Images with the lowest Euclidean distances are considered similar images and are returned by the system. This completes the image retrieval step of our method.
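A minimal sketch of this ranking step, assuming the feature vectors are plain double arrays (the method names are illustrative):

    import java.util.Comparator;
    import java.util.List;

    class Retrieval {
        static double euclidean(double[] a, double[] b) {
            double s = 0;
            for (int d = 0; d < a.length; d++) { double diff = a[d] - b[d]; s += diff * diff; }
            return Math.sqrt(s);
        }

        // Sorts (in place) the images of the predicted class C by distance to
        // the query and returns the 'top' closest ones (smallest = most similar).
        static List<double[]> mostSimilar(double[] query, List<double[]> classImages, int top) {
            classImages.sort(Comparator.comparingDouble(img -> euclidean(query, img)));
            return classImages.subList(0, Math.min(top, classImages.size()));
        }
    }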

Algorithm 2 Query image classification
Input: Image class set Φ = {C1, C2, · · ·, CN}; SVM parameters; query image z.
Set L ← N.
while L > 1 do
    Apply Algorithm 1 to generate the SVM models SVM1, SVM2, · · ·, SVML.
    Compute y1, y2, · · ·, yL in expression (6).
    Calculate ȳ using equation (7).
    Select the classes C1, C2, · · ·, CK such that yCj > ȳ.
    Update the image class set: Φ ← {C1, C2, · · ·, CK}.
    Set L ← K.
end while
Output: Class C of the query image.
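In Java (the language of the authors' implementation, see Section IV), the main loop of Algorithm 2 could look like the sketch below. The Svm and SvmTrainer types are hypothetical stand-ins for an SVM library and for Algorithm 1; expressions (6) and (7) are not reproduced in this transcript, so score() is simply read as the class-membership value yi and the arithmetic mean as ȳ. The empty-survivor guard is our addition, not part of the paper.

    import java.util.ArrayList;
    import java.util.List;

    interface Svm {
        double score(double[] x);                        // y_i of expression (6)
    }

    interface SvmTrainer {                               // stands in for Algorithm 1
        Svm trainOneAgainstAll(List<Integer> classIds, int i);
    }

    class QueryClassifier {
        // Algorithm 2: shrink the candidate class set until one class remains.
        static int classify(double[] query, List<Integer> classes, SvmTrainer trainer) {
            while (classes.size() > 1) {
                double[] y = new double[classes.size()];
                double mean = 0;                         // y-bar of equation (7)
                for (int i = 0; i < classes.size(); i++) {
                    Svm model = trainer.trainOneAgainstAll(classes, i);
                    y[i] = model.score(query);
                    mean += y[i];
                }
                mean /= classes.size();
                List<Integer> survivors = new ArrayList<>();
                for (int i = 0; i < classes.size(); i++)
                    if (y[i] > mean) survivors.add(classes.get(i));
                if (survivors.isEmpty())                 // all scores equal: pick the best
                    return classes.get(argmax(y));
                classes = survivors;
            }
            return classes.get(0);
        }

        static int argmax(double[] y) {
            int best = 0;
            for (int i = 1; i < y.length; i++) if (y[i] > y[best]) best = i;
            return best;
        }
    }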

    IV. EXPERIMENTAL RESULTS

In this section we demonstrate the potential of our proposal by using the Corel [16] and Ground Truth [17] RGB image databases. The former is composed of 1000 images in 10 different categories, while the latter is composed of 1109 images divided into 22 categories. We chose these databases in order to produce a straightforward comparison between our results and the ones presented in [10]. Figures 1, 2 and 3 show samples of the Corel database [16] and Figures 4 and 5 show samples of the Ground Truth database [17].

Since our methodology is a supervised one, we need some human interaction to perform a pre-classification of the images and segment the database into classes. This step has the advantage of incorporating prior information into the system, since humans are experts in visual pattern recognition. Accordingly, our image database classification follows [10]; in other words, we select all images from [16] and divide them into 10 classes, and we select 228 images from [17] and divide them into 5 classes, named according to Tables I and II.

The feature extraction step takes an input RGB image and resizes it to 128 × 128 resolution in order to normalize the input data. Then, we compute the DCT of each database image and perform a zonal mask operation, given by expression (1). So, we need to set the parameter R for the zonal mask. The idea is to choose the R value that preserves the highest DCT coefficients in each channel. We experimentally found that R = 30 is a suitable value for the Corel and Ground Truth databases. Since the resized images have 128 × 128 pixels, this value represents about 23% of the resized image resolution. The sensitivity of the approach with respect to R is discussed later.
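Expression (1) is not reproduced in this transcript; the sketch below assumes the usual zonal mask that keeps the top-left R × R block of low-frequency DCT coefficients of each channel, which is consistent with R = 30 covering about 23% of the 128-pixel side. A production system would use a fast DCT rather than this naive O(R²MN) loop.

    class DctFeatures {
        // Orthonormal 2D DCT-II of one image channel, computed only for the
        // R x R low-frequency corner (the zonal mask); the blocks of the three
        // RGB channels are then flattened into the feature vector.
        static double[][] dctZonal(double[][] f, int R) {
            int M = f.length, N = f[0].length;
            double[][] out = new double[R][R];
            for (int u = 0; u < R; u++) {
                for (int v = 0; v < R; v++) {
                    double sum = 0;
                    for (int x = 0; x < M; x++)
                        for (int y = 0; y < N; y++)
                            sum += f[x][y]
                                 * Math.cos(Math.PI * (2 * x + 1) * u / (2.0 * M))
                                 * Math.cos(Math.PI * (2 * y + 1) * v / (2.0 * N));
                    double cu = (u == 0) ? Math.sqrt(1.0 / M) : Math.sqrt(2.0 / M);
                    double cv = (v == 0) ? Math.sqrt(1.0 / N) : Math.sqrt(2.0 / N);
                    out[u][v] = cu * cv * sum;
                }
            }
            return out;
        }
    }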

So, let us start with the Corel database. Following Algorithm 2, we first construct 10 SVM models by calling Algorithm 1. In the actual implementation each SVM is a linear machine defined by expression (2). The remaining operations of the main loop in Algorithm 2 are simple and do not depend on any other parameter.
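Expression (2) is also not reproduced in this transcript; for a linear machine it is typically the decision value f(x) = ⟨w, x⟩ + b. A minimal sketch, implementing the hypothetical Svm interface from the previous sketch, with the (possibly calibrated, cf. [15]) magnitude read as the output yi:

    class LinearSvm implements Svm {
        final double[] w;       // weight vector learned during training
        final double b;         // bias term
        LinearSvm(double[] w, double b) { this.w = w; this.b = b; }

        // Decision value of a linear SVM: a positive sign means "class i", and
        // the value is used as the confidence y_i fed to Algorithm 2.
        public double score(double[] x) {
            double dot = 0;
            for (int d = 0; d < w.length; d++) dot += w[d] * x[d];
            return dot + b;
        }
    }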

Following [10], we first randomly divide our database into two parts: the first one, with 90% of the images, to be used for training, and the second one, composed of 10% of the images, for query tests. Table I shows the accuracy results for the Corel database.

If compared with the analogous results presented in [10], reproduced in the third column of Table I, we observe that our method performs worse only for the Buses class (40% against 80%). The average accuracy of our method is 68% against 62% of the reference [10], which also points to the superiority of our method in this respect.

For the Ground Truth database we use the same methodology as before; the results, presented in Table II, show that the accuracy of the proposed method always outperforms the reference [10].

TABLE I. CLASSIFICATION ACCURACY FOR THE COREL DATABASE.

Categories                      Accuracy of our proposal (%)   Accuracy of [10] (%)
African People and Villages    60                             50
Horses                         100                            80
Food                           20                             20
Buildings                      30                             20
Dinosaurs                      100                            90
Elephants                      80                             60
Flowers                        100                            100
Mountains                      80                             50
Buses                          40                             80
Beach                          70                             70
Average                        68                             62

TABLE II. CLASSIFICATION ACCURACY FOR THE GROUND TRUTH DATABASE.

Categories        Accuracy of our proposal (%)   Accuracy of [10] (%)
Arborgreens       80                             66.67
Cherries          80                             50
Football          100                            75
Greenlake         80                             50
Swissmountains    66.67                          50
Average           81.33                          59.09

Now, let us consider the image retrieval results. We must remember that the image retrieval step searches for the targets only among the images that belong to the same class as the query. Figures 1.(a), 2.(a), 3.(a), 4.(a) and 5.(a) are used as queries to test the efficiency of this stage. We only show four matches, since they are enough to report and discuss our results. Figure 1.(a) shows horses against a background composed of nature elements as the query, taken from the Corel database. Figures 1.(b)-(e) show the four images that the system returns as the most similar to the query. They are sorted in decreasing order of similarity (1.(b) has the best similarity, 1.(c) the second best, etc.). A visual inspection shows that the results are very similar to the query image.

The second example, taken from the Corel database, pictures elephants in their natural habitat. As before, Figure 2.(a) shows the query and the other ones (Figures 2.(b)-(e)) picture the four most similar images retrieved. In this case we also observe a suitable result, despite changes in the color pattern and in the number of elephants in the scene. The quality of the result for the flower query in the third example is not so evident. All four most similar images retrieved are flowers, but the color patterns and textures need a deeper analysis to quantify the quality of the result.

The next example shows a stadium with a football game, pictured in Figure 4.

Fig. 1. Images from the Corel database: (a) Query image. (b)-(e) Four most similar images returned.

Fig. 2. Images from the Corel database: (a) Query image. (b)-(e) Four most similar images returned.

Fig. 3. Images from the Corel database: (a) Query image. (b)-(e) Four images with the highest similarities.

A visual inspection shows that Figures 4.(b) and 4.(e) are the most similar ones. However, the system ranks Figures 4.(c)-(d) with better similarity than Figure 4.(e). This problem may happen because we consider only low-level image content to compute the similarity. Probably, this result may be improved by incorporating high-level semantic features in the image retrieval stage [18]. Finally, in Figure 5.(a), we take a landscape with some ducks in a lake as a query, from the Ground Truth database. In this case, the first three retrieved images are very suitable. The last image has the basic elements of the query except the ducks. Again, a qualitative evaluation of this result depends on semantic considerations that are beyond the scope of this work at its current stage.

The algorithm has been implemented in JAVA and executed on an Intel Core i7 computer with a CPU clock of 1.90 GHz running the Windows operating system.

When considering the CPU time of the algorithm execution we must be careful about the following aspects. In the application of SVM machines we must distinguish two phases: training and classification. The training is normally performed off-line in order to construct the machine.

However, Algorithm 2 executes the training of L SVMs in each iteration. Therefore, the computational time of a query execution includes the construction of the SVM models, performed in Algorithm 1, which greatly increases the computational cost. For instance, the CPU time for the query classification and retrieval of similar images falls in the range [20 sec, 61 sec] for the results of Table I and in the range [8 sec, 13 sec] for those of Table II.


Fig. 4. Images from the Ground Truth database: (a) Query image. (b)-(e) Four most similar images returned.

Fig. 5. Images from the Ground Truth database: (a) Query image. (b)-(e) Most similar images that share the basic elements of the query.

Obviously, we can decrease the computational time for query classification if we train all the possible SVM models in advance for a given N. That means we must construct 2^N machines to get all the necessary elements for the query module.

For instance, for N = 10 we need to construct 1024 SVM machines. In the case of the Corel database our implementation needs around 10 sec to train each SVM machine used to generate the results of Table I. Therefore, we have a total CPU time of the order of 1024 × 10 sec = 10240 sec ≈ 2.84 hours to train all the necessary machines. We shall remember that this SVM training would be performed only once, in order to construct the pool of machines that is called during the query phase. Besides, the training of each machine is independent of the others. Therefore, this time can be easily decreased by using high performance resources. However, it is clear that our methodology is not suitable for large values of N due to the exponential increase of the computational time.
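The pre-training could be organized by indexing each candidate class subset with a bitmask, as sketched below using the hypothetical Svm and SvmTrainer types from the earlier sketch. Note that storing one machine per (subset, member class) pair trains somewhat more machines than the 2^N figure above; the sketch only illustrates the enumeration and its trivially parallel structure.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class OfflinePool {
        // One one-against-all ensemble per non-empty subset of the N classes,
        // keyed by bitmask, so Algorithm 2 becomes a sequence of lookups.
        static Map<Integer, List<Svm>> pretrain(int N, SvmTrainer trainer) {
            Map<Integer, List<Svm>> pool = new HashMap<>();
            for (int mask = 1; mask < (1 << N); mask++) {
                List<Integer> subset = new ArrayList<>();
                for (int c = 0; c < N; c++)
                    if ((mask & (1 << c)) != 0) subset.add(c);
                List<Svm> ensemble = new ArrayList<>();
                for (int i = 0; i < subset.size(); i++)      // each training run is
                    ensemble.add(trainer.trainOneAgainstAll(subset, i)); // independent
                pool.put(mask, ensemble);
            }
            return pool;
        }
    }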

Next, we must discuss the sensitivity of the methodology's accuracy when changing the R value. Table III presents the average accuracy for 4 values of this parameter. At first, we observe that the behavior of the accuracy is not the same for the two databases. In fact, the results indicate that the Corel database accuracy is stable for R ∈ [30, 64]. For R = 16 we observe a considerable decrease in the accuracy, as expected. However, for the Ground Truth database the behavior seems inconsistent, due to the fact that the accuracy decreases when the mask is enlarged from R = 40 to R = 64. We must be careful in this case, because when increasing the number of DCT components we are also increasing the redundancy in the image representation, as well as becoming more subject to overfitting problems [19]. We need a deeper analysis to confirm this fact. On the other hand, the accuracy remains unchanged for R = 40 and R = 30, and it decreases for R = 16, following the same behavior as the other database for these values.

TABLE III. CLASSIFICATION ACCURACY VARYING R.

R    Corel database    Ground Truth database
     Accuracy (%)      Accuracy (%)
64   68                74
40   68                81
30   68                81
16   49                70

    V. CONCLUSION AND PERSPECTIVES

This paper proposes an iterative method for CBIR. The method receives a database previously divided into N classes and applies the DCT for feature extraction. Then it constructs N SVM machines and performs a selection of the candidate classes for the query classification. In the next steps (iterations of the main loop in Algorithm 2) the target search is refined until the query image class is returned. The results show that it is a promising method. The obtained accuracy rates in general outperform the ones reported in [10].

For further work we plan to develop a high performance model capable of producing the same results presented in this paper with more efficiency. Furthermore, we would like to compare the accuracy of our approach using different methods for feature extraction, like Wavelet, Fourier, SIFT and others, since the proposed methodology is not limited to a particular extraction method. Also, we should investigate the accuracy when a distorted but perceptually similar image is presented to the algorithm.

    ACKNOWLEDGMENT

The authors would like to thank the PCI-LNCC for their financial support.

REFERENCES

[1] R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image retrieval: Ideas, influences, and trends of the new age,” ACM Comput. Surv., vol. 40, no. 2, May 2008.

[2] S. R. Surya and G. Sasikala, “Survey on content based image retrieval,” Indian Journal of Computer Science and Engineering, vol. 2, no. 5, Oct.-Nov. 2011.

[3] A. K. Jain, Fundamentals of Digital Image Processing. Prentice-Hall, Inc., 1989.

[4] G. Gagaudakis and P. L. Rosin, “Incorporating shape into histograms for CBIR,” Pattern Recognition, vol. 35, pp. 81–91, 2002.

[5] N. Roma and L. Sousa, “A tutorial overview on the properties of the discrete cosine transform for encoded image and video processing,” Signal Processing, vol. 91, no. 11, pp. 2443–2464, 2011.

[6] S. Mallat, A Wavelet Tour of Signal Processing, Third Edition: The Sparse Way, 3rd ed. Academic Press, 2008.

[7] T. Li, Q. Li, S. Zhu, and M. Ogihara, “A survey on wavelet applications in data mining,” SIGKDD Explor. Newsl., vol. 4, no. 2, pp. 49–68, Dec. 2002.

[8] J. R. Bach, C. Fuller, A. Gupta, A. Hampapur, B. Horowitz, R. Humphrey, R. C. Jain, and C. F. Shu, “Virage image search engine: an open framework for image management,” I. K. Sethi and R. C. Jain, Eds., vol. 2670, no. 1. SPIE, 1996, pp. 76–87.

[9] J. Z. Wang, J. Li, and G. Wiederhold, “SIMPLIcity: Semantics-sensitive integrated matching for picture libraries,” in Proceedings of the 4th International Conference on Advances in Visual Information Systems, ser. VISUAL '00. London, UK: Springer-Verlag, 2000, pp. 360–371.

[10] E. Yildizer, A. M. Balci, M. Hassan, and R. Alhajj, “Efficient content-based image retrieval using multiple support vector machines ensemble,” Expert Syst. Appl., vol. 39, pp. 2385–2396, Feb. 2012.

[11] A. Gidudu, G. Hulley, and T. Marwala, “Image classification using SVMs: one-against-one vs one-against-all,” CoRR, vol. abs/0711.2914, 2007.

[12] V. N. Vapnik, Statistical Learning Theory. John Wiley & Sons, Inc., 1998.

[13] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.

[14] K. Fukunaga, Introduction to Statistical Pattern Recognition. New York: Academic Press, 1990.

[15] B. Zadrozny and C. Elkan, “Transforming classifier scores into accurate multiclass probability estimates,” in Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002). ACM Press, 2002, pp. 694–699.

[16] Penn State University, web page for the Modeling Objects, Concepts, and Aesthetics in Images project, http://wang.ist.psu.edu/docs/related, accessed 31.03.2013.

[17] University of Washington, web page for the Object and Concept Recognition for Content-Based Image Retrieval project, http://www.cs.washington.edu/research/imagedatabase, accessed 31.03.2013.

[18] Y. Liu, D. Zhang, G. Lu, and W.-Y. Ma, “A survey of content-based image retrieval with high-level semantics,” Pattern Recogn., vol. 40, no. 1, pp. 262–282, Jan. 2007.

[19] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA, USA: MIT Press, 2001.

