Proceedings of International Joint Conference on Neural Networks, Montreal, Canada, July 31 - August 4, 2005
Using Knowledge of the Region of Interest (ROI) in Automatic Image Retrieval Learning
Paisarn Muneesawang
College of Information Technology
United Arab Emirates University
Al-Ain, [email protected]

Ling Guan
Ryerson Multimedia Research Laboratory,
Ryerson University, Toronto, ON, Canada, M5B 2K3
lguan@ee.ryerson.ca

Abstract-In this paper, we propose an automatic relevance feedback retrieval system using perceptually important features extracted from regions of interest. The system is implemented via self-learning using a self-organizing tree map (SOTM) neural network. Our proposed method involves the construction of regions of interest from retrieved images using the Edge Flow model, and the grouping of the regions into a single perceptually significant entity. This knowledge is fed into a set of unsupervised relevance feedback learning modules based on the SOTM to guide the adaptation of relevance feedback parameters through a machine learning approach without user interaction. The optimal tradeoff between the user workload in the interactive process and user subjectivity is then explored by incorporating a semi-automatic retrieval strategy. Experimental results indicate that this system, with automatic and semi-automatic adaptations, can minimize user interaction, optimize precision, and reduce performance errors caused by user subjectivity.

I. INTRODUCTION

In image and multimedia retrieval, relevance feedback parameters are usually derived from positive and negative samples provided by the users [1][2]. Although it has been demonstrated to be effective for small digital libraries, this user interaction process is recognized as time-consuming, since it places a heavy workload on users, who must provide a series of examples over several relevance feedback cycles, and as potentially error-prone when dealing with commercial digital libraries, which are normally very large.

He and King [4] proposed short-term and long-term frameworks to memorize user feedback to improve retrieval performance with knowledge obtained from previous queries. This reduces subjectivity and noise from individual users. Wu and Huang [3] employed both labeled and unlabeled samples to increase relevance feedback performance within the transductive learning framework. Although it exploits unlabeled samples, this work is still based on user supervision.

To solve this classical problem in content-based image retrieval using relevance feedback, an automatic relevance feedback image retrieval system based on a self-organizing tree map (SOTM) and a nonlinear radial-basis function (RBF) model was introduced, which offered high retrieval accuracy when applied to the popular Brodatz database using texture descriptors [5]. The idea was further extended to deal with a larger set of images selected from the Corel image collection using color, texture and shape features [6] and to search distributed image libraries over the Internet [7]. The general principle of automatic relevance feedback is to apply unsupervised learning techniques to relevance classification in order to minimize the number of user feedback cycles required in modeling users' queries. However, the importance of perceptually inspired features in the relevance classification process has not been properly studied.
It is well acknowledged that to obtain the maximum precision rate in image retrieval, it is critical that the learning process effectively exploit the knowledge of image relevancy. This requires modeling image contents with sufficiently accurate features for the characterization of perceptual importance.

This issue is especially pressing with automatic relevance feedback since, without some form of knowledge provided to the relevance classification process from the external world, the SOTM classifier cannot operate as efficiently as a user-supervised process. For example, global features of shape, color, or texture might receive an undue proportion of the weight in the machine's judgment of image relevancy. Furthermore, these global features do not always address perceptually important regions or any salient objects depicted in an image, because an image contains more regions than those which are of perceptual importance. So, higher classification accuracy may be possible with the acquisition of more precise perceptual information. However, the form of knowledge needed in automatic relevance feedback has to be identified before the retrieval process begins, rather than during the process as in user-interactive relevance feedback.

0-7803-9048-2/05/$20.00 ©2005 IEEE. Authorized licensed use limited to: NARESUAN UNIVERSITY. Downloaded on January 4, 2010 at 23:59 from IEEE Xplore. Restrictions apply.
In this paper we propose an automatic adaptive image retrieval scheme with embedded knowledge of perceptual importance, the form of which is identified in advance. Restricting the domain to photograph collections, we pursue the limited goal of identifying the region of interest (ROI). The ROI is based on the assumption that the significant objects within an image are often located at the center, as a photographer usually tries to place significant objects at the focus of the camera's view. We adopt the Edge Flow model [8] to identify the ROI within a photograph. The ROI does not require the exact identification of a possible object in the image, but only a selected region that adequately reflects those properties of the object, such as color or shape, that are usually used as features for matching in retrieval.

The proposed retrieval system incorporates the knowledge of the ROI into the self-learning relevance feedback based on the SOTM for automatic image retrieval. The motivation of automatic relevance feedback is not to replace user interaction, but to enhance the user-centered system by minimizing user workload with a minimum number of feedback cycles. Furthermore, to balance the difficulty of characterizing user subjectivity against user workload, the system is also extended to implement a semi-automatic learning strategy in which user supervision is combined with automatic adaptation in such a way that the system learns to acquire different user subjectivities in order to achieve optimum performance.

The rest of the paper is organized as follows. The characterization of the ROI is described in Section 2. In Section 3 we present the implementation of a self-adaptive retrieval system using the SOTM and the RBF model. Section 4 shows the experimental results obtained by applying the retrieval methods to the Corel photograph database. Some conclusions are drawn in Section 5.
II. CHARACTERIZATION OF THE ROI

Image segmentation is considered a crucial step in performing high-level computer vision tasks such as object recognition and scene interpretation [9]. Since natural scenes within an image can be too complex to be characterized by a single image attribute, it is more appropriate to consider a segmentation method that is able to address the representation and integration of different attributes such as color, texture, and shape. We propose adopting the Edge Flow model demonstrated in [8], which proved to be effective in image boundary detection and in application to video coding [10]. The Edge Flow model implements a predictive coding scheme to identify the direction of change in color, texture, and filtered phase discontinuities.
A. Region characterization by "Edge Flow"
An edge flow vector at pixel location $s$ is a vector sum of edge energies given by:

$$F(s) = \sum_{\Theta(s) \le \theta < \Theta(s) + \pi} E(s, \theta)\, \exp(j\theta)$$
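The vector sum can be illustrated with a small numerical sketch. It assumes, following the Edge Flow formulation in [8], that the sum runs over the orientations in the half-circle starting at the dominant flow direction $\Theta(s)$. The function name `edge_flow_vector`, the four sampled orientations, and the energy values are hypothetical choices made for illustration, not values from the paper.

```python
import cmath
import math


def edge_flow_vector(energies, thetas, dominant_theta):
    """Sum E(s, theta) * exp(j*theta) over theta in [Theta(s), Theta(s) + pi)."""
    two_pi = 2.0 * math.pi
    flow = 0j
    for energy, theta in zip(energies, thetas):
        # Angular offset from the dominant direction, wrapped into [0, 2*pi).
        offset = (theta - dominant_theta) % two_pi
        if offset < math.pi:
            flow += energy * cmath.exp(1j * theta)
    return flow


# Hypothetical edge energies at four orientations for a single pixel s.
thetas = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
energies = [1.0, 1.0, 0.2, 0.2]
flow = edge_flow_vector(energies, thetas, dominant_theta=0.0)

# Magnitude is the resulting edge energy; phase is the flow direction.
print(abs(flow), cmath.phase(flow))
```

With these values only the orientations 0 and π/2 fall inside the half-circle, so the resulting vector points along π/4 with magnitude close to √2.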
knowledge, we can effectively obtain the ROI by associating it with the objects located at the center of photographs. Let $S = \{R_i,\ i = 1, 2, \ldots, N \mid R_a \cap R_b = \emptyset,\ a \neq b\}$ be a set of regions generated by the Edge Flow model from one image. Also, let $W$ be the set of labels of regions that lie either partly or completely inside a predefined m-by-n pixel window whose center coincides with the center of the image. We define the ROI as the collection of regions that are members of $W$:

$$S' = \bigcup_{i \in W} R_i \qquad (5)$$

In practice, however, it is important to remove from $S'$ the background and other regions that lie mainly outside the m-by-n window. A region $R_i$, $i \in W$, is not included in $S'$ if its area outside the window is greater than its area inside the window. Thus, a complete region of interest is defined as:

$$\mathrm{ROI} := S' \cap \mathrm{comp}(R_{i'}) \qquad (6)$$

where comp is the set complement operation. In this definition, each $R_{i'}$ is defined so that:

$R_{i'} = R_i$ if M

$X_i \in \Re^{p_1}$ be the set of samples retrieved at the first round of retrieval, subject to:

D(z) < D(z) (8)

where D(z) denotes the distance between sample $X_i$ and the query $Z$. Here, the query $Z$ and the samples $X_i$ are each represented by a point in the feature space $\Re^{p_1}$ and consist of $p_1$ features, including color, shape, and texture (to be discussed in Section 4). The authors explained in [5] that, for the unsupervised learning process in automatic relevance feedback to be effective, a different and more powerful feature space than $\Re^{p_1}$ should be introduced in relevance classification. We call this feature space $\Re^{p_2}$. Features in $\Re^{p_2}$ should be of higher quality than those in $\Re^{p_1}$ for the relevance identification process to be effective. Because only a limited number of images are analyzed (those identified by the retrieval process as "similar" to the query), we can afford to employ computationally more intensive but also more effective image processing and analysis methods to obtain features that are perceptually important in relevance classification. Features extracted from the ROIs satisfy this requirement, as they provide the embedded knowledge of the ROI to assist relevance classification. Therefore, we extract a different set of features, $\{X_i\}$, $X_i \in \Re^{p_2}$, to characterize the retrieved images. We again choose color and shape as the features, but calculate them only from the ROIs in the retrieved images after applying ROI identification. The retrieval procedure is summarized as follows:
• At the first step, let $W_j$, $j = 1, 2, \ldots, J$, be the node vectors of the SOTM. While $W_j(t+1) \neq W_j(t)$, $\forall j$, randomly select $X_i$ and compute:
    $j^* = \arg\min_j \|X_i - W_j\|$
  If $\|X_i - W_{j^*}(t)\| < H(t)$, compute:
    $W_{j^*}(t+1) = W_{j^*}(t) + \alpha(t)\,(X_i - W_{j^*}(t))$
  else, set $W_{J+1} = X_i$, where $H(t)$ is the hierarchy control function and $\alpha(t)$ is the learning rate. $H(t)$ and $\alpha(t)$ decrease with iteration $t$.
• At the second step, from $\{W_j\}_{j=1}^{J}$ find:
    $n = \{j_1^*, j_2^*\}$ s.t. $\|Z - W_{j_1^*}\| \le \|Z - W_{j_2^*}\|$
  where $j^* = \arg\min_j \|Z - W_j\|$, and $Z$ is the query position.
• At the third step, for l
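The first two steps of the procedure can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the convergence test is replaced by a fixed number of passes over the data, $H(t)$ and $\alpha(t)$ are decayed geometrically once per pass, and the first node is initialized at the first sample. The names `train_sotm`, `nearest`, `h0`, `alpha0`, and `decay`, as well as the toy two-cluster data, are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(nodes, x):
    """Index j* of the node vector closest to x."""
    return min(range(len(nodes)), key=lambda j: dist(x, nodes[j]))


def train_sotm(samples, h0, alpha0=0.5, decay=0.9, epochs=20, seed=0):
    rng = random.Random(seed)
    nodes = [list(samples[0])]  # assumption: first node starts at the first sample
    h, alpha = h0, alpha0
    for _ in range(epochs):  # assumption: fixed passes instead of a convergence test
        for x in rng.sample(samples, len(samples)):  # random presentation order
            j = nearest(nodes, x)
            if dist(x, nodes[j]) < h:
                # Move the winning node toward the sample.
                nodes[j] = [w + alpha * (xi - w) for w, xi in zip(nodes[j], x)]
            else:
                # Sample lies outside the hierarchy threshold: create a new node.
                nodes.append(list(x))
        h *= decay      # H(t) decreases with iteration
        alpha *= decay  # alpha(t) decreases with iteration
    return nodes


# Toy feature vectors forming two well-separated groups.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
nodes = train_sotm(samples, h0=3.0)

# Second step: the node closest to the query position Z.
query = (0.2, 0.1)
winner = nearest(nodes, query)
print(len(nodes), nodes[winner])
```

Because $H(t)$ shrinks over time, later passes may add further nodes inside each group; the node nearest to the query still belongs to the group around the origin.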
using color histogram and Fourier descriptor. Here, we also pay attention to the application of a weighting scheme to the color and shape descriptors that are input to the SOTM, in order to embed the user's perception in the SOTM relevance identification process.

We compare the performance of four methods: non-adaptive CBIR, user-controlled RF, automatic RF, and semi-automatic RF, using 20 queries from different categories. The non-adaptive CBIR method employs normalized Euclidean distance as the matching criterion. This method provides a retrieved image set to the user-controlled RF algorithm, which further enhances system performance through the nonlinear RBF model together with a user interaction interface. In comparison, in the automatic RF case, relevance identification was executed by the SOTM with two cycles of adaptation. In addition, after automatic adaptation, the system performance was refined by a user to obtain the semi-automatic RF results.
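The non-adaptive baseline can be sketched as a nearest-neighbor ranking. The paper does not spell out the normalization, so this sketch assumes that each feature dimension is scaled by its standard deviation over the database; the function names and the toy two-dimensional features are hypothetical.

```python
import math


def feature_std(database):
    """Per-dimension standard deviation over the database (population form)."""
    n = len(database)
    dims = len(database[0])
    means = [sum(v[k] for v in database) / n for k in range(dims)]
    return [
        # Fall back to 1.0 for a constant dimension to avoid division by zero.
        math.sqrt(sum((v[k] - means[k]) ** 2 for v in database) / n) or 1.0
        for k in range(dims)
    ]


def normalized_euclidean(x, z, std):
    """Euclidean distance after scaling each dimension by its deviation."""
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x, z, std)))


def retrieve(database, query, top_k=16):
    """Indices of the top_k database items closest to the query."""
    std = feature_std(database)
    order = sorted(range(len(database)),
                   key=lambda i: normalized_euclidean(database[i], query, std))
    return order[:top_k]


# Toy features with very different scales per dimension.
database = [(0.0, 100.0), (1.0, 300.0), (0.1, 120.0), (0.9, 280.0)]
query = (0.0, 110.0)
print(retrieve(database, query, top_k=2))  # [0, 2]
```

Scaling by the per-dimension deviation keeps the large-valued second feature from dominating the ranking.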
Table I presents the results obtained by the four methods, measured by the average precision on the top 16 best matches. Evidently, automatic RF provides considerable improvement over the non-adaptive CBIR method (by more than 25% in the precision measure) without user interaction. The automatic result is about 4% lower than that of the user-controlled RF method. By combining automatic learning with user interaction, semi-automatic RF clearly outperforms the other methods discussed.

We also experimented with allowing the user interaction process to continue until convergence. The user-controlled RF and the semi-automatic RF converged at similar points, at about 93% in the precision measure. However, to reach this optimum point, the user-controlled RF method used on average 2.4 cycles of user interaction, while the semi-automatic RF method used 1.6 cycles. This shows that the semi-automatic method is the most effective learning strategy in terms of both retrieval accuracy and minimizing user interaction. It also demonstrates that the application of self-organizing RF in combination with perceptually significant features extracted from the ROIs clearly enhanced the overall system performance.

One retrieval example is given in Figure 3. Figure 3(a) shows the 16 best-matched images obtained by the non-adaptive CBIR method, with the query image displayed at the top-left corner. In this result, nine images are relevant to the query "plane". By comparison, Figure 3(b) shows the results after applying self-organizing relevance feedback to capture the notion of image similarity with the ROI features; the retrieval results are considerably improved by the proposed method.
Figure 3: Examples of retrieval results on the query "plane", obtained by (a) non-adaptive CBIR method, and (b) automatic relevance feedback method. The precisions measured from (a) and (b) are 0.56 and 0.94, respectively.
TABLE I
AVERAGE PRECISION RATE (%) AND NUMBER OF USER FEEDBACK CYCLES, OBTAINED BY RETRIEVING 20 QUERIES FROM COREL DATABASE, MEASURED FROM THE TOP 16 BEST MATCHES. PRECISION IS DEFINED AS np / 16, WHERE np IS THE NUMBER OF RELEVANT IMAGES RETRIEVED.

Method               Average Precision (%)   # User Feedback
Non-adaptive CBIR    52.81                   -
Automatic RF         78.13                   -
User-controlled RF   82.19                   1
Semi-automatic RF    87.50                   1
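The precision measure defined in the caption of Table I, np / 16, can be checked with a short sketch. The relevance flags below are hypothetical orderings chosen only so that the counts match the Figure 3 example (nine and fifteen relevant images among the top 16); the function name `precision_at_k` is likewise illustrative.

```python
def precision_at_k(relevance_flags, k=16):
    """Precision over the top k matches: relevant images retrieved divided by k."""
    top = relevance_flags[:k]
    return sum(1 for flag in top if flag) / k


# Hypothetical relevance judgments for the 16 best matches of one query.
non_adaptive = [True] * 9 + [False] * 7    # nine relevant images
automatic_rf = [True] * 15 + [False] * 1   # fifteen relevant images

print(precision_at_k(non_adaptive))   # 0.5625
print(precision_at_k(automatic_rf))   # 0.9375
```

Rounded to two decimals these values give the 0.56 and 0.94 reported for Figure 3(a) and 3(b).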
V. CONCLUSIONS
In the context of a relevance feedback retrieval framework, the user workload and processing time introduced by user interaction are relatively high owing to the repeated querying process. With the proposed automatic relevance feedback strategy, which uses perceptually important features extracted from the regions of interest, user interaction can be eliminated from relevance feedback image retrieval. Simulation results demonstrate that the unsupervised relevance feedback learning strategy based on the self-organizing tree map effectively utilized the perceptually important features extracted from the ROIs in relevance classification and substantially improved retrieval accuracy without user interaction. In addition, it is possible to find the optimal tradeoff between the automatic process and the number of feedback cycles required of the user in order to achieve very high retrieval accuracy.
ACKNOWLEDGMENT
This work is partially supported by the Canada Research Chair Program and by the Canada Foundation for Innovation. The first author would like to thank Naresuan University for generous support.
REFERENCES
[1] M. Naphades, R. R. Wang, T. S. Huang, "Audio-visual query and retrieval: system that uses dynamic programming and relevance feedback," Journal of Electronic Imaging, pp. 861-870, Oct. 2001.
[2] Z. Stejic, Y. Takama, K. Hirota, "Relevance feedback-based image retrieval interface incorporating region and feature saliency patterns as visualizable image similarity criteria," IEEE Transactions on Industrial Electronics, vol. 50, issue 5, pp. 839-852, Oct. 2003.
[3] Y. Wu, Q. Tian, and T. S. Huang, "Discriminant-EM algorithm with application to image retrieval," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Hilton Head Island, SC, June 2000.
[4] X. He, O. King, W.-Y. Ma, M. Li, and H.-J. Zhang, "Learning a Semantic Space From User's Relevance Feedback for Image Retrieval," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 1, pp. 39-48, January 2003.
[5] P. Muneesawang and L. Guan, "Automatic machine interactions for content-based image retrieval using a self-organizing tree map architecture," IEEE Trans. on Neural Networks, vol. 13, no. 4, pp. 821-834, July 2002.
[6] K. Jarrah, P. Muneesawang, I. Lee and L. Guan, "Application of image visual characterization and soft feature selection in automatic image retrieval," submitted to IEEE Int. Conf. on Multimedia and Expo, Amsterdam, The Netherlands, July 2005.
[7] I. Lee and L. Guan, "Content-based image retrieval with automated relevance feedback over distributed peer-to-peer network," Proc. IEEE Int. Symposium on Circuits and Systems, pp. 11.5-8, Vancouver, Canada, May 2004.
[8] W. Y. Ma and B. S. Manjunath, "Edge flow: a framework for boundary detection and image segmentation," Proc. IEEE International Conference on Computer Vision and Pattern Recognition, pp. 744-749, Puerto Rico, 1997.
[9] H.-S. Wong and L. Guan, "Characterization for perceptual importance for object-based image segmentation," Proc. IEEE Int. Conf. on Image Processing, pp. 54-57, Vancouver, 2000.
[10] D. Mukherjee, Y. Deng and S. K. Mitra, "A region-based video coder using edge flow segmentation and hierarchical affine region matching," Proc. of SPIE, vol. 3309, 1998.
[11] Corel Gallery Magic 65000, www.corel.com, 1999.
[12] P. Muneesawang and L. Guan, "Image retrieval with embedded sub-class information using Gaussian mixture models," Proc. IEEE Int. Conf. on Multimedia and Expo, Maryland, USA, pp. 769-772, vol. 1, July 2003.