Proceedings of International Joint Conference on Neural Networks, Montreal, Canada, July 31 - August 4, 2005

0-7803-9048-2/05/$20.00 ©2005 IEEE. Authorized licensed use limited to: NARESUAN UNIVERSITY. Downloaded on January 4, 2010 at 23:59 from IEEE Xplore. Restrictions apply.

Using Knowledge of the Region of Interest (ROI) in Automatic Image Retrieval Learning

Paisarn Muneesawang
College of Information Technology, United Arab Emirates University, Al-Ain, UAE
[email protected]

Ling Guan
Ryerson Multimedia Research Laboratory, Ryerson University, Toronto, ON, Canada, M5B 2K3
lguan@ee.ryerson.ca

Abstract-In this paper, we propose an automatic relevance feedback retrieval system using perceptually important features extracted from regions of interest. The system is implemented via self-learning using a self-organizing tree map (SOTM) neural network. Our proposed method involves the construction of regions of interest from retrieved images using the Edge Flow model, and the grouping of the regions into a single perceptually significant entity. This knowledge is fed into a set of unsupervised relevance feedback learning modules based on the SOTM to guide the adaptation of relevance feedback parameters through a machine learning approach without user interaction. The optimal tradeoff between the user workload in the interactive process and user subjectivity is then explored by incorporating a semi-automatic retrieval strategy. Experimental results indicate that this system, with automatic and semi-automatic adaptations, can minimize user interaction, optimize precision, and reduce performance errors caused by user subjectivity.

I. INTRODUCTION

In image and multimedia retrieval, relevance feedback parameters are usually derived from positive and negative samples provided by the users [1][2]. Although this user interaction process has been shown to be effective for digital libraries of small size, it is recognized as time consuming, since it places a heavy workload on users who must provide a series of examples over successive relevance feedback cycles, and as potentially error-prone when dealing with commercial digital libraries, which are normally very large.

He and King [4] proposed short-term and long-term frameworks that memorize user feedback in order to improve retrieval performance with knowledge obtained from previous queries. This reduces subjectivity and noise from individual users. Wu and Huang [3] employed both labeled and unlabeled samples to increase relevance feedback performance within the transductive learning framework. Although it explores unlabeled samples, this work is still based on the user supervision method.

To solve this classical problem in content-based image retrieval using relevance feedback, an automatic relevance feedback image retrieval system based on a self-organizing tree map (SOTM) and a nonlinear radial-basis function (RBF) model was introduced, which offered high retrieval accuracy when applied to the popular Brodatz database using texture descriptors [5]. The idea was further extended to deal with a larger set of images selected from the Corel image collection using color, texture and shape features [6] and to search distributed image libraries over the Internet [7]. The general principle of automatic relevance feedback is to apply unsupervised learning techniques to relevance classification in order to minimize the number of user feedback cycles required in modeling the user's queries. However, the importance of perceptually inspired features in the relevance classification process has not been properly studied.

It is well acknowledged that, to obtain the maximum precision rate in image retrieval, the learning process must effectively exploit the knowledge of image relevancy. This requires modeling image contents with features that are sufficiently accurate for the characterization of perceptual importance.

This issue is especially pressing with automatic relevance feedback since, without some form of knowledge provided to the relevance classification process from the external world, the SOTM classifier cannot operate as efficiently as a user-supervised process. For example, global features of shape, color, or texture might receive an undue proportion of the weight in the judgment of image relevancy by machine vision. Furthermore, these global features do not always address perceptually important regions or any salient objects depicted in an image. This is because there are more regions in an image than those which are of


    perceptual importance. So, higher classification accuracy may be possible with the acquisition of more precise perception information. However, the form of knowledge needed in automatic relevance feedback has to be identified before the retrieval process begins, instead of during the process as in user-interactive relevance feedback.

    In this paper we propose an automatic adaptive image retrieval scheme with embedded knowledge of perceptual importance, the form of which is identified in advance. Restricting the domain to photograph collections, we pursue the restricted goal of identifying the region of interest (ROI). The ROI assumes that the significant objects within an image are often located at the center, as a photographer usually tries to locate significant objects at the focus of the camera's view. We adopt the Edge Flow model [8] to identify the ROI within a photograph. This ROI does not require the exact identification of a possible object in the image, but only the selection of a region which adequately reflects those properties of the object, such as color or shape, which are usually used as features for matching in retrieval.

    The proposed retrieval system incorporates the knowledge of the ROI into the self-learning relevance feedback based on the SOTM for automatic image retrieval. The motivation of automatic relevance feedback is not to replace user interaction, but to enhance the user-centered system by minimizing user workload with a minimum number of feedback cycles. Furthermore, to balance the difficulty of characterizing user subjectivity against user workload, this system is also extended to implement a semi-automatic learning strategy in which user supervision can be combined with automatic adaptation in such a way that the system learns to acquire different user subjectivities in order to achieve optimum performance.

    The rest of the paper is organized into the following sections. The characterization of the ROI is described in Section 2. In Section 3 we present the implementation of a self-adaptive retrieval system using the SOTM and the RBF model. Section 4 shows the experimental results obtained by applying the retrieval methods to the Corel photograph database. Some conclusions are drawn in Section 5.

    II. CHARACTERIZATION OF THE ROI

    Image segmentation is considered a crucial step in performing high-level computer vision tasks such as object recognition and scene interpretation [9]. Since natural scenes within an image could be too complex to be characterized by a single image attribute, it is more appropriate to consider a segmentation method that is able to address the representation and integration of different attributes such as color, texture, and shape. We propose adopting the Edge Flow model demonstrated in [8], which proved to be effective in image boundary detection and in application to video coding [10]. The Edge Flow model implements a predictive coding scheme to identify the direction of change in color, texture, and filtered phase discontinuities.

    A. Region characterization by "Edge Flow"

    An edge flow vector at pixel location $s$ is a vector sum of edge energies given by:

    $$F(s) = \sum_{\Theta(s) \le \theta < \Theta(s) + \pi} E(s, \theta) \exp(j\theta)$$
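As an illustration of this vector sum, the following Python sketch accumulates edge energies over a half-circle of directions, in the spirit of the Edge Flow formulation of [8]. The discretization into K evenly spaced angles, the function name `edge_flow_vector`, the way the starting direction is passed in as an index, and the sample energies are all assumptions made for this example; they are not taken from the paper.

```python
import cmath
import math


def edge_flow_vector(energies, start_index):
    """Sum E(s, theta) * exp(j * theta) over half a circle of directions.

    energies:    edge energies at K evenly spaced angles theta_k = 2*pi*k/K
                 (assumed discretization, K must be even).
    start_index: index k0 such that theta_k0 plays the role of Theta(s),
                 the first direction of the half-circle.
    """
    k = len(energies)
    if k % 2 != 0:
        raise ValueError("expected an even number of sampled directions")
    total = 0j
    # Directions with Theta(s) <= theta < Theta(s) + pi, wrapping around 2*pi.
    for offset in range(k // 2):
        idx = (start_index + offset) % k
        theta = 2.0 * math.pi * idx / k
        total += energies[idx] * cmath.exp(1j * theta)
    return total


# Hypothetical energies at 8 directions (0, 45, ..., 315 degrees):
# all energy sits in the 90-degree direction.
energies = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
flow = edge_flow_vector(energies, start_index=0)
magnitude = abs(flow)
angle = cmath.phase(flow)
```

With these sample values the resulting vector has unit magnitude and points along the 90-degree direction; starting the half-circle at index 4 instead would exclude that direction and give a zero vector.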

    knowledge, we can effectively obtain the ROI by associating it with the objects located at the center of photographs. Let $S = \{R_i,\ i = 1, 2, \ldots, N \mid R_a \cap R_b = \emptyset,\ a \neq b\}$ be a set of regions generated by the Edge Flow model from one image. Also, let $W$ be the set of labels of regions that lie either partly or completely inside a predefined $m$-by-$n$ pixel window, where the center of the window is located at the center of the image. We define the ROI as the collection of regions which are members of $W$:

    $$S' = \bigcup_{i \in W} R_i \qquad (5)$$

    In practice, however, it is important to remove from $S'$ the background or other regions that are located mainly outside the defined $m$-by-$n$ window. In this case, a region $R_i$, $i \in W$, is not included in $S'$ if its area outside the window is greater than its area inside the window. Thus, a complete region of interest is defined as:

    $$\mathrm{ROI} := S' \cap \{\mathrm{comp}(R_{i'})\} \qquad (6)$$

    where comp is the complement set operation. In this definition, each $R_{i'}$ is defined so that:

    $$R_{i'} = R_i \quad \text{if } M \ldots$$

    … Let $\{X_i\}$, $X_i \in \Re^{P_1}$, be the set of samples retrieved at the first round of retrieval, subject to:

    $$D_{X_i}(Z) < D_{X_{i+1}}(Z) \qquad (8)$$

    where $D_{X_i}(Z)$ denotes the distance between sample $X_i$ and the query $Z$. Here, the query $Z$ and the samples $X_i$ are each represented by a point in the feature space $\Re^{P_1}$ and consist of $P_1$ features, including color, shape, and texture (to be discussed in Section 4). The authors explained in [5] that, for the unsupervised learning process brought into automatic relevance feedback to be effective, a different and more powerful feature space than $\Re^{P_1}$ should be introduced in relevance classification. We call this feature space $\Re^{P_2}$. Clearly, features in $\Re^{P_2}$ should be of higher quality than those in $\Re^{P_1}$ in order for the relevance identification process to be effective. Because of the limited number of images being analyzed (only those identified by the retrieval process as "similar" to the query), we can afford to employ computationally more intensive but also more effective image processing and analysis methods in order to obtain features which are perceptually important in relevance classification. Features extracted from the ROIs satisfy this requirement, as they provide the embedded knowledge of the ROI to assist relevance classification. Therefore, we extract a different set of features, $\{X_i\}$, $X_i \in \Re^{P_2}$, to characterize the retrieved images. We again choose color and shape as the features, but calculate them only from the ROIs in the retrieved images after applying ROI identification. The retrieval procedure is summarized as follows:

    At the first step, let $W_j$, $j = 1, 2, \ldots, J$, be the node vectors of the SOTM. While $W_j(t+1) \neq W_j(t)$, $\forall j$, randomly select $X$ and compute:

    $$j^* = \arg\min_j \|X - W_j\|$$

    If $\|X - W_{j^*}(t)\| < H(t)$, compute:

    $$W_{j^*}(t+1) = W_{j^*}(t) + \alpha(t)\,(X - W_{j^*}(t))$$

    else, set $W_{J+1} = X$, where $H(t)$ is the hierarchy control function and $\alpha(t)$ is a learning rate. $H(t)$ and $\alpha(t)$ are decreased with iteration $t$.

    At the second step, from $\{W_j\}_{j=1}^{J}$ find:

    $$n = \{j_1^*, j_2^*\} \quad \text{s.t.} \quad \|Z - W_{j_1^*}\| \le \|Z - W_{j_2^*}\|$$

    where $j^* = \arg\min_j \|Z - W_j\|$, and $Z$ is the query position.

    At the third step, for l …
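The first two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the initialization from the first sample, the fixed iteration budget used in place of the convergence test, the linear decay of $H(t)$ and $\alpha(t)$, the floor `h_min`, and all function names and toy data are hypothetical. The second step is reduced here to ranking the nodes by their distance to the query.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_sotm(samples, iterations=200, h0=2.0, h_min=0.5, a0=0.5, seed=0):
    """Grow SOTM node vectors from the samples (step 1 of the procedure)."""
    rng = random.Random(seed)
    nodes = [list(samples[0])]  # assumed initialization: first sample
    for t in range(iterations):
        decay = 1.0 - t / iterations
        h_t = max(h_min, h0 * decay)  # hierarchy control H(t), assumed schedule
        a_t = a0 * decay              # learning rate alpha(t), assumed schedule
        x = rng.choice(samples)
        # Winning node: j* = argmin_j ||x - W_j||
        j_star = min(range(len(nodes)), key=lambda j: dist(x, nodes[j]))
        if dist(x, nodes[j_star]) < h_t:
            # Move the winner toward the sample.
            nodes[j_star] = [w + a_t * (xi - w)
                             for w, xi in zip(nodes[j_star], x)]
        else:
            # Sample is too far from every node: spawn a new node at x.
            nodes.append(list(x))
    return nodes


def rank_nodes(nodes, query):
    """Node indices ordered by distance to the query (step 2, simplified)."""
    return sorted(range(len(nodes)), key=lambda j: dist(query, nodes[j]))


# Toy data: two well-separated groups of 2-D feature vectors.
group_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
group_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
nodes = train_sotm(group_a + group_b)
ranking = rank_nodes(nodes, query=(0.05, 0.05))
```

On this toy data the map settles on one node per group, and the node nearest to the query is the one covering the first group. How the selected nodes are then used to label retrieved images is not shown, since the corresponding step is not available in the text above.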

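The features used in relevance classification are computed only inside the ROI of each retrieved image. The window rule of Section 2 (collect the regions that touch a centered m-by-n window, then drop those whose area outside the window exceeds their area inside it) can be sketched in Python as follows. The label-map representation of the segmentation, the function name `roi_mask`, and the toy 6-by-6 example are hypothetical and serve only to illustrate the rule.

```python
def roi_mask(labels, m, n):
    """Return the kept region labels and a boolean ROI mask.

    labels: 2-D list in which labels[r][c] is the region label of a pixel
            (assumed output format of a segmentation step).
    m, n:   height and width of the window centered on the image.
    """
    rows, cols = len(labels), len(labels[0])
    top = (rows - m) // 2
    left = (cols - n) // 2
    inside, outside = {}, {}
    for r in range(rows):
        for c in range(cols):
            in_window = top <= r < top + m and left <= c < left + n
            counts = inside if in_window else outside
            counts[labels[r][c]] = counts.get(labels[r][c], 0) + 1
    # W: labels of regions lying partly or completely inside the window.
    w = set(inside)
    # Drop regions whose area outside the window exceeds the area inside.
    kept = {lab for lab in w if outside.get(lab, 0) <= inside[lab]}
    mask = [[labels[r][c] in kept for c in range(cols)] for r in range(rows)]
    return kept, mask


# Toy segmentation: 0 = background, 1 and 2 = objects near the center.
labels = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 2, 0],
    [0, 1, 1, 1, 2, 0],
    [0, 1, 1, 1, 2, 0],
    [0, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 0],
]
kept, mask = roi_mask(labels, m=4, n=4)
roi_area = sum(sum(row) for row in mask)
```

In this example the background touches the window but lies mostly outside it, so it is removed, while the two central regions are kept.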
    using color histogram and Fourier descriptor. Here, we also pay attention to the application of a weighting scheme to the color and shape descriptors which are the input to the SOTM, to embed the user's perception in the SOTM relevance identification process.

    We compare the performances of four methods: non-adaptive CBIR, user-controlled RF, automatic RF, and semi-automatic RF, using 20 queries from different categories. The non-adaptive CBIR method employs normalized Euclidean distance as the matching criterion. This method provides a retrieved image set to the user-controlled RF algorithm, which further enhances the system performance using the non-linear RBF model together with a user interaction interface. In comparison, in the automatic RF case, the relevance identification was executed by the SOTM with two cycles of adaptation. In addition, after automatic adaptation, the system performance was refined by a user to obtain the semi-automatic RF results.
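A minimal Python sketch of ranking by normalized Euclidean distance, the matching criterion named above for the non-adaptive method, is given below. The text does not state how the normalization is done; dividing each feature by its standard deviation over the database is an assumption, as are the function names and the toy feature vectors.

```python
import math


def feature_stds(database):
    """Per-dimension standard deviation over the database (assumed normalizer)."""
    n = len(database)
    stds = []
    for d in range(len(database[0])):
        column = [vec[d] for vec in database]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid division by zero
    return stds


def normalized_euclidean(x, z, stds):
    """Euclidean distance after scaling each dimension by its deviation."""
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x, z, stds)))


def retrieve(database, query, top_k=16):
    """Indices of the top_k database vectors closest to the query."""
    stds = feature_stds(database)
    order = sorted(range(len(database)),
                   key=lambda i: normalized_euclidean(database[i], query, stds))
    return order[:top_k]


# Toy database: the second feature has a much larger range than the first.
database = [[0.0, 0.0], [1.0, 100.0], [3.0, 0.0], [0.0, 150.0]]
top3 = retrieve(database, query=[0.0, 0.0], top_k=3)
```

The scaling keeps a feature with a large numeric range from dominating the ranking, which is why the third vector falls out of the top three here even though its raw distance to the query is the smallest among the non-identical vectors.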

    Table I presents results obtained by the four methods, measured by the average precision on the top 16 best matches. Evidently, the automatic RF provides considerable improvement over the non-adaptive CBIR method (i.e., by more than 25 percentage points in precision), without user interaction. The automatic result is about 4 percentage points lower than that of the user-controlled RF method. By combining automatic learning with user interaction, the semi-automatic RF clearly outperforms the other methods discussed.

    We also experimented with allowing the user interaction process to continue until convergence. It is observed that the user-controlled RF and the semi-automatic RF converged at similar points, at approximately 93% precision. However, in order to reach this optimum point, the user-controlled RF method used on average 2.4 cycles of user interaction, while the semi-automatic RF method used 1.6 cycles. This shows that the semi-automatic method is the most effective learning strategy in terms of both retrieval accuracy and minimizing user interaction. This also demonstrates that the application of self-organizing RF in combination with perceptually significant features extracted from the ROIs clearly enhanced the overall system performance.

    One retrieval example is given in Figure 3. Figure 3(a) shows the 16 best-matched images obtained by the non-adaptive CBIR method, with the query image displayed at the top-left corner. In this figure, nine images are relevant to the query "plane". By comparison, Figure 3(b) shows the results after applying self-organizing relevance feedback to capture the notion of image similarity with the ROI features, which yields considerably improved retrieval results.

    Figure 3: Examples of retrieval results on the query "plane", obtained by (a) non-adaptive CBIR method, and (b) automatic relevance feedback method. The precisions measured from (a) and (b) are 0.56 and 0.94, respectively.

    TABLE I
    AVERAGE PRECISION RATE (%) AND NUMBER OF USER FEEDBACK CYCLES, OBTAINED BY RETRIEVING 20 QUERIES FROM COREL DATABASE, MEASURED FROM THE TOP 16 BEST MATCHES. PRECISION IS DEFINED AS np / 16, WHERE np IS THE NUMBER OF RELEVANT IMAGES RETRIEVED.

    Method               Average Precision (%)   # User Feedback
    Non-adaptive CBIR    52.81                   -
    Automatic RF         78.13                   -
    User-controlled RF   82.19                   1
    Semi-automatic RF    87.50                   1
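The precision measure in the caption of Table I, np / 16, can be checked with a short Python sketch. The function name `precision_at_k` and the relevance lists are hypothetical. The count of nine relevant images for Figure 3(a) comes from the text; the count of fifteen for Figure 3(b) is an assumption, chosen because 15/16 rounds to the reported 0.94.

```python
def precision_at_k(relevance, k=16):
    """Fraction of the top-k retrieved images that are relevant (np / k)."""
    top = relevance[:k]
    return sum(1 for is_relevant in top if is_relevant) / k


# Figure 3(a): nine of the 16 best matches are relevant (stated in the text).
non_adaptive = [True] * 9 + [False] * 7
# Figure 3(b): fifteen relevant matches is an assumed count (see lead-in).
automatic = [True] * 15 + [False] * 1

p_a = precision_at_k(non_adaptive)
p_b = precision_at_k(automatic)
```

Rounded to two decimals, these values agree with the precisions quoted for Figure 3.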


    V. CONCLUSIONS

    In the context of a relevance feedback retrieval framework, the user workload and processing time introduced by user interaction are relatively high due to the series of queries involved. With the proposed automatic relevance feedback strategy, which uses perceptually important features extracted from the regions of interest, user interaction can be eliminated from relevance feedback image retrieval. Simulation results demonstrate that the unsupervised relevance feedback learning strategy based on the self-organizing tree map effectively utilized the perceptually important features extracted from the ROIs in relevance classification and substantially improved retrieval accuracy without user interaction. In addition, it is observed that it is possible to find the optimal tradeoff between the automatic process and the number of feedback cycles required of the user in order to achieve very high retrieval accuracy.

    ACKNOWLEDGMENT

    This work is partially supported by the Canada Research Chair Program and by the Canada Foundation for Innovation. The first author would like to thank Naresuan University for generous support.

    REFERENCES

    [1] M. Naphades, R. R. Wang, T. S. Huang, "Audio-visual query and retrieval: system that uses dynamic programming and relevance feedback," Journal of Electronic Imaging, pp. 861-870, Oct. 2001.

    [2] Z. Stejic, Y. Takama, K. Hirota, "Relevance feedback-based image retrieval interface incorporating region and feature saliency patterns as visualizable image similarity criteria," IEEE Transactions on Industrial Electronics, vol. 50, issue 5, pp. 839-852, Oct. 2003.

    [3] Y. Wu, Q. Tian, and T. S. Huang, "Discriminant-EM algorithm with application to image retrieval," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Hilton Head Island, SC, June 2000.

    [4] X. He, O. King, W.-Y. Ma, M. Li, and H.-J. Zhang, "Learning a Semantic Space From User's Relevance Feedback for Image Retrieval," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 1, pp. 39-48, January 2003.

    [5] P. Muneesawang and L. Guan, "Automatic machine interactions for content-based image retrieval using a self-organizing tree map architecture," IEEE Trans. on Neural Networks, vol. 13, no. 4, pp. 821-834, July 2002.

    [6] K. Jarrah, P. Muneesawang, I. Lee and L. Guan, "Application of image visual characterization and soft feature selection in automatic image retrieval," submitted to IEEE Int. Conf. on Multimedia and Expo, Amsterdam, The Netherlands, July 2005.

    [7] I. Lee and L. Guan, "Content-based image retrieval with automated relevance feedback over distributed peer-to-peer network," Proc. IEEE Int. Symposium on Circuits and Systems, pp. II.5-8, Vancouver, Canada, May 2004.

    [8] W. Y. Ma and B. S. Manjunath, "Edge flow: a framework for boundary detection and image segmentation," Proc. IEEE International Conference on Computer Vision and Pattern Recognition, pp. 744-749, Puerto Rico, 1997.

    [9] H.-S. Wong and L. Guan, "Characterization for perceptual importance for object-based image segmentation," Proc. IEEE Int. Conf. on Image Processing, pp. 54-57, Vancouver, 2000.

    [10] D. Mukherjee, Y. Deng and S. K. Mitra, "A region-based video coder using edge flow segmentation and hierarchical affine region matching," Proc. of SPIE, vol. 3309, 1998.

    [11] Corel Gallery Magic 65000, www.corel.com, 1999.

    [12] P. Muneesawang and L. Guan, "Image retrieval with embedded sub-class information using Gaussian mixture models," Proc. IEEE Int. Conf. on Multimedia and Expo, Maryland, USA, vol. 1, pp. 769-772, July 2003.
