16
Research Article Saliency Detection Using Sparse and Nonlinear Feature Representation Shahzad Anwar, 1 Qingjie Zhao, 1 Muhammad Farhan Manzoor, 2 and Saqib Ishaq Khan 3 1 Beijing Key Laboratory of Intelligent Information Technology, School of Computer Science, Beijing Institute of Technology, Beijing 100081, China 2 School of Automation, Beijing Institute of Technology, Beijing 100081, China 3 Centres of Excellence in Science and Applied Technologies, Islamabad 44000, Pakistan Correspondence should be addressed to Shahzad Anwar; [email protected] Received 17 February 2014; Accepted 11 March 2014; Published 8 May 2014 Academic Editor: Antonio Fern´ andez-Caballero Copyright © 2014 Shahzad Anwar et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. An important aspect of visual saliency detection is how features that form an input image are represented. A popular theory supports sparse feature representation, an image being represented with a basis dictionary having sparse weighting coefficient. Another method uses a nonlinear combination of image features for representation. In our work, we combine the two methods and propose a scheme that takes advantage of both sparse and nonlinear feature representation. To this end, we use independent component analysis (ICA) and covariant matrices, respectively. To compute saliency, we use a biologically plausible center surround difference (CSD) mechanism. Our sparse features are adaptive in nature; the ICA basis function are learnt at every image representation, rather than being fixed. We show that Adaptive Sparse Features when used with a CSD mechanism yield better results compared to fixed sparse representations. We also show that covariant matrices consisting of nonlinear integration of color information alone are sufficient to efficiently estimate saliency from an image. e proposed dual representation scheme is then evaluated against human eye fixation prediction, response to psychological patterns, and salient object detection on well-known datasets. We conclude that having two forms of representation compliments one another and results in better saliency detection. 1. Introduction Vision is the primary source of information that the human brain uses to understand the environment it operates in. e eyes capture light which results in information in the order of 10 9 bits every second. In order to efficiently process such huge amount of information, the brain uses visual attention to seek out only the most salient regions in the visual field. Designing an artificial system, the designer endeavors it to be maximally efficiency; real-time and computational frugal, like biological systems. us biologically inspired concepts are regularly used in designing various computational algorithms. In computer vision, a number of computational algorithms are designed based on visual attention in primates. Such visual saliency models have shown reasonable performance and are used in many applications like robot localization [1], salient object detection [2], object tracking [3], video compression [4], thumbnail generation [5], and so forth. A detailed discussion on the subject can be found in [6]. ere are two distinct neural pathways underlying visual attention in the primate brain. e top-down [6], goal driven mechanism is slow and based on learning, experience, and recall. On the other hand, the sensor driven bottom-up [6] pathway is fast and deals only with the presented stimu- lus. Many computer vision algorithms utilize a bottom-up approach to find salient features in a set of images presented to it. Using such an approach would require efficient encoding of various image variables that may represent features in the image. How an image is being represented for saliency detection is very important. e first computational model for saliency by Itti et al. [7] used color, orientation, and intensity to represent an image. ese features are inspired by the feature integration theory (FIT) [8]. Other stimulus properties that Hindawi Publishing Corporation e Scientific World Journal Volume 2014, Article ID 137349, 16 pages http://dx.doi.org/10.1155/2014/137349

Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

Research ArticleSaliency Detection Using Sparse and NonlinearFeature Representation

Shahzad Anwar1 Qingjie Zhao1 Muhammad Farhan Manzoor2 and Saqib Ishaq Khan3

1 Beijing Key Laboratory of Intelligent Information Technology School of Computer Science Beijing Institute of TechnologyBeijing 100081 China

2 School of Automation Beijing Institute of Technology Beijing 100081 China3 Centres of Excellence in Science and Applied Technologies Islamabad 44000 Pakistan

Correspondence should be addressed to Shahzad Anwar malikanwergmailcom

Received 17 February 2014 Accepted 11 March 2014 Published 8 May 2014

Academic Editor Antonio Fernandez-Caballero

Copyright copy 2014 Shahzad Anwar et alThis is an open access article distributed under the Creative CommonsAttribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

An important aspect of visual saliency detection is how features that form an input image are represented A popular theory supportssparse feature representation an image being represented with a basis dictionary having sparse weighting coefficient Anothermethod uses a nonlinear combination of image features for representation In our work we combine the twomethods and proposea scheme that takes advantage of both sparse and nonlinear feature representation To this end we use independent componentanalysis (ICA) and covariant matrices respectively To compute saliency we use a biologically plausible center surround difference(CSD) mechanism Our sparse features are adaptive in nature the ICA basis function are learnt at every image representationrather than being fixed We show that Adaptive Sparse Features when used with a CSDmechanism yield better results compared tofixed sparse representationsWe also show that covariantmatrices consisting of nonlinear integration of color information alone aresufficient to efficiently estimate saliency from an imageThe proposed dual representation scheme is then evaluated against humaneye fixation prediction response to psychological patterns and salient object detection on well-known datasets We conclude thathaving two forms of representation compliments one another and results in better saliency detection

1 Introduction

Vision is the primary source of information that the humanbrain uses to understand the environment it operates in Theeyes capture light which results in information in the order of109 bits every second In order to efficiently process such hugeamount of information the brain uses visual attention to seekout only themost salient regions in the visual field Designingan artificial system the designer endeavors it to bemaximallyefficiency real-time and computational frugal like biologicalsystems Thus biologically inspired concepts are regularlyused in designing various computational algorithms Incomputer vision a number of computational algorithmsare designed based on visual attention in primates Suchvisual saliency models have shown reasonable performanceand are used in many applications like robot localization[1] salient object detection [2] object tracking [3] video

compression [4] thumbnail generation [5] and so forth Adetailed discussion on the subject can be found in [6]

There are two distinct neural pathways underlying visualattention in the primate brainThe top-down [6] goal drivenmechanism is slow and based on learning experience andrecall On the other hand the sensor driven bottom-up [6]pathway is fast and deals only with the presented stimu-lus Many computer vision algorithms utilize a bottom-upapproach to find salient features in a set of images presentedto it Using such an approachwould require efficient encodingof various image variables that may represent features in theimage

How an image is being represented for saliency detectionis very important The first computational model for saliencyby Itti et al [7] used color orientation and intensity torepresent an image These features are inspired by the featureintegration theory (FIT) [8] Other stimulus properties that

Hindawi Publishing Corporatione Scientific World JournalVolume 2014 Article ID 137349 16 pageshttpdxdoiorg1011552014137349

2 The Scientific World Journal

drive visual attention could be motion occlusion like opticalflow [9] skin hue [10] texture contrast [11] wavelet [12] face[13] and gist [14 15] A summary of various features usedin saliency computation can be found in [6] Here we limitour discussion to sparse [16] and nonlinear representations[17]

Adaptive Sparse Representation A simple cell in visual cortexis characterized by its location within the field of view itsspatial frequency selection and its orientation It is believedthat the visual cortex is being evolved in such a way that itcould efficiently process natural images the kind of visualstimuli it would experience in natural conditions Learningstatistics of natural images could lead to development ofsimple cell like receptive fields Thus a number of studies[18 19] have used learningmethods alongwith natural stimulito this end For example Olshausen and Field [20] haveshown that assuming sparseness the basis functions learntfrom an ensemble of natural images can fulfill the propertiesof a simple cellrsquos receptive field

Sparse representation means a representation of the datasuch as the constituent components are rarely active Forsuch a representation a dictionary of basis is learnt froman ensemble of natural image patches with a condition thatthe respective weights of the basis coefficients are sparseand rarely active and most of the time have zero valueSparse representation is also an efficient way of process-ing images for various applications like classification facerecognition image denoising and saliency computation[16]

An important observation about human visual systems isits adaption to a new environment Several studies [21] haveshown an adaptive behavior of neurons in visual cortex Basedon such observations a saliency model utilizing an adaptivesparse representation has been proposed [22] The basis orthe dictionary for an adaptive sparse representation does notremain fixed but changes with every stimulus and thus betterrepresents the current environment AWS [23] also used anadaptation but by whitening the features as per particularimage structure

Independent component analysis (ICA) is a very populartechnique in computer vision for multilinear data analysisICA gives basis functions which are statistically independentas well as non-Gaussian The aim of an ICA algorithm is torecover independent basis from their observed linear mix-ture An image is represented using an ICA basis functionsthen the coefficients of these basis are sparse similar to neuralreceptive fields in the visual cortex [20]

In our two pronged approach one deals with an adaptivesparse representation of natural images We approach thispart using ICA basis functions learnt from individual imagesThe resulting dictionary changes as images are introducedand are hence considered to be adaptive in nature Thisrepresentation is similar to that in [22] where a globalapproach [6] based on Shannon theory [24] is used toestimate saliency Our contribution to this end is the useof a different biologically plausible mechanism to estimatesaliency in an adaptive sparse representation We later showthat an accurate representation of the stimulus leads to a

reasonably good accuracy in computing the center surrounddifference (CSD)

Nonlinear Representation In visual saliency estimation usu-ally the feature representation takes the form of a lineartransformation of the original data and various populartransformations like ICA factor analysis projection pursuitor principal component analysis (PCA) are used for thispurpose There are some biological evidences which supportthe use of nonlinear feature representations In [25] theauthors showed that the invariance property in V1 can beresult of a nonlinear operation on features The literature onnonlinear representation of features for the saliencymodelingis very limited The properties of nonlinear representationdepend on the nonlinear kernel and the features input tothat kernel In [26] local steerable kernels LSK with onlygradient information are used for such a representation Arecent approach [17] shows that all features are combinedin matrix form and then a covariance based integratednonlinear representation gives good results These nonlinearrepresentations claim to better capture the data structure thana linear representation and integrate all features to give asingle unified representation as used by both [17 26]

Inspired by the concept in [17] we choose covariancebased feature representation but we do not combine allfeatures and channels to form integrated covariancematricesInstead we rather modify the approach and show that usingonly color with spatial information integrated in a nonlinearcovariance representation performs better than all featuresintegrated nonlinearly [17] Here also CSD is utilized tocompute saliency In Section 5 we will give a comparisonto show that the proposed representation gives better eye-fixation predictions Also the use of only color informationand how it is complemented by the sparse representation willbe explained there

There is a body of literature [27] on biological plausibilityof CSD mechanism CSD means that a stimulus area isconspicuous if it is different than immediate surroundingsThis mechanism is utilized for saliency estimation in manyforms Itti et al [7] used difference-of-Gaussian (DoG) filterGao [28] utilized KL-divergence based on histograms andBorji and Itti [29] used average weighted patch dissimilarityFor our model we also dwell on average weighted patchdissimilarity [22 29 30] It is explained in Figure 1 the redwindow is the stimulus patch under consideration and inorder to compute CSD we calculate the difference (norm)with all the neighboring patches (yellowwindow) followed byweighting and averaging the difference values Here weightsdepend upon the distance between centers of two patchesSame procedure is then repeated for all the patches in theimage and thus CSD assigns higher value to patches whichare significantly different than the surroundings

To summarize in this paper restricting ourselves onlyto bottom up visual saliency our aim is to predict humaneye fixation with a saliency model that utilizes dual imagerepresentation We use both an adaptive sparse and a covari-ance based approach for image feature representation Acenter surround difference (CSD) approach is used on bothrepresentations to compute saliency maps To the best of our

The Scientific World Journal 3

Surround

Center

Figure 1 A depiction of center surround difference where thepatch in consideration is in red rectangle and the patches inyellow rectangular region are surrounding area from which thedissimilarity is checked

knowledge this is the first time that a CSD mechanism isusedwith an adaptive sparse image representationMoreoverthe proposed scheme of using only color information in non-linear form remarkably improve results Both saliency mapsare fused at a later stage to form a net saliency map whichrepresents salient features better than saliency maps from thetwo independent representations

This paper is organized as follows Related work is givenin Section 2 Section 3 covers proposed model and the math-ematical formulation of both representations and saliencycomputation Section 4 covers experimentation and resultssection A detailed discussion about the contribution in bothrepresentations along with certain necessary comparisons isgiven in Section 5 followed by conclusion in Section 6

2 Related Work

Initial visual saliency models take inspiration from featureintegration theory (FIT) [8] and guided search models [31]The first practical implementation of a saliency model basedon these theories is presented by Itti et al [7] where a numberof contrast features are learnt in parallel and fused together tomake a topographic map Later Le Meur et al [32] presentedanother cognitive model based on a contrast sensitivityfunction along with the incorporation of a CSD mechanismand perceptional decomposition There are numerous othermodels proposed which utilize different bioinspired featuresand mechanisms for saliency detection like GIST [33] PCA[30] ICA [16] histogram of local orientations symmetry[34] depth entropy [16] texture and motion [35]

Apart from cognitive models different probability basedmodels are also presented These models incorporate imagestatistics and learn probability distributions using currentimage or ensemble of images for saliency estimation Itti andBaldi [36] defined a Bayesian surprise usingKullback-Leibler(KL) distance for posterior and prior beliefs Torralba et al[33] used contextual evidence to further consolidate low levelsaliency Harel et al [37] approached the saliency problemusing the probabilistic graphical models Hou and Zhang[38] used a global approach based on Fourier transformto find saliency He proposed that the residue of original

and smoothed amplitude spectrum of a Fourier transformcontains information about salient region in an image Laterit was shown [4] that the phase of a Fourier transformcontains the essential location information rather than theamplitude There are several other models which use Fouriertransform and are classified as frequency based models [2]In these models respective color channels are individuallyprocessed for saliency detection and finally all maps arefused in a single saliency map In contrast a quaternionapproach is proposed [39] to use a unified frame work toprocess all color channels The quaternion framework alsoallows incorporation of a fourth channel like motion in avery elegant manner Some models use learning techniquesthat incorporate human eye fixation in their saliency modelsKienzle et al [40] used Human eye fixations for derivinga learned model while Tilke et al [41] trained supportvector machines (SVM) on image patches with low inter-mediate and high level features to compute saliency Apartfrom above-mentioned approaches different new techniqueslike redundancy rectangular window SNR and regressionhave shown remarkable results in saliency modeling Ina different approach Wang et al [42] proposed site rateentropy for computing saliency using framework of graphicalmodels

Some approaches use information theoretic frameworksto model saliency like Bruce and Tsotsos [16] gives theidea of information maximization (AIM) Using a biologicalstructure [20] of sparse representation and Shannonrsquos [24]formulation of self-information inverse probability of a patchin the entire image Bruce andTsotsos [16] computed saliencyThis self-information can be considered as a global measureof saliency There are various extensions of using sparserepresentations of images using a learned dictionary forsaliency computation Recently Sun et al [22] proposed thatsince biological systems are adaptive an adaptive dictionaryis more representative and thus he used principle of self-information for saliency computation using adaptive basisAWS [23] also used adaption and it works on the principlethat a statistical distance in a representative space givessaliency This representative space is computed by whiteningthe basis to the structure of a particular image This schemeuses multiscale and multistage operations on features anduses an efficient way to overcome the computation complex-ity in whitening

In [29] Borji and Itti proposed that local and globalmeasures are complementary and used both center surroundand self-information for saliency computation Moreoverthey showed that multiple color spaces are useful in bettersaliency estimation There are some saliency models [17 26]which rely on nonlinear representation of features and on theintegration of various features and channels In [26] gradientfeatures are used in a nonlinear representation based onlocal steerable kernels (LSK) for image representation whilethe author in [17] proposes a nonlinear integration usingcovariance matrices This paper also incorporates first orderimage statistics in covariance matrices to better estimatesaliency Moreover [17 26] also solve various features andrespective channels integration issue by putting forth a singleunified form

4 The Scientific World Journal

Our saliency model is inspired from mainly two typesof models sparse [16 20] and nonlinear representation [27]We propose a novel dual approach based on both sparseand nonlinear feature representation Inspired by biologicalevidence of neural receptive field properties [20] that effi-ciently process natural images in a sparse manner we use asparse image representation Moreover in order to representadaptivity of neurons to better tackle a new environment weuse an adaptive basis dictionary in an ICA approach [22]Thus our proposed method simultaneously uses sparsity andadaptivity In literature the model similar to our adaptivesparse representation is [22] In [22] Sun et al used aninformation theoretic global approach for saliency computa-tion but we use a more bioplausible local CSD for saliencycomputation Secondly we propose nonlinearly integratedrepresentation of single feature channels color along withspatial information for saliency computation Our approachis a modification of the model proposed in [17] whereall features and channels are nonlinearly integrated usingcovariance matrices although we propose that only colorinformation is enough and it can better estimate saliency inour framework Here also a CSD approach is used for saliencycomputation Finally a combined saliency map is formed byfusing the outputs given by the two representations

Contributions Major contributions of this work can besummarized as follows

(1) A novel dual image feature representation simultane-ous sparse and nonlinear feature representation

(2) CSD based saliency computation in adaptive sparserepresentation

(3) Only color based nonlinear integrated covariancerepresentation followed by CSD computation

(4) Improved results with comparison to other states ofthe models that are established by extensive testingon popular eye fixation prediction datasets and on asalient object detection dataset

3 Proposed Model

Our proposed scheme is given in Figure 2 An input imageis being simultaneously represented in sparse and nonlinearform Then saliency is computed by local center surroundoperation and finally both maps are combined to form asingle saliency map

For sparse representation we break an image into patchesand perform independent component analysis to derive basisLater these basis with sparse coefficient are used to representthe image Furthermore again after converting image intopatches we only take color information and integrate allchannels in a nonlinear fashion using covariance matricesalong with spatial information to represent an image patch

31 Mathematical Modeling In this section we cover math-ematical formulation of sparse representation nonlinearrepresentation and saliency computation Some necessarydiscussion is included to elaborate few concepts and also

some references will be given to avoid unnecessary formu-lation of the well-known concepts

Sparse Representation with ICAWe use an ICA based imagerepresentation thus an input image 119868 can be sparselyrepresented as

1198681015840

= 119860s (1)

where 119860 is a dictionary consisting of a set of basis and s con-sists of respective coefficients In our case we learn 119860 fromevery input image thus we adapt dictionary for every inputstimulusThis approach results inminimum information losswhich is a basic drawback of a fixed dictionary learned froman ensemble of images The sparse coefficients are learned byprojecting an input image to the basis such as

s = 119882119868 (2)

where

119882 = 119860minus1

(3)

The basis have dimensions the same as the patches formedfrom the input image Finally 119868

1015840 in patches form can berepresented as

1198681015840

=

1198991015840

K119896=1

1198601199011015840

119896 (4)

where 1199011015840

119896is the 119896th patchrsquos sparse coefficient vector consisting

of 1198981015840 co-efficients and there are total 119899

1015840 number of patchesin 1198681015840 Moreover K represents a function that reshape and

arranges patches at respective position to form an imageFigure 3 gives the depiction of the whole process

Nonlinear Representation with Covariance Matrices Our fea-ture matrix 119865 is based on raw RGB color space values of 119868

along with pixel position information

119865 = [119868119877 119868119866 119868119861 119909 119910] (5)

where every pixel in 119865 is a 5-dimensional vector 119891119894119894=12119901

where 119901 is the total number of pixels in the image Ourfeatures are different than those used in E Erdem andA Erdem [17] since we do not incorporate any gradientinformation in our feature matrix 119865 In (5) color along withspatial information is used rather than E Erdem and AErdem [17] approach of making a features matrix consistingof all features

The next step is nonlinear representation of 119865 usingcovariancematrices along with first order statistics [17] Tuzelet al [43] gave the concept of encoding a patch by a covariancematrix Later it was used in many applications In saliencydomain E Erdem and A Erdem [17] used patch covariancewith first order statistics for image feature representation andwe will dwell on his approach for our case Thus calculatinglocal covariance matrices for an image patch 119901

119896 we get

119862 (119901119896) =

1

119898 minus 1

119898

sum

119894=1

(119891119894minus 120583119896) (119891119894minus 120583119896)119879 (6)

The Scientific World Journal 5

Adaptive sparserepresentation

Center surroundsaliency

computation

Center surroundsaliency

computation

Image to patchesconversion

Saliency map

Color covariancenon-linear

representation

Image to patchesconversion

Π

Figure 2 Proposed model for saliency computation

Center surroundsaliency

computation

Image patches Sparse co-efficientvectors

Adaptive basis

p998400i p

998400i+1 p998400

i+2

Figure 3 Saliency computation in adaptive sparse representation

where a patch 119901119896consists of 119898 pixels with mean 120583

119896 Now

first order statistics is incorporated in the covariancematricesusing the method mentioned in [17] Then the new repre-sentation of covariance matrices with first order statisticsembedded for a patch is given by

120595119896

= 120595 (119862 (119901119896)) (7)

where function 120595 embeds first order statistics in an inputmatrix The final nonlinear feature representation of image

with 120595119896represents a 119896th patch and 119899 being total number of

patches is given by

11986810158401015840

=

119899

K119896=1

120595119896 (8)

where also K function arranges patches at respective posi-tions to form an image The whole representation is given inFigure 4Saliency Computation The saliency is computed by CSDoperation and then extended to multiple scales The CSDoperation is shown in Figure 1 where a patch under consider-ation is in red rectangle and surrounding area is highlighted

6 The Scientific World Journal

Input

RG

B

Output

Image to patchesconversion and

covariancerepresentation

First orderstatistics

embeddedPosition

information

I(x y)

Feature matrixF = [R G B x y]

Figure 4 Nonlinear representation of an input image

in yellow rectangle The saliency of red patch 119901119894 is given

by its dissimilarity between 119908 surrounding patches (yellowrectangle) as

119878 (119901119894) =

1

119908

119908

sum

119895=1

119889 (119901119894 119901119895) (9)

where dissimilarity 119889(119901119894 119901119895) between two patches is given by

119889 (119901119894 119901119895) =

10038171003817100381710038171003817120572 (119901119894) minus 120572 (119901

119895)10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(10)

where119883119894and119883

119895are the central position of the patches 119901

119894and

119901119895 For the case of sparse representation we have

120572 (119901119894) = 119901

1015840

119894(11)

and for nonlinear representation we have

120572 (119901119894) = 120595119894= 120595 (119862 (119901

119894)) (12)

Thus the saliencymap for patch 119901119894derived from 119868

1015840 and 11986810158401015840 can

be given as

1198781198681015840 (119901119894) =

1

119908

119908

sum

119895=1

100381710038171003817100381710038171199011015840

119894minus 1199011015840

119895

10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(13)

and with 120595(119862(119901119894)) being in vector form

11987811986810158401015840 (119901119894) =

1

119908

119908

sum

119895=1

10038171003817100381710038171003817120595 (119862 (119901

119894)) minus 120595 (119862 (119901

119895))

10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(14)

The multiscale saliency by sparse approach is given by

1198781198681015840 (119909) = N(prod

1198971015840120598119871

1198781198971015840

1198681015840 (119909)) (15)

and for nonlinear integrated approach is given by

11987811986810158401015840 (119909) = N(prod

119897120598119871

119878119897

11986810158401015840 (119909)) (16)

where 1198971015840 and 119897 represent the number of scales and N shows

normalization Finally saliency map becomes

119878119868

= 119866120590

⊛ (N (1198781198681015840 lowast 11987811986810158401015840)) (17)

where119866120590represents the Gaussian smoothing by convolution

⊛ operation and lowast stands for multiplication operation

4 Experimentation

In this section we thoroughly evaluate the proposed modelwith three different experiments human eye fixation pre-diction salient-object detection and response to variouspsychological patterns The human eye fixation prediction isthe basic and necessary test to check the performance of asaliency map against the collected eye fixation from severalhuman subjects

How well a saliency map distinguishes and highlights anobject in an image shows its ability of salient object detectionThe salient object detection capability of a model is evaluatedby employing some metrics that compare the generatedsaliency map against the ground truth made by manuallabeling of the salient region in an image by human subjectsThe psychological patterns give a qualitative analysis of thesaliency model These patterns are designed to check pop-upresponses in different scenarios like orientation conjunctionand color and so forth Code (Matlab P-code) of the proposedmodel used for experimentation is available online [44]

41 Parameter Setting Before pursuing the evaluation of theproposed model we fix the parameters used to generatethe saliency maps by our model These parameters willremain the same for all the experiments Derivation ofthese parameters will be discussed in the next section afterthe introduction of the datasets and the metric used forevaluation

Sparse Representation We resize all input images to 80 times

60 pixels and use only single scale 1198971015840

= 1 for saliencycomputation Patches of 5 times 5 pixels [22] are generated withsliding overlapping window from every input image to learnthe basis for the dictionary and for the saliency computation

The Scientific World Journal 7

Table 1 Following are the input image resolution used in eachalgorithm with default parameter settings Some algorithms (atleastlowast) do internally reduce dimensions for fast computation andoptimal performance

Model Acronym ResolutionAIM [16] 12(W times H)lowastGBVS [37] W times HMESR [39] 64 times 48lowastICL [46] W times HlowastItti [37] W times HMPQFT [39] 64 times 48AWS [23] W times HlowastSDSR [26] W times HSUN [47] 12(W times H)HFT [2] 128 times 128LG [29] 256 times 256ERDM [17] 512 times 512ΔQDCT [39] 64 times48lowastProposed W times H

An ICA package available online FAST ICA [45] is used forthis experimentation

Nonlinear Representation For nonlinear image representa-tion RGB color and position information is used in onlineavailable implementation of E Erdem andA Erdem [17]Thesaliency is computed with the default parameters used in [17]that have every input image being resized to 512 times 512 pixelsand five different patch sizes 119901

119894= 8 16 32 64 128 and thus

119897 = 5 are used for saliency computationFinally normalized sparse representationrsquos saliency map

is rescaled to the size of nonlinear representationrsquos saliencymap and both maps are multiplied and normalizedThen thefinal saliencymap is resized to the actual input image size andused for experimentation The input image resolutions usedin all the saliency algorithms for experimentation are givenin Table 1

42 Human Eye Fixation Prediction In order to validatethe proposed model with human eye fixation predictionssaliency maps are generated on three datasets and for a faircomparison shuffle area under the curve score(sAUC) is usedto quantify the results

Dataset A reasonable dataset that can be used for evaluationof human eye fixation prediction must be complex anddiverse enough so that performance can be thoroughly eval-uated In literature Toronto [16] and Kootstra [34] datasetsare the most popular and widely used datasets IMSAL [2] isa relatively new dataset which we also used in our evaluation

Toronto dataset was prepared by Bruce and Tsotsos [16]and it consists of 120 images each with 681 times 511 pixels Thisdataset has both indoor and outdoor imagesThe eye fixationground truth is based on 20 subjects who free viewed theimages for few seconds

Kootstra dataset was used in [34] It consists of 101 imageseach with a resolution of 1024 times 768 pixels These imagesconsist of flowers natural scenes automans and buildingsThis dataset is significantly complex because it has manyimages with no explicit salient regions The eye fixationground truth available with this dataset is based on freeviewing of 31 subjects for a few seconds

IMSAL dataset is given by Li et al [2] it consists of 235images which are collected online through an internet searchengine and some images were taken from literature Theseimages are divided into six categories with 50 images havinglarge salient regions 80 images with intermediate salientregions 60 images with small salient regions 15 imageswith cluttered backgrounds and 15 images with repeatingdistracters These images give a good benchmark for per-formance evaluation because of the significant complexitygiven by variable size of salient objects objects with clutterand objects with distracters The accompanied ground truthconsists of both eye fixation information and binary maskscreated by human subjects who manually marked the salientobject in an image

Metric for Evaluation The most popular method to evaluatethe performance of a saliency map is to calculate area underthe curve (AUC) score of receiver operating characteristics(ROC) curve At first a saliency map is thresholded andused as a binary classifier with human eye fixations actingas positive set and some other points uniformly random asnegative set to plot an ROC curve The AUC of that ROC iscalculated andused as ameasure of performance comparison

There are various variants of AUC available in literatureand the basic difference between them is the choice ofnegative set points We will use the shuffled area under thecurve sAUC score because of its ability to cater for centerbias [41] since some models implicitly incorporate centerbias which makes a fair comparison difficult to performit is becoming standard to present results with sAUC InsAUC score the positive points consists of human subjectseye fixation on that image and the negative set consists ofall the fixation of subjects on the rest of the dataset imagesThe sAUC gives a 05 score on a center Gaussian blob whichis about the same as a random or chance score whereas allthe other versions of AUC [6] give very high score becausethey are affected by the center bias For our experimentationwe used sAUC available online by Schauerte and Stiefelhagen[39] We calculate every sAUC score for 20 times [47]and then use the mean value We found that the standarddeviation of the sAUC approximately ranges from 1119864 minus 4 to5119864 minus 4 in our experiments

Performance Analysis with Resolution In order to find theoptimal parameters for the proposed model we treat bothrepresentations separately and find the best parameters foreach representation that can be incorporated in the proposedmodel We plotted both sparse and nonlinear representationwith variable resolution on all three datasets and measurethe sAUC score Using various parameters given in Table 2Figure 5 is plotted which gives the performance of bothrepresentations on the three datasets Figure 5 shows that for

8 The Scientific World Journal

Table 2 Various scales parameters which are used for evaluation of sparse and nonlinear representation

Scale Adaptive sparse representation Nonlinear representationResolution Patch size Resolution Patch size

1 40 times 30 5 128 times 128 2 4 8 16 32

2 80 times 60 5 256 times 256 4 8 16 32 64

3 160 times 120 5 512 times 512 8 16 32 64 128

08

07

06

1 2 3

Scale

sAU

C

Sparse representation

BruceKootstra

IMSAL

(a)

Non-linear representation

075

070

065

0601 2 3

Scale

sAU

C

BruceKootstra

IMSAL

(b)

Figure 5 sAUC score for sparse and nonlinear representation The values in (a) and (b) are maximum at scale 2 and scale 3 for all threedatasets

sparse representation the performance is maximum with 80times 60 pixels (scale 2) and for nonlinear representation we getgood performance at 512 times 512 pixels (scale 3) Usually theresolution of the input images is not very high so we restrictto 512 times 512 as upper bound for nonlinear representation andsame resolution with respective patch sizes is used in [17]The image resolution and patch sizes of different scales forthe both models used for evaluation are given in Table 2Based on this analysis we incorporate the parameters of scale2 and scale 3 for sparse and nonlinear representation in theproposed saliency model

Performance Comparison with Other Models The results ofour model along with comparison with 13 state-of-the-artmethods are given in Table 3 The detailed performancewith variable Gaussian smoothing is given in Figure 6 Thesimulation codes for all these methods are taken from theauthors websites We used multiscale and quaternion basedimplementation for spectral residue (MESR) PQFT [4] andDCT [49] as proposed by Schauerte and Stiefelhagen [39]which gives higher score than original methods Erdemrsquos [17]implementation with first order statistics embedded is usedfor simulation since it gives higher sAUC score Results withthe proposed technique are quite consistent and the proposedmethod outperforms the state-of-the-artΔQDCT [39]model

on the Toronto dataset and performs comparatively well onthe Kootstra dataset and IMSAL dataset No single modelperforms well on all these datasets and the performanceof other models changes with the dataset but our modelshows consistency and remained either ranked 1 or ranked2 on these datasets We believe that the high eye fixationprediction is due to the adaptive nature of the model anddue to dual representation of features The adaptivity makesa feature space optimal for the current image thus a moreaccurate representation of the features is possible which inturn accounts for better saliency map estimation Moreovera single representation may not be enough for every caseFinally these results can improve further if we use a multi-scale representation for ICA based representation which weskipped due to computational time constraints

43 Salient Object Detection A saliency map can be usedto detect a salient object in an image The basic premiseis that if an image consists of an object which stands outfrom the rest of the image then it should be identifiedby a saliency algorithm There is a different branch invisual saliency modeling which consists of models that arespecifically designed to detect salient objects These modelsfind the salient object in an image and then segment thewhole

The Scientific World Journal 9

000 002 004 006 008

120590

072

070

068

066

064

062

060

Toronto dataset

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(a)

062

060

058

056

054

052

Kootstra dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(b)

078

076

074

072

070

068

066

IMSAL dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFT

ERDEMQDCTAWSProposed

sAU

C

LG

010 012 014 016

(c)

Figure 6 Toronto Kootstra and IMSAL datasets with variable Gaussian smoothing for all algorithms in comparison The 119909-axis representsthe 120590 of the smoothing Gaussian (in image width) (In (c) only GBVS LG and ΔQDCT are taken from [48])

10 The Scientific World Journal

Table3Com

paris

onof

sAUCscoreo

fthe

prop

osed

mod

elwith

13otherstate-of-the-artmod

els

Dataset

AIM

[16]

GBV

S[37]

MES

R[39]

ICL[46]

Itti[37]

MPQ

FT[39]

SDSR

[26]

SUN[47]

HFT

[2]

LG[29]

ERDM

[17]

ΔQDCT

[39]

AWS[23]

Prop

osed

TORO

NTO

[16]

0699

064

0407141

064

76064

8107119

07062

066

6106915

06939

07023

07172

07130

07179

Optim

al120590

004

001

005

003

001

004

003

003

006

004

004

005

001

004

KOOTS

TRA[34]

05901

05572

05978

05769

05689

05985

05984

05613

05911

05938

0603

05998

06185

06157

Optim

al120590

003

001

004

001

004

005

004

002

005

001

003

003

001

003

IMSA

L[2]

07424

lowast076

6507153

07364

07512

071694

07403

06955

07419lowast07356

07455

lowast07434

07468

07560

Optim

al120590

008

007

015

013

009

014

013

016

009

012

012

010

005

009

Thetop

rank

edmod

elisin

boldfont

andsecond

rank

edmod

elisin

italic

fontH

erer

esultswith

optim

alaverages

AUCof

each

metho

dwith

thec

orrespon

ding

Gaussianblur(120590)are

repo

rtedA

sin[47]w

erepeat

thes

AUCcalculationfor2

0tim

esandcompu

tethes

tand

arddeviationof

each

averages

AUC

which

ranges

from1119864minus4to5119864minus4(lowastvalues

takenfro

m[48])

The Scientific World Journal 11

Table4Com

paris

onof

AUCandPo

DSC

scoreo

fthe

prop

osed

mod

elwith

otherstate-of-the-artmod

els

Algorith

mC1

C2C3

C4C5

C6Av

gAU

CPo

DSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AvgAU

CAIM

0921

0653

0918

0543

0957

0461

0904

0435

0924

0508

0941

0671

0927

MES

R0798

0494

0854

0436

0927

0377

0718

0238

0837

0398

0918

0617

0842

ICL

0928

0681

0907

0501

0909

0348

0926

0537

0902

0498

0914

0569

0914

Itti

0939

0687

0924

0542

0952

0457

0882

0423

0924

0499

0941

0631

0927

MPQ

FT0805

0497

0865

0452

0927

0386

0735

0248

0852

0436

0921

0630

0851

SDSR

0879

0618

0908

0511

0948

0435

0801

0298

0904

0489

0933

064

50896

SUN

0766

0459

0787

0349

0880

0362

0708

0263

0719

0283

0817

0451

0780

HFT

0937

070

00923

0552

0937

044

8096

4064

80933

0554

096

0070

60942

LG0814

0515

0859

046

40904

0379

0635

0145

0862

0499

0884

0572

0826

ERDEM

0882

0617

0912

0555

0962

0526

0840

0349

0923

0575

0916

0635

0906

MQDCT

0840

0564

0894

0517

0947

0489

0806

0311

0888

0524

0923

0670

0883

AWS

0894

0625

0922

0569

0957

0480

0891

0402

0920

0534

0925

0655

0918

Prop

osed

0920

0653

094

3059

60976

0555

0934

0516

094

20532

0956

0688

094

5Th

etop

rank

edmod

elisin

bold

font

andthe2

ndrank

edmod

elisin

italic

fontTh

eoverallbestaverageA

UCscoreisg

iven

byou

rpropo

sedmod

el

12 The Scientific World Journal

Input Proposed AIM GBVS MESR ICL Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 7 Response of our algorithm and other 13 state-of-the-art algorithms on various psychological patterns

extent of the object and thus solve this task like segmentation-type binary labeling formulation [50 51] In contrast ourmodel is designed for location based (eye fixation) saliencymodeling and it is not designed to capture exact objectboundaries however by thresholding a saliency map we canget a binary map that can be used for testing performance ofa model for salient object detection Since our saliencymodelis location based we only use other location based modelsin comparison for a fair evaluation similar convention isfollowed in [2 17]

Dataset and Metric for Evaluation For salient object detec-tion the metric used by Li et al [2] is area under thecurve (AUC) and dice similarity coefficient (DSC) on IMSAL[2] dataset We will use the same metric and dataset forsalient object detectionThe DSC gives the overlap between athreshold saliency map and the ground truth Moreover thepeak value of the DSC [52] is considered an important way toestablish the best performance of an algorithm at an optimalthreshold thus we also give results with peak value of theDSC curve (PoDSC) Since AUC can be influenced by centerbias for fair comparison we turn off the center bias in allthe algorithmsThe GBVS [37] has built-in center bias and inorder to cater for that the author in [2] incorporates explicitcenter bias and shows that HFT [2] performs better thanGBVS on the same dataset which is also used in our paperWe do not employ center bias explicitly or implicitly in thepresented results and addedHFT instead for our comparisonTherefore GVBS [37] is skipped from Table 4 Furthermorewe perform Gaussian smoothing in all the algorithms tofind the optimal smoothing parameters for each class in thedataset and the optimal performance is quoted in the resultsgiven in Table 4

Performance We present results in comparison with other 12state-of-the-art algorithms The complete results are given in

Table 4 Our proposed scheme gives the best performanceon three categories C2 C3 and C5 and ranked secondon C4 and C6 Our model gives the highest average AUCscore on this dataset On different categories our results arecomparative to the HFT which is state of the art on thisdataset Apart fromHFT the performance is also significantlybetter in comparison to other algorithmsThedataset used forcomparison is quite complex but our algorithm performedwell for intermediate small objects with distracters althoughperformance on other cases is little less than other state-of-the-art algorithms

44 Psychological Patterns Wealso tested our saliencymodelon psychological patterns which are commonly used to givea qualitative performance on artificial scenarios which simu-late a pop-up phenomenonThese patterns simulate the pop-out phenomenon based on color intersection symmetryorientation curvature and candles image In order to checkthe general performance on various psychological tasks wetested the proposed model on eight psychological patternsFigure 7 gives the results of the proposed saliency modelalong with other popular models The proposed algorithmworks well on color symmetry and orientation as well as oncandle image but the performance is not good for curvatureand intersection patterns Figure 7 also shows that any singlemodel does not give good performance on all the patterns andthe best performance is more or less the same as given by theproposed scheme

5 Discussion

The image features representation drastically affects theinformation content and thus saliency estimation We useda bioinspired center surround saliency computation on twoparallel feature representations that give good performance

The Scientific World Journal 13

(a)

(b) (c)

(d) (e)

Figure 8 The fixed learned dictionary and the information lost by using such dictionary (a) Fixed learned dictionary ((b)-(c)) The inputimages ((d)-(e)) The information lost by using dictionary given in (a)

on both eye fixation prediction and salient object detectionSince a CSD operation depends on the difference between acenter portion and its surroundings a better image contentsrepresentation makes a more accurate and precise centersurround operation

We used an adaptive sparse representation to boost theperformance of the CSD operation In order to show theeffectiveness of the proposed approach we present bothquantitative and qualitative results For qualitative compar-ison we use a fix dictionary [29] (Figure 8(a)) learnt from

an ensemble of natural images We show that some of theinformation is lost if we use a fixed dictionary to represent animage and usually the lost information belongs to the salientregion in an image The difference between an input imageand the image reconstructed by a fixed dictionary is givenin Figure 8 The red cylinder Figure 8(b) and red box withtext Figure 8(c) is visible in the image plots Figures 8(d)and 8(e) These two objects are the salient features in bothimages and their appearance in the residual image shows thatcurrent representation which uses a fixed dictionary loses

14 The Scientific World Journal

Input Gnd truth Proposed AIM GBVS Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 9 Comparison of our method with other state-of-the-art saliency models The first column is the input image with second columnbeing ground truth followed by our proposed saliency model Results from all the other models are plotted in the same row

Table 5 sAUC score comparison using fixed [29] and adaptivesparse representation (proposed) for CSD saliency

DatasetBorji local [29] Fixeddictionary sparserepresentation

Proposed Adaptivedictionary sparserepresentation

Toronto 0670 07110Kootstra 0572 06061

Table 6 sAUC comparison of E Erdem and A Erdem [17] and theproposed color based nonlinear integration

Dataset E Erdem andA Erdem [17]

Proposed Nonlinearrepresentation

Bruce and Tsotsos [16] 07023 07084Kootstra et al [34] 0603 06152

some information that belong to salient portion of the inputimage

For quantitative comparison we employ sAUC to dothe comparison between saliency maps based on a fixeddictionary and the adaptive basis dictionary Table 5 gives acomparison of the both approaches on two datasets Torontoand Kootstra The performance difference is quite significantand it shows that an adaptive representation is much betterBased on these qualitative and quantitative results we canconclude that an adaptive image presentation is more viableand accurate for CSD based saliency computation

A model proposed in [17] gives an idea of nonlinearintegration of all features by covariance matrices and thesupported implementation uses color gradient and spatialinformation Our second contribution is the modification ofthat model and a proposal of using only color with spatial

information in nonlinearly integrated manner using samecovariance matrices For our proposed implementation(seeFigure 4) we modified the features used in [17] and comparethe saliencymap with Erdemrsquos [17] model in Table 6 using thesAUC score Two databases Toronto [16] and Kootstra [34]are used for simulations and the results indicate that by usingonly color with spatial information we can get better sAUCscore than integrating all features using covariance matricesIn Table 6 the difference in sAUC score is quite visible onboth datasets One possible reason of this improvement maybe that the correlation among different features like color andorientation is different and thus using a covariance basedrepresentation does not capture the underlying informationstructure in an efficient way as compared to when only colorinformation is used

One possible argument against our usage of only colorinformation however can be that without any gradient ororientation information a saliency model will fail to detectmany salient regions This argument can also supported bynature since neurons tuned to orientation in an image areknown to contribute to saliency computation [53] In ourmodel this issue is addressed by the sparse representationwhere the adaptive basis same as basis shown in Figure 8(a)is Gabor like filters with edge like structure and thus thesebases efficiently capture orientation information from theimage which complements our color information in nonlin-ear representation

Finally the sAUC score of dual representation Table 3shows that we achieve better eye fixation prediction thantreating both representations separately as shown in Tables5 and 6We believe that such improvement is due to the com-plementary behavior of both techniques since a combinedapproach better represents image contents with high fidelityand thus that in turn improves saliency detection Lastly for

The Scientific World Journal 15

illustration and visual comparison we present some saliencymaps produced by our algorithm along with other models inFigure 9

6 Conclusion

This paper shows that dual feature representationmanages torobustly capture image information which can be used in acenter surround operation to compute saliencyWe show thata CSD on adaptive sparse basis gives better results than a fixsparse basis representation In nonlinear representation weshow that nonlinearly integrated color channels with spatialinformation better capture underlying data structure andthus a CSD on such representation gives good results Finallywe consider both representations as complementary and thusa fused saliencymap not only give good results on human eyefixations but also detect salient objects with high accuracy

In future wewill incorporate some top-downmechanismto better imitative human saliency computation capabilitiesbased on learning and experience Another possible exten-sion of the existing work is to test dynamic scenes video byincorporating additional motion information in the currentscheme

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the national Natural Sci-ence Foundation of China (61175096) and Specialized Fundfor Joint Building Program of Beijing Municipal EducationCommission

References

[1] C Siagian and L Itti ldquoBiologically inspired mobile robot visionlocalizationrdquo IEEE Transactions on Robotics vol 25 no 4 pp861ndash873 2009

[2] J Li M D Levine X An X Xu and H He ldquoVisual saliencybased on scale-space analysis in the frequency domainrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol35 no 4 pp 996ndash1010 2012

[3] Y Su Q Zhao L Zhao and D Gu ldquoAbrupt motion trackingusing a visual saliency embedded particle filterrdquo Pattern Recog-nition vol 47 no 5 pp 1826ndash1834 2014

[4] C Guo Q Ma and L Zhang ldquoSpatio-temporal saliency detec-tion using phase spectrum of quaternion fourier transformrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo08) pp 1ndash8 June 2008

[5] X Hou and L Zhang ldquoThumbnail generation based on globalsaliencyrdquo in Advances in Cognitive NeurodynamicsmdashICCN2007 pp 999ndash1003 Springer Amsterdam The Netherlands2007

[6] A Borji and L Itti ldquoState-of-the-art in visual attention mod-elingrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 35 no 1 pp 185ndash207 2013

[7] L Itti C Koch and E Niebur ldquoAmodel of saliency-based visualattention for rapid scene analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 20 no 11 pp 1254ndash12591998

[8] AM Treisman ldquoFeature integration theoryrdquoCognitive Psychol-ogy vol 12 no 1 pp 97ndash136 1980

[9] J K Tsotsos S M Culhane W Y Kei Wai Y Lai N Davisand F Nuflo ldquoModeling visual attention via selective tuningrdquoArtificial Intelligence vol 78 no 1-2 pp 507ndash545 1995

[10] R Rae Gestikbasierte Mensch-Maschine-Kommunikation aufder grundlage visueller aufmerksamkeit und adaptivitat [PhDthesis] Universitat Bielefeld 2000

[11] D Parkhurst K Law and E Niebur ldquoModeling the role ofsalience in the allocation of overt visual attentionrdquo VisionResearch vol 42 no 1 pp 107ndash123 2002

[12] J Li Y Tian T Huang and W Gao ldquoProbabilistic multi-tasklearning for visual saliency estimation in videordquo InternationalJournal of Computer Vision vol 90 no 2 pp 150ndash165 2010

[13] MCerf J HarelW Einhauser andCKoch ldquoPredicting humangaze using low-level saliency combined with face detectionrdquo inAdvances in Neural Information Processing Systems vol 20 pp241ndash248 2007

[14] A Torralba ldquoModeling global scene factors in attentionrdquoJournal of the Optical Society of America A Optics and ImageScience and Vision vol 20 no 7 pp 1407ndash1418 2003

[15] A Oliva and A Torralba ldquoModeling the shape of the scenea holistic representation of the spatial enveloperdquo InternationalJournal of Computer Vision vol 42 no 3 pp 145ndash175 2001

[16] N D B Bruce and J K Tsotsos ldquoSaliency attention and visualsearch an information theoretic approachrdquo Journal of Visionvol 9 no 3 article 5 2009

[17] E Erdem and A Erdem ldquoVisual saliency estimation by nonlin-early integrating features using region covariancesrdquo Journal ofVision vol 13 no 4 article 11 2013

[18] J H van Hateren ldquoReal and optimal neural images in earlyvisionrdquo Nature vol 360 no 6399 pp 68ndash70 1992

[19] D J Field ldquoRelations between the statistics of natural imagesand the response properties of cortical cellsrdquo Journal of theOptical Society of America A Optics and Image Science vol 4no 12 pp 2379ndash2394 1987

[20] B A Olshausen and D J Field ldquoNatural image statistics andefficient codingrdquo Network Computation in Neural Systems vol7 no 2 pp 333ndash339 1996

[21] M Kwon G Legge F Fang A Cheong and S He ldquoIdentifyingthe mechanism of adaptation to prolonged contrast reductionrdquoJournal of Vision vol 9 no 8 p 976 2009

[22] X Sun H Yao and R Ji ldquoVisual attention modeling based onshort-term environmental adaptionrdquo Journal of Visual Commu-nication and Image Representation vol 24 no 2 pp 171ndash1802013

[23] A Garcia-Diaz X R Fdez-Vidal X M Pardo and R DosilldquoSaliency from hierarchical adaptation through decorrelationand variance normalizationrdquo Image and Vision Computing vol30 no 1 pp 51ndash64 2012

[24] C E Shannon ldquoAmathematical theory of communicationrdquo BellSystem Technical Journal vol 27 no 3 pp 379ndash423 1948

[25] Y Karklin and M S Lewicki ldquoEmergence of complex cellproperties by learning to generalize in natural scenesrdquo Naturevol 457 no 7225 pp 83ndash86 2009

[26] H J Seo and P Milanfar ldquoStatic and space-time visual saliencydetection by self-resemblancerdquo Journal of Vision vol 9 no 12pp 1ndash27 2009

16 The Scientific World Journal

[27] D J Jobson Z-U Rahman and G AWoodell ldquoProperties andperformance of a centersurround retinexrdquo IEEE Transactionson Image Processing vol 6 no 3 pp 451ndash462 1997

[28] A Guo D Zhao S Liu X Fan and W Gao ldquoVisual attentionbased image quality assessmentrdquo in Proceedings of the 18thIEEE International Conference on Image Processing (ICIP rsquo11) pp3297ndash3300 September 2011

[29] A Borji and L Itti ldquoExploiting local and global patch raritiesfor saliency detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo12) pp 478ndash485 2012

[30] L Duan C Wu J Miao L Qing and Y Fu ldquoVisual saliencydetection by spatially weighted dissimilarityrdquo in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recogni-tion (CVPR rsquo11) pp 473ndash480 June 2011

[31] JMWolfe ldquoGuided search 20 a revisedmodel of visual searchrdquoPsychonomic Bulletin amp Review vol 1 no 2 pp 202ndash238 1994

[32] O Le Meur P Le Callet D Barba and DThoreau ldquoA coherentcomputational approach to model bottom-up visual attentionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 28 no 5 pp 802ndash817 2006

[33] A Torralba A Oliva M S Castelhano and J M HendersonldquoContextual guidance of eye movements and attention in real-world scenes the role of global features in object searchrdquoPsychological Review vol 113 no 4 pp 766ndash786 2006

[34] G Kootstra N Bergstrom and D Kragic ldquoUsing symmetry toselect fixation points for segmentationrdquo in Proceedings of the20th International Conference on Pattern Recognition (ICPR rsquo10)pp 3894ndash3897 August 2010

[35] C Li J Xue N Zheng X Lan and Z Tian ldquoSpatio-temporalsaliency perception via hypercomplex frequency spectral con-trastrdquo Sensors vol 13 no 3 pp 3409ndash3431 2013

[36] L Itti and P Baldi ldquoBayesian surprise attracts human attentionrdquoVision Research vol 49 no 10 pp 1295ndash1306 2009

[37] J Harel C Koch and P Perona ldquoGraph-based visual saliencyrdquoin Advances in Neural Information Processing Systems vol 19pp 545ndash552 2006

[38] X Hou and L Zhang ldquoSaliency detection a spectral residualapproachrdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo07)pp 1ndash8 June 2007

[39] B Schauerte and R Stiefelhagen ldquoQuaternion-based spectralsaliency detection for eye fixation predictionrdquo in ComputerVisionmdashECCV 2012 A Fitzgibbon S Lazebnik P Perona YSato and C Schmid Eds Lecture Notes in Computer Sciencepp 116ndash129 2012

[40] W Kienzle F Wichmann B Scholkopf and M Franz ANonparametric Approach to Bottom-Up Visual Saliency 2007

[41] J Tilke K Ehinger F Durand and A Torralba ldquoLearningto predict where humans lookrdquo in Proceedings of the 12thInternational Conference on Computer Vision (ICCV rsquo09) pp2106ndash2113 October 2009

[42] W Wang Y Wang Q Huang and W Gao ldquoMeasuring visualsaliency by site entropy raterdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 2368ndash2375 June 2010

[43] O Tuzel F Porikli and P Meer ldquoRegion covariance afast descriptor for detection and classificationrdquo in ComputerVisionmdashECCV 2006 vol 3952 of Lecture Notes in ComputerScience pp 589ndash600 2006

[44] httpssitesgooglecomsitesparsenonlinearsaliencymodelhomedownloads

[45] A Hyvarinen and E Oja ldquoA fast fixed-point algorithm forindependent component analysisrdquo Neural Computation vol 9no 7 pp 1483ndash1492 1997

[46] X Hou and L Zhang ldquoDynamic visual attention searching forcoding length incrementsrdquo in Advances in Neural InformationProcessing Systems vol 21 pp 681ndash688 2008

[47] L Zhang M H Tong T K Marks H Shan and G WCottrell ldquoSUN a Bayesian framework for saliency using naturalstatisticsrdquo Journal of Vision vol 8 no 7 article 32 pp 1ndash202008

[48] J Zhang and S Sclaroff ldquoSaliency detection a boolean mapapproachrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV rsquo13) 2013

[49] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[50] E Rahtu J Kannala M Salo and J Heikkila ldquoSegmentingsalient objects from images and videosrdquo in Computer VisionmdashECCV 2010 vol 6315 of Lecture Notes in Computer Science pp366ndash379 2010

[51] M-M Cheng G-X Zhang N J Mitra X Huang and S-M Hu ldquoGlobal contrast based salient region detectionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo11) pp 409ndash416 June 2011

[52] T Veit J-P Tarel P Nicolle and P Charbonnier ldquoEvaluationof road marking feature extractionrdquo in Proceedings of the11th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo08) pp 174ndash181 December 2008

[53] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model of simple receptive fields in catstriate cortexrdquo Journal of Neurophysiology vol 58 no 6 pp1233ndash1258 1987

Page 2: Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

2 The Scientific World Journal

drive visual attention could be motion occlusion like opticalflow [9] skin hue [10] texture contrast [11] wavelet [12] face[13] and gist [14 15] A summary of various features usedin saliency computation can be found in [6] Here we limitour discussion to sparse [16] and nonlinear representations[17]

Adaptive Sparse Representation A simple cell in visual cortexis characterized by its location within the field of view itsspatial frequency selection and its orientation It is believedthat the visual cortex is being evolved in such a way that itcould efficiently process natural images the kind of visualstimuli it would experience in natural conditions Learningstatistics of natural images could lead to development ofsimple cell like receptive fields Thus a number of studies[18 19] have used learningmethods alongwith natural stimulito this end For example Olshausen and Field [20] haveshown that assuming sparseness the basis functions learntfrom an ensemble of natural images can fulfill the propertiesof a simple cellrsquos receptive field

Sparse representation means a representation of the datasuch as the constituent components are rarely active Forsuch a representation a dictionary of basis is learnt froman ensemble of natural image patches with a condition thatthe respective weights of the basis coefficients are sparseand rarely active and most of the time have zero valueSparse representation is also an efficient way of process-ing images for various applications like classification facerecognition image denoising and saliency computation[16]

An important observation about human visual systems isits adaption to a new environment Several studies [21] haveshown an adaptive behavior of neurons in visual cortex Basedon such observations a saliency model utilizing an adaptivesparse representation has been proposed [22] The basis orthe dictionary for an adaptive sparse representation does notremain fixed but changes with every stimulus and thus betterrepresents the current environment AWS [23] also used anadaptation but by whitening the features as per particularimage structure

Independent component analysis (ICA) is a very populartechnique in computer vision for multilinear data analysisICA gives basis functions which are statistically independentas well as non-Gaussian The aim of an ICA algorithm is torecover independent basis from their observed linear mix-ture An image is represented using an ICA basis functionsthen the coefficients of these basis are sparse similar to neuralreceptive fields in the visual cortex [20]

In our two pronged approach one deals with an adaptivesparse representation of natural images We approach thispart using ICA basis functions learnt from individual imagesThe resulting dictionary changes as images are introducedand are hence considered to be adaptive in nature Thisrepresentation is similar to that in [22] where a globalapproach [6] based on Shannon theory [24] is used toestimate saliency Our contribution to this end is the useof a different biologically plausible mechanism to estimatesaliency in an adaptive sparse representation We later showthat an accurate representation of the stimulus leads to a

reasonably good accuracy in computing the center surrounddifference (CSD)

Nonlinear Representation In visual saliency estimation usu-ally the feature representation takes the form of a lineartransformation of the original data and various populartransformations like ICA factor analysis projection pursuitor principal component analysis (PCA) are used for thispurpose There are some biological evidences which supportthe use of nonlinear feature representations In [25] theauthors showed that the invariance property in V1 can beresult of a nonlinear operation on features The literature onnonlinear representation of features for the saliencymodelingis very limited The properties of nonlinear representationdepend on the nonlinear kernel and the features input tothat kernel In [26] local steerable kernels LSK with onlygradient information are used for such a representation Arecent approach [17] shows that all features are combinedin matrix form and then a covariance based integratednonlinear representation gives good results These nonlinearrepresentations claim to better capture the data structure thana linear representation and integrate all features to give asingle unified representation as used by both [17 26]

Inspired by the concept in [17] we choose covariancebased feature representation but we do not combine allfeatures and channels to form integrated covariancematricesInstead we rather modify the approach and show that usingonly color with spatial information integrated in a nonlinearcovariance representation performs better than all featuresintegrated nonlinearly [17] Here also CSD is utilized tocompute saliency In Section 5 we will give a comparisonto show that the proposed representation gives better eye-fixation predictions Also the use of only color informationand how it is complemented by the sparse representation willbe explained there

There is a body of literature [27] on biological plausibilityof CSD mechanism CSD means that a stimulus area isconspicuous if it is different than immediate surroundingsThis mechanism is utilized for saliency estimation in manyforms Itti et al [7] used difference-of-Gaussian (DoG) filterGao [28] utilized KL-divergence based on histograms andBorji and Itti [29] used average weighted patch dissimilarityFor our model we also dwell on average weighted patchdissimilarity [22 29 30] It is explained in Figure 1 the redwindow is the stimulus patch under consideration and inorder to compute CSD we calculate the difference (norm)with all the neighboring patches (yellowwindow) followed byweighting and averaging the difference values Here weightsdepend upon the distance between centers of two patchesSame procedure is then repeated for all the patches in theimage and thus CSD assigns higher value to patches whichare significantly different than the surroundings

To summarize in this paper restricting ourselves onlyto bottom up visual saliency our aim is to predict humaneye fixation with a saliency model that utilizes dual imagerepresentation We use both an adaptive sparse and a covari-ance based approach for image feature representation Acenter surround difference (CSD) approach is used on bothrepresentations to compute saliency maps To the best of our

The Scientific World Journal 3

Surround

Center

Figure 1 A depiction of center surround difference where thepatch in consideration is in red rectangle and the patches inyellow rectangular region are surrounding area from which thedissimilarity is checked

knowledge this is the first time that a CSD mechanism isusedwith an adaptive sparse image representationMoreoverthe proposed scheme of using only color information in non-linear form remarkably improve results Both saliency mapsare fused at a later stage to form a net saliency map whichrepresents salient features better than saliency maps from thetwo independent representations

This paper is organized as follows Related work is givenin Section 2 Section 3 covers proposed model and the math-ematical formulation of both representations and saliencycomputation Section 4 covers experimentation and resultssection A detailed discussion about the contribution in bothrepresentations along with certain necessary comparisons isgiven in Section 5 followed by conclusion in Section 6

2 Related Work

Initial visual saliency models take inspiration from featureintegration theory (FIT) [8] and guided search models [31]The first practical implementation of a saliency model basedon these theories is presented by Itti et al [7] where a numberof contrast features are learnt in parallel and fused together tomake a topographic map Later Le Meur et al [32] presentedanother cognitive model based on a contrast sensitivityfunction along with the incorporation of a CSD mechanismand perceptional decomposition There are numerous othermodels proposed which utilize different bioinspired featuresand mechanisms for saliency detection like GIST [33] PCA[30] ICA [16] histogram of local orientations symmetry[34] depth entropy [16] texture and motion [35]

Apart from cognitive models different probability basedmodels are also presented These models incorporate imagestatistics and learn probability distributions using currentimage or ensemble of images for saliency estimation Itti andBaldi [36] defined a Bayesian surprise usingKullback-Leibler(KL) distance for posterior and prior beliefs Torralba et al[33] used contextual evidence to further consolidate low levelsaliency Harel et al [37] approached the saliency problemusing the probabilistic graphical models Hou and Zhang[38] used a global approach based on Fourier transformto find saliency He proposed that the residue of original

and smoothed amplitude spectrum of a Fourier transformcontains information about salient region in an image Laterit was shown [4] that the phase of a Fourier transformcontains the essential location information rather than theamplitude There are several other models which use Fouriertransform and are classified as frequency based models [2]In these models respective color channels are individuallyprocessed for saliency detection and finally all maps arefused in a single saliency map In contrast a quaternionapproach is proposed [39] to use a unified frame work toprocess all color channels The quaternion framework alsoallows incorporation of a fourth channel like motion in avery elegant manner Some models use learning techniquesthat incorporate human eye fixation in their saliency modelsKienzle et al [40] used Human eye fixations for derivinga learned model while Tilke et al [41] trained supportvector machines (SVM) on image patches with low inter-mediate and high level features to compute saliency Apartfrom above-mentioned approaches different new techniqueslike redundancy rectangular window SNR and regressionhave shown remarkable results in saliency modeling Ina different approach Wang et al [42] proposed site rateentropy for computing saliency using framework of graphicalmodels

Some approaches use information theoretic frameworksto model saliency like Bruce and Tsotsos [16] gives theidea of information maximization (AIM) Using a biologicalstructure [20] of sparse representation and Shannonrsquos [24]formulation of self-information inverse probability of a patchin the entire image Bruce andTsotsos [16] computed saliencyThis self-information can be considered as a global measureof saliency There are various extensions of using sparserepresentations of images using a learned dictionary forsaliency computation Recently Sun et al [22] proposed thatsince biological systems are adaptive an adaptive dictionaryis more representative and thus he used principle of self-information for saliency computation using adaptive basisAWS [23] also used adaption and it works on the principlethat a statistical distance in a representative space givessaliency This representative space is computed by whiteningthe basis to the structure of a particular image This schemeuses multiscale and multistage operations on features anduses an efficient way to overcome the computation complex-ity in whitening

In [29] Borji and Itti proposed that local and globalmeasures are complementary and used both center surroundand self-information for saliency computation Moreoverthey showed that multiple color spaces are useful in bettersaliency estimation There are some saliency models [17 26]which rely on nonlinear representation of features and on theintegration of various features and channels In [26] gradientfeatures are used in a nonlinear representation based onlocal steerable kernels (LSK) for image representation whilethe author in [17] proposes a nonlinear integration usingcovariance matrices This paper also incorporates first orderimage statistics in covariance matrices to better estimatesaliency Moreover [17 26] also solve various features andrespective channels integration issue by putting forth a singleunified form

4 The Scientific World Journal

Our saliency model is inspired from mainly two typesof models sparse [16 20] and nonlinear representation [27]We propose a novel dual approach based on both sparseand nonlinear feature representation Inspired by biologicalevidence of neural receptive field properties [20] that effi-ciently process natural images in a sparse manner we use asparse image representation Moreover in order to representadaptivity of neurons to better tackle a new environment weuse an adaptive basis dictionary in an ICA approach [22]Thus our proposed method simultaneously uses sparsity andadaptivity In literature the model similar to our adaptivesparse representation is [22] In [22] Sun et al used aninformation theoretic global approach for saliency computa-tion but we use a more bioplausible local CSD for saliencycomputation Secondly we propose nonlinearly integratedrepresentation of single feature channels color along withspatial information for saliency computation Our approachis a modification of the model proposed in [17] whereall features and channels are nonlinearly integrated usingcovariance matrices although we propose that only colorinformation is enough and it can better estimate saliency inour framework Here also a CSD approach is used for saliencycomputation Finally a combined saliency map is formed byfusing the outputs given by the two representations

Contributions Major contributions of this work can besummarized as follows

(1) A novel dual image feature representation simultane-ous sparse and nonlinear feature representation

(2) CSD based saliency computation in adaptive sparserepresentation

(3) Only color based nonlinear integrated covariancerepresentation followed by CSD computation

(4) Improved results with comparison to other states ofthe models that are established by extensive testingon popular eye fixation prediction datasets and on asalient object detection dataset

3 Proposed Model

Our proposed scheme is given in Figure 2 An input imageis being simultaneously represented in sparse and nonlinearform Then saliency is computed by local center surroundoperation and finally both maps are combined to form asingle saliency map

For sparse representation we break an image into patchesand perform independent component analysis to derive basisLater these basis with sparse coefficient are used to representthe image Furthermore again after converting image intopatches we only take color information and integrate allchannels in a nonlinear fashion using covariance matricesalong with spatial information to represent an image patch

31 Mathematical Modeling In this section we cover math-ematical formulation of sparse representation nonlinearrepresentation and saliency computation Some necessarydiscussion is included to elaborate few concepts and also

some references will be given to avoid unnecessary formu-lation of the well-known concepts

Sparse Representation with ICAWe use an ICA based imagerepresentation thus an input image 119868 can be sparselyrepresented as

1198681015840

= 119860s (1)

where 119860 is a dictionary consisting of a set of basis and s con-sists of respective coefficients In our case we learn 119860 fromevery input image thus we adapt dictionary for every inputstimulusThis approach results inminimum information losswhich is a basic drawback of a fixed dictionary learned froman ensemble of images The sparse coefficients are learned byprojecting an input image to the basis such as

s = 119882119868 (2)

where

119882 = 119860minus1

(3)

The basis have dimensions the same as the patches formedfrom the input image Finally 119868

1015840 in patches form can berepresented as

1198681015840

=

1198991015840

K119896=1

1198601199011015840

119896 (4)

where 1199011015840

119896is the 119896th patchrsquos sparse coefficient vector consisting

of 1198981015840 co-efficients and there are total 119899

1015840 number of patchesin 1198681015840 Moreover K represents a function that reshape and

arranges patches at respective position to form an imageFigure 3 gives the depiction of the whole process

Nonlinear Representation with Covariance Matrices Our fea-ture matrix 119865 is based on raw RGB color space values of 119868

along with pixel position information

119865 = [119868119877 119868119866 119868119861 119909 119910] (5)

where every pixel in 119865 is a 5-dimensional vector 119891119894119894=12119901

where 119901 is the total number of pixels in the image Ourfeatures are different than those used in E Erdem andA Erdem [17] since we do not incorporate any gradientinformation in our feature matrix 119865 In (5) color along withspatial information is used rather than E Erdem and AErdem [17] approach of making a features matrix consistingof all features

The next step is nonlinear representation of 119865 usingcovariancematrices along with first order statistics [17] Tuzelet al [43] gave the concept of encoding a patch by a covariancematrix Later it was used in many applications In saliencydomain E Erdem and A Erdem [17] used patch covariancewith first order statistics for image feature representation andwe will dwell on his approach for our case Thus calculatinglocal covariance matrices for an image patch 119901

119896 we get

119862 (119901119896) =

1

119898 minus 1

119898

sum

119894=1

(119891119894minus 120583119896) (119891119894minus 120583119896)119879 (6)

The Scientific World Journal 5

Adaptive sparserepresentation

Center surroundsaliency

computation

Center surroundsaliency

computation

Image to patchesconversion

Saliency map

Color covariancenon-linear

representation

Image to patchesconversion

Π

Figure 2 Proposed model for saliency computation

Center surroundsaliency

computation

Image patches Sparse co-efficientvectors

Adaptive basis

p998400i p

998400i+1 p998400

i+2

Figure 3 Saliency computation in adaptive sparse representation

where a patch 119901119896consists of 119898 pixels with mean 120583

119896 Now

first order statistics is incorporated in the covariancematricesusing the method mentioned in [17] Then the new repre-sentation of covariance matrices with first order statisticsembedded for a patch is given by

120595119896

= 120595 (119862 (119901119896)) (7)

where function 120595 embeds first order statistics in an inputmatrix The final nonlinear feature representation of image

with 120595119896represents a 119896th patch and 119899 being total number of

patches is given by

11986810158401015840

=

119899

K119896=1

120595119896 (8)

where also K function arranges patches at respective posi-tions to form an image The whole representation is given inFigure 4Saliency Computation The saliency is computed by CSDoperation and then extended to multiple scales The CSDoperation is shown in Figure 1 where a patch under consider-ation is in red rectangle and surrounding area is highlighted

6 The Scientific World Journal

Input

RG

B

Output

Image to patchesconversion and

covariancerepresentation

First orderstatistics

embeddedPosition

information

I(x y)

Feature matrixF = [R G B x y]

Figure 4 Nonlinear representation of an input image

in yellow rectangle The saliency of red patch 119901119894 is given

by its dissimilarity between 119908 surrounding patches (yellowrectangle) as

119878 (119901119894) =

1

119908

119908

sum

119895=1

119889 (119901119894 119901119895) (9)

where dissimilarity 119889(119901119894 119901119895) between two patches is given by

119889 (119901119894 119901119895) =

10038171003817100381710038171003817120572 (119901119894) minus 120572 (119901

119895)10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(10)

where119883119894and119883

119895are the central position of the patches 119901

119894and

119901119895 For the case of sparse representation we have

120572 (119901119894) = 119901

1015840

119894(11)

and for nonlinear representation we have

120572 (119901119894) = 120595119894= 120595 (119862 (119901

119894)) (12)

Thus the saliencymap for patch 119901119894derived from 119868

1015840 and 11986810158401015840 can

be given as

1198781198681015840 (119901119894) =

1

119908

119908

sum

119895=1

100381710038171003817100381710038171199011015840

119894minus 1199011015840

119895

10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(13)

and with 120595(119862(119901119894)) being in vector form

11987811986810158401015840 (119901119894) =

1

119908

119908

sum

119895=1

10038171003817100381710038171003817120595 (119862 (119901

119894)) minus 120595 (119862 (119901

119895))

10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(14)

The multiscale saliency by sparse approach is given by

1198781198681015840 (119909) = N(prod

1198971015840120598119871

1198781198971015840

1198681015840 (119909)) (15)

and for nonlinear integrated approach is given by

11987811986810158401015840 (119909) = N(prod

119897120598119871

119878119897

11986810158401015840 (119909)) (16)

where 1198971015840 and 119897 represent the number of scales and N shows

normalization Finally saliency map becomes

119878119868

= 119866120590

⊛ (N (1198781198681015840 lowast 11987811986810158401015840)) (17)

where119866120590represents the Gaussian smoothing by convolution

⊛ operation and lowast stands for multiplication operation

4 Experimentation

In this section we thoroughly evaluate the proposed modelwith three different experiments human eye fixation pre-diction salient-object detection and response to variouspsychological patterns The human eye fixation prediction isthe basic and necessary test to check the performance of asaliency map against the collected eye fixation from severalhuman subjects

How well a saliency map distinguishes and highlights anobject in an image shows its ability of salient object detectionThe salient object detection capability of a model is evaluatedby employing some metrics that compare the generatedsaliency map against the ground truth made by manuallabeling of the salient region in an image by human subjectsThe psychological patterns give a qualitative analysis of thesaliency model These patterns are designed to check pop-upresponses in different scenarios like orientation conjunctionand color and so forth Code (Matlab P-code) of the proposedmodel used for experimentation is available online [44]

41 Parameter Setting Before pursuing the evaluation of theproposed model we fix the parameters used to generatethe saliency maps by our model These parameters willremain the same for all the experiments Derivation ofthese parameters will be discussed in the next section afterthe introduction of the datasets and the metric used forevaluation

Sparse Representation We resize all input images to 80 times

60 pixels and use only single scale 1198971015840

= 1 for saliencycomputation Patches of 5 times 5 pixels [22] are generated withsliding overlapping window from every input image to learnthe basis for the dictionary and for the saliency computation

The Scientific World Journal 7

Table 1 Following are the input image resolution used in eachalgorithm with default parameter settings Some algorithms (atleastlowast) do internally reduce dimensions for fast computation andoptimal performance

Model Acronym ResolutionAIM [16] 12(W times H)lowastGBVS [37] W times HMESR [39] 64 times 48lowastICL [46] W times HlowastItti [37] W times HMPQFT [39] 64 times 48AWS [23] W times HlowastSDSR [26] W times HSUN [47] 12(W times H)HFT [2] 128 times 128LG [29] 256 times 256ERDM [17] 512 times 512ΔQDCT [39] 64 times48lowastProposed W times H

An ICA package available online FAST ICA [45] is used forthis experimentation

Nonlinear Representation For nonlinear image representa-tion RGB color and position information is used in onlineavailable implementation of E Erdem andA Erdem [17]Thesaliency is computed with the default parameters used in [17]that have every input image being resized to 512 times 512 pixelsand five different patch sizes 119901

119894= 8 16 32 64 128 and thus

119897 = 5 are used for saliency computationFinally normalized sparse representationrsquos saliency map

is rescaled to the size of nonlinear representationrsquos saliencymap and both maps are multiplied and normalizedThen thefinal saliencymap is resized to the actual input image size andused for experimentation The input image resolutions usedin all the saliency algorithms for experimentation are givenin Table 1

42 Human Eye Fixation Prediction In order to validatethe proposed model with human eye fixation predictionssaliency maps are generated on three datasets and for a faircomparison shuffle area under the curve score(sAUC) is usedto quantify the results

Dataset A reasonable dataset that can be used for evaluationof human eye fixation prediction must be complex anddiverse enough so that performance can be thoroughly eval-uated In literature Toronto [16] and Kootstra [34] datasetsare the most popular and widely used datasets IMSAL [2] isa relatively new dataset which we also used in our evaluation

Toronto dataset was prepared by Bruce and Tsotsos [16]and it consists of 120 images each with 681 times 511 pixels Thisdataset has both indoor and outdoor imagesThe eye fixationground truth is based on 20 subjects who free viewed theimages for few seconds

Kootstra dataset was used in [34] It consists of 101 imageseach with a resolution of 1024 times 768 pixels These imagesconsist of flowers natural scenes automans and buildingsThis dataset is significantly complex because it has manyimages with no explicit salient regions The eye fixationground truth available with this dataset is based on freeviewing of 31 subjects for a few seconds

IMSAL dataset is given by Li et al [2] it consists of 235images which are collected online through an internet searchengine and some images were taken from literature Theseimages are divided into six categories with 50 images havinglarge salient regions 80 images with intermediate salientregions 60 images with small salient regions 15 imageswith cluttered backgrounds and 15 images with repeatingdistracters These images give a good benchmark for per-formance evaluation because of the significant complexitygiven by variable size of salient objects objects with clutterand objects with distracters The accompanied ground truthconsists of both eye fixation information and binary maskscreated by human subjects who manually marked the salientobject in an image

Metric for Evaluation The most popular method to evaluatethe performance of a saliency map is to calculate area underthe curve (AUC) score of receiver operating characteristics(ROC) curve At first a saliency map is thresholded andused as a binary classifier with human eye fixations actingas positive set and some other points uniformly random asnegative set to plot an ROC curve The AUC of that ROC iscalculated andused as ameasure of performance comparison

There are various variants of AUC available in literatureand the basic difference between them is the choice ofnegative set points We will use the shuffled area under thecurve sAUC score because of its ability to cater for centerbias [41] since some models implicitly incorporate centerbias which makes a fair comparison difficult to performit is becoming standard to present results with sAUC InsAUC score the positive points consists of human subjectseye fixation on that image and the negative set consists ofall the fixation of subjects on the rest of the dataset imagesThe sAUC gives a 05 score on a center Gaussian blob whichis about the same as a random or chance score whereas allthe other versions of AUC [6] give very high score becausethey are affected by the center bias For our experimentationwe used sAUC available online by Schauerte and Stiefelhagen[39] We calculate every sAUC score for 20 times [47]and then use the mean value We found that the standarddeviation of the sAUC approximately ranges from 1119864 minus 4 to5119864 minus 4 in our experiments

Performance Analysis with Resolution In order to find theoptimal parameters for the proposed model we treat bothrepresentations separately and find the best parameters foreach representation that can be incorporated in the proposedmodel We plotted both sparse and nonlinear representationwith variable resolution on all three datasets and measurethe sAUC score Using various parameters given in Table 2Figure 5 is plotted which gives the performance of bothrepresentations on the three datasets Figure 5 shows that for

8 The Scientific World Journal

Table 2 Various scales parameters which are used for evaluation of sparse and nonlinear representation

Scale Adaptive sparse representation Nonlinear representationResolution Patch size Resolution Patch size

1 40 times 30 5 128 times 128 2 4 8 16 32

2 80 times 60 5 256 times 256 4 8 16 32 64

3 160 times 120 5 512 times 512 8 16 32 64 128

08

07

06

1 2 3

Scale

sAU

C

Sparse representation

BruceKootstra

IMSAL

(a)

Non-linear representation

075

070

065

0601 2 3

Scale

sAU

C

BruceKootstra

IMSAL

(b)

Figure 5 sAUC score for sparse and nonlinear representation The values in (a) and (b) are maximum at scale 2 and scale 3 for all threedatasets

sparse representation the performance is maximum with 80times 60 pixels (scale 2) and for nonlinear representation we getgood performance at 512 times 512 pixels (scale 3) Usually theresolution of the input images is not very high so we restrictto 512 times 512 as upper bound for nonlinear representation andsame resolution with respective patch sizes is used in [17]The image resolution and patch sizes of different scales forthe both models used for evaluation are given in Table 2Based on this analysis we incorporate the parameters of scale2 and scale 3 for sparse and nonlinear representation in theproposed saliency model

Performance Comparison with Other Models The results ofour model along with comparison with 13 state-of-the-artmethods are given in Table 3 The detailed performancewith variable Gaussian smoothing is given in Figure 6 Thesimulation codes for all these methods are taken from theauthors websites We used multiscale and quaternion basedimplementation for spectral residue (MESR) PQFT [4] andDCT [49] as proposed by Schauerte and Stiefelhagen [39]which gives higher score than original methods Erdemrsquos [17]implementation with first order statistics embedded is usedfor simulation since it gives higher sAUC score Results withthe proposed technique are quite consistent and the proposedmethod outperforms the state-of-the-artΔQDCT [39]model

on the Toronto dataset and performs comparatively well onthe Kootstra dataset and IMSAL dataset No single modelperforms well on all these datasets and the performanceof other models changes with the dataset but our modelshows consistency and remained either ranked 1 or ranked2 on these datasets We believe that the high eye fixationprediction is due to the adaptive nature of the model anddue to dual representation of features The adaptivity makesa feature space optimal for the current image thus a moreaccurate representation of the features is possible which inturn accounts for better saliency map estimation Moreovera single representation may not be enough for every caseFinally these results can improve further if we use a multi-scale representation for ICA based representation which weskipped due to computational time constraints

43 Salient Object Detection A saliency map can be usedto detect a salient object in an image The basic premiseis that if an image consists of an object which stands outfrom the rest of the image then it should be identifiedby a saliency algorithm There is a different branch invisual saliency modeling which consists of models that arespecifically designed to detect salient objects These modelsfind the salient object in an image and then segment thewhole

The Scientific World Journal 9

000 002 004 006 008

120590

072

070

068

066

064

062

060

Toronto dataset

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(a)

062

060

058

056

054

052

Kootstra dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(b)

078

076

074

072

070

068

066

IMSAL dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFT

ERDEMQDCTAWSProposed

sAU

C

LG

010 012 014 016

(c)

Figure 6 Toronto Kootstra and IMSAL datasets with variable Gaussian smoothing for all algorithms in comparison The 119909-axis representsthe 120590 of the smoothing Gaussian (in image width) (In (c) only GBVS LG and ΔQDCT are taken from [48])

10 The Scientific World Journal

Table3Com

paris

onof

sAUCscoreo

fthe

prop

osed

mod

elwith

13otherstate-of-the-artmod

els

Dataset

AIM

[16]

GBV

S[37]

MES

R[39]

ICL[46]

Itti[37]

MPQ

FT[39]

SDSR

[26]

SUN[47]

HFT

[2]

LG[29]

ERDM

[17]

ΔQDCT

[39]

AWS[23]

Prop

osed

TORO

NTO

[16]

0699

064

0407141

064

76064

8107119

07062

066

6106915

06939

07023

07172

07130

07179

Optim

al120590

004

001

005

003

001

004

003

003

006

004

004

005

001

004

KOOTS

TRA[34]

05901

05572

05978

05769

05689

05985

05984

05613

05911

05938

0603

05998

06185

06157

Optim

al120590

003

001

004

001

004

005

004

002

005

001

003

003

001

003

IMSA

L[2]

07424

lowast076

6507153

07364

07512

071694

07403

06955

07419lowast07356

07455

lowast07434

07468

07560

Optim

al120590

008

007

015

013

009

014

013

016

009

012

012

010

005

009

Thetop

rank

edmod

elisin

boldfont

andsecond

rank

edmod

elisin

italic

fontH

erer

esultswith

optim

alaverages

AUCof

each

metho

dwith

thec

orrespon

ding

Gaussianblur(120590)are

repo

rtedA

sin[47]w

erepeat

thes

AUCcalculationfor2

0tim

esandcompu

tethes

tand

arddeviationof

each

averages

AUC

which

ranges

from1119864minus4to5119864minus4(lowastvalues

takenfro

m[48])

The Scientific World Journal 11

Table4Com

paris

onof

AUCandPo

DSC

scoreo

fthe

prop

osed

mod

elwith

otherstate-of-the-artmod

els

Algorith

mC1

C2C3

C4C5

C6Av

gAU

CPo

DSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AvgAU

CAIM

0921

0653

0918

0543

0957

0461

0904

0435

0924

0508

0941

0671

0927

MES

R0798

0494

0854

0436

0927

0377

0718

0238

0837

0398

0918

0617

0842

ICL

0928

0681

0907

0501

0909

0348

0926

0537

0902

0498

0914

0569

0914

Itti

0939

0687

0924

0542

0952

0457

0882

0423

0924

0499

0941

0631

0927

MPQ

FT0805

0497

0865

0452

0927

0386

0735

0248

0852

0436

0921

0630

0851

SDSR

0879

0618

0908

0511

0948

0435

0801

0298

0904

0489

0933

064

50896

SUN

0766

0459

0787

0349

0880

0362

0708

0263

0719

0283

0817

0451

0780

HFT

0937

070

00923

0552

0937

044

8096

4064

80933

0554

096

0070

60942

LG0814

0515

0859

046

40904

0379

0635

0145

0862

0499

0884

0572

0826

ERDEM

0882

0617

0912

0555

0962

0526

0840

0349

0923

0575

0916

0635

0906

MQDCT

0840

0564

0894

0517

0947

0489

0806

0311

0888

0524

0923

0670

0883

AWS

0894

0625

0922

0569

0957

0480

0891

0402

0920

0534

0925

0655

0918

Prop

osed

0920

0653

094

3059

60976

0555

0934

0516

094

20532

0956

0688

094

5Th

etop

rank

edmod

elisin

bold

font

andthe2

ndrank

edmod

elisin

italic

fontTh

eoverallbestaverageA

UCscoreisg

iven

byou

rpropo

sedmod

el

12 The Scientific World Journal

Input Proposed AIM GBVS MESR ICL Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 7 Response of our algorithm and other 13 state-of-the-art algorithms on various psychological patterns

extent of the object and thus solve this task like segmentation-type binary labeling formulation [50 51] In contrast ourmodel is designed for location based (eye fixation) saliencymodeling and it is not designed to capture exact objectboundaries however by thresholding a saliency map we canget a binary map that can be used for testing performance ofa model for salient object detection Since our saliencymodelis location based we only use other location based modelsin comparison for a fair evaluation similar convention isfollowed in [2 17]

Dataset and Metric for Evaluation For salient object detec-tion the metric used by Li et al [2] is area under thecurve (AUC) and dice similarity coefficient (DSC) on IMSAL[2] dataset We will use the same metric and dataset forsalient object detectionThe DSC gives the overlap between athreshold saliency map and the ground truth Moreover thepeak value of the DSC [52] is considered an important way toestablish the best performance of an algorithm at an optimalthreshold thus we also give results with peak value of theDSC curve (PoDSC) Since AUC can be influenced by centerbias for fair comparison we turn off the center bias in allthe algorithmsThe GBVS [37] has built-in center bias and inorder to cater for that the author in [2] incorporates explicitcenter bias and shows that HFT [2] performs better thanGBVS on the same dataset which is also used in our paperWe do not employ center bias explicitly or implicitly in thepresented results and addedHFT instead for our comparisonTherefore GVBS [37] is skipped from Table 4 Furthermorewe perform Gaussian smoothing in all the algorithms tofind the optimal smoothing parameters for each class in thedataset and the optimal performance is quoted in the resultsgiven in Table 4

Performance We present results in comparison with other 12state-of-the-art algorithms The complete results are given in

Table 4 Our proposed scheme gives the best performanceon three categories C2 C3 and C5 and ranked secondon C4 and C6 Our model gives the highest average AUCscore on this dataset On different categories our results arecomparative to the HFT which is state of the art on thisdataset Apart fromHFT the performance is also significantlybetter in comparison to other algorithmsThedataset used forcomparison is quite complex but our algorithm performedwell for intermediate small objects with distracters althoughperformance on other cases is little less than other state-of-the-art algorithms

44 Psychological Patterns Wealso tested our saliencymodelon psychological patterns which are commonly used to givea qualitative performance on artificial scenarios which simu-late a pop-up phenomenonThese patterns simulate the pop-out phenomenon based on color intersection symmetryorientation curvature and candles image In order to checkthe general performance on various psychological tasks wetested the proposed model on eight psychological patternsFigure 7 gives the results of the proposed saliency modelalong with other popular models The proposed algorithmworks well on color symmetry and orientation as well as oncandle image but the performance is not good for curvatureand intersection patterns Figure 7 also shows that any singlemodel does not give good performance on all the patterns andthe best performance is more or less the same as given by theproposed scheme

5 Discussion

The image features representation drastically affects theinformation content and thus saliency estimation We useda bioinspired center surround saliency computation on twoparallel feature representations that give good performance

The Scientific World Journal 13

(a)

(b) (c)

(d) (e)

Figure 8 The fixed learned dictionary and the information lost by using such dictionary (a) Fixed learned dictionary ((b)-(c)) The inputimages ((d)-(e)) The information lost by using dictionary given in (a)

on both eye fixation prediction and salient object detectionSince a CSD operation depends on the difference between acenter portion and its surroundings a better image contentsrepresentation makes a more accurate and precise centersurround operation

We used an adaptive sparse representation to boost theperformance of the CSD operation In order to show theeffectiveness of the proposed approach we present bothquantitative and qualitative results For qualitative compar-ison we use a fix dictionary [29] (Figure 8(a)) learnt from

an ensemble of natural images We show that some of theinformation is lost if we use a fixed dictionary to represent animage and usually the lost information belongs to the salientregion in an image The difference between an input imageand the image reconstructed by a fixed dictionary is givenin Figure 8 The red cylinder Figure 8(b) and red box withtext Figure 8(c) is visible in the image plots Figures 8(d)and 8(e) These two objects are the salient features in bothimages and their appearance in the residual image shows thatcurrent representation which uses a fixed dictionary loses

14 The Scientific World Journal

Input Gnd truth Proposed AIM GBVS Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 9 Comparison of our method with other state-of-the-art saliency models The first column is the input image with second columnbeing ground truth followed by our proposed saliency model Results from all the other models are plotted in the same row

Table 5 sAUC score comparison using fixed [29] and adaptivesparse representation (proposed) for CSD saliency

DatasetBorji local [29] Fixeddictionary sparserepresentation

Proposed Adaptivedictionary sparserepresentation

Toronto 0670 07110Kootstra 0572 06061

Table 6 sAUC comparison of E Erdem and A Erdem [17] and theproposed color based nonlinear integration

Dataset E Erdem andA Erdem [17]

Proposed Nonlinearrepresentation

Bruce and Tsotsos [16] 07023 07084Kootstra et al [34] 0603 06152

some information that belong to salient portion of the inputimage

For quantitative comparison we employ sAUC to dothe comparison between saliency maps based on a fixeddictionary and the adaptive basis dictionary Table 5 gives acomparison of the both approaches on two datasets Torontoand Kootstra The performance difference is quite significantand it shows that an adaptive representation is much betterBased on these qualitative and quantitative results we canconclude that an adaptive image presentation is more viableand accurate for CSD based saliency computation

A model proposed in [17] gives an idea of nonlinearintegration of all features by covariance matrices and thesupported implementation uses color gradient and spatialinformation Our second contribution is the modification ofthat model and a proposal of using only color with spatial

information in nonlinearly integrated manner using samecovariance matrices For our proposed implementation(seeFigure 4) we modified the features used in [17] and comparethe saliencymap with Erdemrsquos [17] model in Table 6 using thesAUC score Two databases Toronto [16] and Kootstra [34]are used for simulations and the results indicate that by usingonly color with spatial information we can get better sAUCscore than integrating all features using covariance matricesIn Table 6 the difference in sAUC score is quite visible onboth datasets One possible reason of this improvement maybe that the correlation among different features like color andorientation is different and thus using a covariance basedrepresentation does not capture the underlying informationstructure in an efficient way as compared to when only colorinformation is used

One possible argument against our usage of only colorinformation however can be that without any gradient ororientation information a saliency model will fail to detectmany salient regions This argument can also supported bynature since neurons tuned to orientation in an image areknown to contribute to saliency computation [53] In ourmodel this issue is addressed by the sparse representationwhere the adaptive basis same as basis shown in Figure 8(a)is Gabor like filters with edge like structure and thus thesebases efficiently capture orientation information from theimage which complements our color information in nonlin-ear representation

Finally the sAUC score of dual representation Table 3shows that we achieve better eye fixation prediction thantreating both representations separately as shown in Tables5 and 6We believe that such improvement is due to the com-plementary behavior of both techniques since a combinedapproach better represents image contents with high fidelityand thus that in turn improves saliency detection Lastly for

The Scientific World Journal 15

illustration and visual comparison we present some saliencymaps produced by our algorithm along with other models inFigure 9

6 Conclusion

This paper shows that dual feature representationmanages torobustly capture image information which can be used in acenter surround operation to compute saliencyWe show thata CSD on adaptive sparse basis gives better results than a fixsparse basis representation In nonlinear representation weshow that nonlinearly integrated color channels with spatialinformation better capture underlying data structure andthus a CSD on such representation gives good results Finallywe consider both representations as complementary and thusa fused saliencymap not only give good results on human eyefixations but also detect salient objects with high accuracy

In future wewill incorporate some top-downmechanismto better imitative human saliency computation capabilitiesbased on learning and experience Another possible exten-sion of the existing work is to test dynamic scenes video byincorporating additional motion information in the currentscheme

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the national Natural Sci-ence Foundation of China (61175096) and Specialized Fundfor Joint Building Program of Beijing Municipal EducationCommission

References

[1] C Siagian and L Itti ldquoBiologically inspired mobile robot visionlocalizationrdquo IEEE Transactions on Robotics vol 25 no 4 pp861ndash873 2009

[2] J Li M D Levine X An X Xu and H He ldquoVisual saliencybased on scale-space analysis in the frequency domainrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol35 no 4 pp 996ndash1010 2012

[3] Y Su Q Zhao L Zhao and D Gu ldquoAbrupt motion trackingusing a visual saliency embedded particle filterrdquo Pattern Recog-nition vol 47 no 5 pp 1826ndash1834 2014

[4] C Guo Q Ma and L Zhang ldquoSpatio-temporal saliency detec-tion using phase spectrum of quaternion fourier transformrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo08) pp 1ndash8 June 2008

[5] X Hou and L Zhang ldquoThumbnail generation based on globalsaliencyrdquo in Advances in Cognitive NeurodynamicsmdashICCN2007 pp 999ndash1003 Springer Amsterdam The Netherlands2007

[6] A Borji and L Itti ldquoState-of-the-art in visual attention mod-elingrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 35 no 1 pp 185ndash207 2013

[7] L Itti C Koch and E Niebur ldquoAmodel of saliency-based visualattention for rapid scene analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 20 no 11 pp 1254ndash12591998

[8] AM Treisman ldquoFeature integration theoryrdquoCognitive Psychol-ogy vol 12 no 1 pp 97ndash136 1980

[9] J K Tsotsos S M Culhane W Y Kei Wai Y Lai N Davisand F Nuflo ldquoModeling visual attention via selective tuningrdquoArtificial Intelligence vol 78 no 1-2 pp 507ndash545 1995

[10] R Rae Gestikbasierte Mensch-Maschine-Kommunikation aufder grundlage visueller aufmerksamkeit und adaptivitat [PhDthesis] Universitat Bielefeld 2000

[11] D Parkhurst K Law and E Niebur ldquoModeling the role ofsalience in the allocation of overt visual attentionrdquo VisionResearch vol 42 no 1 pp 107ndash123 2002

[12] J Li Y Tian T Huang and W Gao ldquoProbabilistic multi-tasklearning for visual saliency estimation in videordquo InternationalJournal of Computer Vision vol 90 no 2 pp 150ndash165 2010

[13] MCerf J HarelW Einhauser andCKoch ldquoPredicting humangaze using low-level saliency combined with face detectionrdquo inAdvances in Neural Information Processing Systems vol 20 pp241ndash248 2007

[14] A Torralba ldquoModeling global scene factors in attentionrdquoJournal of the Optical Society of America A Optics and ImageScience and Vision vol 20 no 7 pp 1407ndash1418 2003

[15] A Oliva and A Torralba ldquoModeling the shape of the scenea holistic representation of the spatial enveloperdquo InternationalJournal of Computer Vision vol 42 no 3 pp 145ndash175 2001

[16] N D B Bruce and J K Tsotsos ldquoSaliency attention and visualsearch an information theoretic approachrdquo Journal of Visionvol 9 no 3 article 5 2009

[17] E Erdem and A Erdem ldquoVisual saliency estimation by nonlin-early integrating features using region covariancesrdquo Journal ofVision vol 13 no 4 article 11 2013

[18] J H van Hateren ldquoReal and optimal neural images in earlyvisionrdquo Nature vol 360 no 6399 pp 68ndash70 1992

[19] D J Field ldquoRelations between the statistics of natural imagesand the response properties of cortical cellsrdquo Journal of theOptical Society of America A Optics and Image Science vol 4no 12 pp 2379ndash2394 1987

[20] B A Olshausen and D J Field ldquoNatural image statistics andefficient codingrdquo Network Computation in Neural Systems vol7 no 2 pp 333ndash339 1996

[21] M Kwon G Legge F Fang A Cheong and S He ldquoIdentifyingthe mechanism of adaptation to prolonged contrast reductionrdquoJournal of Vision vol 9 no 8 p 976 2009

[22] X Sun H Yao and R Ji ldquoVisual attention modeling based onshort-term environmental adaptionrdquo Journal of Visual Commu-nication and Image Representation vol 24 no 2 pp 171ndash1802013

[23] A Garcia-Diaz X R Fdez-Vidal X M Pardo and R DosilldquoSaliency from hierarchical adaptation through decorrelationand variance normalizationrdquo Image and Vision Computing vol30 no 1 pp 51ndash64 2012

[24] C E Shannon ldquoAmathematical theory of communicationrdquo BellSystem Technical Journal vol 27 no 3 pp 379ndash423 1948

[25] Y Karklin and M S Lewicki ldquoEmergence of complex cellproperties by learning to generalize in natural scenesrdquo Naturevol 457 no 7225 pp 83ndash86 2009

[26] H J Seo and P Milanfar ldquoStatic and space-time visual saliencydetection by self-resemblancerdquo Journal of Vision vol 9 no 12pp 1ndash27 2009

16 The Scientific World Journal

[27] D J Jobson Z-U Rahman and G AWoodell ldquoProperties andperformance of a centersurround retinexrdquo IEEE Transactionson Image Processing vol 6 no 3 pp 451ndash462 1997

[28] A Guo D Zhao S Liu X Fan and W Gao ldquoVisual attentionbased image quality assessmentrdquo in Proceedings of the 18thIEEE International Conference on Image Processing (ICIP rsquo11) pp3297ndash3300 September 2011

[29] A Borji and L Itti ldquoExploiting local and global patch raritiesfor saliency detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo12) pp 478ndash485 2012

[30] L Duan C Wu J Miao L Qing and Y Fu ldquoVisual saliencydetection by spatially weighted dissimilarityrdquo in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recogni-tion (CVPR rsquo11) pp 473ndash480 June 2011

[31] JMWolfe ldquoGuided search 20 a revisedmodel of visual searchrdquoPsychonomic Bulletin amp Review vol 1 no 2 pp 202ndash238 1994

[32] O Le Meur P Le Callet D Barba and DThoreau ldquoA coherentcomputational approach to model bottom-up visual attentionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 28 no 5 pp 802ndash817 2006

[33] A Torralba A Oliva M S Castelhano and J M HendersonldquoContextual guidance of eye movements and attention in real-world scenes the role of global features in object searchrdquoPsychological Review vol 113 no 4 pp 766ndash786 2006

[34] G Kootstra N Bergstrom and D Kragic ldquoUsing symmetry toselect fixation points for segmentationrdquo in Proceedings of the20th International Conference on Pattern Recognition (ICPR rsquo10)pp 3894ndash3897 August 2010

[35] C Li J Xue N Zheng X Lan and Z Tian ldquoSpatio-temporalsaliency perception via hypercomplex frequency spectral con-trastrdquo Sensors vol 13 no 3 pp 3409ndash3431 2013

[36] L Itti and P Baldi ldquoBayesian surprise attracts human attentionrdquoVision Research vol 49 no 10 pp 1295ndash1306 2009

[37] J Harel C Koch and P Perona ldquoGraph-based visual saliencyrdquoin Advances in Neural Information Processing Systems vol 19pp 545ndash552 2006

[38] X Hou and L Zhang ldquoSaliency detection a spectral residualapproachrdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo07)pp 1ndash8 June 2007

[39] B Schauerte and R Stiefelhagen ldquoQuaternion-based spectralsaliency detection for eye fixation predictionrdquo in ComputerVisionmdashECCV 2012 A Fitzgibbon S Lazebnik P Perona YSato and C Schmid Eds Lecture Notes in Computer Sciencepp 116ndash129 2012

[40] W Kienzle F Wichmann B Scholkopf and M Franz ANonparametric Approach to Bottom-Up Visual Saliency 2007

[41] J Tilke K Ehinger F Durand and A Torralba ldquoLearningto predict where humans lookrdquo in Proceedings of the 12thInternational Conference on Computer Vision (ICCV rsquo09) pp2106ndash2113 October 2009

[42] W Wang Y Wang Q Huang and W Gao ldquoMeasuring visualsaliency by site entropy raterdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 2368ndash2375 June 2010

[43] O Tuzel F Porikli and P Meer ldquoRegion covariance afast descriptor for detection and classificationrdquo in ComputerVisionmdashECCV 2006 vol 3952 of Lecture Notes in ComputerScience pp 589ndash600 2006

[44] httpssitesgooglecomsitesparsenonlinearsaliencymodelhomedownloads

[45] A Hyvarinen and E Oja ldquoA fast fixed-point algorithm forindependent component analysisrdquo Neural Computation vol 9no 7 pp 1483ndash1492 1997

[46] X Hou and L Zhang ldquoDynamic visual attention searching forcoding length incrementsrdquo in Advances in Neural InformationProcessing Systems vol 21 pp 681ndash688 2008

[47] L Zhang M H Tong T K Marks H Shan and G WCottrell ldquoSUN a Bayesian framework for saliency using naturalstatisticsrdquo Journal of Vision vol 8 no 7 article 32 pp 1ndash202008

[48] J Zhang and S Sclaroff ldquoSaliency detection a boolean mapapproachrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV rsquo13) 2013

[49] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[50] E Rahtu J Kannala M Salo and J Heikkila ldquoSegmentingsalient objects from images and videosrdquo in Computer VisionmdashECCV 2010 vol 6315 of Lecture Notes in Computer Science pp366ndash379 2010

[51] M-M Cheng G-X Zhang N J Mitra X Huang and S-M Hu ldquoGlobal contrast based salient region detectionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo11) pp 409ndash416 June 2011

[52] T Veit J-P Tarel P Nicolle and P Charbonnier ldquoEvaluationof road marking feature extractionrdquo in Proceedings of the11th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo08) pp 174ndash181 December 2008

[53] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model of simple receptive fields in catstriate cortexrdquo Journal of Neurophysiology vol 58 no 6 pp1233ndash1258 1987

Page 3: Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

The Scientific World Journal 3

Surround

Center

Figure 1 A depiction of center surround difference where thepatch in consideration is in red rectangle and the patches inyellow rectangular region are surrounding area from which thedissimilarity is checked

knowledge this is the first time that a CSD mechanism isusedwith an adaptive sparse image representationMoreoverthe proposed scheme of using only color information in non-linear form remarkably improve results Both saliency mapsare fused at a later stage to form a net saliency map whichrepresents salient features better than saliency maps from thetwo independent representations

This paper is organized as follows Related work is givenin Section 2 Section 3 covers proposed model and the math-ematical formulation of both representations and saliencycomputation Section 4 covers experimentation and resultssection A detailed discussion about the contribution in bothrepresentations along with certain necessary comparisons isgiven in Section 5 followed by conclusion in Section 6

2 Related Work

Initial visual saliency models take inspiration from featureintegration theory (FIT) [8] and guided search models [31]The first practical implementation of a saliency model basedon these theories is presented by Itti et al [7] where a numberof contrast features are learnt in parallel and fused together tomake a topographic map Later Le Meur et al [32] presentedanother cognitive model based on a contrast sensitivityfunction along with the incorporation of a CSD mechanismand perceptional decomposition There are numerous othermodels proposed which utilize different bioinspired featuresand mechanisms for saliency detection like GIST [33] PCA[30] ICA [16] histogram of local orientations symmetry[34] depth entropy [16] texture and motion [35]

Apart from cognitive models different probability basedmodels are also presented These models incorporate imagestatistics and learn probability distributions using currentimage or ensemble of images for saliency estimation Itti andBaldi [36] defined a Bayesian surprise usingKullback-Leibler(KL) distance for posterior and prior beliefs Torralba et al[33] used contextual evidence to further consolidate low levelsaliency Harel et al [37] approached the saliency problemusing the probabilistic graphical models Hou and Zhang[38] used a global approach based on Fourier transformto find saliency He proposed that the residue of original

and smoothed amplitude spectrum of a Fourier transformcontains information about salient region in an image Laterit was shown [4] that the phase of a Fourier transformcontains the essential location information rather than theamplitude There are several other models which use Fouriertransform and are classified as frequency based models [2]In these models respective color channels are individuallyprocessed for saliency detection and finally all maps arefused in a single saliency map In contrast a quaternionapproach is proposed [39] to use a unified frame work toprocess all color channels The quaternion framework alsoallows incorporation of a fourth channel like motion in avery elegant manner Some models use learning techniquesthat incorporate human eye fixation in their saliency modelsKienzle et al [40] used Human eye fixations for derivinga learned model while Tilke et al [41] trained supportvector machines (SVM) on image patches with low inter-mediate and high level features to compute saliency Apartfrom above-mentioned approaches different new techniqueslike redundancy rectangular window SNR and regressionhave shown remarkable results in saliency modeling Ina different approach Wang et al [42] proposed site rateentropy for computing saliency using framework of graphicalmodels

Some approaches use information theoretic frameworksto model saliency like Bruce and Tsotsos [16] gives theidea of information maximization (AIM) Using a biologicalstructure [20] of sparse representation and Shannonrsquos [24]formulation of self-information inverse probability of a patchin the entire image Bruce andTsotsos [16] computed saliencyThis self-information can be considered as a global measureof saliency There are various extensions of using sparserepresentations of images using a learned dictionary forsaliency computation Recently Sun et al [22] proposed thatsince biological systems are adaptive an adaptive dictionaryis more representative and thus he used principle of self-information for saliency computation using adaptive basisAWS [23] also used adaption and it works on the principlethat a statistical distance in a representative space givessaliency This representative space is computed by whiteningthe basis to the structure of a particular image This schemeuses multiscale and multistage operations on features anduses an efficient way to overcome the computation complex-ity in whitening

In [29] Borji and Itti proposed that local and globalmeasures are complementary and used both center surroundand self-information for saliency computation Moreoverthey showed that multiple color spaces are useful in bettersaliency estimation There are some saliency models [17 26]which rely on nonlinear representation of features and on theintegration of various features and channels In [26] gradientfeatures are used in a nonlinear representation based onlocal steerable kernels (LSK) for image representation whilethe author in [17] proposes a nonlinear integration usingcovariance matrices This paper also incorporates first orderimage statistics in covariance matrices to better estimatesaliency Moreover [17 26] also solve various features andrespective channels integration issue by putting forth a singleunified form

4 The Scientific World Journal

Our saliency model is inspired from mainly two typesof models sparse [16 20] and nonlinear representation [27]We propose a novel dual approach based on both sparseand nonlinear feature representation Inspired by biologicalevidence of neural receptive field properties [20] that effi-ciently process natural images in a sparse manner we use asparse image representation Moreover in order to representadaptivity of neurons to better tackle a new environment weuse an adaptive basis dictionary in an ICA approach [22]Thus our proposed method simultaneously uses sparsity andadaptivity In literature the model similar to our adaptivesparse representation is [22] In [22] Sun et al used aninformation theoretic global approach for saliency computa-tion but we use a more bioplausible local CSD for saliencycomputation Secondly we propose nonlinearly integratedrepresentation of single feature channels color along withspatial information for saliency computation Our approachis a modification of the model proposed in [17] whereall features and channels are nonlinearly integrated usingcovariance matrices although we propose that only colorinformation is enough and it can better estimate saliency inour framework Here also a CSD approach is used for saliencycomputation Finally a combined saliency map is formed byfusing the outputs given by the two representations

Contributions Major contributions of this work can besummarized as follows

(1) A novel dual image feature representation simultane-ous sparse and nonlinear feature representation

(2) CSD based saliency computation in adaptive sparserepresentation

(3) Only color based nonlinear integrated covariancerepresentation followed by CSD computation

(4) Improved results with comparison to other states ofthe models that are established by extensive testingon popular eye fixation prediction datasets and on asalient object detection dataset

3 Proposed Model

Our proposed scheme is given in Figure 2 An input imageis being simultaneously represented in sparse and nonlinearform Then saliency is computed by local center surroundoperation and finally both maps are combined to form asingle saliency map

For sparse representation we break an image into patchesand perform independent component analysis to derive basisLater these basis with sparse coefficient are used to representthe image Furthermore again after converting image intopatches we only take color information and integrate allchannels in a nonlinear fashion using covariance matricesalong with spatial information to represent an image patch

31 Mathematical Modeling In this section we cover math-ematical formulation of sparse representation nonlinearrepresentation and saliency computation Some necessarydiscussion is included to elaborate few concepts and also

some references will be given to avoid unnecessary formu-lation of the well-known concepts

Sparse Representation with ICAWe use an ICA based imagerepresentation thus an input image 119868 can be sparselyrepresented as

1198681015840

= 119860s (1)

where 119860 is a dictionary consisting of a set of basis and s con-sists of respective coefficients In our case we learn 119860 fromevery input image thus we adapt dictionary for every inputstimulusThis approach results inminimum information losswhich is a basic drawback of a fixed dictionary learned froman ensemble of images The sparse coefficients are learned byprojecting an input image to the basis such as

s = 119882119868 (2)

where

119882 = 119860minus1

(3)

The basis have dimensions the same as the patches formedfrom the input image Finally 119868

1015840 in patches form can berepresented as

1198681015840

=

1198991015840

K119896=1

1198601199011015840

119896 (4)

where 1199011015840

119896is the 119896th patchrsquos sparse coefficient vector consisting

of 1198981015840 co-efficients and there are total 119899

1015840 number of patchesin 1198681015840 Moreover K represents a function that reshape and

arranges patches at respective position to form an imageFigure 3 gives the depiction of the whole process

Nonlinear Representation with Covariance Matrices Our fea-ture matrix 119865 is based on raw RGB color space values of 119868

along with pixel position information

119865 = [119868119877 119868119866 119868119861 119909 119910] (5)

where every pixel in 119865 is a 5-dimensional vector 119891119894119894=12119901

where 119901 is the total number of pixels in the image Ourfeatures are different than those used in E Erdem andA Erdem [17] since we do not incorporate any gradientinformation in our feature matrix 119865 In (5) color along withspatial information is used rather than E Erdem and AErdem [17] approach of making a features matrix consistingof all features

The next step is nonlinear representation of 119865 usingcovariancematrices along with first order statistics [17] Tuzelet al [43] gave the concept of encoding a patch by a covariancematrix Later it was used in many applications In saliencydomain E Erdem and A Erdem [17] used patch covariancewith first order statistics for image feature representation andwe will dwell on his approach for our case Thus calculatinglocal covariance matrices for an image patch 119901

119896 we get

119862 (119901119896) =

1

119898 minus 1

119898

sum

119894=1

(119891119894minus 120583119896) (119891119894minus 120583119896)119879 (6)

The Scientific World Journal 5

Adaptive sparserepresentation

Center surroundsaliency

computation

Center surroundsaliency

computation

Image to patchesconversion

Saliency map

Color covariancenon-linear

representation

Image to patchesconversion

Π

Figure 2 Proposed model for saliency computation

Center surroundsaliency

computation

Image patches Sparse co-efficientvectors

Adaptive basis

p998400i p

998400i+1 p998400

i+2

Figure 3 Saliency computation in adaptive sparse representation

where a patch 119901119896consists of 119898 pixels with mean 120583

119896 Now

first order statistics is incorporated in the covariancematricesusing the method mentioned in [17] Then the new repre-sentation of covariance matrices with first order statisticsembedded for a patch is given by

120595119896

= 120595 (119862 (119901119896)) (7)

where function 120595 embeds first order statistics in an inputmatrix The final nonlinear feature representation of image

with 120595119896represents a 119896th patch and 119899 being total number of

patches is given by

11986810158401015840

=

119899

K119896=1

120595119896 (8)

where also K function arranges patches at respective posi-tions to form an image The whole representation is given inFigure 4Saliency Computation The saliency is computed by CSDoperation and then extended to multiple scales The CSDoperation is shown in Figure 1 where a patch under consider-ation is in red rectangle and surrounding area is highlighted

6 The Scientific World Journal

Input

RG

B

Output

Image to patchesconversion and

covariancerepresentation

First orderstatistics

embeddedPosition

information

I(x y)

Feature matrixF = [R G B x y]

Figure 4 Nonlinear representation of an input image

in yellow rectangle The saliency of red patch 119901119894 is given

by its dissimilarity between 119908 surrounding patches (yellowrectangle) as

119878 (119901119894) =

1

119908

119908

sum

119895=1

119889 (119901119894 119901119895) (9)

where dissimilarity 119889(119901119894 119901119895) between two patches is given by

119889 (119901119894 119901119895) =

10038171003817100381710038171003817120572 (119901119894) minus 120572 (119901

119895)10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(10)

where119883119894and119883

119895are the central position of the patches 119901

119894and

119901119895 For the case of sparse representation we have

120572 (119901119894) = 119901

1015840

119894(11)

and for nonlinear representation we have

120572 (119901119894) = 120595119894= 120595 (119862 (119901

119894)) (12)

Thus the saliencymap for patch 119901119894derived from 119868

1015840 and 11986810158401015840 can

be given as

1198781198681015840 (119901119894) =

1

119908

119908

sum

119895=1

100381710038171003817100381710038171199011015840

119894minus 1199011015840

119895

10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(13)

and with 120595(119862(119901119894)) being in vector form

11987811986810158401015840 (119901119894) =

1

119908

119908

sum

119895=1

10038171003817100381710038171003817120595 (119862 (119901

119894)) minus 120595 (119862 (119901

119895))

10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(14)

The multiscale saliency by sparse approach is given by

1198781198681015840 (119909) = N(prod

1198971015840120598119871

1198781198971015840

1198681015840 (119909)) (15)

and for nonlinear integrated approach is given by

11987811986810158401015840 (119909) = N(prod

119897120598119871

119878119897

11986810158401015840 (119909)) (16)

where 1198971015840 and 119897 represent the number of scales and N shows

normalization Finally saliency map becomes

119878119868

= 119866120590

⊛ (N (1198781198681015840 lowast 11987811986810158401015840)) (17)

where119866120590represents the Gaussian smoothing by convolution

⊛ operation and lowast stands for multiplication operation

4 Experimentation

In this section we thoroughly evaluate the proposed modelwith three different experiments human eye fixation pre-diction salient-object detection and response to variouspsychological patterns The human eye fixation prediction isthe basic and necessary test to check the performance of asaliency map against the collected eye fixation from severalhuman subjects

How well a saliency map distinguishes and highlights anobject in an image shows its ability of salient object detectionThe salient object detection capability of a model is evaluatedby employing some metrics that compare the generatedsaliency map against the ground truth made by manuallabeling of the salient region in an image by human subjectsThe psychological patterns give a qualitative analysis of thesaliency model These patterns are designed to check pop-upresponses in different scenarios like orientation conjunctionand color and so forth Code (Matlab P-code) of the proposedmodel used for experimentation is available online [44]

41 Parameter Setting Before pursuing the evaluation of theproposed model we fix the parameters used to generatethe saliency maps by our model These parameters willremain the same for all the experiments Derivation ofthese parameters will be discussed in the next section afterthe introduction of the datasets and the metric used forevaluation

Sparse Representation We resize all input images to 80 times

60 pixels and use only single scale 1198971015840

= 1 for saliencycomputation Patches of 5 times 5 pixels [22] are generated withsliding overlapping window from every input image to learnthe basis for the dictionary and for the saliency computation

The Scientific World Journal 7

Table 1 Following are the input image resolution used in eachalgorithm with default parameter settings Some algorithms (atleastlowast) do internally reduce dimensions for fast computation andoptimal performance

Model Acronym ResolutionAIM [16] 12(W times H)lowastGBVS [37] W times HMESR [39] 64 times 48lowastICL [46] W times HlowastItti [37] W times HMPQFT [39] 64 times 48AWS [23] W times HlowastSDSR [26] W times HSUN [47] 12(W times H)HFT [2] 128 times 128LG [29] 256 times 256ERDM [17] 512 times 512ΔQDCT [39] 64 times48lowastProposed W times H

An ICA package available online FAST ICA [45] is used forthis experimentation

Nonlinear Representation For nonlinear image representa-tion RGB color and position information is used in onlineavailable implementation of E Erdem andA Erdem [17]Thesaliency is computed with the default parameters used in [17]that have every input image being resized to 512 times 512 pixelsand five different patch sizes 119901

119894= 8 16 32 64 128 and thus

119897 = 5 are used for saliency computationFinally normalized sparse representationrsquos saliency map

is rescaled to the size of nonlinear representationrsquos saliencymap and both maps are multiplied and normalizedThen thefinal saliencymap is resized to the actual input image size andused for experimentation The input image resolutions usedin all the saliency algorithms for experimentation are givenin Table 1

42 Human Eye Fixation Prediction In order to validatethe proposed model with human eye fixation predictionssaliency maps are generated on three datasets and for a faircomparison shuffle area under the curve score(sAUC) is usedto quantify the results

Dataset A reasonable dataset that can be used for evaluationof human eye fixation prediction must be complex anddiverse enough so that performance can be thoroughly eval-uated In literature Toronto [16] and Kootstra [34] datasetsare the most popular and widely used datasets IMSAL [2] isa relatively new dataset which we also used in our evaluation

Toronto dataset was prepared by Bruce and Tsotsos [16]and it consists of 120 images each with 681 times 511 pixels Thisdataset has both indoor and outdoor imagesThe eye fixationground truth is based on 20 subjects who free viewed theimages for few seconds

Kootstra dataset was used in [34] It consists of 101 imageseach with a resolution of 1024 times 768 pixels These imagesconsist of flowers natural scenes automans and buildingsThis dataset is significantly complex because it has manyimages with no explicit salient regions The eye fixationground truth available with this dataset is based on freeviewing of 31 subjects for a few seconds

IMSAL dataset is given by Li et al [2] it consists of 235images which are collected online through an internet searchengine and some images were taken from literature Theseimages are divided into six categories with 50 images havinglarge salient regions 80 images with intermediate salientregions 60 images with small salient regions 15 imageswith cluttered backgrounds and 15 images with repeatingdistracters These images give a good benchmark for per-formance evaluation because of the significant complexitygiven by variable size of salient objects objects with clutterand objects with distracters The accompanied ground truthconsists of both eye fixation information and binary maskscreated by human subjects who manually marked the salientobject in an image

Metric for Evaluation The most popular method to evaluatethe performance of a saliency map is to calculate area underthe curve (AUC) score of receiver operating characteristics(ROC) curve At first a saliency map is thresholded andused as a binary classifier with human eye fixations actingas positive set and some other points uniformly random asnegative set to plot an ROC curve The AUC of that ROC iscalculated andused as ameasure of performance comparison

There are various variants of AUC available in literatureand the basic difference between them is the choice ofnegative set points We will use the shuffled area under thecurve sAUC score because of its ability to cater for centerbias [41] since some models implicitly incorporate centerbias which makes a fair comparison difficult to performit is becoming standard to present results with sAUC InsAUC score the positive points consists of human subjectseye fixation on that image and the negative set consists ofall the fixation of subjects on the rest of the dataset imagesThe sAUC gives a 05 score on a center Gaussian blob whichis about the same as a random or chance score whereas allthe other versions of AUC [6] give very high score becausethey are affected by the center bias For our experimentationwe used sAUC available online by Schauerte and Stiefelhagen[39] We calculate every sAUC score for 20 times [47]and then use the mean value We found that the standarddeviation of the sAUC approximately ranges from 1119864 minus 4 to5119864 minus 4 in our experiments

Performance Analysis with Resolution In order to find theoptimal parameters for the proposed model we treat bothrepresentations separately and find the best parameters foreach representation that can be incorporated in the proposedmodel We plotted both sparse and nonlinear representationwith variable resolution on all three datasets and measurethe sAUC score Using various parameters given in Table 2Figure 5 is plotted which gives the performance of bothrepresentations on the three datasets Figure 5 shows that for

8 The Scientific World Journal

Table 2 Various scales parameters which are used for evaluation of sparse and nonlinear representation

Scale Adaptive sparse representation Nonlinear representationResolution Patch size Resolution Patch size

1 40 times 30 5 128 times 128 2 4 8 16 32

2 80 times 60 5 256 times 256 4 8 16 32 64

3 160 times 120 5 512 times 512 8 16 32 64 128

08

07

06

1 2 3

Scale

sAU

C

Sparse representation

BruceKootstra

IMSAL

(a)

Non-linear representation

075

070

065

0601 2 3

Scale

sAU

C

BruceKootstra

IMSAL

(b)

Figure 5 sAUC score for sparse and nonlinear representation The values in (a) and (b) are maximum at scale 2 and scale 3 for all threedatasets

sparse representation the performance is maximum with 80times 60 pixels (scale 2) and for nonlinear representation we getgood performance at 512 times 512 pixels (scale 3) Usually theresolution of the input images is not very high so we restrictto 512 times 512 as upper bound for nonlinear representation andsame resolution with respective patch sizes is used in [17]The image resolution and patch sizes of different scales forthe both models used for evaluation are given in Table 2Based on this analysis we incorporate the parameters of scale2 and scale 3 for sparse and nonlinear representation in theproposed saliency model

Performance Comparison with Other Models The results ofour model along with comparison with 13 state-of-the-artmethods are given in Table 3 The detailed performancewith variable Gaussian smoothing is given in Figure 6 Thesimulation codes for all these methods are taken from theauthors websites We used multiscale and quaternion basedimplementation for spectral residue (MESR) PQFT [4] andDCT [49] as proposed by Schauerte and Stiefelhagen [39]which gives higher score than original methods Erdemrsquos [17]implementation with first order statistics embedded is usedfor simulation since it gives higher sAUC score Results withthe proposed technique are quite consistent and the proposedmethod outperforms the state-of-the-artΔQDCT [39]model

on the Toronto dataset and performs comparatively well onthe Kootstra dataset and IMSAL dataset No single modelperforms well on all these datasets and the performanceof other models changes with the dataset but our modelshows consistency and remained either ranked 1 or ranked2 on these datasets We believe that the high eye fixationprediction is due to the adaptive nature of the model anddue to dual representation of features The adaptivity makesa feature space optimal for the current image thus a moreaccurate representation of the features is possible which inturn accounts for better saliency map estimation Moreovera single representation may not be enough for every caseFinally these results can improve further if we use a multi-scale representation for ICA based representation which weskipped due to computational time constraints

43 Salient Object Detection A saliency map can be usedto detect a salient object in an image The basic premiseis that if an image consists of an object which stands outfrom the rest of the image then it should be identifiedby a saliency algorithm There is a different branch invisual saliency modeling which consists of models that arespecifically designed to detect salient objects These modelsfind the salient object in an image and then segment thewhole

The Scientific World Journal 9

000 002 004 006 008

120590

072

070

068

066

064

062

060

Toronto dataset

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(a)

062

060

058

056

054

052

Kootstra dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(b)

078

076

074

072

070

068

066

IMSAL dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFT

ERDEMQDCTAWSProposed

sAU

C

LG

010 012 014 016

(c)

Figure 6 Toronto Kootstra and IMSAL datasets with variable Gaussian smoothing for all algorithms in comparison The 119909-axis representsthe 120590 of the smoothing Gaussian (in image width) (In (c) only GBVS LG and ΔQDCT are taken from [48])

10 The Scientific World Journal

Table3Com

paris

onof

sAUCscoreo

fthe

prop

osed

mod

elwith

13otherstate-of-the-artmod

els

Dataset

AIM

[16]

GBV

S[37]

MES

R[39]

ICL[46]

Itti[37]

MPQ

FT[39]

SDSR

[26]

SUN[47]

HFT

[2]

LG[29]

ERDM

[17]

ΔQDCT

[39]

AWS[23]

Prop

osed

TORO

NTO

[16]

0699

064

0407141

064

76064

8107119

07062

066

6106915

06939

07023

07172

07130

07179

Optim

al120590

004

001

005

003

001

004

003

003

006

004

004

005

001

004

KOOTS

TRA[34]

05901

05572

05978

05769

05689

05985

05984

05613

05911

05938

0603

05998

06185

06157

Optim

al120590

003

001

004

001

004

005

004

002

005

001

003

003

001

003

IMSA

L[2]

07424

lowast076

6507153

07364

07512

071694

07403

06955

07419lowast07356

07455

lowast07434

07468

07560

Optim

al120590

008

007

015

013

009

014

013

016

009

012

012

010

005

009

Thetop

rank

edmod

elisin

boldfont

andsecond

rank

edmod

elisin

italic

fontH

erer

esultswith

optim

alaverages

AUCof

each

metho

dwith

thec

orrespon

ding

Gaussianblur(120590)are

repo

rtedA

sin[47]w

erepeat

thes

AUCcalculationfor2

0tim

esandcompu

tethes

tand

arddeviationof

each

averages

AUC

which

ranges

from1119864minus4to5119864minus4(lowastvalues

takenfro

m[48])

The Scientific World Journal 11

Table4Com

paris

onof

AUCandPo

DSC

scoreo

fthe

prop

osed

mod

elwith

otherstate-of-the-artmod

els

Algorith

mC1

C2C3

C4C5

C6Av

gAU

CPo

DSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AvgAU

CAIM

0921

0653

0918

0543

0957

0461

0904

0435

0924

0508

0941

0671

0927

MES

R0798

0494

0854

0436

0927

0377

0718

0238

0837

0398

0918

0617

0842

ICL

0928

0681

0907

0501

0909

0348

0926

0537

0902

0498

0914

0569

0914

Itti

0939

0687

0924

0542

0952

0457

0882

0423

0924

0499

0941

0631

0927

MPQ

FT0805

0497

0865

0452

0927

0386

0735

0248

0852

0436

0921

0630

0851

SDSR

0879

0618

0908

0511

0948

0435

0801

0298

0904

0489

0933

064

50896

SUN

0766

0459

0787

0349

0880

0362

0708

0263

0719

0283

0817

0451

0780

HFT

0937

070

00923

0552

0937

044

8096

4064

80933

0554

096

0070

60942

LG0814

0515

0859

046

40904

0379

0635

0145

0862

0499

0884

0572

0826

ERDEM

0882

0617

0912

0555

0962

0526

0840

0349

0923

0575

0916

0635

0906

MQDCT

0840

0564

0894

0517

0947

0489

0806

0311

0888

0524

0923

0670

0883

AWS

0894

0625

0922

0569

0957

0480

0891

0402

0920

0534

0925

0655

0918

Prop

osed

0920

0653

094

3059

60976

0555

0934

0516

094

20532

0956

0688

094

5Th

etop

rank

edmod

elisin

bold

font

andthe2

ndrank

edmod

elisin

italic

fontTh

eoverallbestaverageA

UCscoreisg

iven

byou

rpropo

sedmod

el

12 The Scientific World Journal

Input Proposed AIM GBVS MESR ICL Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 7 Response of our algorithm and other 13 state-of-the-art algorithms on various psychological patterns

extent of the object and thus solve this task like segmentation-type binary labeling formulation [50 51] In contrast ourmodel is designed for location based (eye fixation) saliencymodeling and it is not designed to capture exact objectboundaries however by thresholding a saliency map we canget a binary map that can be used for testing performance ofa model for salient object detection Since our saliencymodelis location based we only use other location based modelsin comparison for a fair evaluation similar convention isfollowed in [2 17]

Dataset and Metric for Evaluation For salient object detec-tion the metric used by Li et al [2] is area under thecurve (AUC) and dice similarity coefficient (DSC) on IMSAL[2] dataset We will use the same metric and dataset forsalient object detectionThe DSC gives the overlap between athreshold saliency map and the ground truth Moreover thepeak value of the DSC [52] is considered an important way toestablish the best performance of an algorithm at an optimalthreshold thus we also give results with peak value of theDSC curve (PoDSC) Since AUC can be influenced by centerbias for fair comparison we turn off the center bias in allthe algorithmsThe GBVS [37] has built-in center bias and inorder to cater for that the author in [2] incorporates explicitcenter bias and shows that HFT [2] performs better thanGBVS on the same dataset which is also used in our paperWe do not employ center bias explicitly or implicitly in thepresented results and addedHFT instead for our comparisonTherefore GVBS [37] is skipped from Table 4 Furthermorewe perform Gaussian smoothing in all the algorithms tofind the optimal smoothing parameters for each class in thedataset and the optimal performance is quoted in the resultsgiven in Table 4

Performance We present results in comparison with other 12state-of-the-art algorithms The complete results are given in

Table 4 Our proposed scheme gives the best performanceon three categories C2 C3 and C5 and ranked secondon C4 and C6 Our model gives the highest average AUCscore on this dataset On different categories our results arecomparative to the HFT which is state of the art on thisdataset Apart fromHFT the performance is also significantlybetter in comparison to other algorithmsThedataset used forcomparison is quite complex but our algorithm performedwell for intermediate small objects with distracters althoughperformance on other cases is little less than other state-of-the-art algorithms

44 Psychological Patterns Wealso tested our saliencymodelon psychological patterns which are commonly used to givea qualitative performance on artificial scenarios which simu-late a pop-up phenomenonThese patterns simulate the pop-out phenomenon based on color intersection symmetryorientation curvature and candles image In order to checkthe general performance on various psychological tasks wetested the proposed model on eight psychological patternsFigure 7 gives the results of the proposed saliency modelalong with other popular models The proposed algorithmworks well on color symmetry and orientation as well as oncandle image but the performance is not good for curvatureand intersection patterns Figure 7 also shows that any singlemodel does not give good performance on all the patterns andthe best performance is more or less the same as given by theproposed scheme

5 Discussion

The image features representation drastically affects theinformation content and thus saliency estimation We useda bioinspired center surround saliency computation on twoparallel feature representations that give good performance

The Scientific World Journal 13

(a)

(b) (c)

(d) (e)

Figure 8 The fixed learned dictionary and the information lost by using such dictionary (a) Fixed learned dictionary ((b)-(c)) The inputimages ((d)-(e)) The information lost by using dictionary given in (a)

on both eye fixation prediction and salient object detectionSince a CSD operation depends on the difference between acenter portion and its surroundings a better image contentsrepresentation makes a more accurate and precise centersurround operation

We used an adaptive sparse representation to boost theperformance of the CSD operation In order to show theeffectiveness of the proposed approach we present bothquantitative and qualitative results For qualitative compar-ison we use a fix dictionary [29] (Figure 8(a)) learnt from

an ensemble of natural images We show that some of theinformation is lost if we use a fixed dictionary to represent animage and usually the lost information belongs to the salientregion in an image The difference between an input imageand the image reconstructed by a fixed dictionary is givenin Figure 8 The red cylinder Figure 8(b) and red box withtext Figure 8(c) is visible in the image plots Figures 8(d)and 8(e) These two objects are the salient features in bothimages and their appearance in the residual image shows thatcurrent representation which uses a fixed dictionary loses

14 The Scientific World Journal

Input Gnd truth Proposed AIM GBVS Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 9 Comparison of our method with other state-of-the-art saliency models The first column is the input image with second columnbeing ground truth followed by our proposed saliency model Results from all the other models are plotted in the same row

Table 5 sAUC score comparison using fixed [29] and adaptivesparse representation (proposed) for CSD saliency

DatasetBorji local [29] Fixeddictionary sparserepresentation

Proposed Adaptivedictionary sparserepresentation

Toronto 0670 07110Kootstra 0572 06061

Table 6 sAUC comparison of E Erdem and A Erdem [17] and theproposed color based nonlinear integration

Dataset E Erdem andA Erdem [17]

Proposed Nonlinearrepresentation

Bruce and Tsotsos [16] 07023 07084Kootstra et al [34] 0603 06152

some information that belong to salient portion of the inputimage

For quantitative comparison we employ sAUC to dothe comparison between saliency maps based on a fixeddictionary and the adaptive basis dictionary Table 5 gives acomparison of the both approaches on two datasets Torontoand Kootstra The performance difference is quite significantand it shows that an adaptive representation is much betterBased on these qualitative and quantitative results we canconclude that an adaptive image presentation is more viableand accurate for CSD based saliency computation

A model proposed in [17] gives an idea of nonlinearintegration of all features by covariance matrices and thesupported implementation uses color gradient and spatialinformation Our second contribution is the modification ofthat model and a proposal of using only color with spatial

information in nonlinearly integrated manner using samecovariance matrices For our proposed implementation(seeFigure 4) we modified the features used in [17] and comparethe saliencymap with Erdemrsquos [17] model in Table 6 using thesAUC score Two databases Toronto [16] and Kootstra [34]are used for simulations and the results indicate that by usingonly color with spatial information we can get better sAUCscore than integrating all features using covariance matricesIn Table 6 the difference in sAUC score is quite visible onboth datasets One possible reason of this improvement maybe that the correlation among different features like color andorientation is different and thus using a covariance basedrepresentation does not capture the underlying informationstructure in an efficient way as compared to when only colorinformation is used

One possible argument against our usage of only colorinformation however can be that without any gradient ororientation information a saliency model will fail to detectmany salient regions This argument can also supported bynature since neurons tuned to orientation in an image areknown to contribute to saliency computation [53] In ourmodel this issue is addressed by the sparse representationwhere the adaptive basis same as basis shown in Figure 8(a)is Gabor like filters with edge like structure and thus thesebases efficiently capture orientation information from theimage which complements our color information in nonlin-ear representation

Finally the sAUC score of dual representation Table 3shows that we achieve better eye fixation prediction thantreating both representations separately as shown in Tables5 and 6We believe that such improvement is due to the com-plementary behavior of both techniques since a combinedapproach better represents image contents with high fidelityand thus that in turn improves saliency detection Lastly for

The Scientific World Journal 15

illustration and visual comparison we present some saliencymaps produced by our algorithm along with other models inFigure 9

6 Conclusion

This paper shows that dual feature representationmanages torobustly capture image information which can be used in acenter surround operation to compute saliencyWe show thata CSD on adaptive sparse basis gives better results than a fixsparse basis representation In nonlinear representation weshow that nonlinearly integrated color channels with spatialinformation better capture underlying data structure andthus a CSD on such representation gives good results Finallywe consider both representations as complementary and thusa fused saliencymap not only give good results on human eyefixations but also detect salient objects with high accuracy

In future wewill incorporate some top-downmechanismto better imitative human saliency computation capabilitiesbased on learning and experience Another possible exten-sion of the existing work is to test dynamic scenes video byincorporating additional motion information in the currentscheme

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the national Natural Sci-ence Foundation of China (61175096) and Specialized Fundfor Joint Building Program of Beijing Municipal EducationCommission

References

[1] C Siagian and L Itti ldquoBiologically inspired mobile robot visionlocalizationrdquo IEEE Transactions on Robotics vol 25 no 4 pp861ndash873 2009

[2] J Li M D Levine X An X Xu and H He ldquoVisual saliencybased on scale-space analysis in the frequency domainrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol35 no 4 pp 996ndash1010 2012

[3] Y Su Q Zhao L Zhao and D Gu ldquoAbrupt motion trackingusing a visual saliency embedded particle filterrdquo Pattern Recog-nition vol 47 no 5 pp 1826ndash1834 2014

[4] C Guo Q Ma and L Zhang ldquoSpatio-temporal saliency detec-tion using phase spectrum of quaternion fourier transformrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo08) pp 1ndash8 June 2008

[5] X Hou and L Zhang ldquoThumbnail generation based on globalsaliencyrdquo in Advances in Cognitive NeurodynamicsmdashICCN2007 pp 999ndash1003 Springer Amsterdam The Netherlands2007

[6] A Borji and L Itti ldquoState-of-the-art in visual attention mod-elingrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 35 no 1 pp 185ndash207 2013

[7] L Itti C Koch and E Niebur ldquoAmodel of saliency-based visualattention for rapid scene analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 20 no 11 pp 1254ndash12591998

[8] AM Treisman ldquoFeature integration theoryrdquoCognitive Psychol-ogy vol 12 no 1 pp 97ndash136 1980

[9] J K Tsotsos S M Culhane W Y Kei Wai Y Lai N Davisand F Nuflo ldquoModeling visual attention via selective tuningrdquoArtificial Intelligence vol 78 no 1-2 pp 507ndash545 1995

[10] R Rae Gestikbasierte Mensch-Maschine-Kommunikation aufder grundlage visueller aufmerksamkeit und adaptivitat [PhDthesis] Universitat Bielefeld 2000

[11] D Parkhurst K Law and E Niebur ldquoModeling the role ofsalience in the allocation of overt visual attentionrdquo VisionResearch vol 42 no 1 pp 107ndash123 2002

[12] J Li Y Tian T Huang and W Gao ldquoProbabilistic multi-tasklearning for visual saliency estimation in videordquo InternationalJournal of Computer Vision vol 90 no 2 pp 150ndash165 2010

[13] MCerf J HarelW Einhauser andCKoch ldquoPredicting humangaze using low-level saliency combined with face detectionrdquo inAdvances in Neural Information Processing Systems vol 20 pp241ndash248 2007

[14] A Torralba ldquoModeling global scene factors in attentionrdquoJournal of the Optical Society of America A Optics and ImageScience and Vision vol 20 no 7 pp 1407ndash1418 2003

[15] A Oliva and A Torralba ldquoModeling the shape of the scenea holistic representation of the spatial enveloperdquo InternationalJournal of Computer Vision vol 42 no 3 pp 145ndash175 2001

[16] N D B Bruce and J K Tsotsos ldquoSaliency attention and visualsearch an information theoretic approachrdquo Journal of Visionvol 9 no 3 article 5 2009

[17] E Erdem and A Erdem ldquoVisual saliency estimation by nonlin-early integrating features using region covariancesrdquo Journal ofVision vol 13 no 4 article 11 2013

[18] J H van Hateren ldquoReal and optimal neural images in earlyvisionrdquo Nature vol 360 no 6399 pp 68ndash70 1992

[19] D J Field ldquoRelations between the statistics of natural imagesand the response properties of cortical cellsrdquo Journal of theOptical Society of America A Optics and Image Science vol 4no 12 pp 2379ndash2394 1987

[20] B A Olshausen and D J Field ldquoNatural image statistics andefficient codingrdquo Network Computation in Neural Systems vol7 no 2 pp 333ndash339 1996

[21] M Kwon G Legge F Fang A Cheong and S He ldquoIdentifyingthe mechanism of adaptation to prolonged contrast reductionrdquoJournal of Vision vol 9 no 8 p 976 2009

[22] X Sun H Yao and R Ji ldquoVisual attention modeling based onshort-term environmental adaptionrdquo Journal of Visual Commu-nication and Image Representation vol 24 no 2 pp 171ndash1802013

[23] A Garcia-Diaz X R Fdez-Vidal X M Pardo and R DosilldquoSaliency from hierarchical adaptation through decorrelationand variance normalizationrdquo Image and Vision Computing vol30 no 1 pp 51ndash64 2012

[24] C E Shannon ldquoAmathematical theory of communicationrdquo BellSystem Technical Journal vol 27 no 3 pp 379ndash423 1948

[25] Y Karklin and M S Lewicki ldquoEmergence of complex cellproperties by learning to generalize in natural scenesrdquo Naturevol 457 no 7225 pp 83ndash86 2009

[26] H J Seo and P Milanfar ldquoStatic and space-time visual saliencydetection by self-resemblancerdquo Journal of Vision vol 9 no 12pp 1ndash27 2009

16 The Scientific World Journal

[27] D J Jobson Z-U Rahman and G AWoodell ldquoProperties andperformance of a centersurround retinexrdquo IEEE Transactionson Image Processing vol 6 no 3 pp 451ndash462 1997

[28] A Guo D Zhao S Liu X Fan and W Gao ldquoVisual attentionbased image quality assessmentrdquo in Proceedings of the 18thIEEE International Conference on Image Processing (ICIP rsquo11) pp3297ndash3300 September 2011

[29] A Borji and L Itti ldquoExploiting local and global patch raritiesfor saliency detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo12) pp 478ndash485 2012

[30] L Duan C Wu J Miao L Qing and Y Fu ldquoVisual saliencydetection by spatially weighted dissimilarityrdquo in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recogni-tion (CVPR rsquo11) pp 473ndash480 June 2011

[31] JMWolfe ldquoGuided search 20 a revisedmodel of visual searchrdquoPsychonomic Bulletin amp Review vol 1 no 2 pp 202ndash238 1994

[32] O Le Meur P Le Callet D Barba and DThoreau ldquoA coherentcomputational approach to model bottom-up visual attentionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 28 no 5 pp 802ndash817 2006

[33] A Torralba A Oliva M S Castelhano and J M HendersonldquoContextual guidance of eye movements and attention in real-world scenes the role of global features in object searchrdquoPsychological Review vol 113 no 4 pp 766ndash786 2006

[34] G Kootstra N Bergstrom and D Kragic ldquoUsing symmetry toselect fixation points for segmentationrdquo in Proceedings of the20th International Conference on Pattern Recognition (ICPR rsquo10)pp 3894ndash3897 August 2010

[35] C Li J Xue N Zheng X Lan and Z Tian ldquoSpatio-temporalsaliency perception via hypercomplex frequency spectral con-trastrdquo Sensors vol 13 no 3 pp 3409ndash3431 2013

[36] L Itti and P Baldi ldquoBayesian surprise attracts human attentionrdquoVision Research vol 49 no 10 pp 1295ndash1306 2009

[37] J Harel C Koch and P Perona ldquoGraph-based visual saliencyrdquoin Advances in Neural Information Processing Systems vol 19pp 545ndash552 2006

[38] X Hou and L Zhang ldquoSaliency detection a spectral residualapproachrdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo07)pp 1ndash8 June 2007

[39] B Schauerte and R Stiefelhagen ldquoQuaternion-based spectralsaliency detection for eye fixation predictionrdquo in ComputerVisionmdashECCV 2012 A Fitzgibbon S Lazebnik P Perona YSato and C Schmid Eds Lecture Notes in Computer Sciencepp 116ndash129 2012

[40] W Kienzle F Wichmann B Scholkopf and M Franz ANonparametric Approach to Bottom-Up Visual Saliency 2007

[41] J Tilke K Ehinger F Durand and A Torralba ldquoLearningto predict where humans lookrdquo in Proceedings of the 12thInternational Conference on Computer Vision (ICCV rsquo09) pp2106ndash2113 October 2009

[42] W Wang Y Wang Q Huang and W Gao ldquoMeasuring visualsaliency by site entropy raterdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 2368ndash2375 June 2010

[43] O Tuzel F Porikli and P Meer ldquoRegion covariance afast descriptor for detection and classificationrdquo in ComputerVisionmdashECCV 2006 vol 3952 of Lecture Notes in ComputerScience pp 589ndash600 2006

[44] httpssitesgooglecomsitesparsenonlinearsaliencymodelhomedownloads

[45] A Hyvarinen and E Oja ldquoA fast fixed-point algorithm forindependent component analysisrdquo Neural Computation vol 9no 7 pp 1483ndash1492 1997

[46] X Hou and L Zhang ldquoDynamic visual attention searching forcoding length incrementsrdquo in Advances in Neural InformationProcessing Systems vol 21 pp 681ndash688 2008

[47] L Zhang M H Tong T K Marks H Shan and G WCottrell ldquoSUN a Bayesian framework for saliency using naturalstatisticsrdquo Journal of Vision vol 8 no 7 article 32 pp 1ndash202008

[48] J Zhang and S Sclaroff ldquoSaliency detection a boolean mapapproachrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV rsquo13) 2013

[49] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[50] E Rahtu J Kannala M Salo and J Heikkila ldquoSegmentingsalient objects from images and videosrdquo in Computer VisionmdashECCV 2010 vol 6315 of Lecture Notes in Computer Science pp366ndash379 2010

[51] M-M Cheng G-X Zhang N J Mitra X Huang and S-M Hu ldquoGlobal contrast based salient region detectionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo11) pp 409ndash416 June 2011

[52] T Veit J-P Tarel P Nicolle and P Charbonnier ldquoEvaluationof road marking feature extractionrdquo in Proceedings of the11th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo08) pp 174ndash181 December 2008

[53] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model of simple receptive fields in catstriate cortexrdquo Journal of Neurophysiology vol 58 no 6 pp1233ndash1258 1987

Page 4: Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

4 The Scientific World Journal

Our saliency model is inspired from mainly two typesof models sparse [16 20] and nonlinear representation [27]We propose a novel dual approach based on both sparseand nonlinear feature representation Inspired by biologicalevidence of neural receptive field properties [20] that effi-ciently process natural images in a sparse manner we use asparse image representation Moreover in order to representadaptivity of neurons to better tackle a new environment weuse an adaptive basis dictionary in an ICA approach [22]Thus our proposed method simultaneously uses sparsity andadaptivity In literature the model similar to our adaptivesparse representation is [22] In [22] Sun et al used aninformation theoretic global approach for saliency computa-tion but we use a more bioplausible local CSD for saliencycomputation Secondly we propose nonlinearly integratedrepresentation of single feature channels color along withspatial information for saliency computation Our approachis a modification of the model proposed in [17] whereall features and channels are nonlinearly integrated usingcovariance matrices although we propose that only colorinformation is enough and it can better estimate saliency inour framework Here also a CSD approach is used for saliencycomputation Finally a combined saliency map is formed byfusing the outputs given by the two representations

Contributions Major contributions of this work can besummarized as follows

(1) A novel dual image feature representation simultane-ous sparse and nonlinear feature representation

(2) CSD based saliency computation in adaptive sparserepresentation

(3) Only color based nonlinear integrated covariancerepresentation followed by CSD computation

(4) Improved results with comparison to other states ofthe models that are established by extensive testingon popular eye fixation prediction datasets and on asalient object detection dataset

3 Proposed Model

Our proposed scheme is given in Figure 2 An input imageis being simultaneously represented in sparse and nonlinearform Then saliency is computed by local center surroundoperation and finally both maps are combined to form asingle saliency map

For sparse representation we break an image into patchesand perform independent component analysis to derive basisLater these basis with sparse coefficient are used to representthe image Furthermore again after converting image intopatches we only take color information and integrate allchannels in a nonlinear fashion using covariance matricesalong with spatial information to represent an image patch

31 Mathematical Modeling In this section we cover math-ematical formulation of sparse representation nonlinearrepresentation and saliency computation Some necessarydiscussion is included to elaborate few concepts and also

some references will be given to avoid unnecessary formu-lation of the well-known concepts

Sparse Representation with ICAWe use an ICA based imagerepresentation thus an input image 119868 can be sparselyrepresented as

1198681015840

= 119860s (1)

where 119860 is a dictionary consisting of a set of basis and s con-sists of respective coefficients In our case we learn 119860 fromevery input image thus we adapt dictionary for every inputstimulusThis approach results inminimum information losswhich is a basic drawback of a fixed dictionary learned froman ensemble of images The sparse coefficients are learned byprojecting an input image to the basis such as

s = 119882119868 (2)

where

119882 = 119860minus1

(3)

The basis have dimensions the same as the patches formedfrom the input image Finally 119868

1015840 in patches form can berepresented as

1198681015840

=

1198991015840

K119896=1

1198601199011015840

119896 (4)

where 1199011015840

119896is the 119896th patchrsquos sparse coefficient vector consisting

of 1198981015840 co-efficients and there are total 119899

1015840 number of patchesin 1198681015840 Moreover K represents a function that reshape and

arranges patches at respective position to form an imageFigure 3 gives the depiction of the whole process

Nonlinear Representation with Covariance Matrices Our fea-ture matrix 119865 is based on raw RGB color space values of 119868

along with pixel position information

119865 = [119868119877 119868119866 119868119861 119909 119910] (5)

where every pixel in 119865 is a 5-dimensional vector 119891119894119894=12119901

where 119901 is the total number of pixels in the image Ourfeatures are different than those used in E Erdem andA Erdem [17] since we do not incorporate any gradientinformation in our feature matrix 119865 In (5) color along withspatial information is used rather than E Erdem and AErdem [17] approach of making a features matrix consistingof all features

The next step is nonlinear representation of 119865 usingcovariancematrices along with first order statistics [17] Tuzelet al [43] gave the concept of encoding a patch by a covariancematrix Later it was used in many applications In saliencydomain E Erdem and A Erdem [17] used patch covariancewith first order statistics for image feature representation andwe will dwell on his approach for our case Thus calculatinglocal covariance matrices for an image patch 119901

119896 we get

119862 (119901119896) =

1

119898 minus 1

119898

sum

119894=1

(119891119894minus 120583119896) (119891119894minus 120583119896)119879 (6)

The Scientific World Journal 5

Adaptive sparserepresentation

Center surroundsaliency

computation

Center surroundsaliency

computation

Image to patchesconversion

Saliency map

Color covariancenon-linear

representation

Image to patchesconversion

Π

Figure 2 Proposed model for saliency computation

Center surroundsaliency

computation

Image patches Sparse co-efficientvectors

Adaptive basis

p998400i p

998400i+1 p998400

i+2

Figure 3 Saliency computation in adaptive sparse representation

where a patch 119901119896consists of 119898 pixels with mean 120583

119896 Now

first order statistics is incorporated in the covariancematricesusing the method mentioned in [17] Then the new repre-sentation of covariance matrices with first order statisticsembedded for a patch is given by

120595119896

= 120595 (119862 (119901119896)) (7)

where function 120595 embeds first order statistics in an inputmatrix The final nonlinear feature representation of image

with 120595119896represents a 119896th patch and 119899 being total number of

patches is given by

11986810158401015840

=

119899

K119896=1

120595119896 (8)

where also K function arranges patches at respective posi-tions to form an image The whole representation is given inFigure 4Saliency Computation The saliency is computed by CSDoperation and then extended to multiple scales The CSDoperation is shown in Figure 1 where a patch under consider-ation is in red rectangle and surrounding area is highlighted

6 The Scientific World Journal

Input

RG

B

Output

Image to patchesconversion and

covariancerepresentation

First orderstatistics

embeddedPosition

information

I(x y)

Feature matrixF = [R G B x y]

Figure 4 Nonlinear representation of an input image

in yellow rectangle The saliency of red patch 119901119894 is given

by its dissimilarity between 119908 surrounding patches (yellowrectangle) as

119878 (119901119894) =

1

119908

119908

sum

119895=1

119889 (119901119894 119901119895) (9)

where dissimilarity 119889(119901119894 119901119895) between two patches is given by

119889 (119901119894 119901119895) =

10038171003817100381710038171003817120572 (119901119894) minus 120572 (119901

119895)10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(10)

where119883119894and119883

119895are the central position of the patches 119901

119894and

119901119895 For the case of sparse representation we have

120572 (119901119894) = 119901

1015840

119894(11)

and for nonlinear representation we have

120572 (119901119894) = 120595119894= 120595 (119862 (119901

119894)) (12)

Thus the saliencymap for patch 119901119894derived from 119868

1015840 and 11986810158401015840 can

be given as

1198781198681015840 (119901119894) =

1

119908

119908

sum

119895=1

100381710038171003817100381710038171199011015840

119894minus 1199011015840

119895

10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(13)

and with 120595(119862(119901119894)) being in vector form

11987811986810158401015840 (119901119894) =

1

119908

119908

sum

119895=1

10038171003817100381710038171003817120595 (119862 (119901

119894)) minus 120595 (119862 (119901

119895))

10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(14)

The multiscale saliency by sparse approach is given by

1198781198681015840 (119909) = N(prod

1198971015840120598119871

1198781198971015840

1198681015840 (119909)) (15)

and for nonlinear integrated approach is given by

11987811986810158401015840 (119909) = N(prod

119897120598119871

119878119897

11986810158401015840 (119909)) (16)

where 1198971015840 and 119897 represent the number of scales and N shows

normalization Finally saliency map becomes

119878119868

= 119866120590

⊛ (N (1198781198681015840 lowast 11987811986810158401015840)) (17)

where119866120590represents the Gaussian smoothing by convolution

⊛ operation and lowast stands for multiplication operation

4 Experimentation

In this section we thoroughly evaluate the proposed modelwith three different experiments human eye fixation pre-diction salient-object detection and response to variouspsychological patterns The human eye fixation prediction isthe basic and necessary test to check the performance of asaliency map against the collected eye fixation from severalhuman subjects

How well a saliency map distinguishes and highlights anobject in an image shows its ability of salient object detectionThe salient object detection capability of a model is evaluatedby employing some metrics that compare the generatedsaliency map against the ground truth made by manuallabeling of the salient region in an image by human subjectsThe psychological patterns give a qualitative analysis of thesaliency model These patterns are designed to check pop-upresponses in different scenarios like orientation conjunctionand color and so forth Code (Matlab P-code) of the proposedmodel used for experimentation is available online [44]

41 Parameter Setting Before pursuing the evaluation of theproposed model we fix the parameters used to generatethe saliency maps by our model These parameters willremain the same for all the experiments Derivation ofthese parameters will be discussed in the next section afterthe introduction of the datasets and the metric used forevaluation

Sparse Representation We resize all input images to 80 times

60 pixels and use only single scale 1198971015840

= 1 for saliencycomputation Patches of 5 times 5 pixels [22] are generated withsliding overlapping window from every input image to learnthe basis for the dictionary and for the saliency computation

The Scientific World Journal 7

Table 1 Following are the input image resolution used in eachalgorithm with default parameter settings Some algorithms (atleastlowast) do internally reduce dimensions for fast computation andoptimal performance

Model Acronym ResolutionAIM [16] 12(W times H)lowastGBVS [37] W times HMESR [39] 64 times 48lowastICL [46] W times HlowastItti [37] W times HMPQFT [39] 64 times 48AWS [23] W times HlowastSDSR [26] W times HSUN [47] 12(W times H)HFT [2] 128 times 128LG [29] 256 times 256ERDM [17] 512 times 512ΔQDCT [39] 64 times48lowastProposed W times H

An ICA package available online FAST ICA [45] is used forthis experimentation

Nonlinear Representation For nonlinear image representa-tion RGB color and position information is used in onlineavailable implementation of E Erdem andA Erdem [17]Thesaliency is computed with the default parameters used in [17]that have every input image being resized to 512 times 512 pixelsand five different patch sizes 119901

119894= 8 16 32 64 128 and thus

119897 = 5 are used for saliency computationFinally normalized sparse representationrsquos saliency map

is rescaled to the size of nonlinear representationrsquos saliencymap and both maps are multiplied and normalizedThen thefinal saliencymap is resized to the actual input image size andused for experimentation The input image resolutions usedin all the saliency algorithms for experimentation are givenin Table 1

42 Human Eye Fixation Prediction In order to validatethe proposed model with human eye fixation predictionssaliency maps are generated on three datasets and for a faircomparison shuffle area under the curve score(sAUC) is usedto quantify the results

Dataset A reasonable dataset that can be used for evaluationof human eye fixation prediction must be complex anddiverse enough so that performance can be thoroughly eval-uated In literature Toronto [16] and Kootstra [34] datasetsare the most popular and widely used datasets IMSAL [2] isa relatively new dataset which we also used in our evaluation

Toronto dataset was prepared by Bruce and Tsotsos [16]and it consists of 120 images each with 681 times 511 pixels Thisdataset has both indoor and outdoor imagesThe eye fixationground truth is based on 20 subjects who free viewed theimages for few seconds

Kootstra dataset was used in [34] It consists of 101 imageseach with a resolution of 1024 times 768 pixels These imagesconsist of flowers natural scenes automans and buildingsThis dataset is significantly complex because it has manyimages with no explicit salient regions The eye fixationground truth available with this dataset is based on freeviewing of 31 subjects for a few seconds

IMSAL dataset is given by Li et al [2] it consists of 235images which are collected online through an internet searchengine and some images were taken from literature Theseimages are divided into six categories with 50 images havinglarge salient regions 80 images with intermediate salientregions 60 images with small salient regions 15 imageswith cluttered backgrounds and 15 images with repeatingdistracters These images give a good benchmark for per-formance evaluation because of the significant complexitygiven by variable size of salient objects objects with clutterand objects with distracters The accompanied ground truthconsists of both eye fixation information and binary maskscreated by human subjects who manually marked the salientobject in an image

Metric for Evaluation The most popular method to evaluatethe performance of a saliency map is to calculate area underthe curve (AUC) score of receiver operating characteristics(ROC) curve At first a saliency map is thresholded andused as a binary classifier with human eye fixations actingas positive set and some other points uniformly random asnegative set to plot an ROC curve The AUC of that ROC iscalculated andused as ameasure of performance comparison

There are various variants of AUC available in literatureand the basic difference between them is the choice ofnegative set points We will use the shuffled area under thecurve sAUC score because of its ability to cater for centerbias [41] since some models implicitly incorporate centerbias which makes a fair comparison difficult to performit is becoming standard to present results with sAUC InsAUC score the positive points consists of human subjectseye fixation on that image and the negative set consists ofall the fixation of subjects on the rest of the dataset imagesThe sAUC gives a 05 score on a center Gaussian blob whichis about the same as a random or chance score whereas allthe other versions of AUC [6] give very high score becausethey are affected by the center bias For our experimentationwe used sAUC available online by Schauerte and Stiefelhagen[39] We calculate every sAUC score for 20 times [47]and then use the mean value We found that the standarddeviation of the sAUC approximately ranges from 1119864 minus 4 to5119864 minus 4 in our experiments

Performance Analysis with Resolution In order to find theoptimal parameters for the proposed model we treat bothrepresentations separately and find the best parameters foreach representation that can be incorporated in the proposedmodel We plotted both sparse and nonlinear representationwith variable resolution on all three datasets and measurethe sAUC score Using various parameters given in Table 2Figure 5 is plotted which gives the performance of bothrepresentations on the three datasets Figure 5 shows that for

8 The Scientific World Journal

Table 2 Various scales parameters which are used for evaluation of sparse and nonlinear representation

Scale Adaptive sparse representation Nonlinear representationResolution Patch size Resolution Patch size

1 40 times 30 5 128 times 128 2 4 8 16 32

2 80 times 60 5 256 times 256 4 8 16 32 64

3 160 times 120 5 512 times 512 8 16 32 64 128

08

07

06

1 2 3

Scale

sAU

C

Sparse representation

BruceKootstra

IMSAL

(a)

Non-linear representation

075

070

065

0601 2 3

Scale

sAU

C

BruceKootstra

IMSAL

(b)

Figure 5 sAUC score for sparse and nonlinear representation The values in (a) and (b) are maximum at scale 2 and scale 3 for all threedatasets

sparse representation the performance is maximum with 80times 60 pixels (scale 2) and for nonlinear representation we getgood performance at 512 times 512 pixels (scale 3) Usually theresolution of the input images is not very high so we restrictto 512 times 512 as upper bound for nonlinear representation andsame resolution with respective patch sizes is used in [17]The image resolution and patch sizes of different scales forthe both models used for evaluation are given in Table 2Based on this analysis we incorporate the parameters of scale2 and scale 3 for sparse and nonlinear representation in theproposed saliency model

Performance Comparison with Other Models The results ofour model along with comparison with 13 state-of-the-artmethods are given in Table 3 The detailed performancewith variable Gaussian smoothing is given in Figure 6 Thesimulation codes for all these methods are taken from theauthors websites We used multiscale and quaternion basedimplementation for spectral residue (MESR) PQFT [4] andDCT [49] as proposed by Schauerte and Stiefelhagen [39]which gives higher score than original methods Erdemrsquos [17]implementation with first order statistics embedded is usedfor simulation since it gives higher sAUC score Results withthe proposed technique are quite consistent and the proposedmethod outperforms the state-of-the-artΔQDCT [39]model

on the Toronto dataset and performs comparatively well onthe Kootstra dataset and IMSAL dataset No single modelperforms well on all these datasets and the performanceof other models changes with the dataset but our modelshows consistency and remained either ranked 1 or ranked2 on these datasets We believe that the high eye fixationprediction is due to the adaptive nature of the model anddue to dual representation of features The adaptivity makesa feature space optimal for the current image thus a moreaccurate representation of the features is possible which inturn accounts for better saliency map estimation Moreovera single representation may not be enough for every caseFinally these results can improve further if we use a multi-scale representation for ICA based representation which weskipped due to computational time constraints

43 Salient Object Detection A saliency map can be usedto detect a salient object in an image The basic premiseis that if an image consists of an object which stands outfrom the rest of the image then it should be identifiedby a saliency algorithm There is a different branch invisual saliency modeling which consists of models that arespecifically designed to detect salient objects These modelsfind the salient object in an image and then segment thewhole

The Scientific World Journal 9

000 002 004 006 008

120590

072

070

068

066

064

062

060

Toronto dataset

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(a)

062

060

058

056

054

052

Kootstra dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(b)

078

076

074

072

070

068

066

IMSAL dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFT

ERDEMQDCTAWSProposed

sAU

C

LG

010 012 014 016

(c)

Figure 6 Toronto Kootstra and IMSAL datasets with variable Gaussian smoothing for all algorithms in comparison The 119909-axis representsthe 120590 of the smoothing Gaussian (in image width) (In (c) only GBVS LG and ΔQDCT are taken from [48])

10 The Scientific World Journal

Table3Com

paris

onof

sAUCscoreo

fthe

prop

osed

mod

elwith

13otherstate-of-the-artmod

els

Dataset

AIM

[16]

GBV

S[37]

MES

R[39]

ICL[46]

Itti[37]

MPQ

FT[39]

SDSR

[26]

SUN[47]

HFT

[2]

LG[29]

ERDM

[17]

ΔQDCT

[39]

AWS[23]

Prop

osed

TORO

NTO

[16]

0699

064

0407141

064

76064

8107119

07062

066

6106915

06939

07023

07172

07130

07179

Optim

al120590

004

001

005

003

001

004

003

003

006

004

004

005

001

004

KOOTS

TRA[34]

05901

05572

05978

05769

05689

05985

05984

05613

05911

05938

0603

05998

06185

06157

Optim

al120590

003

001

004

001

004

005

004

002

005

001

003

003

001

003

IMSA

L[2]

07424

lowast076

6507153

07364

07512

071694

07403

06955

07419lowast07356

07455

lowast07434

07468

07560

Optim

al120590

008

007

015

013

009

014

013

016

009

012

012

010

005

009

Thetop

rank

edmod

elisin

boldfont

andsecond

rank

edmod

elisin

italic

fontH

erer

esultswith

optim

alaverages

AUCof

each

metho

dwith

thec

orrespon

ding

Gaussianblur(120590)are

repo

rtedA

sin[47]w

erepeat

thes

AUCcalculationfor2

0tim

esandcompu

tethes

tand

arddeviationof

each

averages

AUC

which

ranges

from1119864minus4to5119864minus4(lowastvalues

takenfro

m[48])

The Scientific World Journal 11

Table4Com

paris

onof

AUCandPo

DSC

scoreo

fthe

prop

osed

mod

elwith

otherstate-of-the-artmod

els

Algorith

mC1

C2C3

C4C5

C6Av

gAU

CPo

DSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AvgAU

CAIM

0921

0653

0918

0543

0957

0461

0904

0435

0924

0508

0941

0671

0927

MES

R0798

0494

0854

0436

0927

0377

0718

0238

0837

0398

0918

0617

0842

ICL

0928

0681

0907

0501

0909

0348

0926

0537

0902

0498

0914

0569

0914

Itti

0939

0687

0924

0542

0952

0457

0882

0423

0924

0499

0941

0631

0927

MPQ

FT0805

0497

0865

0452

0927

0386

0735

0248

0852

0436

0921

0630

0851

SDSR

0879

0618

0908

0511

0948

0435

0801

0298

0904

0489

0933

064

50896

SUN

0766

0459

0787

0349

0880

0362

0708

0263

0719

0283

0817

0451

0780

HFT

0937

070

00923

0552

0937

044

8096

4064

80933

0554

096

0070

60942

LG0814

0515

0859

046

40904

0379

0635

0145

0862

0499

0884

0572

0826

ERDEM

0882

0617

0912

0555

0962

0526

0840

0349

0923

0575

0916

0635

0906

MQDCT

0840

0564

0894

0517

0947

0489

0806

0311

0888

0524

0923

0670

0883

AWS

0894

0625

0922

0569

0957

0480

0891

0402

0920

0534

0925

0655

0918

Prop

osed

0920

0653

094

3059

60976

0555

0934

0516

094

20532

0956

0688

094

5Th

etop

rank

edmod

elisin

bold

font

andthe2

ndrank

edmod

elisin

italic

fontTh

eoverallbestaverageA

UCscoreisg

iven

byou

rpropo

sedmod

el

12 The Scientific World Journal

Input Proposed AIM GBVS MESR ICL Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 7 Response of our algorithm and other 13 state-of-the-art algorithms on various psychological patterns

extent of the object and thus solve this task like segmentation-type binary labeling formulation [50 51] In contrast ourmodel is designed for location based (eye fixation) saliencymodeling and it is not designed to capture exact objectboundaries however by thresholding a saliency map we canget a binary map that can be used for testing performance ofa model for salient object detection Since our saliencymodelis location based we only use other location based modelsin comparison for a fair evaluation similar convention isfollowed in [2 17]

Dataset and Metric for Evaluation For salient object detec-tion the metric used by Li et al [2] is area under thecurve (AUC) and dice similarity coefficient (DSC) on IMSAL[2] dataset We will use the same metric and dataset forsalient object detectionThe DSC gives the overlap between athreshold saliency map and the ground truth Moreover thepeak value of the DSC [52] is considered an important way toestablish the best performance of an algorithm at an optimalthreshold thus we also give results with peak value of theDSC curve (PoDSC) Since AUC can be influenced by centerbias for fair comparison we turn off the center bias in allthe algorithmsThe GBVS [37] has built-in center bias and inorder to cater for that the author in [2] incorporates explicitcenter bias and shows that HFT [2] performs better thanGBVS on the same dataset which is also used in our paperWe do not employ center bias explicitly or implicitly in thepresented results and addedHFT instead for our comparisonTherefore GVBS [37] is skipped from Table 4 Furthermorewe perform Gaussian smoothing in all the algorithms tofind the optimal smoothing parameters for each class in thedataset and the optimal performance is quoted in the resultsgiven in Table 4

Performance We present results in comparison with other 12state-of-the-art algorithms The complete results are given in

Table 4 Our proposed scheme gives the best performanceon three categories C2 C3 and C5 and ranked secondon C4 and C6 Our model gives the highest average AUCscore on this dataset On different categories our results arecomparative to the HFT which is state of the art on thisdataset Apart fromHFT the performance is also significantlybetter in comparison to other algorithmsThedataset used forcomparison is quite complex but our algorithm performedwell for intermediate small objects with distracters althoughperformance on other cases is little less than other state-of-the-art algorithms

44 Psychological Patterns Wealso tested our saliencymodelon psychological patterns which are commonly used to givea qualitative performance on artificial scenarios which simu-late a pop-up phenomenonThese patterns simulate the pop-out phenomenon based on color intersection symmetryorientation curvature and candles image In order to checkthe general performance on various psychological tasks wetested the proposed model on eight psychological patternsFigure 7 gives the results of the proposed saliency modelalong with other popular models The proposed algorithmworks well on color symmetry and orientation as well as oncandle image but the performance is not good for curvatureand intersection patterns Figure 7 also shows that any singlemodel does not give good performance on all the patterns andthe best performance is more or less the same as given by theproposed scheme

5 Discussion

The image features representation drastically affects theinformation content and thus saliency estimation We useda bioinspired center surround saliency computation on twoparallel feature representations that give good performance

The Scientific World Journal 13

(a)

(b) (c)

(d) (e)

Figure 8 The fixed learned dictionary and the information lost by using such dictionary (a) Fixed learned dictionary ((b)-(c)) The inputimages ((d)-(e)) The information lost by using dictionary given in (a)

on both eye fixation prediction and salient object detectionSince a CSD operation depends on the difference between acenter portion and its surroundings a better image contentsrepresentation makes a more accurate and precise centersurround operation

We used an adaptive sparse representation to boost theperformance of the CSD operation In order to show theeffectiveness of the proposed approach we present bothquantitative and qualitative results For qualitative compar-ison we use a fix dictionary [29] (Figure 8(a)) learnt from

an ensemble of natural images We show that some of theinformation is lost if we use a fixed dictionary to represent animage and usually the lost information belongs to the salientregion in an image The difference between an input imageand the image reconstructed by a fixed dictionary is givenin Figure 8 The red cylinder Figure 8(b) and red box withtext Figure 8(c) is visible in the image plots Figures 8(d)and 8(e) These two objects are the salient features in bothimages and their appearance in the residual image shows thatcurrent representation which uses a fixed dictionary loses

14 The Scientific World Journal

Input Gnd truth Proposed AIM GBVS Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 9 Comparison of our method with other state-of-the-art saliency models The first column is the input image with second columnbeing ground truth followed by our proposed saliency model Results from all the other models are plotted in the same row

Table 5 sAUC score comparison using fixed [29] and adaptivesparse representation (proposed) for CSD saliency

DatasetBorji local [29] Fixeddictionary sparserepresentation

Proposed Adaptivedictionary sparserepresentation

Toronto 0670 07110Kootstra 0572 06061

Table 6 sAUC comparison of E Erdem and A Erdem [17] and theproposed color based nonlinear integration

Dataset E Erdem andA Erdem [17]

Proposed Nonlinearrepresentation

Bruce and Tsotsos [16] 07023 07084Kootstra et al [34] 0603 06152

some information that belong to salient portion of the inputimage

For quantitative comparison we employ sAUC to dothe comparison between saliency maps based on a fixeddictionary and the adaptive basis dictionary Table 5 gives acomparison of the both approaches on two datasets Torontoand Kootstra The performance difference is quite significantand it shows that an adaptive representation is much betterBased on these qualitative and quantitative results we canconclude that an adaptive image presentation is more viableand accurate for CSD based saliency computation

A model proposed in [17] gives an idea of nonlinearintegration of all features by covariance matrices and thesupported implementation uses color gradient and spatialinformation Our second contribution is the modification ofthat model and a proposal of using only color with spatial

information in nonlinearly integrated manner using samecovariance matrices For our proposed implementation(seeFigure 4) we modified the features used in [17] and comparethe saliencymap with Erdemrsquos [17] model in Table 6 using thesAUC score Two databases Toronto [16] and Kootstra [34]are used for simulations and the results indicate that by usingonly color with spatial information we can get better sAUCscore than integrating all features using covariance matricesIn Table 6 the difference in sAUC score is quite visible onboth datasets One possible reason of this improvement maybe that the correlation among different features like color andorientation is different and thus using a covariance basedrepresentation does not capture the underlying informationstructure in an efficient way as compared to when only colorinformation is used

One possible argument against our usage of only colorinformation however can be that without any gradient ororientation information a saliency model will fail to detectmany salient regions This argument can also supported bynature since neurons tuned to orientation in an image areknown to contribute to saliency computation [53] In ourmodel this issue is addressed by the sparse representationwhere the adaptive basis same as basis shown in Figure 8(a)is Gabor like filters with edge like structure and thus thesebases efficiently capture orientation information from theimage which complements our color information in nonlin-ear representation

Finally the sAUC score of dual representation Table 3shows that we achieve better eye fixation prediction thantreating both representations separately as shown in Tables5 and 6We believe that such improvement is due to the com-plementary behavior of both techniques since a combinedapproach better represents image contents with high fidelityand thus that in turn improves saliency detection Lastly for

The Scientific World Journal 15

illustration and visual comparison we present some saliencymaps produced by our algorithm along with other models inFigure 9

6 Conclusion

This paper shows that dual feature representationmanages torobustly capture image information which can be used in acenter surround operation to compute saliencyWe show thata CSD on adaptive sparse basis gives better results than a fixsparse basis representation In nonlinear representation weshow that nonlinearly integrated color channels with spatialinformation better capture underlying data structure andthus a CSD on such representation gives good results Finallywe consider both representations as complementary and thusa fused saliencymap not only give good results on human eyefixations but also detect salient objects with high accuracy

In future wewill incorporate some top-downmechanismto better imitative human saliency computation capabilitiesbased on learning and experience Another possible exten-sion of the existing work is to test dynamic scenes video byincorporating additional motion information in the currentscheme

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the national Natural Sci-ence Foundation of China (61175096) and Specialized Fundfor Joint Building Program of Beijing Municipal EducationCommission

References

[1] C Siagian and L Itti ldquoBiologically inspired mobile robot visionlocalizationrdquo IEEE Transactions on Robotics vol 25 no 4 pp861ndash873 2009

[2] J Li M D Levine X An X Xu and H He ldquoVisual saliencybased on scale-space analysis in the frequency domainrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol35 no 4 pp 996ndash1010 2012

[3] Y Su Q Zhao L Zhao and D Gu ldquoAbrupt motion trackingusing a visual saliency embedded particle filterrdquo Pattern Recog-nition vol 47 no 5 pp 1826ndash1834 2014

[4] C Guo Q Ma and L Zhang ldquoSpatio-temporal saliency detec-tion using phase spectrum of quaternion fourier transformrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo08) pp 1ndash8 June 2008

[5] X Hou and L Zhang ldquoThumbnail generation based on globalsaliencyrdquo in Advances in Cognitive NeurodynamicsmdashICCN2007 pp 999ndash1003 Springer Amsterdam The Netherlands2007

[6] A Borji and L Itti ldquoState-of-the-art in visual attention mod-elingrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 35 no 1 pp 185ndash207 2013

[7] L Itti C Koch and E Niebur ldquoAmodel of saliency-based visualattention for rapid scene analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 20 no 11 pp 1254ndash12591998

[8] AM Treisman ldquoFeature integration theoryrdquoCognitive Psychol-ogy vol 12 no 1 pp 97ndash136 1980

[9] J K Tsotsos S M Culhane W Y Kei Wai Y Lai N Davisand F Nuflo ldquoModeling visual attention via selective tuningrdquoArtificial Intelligence vol 78 no 1-2 pp 507ndash545 1995

[10] R Rae Gestikbasierte Mensch-Maschine-Kommunikation aufder grundlage visueller aufmerksamkeit und adaptivitat [PhDthesis] Universitat Bielefeld 2000

[11] D Parkhurst K Law and E Niebur ldquoModeling the role ofsalience in the allocation of overt visual attentionrdquo VisionResearch vol 42 no 1 pp 107ndash123 2002

[12] J Li Y Tian T Huang and W Gao ldquoProbabilistic multi-tasklearning for visual saliency estimation in videordquo InternationalJournal of Computer Vision vol 90 no 2 pp 150ndash165 2010

[13] MCerf J HarelW Einhauser andCKoch ldquoPredicting humangaze using low-level saliency combined with face detectionrdquo inAdvances in Neural Information Processing Systems vol 20 pp241ndash248 2007

[14] A Torralba ldquoModeling global scene factors in attentionrdquoJournal of the Optical Society of America A Optics and ImageScience and Vision vol 20 no 7 pp 1407ndash1418 2003

[15] A Oliva and A Torralba ldquoModeling the shape of the scenea holistic representation of the spatial enveloperdquo InternationalJournal of Computer Vision vol 42 no 3 pp 145ndash175 2001

[16] N D B Bruce and J K Tsotsos ldquoSaliency attention and visualsearch an information theoretic approachrdquo Journal of Visionvol 9 no 3 article 5 2009

[17] E Erdem and A Erdem ldquoVisual saliency estimation by nonlin-early integrating features using region covariancesrdquo Journal ofVision vol 13 no 4 article 11 2013

[18] J H van Hateren ldquoReal and optimal neural images in earlyvisionrdquo Nature vol 360 no 6399 pp 68ndash70 1992

[19] D J Field ldquoRelations between the statistics of natural imagesand the response properties of cortical cellsrdquo Journal of theOptical Society of America A Optics and Image Science vol 4no 12 pp 2379ndash2394 1987

[20] B A Olshausen and D J Field ldquoNatural image statistics andefficient codingrdquo Network Computation in Neural Systems vol7 no 2 pp 333ndash339 1996

[21] M Kwon G Legge F Fang A Cheong and S He ldquoIdentifyingthe mechanism of adaptation to prolonged contrast reductionrdquoJournal of Vision vol 9 no 8 p 976 2009

[22] X Sun H Yao and R Ji ldquoVisual attention modeling based onshort-term environmental adaptionrdquo Journal of Visual Commu-nication and Image Representation vol 24 no 2 pp 171ndash1802013

[23] A Garcia-Diaz X R Fdez-Vidal X M Pardo and R DosilldquoSaliency from hierarchical adaptation through decorrelationand variance normalizationrdquo Image and Vision Computing vol30 no 1 pp 51ndash64 2012

[24] C E Shannon ldquoAmathematical theory of communicationrdquo BellSystem Technical Journal vol 27 no 3 pp 379ndash423 1948

[25] Y Karklin and M S Lewicki ldquoEmergence of complex cellproperties by learning to generalize in natural scenesrdquo Naturevol 457 no 7225 pp 83ndash86 2009

[26] H J Seo and P Milanfar ldquoStatic and space-time visual saliencydetection by self-resemblancerdquo Journal of Vision vol 9 no 12pp 1ndash27 2009

16 The Scientific World Journal

[27] D J Jobson Z-U Rahman and G AWoodell ldquoProperties andperformance of a centersurround retinexrdquo IEEE Transactionson Image Processing vol 6 no 3 pp 451ndash462 1997

[28] A Guo D Zhao S Liu X Fan and W Gao ldquoVisual attentionbased image quality assessmentrdquo in Proceedings of the 18thIEEE International Conference on Image Processing (ICIP rsquo11) pp3297ndash3300 September 2011

[29] A Borji and L Itti ldquoExploiting local and global patch raritiesfor saliency detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo12) pp 478ndash485 2012

[30] L Duan C Wu J Miao L Qing and Y Fu ldquoVisual saliencydetection by spatially weighted dissimilarityrdquo in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recogni-tion (CVPR rsquo11) pp 473ndash480 June 2011

[31] JMWolfe ldquoGuided search 20 a revisedmodel of visual searchrdquoPsychonomic Bulletin amp Review vol 1 no 2 pp 202ndash238 1994

[32] O Le Meur P Le Callet D Barba and DThoreau ldquoA coherentcomputational approach to model bottom-up visual attentionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 28 no 5 pp 802ndash817 2006

[33] A Torralba A Oliva M S Castelhano and J M HendersonldquoContextual guidance of eye movements and attention in real-world scenes the role of global features in object searchrdquoPsychological Review vol 113 no 4 pp 766ndash786 2006

[34] G Kootstra N Bergstrom and D Kragic ldquoUsing symmetry toselect fixation points for segmentationrdquo in Proceedings of the20th International Conference on Pattern Recognition (ICPR rsquo10)pp 3894ndash3897 August 2010

[35] C Li J Xue N Zheng X Lan and Z Tian ldquoSpatio-temporalsaliency perception via hypercomplex frequency spectral con-trastrdquo Sensors vol 13 no 3 pp 3409ndash3431 2013

[36] L Itti and P Baldi ldquoBayesian surprise attracts human attentionrdquoVision Research vol 49 no 10 pp 1295ndash1306 2009

[37] J Harel C Koch and P Perona ldquoGraph-based visual saliencyrdquoin Advances in Neural Information Processing Systems vol 19pp 545ndash552 2006

[38] X Hou and L Zhang ldquoSaliency detection a spectral residualapproachrdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo07)pp 1ndash8 June 2007

[39] B Schauerte and R Stiefelhagen ldquoQuaternion-based spectralsaliency detection for eye fixation predictionrdquo in ComputerVisionmdashECCV 2012 A Fitzgibbon S Lazebnik P Perona YSato and C Schmid Eds Lecture Notes in Computer Sciencepp 116ndash129 2012

[40] W Kienzle F Wichmann B Scholkopf and M Franz ANonparametric Approach to Bottom-Up Visual Saliency 2007

[41] J Tilke K Ehinger F Durand and A Torralba ldquoLearningto predict where humans lookrdquo in Proceedings of the 12thInternational Conference on Computer Vision (ICCV rsquo09) pp2106ndash2113 October 2009

[42] W Wang Y Wang Q Huang and W Gao ldquoMeasuring visualsaliency by site entropy raterdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 2368ndash2375 June 2010

[43] O Tuzel F Porikli and P Meer ldquoRegion covariance afast descriptor for detection and classificationrdquo in ComputerVisionmdashECCV 2006 vol 3952 of Lecture Notes in ComputerScience pp 589ndash600 2006

[44] httpssitesgooglecomsitesparsenonlinearsaliencymodelhomedownloads

[45] A Hyvarinen and E Oja ldquoA fast fixed-point algorithm forindependent component analysisrdquo Neural Computation vol 9no 7 pp 1483ndash1492 1997

[46] X Hou and L Zhang ldquoDynamic visual attention searching forcoding length incrementsrdquo in Advances in Neural InformationProcessing Systems vol 21 pp 681ndash688 2008

[47] L Zhang M H Tong T K Marks H Shan and G WCottrell ldquoSUN a Bayesian framework for saliency using naturalstatisticsrdquo Journal of Vision vol 8 no 7 article 32 pp 1ndash202008

[48] J Zhang and S Sclaroff ldquoSaliency detection a boolean mapapproachrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV rsquo13) 2013

[49] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[50] E Rahtu J Kannala M Salo and J Heikkila ldquoSegmentingsalient objects from images and videosrdquo in Computer VisionmdashECCV 2010 vol 6315 of Lecture Notes in Computer Science pp366ndash379 2010

[51] M-M Cheng G-X Zhang N J Mitra X Huang and S-M Hu ldquoGlobal contrast based salient region detectionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo11) pp 409ndash416 June 2011

[52] T Veit J-P Tarel P Nicolle and P Charbonnier ldquoEvaluationof road marking feature extractionrdquo in Proceedings of the11th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo08) pp 174ndash181 December 2008

[53] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model of simple receptive fields in catstriate cortexrdquo Journal of Neurophysiology vol 58 no 6 pp1233ndash1258 1987

Page 5: Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

The Scientific World Journal 5

Adaptive sparserepresentation

Center surroundsaliency

computation

Center surroundsaliency

computation

Image to patchesconversion

Saliency map

Color covariancenon-linear

representation

Image to patchesconversion

Π

Figure 2 Proposed model for saliency computation

Center surroundsaliency

computation

Image patches Sparse co-efficientvectors

Adaptive basis

p998400i p

998400i+1 p998400

i+2

Figure 3 Saliency computation in adaptive sparse representation

where a patch 119901119896consists of 119898 pixels with mean 120583

119896 Now

first order statistics is incorporated in the covariancematricesusing the method mentioned in [17] Then the new repre-sentation of covariance matrices with first order statisticsembedded for a patch is given by

120595119896

= 120595 (119862 (119901119896)) (7)

where function 120595 embeds first order statistics in an inputmatrix The final nonlinear feature representation of image

with 120595119896represents a 119896th patch and 119899 being total number of

patches is given by

11986810158401015840

=

119899

K119896=1

120595119896 (8)

where also K function arranges patches at respective posi-tions to form an image The whole representation is given inFigure 4Saliency Computation The saliency is computed by CSDoperation and then extended to multiple scales The CSDoperation is shown in Figure 1 where a patch under consider-ation is in red rectangle and surrounding area is highlighted

6 The Scientific World Journal

Input

RG

B

Output

Image to patchesconversion and

covariancerepresentation

First orderstatistics

embeddedPosition

information

I(x y)

Feature matrixF = [R G B x y]

Figure 4 Nonlinear representation of an input image

in yellow rectangle The saliency of red patch 119901119894 is given

by its dissimilarity between 119908 surrounding patches (yellowrectangle) as

119878 (119901119894) =

1

119908

119908

sum

119895=1

119889 (119901119894 119901119895) (9)

where dissimilarity 119889(119901119894 119901119895) between two patches is given by

119889 (119901119894 119901119895) =

10038171003817100381710038171003817120572 (119901119894) minus 120572 (119901

119895)10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(10)

where119883119894and119883

119895are the central position of the patches 119901

119894and

119901119895 For the case of sparse representation we have

120572 (119901119894) = 119901

1015840

119894(11)

and for nonlinear representation we have

120572 (119901119894) = 120595119894= 120595 (119862 (119901

119894)) (12)

Thus the saliencymap for patch 119901119894derived from 119868

1015840 and 11986810158401015840 can

be given as

1198781198681015840 (119901119894) =

1

119908

119908

sum

119895=1

100381710038171003817100381710038171199011015840

119894minus 1199011015840

119895

10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(13)

and with 120595(119862(119901119894)) being in vector form

11987811986810158401015840 (119901119894) =

1

119908

119908

sum

119895=1

10038171003817100381710038171003817120595 (119862 (119901

119894)) minus 120595 (119862 (119901

119895))

10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(14)

The multiscale saliency by sparse approach is given by

1198781198681015840 (119909) = N(prod

1198971015840120598119871

1198781198971015840

1198681015840 (119909)) (15)

and for nonlinear integrated approach is given by

11987811986810158401015840 (119909) = N(prod

119897120598119871

119878119897

11986810158401015840 (119909)) (16)

where 1198971015840 and 119897 represent the number of scales and N shows

normalization Finally saliency map becomes

119878119868

= 119866120590

⊛ (N (1198781198681015840 lowast 11987811986810158401015840)) (17)

where119866120590represents the Gaussian smoothing by convolution

⊛ operation and lowast stands for multiplication operation

4 Experimentation

In this section we thoroughly evaluate the proposed modelwith three different experiments human eye fixation pre-diction salient-object detection and response to variouspsychological patterns The human eye fixation prediction isthe basic and necessary test to check the performance of asaliency map against the collected eye fixation from severalhuman subjects

How well a saliency map distinguishes and highlights anobject in an image shows its ability of salient object detectionThe salient object detection capability of a model is evaluatedby employing some metrics that compare the generatedsaliency map against the ground truth made by manuallabeling of the salient region in an image by human subjectsThe psychological patterns give a qualitative analysis of thesaliency model These patterns are designed to check pop-upresponses in different scenarios like orientation conjunctionand color and so forth Code (Matlab P-code) of the proposedmodel used for experimentation is available online [44]

41 Parameter Setting Before pursuing the evaluation of theproposed model we fix the parameters used to generatethe saliency maps by our model These parameters willremain the same for all the experiments Derivation ofthese parameters will be discussed in the next section afterthe introduction of the datasets and the metric used forevaluation

Sparse Representation We resize all input images to 80 times

60 pixels and use only single scale 1198971015840

= 1 for saliencycomputation Patches of 5 times 5 pixels [22] are generated withsliding overlapping window from every input image to learnthe basis for the dictionary and for the saliency computation

The Scientific World Journal 7

Table 1 Following are the input image resolution used in eachalgorithm with default parameter settings Some algorithms (atleastlowast) do internally reduce dimensions for fast computation andoptimal performance

Model Acronym ResolutionAIM [16] 12(W times H)lowastGBVS [37] W times HMESR [39] 64 times 48lowastICL [46] W times HlowastItti [37] W times HMPQFT [39] 64 times 48AWS [23] W times HlowastSDSR [26] W times HSUN [47] 12(W times H)HFT [2] 128 times 128LG [29] 256 times 256ERDM [17] 512 times 512ΔQDCT [39] 64 times48lowastProposed W times H

An ICA package available online FAST ICA [45] is used forthis experimentation

Nonlinear Representation For nonlinear image representa-tion RGB color and position information is used in onlineavailable implementation of E Erdem andA Erdem [17]Thesaliency is computed with the default parameters used in [17]that have every input image being resized to 512 times 512 pixelsand five different patch sizes 119901

119894= 8 16 32 64 128 and thus

119897 = 5 are used for saliency computationFinally normalized sparse representationrsquos saliency map

is rescaled to the size of nonlinear representationrsquos saliencymap and both maps are multiplied and normalizedThen thefinal saliencymap is resized to the actual input image size andused for experimentation The input image resolutions usedin all the saliency algorithms for experimentation are givenin Table 1

42 Human Eye Fixation Prediction In order to validatethe proposed model with human eye fixation predictionssaliency maps are generated on three datasets and for a faircomparison shuffle area under the curve score(sAUC) is usedto quantify the results

Dataset A reasonable dataset that can be used for evaluationof human eye fixation prediction must be complex anddiverse enough so that performance can be thoroughly eval-uated In literature Toronto [16] and Kootstra [34] datasetsare the most popular and widely used datasets IMSAL [2] isa relatively new dataset which we also used in our evaluation

Toronto dataset was prepared by Bruce and Tsotsos [16]and it consists of 120 images each with 681 times 511 pixels Thisdataset has both indoor and outdoor imagesThe eye fixationground truth is based on 20 subjects who free viewed theimages for few seconds

Kootstra dataset was used in [34] It consists of 101 imageseach with a resolution of 1024 times 768 pixels These imagesconsist of flowers natural scenes automans and buildingsThis dataset is significantly complex because it has manyimages with no explicit salient regions The eye fixationground truth available with this dataset is based on freeviewing of 31 subjects for a few seconds

IMSAL dataset is given by Li et al [2] it consists of 235images which are collected online through an internet searchengine and some images were taken from literature Theseimages are divided into six categories with 50 images havinglarge salient regions 80 images with intermediate salientregions 60 images with small salient regions 15 imageswith cluttered backgrounds and 15 images with repeatingdistracters These images give a good benchmark for per-formance evaluation because of the significant complexitygiven by variable size of salient objects objects with clutterand objects with distracters The accompanied ground truthconsists of both eye fixation information and binary maskscreated by human subjects who manually marked the salientobject in an image

Metric for Evaluation The most popular method to evaluatethe performance of a saliency map is to calculate area underthe curve (AUC) score of receiver operating characteristics(ROC) curve At first a saliency map is thresholded andused as a binary classifier with human eye fixations actingas positive set and some other points uniformly random asnegative set to plot an ROC curve The AUC of that ROC iscalculated andused as ameasure of performance comparison

There are various variants of AUC available in literatureand the basic difference between them is the choice ofnegative set points We will use the shuffled area under thecurve sAUC score because of its ability to cater for centerbias [41] since some models implicitly incorporate centerbias which makes a fair comparison difficult to performit is becoming standard to present results with sAUC InsAUC score the positive points consists of human subjectseye fixation on that image and the negative set consists ofall the fixation of subjects on the rest of the dataset imagesThe sAUC gives a 05 score on a center Gaussian blob whichis about the same as a random or chance score whereas allthe other versions of AUC [6] give very high score becausethey are affected by the center bias For our experimentationwe used sAUC available online by Schauerte and Stiefelhagen[39] We calculate every sAUC score for 20 times [47]and then use the mean value We found that the standarddeviation of the sAUC approximately ranges from 1119864 minus 4 to5119864 minus 4 in our experiments

Performance Analysis with Resolution In order to find theoptimal parameters for the proposed model we treat bothrepresentations separately and find the best parameters foreach representation that can be incorporated in the proposedmodel We plotted both sparse and nonlinear representationwith variable resolution on all three datasets and measurethe sAUC score Using various parameters given in Table 2Figure 5 is plotted which gives the performance of bothrepresentations on the three datasets Figure 5 shows that for

8 The Scientific World Journal

Table 2 Various scales parameters which are used for evaluation of sparse and nonlinear representation

Scale Adaptive sparse representation Nonlinear representationResolution Patch size Resolution Patch size

1 40 times 30 5 128 times 128 2 4 8 16 32

2 80 times 60 5 256 times 256 4 8 16 32 64

3 160 times 120 5 512 times 512 8 16 32 64 128

08

07

06

1 2 3

Scale

sAU

C

Sparse representation

BruceKootstra

IMSAL

(a)

Non-linear representation

075

070

065

0601 2 3

Scale

sAU

C

BruceKootstra

IMSAL

(b)

Figure 5 sAUC score for sparse and nonlinear representation The values in (a) and (b) are maximum at scale 2 and scale 3 for all threedatasets

sparse representation the performance is maximum with 80times 60 pixels (scale 2) and for nonlinear representation we getgood performance at 512 times 512 pixels (scale 3) Usually theresolution of the input images is not very high so we restrictto 512 times 512 as upper bound for nonlinear representation andsame resolution with respective patch sizes is used in [17]The image resolution and patch sizes of different scales forthe both models used for evaluation are given in Table 2Based on this analysis we incorporate the parameters of scale2 and scale 3 for sparse and nonlinear representation in theproposed saliency model

Performance Comparison with Other Models The results ofour model along with comparison with 13 state-of-the-artmethods are given in Table 3 The detailed performancewith variable Gaussian smoothing is given in Figure 6 Thesimulation codes for all these methods are taken from theauthors websites We used multiscale and quaternion basedimplementation for spectral residue (MESR) PQFT [4] andDCT [49] as proposed by Schauerte and Stiefelhagen [39]which gives higher score than original methods Erdemrsquos [17]implementation with first order statistics embedded is usedfor simulation since it gives higher sAUC score Results withthe proposed technique are quite consistent and the proposedmethod outperforms the state-of-the-artΔQDCT [39]model

on the Toronto dataset and performs comparatively well onthe Kootstra dataset and IMSAL dataset No single modelperforms well on all these datasets and the performanceof other models changes with the dataset but our modelshows consistency and remained either ranked 1 or ranked2 on these datasets We believe that the high eye fixationprediction is due to the adaptive nature of the model anddue to dual representation of features The adaptivity makesa feature space optimal for the current image thus a moreaccurate representation of the features is possible which inturn accounts for better saliency map estimation Moreovera single representation may not be enough for every caseFinally these results can improve further if we use a multi-scale representation for ICA based representation which weskipped due to computational time constraints

43 Salient Object Detection A saliency map can be usedto detect a salient object in an image The basic premiseis that if an image consists of an object which stands outfrom the rest of the image then it should be identifiedby a saliency algorithm There is a different branch invisual saliency modeling which consists of models that arespecifically designed to detect salient objects These modelsfind the salient object in an image and then segment thewhole

The Scientific World Journal 9

000 002 004 006 008

120590

072

070

068

066

064

062

060

Toronto dataset

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(a)

062

060

058

056

054

052

Kootstra dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(b)

078

076

074

072

070

068

066

IMSAL dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFT

ERDEMQDCTAWSProposed

sAU

C

LG

010 012 014 016

(c)

Figure 6 Toronto Kootstra and IMSAL datasets with variable Gaussian smoothing for all algorithms in comparison The 119909-axis representsthe 120590 of the smoothing Gaussian (in image width) (In (c) only GBVS LG and ΔQDCT are taken from [48])

10 The Scientific World Journal

Table3Com

paris

onof

sAUCscoreo

fthe

prop

osed

mod

elwith

13otherstate-of-the-artmod

els

Dataset

AIM

[16]

GBV

S[37]

MES

R[39]

ICL[46]

Itti[37]

MPQ

FT[39]

SDSR

[26]

SUN[47]

HFT

[2]

LG[29]

ERDM

[17]

ΔQDCT

[39]

AWS[23]

Prop

osed

TORO

NTO

[16]

0699

064

0407141

064

76064

8107119

07062

066

6106915

06939

07023

07172

07130

07179

Optim

al120590

004

001

005

003

001

004

003

003

006

004

004

005

001

004

KOOTS

TRA[34]

05901

05572

05978

05769

05689

05985

05984

05613

05911

05938

0603

05998

06185

06157

Optim

al120590

003

001

004

001

004

005

004

002

005

001

003

003

001

003

IMSA

L[2]

07424

lowast076

6507153

07364

07512

071694

07403

06955

07419lowast07356

07455

lowast07434

07468

07560

Optim

al120590

008

007

015

013

009

014

013

016

009

012

012

010

005

009

Thetop

rank

edmod

elisin

boldfont

andsecond

rank

edmod

elisin

italic

fontH

erer

esultswith

optim

alaverages

AUCof

each

metho

dwith

thec

orrespon

ding

Gaussianblur(120590)are

repo

rtedA

sin[47]w

erepeat

thes

AUCcalculationfor2

0tim

esandcompu

tethes

tand

arddeviationof

each

averages

AUC

which

ranges

from1119864minus4to5119864minus4(lowastvalues

takenfro

m[48])

The Scientific World Journal 11

Table4Com

paris

onof

AUCandPo

DSC

scoreo

fthe

prop

osed

mod

elwith

otherstate-of-the-artmod

els

Algorith

mC1

C2C3

C4C5

C6Av

gAU

CPo

DSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AvgAU

CAIM

0921

0653

0918

0543

0957

0461

0904

0435

0924

0508

0941

0671

0927

MES

R0798

0494

0854

0436

0927

0377

0718

0238

0837

0398

0918

0617

0842

ICL

0928

0681

0907

0501

0909

0348

0926

0537

0902

0498

0914

0569

0914

Itti

0939

0687

0924

0542

0952

0457

0882

0423

0924

0499

0941

0631

0927

MPQ

FT0805

0497

0865

0452

0927

0386

0735

0248

0852

0436

0921

0630

0851

SDSR

0879

0618

0908

0511

0948

0435

0801

0298

0904

0489

0933

064

50896

SUN

0766

0459

0787

0349

0880

0362

0708

0263

0719

0283

0817

0451

0780

HFT

0937

070

00923

0552

0937

044

8096

4064

80933

0554

096

0070

60942

LG0814

0515

0859

046

40904

0379

0635

0145

0862

0499

0884

0572

0826

ERDEM

0882

0617

0912

0555

0962

0526

0840

0349

0923

0575

0916

0635

0906

MQDCT

0840

0564

0894

0517

0947

0489

0806

0311

0888

0524

0923

0670

0883

AWS

0894

0625

0922

0569

0957

0480

0891

0402

0920

0534

0925

0655

0918

Prop

osed

0920

0653

094

3059

60976

0555

0934

0516

094

20532

0956

0688

094

5Th

etop

rank

edmod

elisin

bold

font

andthe2

ndrank

edmod

elisin

italic

fontTh

eoverallbestaverageA

UCscoreisg

iven

byou

rpropo

sedmod

el

12 The Scientific World Journal

Input Proposed AIM GBVS MESR ICL Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 7 Response of our algorithm and other 13 state-of-the-art algorithms on various psychological patterns

extent of the object and thus solve this task like segmentation-type binary labeling formulation [50 51] In contrast ourmodel is designed for location based (eye fixation) saliencymodeling and it is not designed to capture exact objectboundaries however by thresholding a saliency map we canget a binary map that can be used for testing performance ofa model for salient object detection Since our saliencymodelis location based we only use other location based modelsin comparison for a fair evaluation similar convention isfollowed in [2 17]

Dataset and Metric for Evaluation For salient object detec-tion the metric used by Li et al [2] is area under thecurve (AUC) and dice similarity coefficient (DSC) on IMSAL[2] dataset We will use the same metric and dataset forsalient object detectionThe DSC gives the overlap between athreshold saliency map and the ground truth Moreover thepeak value of the DSC [52] is considered an important way toestablish the best performance of an algorithm at an optimalthreshold thus we also give results with peak value of theDSC curve (PoDSC) Since AUC can be influenced by centerbias for fair comparison we turn off the center bias in allthe algorithmsThe GBVS [37] has built-in center bias and inorder to cater for that the author in [2] incorporates explicitcenter bias and shows that HFT [2] performs better thanGBVS on the same dataset which is also used in our paperWe do not employ center bias explicitly or implicitly in thepresented results and addedHFT instead for our comparisonTherefore GVBS [37] is skipped from Table 4 Furthermorewe perform Gaussian smoothing in all the algorithms tofind the optimal smoothing parameters for each class in thedataset and the optimal performance is quoted in the resultsgiven in Table 4

Performance We present results in comparison with other 12state-of-the-art algorithms The complete results are given in

Table 4 Our proposed scheme gives the best performanceon three categories C2 C3 and C5 and ranked secondon C4 and C6 Our model gives the highest average AUCscore on this dataset On different categories our results arecomparative to the HFT which is state of the art on thisdataset Apart fromHFT the performance is also significantlybetter in comparison to other algorithmsThedataset used forcomparison is quite complex but our algorithm performedwell for intermediate small objects with distracters althoughperformance on other cases is little less than other state-of-the-art algorithms

44 Psychological Patterns Wealso tested our saliencymodelon psychological patterns which are commonly used to givea qualitative performance on artificial scenarios which simu-late a pop-up phenomenonThese patterns simulate the pop-out phenomenon based on color intersection symmetryorientation curvature and candles image In order to checkthe general performance on various psychological tasks wetested the proposed model on eight psychological patternsFigure 7 gives the results of the proposed saliency modelalong with other popular models The proposed algorithmworks well on color symmetry and orientation as well as oncandle image but the performance is not good for curvatureand intersection patterns Figure 7 also shows that any singlemodel does not give good performance on all the patterns andthe best performance is more or less the same as given by theproposed scheme

5 Discussion

The image features representation drastically affects theinformation content and thus saliency estimation We useda bioinspired center surround saliency computation on twoparallel feature representations that give good performance

The Scientific World Journal 13

(a)

(b) (c)

(d) (e)

Figure 8 The fixed learned dictionary and the information lost by using such dictionary (a) Fixed learned dictionary ((b)-(c)) The inputimages ((d)-(e)) The information lost by using dictionary given in (a)

on both eye fixation prediction and salient object detectionSince a CSD operation depends on the difference between acenter portion and its surroundings a better image contentsrepresentation makes a more accurate and precise centersurround operation

We used an adaptive sparse representation to boost theperformance of the CSD operation In order to show theeffectiveness of the proposed approach we present bothquantitative and qualitative results For qualitative compar-ison we use a fix dictionary [29] (Figure 8(a)) learnt from

an ensemble of natural images We show that some of theinformation is lost if we use a fixed dictionary to represent animage and usually the lost information belongs to the salientregion in an image The difference between an input imageand the image reconstructed by a fixed dictionary is givenin Figure 8 The red cylinder Figure 8(b) and red box withtext Figure 8(c) is visible in the image plots Figures 8(d)and 8(e) These two objects are the salient features in bothimages and their appearance in the residual image shows thatcurrent representation which uses a fixed dictionary loses

14 The Scientific World Journal

Input Gnd truth Proposed AIM GBVS Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 9 Comparison of our method with other state-of-the-art saliency models The first column is the input image with second columnbeing ground truth followed by our proposed saliency model Results from all the other models are plotted in the same row

Table 5 sAUC score comparison using fixed [29] and adaptivesparse representation (proposed) for CSD saliency

DatasetBorji local [29] Fixeddictionary sparserepresentation

Proposed Adaptivedictionary sparserepresentation

Toronto 0670 07110Kootstra 0572 06061

Table 6 sAUC comparison of E Erdem and A Erdem [17] and theproposed color based nonlinear integration

Dataset E Erdem andA Erdem [17]

Proposed Nonlinearrepresentation

Bruce and Tsotsos [16] 07023 07084Kootstra et al [34] 0603 06152

some information that belong to salient portion of the inputimage

For quantitative comparison we employ sAUC to dothe comparison between saliency maps based on a fixeddictionary and the adaptive basis dictionary Table 5 gives acomparison of the both approaches on two datasets Torontoand Kootstra The performance difference is quite significantand it shows that an adaptive representation is much betterBased on these qualitative and quantitative results we canconclude that an adaptive image presentation is more viableand accurate for CSD based saliency computation

A model proposed in [17] gives an idea of nonlinearintegration of all features by covariance matrices and thesupported implementation uses color gradient and spatialinformation Our second contribution is the modification ofthat model and a proposal of using only color with spatial

information in nonlinearly integrated manner using samecovariance matrices For our proposed implementation(seeFigure 4) we modified the features used in [17] and comparethe saliencymap with Erdemrsquos [17] model in Table 6 using thesAUC score Two databases Toronto [16] and Kootstra [34]are used for simulations and the results indicate that by usingonly color with spatial information we can get better sAUCscore than integrating all features using covariance matricesIn Table 6 the difference in sAUC score is quite visible onboth datasets One possible reason of this improvement maybe that the correlation among different features like color andorientation is different and thus using a covariance basedrepresentation does not capture the underlying informationstructure in an efficient way as compared to when only colorinformation is used

One possible argument against our usage of only colorinformation however can be that without any gradient ororientation information a saliency model will fail to detectmany salient regions This argument can also supported bynature since neurons tuned to orientation in an image areknown to contribute to saliency computation [53] In ourmodel this issue is addressed by the sparse representationwhere the adaptive basis same as basis shown in Figure 8(a)is Gabor like filters with edge like structure and thus thesebases efficiently capture orientation information from theimage which complements our color information in nonlin-ear representation

Finally the sAUC score of dual representation Table 3shows that we achieve better eye fixation prediction thantreating both representations separately as shown in Tables5 and 6We believe that such improvement is due to the com-plementary behavior of both techniques since a combinedapproach better represents image contents with high fidelityand thus that in turn improves saliency detection Lastly for

The Scientific World Journal 15

illustration and visual comparison we present some saliencymaps produced by our algorithm along with other models inFigure 9

6 Conclusion

This paper shows that dual feature representationmanages torobustly capture image information which can be used in acenter surround operation to compute saliencyWe show thata CSD on adaptive sparse basis gives better results than a fixsparse basis representation In nonlinear representation weshow that nonlinearly integrated color channels with spatialinformation better capture underlying data structure andthus a CSD on such representation gives good results Finallywe consider both representations as complementary and thusa fused saliencymap not only give good results on human eyefixations but also detect salient objects with high accuracy

In future wewill incorporate some top-downmechanismto better imitative human saliency computation capabilitiesbased on learning and experience Another possible exten-sion of the existing work is to test dynamic scenes video byincorporating additional motion information in the currentscheme

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the national Natural Sci-ence Foundation of China (61175096) and Specialized Fundfor Joint Building Program of Beijing Municipal EducationCommission

References

[1] C Siagian and L Itti ldquoBiologically inspired mobile robot visionlocalizationrdquo IEEE Transactions on Robotics vol 25 no 4 pp861ndash873 2009

[2] J Li M D Levine X An X Xu and H He ldquoVisual saliencybased on scale-space analysis in the frequency domainrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol35 no 4 pp 996ndash1010 2012

[3] Y Su Q Zhao L Zhao and D Gu ldquoAbrupt motion trackingusing a visual saliency embedded particle filterrdquo Pattern Recog-nition vol 47 no 5 pp 1826ndash1834 2014

[4] C Guo Q Ma and L Zhang ldquoSpatio-temporal saliency detec-tion using phase spectrum of quaternion fourier transformrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo08) pp 1ndash8 June 2008

[5] X Hou and L Zhang ldquoThumbnail generation based on globalsaliencyrdquo in Advances in Cognitive NeurodynamicsmdashICCN2007 pp 999ndash1003 Springer Amsterdam The Netherlands2007

[6] A Borji and L Itti ldquoState-of-the-art in visual attention mod-elingrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 35 no 1 pp 185ndash207 2013

[7] L Itti C Koch and E Niebur ldquoAmodel of saliency-based visualattention for rapid scene analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 20 no 11 pp 1254ndash12591998

[8] AM Treisman ldquoFeature integration theoryrdquoCognitive Psychol-ogy vol 12 no 1 pp 97ndash136 1980

[9] J K Tsotsos S M Culhane W Y Kei Wai Y Lai N Davisand F Nuflo ldquoModeling visual attention via selective tuningrdquoArtificial Intelligence vol 78 no 1-2 pp 507ndash545 1995

[10] R Rae Gestikbasierte Mensch-Maschine-Kommunikation aufder grundlage visueller aufmerksamkeit und adaptivitat [PhDthesis] Universitat Bielefeld 2000

[11] D Parkhurst K Law and E Niebur ldquoModeling the role ofsalience in the allocation of overt visual attentionrdquo VisionResearch vol 42 no 1 pp 107ndash123 2002

[12] J Li Y Tian T Huang and W Gao ldquoProbabilistic multi-tasklearning for visual saliency estimation in videordquo InternationalJournal of Computer Vision vol 90 no 2 pp 150ndash165 2010

[13] MCerf J HarelW Einhauser andCKoch ldquoPredicting humangaze using low-level saliency combined with face detectionrdquo inAdvances in Neural Information Processing Systems vol 20 pp241ndash248 2007

[14] A Torralba ldquoModeling global scene factors in attentionrdquoJournal of the Optical Society of America A Optics and ImageScience and Vision vol 20 no 7 pp 1407ndash1418 2003

[15] A Oliva and A Torralba ldquoModeling the shape of the scenea holistic representation of the spatial enveloperdquo InternationalJournal of Computer Vision vol 42 no 3 pp 145ndash175 2001

[16] N D B Bruce and J K Tsotsos ldquoSaliency attention and visualsearch an information theoretic approachrdquo Journal of Visionvol 9 no 3 article 5 2009

[17] E Erdem and A Erdem ldquoVisual saliency estimation by nonlin-early integrating features using region covariancesrdquo Journal ofVision vol 13 no 4 article 11 2013

[18] J H van Hateren ldquoReal and optimal neural images in earlyvisionrdquo Nature vol 360 no 6399 pp 68ndash70 1992

[19] D J Field ldquoRelations between the statistics of natural imagesand the response properties of cortical cellsrdquo Journal of theOptical Society of America A Optics and Image Science vol 4no 12 pp 2379ndash2394 1987

[20] B A Olshausen and D J Field ldquoNatural image statistics andefficient codingrdquo Network Computation in Neural Systems vol7 no 2 pp 333ndash339 1996

[21] M Kwon G Legge F Fang A Cheong and S He ldquoIdentifyingthe mechanism of adaptation to prolonged contrast reductionrdquoJournal of Vision vol 9 no 8 p 976 2009

[22] X Sun H Yao and R Ji ldquoVisual attention modeling based onshort-term environmental adaptionrdquo Journal of Visual Commu-nication and Image Representation vol 24 no 2 pp 171ndash1802013

[23] A Garcia-Diaz X R Fdez-Vidal X M Pardo and R DosilldquoSaliency from hierarchical adaptation through decorrelationand variance normalizationrdquo Image and Vision Computing vol30 no 1 pp 51ndash64 2012

[24] C E Shannon ldquoAmathematical theory of communicationrdquo BellSystem Technical Journal vol 27 no 3 pp 379ndash423 1948

[25] Y Karklin and M S Lewicki ldquoEmergence of complex cellproperties by learning to generalize in natural scenesrdquo Naturevol 457 no 7225 pp 83ndash86 2009

[26] H J Seo and P Milanfar ldquoStatic and space-time visual saliencydetection by self-resemblancerdquo Journal of Vision vol 9 no 12pp 1ndash27 2009

16 The Scientific World Journal

[27] D J Jobson Z-U Rahman and G AWoodell ldquoProperties andperformance of a centersurround retinexrdquo IEEE Transactionson Image Processing vol 6 no 3 pp 451ndash462 1997

[28] A Guo D Zhao S Liu X Fan and W Gao ldquoVisual attentionbased image quality assessmentrdquo in Proceedings of the 18thIEEE International Conference on Image Processing (ICIP rsquo11) pp3297ndash3300 September 2011

[29] A Borji and L Itti ldquoExploiting local and global patch raritiesfor saliency detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo12) pp 478ndash485 2012

[30] L Duan C Wu J Miao L Qing and Y Fu ldquoVisual saliencydetection by spatially weighted dissimilarityrdquo in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recogni-tion (CVPR rsquo11) pp 473ndash480 June 2011

[31] JMWolfe ldquoGuided search 20 a revisedmodel of visual searchrdquoPsychonomic Bulletin amp Review vol 1 no 2 pp 202ndash238 1994

[32] O Le Meur P Le Callet D Barba and DThoreau ldquoA coherentcomputational approach to model bottom-up visual attentionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 28 no 5 pp 802ndash817 2006

[33] A Torralba A Oliva M S Castelhano and J M HendersonldquoContextual guidance of eye movements and attention in real-world scenes the role of global features in object searchrdquoPsychological Review vol 113 no 4 pp 766ndash786 2006

[34] G Kootstra N Bergstrom and D Kragic ldquoUsing symmetry toselect fixation points for segmentationrdquo in Proceedings of the20th International Conference on Pattern Recognition (ICPR rsquo10)pp 3894ndash3897 August 2010

[35] C Li J Xue N Zheng X Lan and Z Tian ldquoSpatio-temporalsaliency perception via hypercomplex frequency spectral con-trastrdquo Sensors vol 13 no 3 pp 3409ndash3431 2013

[36] L Itti and P Baldi ldquoBayesian surprise attracts human attentionrdquoVision Research vol 49 no 10 pp 1295ndash1306 2009

[37] J Harel C Koch and P Perona ldquoGraph-based visual saliencyrdquoin Advances in Neural Information Processing Systems vol 19pp 545ndash552 2006

[38] X Hou and L Zhang ldquoSaliency detection a spectral residualapproachrdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo07)pp 1ndash8 June 2007

[39] B Schauerte and R Stiefelhagen ldquoQuaternion-based spectralsaliency detection for eye fixation predictionrdquo in ComputerVisionmdashECCV 2012 A Fitzgibbon S Lazebnik P Perona YSato and C Schmid Eds Lecture Notes in Computer Sciencepp 116ndash129 2012

[40] W Kienzle F Wichmann B Scholkopf and M Franz ANonparametric Approach to Bottom-Up Visual Saliency 2007

[41] J Tilke K Ehinger F Durand and A Torralba ldquoLearningto predict where humans lookrdquo in Proceedings of the 12thInternational Conference on Computer Vision (ICCV rsquo09) pp2106ndash2113 October 2009

[42] W Wang Y Wang Q Huang and W Gao ldquoMeasuring visualsaliency by site entropy raterdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 2368ndash2375 June 2010

[43] O Tuzel F Porikli and P Meer ldquoRegion covariance afast descriptor for detection and classificationrdquo in ComputerVisionmdashECCV 2006 vol 3952 of Lecture Notes in ComputerScience pp 589ndash600 2006

[44] httpssitesgooglecomsitesparsenonlinearsaliencymodelhomedownloads

[45] A Hyvarinen and E Oja ldquoA fast fixed-point algorithm forindependent component analysisrdquo Neural Computation vol 9no 7 pp 1483ndash1492 1997

[46] X Hou and L Zhang ldquoDynamic visual attention searching forcoding length incrementsrdquo in Advances in Neural InformationProcessing Systems vol 21 pp 681ndash688 2008

[47] L Zhang M H Tong T K Marks H Shan and G WCottrell ldquoSUN a Bayesian framework for saliency using naturalstatisticsrdquo Journal of Vision vol 8 no 7 article 32 pp 1ndash202008

[48] J Zhang and S Sclaroff ldquoSaliency detection a boolean mapapproachrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV rsquo13) 2013

[49] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[50] E Rahtu J Kannala M Salo and J Heikkila ldquoSegmentingsalient objects from images and videosrdquo in Computer VisionmdashECCV 2010 vol 6315 of Lecture Notes in Computer Science pp366ndash379 2010

[51] M-M Cheng G-X Zhang N J Mitra X Huang and S-M Hu ldquoGlobal contrast based salient region detectionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo11) pp 409ndash416 June 2011

[52] T Veit J-P Tarel P Nicolle and P Charbonnier ldquoEvaluationof road marking feature extractionrdquo in Proceedings of the11th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo08) pp 174ndash181 December 2008

[53] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model of simple receptive fields in catstriate cortexrdquo Journal of Neurophysiology vol 58 no 6 pp1233ndash1258 1987

Page 6: Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

6 The Scientific World Journal

Input

RG

B

Output

Image to patchesconversion and

covariancerepresentation

First orderstatistics

embeddedPosition

information

I(x y)

Feature matrixF = [R G B x y]

Figure 4 Nonlinear representation of an input image

in yellow rectangle The saliency of red patch 119901119894 is given

by its dissimilarity between 119908 surrounding patches (yellowrectangle) as

119878 (119901119894) =

1

119908

119908

sum

119895=1

119889 (119901119894 119901119895) (9)

where dissimilarity 119889(119901119894 119901119895) between two patches is given by

119889 (119901119894 119901119895) =

10038171003817100381710038171003817120572 (119901119894) minus 120572 (119901

119895)10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(10)

where119883119894and119883

119895are the central position of the patches 119901

119894and

119901119895 For the case of sparse representation we have

120572 (119901119894) = 119901

1015840

119894(11)

and for nonlinear representation we have

120572 (119901119894) = 120595119894= 120595 (119862 (119901

119894)) (12)

Thus the saliencymap for patch 119901119894derived from 119868

1015840 and 11986810158401015840 can

be given as

1198781198681015840 (119901119894) =

1

119908

119908

sum

119895=1

100381710038171003817100381710038171199011015840

119894minus 1199011015840

119895

10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(13)

and with 120595(119862(119901119894)) being in vector form

11987811986810158401015840 (119901119894) =

1

119908

119908

sum

119895=1

10038171003817100381710038171003817120595 (119862 (119901

119894)) minus 120595 (119862 (119901

119895))

10038171003817100381710038171003817

1 +10038171003817100381710038171003817119883119894minus 119883119895

10038171003817100381710038171003817

(14)

The multiscale saliency by sparse approach is given by

1198781198681015840 (119909) = N(prod

1198971015840120598119871

1198781198971015840

1198681015840 (119909)) (15)

and for nonlinear integrated approach is given by

11987811986810158401015840 (119909) = N(prod

119897120598119871

119878119897

11986810158401015840 (119909)) (16)

where 1198971015840 and 119897 represent the number of scales and N shows

normalization Finally saliency map becomes

119878119868

= 119866120590

⊛ (N (1198781198681015840 lowast 11987811986810158401015840)) (17)

where119866120590represents the Gaussian smoothing by convolution

⊛ operation and lowast stands for multiplication operation

4 Experimentation

In this section we thoroughly evaluate the proposed modelwith three different experiments human eye fixation pre-diction salient-object detection and response to variouspsychological patterns The human eye fixation prediction isthe basic and necessary test to check the performance of asaliency map against the collected eye fixation from severalhuman subjects

How well a saliency map distinguishes and highlights anobject in an image shows its ability of salient object detectionThe salient object detection capability of a model is evaluatedby employing some metrics that compare the generatedsaliency map against the ground truth made by manuallabeling of the salient region in an image by human subjectsThe psychological patterns give a qualitative analysis of thesaliency model These patterns are designed to check pop-upresponses in different scenarios like orientation conjunctionand color and so forth Code (Matlab P-code) of the proposedmodel used for experimentation is available online [44]

41 Parameter Setting Before pursuing the evaluation of theproposed model we fix the parameters used to generatethe saliency maps by our model These parameters willremain the same for all the experiments Derivation ofthese parameters will be discussed in the next section afterthe introduction of the datasets and the metric used forevaluation

Sparse Representation We resize all input images to 80 times

60 pixels and use only single scale 1198971015840

= 1 for saliencycomputation Patches of 5 times 5 pixels [22] are generated withsliding overlapping window from every input image to learnthe basis for the dictionary and for the saliency computation

The Scientific World Journal 7

Table 1 Following are the input image resolution used in eachalgorithm with default parameter settings Some algorithms (atleastlowast) do internally reduce dimensions for fast computation andoptimal performance

Model Acronym ResolutionAIM [16] 12(W times H)lowastGBVS [37] W times HMESR [39] 64 times 48lowastICL [46] W times HlowastItti [37] W times HMPQFT [39] 64 times 48AWS [23] W times HlowastSDSR [26] W times HSUN [47] 12(W times H)HFT [2] 128 times 128LG [29] 256 times 256ERDM [17] 512 times 512ΔQDCT [39] 64 times48lowastProposed W times H

An ICA package available online FAST ICA [45] is used forthis experimentation

Nonlinear Representation For nonlinear image representa-tion RGB color and position information is used in onlineavailable implementation of E Erdem andA Erdem [17]Thesaliency is computed with the default parameters used in [17]that have every input image being resized to 512 times 512 pixelsand five different patch sizes 119901

119894= 8 16 32 64 128 and thus

119897 = 5 are used for saliency computationFinally normalized sparse representationrsquos saliency map

is rescaled to the size of nonlinear representationrsquos saliencymap and both maps are multiplied and normalizedThen thefinal saliencymap is resized to the actual input image size andused for experimentation The input image resolutions usedin all the saliency algorithms for experimentation are givenin Table 1

42 Human Eye Fixation Prediction In order to validatethe proposed model with human eye fixation predictionssaliency maps are generated on three datasets and for a faircomparison shuffle area under the curve score(sAUC) is usedto quantify the results

Dataset A reasonable dataset that can be used for evaluationof human eye fixation prediction must be complex anddiverse enough so that performance can be thoroughly eval-uated In literature Toronto [16] and Kootstra [34] datasetsare the most popular and widely used datasets IMSAL [2] isa relatively new dataset which we also used in our evaluation

Toronto dataset was prepared by Bruce and Tsotsos [16]and it consists of 120 images each with 681 times 511 pixels Thisdataset has both indoor and outdoor imagesThe eye fixationground truth is based on 20 subjects who free viewed theimages for few seconds

Kootstra dataset was used in [34] It consists of 101 imageseach with a resolution of 1024 times 768 pixels These imagesconsist of flowers natural scenes automans and buildingsThis dataset is significantly complex because it has manyimages with no explicit salient regions The eye fixationground truth available with this dataset is based on freeviewing of 31 subjects for a few seconds

IMSAL dataset is given by Li et al [2] it consists of 235images which are collected online through an internet searchengine and some images were taken from literature Theseimages are divided into six categories with 50 images havinglarge salient regions 80 images with intermediate salientregions 60 images with small salient regions 15 imageswith cluttered backgrounds and 15 images with repeatingdistracters These images give a good benchmark for per-formance evaluation because of the significant complexitygiven by variable size of salient objects objects with clutterand objects with distracters The accompanied ground truthconsists of both eye fixation information and binary maskscreated by human subjects who manually marked the salientobject in an image

Metric for Evaluation The most popular method to evaluatethe performance of a saliency map is to calculate area underthe curve (AUC) score of receiver operating characteristics(ROC) curve At first a saliency map is thresholded andused as a binary classifier with human eye fixations actingas positive set and some other points uniformly random asnegative set to plot an ROC curve The AUC of that ROC iscalculated andused as ameasure of performance comparison

There are various variants of AUC available in literatureand the basic difference between them is the choice ofnegative set points We will use the shuffled area under thecurve sAUC score because of its ability to cater for centerbias [41] since some models implicitly incorporate centerbias which makes a fair comparison difficult to performit is becoming standard to present results with sAUC InsAUC score the positive points consists of human subjectseye fixation on that image and the negative set consists ofall the fixation of subjects on the rest of the dataset imagesThe sAUC gives a 05 score on a center Gaussian blob whichis about the same as a random or chance score whereas allthe other versions of AUC [6] give very high score becausethey are affected by the center bias For our experimentationwe used sAUC available online by Schauerte and Stiefelhagen[39] We calculate every sAUC score for 20 times [47]and then use the mean value We found that the standarddeviation of the sAUC approximately ranges from 1119864 minus 4 to5119864 minus 4 in our experiments

Performance Analysis with Resolution In order to find theoptimal parameters for the proposed model we treat bothrepresentations separately and find the best parameters foreach representation that can be incorporated in the proposedmodel We plotted both sparse and nonlinear representationwith variable resolution on all three datasets and measurethe sAUC score Using various parameters given in Table 2Figure 5 is plotted which gives the performance of bothrepresentations on the three datasets Figure 5 shows that for

8 The Scientific World Journal

Table 2 Various scales parameters which are used for evaluation of sparse and nonlinear representation

Scale Adaptive sparse representation Nonlinear representationResolution Patch size Resolution Patch size

1 40 times 30 5 128 times 128 2 4 8 16 32

2 80 times 60 5 256 times 256 4 8 16 32 64

3 160 times 120 5 512 times 512 8 16 32 64 128

08

07

06

1 2 3

Scale

sAU

C

Sparse representation

BruceKootstra

IMSAL

(a)

Non-linear representation

075

070

065

0601 2 3

Scale

sAU

C

BruceKootstra

IMSAL

(b)

Figure 5 sAUC score for sparse and nonlinear representation The values in (a) and (b) are maximum at scale 2 and scale 3 for all threedatasets

sparse representation the performance is maximum with 80times 60 pixels (scale 2) and for nonlinear representation we getgood performance at 512 times 512 pixels (scale 3) Usually theresolution of the input images is not very high so we restrictto 512 times 512 as upper bound for nonlinear representation andsame resolution with respective patch sizes is used in [17]The image resolution and patch sizes of different scales forthe both models used for evaluation are given in Table 2Based on this analysis we incorporate the parameters of scale2 and scale 3 for sparse and nonlinear representation in theproposed saliency model

Performance Comparison with Other Models The results ofour model along with comparison with 13 state-of-the-artmethods are given in Table 3 The detailed performancewith variable Gaussian smoothing is given in Figure 6 Thesimulation codes for all these methods are taken from theauthors websites We used multiscale and quaternion basedimplementation for spectral residue (MESR) PQFT [4] andDCT [49] as proposed by Schauerte and Stiefelhagen [39]which gives higher score than original methods Erdemrsquos [17]implementation with first order statistics embedded is usedfor simulation since it gives higher sAUC score Results withthe proposed technique are quite consistent and the proposedmethod outperforms the state-of-the-artΔQDCT [39]model

on the Toronto dataset and performs comparatively well onthe Kootstra dataset and IMSAL dataset No single modelperforms well on all these datasets and the performanceof other models changes with the dataset but our modelshows consistency and remained either ranked 1 or ranked2 on these datasets We believe that the high eye fixationprediction is due to the adaptive nature of the model anddue to dual representation of features The adaptivity makesa feature space optimal for the current image thus a moreaccurate representation of the features is possible which inturn accounts for better saliency map estimation Moreovera single representation may not be enough for every caseFinally these results can improve further if we use a multi-scale representation for ICA based representation which weskipped due to computational time constraints

43 Salient Object Detection A saliency map can be usedto detect a salient object in an image The basic premiseis that if an image consists of an object which stands outfrom the rest of the image then it should be identifiedby a saliency algorithm There is a different branch invisual saliency modeling which consists of models that arespecifically designed to detect salient objects These modelsfind the salient object in an image and then segment thewhole

The Scientific World Journal 9

000 002 004 006 008

120590

072

070

068

066

064

062

060

Toronto dataset

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(a)

062

060

058

056

054

052

Kootstra dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(b)

078

076

074

072

070

068

066

IMSAL dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFT

ERDEMQDCTAWSProposed

sAU

C

LG

010 012 014 016

(c)

Figure 6 Toronto Kootstra and IMSAL datasets with variable Gaussian smoothing for all algorithms in comparison The 119909-axis representsthe 120590 of the smoothing Gaussian (in image width) (In (c) only GBVS LG and ΔQDCT are taken from [48])

10 The Scientific World Journal

Table3Com

paris

onof

sAUCscoreo

fthe

prop

osed

mod

elwith

13otherstate-of-the-artmod

els

Dataset

AIM

[16]

GBV

S[37]

MES

R[39]

ICL[46]

Itti[37]

MPQ

FT[39]

SDSR

[26]

SUN[47]

HFT

[2]

LG[29]

ERDM

[17]

ΔQDCT

[39]

AWS[23]

Prop

osed

TORO

NTO

[16]

0699

064

0407141

064

76064

8107119

07062

066

6106915

06939

07023

07172

07130

07179

Optim

al120590

004

001

005

003

001

004

003

003

006

004

004

005

001

004

KOOTS

TRA[34]

05901

05572

05978

05769

05689

05985

05984

05613

05911

05938

0603

05998

06185

06157

Optim

al120590

003

001

004

001

004

005

004

002

005

001

003

003

001

003

IMSA

L[2]

07424

lowast076

6507153

07364

07512

071694

07403

06955

07419lowast07356

07455

lowast07434

07468

07560

Optim

al120590

008

007

015

013

009

014

013

016

009

012

012

010

005

009

Thetop

rank

edmod

elisin

boldfont

andsecond

rank

edmod

elisin

italic

fontH

erer

esultswith

optim

alaverages

AUCof

each

metho

dwith

thec

orrespon

ding

Gaussianblur(120590)are

repo

rtedA

sin[47]w

erepeat

thes

AUCcalculationfor2

0tim

esandcompu

tethes

tand

arddeviationof

each

averages

AUC

which

ranges

from1119864minus4to5119864minus4(lowastvalues

takenfro

m[48])

The Scientific World Journal 11

Table4Com

paris

onof

AUCandPo

DSC

scoreo

fthe

prop

osed

mod

elwith

otherstate-of-the-artmod

els

Algorith

mC1

C2C3

C4C5

C6Av

gAU

CPo

DSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AvgAU

CAIM

0921

0653

0918

0543

0957

0461

0904

0435

0924

0508

0941

0671

0927

MES

R0798

0494

0854

0436

0927

0377

0718

0238

0837

0398

0918

0617

0842

ICL

0928

0681

0907

0501

0909

0348

0926

0537

0902

0498

0914

0569

0914

Itti

0939

0687

0924

0542

0952

0457

0882

0423

0924

0499

0941

0631

0927

MPQ

FT0805

0497

0865

0452

0927

0386

0735

0248

0852

0436

0921

0630

0851

SDSR

0879

0618

0908

0511

0948

0435

0801

0298

0904

0489

0933

064

50896

SUN

0766

0459

0787

0349

0880

0362

0708

0263

0719

0283

0817

0451

0780

HFT

0937

070

00923

0552

0937

044

8096

4064

80933

0554

096

0070

60942

LG0814

0515

0859

046

40904

0379

0635

0145

0862

0499

0884

0572

0826

ERDEM

0882

0617

0912

0555

0962

0526

0840

0349

0923

0575

0916

0635

0906

MQDCT

0840

0564

0894

0517

0947

0489

0806

0311

0888

0524

0923

0670

0883

AWS

0894

0625

0922

0569

0957

0480

0891

0402

0920

0534

0925

0655

0918

Prop

osed

0920

0653

094

3059

60976

0555

0934

0516

094

20532

0956

0688

094

5Th

etop

rank

edmod

elisin

bold

font

andthe2

ndrank

edmod

elisin

italic

fontTh

eoverallbestaverageA

UCscoreisg

iven

byou

rpropo

sedmod

el

12 The Scientific World Journal

Input Proposed AIM GBVS MESR ICL Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 7 Response of our algorithm and other 13 state-of-the-art algorithms on various psychological patterns

extent of the object and thus solve this task like segmentation-type binary labeling formulation [50 51] In contrast ourmodel is designed for location based (eye fixation) saliencymodeling and it is not designed to capture exact objectboundaries however by thresholding a saliency map we canget a binary map that can be used for testing performance ofa model for salient object detection Since our saliencymodelis location based we only use other location based modelsin comparison for a fair evaluation similar convention isfollowed in [2 17]

Dataset and Metric for Evaluation For salient object detec-tion the metric used by Li et al [2] is area under thecurve (AUC) and dice similarity coefficient (DSC) on IMSAL[2] dataset We will use the same metric and dataset forsalient object detectionThe DSC gives the overlap between athreshold saliency map and the ground truth Moreover thepeak value of the DSC [52] is considered an important way toestablish the best performance of an algorithm at an optimalthreshold thus we also give results with peak value of theDSC curve (PoDSC) Since AUC can be influenced by centerbias for fair comparison we turn off the center bias in allthe algorithmsThe GBVS [37] has built-in center bias and inorder to cater for that the author in [2] incorporates explicitcenter bias and shows that HFT [2] performs better thanGBVS on the same dataset which is also used in our paperWe do not employ center bias explicitly or implicitly in thepresented results and addedHFT instead for our comparisonTherefore GVBS [37] is skipped from Table 4 Furthermorewe perform Gaussian smoothing in all the algorithms tofind the optimal smoothing parameters for each class in thedataset and the optimal performance is quoted in the resultsgiven in Table 4

Performance We present results in comparison with other 12state-of-the-art algorithms The complete results are given in

Table 4 Our proposed scheme gives the best performanceon three categories C2 C3 and C5 and ranked secondon C4 and C6 Our model gives the highest average AUCscore on this dataset On different categories our results arecomparative to the HFT which is state of the art on thisdataset Apart fromHFT the performance is also significantlybetter in comparison to other algorithmsThedataset used forcomparison is quite complex but our algorithm performedwell for intermediate small objects with distracters althoughperformance on other cases is little less than other state-of-the-art algorithms

44 Psychological Patterns Wealso tested our saliencymodelon psychological patterns which are commonly used to givea qualitative performance on artificial scenarios which simu-late a pop-up phenomenonThese patterns simulate the pop-out phenomenon based on color intersection symmetryorientation curvature and candles image In order to checkthe general performance on various psychological tasks wetested the proposed model on eight psychological patternsFigure 7 gives the results of the proposed saliency modelalong with other popular models The proposed algorithmworks well on color symmetry and orientation as well as oncandle image but the performance is not good for curvatureand intersection patterns Figure 7 also shows that any singlemodel does not give good performance on all the patterns andthe best performance is more or less the same as given by theproposed scheme

5 Discussion

The image features representation drastically affects theinformation content and thus saliency estimation We useda bioinspired center surround saliency computation on twoparallel feature representations that give good performance

The Scientific World Journal 13

(a)

(b) (c)

(d) (e)

Figure 8 The fixed learned dictionary and the information lost by using such dictionary (a) Fixed learned dictionary ((b)-(c)) The inputimages ((d)-(e)) The information lost by using dictionary given in (a)

on both eye fixation prediction and salient object detectionSince a CSD operation depends on the difference between acenter portion and its surroundings a better image contentsrepresentation makes a more accurate and precise centersurround operation

We used an adaptive sparse representation to boost theperformance of the CSD operation In order to show theeffectiveness of the proposed approach we present bothquantitative and qualitative results For qualitative compar-ison we use a fix dictionary [29] (Figure 8(a)) learnt from

an ensemble of natural images We show that some of theinformation is lost if we use a fixed dictionary to represent animage and usually the lost information belongs to the salientregion in an image The difference between an input imageand the image reconstructed by a fixed dictionary is givenin Figure 8 The red cylinder Figure 8(b) and red box withtext Figure 8(c) is visible in the image plots Figures 8(d)and 8(e) These two objects are the salient features in bothimages and their appearance in the residual image shows thatcurrent representation which uses a fixed dictionary loses

14 The Scientific World Journal

Input Gnd truth Proposed AIM GBVS Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 9 Comparison of our method with other state-of-the-art saliency models The first column is the input image with second columnbeing ground truth followed by our proposed saliency model Results from all the other models are plotted in the same row

Table 5 sAUC score comparison using fixed [29] and adaptivesparse representation (proposed) for CSD saliency

DatasetBorji local [29] Fixeddictionary sparserepresentation

Proposed Adaptivedictionary sparserepresentation

Toronto 0670 07110Kootstra 0572 06061

Table 6 sAUC comparison of E Erdem and A Erdem [17] and theproposed color based nonlinear integration

Dataset E Erdem andA Erdem [17]

Proposed Nonlinearrepresentation

Bruce and Tsotsos [16] 07023 07084Kootstra et al [34] 0603 06152

some information that belong to salient portion of the inputimage

For quantitative comparison we employ sAUC to dothe comparison between saliency maps based on a fixeddictionary and the adaptive basis dictionary Table 5 gives acomparison of the both approaches on two datasets Torontoand Kootstra The performance difference is quite significantand it shows that an adaptive representation is much betterBased on these qualitative and quantitative results we canconclude that an adaptive image presentation is more viableand accurate for CSD based saliency computation

A model proposed in [17] gives an idea of nonlinearintegration of all features by covariance matrices and thesupported implementation uses color gradient and spatialinformation Our second contribution is the modification ofthat model and a proposal of using only color with spatial

information in nonlinearly integrated manner using samecovariance matrices For our proposed implementation(seeFigure 4) we modified the features used in [17] and comparethe saliencymap with Erdemrsquos [17] model in Table 6 using thesAUC score Two databases Toronto [16] and Kootstra [34]are used for simulations and the results indicate that by usingonly color with spatial information we can get better sAUCscore than integrating all features using covariance matricesIn Table 6 the difference in sAUC score is quite visible onboth datasets One possible reason of this improvement maybe that the correlation among different features like color andorientation is different and thus using a covariance basedrepresentation does not capture the underlying informationstructure in an efficient way as compared to when only colorinformation is used

One possible argument against our usage of only colorinformation however can be that without any gradient ororientation information a saliency model will fail to detectmany salient regions This argument can also supported bynature since neurons tuned to orientation in an image areknown to contribute to saliency computation [53] In ourmodel this issue is addressed by the sparse representationwhere the adaptive basis same as basis shown in Figure 8(a)is Gabor like filters with edge like structure and thus thesebases efficiently capture orientation information from theimage which complements our color information in nonlin-ear representation

Finally the sAUC score of dual representation Table 3shows that we achieve better eye fixation prediction thantreating both representations separately as shown in Tables5 and 6We believe that such improvement is due to the com-plementary behavior of both techniques since a combinedapproach better represents image contents with high fidelityand thus that in turn improves saliency detection Lastly for

The Scientific World Journal 15

illustration and visual comparison we present some saliencymaps produced by our algorithm along with other models inFigure 9

6 Conclusion

This paper shows that dual feature representationmanages torobustly capture image information which can be used in acenter surround operation to compute saliencyWe show thata CSD on adaptive sparse basis gives better results than a fixsparse basis representation In nonlinear representation weshow that nonlinearly integrated color channels with spatialinformation better capture underlying data structure andthus a CSD on such representation gives good results Finallywe consider both representations as complementary and thusa fused saliencymap not only give good results on human eyefixations but also detect salient objects with high accuracy

In future wewill incorporate some top-downmechanismto better imitative human saliency computation capabilitiesbased on learning and experience Another possible exten-sion of the existing work is to test dynamic scenes video byincorporating additional motion information in the currentscheme

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the national Natural Sci-ence Foundation of China (61175096) and Specialized Fundfor Joint Building Program of Beijing Municipal EducationCommission

References

[1] C Siagian and L Itti ldquoBiologically inspired mobile robot visionlocalizationrdquo IEEE Transactions on Robotics vol 25 no 4 pp861ndash873 2009

[2] J Li M D Levine X An X Xu and H He ldquoVisual saliencybased on scale-space analysis in the frequency domainrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol35 no 4 pp 996ndash1010 2012

[3] Y Su Q Zhao L Zhao and D Gu ldquoAbrupt motion trackingusing a visual saliency embedded particle filterrdquo Pattern Recog-nition vol 47 no 5 pp 1826ndash1834 2014

[4] C Guo Q Ma and L Zhang ldquoSpatio-temporal saliency detec-tion using phase spectrum of quaternion fourier transformrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo08) pp 1ndash8 June 2008

[5] X Hou and L Zhang ldquoThumbnail generation based on globalsaliencyrdquo in Advances in Cognitive NeurodynamicsmdashICCN2007 pp 999ndash1003 Springer Amsterdam The Netherlands2007

[6] A Borji and L Itti ldquoState-of-the-art in visual attention mod-elingrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 35 no 1 pp 185ndash207 2013

[7] L Itti C Koch and E Niebur ldquoAmodel of saliency-based visualattention for rapid scene analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 20 no 11 pp 1254ndash12591998

[8] AM Treisman ldquoFeature integration theoryrdquoCognitive Psychol-ogy vol 12 no 1 pp 97ndash136 1980

[9] J K Tsotsos S M Culhane W Y Kei Wai Y Lai N Davisand F Nuflo ldquoModeling visual attention via selective tuningrdquoArtificial Intelligence vol 78 no 1-2 pp 507ndash545 1995

[10] R Rae Gestikbasierte Mensch-Maschine-Kommunikation aufder grundlage visueller aufmerksamkeit und adaptivitat [PhDthesis] Universitat Bielefeld 2000

[11] D Parkhurst K Law and E Niebur ldquoModeling the role ofsalience in the allocation of overt visual attentionrdquo VisionResearch vol 42 no 1 pp 107ndash123 2002

[12] J Li Y Tian T Huang and W Gao ldquoProbabilistic multi-tasklearning for visual saliency estimation in videordquo InternationalJournal of Computer Vision vol 90 no 2 pp 150ndash165 2010

[13] MCerf J HarelW Einhauser andCKoch ldquoPredicting humangaze using low-level saliency combined with face detectionrdquo inAdvances in Neural Information Processing Systems vol 20 pp241ndash248 2007

[14] A Torralba ldquoModeling global scene factors in attentionrdquoJournal of the Optical Society of America A Optics and ImageScience and Vision vol 20 no 7 pp 1407ndash1418 2003

[15] A Oliva and A Torralba ldquoModeling the shape of the scenea holistic representation of the spatial enveloperdquo InternationalJournal of Computer Vision vol 42 no 3 pp 145ndash175 2001

[16] N D B Bruce and J K Tsotsos ldquoSaliency attention and visualsearch an information theoretic approachrdquo Journal of Visionvol 9 no 3 article 5 2009

[17] E Erdem and A Erdem ldquoVisual saliency estimation by nonlin-early integrating features using region covariancesrdquo Journal ofVision vol 13 no 4 article 11 2013

[18] J H van Hateren ldquoReal and optimal neural images in earlyvisionrdquo Nature vol 360 no 6399 pp 68ndash70 1992

[19] D J Field ldquoRelations between the statistics of natural imagesand the response properties of cortical cellsrdquo Journal of theOptical Society of America A Optics and Image Science vol 4no 12 pp 2379ndash2394 1987

[20] B A Olshausen and D J Field ldquoNatural image statistics andefficient codingrdquo Network Computation in Neural Systems vol7 no 2 pp 333ndash339 1996

[21] M Kwon G Legge F Fang A Cheong and S He ldquoIdentifyingthe mechanism of adaptation to prolonged contrast reductionrdquoJournal of Vision vol 9 no 8 p 976 2009

[22] X Sun H Yao and R Ji ldquoVisual attention modeling based onshort-term environmental adaptionrdquo Journal of Visual Commu-nication and Image Representation vol 24 no 2 pp 171ndash1802013

[23] A Garcia-Diaz X R Fdez-Vidal X M Pardo and R DosilldquoSaliency from hierarchical adaptation through decorrelationand variance normalizationrdquo Image and Vision Computing vol30 no 1 pp 51ndash64 2012

[24] C E Shannon ldquoAmathematical theory of communicationrdquo BellSystem Technical Journal vol 27 no 3 pp 379ndash423 1948

[25] Y Karklin and M S Lewicki ldquoEmergence of complex cellproperties by learning to generalize in natural scenesrdquo Naturevol 457 no 7225 pp 83ndash86 2009

[26] H J Seo and P Milanfar ldquoStatic and space-time visual saliencydetection by self-resemblancerdquo Journal of Vision vol 9 no 12pp 1ndash27 2009

16 The Scientific World Journal

[27] D J Jobson Z-U Rahman and G AWoodell ldquoProperties andperformance of a centersurround retinexrdquo IEEE Transactionson Image Processing vol 6 no 3 pp 451ndash462 1997

[28] A Guo D Zhao S Liu X Fan and W Gao ldquoVisual attentionbased image quality assessmentrdquo in Proceedings of the 18thIEEE International Conference on Image Processing (ICIP rsquo11) pp3297ndash3300 September 2011

[29] A Borji and L Itti ldquoExploiting local and global patch raritiesfor saliency detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo12) pp 478ndash485 2012

[30] L Duan C Wu J Miao L Qing and Y Fu ldquoVisual saliencydetection by spatially weighted dissimilarityrdquo in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recogni-tion (CVPR rsquo11) pp 473ndash480 June 2011

[31] JMWolfe ldquoGuided search 20 a revisedmodel of visual searchrdquoPsychonomic Bulletin amp Review vol 1 no 2 pp 202ndash238 1994

[32] O Le Meur P Le Callet D Barba and DThoreau ldquoA coherentcomputational approach to model bottom-up visual attentionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 28 no 5 pp 802ndash817 2006

[33] A Torralba A Oliva M S Castelhano and J M HendersonldquoContextual guidance of eye movements and attention in real-world scenes the role of global features in object searchrdquoPsychological Review vol 113 no 4 pp 766ndash786 2006

[34] G Kootstra N Bergstrom and D Kragic ldquoUsing symmetry toselect fixation points for segmentationrdquo in Proceedings of the20th International Conference on Pattern Recognition (ICPR rsquo10)pp 3894ndash3897 August 2010

[35] C Li J Xue N Zheng X Lan and Z Tian ldquoSpatio-temporalsaliency perception via hypercomplex frequency spectral con-trastrdquo Sensors vol 13 no 3 pp 3409ndash3431 2013

[36] L Itti and P Baldi ldquoBayesian surprise attracts human attentionrdquoVision Research vol 49 no 10 pp 1295ndash1306 2009

[37] J Harel C Koch and P Perona ldquoGraph-based visual saliencyrdquoin Advances in Neural Information Processing Systems vol 19pp 545ndash552 2006

[38] X Hou and L Zhang ldquoSaliency detection a spectral residualapproachrdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo07)pp 1ndash8 June 2007

[39] B Schauerte and R Stiefelhagen ldquoQuaternion-based spectralsaliency detection for eye fixation predictionrdquo in ComputerVisionmdashECCV 2012 A Fitzgibbon S Lazebnik P Perona YSato and C Schmid Eds Lecture Notes in Computer Sciencepp 116ndash129 2012

[40] W Kienzle F Wichmann B Scholkopf and M Franz ANonparametric Approach to Bottom-Up Visual Saliency 2007

[41] J Tilke K Ehinger F Durand and A Torralba ldquoLearningto predict where humans lookrdquo in Proceedings of the 12thInternational Conference on Computer Vision (ICCV rsquo09) pp2106ndash2113 October 2009

[42] W Wang Y Wang Q Huang and W Gao ldquoMeasuring visualsaliency by site entropy raterdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 2368ndash2375 June 2010

[43] O Tuzel F Porikli and P Meer ldquoRegion covariance afast descriptor for detection and classificationrdquo in ComputerVisionmdashECCV 2006 vol 3952 of Lecture Notes in ComputerScience pp 589ndash600 2006

[44] httpssitesgooglecomsitesparsenonlinearsaliencymodelhomedownloads

[45] A Hyvarinen and E Oja ldquoA fast fixed-point algorithm forindependent component analysisrdquo Neural Computation vol 9no 7 pp 1483ndash1492 1997

[46] X Hou and L Zhang ldquoDynamic visual attention searching forcoding length incrementsrdquo in Advances in Neural InformationProcessing Systems vol 21 pp 681ndash688 2008

[47] L Zhang M H Tong T K Marks H Shan and G WCottrell ldquoSUN a Bayesian framework for saliency using naturalstatisticsrdquo Journal of Vision vol 8 no 7 article 32 pp 1ndash202008

[48] J Zhang and S Sclaroff ldquoSaliency detection a boolean mapapproachrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV rsquo13) 2013

[49] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[50] E Rahtu J Kannala M Salo and J Heikkila ldquoSegmentingsalient objects from images and videosrdquo in Computer VisionmdashECCV 2010 vol 6315 of Lecture Notes in Computer Science pp366ndash379 2010

[51] M-M Cheng G-X Zhang N J Mitra X Huang and S-M Hu ldquoGlobal contrast based salient region detectionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo11) pp 409ndash416 June 2011

[52] T Veit J-P Tarel P Nicolle and P Charbonnier ldquoEvaluationof road marking feature extractionrdquo in Proceedings of the11th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo08) pp 174ndash181 December 2008

[53] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model of simple receptive fields in catstriate cortexrdquo Journal of Neurophysiology vol 58 no 6 pp1233ndash1258 1987

Page 7: Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

The Scientific World Journal 7

Table 1 Following are the input image resolution used in eachalgorithm with default parameter settings Some algorithms (atleastlowast) do internally reduce dimensions for fast computation andoptimal performance

Model Acronym ResolutionAIM [16] 12(W times H)lowastGBVS [37] W times HMESR [39] 64 times 48lowastICL [46] W times HlowastItti [37] W times HMPQFT [39] 64 times 48AWS [23] W times HlowastSDSR [26] W times HSUN [47] 12(W times H)HFT [2] 128 times 128LG [29] 256 times 256ERDM [17] 512 times 512ΔQDCT [39] 64 times48lowastProposed W times H

An ICA package available online FAST ICA [45] is used forthis experimentation

Nonlinear Representation For nonlinear image representa-tion RGB color and position information is used in onlineavailable implementation of E Erdem andA Erdem [17]Thesaliency is computed with the default parameters used in [17]that have every input image being resized to 512 times 512 pixelsand five different patch sizes 119901

119894= 8 16 32 64 128 and thus

119897 = 5 are used for saliency computationFinally normalized sparse representationrsquos saliency map

is rescaled to the size of nonlinear representationrsquos saliencymap and both maps are multiplied and normalizedThen thefinal saliencymap is resized to the actual input image size andused for experimentation The input image resolutions usedin all the saliency algorithms for experimentation are givenin Table 1

42 Human Eye Fixation Prediction In order to validatethe proposed model with human eye fixation predictionssaliency maps are generated on three datasets and for a faircomparison shuffle area under the curve score(sAUC) is usedto quantify the results

Dataset A reasonable dataset that can be used for evaluationof human eye fixation prediction must be complex anddiverse enough so that performance can be thoroughly eval-uated In literature Toronto [16] and Kootstra [34] datasetsare the most popular and widely used datasets IMSAL [2] isa relatively new dataset which we also used in our evaluation

Toronto dataset was prepared by Bruce and Tsotsos [16]and it consists of 120 images each with 681 times 511 pixels Thisdataset has both indoor and outdoor imagesThe eye fixationground truth is based on 20 subjects who free viewed theimages for few seconds

Kootstra dataset was used in [34] It consists of 101 imageseach with a resolution of 1024 times 768 pixels These imagesconsist of flowers natural scenes automans and buildingsThis dataset is significantly complex because it has manyimages with no explicit salient regions The eye fixationground truth available with this dataset is based on freeviewing of 31 subjects for a few seconds

IMSAL dataset is given by Li et al [2] it consists of 235images which are collected online through an internet searchengine and some images were taken from literature Theseimages are divided into six categories with 50 images havinglarge salient regions 80 images with intermediate salientregions 60 images with small salient regions 15 imageswith cluttered backgrounds and 15 images with repeatingdistracters These images give a good benchmark for per-formance evaluation because of the significant complexitygiven by variable size of salient objects objects with clutterand objects with distracters The accompanied ground truthconsists of both eye fixation information and binary maskscreated by human subjects who manually marked the salientobject in an image

Metric for Evaluation The most popular method to evaluatethe performance of a saliency map is to calculate area underthe curve (AUC) score of receiver operating characteristics(ROC) curve At first a saliency map is thresholded andused as a binary classifier with human eye fixations actingas positive set and some other points uniformly random asnegative set to plot an ROC curve The AUC of that ROC iscalculated andused as ameasure of performance comparison

There are various variants of AUC available in literatureand the basic difference between them is the choice ofnegative set points We will use the shuffled area under thecurve sAUC score because of its ability to cater for centerbias [41] since some models implicitly incorporate centerbias which makes a fair comparison difficult to performit is becoming standard to present results with sAUC InsAUC score the positive points consists of human subjectseye fixation on that image and the negative set consists ofall the fixation of subjects on the rest of the dataset imagesThe sAUC gives a 05 score on a center Gaussian blob whichis about the same as a random or chance score whereas allthe other versions of AUC [6] give very high score becausethey are affected by the center bias For our experimentationwe used sAUC available online by Schauerte and Stiefelhagen[39] We calculate every sAUC score for 20 times [47]and then use the mean value We found that the standarddeviation of the sAUC approximately ranges from 1119864 minus 4 to5119864 minus 4 in our experiments

Performance Analysis with Resolution In order to find theoptimal parameters for the proposed model we treat bothrepresentations separately and find the best parameters foreach representation that can be incorporated in the proposedmodel We plotted both sparse and nonlinear representationwith variable resolution on all three datasets and measurethe sAUC score Using various parameters given in Table 2Figure 5 is plotted which gives the performance of bothrepresentations on the three datasets Figure 5 shows that for

8 The Scientific World Journal

Table 2 Various scales parameters which are used for evaluation of sparse and nonlinear representation

Scale Adaptive sparse representation Nonlinear representationResolution Patch size Resolution Patch size

1 40 times 30 5 128 times 128 2 4 8 16 32

2 80 times 60 5 256 times 256 4 8 16 32 64

3 160 times 120 5 512 times 512 8 16 32 64 128

08

07

06

1 2 3

Scale

sAU

C

Sparse representation

BruceKootstra

IMSAL

(a)

Non-linear representation

075

070

065

0601 2 3

Scale

sAU

C

BruceKootstra

IMSAL

(b)

Figure 5 sAUC score for sparse and nonlinear representation The values in (a) and (b) are maximum at scale 2 and scale 3 for all threedatasets

sparse representation the performance is maximum with 80times 60 pixels (scale 2) and for nonlinear representation we getgood performance at 512 times 512 pixels (scale 3) Usually theresolution of the input images is not very high so we restrictto 512 times 512 as upper bound for nonlinear representation andsame resolution with respective patch sizes is used in [17]The image resolution and patch sizes of different scales forthe both models used for evaluation are given in Table 2Based on this analysis we incorporate the parameters of scale2 and scale 3 for sparse and nonlinear representation in theproposed saliency model

Performance Comparison with Other Models The results ofour model along with comparison with 13 state-of-the-artmethods are given in Table 3 The detailed performancewith variable Gaussian smoothing is given in Figure 6 Thesimulation codes for all these methods are taken from theauthors websites We used multiscale and quaternion basedimplementation for spectral residue (MESR) PQFT [4] andDCT [49] as proposed by Schauerte and Stiefelhagen [39]which gives higher score than original methods Erdemrsquos [17]implementation with first order statistics embedded is usedfor simulation since it gives higher sAUC score Results withthe proposed technique are quite consistent and the proposedmethod outperforms the state-of-the-artΔQDCT [39]model

on the Toronto dataset and performs comparatively well onthe Kootstra dataset and IMSAL dataset No single modelperforms well on all these datasets and the performanceof other models changes with the dataset but our modelshows consistency and remained either ranked 1 or ranked2 on these datasets We believe that the high eye fixationprediction is due to the adaptive nature of the model anddue to dual representation of features The adaptivity makesa feature space optimal for the current image thus a moreaccurate representation of the features is possible which inturn accounts for better saliency map estimation Moreovera single representation may not be enough for every caseFinally these results can improve further if we use a multi-scale representation for ICA based representation which weskipped due to computational time constraints

43 Salient Object Detection A saliency map can be usedto detect a salient object in an image The basic premiseis that if an image consists of an object which stands outfrom the rest of the image then it should be identifiedby a saliency algorithm There is a different branch invisual saliency modeling which consists of models that arespecifically designed to detect salient objects These modelsfind the salient object in an image and then segment thewhole

The Scientific World Journal 9

000 002 004 006 008

120590

072

070

068

066

064

062

060

Toronto dataset

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(a)

062

060

058

056

054

052

Kootstra dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(b)

078

076

074

072

070

068

066

IMSAL dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFT

ERDEMQDCTAWSProposed

sAU

C

LG

010 012 014 016

(c)

Figure 6 Toronto Kootstra and IMSAL datasets with variable Gaussian smoothing for all algorithms in comparison The 119909-axis representsthe 120590 of the smoothing Gaussian (in image width) (In (c) only GBVS LG and ΔQDCT are taken from [48])

10 The Scientific World Journal

Table3Com

paris

onof

sAUCscoreo

fthe

prop

osed

mod

elwith

13otherstate-of-the-artmod

els

Dataset

AIM

[16]

GBV

S[37]

MES

R[39]

ICL[46]

Itti[37]

MPQ

FT[39]

SDSR

[26]

SUN[47]

HFT

[2]

LG[29]

ERDM

[17]

ΔQDCT

[39]

AWS[23]

Prop

osed

TORO

NTO

[16]

0699

064

0407141

064

76064

8107119

07062

066

6106915

06939

07023

07172

07130

07179

Optim

al120590

004

001

005

003

001

004

003

003

006

004

004

005

001

004

KOOTS

TRA[34]

05901

05572

05978

05769

05689

05985

05984

05613

05911

05938

0603

05998

06185

06157

Optim

al120590

003

001

004

001

004

005

004

002

005

001

003

003

001

003

IMSA

L[2]

07424

lowast076

6507153

07364

07512

071694

07403

06955

07419lowast07356

07455

lowast07434

07468

07560

Optim

al120590

008

007

015

013

009

014

013

016

009

012

012

010

005

009

Thetop

rank

edmod

elisin

boldfont

andsecond

rank

edmod

elisin

italic

fontH

erer

esultswith

optim

alaverages

AUCof

each

metho

dwith

thec

orrespon

ding

Gaussianblur(120590)are

repo

rtedA

sin[47]w

erepeat

thes

AUCcalculationfor2

0tim

esandcompu

tethes

tand

arddeviationof

each

averages

AUC

which

ranges

from1119864minus4to5119864minus4(lowastvalues

takenfro

m[48])

The Scientific World Journal 11

Table4Com

paris

onof

AUCandPo

DSC

scoreo

fthe

prop

osed

mod

elwith

otherstate-of-the-artmod

els

Algorith

mC1

C2C3

C4C5

C6Av

gAU

CPo

DSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AvgAU

CAIM

0921

0653

0918

0543

0957

0461

0904

0435

0924

0508

0941

0671

0927

MES

R0798

0494

0854

0436

0927

0377

0718

0238

0837

0398

0918

0617

0842

ICL

0928

0681

0907

0501

0909

0348

0926

0537

0902

0498

0914

0569

0914

Itti

0939

0687

0924

0542

0952

0457

0882

0423

0924

0499

0941

0631

0927

MPQ

FT0805

0497

0865

0452

0927

0386

0735

0248

0852

0436

0921

0630

0851

SDSR

0879

0618

0908

0511

0948

0435

0801

0298

0904

0489

0933

064

50896

SUN

0766

0459

0787

0349

0880

0362

0708

0263

0719

0283

0817

0451

0780

HFT

0937

070

00923

0552

0937

044

8096

4064

80933

0554

096

0070

60942

LG0814

0515

0859

046

40904

0379

0635

0145

0862

0499

0884

0572

0826

ERDEM

0882

0617

0912

0555

0962

0526

0840

0349

0923

0575

0916

0635

0906

MQDCT

0840

0564

0894

0517

0947

0489

0806

0311

0888

0524

0923

0670

0883

AWS

0894

0625

0922

0569

0957

0480

0891

0402

0920

0534

0925

0655

0918

Prop

osed

0920

0653

094

3059

60976

0555

0934

0516

094

20532

0956

0688

094

5Th

etop

rank

edmod

elisin

bold

font

andthe2

ndrank

edmod

elisin

italic

fontTh

eoverallbestaverageA

UCscoreisg

iven

byou

rpropo

sedmod

el

12 The Scientific World Journal

Input Proposed AIM GBVS MESR ICL Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 7 Response of our algorithm and other 13 state-of-the-art algorithms on various psychological patterns

extent of the object and thus solve this task like segmentation-type binary labeling formulation [50 51] In contrast ourmodel is designed for location based (eye fixation) saliencymodeling and it is not designed to capture exact objectboundaries however by thresholding a saliency map we canget a binary map that can be used for testing performance ofa model for salient object detection Since our saliencymodelis location based we only use other location based modelsin comparison for a fair evaluation similar convention isfollowed in [2 17]

Dataset and Metric for Evaluation For salient object detec-tion the metric used by Li et al [2] is area under thecurve (AUC) and dice similarity coefficient (DSC) on IMSAL[2] dataset We will use the same metric and dataset forsalient object detectionThe DSC gives the overlap between athreshold saliency map and the ground truth Moreover thepeak value of the DSC [52] is considered an important way toestablish the best performance of an algorithm at an optimalthreshold thus we also give results with peak value of theDSC curve (PoDSC) Since AUC can be influenced by centerbias for fair comparison we turn off the center bias in allthe algorithmsThe GBVS [37] has built-in center bias and inorder to cater for that the author in [2] incorporates explicitcenter bias and shows that HFT [2] performs better thanGBVS on the same dataset which is also used in our paperWe do not employ center bias explicitly or implicitly in thepresented results and addedHFT instead for our comparisonTherefore GVBS [37] is skipped from Table 4 Furthermorewe perform Gaussian smoothing in all the algorithms tofind the optimal smoothing parameters for each class in thedataset and the optimal performance is quoted in the resultsgiven in Table 4

Performance We present results in comparison with other 12state-of-the-art algorithms The complete results are given in

Table 4 Our proposed scheme gives the best performanceon three categories C2 C3 and C5 and ranked secondon C4 and C6 Our model gives the highest average AUCscore on this dataset On different categories our results arecomparative to the HFT which is state of the art on thisdataset Apart fromHFT the performance is also significantlybetter in comparison to other algorithmsThedataset used forcomparison is quite complex but our algorithm performedwell for intermediate small objects with distracters althoughperformance on other cases is little less than other state-of-the-art algorithms

44 Psychological Patterns Wealso tested our saliencymodelon psychological patterns which are commonly used to givea qualitative performance on artificial scenarios which simu-late a pop-up phenomenonThese patterns simulate the pop-out phenomenon based on color intersection symmetryorientation curvature and candles image In order to checkthe general performance on various psychological tasks wetested the proposed model on eight psychological patternsFigure 7 gives the results of the proposed saliency modelalong with other popular models The proposed algorithmworks well on color symmetry and orientation as well as oncandle image but the performance is not good for curvatureand intersection patterns Figure 7 also shows that any singlemodel does not give good performance on all the patterns andthe best performance is more or less the same as given by theproposed scheme

5 Discussion

The image features representation drastically affects theinformation content and thus saliency estimation We useda bioinspired center surround saliency computation on twoparallel feature representations that give good performance

The Scientific World Journal 13

(a)

(b) (c)

(d) (e)

Figure 8 The fixed learned dictionary and the information lost by using such dictionary (a) Fixed learned dictionary ((b)-(c)) The inputimages ((d)-(e)) The information lost by using dictionary given in (a)

on both eye fixation prediction and salient object detectionSince a CSD operation depends on the difference between acenter portion and its surroundings a better image contentsrepresentation makes a more accurate and precise centersurround operation

We used an adaptive sparse representation to boost theperformance of the CSD operation In order to show theeffectiveness of the proposed approach we present bothquantitative and qualitative results For qualitative compar-ison we use a fix dictionary [29] (Figure 8(a)) learnt from

an ensemble of natural images We show that some of theinformation is lost if we use a fixed dictionary to represent animage and usually the lost information belongs to the salientregion in an image The difference between an input imageand the image reconstructed by a fixed dictionary is givenin Figure 8 The red cylinder Figure 8(b) and red box withtext Figure 8(c) is visible in the image plots Figures 8(d)and 8(e) These two objects are the salient features in bothimages and their appearance in the residual image shows thatcurrent representation which uses a fixed dictionary loses

14 The Scientific World Journal

Input Gnd truth Proposed AIM GBVS Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 9 Comparison of our method with other state-of-the-art saliency models The first column is the input image with second columnbeing ground truth followed by our proposed saliency model Results from all the other models are plotted in the same row

Table 5 sAUC score comparison using fixed [29] and adaptivesparse representation (proposed) for CSD saliency

DatasetBorji local [29] Fixeddictionary sparserepresentation

Proposed Adaptivedictionary sparserepresentation

Toronto 0670 07110Kootstra 0572 06061

Table 6 sAUC comparison of E Erdem and A Erdem [17] and theproposed color based nonlinear integration

Dataset E Erdem andA Erdem [17]

Proposed Nonlinearrepresentation

Bruce and Tsotsos [16] 07023 07084Kootstra et al [34] 0603 06152

some information that belong to salient portion of the inputimage

For quantitative comparison we employ sAUC to dothe comparison between saliency maps based on a fixeddictionary and the adaptive basis dictionary Table 5 gives acomparison of the both approaches on two datasets Torontoand Kootstra The performance difference is quite significantand it shows that an adaptive representation is much betterBased on these qualitative and quantitative results we canconclude that an adaptive image presentation is more viableand accurate for CSD based saliency computation

A model proposed in [17] gives an idea of nonlinearintegration of all features by covariance matrices and thesupported implementation uses color gradient and spatialinformation Our second contribution is the modification ofthat model and a proposal of using only color with spatial

information in nonlinearly integrated manner using samecovariance matrices For our proposed implementation(seeFigure 4) we modified the features used in [17] and comparethe saliencymap with Erdemrsquos [17] model in Table 6 using thesAUC score Two databases Toronto [16] and Kootstra [34]are used for simulations and the results indicate that by usingonly color with spatial information we can get better sAUCscore than integrating all features using covariance matricesIn Table 6 the difference in sAUC score is quite visible onboth datasets One possible reason of this improvement maybe that the correlation among different features like color andorientation is different and thus using a covariance basedrepresentation does not capture the underlying informationstructure in an efficient way as compared to when only colorinformation is used

One possible argument against our usage of only colorinformation however can be that without any gradient ororientation information a saliency model will fail to detectmany salient regions This argument can also supported bynature since neurons tuned to orientation in an image areknown to contribute to saliency computation [53] In ourmodel this issue is addressed by the sparse representationwhere the adaptive basis same as basis shown in Figure 8(a)is Gabor like filters with edge like structure and thus thesebases efficiently capture orientation information from theimage which complements our color information in nonlin-ear representation

Finally the sAUC score of dual representation Table 3shows that we achieve better eye fixation prediction thantreating both representations separately as shown in Tables5 and 6We believe that such improvement is due to the com-plementary behavior of both techniques since a combinedapproach better represents image contents with high fidelityand thus that in turn improves saliency detection Lastly for

The Scientific World Journal 15

illustration and visual comparison we present some saliencymaps produced by our algorithm along with other models inFigure 9

6 Conclusion

This paper shows that dual feature representationmanages torobustly capture image information which can be used in acenter surround operation to compute saliencyWe show thata CSD on adaptive sparse basis gives better results than a fixsparse basis representation In nonlinear representation weshow that nonlinearly integrated color channels with spatialinformation better capture underlying data structure andthus a CSD on such representation gives good results Finallywe consider both representations as complementary and thusa fused saliencymap not only give good results on human eyefixations but also detect salient objects with high accuracy

In future wewill incorporate some top-downmechanismto better imitative human saliency computation capabilitiesbased on learning and experience Another possible exten-sion of the existing work is to test dynamic scenes video byincorporating additional motion information in the currentscheme

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the national Natural Sci-ence Foundation of China (61175096) and Specialized Fundfor Joint Building Program of Beijing Municipal EducationCommission

References

[1] C Siagian and L Itti ldquoBiologically inspired mobile robot visionlocalizationrdquo IEEE Transactions on Robotics vol 25 no 4 pp861ndash873 2009

[2] J Li M D Levine X An X Xu and H He ldquoVisual saliencybased on scale-space analysis in the frequency domainrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol35 no 4 pp 996ndash1010 2012

[3] Y Su Q Zhao L Zhao and D Gu ldquoAbrupt motion trackingusing a visual saliency embedded particle filterrdquo Pattern Recog-nition vol 47 no 5 pp 1826ndash1834 2014

[4] C Guo Q Ma and L Zhang ldquoSpatio-temporal saliency detec-tion using phase spectrum of quaternion fourier transformrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo08) pp 1ndash8 June 2008

[5] X Hou and L Zhang ldquoThumbnail generation based on globalsaliencyrdquo in Advances in Cognitive NeurodynamicsmdashICCN2007 pp 999ndash1003 Springer Amsterdam The Netherlands2007

[6] A Borji and L Itti ldquoState-of-the-art in visual attention mod-elingrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 35 no 1 pp 185ndash207 2013

[7] L Itti C Koch and E Niebur ldquoAmodel of saliency-based visualattention for rapid scene analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 20 no 11 pp 1254ndash12591998

[8] AM Treisman ldquoFeature integration theoryrdquoCognitive Psychol-ogy vol 12 no 1 pp 97ndash136 1980

[9] J K Tsotsos S M Culhane W Y Kei Wai Y Lai N Davisand F Nuflo ldquoModeling visual attention via selective tuningrdquoArtificial Intelligence vol 78 no 1-2 pp 507ndash545 1995

[10] R Rae Gestikbasierte Mensch-Maschine-Kommunikation aufder grundlage visueller aufmerksamkeit und adaptivitat [PhDthesis] Universitat Bielefeld 2000

[11] D Parkhurst K Law and E Niebur ldquoModeling the role ofsalience in the allocation of overt visual attentionrdquo VisionResearch vol 42 no 1 pp 107ndash123 2002

[12] J Li Y Tian T Huang and W Gao ldquoProbabilistic multi-tasklearning for visual saliency estimation in videordquo InternationalJournal of Computer Vision vol 90 no 2 pp 150ndash165 2010

[13] MCerf J HarelW Einhauser andCKoch ldquoPredicting humangaze using low-level saliency combined with face detectionrdquo inAdvances in Neural Information Processing Systems vol 20 pp241ndash248 2007

[14] A Torralba ldquoModeling global scene factors in attentionrdquoJournal of the Optical Society of America A Optics and ImageScience and Vision vol 20 no 7 pp 1407ndash1418 2003

[15] A Oliva and A Torralba ldquoModeling the shape of the scenea holistic representation of the spatial enveloperdquo InternationalJournal of Computer Vision vol 42 no 3 pp 145ndash175 2001

[16] N D B Bruce and J K Tsotsos ldquoSaliency attention and visualsearch an information theoretic approachrdquo Journal of Visionvol 9 no 3 article 5 2009

[17] E Erdem and A Erdem ldquoVisual saliency estimation by nonlin-early integrating features using region covariancesrdquo Journal ofVision vol 13 no 4 article 11 2013

[18] J H van Hateren ldquoReal and optimal neural images in earlyvisionrdquo Nature vol 360 no 6399 pp 68ndash70 1992

[19] D J Field ldquoRelations between the statistics of natural imagesand the response properties of cortical cellsrdquo Journal of theOptical Society of America A Optics and Image Science vol 4no 12 pp 2379ndash2394 1987

[20] B A Olshausen and D J Field ldquoNatural image statistics andefficient codingrdquo Network Computation in Neural Systems vol7 no 2 pp 333ndash339 1996

[21] M Kwon G Legge F Fang A Cheong and S He ldquoIdentifyingthe mechanism of adaptation to prolonged contrast reductionrdquoJournal of Vision vol 9 no 8 p 976 2009

[22] X Sun H Yao and R Ji ldquoVisual attention modeling based onshort-term environmental adaptionrdquo Journal of Visual Commu-nication and Image Representation vol 24 no 2 pp 171ndash1802013

[23] A Garcia-Diaz X R Fdez-Vidal X M Pardo and R DosilldquoSaliency from hierarchical adaptation through decorrelationand variance normalizationrdquo Image and Vision Computing vol30 no 1 pp 51ndash64 2012

[24] C E Shannon ldquoAmathematical theory of communicationrdquo BellSystem Technical Journal vol 27 no 3 pp 379ndash423 1948

[25] Y Karklin and M S Lewicki ldquoEmergence of complex cellproperties by learning to generalize in natural scenesrdquo Naturevol 457 no 7225 pp 83ndash86 2009

[26] H J Seo and P Milanfar ldquoStatic and space-time visual saliencydetection by self-resemblancerdquo Journal of Vision vol 9 no 12pp 1ndash27 2009

16 The Scientific World Journal

[27] D J Jobson Z-U Rahman and G AWoodell ldquoProperties andperformance of a centersurround retinexrdquo IEEE Transactionson Image Processing vol 6 no 3 pp 451ndash462 1997

[28] A Guo D Zhao S Liu X Fan and W Gao ldquoVisual attentionbased image quality assessmentrdquo in Proceedings of the 18thIEEE International Conference on Image Processing (ICIP rsquo11) pp3297ndash3300 September 2011

[29] A Borji and L Itti ldquoExploiting local and global patch raritiesfor saliency detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo12) pp 478ndash485 2012

[30] L Duan C Wu J Miao L Qing and Y Fu ldquoVisual saliencydetection by spatially weighted dissimilarityrdquo in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recogni-tion (CVPR rsquo11) pp 473ndash480 June 2011

[31] JMWolfe ldquoGuided search 20 a revisedmodel of visual searchrdquoPsychonomic Bulletin amp Review vol 1 no 2 pp 202ndash238 1994

[32] O Le Meur P Le Callet D Barba and DThoreau ldquoA coherentcomputational approach to model bottom-up visual attentionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 28 no 5 pp 802ndash817 2006

[33] A Torralba A Oliva M S Castelhano and J M HendersonldquoContextual guidance of eye movements and attention in real-world scenes the role of global features in object searchrdquoPsychological Review vol 113 no 4 pp 766ndash786 2006

[34] G Kootstra N Bergstrom and D Kragic ldquoUsing symmetry toselect fixation points for segmentationrdquo in Proceedings of the20th International Conference on Pattern Recognition (ICPR rsquo10)pp 3894ndash3897 August 2010

[35] C Li J Xue N Zheng X Lan and Z Tian ldquoSpatio-temporalsaliency perception via hypercomplex frequency spectral con-trastrdquo Sensors vol 13 no 3 pp 3409ndash3431 2013

[36] L Itti and P Baldi ldquoBayesian surprise attracts human attentionrdquoVision Research vol 49 no 10 pp 1295ndash1306 2009

[37] J Harel C Koch and P Perona ldquoGraph-based visual saliencyrdquoin Advances in Neural Information Processing Systems vol 19pp 545ndash552 2006

[38] X Hou and L Zhang ldquoSaliency detection a spectral residualapproachrdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo07)pp 1ndash8 June 2007

[39] B Schauerte and R Stiefelhagen ldquoQuaternion-based spectralsaliency detection for eye fixation predictionrdquo in ComputerVisionmdashECCV 2012 A Fitzgibbon S Lazebnik P Perona YSato and C Schmid Eds Lecture Notes in Computer Sciencepp 116ndash129 2012

[40] W Kienzle F Wichmann B Scholkopf and M Franz ANonparametric Approach to Bottom-Up Visual Saliency 2007

[41] J Tilke K Ehinger F Durand and A Torralba ldquoLearningto predict where humans lookrdquo in Proceedings of the 12thInternational Conference on Computer Vision (ICCV rsquo09) pp2106ndash2113 October 2009

[42] W Wang Y Wang Q Huang and W Gao ldquoMeasuring visualsaliency by site entropy raterdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 2368ndash2375 June 2010

[43] O Tuzel F Porikli and P Meer ldquoRegion covariance afast descriptor for detection and classificationrdquo in ComputerVisionmdashECCV 2006 vol 3952 of Lecture Notes in ComputerScience pp 589ndash600 2006

[44] httpssitesgooglecomsitesparsenonlinearsaliencymodelhomedownloads

[45] A Hyvarinen and E Oja ldquoA fast fixed-point algorithm forindependent component analysisrdquo Neural Computation vol 9no 7 pp 1483ndash1492 1997

[46] X Hou and L Zhang ldquoDynamic visual attention searching forcoding length incrementsrdquo in Advances in Neural InformationProcessing Systems vol 21 pp 681ndash688 2008

[47] L Zhang M H Tong T K Marks H Shan and G WCottrell ldquoSUN a Bayesian framework for saliency using naturalstatisticsrdquo Journal of Vision vol 8 no 7 article 32 pp 1ndash202008

[48] J Zhang and S Sclaroff ldquoSaliency detection a boolean mapapproachrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV rsquo13) 2013

[49] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[50] E Rahtu J Kannala M Salo and J Heikkila ldquoSegmentingsalient objects from images and videosrdquo in Computer VisionmdashECCV 2010 vol 6315 of Lecture Notes in Computer Science pp366ndash379 2010

[51] M-M Cheng G-X Zhang N J Mitra X Huang and S-M Hu ldquoGlobal contrast based salient region detectionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo11) pp 409ndash416 June 2011

[52] T Veit J-P Tarel P Nicolle and P Charbonnier ldquoEvaluationof road marking feature extractionrdquo in Proceedings of the11th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo08) pp 174ndash181 December 2008

[53] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model of simple receptive fields in catstriate cortexrdquo Journal of Neurophysiology vol 58 no 6 pp1233ndash1258 1987

Page 8: Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

8 The Scientific World Journal

Table 2 Various scales parameters which are used for evaluation of sparse and nonlinear representation

Scale Adaptive sparse representation Nonlinear representationResolution Patch size Resolution Patch size

1 40 times 30 5 128 times 128 2 4 8 16 32

2 80 times 60 5 256 times 256 4 8 16 32 64

3 160 times 120 5 512 times 512 8 16 32 64 128

08

07

06

1 2 3

Scale

sAU

C

Sparse representation

BruceKootstra

IMSAL

(a)

Non-linear representation

075

070

065

0601 2 3

Scale

sAU

C

BruceKootstra

IMSAL

(b)

Figure 5 sAUC score for sparse and nonlinear representation The values in (a) and (b) are maximum at scale 2 and scale 3 for all threedatasets

sparse representation the performance is maximum with 80times 60 pixels (scale 2) and for nonlinear representation we getgood performance at 512 times 512 pixels (scale 3) Usually theresolution of the input images is not very high so we restrictto 512 times 512 as upper bound for nonlinear representation andsame resolution with respective patch sizes is used in [17]The image resolution and patch sizes of different scales forthe both models used for evaluation are given in Table 2Based on this analysis we incorporate the parameters of scale2 and scale 3 for sparse and nonlinear representation in theproposed saliency model

Performance Comparison with Other Models The results ofour model along with comparison with 13 state-of-the-artmethods are given in Table 3 The detailed performancewith variable Gaussian smoothing is given in Figure 6 Thesimulation codes for all these methods are taken from theauthors websites We used multiscale and quaternion basedimplementation for spectral residue (MESR) PQFT [4] andDCT [49] as proposed by Schauerte and Stiefelhagen [39]which gives higher score than original methods Erdemrsquos [17]implementation with first order statistics embedded is usedfor simulation since it gives higher sAUC score Results withthe proposed technique are quite consistent and the proposedmethod outperforms the state-of-the-artΔQDCT [39]model

on the Toronto dataset and performs comparatively well onthe Kootstra dataset and IMSAL dataset No single modelperforms well on all these datasets and the performanceof other models changes with the dataset but our modelshows consistency and remained either ranked 1 or ranked2 on these datasets We believe that the high eye fixationprediction is due to the adaptive nature of the model anddue to dual representation of features The adaptivity makesa feature space optimal for the current image thus a moreaccurate representation of the features is possible which inturn accounts for better saliency map estimation Moreovera single representation may not be enough for every caseFinally these results can improve further if we use a multi-scale representation for ICA based representation which weskipped due to computational time constraints

43 Salient Object Detection A saliency map can be usedto detect a salient object in an image The basic premiseis that if an image consists of an object which stands outfrom the rest of the image then it should be identifiedby a saliency algorithm There is a different branch invisual saliency modeling which consists of models that arespecifically designed to detect salient objects These modelsfind the salient object in an image and then segment thewhole

The Scientific World Journal 9

000 002 004 006 008

120590

072

070

068

066

064

062

060

Toronto dataset

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(a)

062

060

058

056

054

052

Kootstra dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(b)

078

076

074

072

070

068

066

IMSAL dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFT

ERDEMQDCTAWSProposed

sAU

C

LG

010 012 014 016

(c)

Figure 6 Toronto Kootstra and IMSAL datasets with variable Gaussian smoothing for all algorithms in comparison The 119909-axis representsthe 120590 of the smoothing Gaussian (in image width) (In (c) only GBVS LG and ΔQDCT are taken from [48])

10 The Scientific World Journal

Table3Com

paris

onof

sAUCscoreo

fthe

prop

osed

mod

elwith

13otherstate-of-the-artmod

els

Dataset

AIM

[16]

GBV

S[37]

MES

R[39]

ICL[46]

Itti[37]

MPQ

FT[39]

SDSR

[26]

SUN[47]

HFT

[2]

LG[29]

ERDM

[17]

ΔQDCT

[39]

AWS[23]

Prop

osed

TORO

NTO

[16]

0699

064

0407141

064

76064

8107119

07062

066

6106915

06939

07023

07172

07130

07179

Optim

al120590

004

001

005

003

001

004

003

003

006

004

004

005

001

004

KOOTS

TRA[34]

05901

05572

05978

05769

05689

05985

05984

05613

05911

05938

0603

05998

06185

06157

Optim

al120590

003

001

004

001

004

005

004

002

005

001

003

003

001

003

IMSA

L[2]

07424

lowast076

6507153

07364

07512

071694

07403

06955

07419lowast07356

07455

lowast07434

07468

07560

Optim

al120590

008

007

015

013

009

014

013

016

009

012

012

010

005

009

Thetop

rank

edmod

elisin

boldfont

andsecond

rank

edmod

elisin

italic

fontH

erer

esultswith

optim

alaverages

AUCof

each

metho

dwith

thec

orrespon

ding

Gaussianblur(120590)are

repo

rtedA

sin[47]w

erepeat

thes

AUCcalculationfor2

0tim

esandcompu

tethes

tand

arddeviationof

each

averages

AUC

which

ranges

from1119864minus4to5119864minus4(lowastvalues

takenfro

m[48])

The Scientific World Journal 11

Table4Com

paris

onof

AUCandPo

DSC

scoreo

fthe

prop

osed

mod

elwith

otherstate-of-the-artmod

els

Algorith

mC1

C2C3

C4C5

C6Av

gAU

CPo

DSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AvgAU

CAIM

0921

0653

0918

0543

0957

0461

0904

0435

0924

0508

0941

0671

0927

MES

R0798

0494

0854

0436

0927

0377

0718

0238

0837

0398

0918

0617

0842

ICL

0928

0681

0907

0501

0909

0348

0926

0537

0902

0498

0914

0569

0914

Itti

0939

0687

0924

0542

0952

0457

0882

0423

0924

0499

0941

0631

0927

MPQ

FT0805

0497

0865

0452

0927

0386

0735

0248

0852

0436

0921

0630

0851

SDSR

0879

0618

0908

0511

0948

0435

0801

0298

0904

0489

0933

064

50896

SUN

0766

0459

0787

0349

0880

0362

0708

0263

0719

0283

0817

0451

0780

HFT

0937

070

00923

0552

0937

044

8096

4064

80933

0554

096

0070

60942

LG0814

0515

0859

046

40904

0379

0635

0145

0862

0499

0884

0572

0826

ERDEM

0882

0617

0912

0555

0962

0526

0840

0349

0923

0575

0916

0635

0906

MQDCT

0840

0564

0894

0517

0947

0489

0806

0311

0888

0524

0923

0670

0883

AWS

0894

0625

0922

0569

0957

0480

0891

0402

0920

0534

0925

0655

0918

Prop

osed

0920

0653

094

3059

60976

0555

0934

0516

094

20532

0956

0688

094

5Th

etop

rank

edmod

elisin

bold

font

andthe2

ndrank

edmod

elisin

italic

fontTh

eoverallbestaverageA

UCscoreisg

iven

byou

rpropo

sedmod

el

12 The Scientific World Journal

Input Proposed AIM GBVS MESR ICL Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 7 Response of our algorithm and other 13 state-of-the-art algorithms on various psychological patterns

extent of the object and thus solve this task like segmentation-type binary labeling formulation [50 51] In contrast ourmodel is designed for location based (eye fixation) saliencymodeling and it is not designed to capture exact objectboundaries however by thresholding a saliency map we canget a binary map that can be used for testing performance ofa model for salient object detection Since our saliencymodelis location based we only use other location based modelsin comparison for a fair evaluation similar convention isfollowed in [2 17]

Dataset and Metric for Evaluation For salient object detec-tion the metric used by Li et al [2] is area under thecurve (AUC) and dice similarity coefficient (DSC) on IMSAL[2] dataset We will use the same metric and dataset forsalient object detectionThe DSC gives the overlap between athreshold saliency map and the ground truth Moreover thepeak value of the DSC [52] is considered an important way toestablish the best performance of an algorithm at an optimalthreshold thus we also give results with peak value of theDSC curve (PoDSC) Since AUC can be influenced by centerbias for fair comparison we turn off the center bias in allthe algorithmsThe GBVS [37] has built-in center bias and inorder to cater for that the author in [2] incorporates explicitcenter bias and shows that HFT [2] performs better thanGBVS on the same dataset which is also used in our paperWe do not employ center bias explicitly or implicitly in thepresented results and addedHFT instead for our comparisonTherefore GVBS [37] is skipped from Table 4 Furthermorewe perform Gaussian smoothing in all the algorithms tofind the optimal smoothing parameters for each class in thedataset and the optimal performance is quoted in the resultsgiven in Table 4

Performance We present results in comparison with other 12state-of-the-art algorithms The complete results are given in

Table 4 Our proposed scheme gives the best performanceon three categories C2 C3 and C5 and ranked secondon C4 and C6 Our model gives the highest average AUCscore on this dataset On different categories our results arecomparative to the HFT which is state of the art on thisdataset Apart fromHFT the performance is also significantlybetter in comparison to other algorithmsThedataset used forcomparison is quite complex but our algorithm performedwell for intermediate small objects with distracters althoughperformance on other cases is little less than other state-of-the-art algorithms

44 Psychological Patterns Wealso tested our saliencymodelon psychological patterns which are commonly used to givea qualitative performance on artificial scenarios which simu-late a pop-up phenomenonThese patterns simulate the pop-out phenomenon based on color intersection symmetryorientation curvature and candles image In order to checkthe general performance on various psychological tasks wetested the proposed model on eight psychological patternsFigure 7 gives the results of the proposed saliency modelalong with other popular models The proposed algorithmworks well on color symmetry and orientation as well as oncandle image but the performance is not good for curvatureand intersection patterns Figure 7 also shows that any singlemodel does not give good performance on all the patterns andthe best performance is more or less the same as given by theproposed scheme

5 Discussion

The image features representation drastically affects theinformation content and thus saliency estimation We useda bioinspired center surround saliency computation on twoparallel feature representations that give good performance

The Scientific World Journal 13

(a)

(b) (c)

(d) (e)

Figure 8 The fixed learned dictionary and the information lost by using such dictionary (a) Fixed learned dictionary ((b)-(c)) The inputimages ((d)-(e)) The information lost by using dictionary given in (a)

on both eye fixation prediction and salient object detectionSince a CSD operation depends on the difference between acenter portion and its surroundings a better image contentsrepresentation makes a more accurate and precise centersurround operation

We used an adaptive sparse representation to boost theperformance of the CSD operation In order to show theeffectiveness of the proposed approach we present bothquantitative and qualitative results For qualitative compar-ison we use a fix dictionary [29] (Figure 8(a)) learnt from

an ensemble of natural images We show that some of theinformation is lost if we use a fixed dictionary to represent animage and usually the lost information belongs to the salientregion in an image The difference between an input imageand the image reconstructed by a fixed dictionary is givenin Figure 8 The red cylinder Figure 8(b) and red box withtext Figure 8(c) is visible in the image plots Figures 8(d)and 8(e) These two objects are the salient features in bothimages and their appearance in the residual image shows thatcurrent representation which uses a fixed dictionary loses

14 The Scientific World Journal

Input Gnd truth Proposed AIM GBVS Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 9 Comparison of our method with other state-of-the-art saliency models The first column is the input image with second columnbeing ground truth followed by our proposed saliency model Results from all the other models are plotted in the same row

Table 5 sAUC score comparison using fixed [29] and adaptivesparse representation (proposed) for CSD saliency

DatasetBorji local [29] Fixeddictionary sparserepresentation

Proposed Adaptivedictionary sparserepresentation

Toronto 0670 07110Kootstra 0572 06061

Table 6 sAUC comparison of E Erdem and A Erdem [17] and theproposed color based nonlinear integration

Dataset E Erdem andA Erdem [17]

Proposed Nonlinearrepresentation

Bruce and Tsotsos [16] 07023 07084Kootstra et al [34] 0603 06152

some information that belong to salient portion of the inputimage

For quantitative comparison we employ sAUC to dothe comparison between saliency maps based on a fixeddictionary and the adaptive basis dictionary Table 5 gives acomparison of the both approaches on two datasets Torontoand Kootstra The performance difference is quite significantand it shows that an adaptive representation is much betterBased on these qualitative and quantitative results we canconclude that an adaptive image presentation is more viableand accurate for CSD based saliency computation

A model proposed in [17] gives an idea of nonlinearintegration of all features by covariance matrices and thesupported implementation uses color gradient and spatialinformation Our second contribution is the modification ofthat model and a proposal of using only color with spatial

information in nonlinearly integrated manner using samecovariance matrices For our proposed implementation(seeFigure 4) we modified the features used in [17] and comparethe saliencymap with Erdemrsquos [17] model in Table 6 using thesAUC score Two databases Toronto [16] and Kootstra [34]are used for simulations and the results indicate that by usingonly color with spatial information we can get better sAUCscore than integrating all features using covariance matricesIn Table 6 the difference in sAUC score is quite visible onboth datasets One possible reason of this improvement maybe that the correlation among different features like color andorientation is different and thus using a covariance basedrepresentation does not capture the underlying informationstructure in an efficient way as compared to when only colorinformation is used

One possible argument against our usage of only colorinformation however can be that without any gradient ororientation information a saliency model will fail to detectmany salient regions This argument can also supported bynature since neurons tuned to orientation in an image areknown to contribute to saliency computation [53] In ourmodel this issue is addressed by the sparse representationwhere the adaptive basis same as basis shown in Figure 8(a)is Gabor like filters with edge like structure and thus thesebases efficiently capture orientation information from theimage which complements our color information in nonlin-ear representation

Finally the sAUC score of dual representation Table 3shows that we achieve better eye fixation prediction thantreating both representations separately as shown in Tables5 and 6We believe that such improvement is due to the com-plementary behavior of both techniques since a combinedapproach better represents image contents with high fidelityand thus that in turn improves saliency detection Lastly for

The Scientific World Journal 15

illustration and visual comparison we present some saliencymaps produced by our algorithm along with other models inFigure 9

6 Conclusion

This paper shows that dual feature representationmanages torobustly capture image information which can be used in acenter surround operation to compute saliencyWe show thata CSD on adaptive sparse basis gives better results than a fixsparse basis representation In nonlinear representation weshow that nonlinearly integrated color channels with spatialinformation better capture underlying data structure andthus a CSD on such representation gives good results Finallywe consider both representations as complementary and thusa fused saliencymap not only give good results on human eyefixations but also detect salient objects with high accuracy

In future wewill incorporate some top-downmechanismto better imitative human saliency computation capabilitiesbased on learning and experience Another possible exten-sion of the existing work is to test dynamic scenes video byincorporating additional motion information in the currentscheme

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the national Natural Sci-ence Foundation of China (61175096) and Specialized Fundfor Joint Building Program of Beijing Municipal EducationCommission

References

[1] C Siagian and L Itti ldquoBiologically inspired mobile robot visionlocalizationrdquo IEEE Transactions on Robotics vol 25 no 4 pp861ndash873 2009

[2] J Li M D Levine X An X Xu and H He ldquoVisual saliencybased on scale-space analysis in the frequency domainrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol35 no 4 pp 996ndash1010 2012

[3] Y Su Q Zhao L Zhao and D Gu ldquoAbrupt motion trackingusing a visual saliency embedded particle filterrdquo Pattern Recog-nition vol 47 no 5 pp 1826ndash1834 2014

[4] C Guo Q Ma and L Zhang ldquoSpatio-temporal saliency detec-tion using phase spectrum of quaternion fourier transformrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo08) pp 1ndash8 June 2008

[5] X Hou and L Zhang ldquoThumbnail generation based on globalsaliencyrdquo in Advances in Cognitive NeurodynamicsmdashICCN2007 pp 999ndash1003 Springer Amsterdam The Netherlands2007

[6] A Borji and L Itti ldquoState-of-the-art in visual attention mod-elingrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 35 no 1 pp 185ndash207 2013

[7] L Itti C Koch and E Niebur ldquoAmodel of saliency-based visualattention for rapid scene analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 20 no 11 pp 1254ndash12591998

[8] AM Treisman ldquoFeature integration theoryrdquoCognitive Psychol-ogy vol 12 no 1 pp 97ndash136 1980

[9] J K Tsotsos S M Culhane W Y Kei Wai Y Lai N Davisand F Nuflo ldquoModeling visual attention via selective tuningrdquoArtificial Intelligence vol 78 no 1-2 pp 507ndash545 1995

[10] R Rae Gestikbasierte Mensch-Maschine-Kommunikation aufder grundlage visueller aufmerksamkeit und adaptivitat [PhDthesis] Universitat Bielefeld 2000

[11] D Parkhurst K Law and E Niebur ldquoModeling the role ofsalience in the allocation of overt visual attentionrdquo VisionResearch vol 42 no 1 pp 107ndash123 2002

[12] J Li Y Tian T Huang and W Gao ldquoProbabilistic multi-tasklearning for visual saliency estimation in videordquo InternationalJournal of Computer Vision vol 90 no 2 pp 150ndash165 2010

[13] MCerf J HarelW Einhauser andCKoch ldquoPredicting humangaze using low-level saliency combined with face detectionrdquo inAdvances in Neural Information Processing Systems vol 20 pp241ndash248 2007

[14] A Torralba ldquoModeling global scene factors in attentionrdquoJournal of the Optical Society of America A Optics and ImageScience and Vision vol 20 no 7 pp 1407ndash1418 2003

[15] A Oliva and A Torralba ldquoModeling the shape of the scenea holistic representation of the spatial enveloperdquo InternationalJournal of Computer Vision vol 42 no 3 pp 145ndash175 2001

[16] N D B Bruce and J K Tsotsos ldquoSaliency attention and visualsearch an information theoretic approachrdquo Journal of Visionvol 9 no 3 article 5 2009

[17] E Erdem and A Erdem ldquoVisual saliency estimation by nonlin-early integrating features using region covariancesrdquo Journal ofVision vol 13 no 4 article 11 2013

[18] J H van Hateren ldquoReal and optimal neural images in earlyvisionrdquo Nature vol 360 no 6399 pp 68ndash70 1992

[19] D J Field ldquoRelations between the statistics of natural imagesand the response properties of cortical cellsrdquo Journal of theOptical Society of America A Optics and Image Science vol 4no 12 pp 2379ndash2394 1987

[20] B A Olshausen and D J Field ldquoNatural image statistics andefficient codingrdquo Network Computation in Neural Systems vol7 no 2 pp 333ndash339 1996

[21] M Kwon G Legge F Fang A Cheong and S He ldquoIdentifyingthe mechanism of adaptation to prolonged contrast reductionrdquoJournal of Vision vol 9 no 8 p 976 2009

[22] X Sun H Yao and R Ji ldquoVisual attention modeling based onshort-term environmental adaptionrdquo Journal of Visual Commu-nication and Image Representation vol 24 no 2 pp 171ndash1802013

[23] A Garcia-Diaz X R Fdez-Vidal X M Pardo and R DosilldquoSaliency from hierarchical adaptation through decorrelationand variance normalizationrdquo Image and Vision Computing vol30 no 1 pp 51ndash64 2012

[24] C E Shannon ldquoAmathematical theory of communicationrdquo BellSystem Technical Journal vol 27 no 3 pp 379ndash423 1948

[25] Y Karklin and M S Lewicki ldquoEmergence of complex cellproperties by learning to generalize in natural scenesrdquo Naturevol 457 no 7225 pp 83ndash86 2009

[26] H J Seo and P Milanfar ldquoStatic and space-time visual saliencydetection by self-resemblancerdquo Journal of Vision vol 9 no 12pp 1ndash27 2009

16 The Scientific World Journal

[27] D J Jobson Z-U Rahman and G AWoodell ldquoProperties andperformance of a centersurround retinexrdquo IEEE Transactionson Image Processing vol 6 no 3 pp 451ndash462 1997

[28] A Guo D Zhao S Liu X Fan and W Gao ldquoVisual attentionbased image quality assessmentrdquo in Proceedings of the 18thIEEE International Conference on Image Processing (ICIP rsquo11) pp3297ndash3300 September 2011

[29] A Borji and L Itti ldquoExploiting local and global patch raritiesfor saliency detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo12) pp 478ndash485 2012

[30] L Duan C Wu J Miao L Qing and Y Fu ldquoVisual saliencydetection by spatially weighted dissimilarityrdquo in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recogni-tion (CVPR rsquo11) pp 473ndash480 June 2011

[31] JMWolfe ldquoGuided search 20 a revisedmodel of visual searchrdquoPsychonomic Bulletin amp Review vol 1 no 2 pp 202ndash238 1994

[32] O Le Meur P Le Callet D Barba and DThoreau ldquoA coherentcomputational approach to model bottom-up visual attentionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 28 no 5 pp 802ndash817 2006

[33] A Torralba A Oliva M S Castelhano and J M HendersonldquoContextual guidance of eye movements and attention in real-world scenes the role of global features in object searchrdquoPsychological Review vol 113 no 4 pp 766ndash786 2006

[34] G Kootstra N Bergstrom and D Kragic ldquoUsing symmetry toselect fixation points for segmentationrdquo in Proceedings of the20th International Conference on Pattern Recognition (ICPR rsquo10)pp 3894ndash3897 August 2010

[35] C Li J Xue N Zheng X Lan and Z Tian ldquoSpatio-temporalsaliency perception via hypercomplex frequency spectral con-trastrdquo Sensors vol 13 no 3 pp 3409ndash3431 2013

[36] L Itti and P Baldi ldquoBayesian surprise attracts human attentionrdquoVision Research vol 49 no 10 pp 1295ndash1306 2009

[37] J Harel C Koch and P Perona ldquoGraph-based visual saliencyrdquoin Advances in Neural Information Processing Systems vol 19pp 545ndash552 2006

[38] X Hou and L Zhang ldquoSaliency detection a spectral residualapproachrdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo07)pp 1ndash8 June 2007

[39] B Schauerte and R Stiefelhagen ldquoQuaternion-based spectralsaliency detection for eye fixation predictionrdquo in ComputerVisionmdashECCV 2012 A Fitzgibbon S Lazebnik P Perona YSato and C Schmid Eds Lecture Notes in Computer Sciencepp 116ndash129 2012

[40] W Kienzle F Wichmann B Scholkopf and M Franz ANonparametric Approach to Bottom-Up Visual Saliency 2007

[41] J Tilke K Ehinger F Durand and A Torralba ldquoLearningto predict where humans lookrdquo in Proceedings of the 12thInternational Conference on Computer Vision (ICCV rsquo09) pp2106ndash2113 October 2009

[42] W Wang Y Wang Q Huang and W Gao ldquoMeasuring visualsaliency by site entropy raterdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 2368ndash2375 June 2010

[43] O Tuzel F Porikli and P Meer ldquoRegion covariance afast descriptor for detection and classificationrdquo in ComputerVisionmdashECCV 2006 vol 3952 of Lecture Notes in ComputerScience pp 589ndash600 2006

[44] httpssitesgooglecomsitesparsenonlinearsaliencymodelhomedownloads

[45] A Hyvarinen and E Oja ldquoA fast fixed-point algorithm forindependent component analysisrdquo Neural Computation vol 9no 7 pp 1483ndash1492 1997

[46] X Hou and L Zhang ldquoDynamic visual attention searching forcoding length incrementsrdquo in Advances in Neural InformationProcessing Systems vol 21 pp 681ndash688 2008

[47] L Zhang M H Tong T K Marks H Shan and G WCottrell ldquoSUN a Bayesian framework for saliency using naturalstatisticsrdquo Journal of Vision vol 8 no 7 article 32 pp 1ndash202008

[48] J Zhang and S Sclaroff ldquoSaliency detection a boolean mapapproachrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV rsquo13) 2013

[49] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[50] E Rahtu J Kannala M Salo and J Heikkila ldquoSegmentingsalient objects from images and videosrdquo in Computer VisionmdashECCV 2010 vol 6315 of Lecture Notes in Computer Science pp366ndash379 2010

[51] M-M Cheng G-X Zhang N J Mitra X Huang and S-M Hu ldquoGlobal contrast based salient region detectionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo11) pp 409ndash416 June 2011

[52] T Veit J-P Tarel P Nicolle and P Charbonnier ldquoEvaluationof road marking feature extractionrdquo in Proceedings of the11th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo08) pp 174ndash181 December 2008

[53] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model of simple receptive fields in catstriate cortexrdquo Journal of Neurophysiology vol 58 no 6 pp1233ndash1258 1987

Page 9: Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

The Scientific World Journal 9

000 002 004 006 008

120590

072

070

068

066

064

062

060

Toronto dataset

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(a)

062

060

058

056

054

052

Kootstra dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFTLGERDEMQDCTAWSProposed

sAU

C

010 012 014

(b)

078

076

074

072

070

068

066

IMSAL dataset

000 002 004 006 008

120590

AIMGBVSMESRICLIttiMPQFTSDSR

SUNHFT

ERDEMQDCTAWSProposed

sAU

C

LG

010 012 014 016

(c)

Figure 6 Toronto Kootstra and IMSAL datasets with variable Gaussian smoothing for all algorithms in comparison The 119909-axis representsthe 120590 of the smoothing Gaussian (in image width) (In (c) only GBVS LG and ΔQDCT are taken from [48])

10 The Scientific World Journal

Table3Com

paris

onof

sAUCscoreo

fthe

prop

osed

mod

elwith

13otherstate-of-the-artmod

els

Dataset

AIM

[16]

GBV

S[37]

MES

R[39]

ICL[46]

Itti[37]

MPQ

FT[39]

SDSR

[26]

SUN[47]

HFT

[2]

LG[29]

ERDM

[17]

ΔQDCT

[39]

AWS[23]

Prop

osed

TORO

NTO

[16]

0699

064

0407141

064

76064

8107119

07062

066

6106915

06939

07023

07172

07130

07179

Optim

al120590

004

001

005

003

001

004

003

003

006

004

004

005

001

004

KOOTS

TRA[34]

05901

05572

05978

05769

05689

05985

05984

05613

05911

05938

0603

05998

06185

06157

Optim

al120590

003

001

004

001

004

005

004

002

005

001

003

003

001

003

IMSA

L[2]

07424

lowast076

6507153

07364

07512

071694

07403

06955

07419lowast07356

07455

lowast07434

07468

07560

Optim

al120590

008

007

015

013

009

014

013

016

009

012

012

010

005

009

Thetop

rank

edmod

elisin

boldfont

andsecond

rank

edmod

elisin

italic

fontH

erer

esultswith

optim

alaverages

AUCof

each

metho

dwith

thec

orrespon

ding

Gaussianblur(120590)are

repo

rtedA

sin[47]w

erepeat

thes

AUCcalculationfor2

0tim

esandcompu

tethes

tand

arddeviationof

each

averages

AUC

which

ranges

from1119864minus4to5119864minus4(lowastvalues

takenfro

m[48])

The Scientific World Journal 11

Table4Com

paris

onof

AUCandPo

DSC

scoreo

fthe

prop

osed

mod

elwith

otherstate-of-the-artmod

els

Algorith

mC1

C2C3

C4C5

C6Av

gAU

CPo

DSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AvgAU

CAIM

0921

0653

0918

0543

0957

0461

0904

0435

0924

0508

0941

0671

0927

MES

R0798

0494

0854

0436

0927

0377

0718

0238

0837

0398

0918

0617

0842

ICL

0928

0681

0907

0501

0909

0348

0926

0537

0902

0498

0914

0569

0914

Itti

0939

0687

0924

0542

0952

0457

0882

0423

0924

0499

0941

0631

0927

MPQ

FT0805

0497

0865

0452

0927

0386

0735

0248

0852

0436

0921

0630

0851

SDSR

0879

0618

0908

0511

0948

0435

0801

0298

0904

0489

0933

064

50896

SUN

0766

0459

0787

0349

0880

0362

0708

0263

0719

0283

0817

0451

0780

HFT

0937

070

00923

0552

0937

044

8096

4064

80933

0554

096

0070

60942

LG0814

0515

0859

046

40904

0379

0635

0145

0862

0499

0884

0572

0826

ERDEM

0882

0617

0912

0555

0962

0526

0840

0349

0923

0575

0916

0635

0906

MQDCT

0840

0564

0894

0517

0947

0489

0806

0311

0888

0524

0923

0670

0883

AWS

0894

0625

0922

0569

0957

0480

0891

0402

0920

0534

0925

0655

0918

Prop

osed

0920

0653

094

3059

60976

0555

0934

0516

094

20532

0956

0688

094

5Th

etop

rank

edmod

elisin

bold

font

andthe2

ndrank

edmod

elisin

italic

fontTh

eoverallbestaverageA

UCscoreisg

iven

byou

rpropo

sedmod

el

12 The Scientific World Journal

Input Proposed AIM GBVS MESR ICL Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 7 Response of our algorithm and other 13 state-of-the-art algorithms on various psychological patterns

extent of the object and thus solve this task like segmentation-type binary labeling formulation [50 51] In contrast ourmodel is designed for location based (eye fixation) saliencymodeling and it is not designed to capture exact objectboundaries however by thresholding a saliency map we canget a binary map that can be used for testing performance ofa model for salient object detection Since our saliencymodelis location based we only use other location based modelsin comparison for a fair evaluation similar convention isfollowed in [2 17]

Dataset and Metric for Evaluation For salient object detec-tion the metric used by Li et al [2] is area under thecurve (AUC) and dice similarity coefficient (DSC) on IMSAL[2] dataset We will use the same metric and dataset forsalient object detectionThe DSC gives the overlap between athreshold saliency map and the ground truth Moreover thepeak value of the DSC [52] is considered an important way toestablish the best performance of an algorithm at an optimalthreshold thus we also give results with peak value of theDSC curve (PoDSC) Since AUC can be influenced by centerbias for fair comparison we turn off the center bias in allthe algorithmsThe GBVS [37] has built-in center bias and inorder to cater for that the author in [2] incorporates explicitcenter bias and shows that HFT [2] performs better thanGBVS on the same dataset which is also used in our paperWe do not employ center bias explicitly or implicitly in thepresented results and addedHFT instead for our comparisonTherefore GVBS [37] is skipped from Table 4 Furthermorewe perform Gaussian smoothing in all the algorithms tofind the optimal smoothing parameters for each class in thedataset and the optimal performance is quoted in the resultsgiven in Table 4

Performance We present results in comparison with other 12state-of-the-art algorithms The complete results are given in

Table 4 Our proposed scheme gives the best performanceon three categories C2 C3 and C5 and ranked secondon C4 and C6 Our model gives the highest average AUCscore on this dataset On different categories our results arecomparative to the HFT which is state of the art on thisdataset Apart fromHFT the performance is also significantlybetter in comparison to other algorithmsThedataset used forcomparison is quite complex but our algorithm performedwell for intermediate small objects with distracters althoughperformance on other cases is little less than other state-of-the-art algorithms

44 Psychological Patterns Wealso tested our saliencymodelon psychological patterns which are commonly used to givea qualitative performance on artificial scenarios which simu-late a pop-up phenomenonThese patterns simulate the pop-out phenomenon based on color intersection symmetryorientation curvature and candles image In order to checkthe general performance on various psychological tasks wetested the proposed model on eight psychological patternsFigure 7 gives the results of the proposed saliency modelalong with other popular models The proposed algorithmworks well on color symmetry and orientation as well as oncandle image but the performance is not good for curvatureand intersection patterns Figure 7 also shows that any singlemodel does not give good performance on all the patterns andthe best performance is more or less the same as given by theproposed scheme

5 Discussion

The image features representation drastically affects theinformation content and thus saliency estimation We useda bioinspired center surround saliency computation on twoparallel feature representations that give good performance

The Scientific World Journal 13

(a)

(b) (c)

(d) (e)

Figure 8 The fixed learned dictionary and the information lost by using such dictionary (a) Fixed learned dictionary ((b)-(c)) The inputimages ((d)-(e)) The information lost by using dictionary given in (a)

on both eye fixation prediction and salient object detectionSince a CSD operation depends on the difference between acenter portion and its surroundings a better image contentsrepresentation makes a more accurate and precise centersurround operation

We used an adaptive sparse representation to boost theperformance of the CSD operation In order to show theeffectiveness of the proposed approach we present bothquantitative and qualitative results For qualitative compar-ison we use a fix dictionary [29] (Figure 8(a)) learnt from

an ensemble of natural images We show that some of theinformation is lost if we use a fixed dictionary to represent animage and usually the lost information belongs to the salientregion in an image The difference between an input imageand the image reconstructed by a fixed dictionary is givenin Figure 8 The red cylinder Figure 8(b) and red box withtext Figure 8(c) is visible in the image plots Figures 8(d)and 8(e) These two objects are the salient features in bothimages and their appearance in the residual image shows thatcurrent representation which uses a fixed dictionary loses

14 The Scientific World Journal

Input Gnd truth Proposed AIM GBVS Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 9 Comparison of our method with other state-of-the-art saliency models The first column is the input image with second columnbeing ground truth followed by our proposed saliency model Results from all the other models are plotted in the same row

Table 5 sAUC score comparison using fixed [29] and adaptivesparse representation (proposed) for CSD saliency

DatasetBorji local [29] Fixeddictionary sparserepresentation

Proposed Adaptivedictionary sparserepresentation

Toronto 0670 07110Kootstra 0572 06061

Table 6 sAUC comparison of E Erdem and A Erdem [17] and theproposed color based nonlinear integration

Dataset E Erdem andA Erdem [17]

Proposed Nonlinearrepresentation

Bruce and Tsotsos [16] 07023 07084Kootstra et al [34] 0603 06152

some information that belong to salient portion of the inputimage

For quantitative comparison we employ sAUC to dothe comparison between saliency maps based on a fixeddictionary and the adaptive basis dictionary Table 5 gives acomparison of the both approaches on two datasets Torontoand Kootstra The performance difference is quite significantand it shows that an adaptive representation is much betterBased on these qualitative and quantitative results we canconclude that an adaptive image presentation is more viableand accurate for CSD based saliency computation

A model proposed in [17] gives an idea of nonlinearintegration of all features by covariance matrices and thesupported implementation uses color gradient and spatialinformation Our second contribution is the modification ofthat model and a proposal of using only color with spatial

information in nonlinearly integrated manner using samecovariance matrices For our proposed implementation(seeFigure 4) we modified the features used in [17] and comparethe saliencymap with Erdemrsquos [17] model in Table 6 using thesAUC score Two databases Toronto [16] and Kootstra [34]are used for simulations and the results indicate that by usingonly color with spatial information we can get better sAUCscore than integrating all features using covariance matricesIn Table 6 the difference in sAUC score is quite visible onboth datasets One possible reason of this improvement maybe that the correlation among different features like color andorientation is different and thus using a covariance basedrepresentation does not capture the underlying informationstructure in an efficient way as compared to when only colorinformation is used

One possible argument against our usage of only colorinformation however can be that without any gradient ororientation information a saliency model will fail to detectmany salient regions This argument can also supported bynature since neurons tuned to orientation in an image areknown to contribute to saliency computation [53] In ourmodel this issue is addressed by the sparse representationwhere the adaptive basis same as basis shown in Figure 8(a)is Gabor like filters with edge like structure and thus thesebases efficiently capture orientation information from theimage which complements our color information in nonlin-ear representation

Finally the sAUC score of dual representation Table 3shows that we achieve better eye fixation prediction thantreating both representations separately as shown in Tables5 and 6We believe that such improvement is due to the com-plementary behavior of both techniques since a combinedapproach better represents image contents with high fidelityand thus that in turn improves saliency detection Lastly for

The Scientific World Journal 15

illustration and visual comparison we present some saliencymaps produced by our algorithm along with other models inFigure 9

6 Conclusion

This paper shows that dual feature representationmanages torobustly capture image information which can be used in acenter surround operation to compute saliencyWe show thata CSD on adaptive sparse basis gives better results than a fixsparse basis representation In nonlinear representation weshow that nonlinearly integrated color channels with spatialinformation better capture underlying data structure andthus a CSD on such representation gives good results Finallywe consider both representations as complementary and thusa fused saliencymap not only give good results on human eyefixations but also detect salient objects with high accuracy

In future wewill incorporate some top-downmechanismto better imitative human saliency computation capabilitiesbased on learning and experience Another possible exten-sion of the existing work is to test dynamic scenes video byincorporating additional motion information in the currentscheme

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the national Natural Sci-ence Foundation of China (61175096) and Specialized Fundfor Joint Building Program of Beijing Municipal EducationCommission

References

[1] C Siagian and L Itti ldquoBiologically inspired mobile robot visionlocalizationrdquo IEEE Transactions on Robotics vol 25 no 4 pp861ndash873 2009

[2] J Li M D Levine X An X Xu and H He ldquoVisual saliencybased on scale-space analysis in the frequency domainrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol35 no 4 pp 996ndash1010 2012

[3] Y Su Q Zhao L Zhao and D Gu ldquoAbrupt motion trackingusing a visual saliency embedded particle filterrdquo Pattern Recog-nition vol 47 no 5 pp 1826ndash1834 2014

[4] C Guo Q Ma and L Zhang ldquoSpatio-temporal saliency detec-tion using phase spectrum of quaternion fourier transformrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo08) pp 1ndash8 June 2008

[5] X Hou and L Zhang ldquoThumbnail generation based on globalsaliencyrdquo in Advances in Cognitive NeurodynamicsmdashICCN2007 pp 999ndash1003 Springer Amsterdam The Netherlands2007

[6] A Borji and L Itti ldquoState-of-the-art in visual attention mod-elingrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 35 no 1 pp 185ndash207 2013

[7] L Itti C Koch and E Niebur ldquoAmodel of saliency-based visualattention for rapid scene analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 20 no 11 pp 1254ndash12591998

[8] AM Treisman ldquoFeature integration theoryrdquoCognitive Psychol-ogy vol 12 no 1 pp 97ndash136 1980

[9] J K Tsotsos S M Culhane W Y Kei Wai Y Lai N Davisand F Nuflo ldquoModeling visual attention via selective tuningrdquoArtificial Intelligence vol 78 no 1-2 pp 507ndash545 1995

[10] R Rae Gestikbasierte Mensch-Maschine-Kommunikation aufder grundlage visueller aufmerksamkeit und adaptivitat [PhDthesis] Universitat Bielefeld 2000

[11] D Parkhurst K Law and E Niebur ldquoModeling the role ofsalience in the allocation of overt visual attentionrdquo VisionResearch vol 42 no 1 pp 107ndash123 2002

[12] J Li Y Tian T Huang and W Gao ldquoProbabilistic multi-tasklearning for visual saliency estimation in videordquo InternationalJournal of Computer Vision vol 90 no 2 pp 150ndash165 2010

[13] MCerf J HarelW Einhauser andCKoch ldquoPredicting humangaze using low-level saliency combined with face detectionrdquo inAdvances in Neural Information Processing Systems vol 20 pp241ndash248 2007

[14] A Torralba ldquoModeling global scene factors in attentionrdquoJournal of the Optical Society of America A Optics and ImageScience and Vision vol 20 no 7 pp 1407ndash1418 2003

[15] A Oliva and A Torralba ldquoModeling the shape of the scenea holistic representation of the spatial enveloperdquo InternationalJournal of Computer Vision vol 42 no 3 pp 145ndash175 2001

[16] N D B Bruce and J K Tsotsos ldquoSaliency attention and visualsearch an information theoretic approachrdquo Journal of Visionvol 9 no 3 article 5 2009

[17] E Erdem and A Erdem ldquoVisual saliency estimation by nonlin-early integrating features using region covariancesrdquo Journal ofVision vol 13 no 4 article 11 2013

[18] J H van Hateren ldquoReal and optimal neural images in earlyvisionrdquo Nature vol 360 no 6399 pp 68ndash70 1992

[19] D J Field ldquoRelations between the statistics of natural imagesand the response properties of cortical cellsrdquo Journal of theOptical Society of America A Optics and Image Science vol 4no 12 pp 2379ndash2394 1987

[20] B A Olshausen and D J Field ldquoNatural image statistics andefficient codingrdquo Network Computation in Neural Systems vol7 no 2 pp 333ndash339 1996

[21] M Kwon G Legge F Fang A Cheong and S He ldquoIdentifyingthe mechanism of adaptation to prolonged contrast reductionrdquoJournal of Vision vol 9 no 8 p 976 2009

[22] X Sun H Yao and R Ji ldquoVisual attention modeling based onshort-term environmental adaptionrdquo Journal of Visual Commu-nication and Image Representation vol 24 no 2 pp 171ndash1802013

[23] A Garcia-Diaz X R Fdez-Vidal X M Pardo and R DosilldquoSaliency from hierarchical adaptation through decorrelationand variance normalizationrdquo Image and Vision Computing vol30 no 1 pp 51ndash64 2012

[24] C E Shannon ldquoAmathematical theory of communicationrdquo BellSystem Technical Journal vol 27 no 3 pp 379ndash423 1948

[25] Y Karklin and M S Lewicki ldquoEmergence of complex cellproperties by learning to generalize in natural scenesrdquo Naturevol 457 no 7225 pp 83ndash86 2009

[26] H J Seo and P Milanfar ldquoStatic and space-time visual saliencydetection by self-resemblancerdquo Journal of Vision vol 9 no 12pp 1ndash27 2009

16 The Scientific World Journal

[27] D J Jobson Z-U Rahman and G AWoodell ldquoProperties andperformance of a centersurround retinexrdquo IEEE Transactionson Image Processing vol 6 no 3 pp 451ndash462 1997

[28] A Guo D Zhao S Liu X Fan and W Gao ldquoVisual attentionbased image quality assessmentrdquo in Proceedings of the 18thIEEE International Conference on Image Processing (ICIP rsquo11) pp3297ndash3300 September 2011

[29] A Borji and L Itti ldquoExploiting local and global patch raritiesfor saliency detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo12) pp 478ndash485 2012

[30] L Duan C Wu J Miao L Qing and Y Fu ldquoVisual saliencydetection by spatially weighted dissimilarityrdquo in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recogni-tion (CVPR rsquo11) pp 473ndash480 June 2011

[31] JMWolfe ldquoGuided search 20 a revisedmodel of visual searchrdquoPsychonomic Bulletin amp Review vol 1 no 2 pp 202ndash238 1994

[32] O Le Meur P Le Callet D Barba and DThoreau ldquoA coherentcomputational approach to model bottom-up visual attentionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 28 no 5 pp 802ndash817 2006

[33] A Torralba A Oliva M S Castelhano and J M HendersonldquoContextual guidance of eye movements and attention in real-world scenes the role of global features in object searchrdquoPsychological Review vol 113 no 4 pp 766ndash786 2006

[34] G Kootstra N Bergstrom and D Kragic ldquoUsing symmetry toselect fixation points for segmentationrdquo in Proceedings of the20th International Conference on Pattern Recognition (ICPR rsquo10)pp 3894ndash3897 August 2010

[35] C Li J Xue N Zheng X Lan and Z Tian ldquoSpatio-temporalsaliency perception via hypercomplex frequency spectral con-trastrdquo Sensors vol 13 no 3 pp 3409ndash3431 2013

[36] L Itti and P Baldi ldquoBayesian surprise attracts human attentionrdquoVision Research vol 49 no 10 pp 1295ndash1306 2009

[37] J Harel C Koch and P Perona ldquoGraph-based visual saliencyrdquoin Advances in Neural Information Processing Systems vol 19pp 545ndash552 2006

[38] X Hou and L Zhang ldquoSaliency detection a spectral residualapproachrdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo07)pp 1ndash8 June 2007

[39] B Schauerte and R Stiefelhagen ldquoQuaternion-based spectralsaliency detection for eye fixation predictionrdquo in ComputerVisionmdashECCV 2012 A Fitzgibbon S Lazebnik P Perona YSato and C Schmid Eds Lecture Notes in Computer Sciencepp 116ndash129 2012

[40] W Kienzle F Wichmann B Scholkopf and M Franz ANonparametric Approach to Bottom-Up Visual Saliency 2007

[41] J Tilke K Ehinger F Durand and A Torralba ldquoLearningto predict where humans lookrdquo in Proceedings of the 12thInternational Conference on Computer Vision (ICCV rsquo09) pp2106ndash2113 October 2009

[42] W Wang Y Wang Q Huang and W Gao ldquoMeasuring visualsaliency by site entropy raterdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 2368ndash2375 June 2010

[43] O Tuzel F Porikli and P Meer ldquoRegion covariance afast descriptor for detection and classificationrdquo in ComputerVisionmdashECCV 2006 vol 3952 of Lecture Notes in ComputerScience pp 589ndash600 2006

[44] httpssitesgooglecomsitesparsenonlinearsaliencymodelhomedownloads

[45] A Hyvarinen and E Oja ldquoA fast fixed-point algorithm forindependent component analysisrdquo Neural Computation vol 9no 7 pp 1483ndash1492 1997

[46] X Hou and L Zhang ldquoDynamic visual attention searching forcoding length incrementsrdquo in Advances in Neural InformationProcessing Systems vol 21 pp 681ndash688 2008

[47] L Zhang M H Tong T K Marks H Shan and G WCottrell ldquoSUN a Bayesian framework for saliency using naturalstatisticsrdquo Journal of Vision vol 8 no 7 article 32 pp 1ndash202008

[48] J Zhang and S Sclaroff ldquoSaliency detection a boolean mapapproachrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV rsquo13) 2013

[49] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[50] E Rahtu J Kannala M Salo and J Heikkila ldquoSegmentingsalient objects from images and videosrdquo in Computer VisionmdashECCV 2010 vol 6315 of Lecture Notes in Computer Science pp366ndash379 2010

[51] M-M Cheng G-X Zhang N J Mitra X Huang and S-M Hu ldquoGlobal contrast based salient region detectionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo11) pp 409ndash416 June 2011

[52] T Veit J-P Tarel P Nicolle and P Charbonnier ldquoEvaluationof road marking feature extractionrdquo in Proceedings of the11th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo08) pp 174ndash181 December 2008

[53] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model of simple receptive fields in catstriate cortexrdquo Journal of Neurophysiology vol 58 no 6 pp1233ndash1258 1987

Page 10: Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

10 The Scientific World Journal

Table3Com

paris

onof

sAUCscoreo

fthe

prop

osed

mod

elwith

13otherstate-of-the-artmod

els

Dataset

AIM

[16]

GBV

S[37]

MES

R[39]

ICL[46]

Itti[37]

MPQ

FT[39]

SDSR

[26]

SUN[47]

HFT

[2]

LG[29]

ERDM

[17]

ΔQDCT

[39]

AWS[23]

Prop

osed

TORO

NTO

[16]

0699

064

0407141

064

76064

8107119

07062

066

6106915

06939

07023

07172

07130

07179

Optim

al120590

004

001

005

003

001

004

003

003

006

004

004

005

001

004

KOOTS

TRA[34]

05901

05572

05978

05769

05689

05985

05984

05613

05911

05938

0603

05998

06185

06157

Optim

al120590

003

001

004

001

004

005

004

002

005

001

003

003

001

003

IMSA

L[2]

07424

lowast076

6507153

07364

07512

071694

07403

06955

07419lowast07356

07455

lowast07434

07468

07560

Optim

al120590

008

007

015

013

009

014

013

016

009

012

012

010

005

009

Thetop

rank

edmod

elisin

boldfont

andsecond

rank

edmod

elisin

italic

fontH

erer

esultswith

optim

alaverages

AUCof

each

metho

dwith

thec

orrespon

ding

Gaussianblur(120590)are

repo

rtedA

sin[47]w

erepeat

thes

AUCcalculationfor2

0tim

esandcompu

tethes

tand

arddeviationof

each

averages

AUC

which

ranges

from1119864minus4to5119864minus4(lowastvalues

takenfro

m[48])

The Scientific World Journal 11

Table4Com

paris

onof

AUCandPo

DSC

scoreo

fthe

prop

osed

mod

elwith

otherstate-of-the-artmod

els

Algorith

mC1

C2C3

C4C5

C6Av

gAU

CPo

DSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AvgAU

CAIM

0921

0653

0918

0543

0957

0461

0904

0435

0924

0508

0941

0671

0927

MES

R0798

0494

0854

0436

0927

0377

0718

0238

0837

0398

0918

0617

0842

ICL

0928

0681

0907

0501

0909

0348

0926

0537

0902

0498

0914

0569

0914

Itti

0939

0687

0924

0542

0952

0457

0882

0423

0924

0499

0941

0631

0927

MPQ

FT0805

0497

0865

0452

0927

0386

0735

0248

0852

0436

0921

0630

0851

SDSR

0879

0618

0908

0511

0948

0435

0801

0298

0904

0489

0933

064

50896

SUN

0766

0459

0787

0349

0880

0362

0708

0263

0719

0283

0817

0451

0780

HFT

0937

070

00923

0552

0937

044

8096

4064

80933

0554

096

0070

60942

LG0814

0515

0859

046

40904

0379

0635

0145

0862

0499

0884

0572

0826

ERDEM

0882

0617

0912

0555

0962

0526

0840

0349

0923

0575

0916

0635

0906

MQDCT

0840

0564

0894

0517

0947

0489

0806

0311

0888

0524

0923

0670

0883

AWS

0894

0625

0922

0569

0957

0480

0891

0402

0920

0534

0925

0655

0918

Prop

osed

0920

0653

094

3059

60976

0555

0934

0516

094

20532

0956

0688

094

5Th

etop

rank

edmod

elisin

bold

font

andthe2

ndrank

edmod

elisin

italic

fontTh

eoverallbestaverageA

UCscoreisg

iven

byou

rpropo

sedmod

el

12 The Scientific World Journal

Input Proposed AIM GBVS MESR ICL Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 7 Response of our algorithm and other 13 state-of-the-art algorithms on various psychological patterns

extent of the object and thus solve this task like segmentation-type binary labeling formulation [50 51] In contrast ourmodel is designed for location based (eye fixation) saliencymodeling and it is not designed to capture exact objectboundaries however by thresholding a saliency map we canget a binary map that can be used for testing performance ofa model for salient object detection Since our saliencymodelis location based we only use other location based modelsin comparison for a fair evaluation similar convention isfollowed in [2 17]

Dataset and Metric for Evaluation For salient object detec-tion the metric used by Li et al [2] is area under thecurve (AUC) and dice similarity coefficient (DSC) on IMSAL[2] dataset We will use the same metric and dataset forsalient object detectionThe DSC gives the overlap between athreshold saliency map and the ground truth Moreover thepeak value of the DSC [52] is considered an important way toestablish the best performance of an algorithm at an optimalthreshold thus we also give results with peak value of theDSC curve (PoDSC) Since AUC can be influenced by centerbias for fair comparison we turn off the center bias in allthe algorithmsThe GBVS [37] has built-in center bias and inorder to cater for that the author in [2] incorporates explicitcenter bias and shows that HFT [2] performs better thanGBVS on the same dataset which is also used in our paperWe do not employ center bias explicitly or implicitly in thepresented results and addedHFT instead for our comparisonTherefore GVBS [37] is skipped from Table 4 Furthermorewe perform Gaussian smoothing in all the algorithms tofind the optimal smoothing parameters for each class in thedataset and the optimal performance is quoted in the resultsgiven in Table 4

Performance We present results in comparison with other 12state-of-the-art algorithms The complete results are given in

Table 4 Our proposed scheme gives the best performanceon three categories C2 C3 and C5 and ranked secondon C4 and C6 Our model gives the highest average AUCscore on this dataset On different categories our results arecomparative to the HFT which is state of the art on thisdataset Apart fromHFT the performance is also significantlybetter in comparison to other algorithmsThedataset used forcomparison is quite complex but our algorithm performedwell for intermediate small objects with distracters althoughperformance on other cases is little less than other state-of-the-art algorithms

44 Psychological Patterns Wealso tested our saliencymodelon psychological patterns which are commonly used to givea qualitative performance on artificial scenarios which simu-late a pop-up phenomenonThese patterns simulate the pop-out phenomenon based on color intersection symmetryorientation curvature and candles image In order to checkthe general performance on various psychological tasks wetested the proposed model on eight psychological patternsFigure 7 gives the results of the proposed saliency modelalong with other popular models The proposed algorithmworks well on color symmetry and orientation as well as oncandle image but the performance is not good for curvatureand intersection patterns Figure 7 also shows that any singlemodel does not give good performance on all the patterns andthe best performance is more or less the same as given by theproposed scheme

5 Discussion

The image features representation drastically affects theinformation content and thus saliency estimation We useda bioinspired center surround saliency computation on twoparallel feature representations that give good performance

The Scientific World Journal 13

(a)

(b) (c)

(d) (e)

Figure 8 The fixed learned dictionary and the information lost by using such dictionary (a) Fixed learned dictionary ((b)-(c)) The inputimages ((d)-(e)) The information lost by using dictionary given in (a)

on both eye fixation prediction and salient object detectionSince a CSD operation depends on the difference between acenter portion and its surroundings a better image contentsrepresentation makes a more accurate and precise centersurround operation

We used an adaptive sparse representation to boost theperformance of the CSD operation In order to show theeffectiveness of the proposed approach we present bothquantitative and qualitative results For qualitative compar-ison we use a fix dictionary [29] (Figure 8(a)) learnt from

an ensemble of natural images We show that some of theinformation is lost if we use a fixed dictionary to represent animage and usually the lost information belongs to the salientregion in an image The difference between an input imageand the image reconstructed by a fixed dictionary is givenin Figure 8 The red cylinder Figure 8(b) and red box withtext Figure 8(c) is visible in the image plots Figures 8(d)and 8(e) These two objects are the salient features in bothimages and their appearance in the residual image shows thatcurrent representation which uses a fixed dictionary loses

14 The Scientific World Journal

Input Gnd truth Proposed AIM GBVS Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 9 Comparison of our method with other state-of-the-art saliency models The first column is the input image with second columnbeing ground truth followed by our proposed saliency model Results from all the other models are plotted in the same row

Table 5 sAUC score comparison using fixed [29] and adaptivesparse representation (proposed) for CSD saliency

DatasetBorji local [29] Fixeddictionary sparserepresentation

Proposed Adaptivedictionary sparserepresentation

Toronto 0670 07110Kootstra 0572 06061

Table 6 sAUC comparison of E Erdem and A Erdem [17] and theproposed color based nonlinear integration

Dataset E Erdem andA Erdem [17]

Proposed Nonlinearrepresentation

Bruce and Tsotsos [16] 07023 07084Kootstra et al [34] 0603 06152

some information that belong to salient portion of the inputimage

For quantitative comparison we employ sAUC to dothe comparison between saliency maps based on a fixeddictionary and the adaptive basis dictionary Table 5 gives acomparison of the both approaches on two datasets Torontoand Kootstra The performance difference is quite significantand it shows that an adaptive representation is much betterBased on these qualitative and quantitative results we canconclude that an adaptive image presentation is more viableand accurate for CSD based saliency computation

A model proposed in [17] gives an idea of nonlinearintegration of all features by covariance matrices and thesupported implementation uses color gradient and spatialinformation Our second contribution is the modification ofthat model and a proposal of using only color with spatial

information in nonlinearly integrated manner using samecovariance matrices For our proposed implementation(seeFigure 4) we modified the features used in [17] and comparethe saliencymap with Erdemrsquos [17] model in Table 6 using thesAUC score Two databases Toronto [16] and Kootstra [34]are used for simulations and the results indicate that by usingonly color with spatial information we can get better sAUCscore than integrating all features using covariance matricesIn Table 6 the difference in sAUC score is quite visible onboth datasets One possible reason of this improvement maybe that the correlation among different features like color andorientation is different and thus using a covariance basedrepresentation does not capture the underlying informationstructure in an efficient way as compared to when only colorinformation is used

One possible argument against our usage of only colorinformation however can be that without any gradient ororientation information a saliency model will fail to detectmany salient regions This argument can also supported bynature since neurons tuned to orientation in an image areknown to contribute to saliency computation [53] In ourmodel this issue is addressed by the sparse representationwhere the adaptive basis same as basis shown in Figure 8(a)is Gabor like filters with edge like structure and thus thesebases efficiently capture orientation information from theimage which complements our color information in nonlin-ear representation

Finally the sAUC score of dual representation Table 3shows that we achieve better eye fixation prediction thantreating both representations separately as shown in Tables5 and 6We believe that such improvement is due to the com-plementary behavior of both techniques since a combinedapproach better represents image contents with high fidelityand thus that in turn improves saliency detection Lastly for

The Scientific World Journal 15

illustration and visual comparison we present some saliencymaps produced by our algorithm along with other models inFigure 9

6 Conclusion

This paper shows that dual feature representationmanages torobustly capture image information which can be used in acenter surround operation to compute saliencyWe show thata CSD on adaptive sparse basis gives better results than a fixsparse basis representation In nonlinear representation weshow that nonlinearly integrated color channels with spatialinformation better capture underlying data structure andthus a CSD on such representation gives good results Finallywe consider both representations as complementary and thusa fused saliencymap not only give good results on human eyefixations but also detect salient objects with high accuracy

In future wewill incorporate some top-downmechanismto better imitative human saliency computation capabilitiesbased on learning and experience Another possible exten-sion of the existing work is to test dynamic scenes video byincorporating additional motion information in the currentscheme

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the national Natural Sci-ence Foundation of China (61175096) and Specialized Fundfor Joint Building Program of Beijing Municipal EducationCommission

References

[1] C Siagian and L Itti ldquoBiologically inspired mobile robot visionlocalizationrdquo IEEE Transactions on Robotics vol 25 no 4 pp861ndash873 2009

[2] J Li M D Levine X An X Xu and H He ldquoVisual saliencybased on scale-space analysis in the frequency domainrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol35 no 4 pp 996ndash1010 2012

[3] Y Su Q Zhao L Zhao and D Gu ldquoAbrupt motion trackingusing a visual saliency embedded particle filterrdquo Pattern Recog-nition vol 47 no 5 pp 1826ndash1834 2014

[4] C Guo Q Ma and L Zhang ldquoSpatio-temporal saliency detec-tion using phase spectrum of quaternion fourier transformrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo08) pp 1ndash8 June 2008

[5] X Hou and L Zhang ldquoThumbnail generation based on globalsaliencyrdquo in Advances in Cognitive NeurodynamicsmdashICCN2007 pp 999ndash1003 Springer Amsterdam The Netherlands2007

[6] A Borji and L Itti ldquoState-of-the-art in visual attention mod-elingrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 35 no 1 pp 185ndash207 2013

[7] L Itti C Koch and E Niebur ldquoAmodel of saliency-based visualattention for rapid scene analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 20 no 11 pp 1254ndash12591998

[8] AM Treisman ldquoFeature integration theoryrdquoCognitive Psychol-ogy vol 12 no 1 pp 97ndash136 1980

[9] J K Tsotsos S M Culhane W Y Kei Wai Y Lai N Davisand F Nuflo ldquoModeling visual attention via selective tuningrdquoArtificial Intelligence vol 78 no 1-2 pp 507ndash545 1995

[10] R Rae Gestikbasierte Mensch-Maschine-Kommunikation aufder grundlage visueller aufmerksamkeit und adaptivitat [PhDthesis] Universitat Bielefeld 2000

[11] D Parkhurst K Law and E Niebur ldquoModeling the role ofsalience in the allocation of overt visual attentionrdquo VisionResearch vol 42 no 1 pp 107ndash123 2002

[12] J Li Y Tian T Huang and W Gao ldquoProbabilistic multi-tasklearning for visual saliency estimation in videordquo InternationalJournal of Computer Vision vol 90 no 2 pp 150ndash165 2010

[13] MCerf J HarelW Einhauser andCKoch ldquoPredicting humangaze using low-level saliency combined with face detectionrdquo inAdvances in Neural Information Processing Systems vol 20 pp241ndash248 2007

[14] A Torralba ldquoModeling global scene factors in attentionrdquoJournal of the Optical Society of America A Optics and ImageScience and Vision vol 20 no 7 pp 1407ndash1418 2003

[15] A Oliva and A Torralba ldquoModeling the shape of the scenea holistic representation of the spatial enveloperdquo InternationalJournal of Computer Vision vol 42 no 3 pp 145ndash175 2001

[16] N D B Bruce and J K Tsotsos ldquoSaliency attention and visualsearch an information theoretic approachrdquo Journal of Visionvol 9 no 3 article 5 2009

[17] E Erdem and A Erdem ldquoVisual saliency estimation by nonlin-early integrating features using region covariancesrdquo Journal ofVision vol 13 no 4 article 11 2013

[18] J H van Hateren ldquoReal and optimal neural images in earlyvisionrdquo Nature vol 360 no 6399 pp 68ndash70 1992

[19] D J Field ldquoRelations between the statistics of natural imagesand the response properties of cortical cellsrdquo Journal of theOptical Society of America A Optics and Image Science vol 4no 12 pp 2379ndash2394 1987

[20] B A Olshausen and D J Field ldquoNatural image statistics andefficient codingrdquo Network Computation in Neural Systems vol7 no 2 pp 333ndash339 1996

[21] M Kwon G Legge F Fang A Cheong and S He ldquoIdentifyingthe mechanism of adaptation to prolonged contrast reductionrdquoJournal of Vision vol 9 no 8 p 976 2009

[22] X Sun H Yao and R Ji ldquoVisual attention modeling based onshort-term environmental adaptionrdquo Journal of Visual Commu-nication and Image Representation vol 24 no 2 pp 171ndash1802013

[23] A Garcia-Diaz X R Fdez-Vidal X M Pardo and R DosilldquoSaliency from hierarchical adaptation through decorrelationand variance normalizationrdquo Image and Vision Computing vol30 no 1 pp 51ndash64 2012

[24] C E Shannon ldquoAmathematical theory of communicationrdquo BellSystem Technical Journal vol 27 no 3 pp 379ndash423 1948

[25] Y Karklin and M S Lewicki ldquoEmergence of complex cellproperties by learning to generalize in natural scenesrdquo Naturevol 457 no 7225 pp 83ndash86 2009

[26] H J Seo and P Milanfar ldquoStatic and space-time visual saliencydetection by self-resemblancerdquo Journal of Vision vol 9 no 12pp 1ndash27 2009

16 The Scientific World Journal

[27] D J Jobson Z-U Rahman and G AWoodell ldquoProperties andperformance of a centersurround retinexrdquo IEEE Transactionson Image Processing vol 6 no 3 pp 451ndash462 1997

[28] A Guo D Zhao S Liu X Fan and W Gao ldquoVisual attentionbased image quality assessmentrdquo in Proceedings of the 18thIEEE International Conference on Image Processing (ICIP rsquo11) pp3297ndash3300 September 2011

[29] A Borji and L Itti ldquoExploiting local and global patch raritiesfor saliency detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo12) pp 478ndash485 2012

[30] L Duan C Wu J Miao L Qing and Y Fu ldquoVisual saliencydetection by spatially weighted dissimilarityrdquo in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recogni-tion (CVPR rsquo11) pp 473ndash480 June 2011

[31] JMWolfe ldquoGuided search 20 a revisedmodel of visual searchrdquoPsychonomic Bulletin amp Review vol 1 no 2 pp 202ndash238 1994

[32] O Le Meur P Le Callet D Barba and DThoreau ldquoA coherentcomputational approach to model bottom-up visual attentionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 28 no 5 pp 802ndash817 2006

[33] A Torralba A Oliva M S Castelhano and J M HendersonldquoContextual guidance of eye movements and attention in real-world scenes the role of global features in object searchrdquoPsychological Review vol 113 no 4 pp 766ndash786 2006

[34] G Kootstra N Bergstrom and D Kragic ldquoUsing symmetry toselect fixation points for segmentationrdquo in Proceedings of the20th International Conference on Pattern Recognition (ICPR rsquo10)pp 3894ndash3897 August 2010

[35] C Li J Xue N Zheng X Lan and Z Tian ldquoSpatio-temporalsaliency perception via hypercomplex frequency spectral con-trastrdquo Sensors vol 13 no 3 pp 3409ndash3431 2013

[36] L Itti and P Baldi ldquoBayesian surprise attracts human attentionrdquoVision Research vol 49 no 10 pp 1295ndash1306 2009

[37] J Harel C Koch and P Perona ldquoGraph-based visual saliencyrdquoin Advances in Neural Information Processing Systems vol 19pp 545ndash552 2006

[38] X Hou and L Zhang ldquoSaliency detection a spectral residualapproachrdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo07)pp 1ndash8 June 2007

[39] B Schauerte and R Stiefelhagen ldquoQuaternion-based spectralsaliency detection for eye fixation predictionrdquo in ComputerVisionmdashECCV 2012 A Fitzgibbon S Lazebnik P Perona YSato and C Schmid Eds Lecture Notes in Computer Sciencepp 116ndash129 2012

[40] W Kienzle F Wichmann B Scholkopf and M Franz ANonparametric Approach to Bottom-Up Visual Saliency 2007

[41] J Tilke K Ehinger F Durand and A Torralba ldquoLearningto predict where humans lookrdquo in Proceedings of the 12thInternational Conference on Computer Vision (ICCV rsquo09) pp2106ndash2113 October 2009

[42] W Wang Y Wang Q Huang and W Gao ldquoMeasuring visualsaliency by site entropy raterdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 2368ndash2375 June 2010

[43] O Tuzel F Porikli and P Meer ldquoRegion covariance afast descriptor for detection and classificationrdquo in ComputerVisionmdashECCV 2006 vol 3952 of Lecture Notes in ComputerScience pp 589ndash600 2006

[44] httpssitesgooglecomsitesparsenonlinearsaliencymodelhomedownloads

[45] A Hyvarinen and E Oja ldquoA fast fixed-point algorithm forindependent component analysisrdquo Neural Computation vol 9no 7 pp 1483ndash1492 1997

[46] X Hou and L Zhang ldquoDynamic visual attention searching forcoding length incrementsrdquo in Advances in Neural InformationProcessing Systems vol 21 pp 681ndash688 2008

[47] L Zhang M H Tong T K Marks H Shan and G WCottrell ldquoSUN a Bayesian framework for saliency using naturalstatisticsrdquo Journal of Vision vol 8 no 7 article 32 pp 1ndash202008

[48] J Zhang and S Sclaroff ldquoSaliency detection a boolean mapapproachrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV rsquo13) 2013

[49] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[50] E Rahtu J Kannala M Salo and J Heikkila ldquoSegmentingsalient objects from images and videosrdquo in Computer VisionmdashECCV 2010 vol 6315 of Lecture Notes in Computer Science pp366ndash379 2010

[51] M-M Cheng G-X Zhang N J Mitra X Huang and S-M Hu ldquoGlobal contrast based salient region detectionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo11) pp 409ndash416 June 2011

[52] T Veit J-P Tarel P Nicolle and P Charbonnier ldquoEvaluationof road marking feature extractionrdquo in Proceedings of the11th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo08) pp 174ndash181 December 2008

[53] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model of simple receptive fields in catstriate cortexrdquo Journal of Neurophysiology vol 58 no 6 pp1233ndash1258 1987

Page 11: Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

The Scientific World Journal 11

Table4Com

paris

onof

AUCandPo

DSC

scoreo

fthe

prop

osed

mod

elwith

otherstate-of-the-artmod

els

Algorith

mC1

C2C3

C4C5

C6Av

gAU

CPo

DSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AUC

PoDSC

AvgAU

CAIM

0921

0653

0918

0543

0957

0461

0904

0435

0924

0508

0941

0671

0927

MES

R0798

0494

0854

0436

0927

0377

0718

0238

0837

0398

0918

0617

0842

ICL

0928

0681

0907

0501

0909

0348

0926

0537

0902

0498

0914

0569

0914

Itti

0939

0687

0924

0542

0952

0457

0882

0423

0924

0499

0941

0631

0927

MPQ

FT0805

0497

0865

0452

0927

0386

0735

0248

0852

0436

0921

0630

0851

SDSR

0879

0618

0908

0511

0948

0435

0801

0298

0904

0489

0933

064

50896

SUN

0766

0459

0787

0349

0880

0362

0708

0263

0719

0283

0817

0451

0780

HFT

0937

070

00923

0552

0937

044

8096

4064

80933

0554

096

0070

60942

LG0814

0515

0859

046

40904

0379

0635

0145

0862

0499

0884

0572

0826

ERDEM

0882

0617

0912

0555

0962

0526

0840

0349

0923

0575

0916

0635

0906

MQDCT

0840

0564

0894

0517

0947

0489

0806

0311

0888

0524

0923

0670

0883

AWS

0894

0625

0922

0569

0957

0480

0891

0402

0920

0534

0925

0655

0918

Prop

osed

0920

0653

094

3059

60976

0555

0934

0516

094

20532

0956

0688

094

5Th

etop

rank

edmod

elisin

bold

font

andthe2

ndrank

edmod

elisin

italic

fontTh

eoverallbestaverageA

UCscoreisg

iven

byou

rpropo

sedmod

el

12 The Scientific World Journal

Input Proposed AIM GBVS MESR ICL Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 7 Response of our algorithm and other 13 state-of-the-art algorithms on various psychological patterns

extent of the object and thus solve this task like segmentation-type binary labeling formulation [50 51] In contrast ourmodel is designed for location based (eye fixation) saliencymodeling and it is not designed to capture exact objectboundaries however by thresholding a saliency map we canget a binary map that can be used for testing performance ofa model for salient object detection Since our saliencymodelis location based we only use other location based modelsin comparison for a fair evaluation similar convention isfollowed in [2 17]

Dataset and Metric for Evaluation For salient object detec-tion the metric used by Li et al [2] is area under thecurve (AUC) and dice similarity coefficient (DSC) on IMSAL[2] dataset We will use the same metric and dataset forsalient object detectionThe DSC gives the overlap between athreshold saliency map and the ground truth Moreover thepeak value of the DSC [52] is considered an important way toestablish the best performance of an algorithm at an optimalthreshold thus we also give results with peak value of theDSC curve (PoDSC) Since AUC can be influenced by centerbias for fair comparison we turn off the center bias in allthe algorithmsThe GBVS [37] has built-in center bias and inorder to cater for that the author in [2] incorporates explicitcenter bias and shows that HFT [2] performs better thanGBVS on the same dataset which is also used in our paperWe do not employ center bias explicitly or implicitly in thepresented results and addedHFT instead for our comparisonTherefore GVBS [37] is skipped from Table 4 Furthermorewe perform Gaussian smoothing in all the algorithms tofind the optimal smoothing parameters for each class in thedataset and the optimal performance is quoted in the resultsgiven in Table 4

Performance We present results in comparison with other 12state-of-the-art algorithms The complete results are given in

Table 4 Our proposed scheme gives the best performanceon three categories C2 C3 and C5 and ranked secondon C4 and C6 Our model gives the highest average AUCscore on this dataset On different categories our results arecomparative to the HFT which is state of the art on thisdataset Apart fromHFT the performance is also significantlybetter in comparison to other algorithmsThedataset used forcomparison is quite complex but our algorithm performedwell for intermediate small objects with distracters althoughperformance on other cases is little less than other state-of-the-art algorithms

44 Psychological Patterns Wealso tested our saliencymodelon psychological patterns which are commonly used to givea qualitative performance on artificial scenarios which simu-late a pop-up phenomenonThese patterns simulate the pop-out phenomenon based on color intersection symmetryorientation curvature and candles image In order to checkthe general performance on various psychological tasks wetested the proposed model on eight psychological patternsFigure 7 gives the results of the proposed saliency modelalong with other popular models The proposed algorithmworks well on color symmetry and orientation as well as oncandle image but the performance is not good for curvatureand intersection patterns Figure 7 also shows that any singlemodel does not give good performance on all the patterns andthe best performance is more or less the same as given by theproposed scheme

5 Discussion

The image features representation drastically affects theinformation content and thus saliency estimation We useda bioinspired center surround saliency computation on twoparallel feature representations that give good performance

The Scientific World Journal 13

(a)

(b) (c)

(d) (e)

Figure 8 The fixed learned dictionary and the information lost by using such dictionary (a) Fixed learned dictionary ((b)-(c)) The inputimages ((d)-(e)) The information lost by using dictionary given in (a)

on both eye fixation prediction and salient object detectionSince a CSD operation depends on the difference between acenter portion and its surroundings a better image contentsrepresentation makes a more accurate and precise centersurround operation

We used an adaptive sparse representation to boost theperformance of the CSD operation In order to show theeffectiveness of the proposed approach we present bothquantitative and qualitative results For qualitative compar-ison we use a fix dictionary [29] (Figure 8(a)) learnt from

an ensemble of natural images We show that some of theinformation is lost if we use a fixed dictionary to represent animage and usually the lost information belongs to the salientregion in an image The difference between an input imageand the image reconstructed by a fixed dictionary is givenin Figure 8 The red cylinder Figure 8(b) and red box withtext Figure 8(c) is visible in the image plots Figures 8(d)and 8(e) These two objects are the salient features in bothimages and their appearance in the residual image shows thatcurrent representation which uses a fixed dictionary loses

14 The Scientific World Journal

Input Gnd truth Proposed AIM GBVS Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 9 Comparison of our method with other state-of-the-art saliency models The first column is the input image with second columnbeing ground truth followed by our proposed saliency model Results from all the other models are plotted in the same row

Table 5 sAUC score comparison using fixed [29] and adaptivesparse representation (proposed) for CSD saliency

DatasetBorji local [29] Fixeddictionary sparserepresentation

Proposed Adaptivedictionary sparserepresentation

Toronto 0670 07110Kootstra 0572 06061

Table 6 sAUC comparison of E Erdem and A Erdem [17] and theproposed color based nonlinear integration

Dataset E Erdem andA Erdem [17]

Proposed Nonlinearrepresentation

Bruce and Tsotsos [16] 07023 07084Kootstra et al [34] 0603 06152

some information that belong to salient portion of the inputimage

For quantitative comparison we employ sAUC to dothe comparison between saliency maps based on a fixeddictionary and the adaptive basis dictionary Table 5 gives acomparison of the both approaches on two datasets Torontoand Kootstra The performance difference is quite significantand it shows that an adaptive representation is much betterBased on these qualitative and quantitative results we canconclude that an adaptive image presentation is more viableand accurate for CSD based saliency computation

A model proposed in [17] gives an idea of nonlinearintegration of all features by covariance matrices and thesupported implementation uses color gradient and spatialinformation Our second contribution is the modification ofthat model and a proposal of using only color with spatial

information in nonlinearly integrated manner using samecovariance matrices For our proposed implementation(seeFigure 4) we modified the features used in [17] and comparethe saliencymap with Erdemrsquos [17] model in Table 6 using thesAUC score Two databases Toronto [16] and Kootstra [34]are used for simulations and the results indicate that by usingonly color with spatial information we can get better sAUCscore than integrating all features using covariance matricesIn Table 6 the difference in sAUC score is quite visible onboth datasets One possible reason of this improvement maybe that the correlation among different features like color andorientation is different and thus using a covariance basedrepresentation does not capture the underlying informationstructure in an efficient way as compared to when only colorinformation is used

One possible argument against our usage of only colorinformation however can be that without any gradient ororientation information a saliency model will fail to detectmany salient regions This argument can also supported bynature since neurons tuned to orientation in an image areknown to contribute to saliency computation [53] In ourmodel this issue is addressed by the sparse representationwhere the adaptive basis same as basis shown in Figure 8(a)is Gabor like filters with edge like structure and thus thesebases efficiently capture orientation information from theimage which complements our color information in nonlin-ear representation

Finally the sAUC score of dual representation Table 3shows that we achieve better eye fixation prediction thantreating both representations separately as shown in Tables5 and 6We believe that such improvement is due to the com-plementary behavior of both techniques since a combinedapproach better represents image contents with high fidelityand thus that in turn improves saliency detection Lastly for

The Scientific World Journal 15

illustration and visual comparison we present some saliencymaps produced by our algorithm along with other models inFigure 9

6 Conclusion

This paper shows that dual feature representationmanages torobustly capture image information which can be used in acenter surround operation to compute saliencyWe show thata CSD on adaptive sparse basis gives better results than a fixsparse basis representation In nonlinear representation weshow that nonlinearly integrated color channels with spatialinformation better capture underlying data structure andthus a CSD on such representation gives good results Finallywe consider both representations as complementary and thusa fused saliencymap not only give good results on human eyefixations but also detect salient objects with high accuracy

In future wewill incorporate some top-downmechanismto better imitative human saliency computation capabilitiesbased on learning and experience Another possible exten-sion of the existing work is to test dynamic scenes video byincorporating additional motion information in the currentscheme

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the national Natural Sci-ence Foundation of China (61175096) and Specialized Fundfor Joint Building Program of Beijing Municipal EducationCommission

References

[1] C Siagian and L Itti ldquoBiologically inspired mobile robot visionlocalizationrdquo IEEE Transactions on Robotics vol 25 no 4 pp861ndash873 2009

[2] J Li M D Levine X An X Xu and H He ldquoVisual saliencybased on scale-space analysis in the frequency domainrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol35 no 4 pp 996ndash1010 2012

[3] Y Su Q Zhao L Zhao and D Gu ldquoAbrupt motion trackingusing a visual saliency embedded particle filterrdquo Pattern Recog-nition vol 47 no 5 pp 1826ndash1834 2014

[4] C Guo Q Ma and L Zhang ldquoSpatio-temporal saliency detec-tion using phase spectrum of quaternion fourier transformrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo08) pp 1ndash8 June 2008

[5] X Hou and L Zhang ldquoThumbnail generation based on globalsaliencyrdquo in Advances in Cognitive NeurodynamicsmdashICCN2007 pp 999ndash1003 Springer Amsterdam The Netherlands2007

[6] A Borji and L Itti ldquoState-of-the-art in visual attention mod-elingrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 35 no 1 pp 185ndash207 2013

[7] L Itti C Koch and E Niebur ldquoAmodel of saliency-based visualattention for rapid scene analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 20 no 11 pp 1254ndash12591998

[8] AM Treisman ldquoFeature integration theoryrdquoCognitive Psychol-ogy vol 12 no 1 pp 97ndash136 1980

[9] J K Tsotsos S M Culhane W Y Kei Wai Y Lai N Davisand F Nuflo ldquoModeling visual attention via selective tuningrdquoArtificial Intelligence vol 78 no 1-2 pp 507ndash545 1995

[10] R Rae Gestikbasierte Mensch-Maschine-Kommunikation aufder grundlage visueller aufmerksamkeit und adaptivitat [PhDthesis] Universitat Bielefeld 2000

[11] D Parkhurst K Law and E Niebur ldquoModeling the role ofsalience in the allocation of overt visual attentionrdquo VisionResearch vol 42 no 1 pp 107ndash123 2002

[12] J Li Y Tian T Huang and W Gao ldquoProbabilistic multi-tasklearning for visual saliency estimation in videordquo InternationalJournal of Computer Vision vol 90 no 2 pp 150ndash165 2010

[13] MCerf J HarelW Einhauser andCKoch ldquoPredicting humangaze using low-level saliency combined with face detectionrdquo inAdvances in Neural Information Processing Systems vol 20 pp241ndash248 2007

[14] A Torralba ldquoModeling global scene factors in attentionrdquoJournal of the Optical Society of America A Optics and ImageScience and Vision vol 20 no 7 pp 1407ndash1418 2003

[15] A Oliva and A Torralba ldquoModeling the shape of the scenea holistic representation of the spatial enveloperdquo InternationalJournal of Computer Vision vol 42 no 3 pp 145ndash175 2001

[16] N D B Bruce and J K Tsotsos ldquoSaliency attention and visualsearch an information theoretic approachrdquo Journal of Visionvol 9 no 3 article 5 2009

[17] E Erdem and A Erdem ldquoVisual saliency estimation by nonlin-early integrating features using region covariancesrdquo Journal ofVision vol 13 no 4 article 11 2013

[18] J H van Hateren ldquoReal and optimal neural images in earlyvisionrdquo Nature vol 360 no 6399 pp 68ndash70 1992

[19] D J Field ldquoRelations between the statistics of natural imagesand the response properties of cortical cellsrdquo Journal of theOptical Society of America A Optics and Image Science vol 4no 12 pp 2379ndash2394 1987

[20] B A Olshausen and D J Field ldquoNatural image statistics andefficient codingrdquo Network Computation in Neural Systems vol7 no 2 pp 333ndash339 1996

[21] M Kwon G Legge F Fang A Cheong and S He ldquoIdentifyingthe mechanism of adaptation to prolonged contrast reductionrdquoJournal of Vision vol 9 no 8 p 976 2009

[22] X Sun H Yao and R Ji ldquoVisual attention modeling based onshort-term environmental adaptionrdquo Journal of Visual Commu-nication and Image Representation vol 24 no 2 pp 171ndash1802013

[23] A Garcia-Diaz X R Fdez-Vidal X M Pardo and R DosilldquoSaliency from hierarchical adaptation through decorrelationand variance normalizationrdquo Image and Vision Computing vol30 no 1 pp 51ndash64 2012

[24] C E Shannon ldquoAmathematical theory of communicationrdquo BellSystem Technical Journal vol 27 no 3 pp 379ndash423 1948

[25] Y Karklin and M S Lewicki ldquoEmergence of complex cellproperties by learning to generalize in natural scenesrdquo Naturevol 457 no 7225 pp 83ndash86 2009

[26] H J Seo and P Milanfar ldquoStatic and space-time visual saliencydetection by self-resemblancerdquo Journal of Vision vol 9 no 12pp 1ndash27 2009

16 The Scientific World Journal

[27] D J Jobson Z-U Rahman and G AWoodell ldquoProperties andperformance of a centersurround retinexrdquo IEEE Transactionson Image Processing vol 6 no 3 pp 451ndash462 1997

[28] A Guo D Zhao S Liu X Fan and W Gao ldquoVisual attentionbased image quality assessmentrdquo in Proceedings of the 18thIEEE International Conference on Image Processing (ICIP rsquo11) pp3297ndash3300 September 2011

[29] A Borji and L Itti ldquoExploiting local and global patch raritiesfor saliency detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo12) pp 478ndash485 2012

[30] L Duan C Wu J Miao L Qing and Y Fu ldquoVisual saliencydetection by spatially weighted dissimilarityrdquo in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recogni-tion (CVPR rsquo11) pp 473ndash480 June 2011

[31] JMWolfe ldquoGuided search 20 a revisedmodel of visual searchrdquoPsychonomic Bulletin amp Review vol 1 no 2 pp 202ndash238 1994

[32] O Le Meur P Le Callet D Barba and DThoreau ldquoA coherentcomputational approach to model bottom-up visual attentionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 28 no 5 pp 802ndash817 2006

[33] A Torralba A Oliva M S Castelhano and J M HendersonldquoContextual guidance of eye movements and attention in real-world scenes the role of global features in object searchrdquoPsychological Review vol 113 no 4 pp 766ndash786 2006

[34] G Kootstra N Bergstrom and D Kragic ldquoUsing symmetry toselect fixation points for segmentationrdquo in Proceedings of the20th International Conference on Pattern Recognition (ICPR rsquo10)pp 3894ndash3897 August 2010

[35] C Li J Xue N Zheng X Lan and Z Tian ldquoSpatio-temporalsaliency perception via hypercomplex frequency spectral con-trastrdquo Sensors vol 13 no 3 pp 3409ndash3431 2013

[36] L Itti and P Baldi ldquoBayesian surprise attracts human attentionrdquoVision Research vol 49 no 10 pp 1295ndash1306 2009

[37] J Harel C Koch and P Perona ldquoGraph-based visual saliencyrdquoin Advances in Neural Information Processing Systems vol 19pp 545ndash552 2006

[38] X Hou and L Zhang ldquoSaliency detection a spectral residualapproachrdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo07)pp 1ndash8 June 2007

[39] B Schauerte and R Stiefelhagen ldquoQuaternion-based spectralsaliency detection for eye fixation predictionrdquo in ComputerVisionmdashECCV 2012 A Fitzgibbon S Lazebnik P Perona YSato and C Schmid Eds Lecture Notes in Computer Sciencepp 116ndash129 2012

[40] W Kienzle F Wichmann B Scholkopf and M Franz ANonparametric Approach to Bottom-Up Visual Saliency 2007

[41] J Tilke K Ehinger F Durand and A Torralba ldquoLearningto predict where humans lookrdquo in Proceedings of the 12thInternational Conference on Computer Vision (ICCV rsquo09) pp2106ndash2113 October 2009

[42] W Wang Y Wang Q Huang and W Gao ldquoMeasuring visualsaliency by site entropy raterdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 2368ndash2375 June 2010

[43] O Tuzel F Porikli and P Meer ldquoRegion covariance afast descriptor for detection and classificationrdquo in ComputerVisionmdashECCV 2006 vol 3952 of Lecture Notes in ComputerScience pp 589ndash600 2006

[44] httpssitesgooglecomsitesparsenonlinearsaliencymodelhomedownloads

[45] A Hyvarinen and E Oja ldquoA fast fixed-point algorithm forindependent component analysisrdquo Neural Computation vol 9no 7 pp 1483ndash1492 1997

[46] X Hou and L Zhang ldquoDynamic visual attention searching forcoding length incrementsrdquo in Advances in Neural InformationProcessing Systems vol 21 pp 681ndash688 2008

[47] L Zhang M H Tong T K Marks H Shan and G WCottrell ldquoSUN a Bayesian framework for saliency using naturalstatisticsrdquo Journal of Vision vol 8 no 7 article 32 pp 1ndash202008

[48] J Zhang and S Sclaroff ldquoSaliency detection a boolean mapapproachrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV rsquo13) 2013

[49] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[50] E Rahtu J Kannala M Salo and J Heikkila ldquoSegmentingsalient objects from images and videosrdquo in Computer VisionmdashECCV 2010 vol 6315 of Lecture Notes in Computer Science pp366ndash379 2010

[51] M-M Cheng G-X Zhang N J Mitra X Huang and S-M Hu ldquoGlobal contrast based salient region detectionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo11) pp 409ndash416 June 2011

[52] T Veit J-P Tarel P Nicolle and P Charbonnier ldquoEvaluationof road marking feature extractionrdquo in Proceedings of the11th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo08) pp 174ndash181 December 2008

[53] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model of simple receptive fields in catstriate cortexrdquo Journal of Neurophysiology vol 58 no 6 pp1233ndash1258 1987

Page 12: Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

12 The Scientific World Journal

Input Proposed AIM GBVS MESR ICL Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 7 Response of our algorithm and other 13 state-of-the-art algorithms on various psychological patterns

extent of the object and thus solve this task like segmentation-type binary labeling formulation [50 51] In contrast ourmodel is designed for location based (eye fixation) saliencymodeling and it is not designed to capture exact objectboundaries however by thresholding a saliency map we canget a binary map that can be used for testing performance ofa model for salient object detection Since our saliencymodelis location based we only use other location based modelsin comparison for a fair evaluation similar convention isfollowed in [2 17]

Dataset and Metric for Evaluation For salient object detec-tion the metric used by Li et al [2] is area under thecurve (AUC) and dice similarity coefficient (DSC) on IMSAL[2] dataset We will use the same metric and dataset forsalient object detectionThe DSC gives the overlap between athreshold saliency map and the ground truth Moreover thepeak value of the DSC [52] is considered an important way toestablish the best performance of an algorithm at an optimalthreshold thus we also give results with peak value of theDSC curve (PoDSC) Since AUC can be influenced by centerbias for fair comparison we turn off the center bias in allthe algorithmsThe GBVS [37] has built-in center bias and inorder to cater for that the author in [2] incorporates explicitcenter bias and shows that HFT [2] performs better thanGBVS on the same dataset which is also used in our paperWe do not employ center bias explicitly or implicitly in thepresented results and addedHFT instead for our comparisonTherefore GVBS [37] is skipped from Table 4 Furthermorewe perform Gaussian smoothing in all the algorithms tofind the optimal smoothing parameters for each class in thedataset and the optimal performance is quoted in the resultsgiven in Table 4

Performance We present results in comparison with other 12state-of-the-art algorithms The complete results are given in

Table 4 Our proposed scheme gives the best performanceon three categories C2 C3 and C5 and ranked secondon C4 and C6 Our model gives the highest average AUCscore on this dataset On different categories our results arecomparative to the HFT which is state of the art on thisdataset Apart fromHFT the performance is also significantlybetter in comparison to other algorithmsThedataset used forcomparison is quite complex but our algorithm performedwell for intermediate small objects with distracters althoughperformance on other cases is little less than other state-of-the-art algorithms

44 Psychological Patterns Wealso tested our saliencymodelon psychological patterns which are commonly used to givea qualitative performance on artificial scenarios which simu-late a pop-up phenomenonThese patterns simulate the pop-out phenomenon based on color intersection symmetryorientation curvature and candles image In order to checkthe general performance on various psychological tasks wetested the proposed model on eight psychological patternsFigure 7 gives the results of the proposed saliency modelalong with other popular models The proposed algorithmworks well on color symmetry and orientation as well as oncandle image but the performance is not good for curvatureand intersection patterns Figure 7 also shows that any singlemodel does not give good performance on all the patterns andthe best performance is more or less the same as given by theproposed scheme

5 Discussion

The image features representation drastically affects theinformation content and thus saliency estimation We useda bioinspired center surround saliency computation on twoparallel feature representations that give good performance

The Scientific World Journal 13

(a)

(b) (c)

(d) (e)

Figure 8 The fixed learned dictionary and the information lost by using such dictionary (a) Fixed learned dictionary ((b)-(c)) The inputimages ((d)-(e)) The information lost by using dictionary given in (a)

on both eye fixation prediction and salient object detectionSince a CSD operation depends on the difference between acenter portion and its surroundings a better image contentsrepresentation makes a more accurate and precise centersurround operation

We used an adaptive sparse representation to boost theperformance of the CSD operation In order to show theeffectiveness of the proposed approach we present bothquantitative and qualitative results For qualitative compar-ison we use a fix dictionary [29] (Figure 8(a)) learnt from

an ensemble of natural images We show that some of theinformation is lost if we use a fixed dictionary to represent animage and usually the lost information belongs to the salientregion in an image The difference between an input imageand the image reconstructed by a fixed dictionary is givenin Figure 8 The red cylinder Figure 8(b) and red box withtext Figure 8(c) is visible in the image plots Figures 8(d)and 8(e) These two objects are the salient features in bothimages and their appearance in the residual image shows thatcurrent representation which uses a fixed dictionary loses

14 The Scientific World Journal

Input Gnd truth Proposed AIM GBVS Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 9 Comparison of our method with other state-of-the-art saliency models The first column is the input image with second columnbeing ground truth followed by our proposed saliency model Results from all the other models are plotted in the same row

Table 5 sAUC score comparison using fixed [29] and adaptivesparse representation (proposed) for CSD saliency

DatasetBorji local [29] Fixeddictionary sparserepresentation

Proposed Adaptivedictionary sparserepresentation

Toronto 0670 07110Kootstra 0572 06061

Table 6 sAUC comparison of E Erdem and A Erdem [17] and theproposed color based nonlinear integration

Dataset E Erdem andA Erdem [17]

Proposed Nonlinearrepresentation

Bruce and Tsotsos [16] 07023 07084Kootstra et al [34] 0603 06152

some information that belong to salient portion of the inputimage

For quantitative comparison we employ sAUC to dothe comparison between saliency maps based on a fixeddictionary and the adaptive basis dictionary Table 5 gives acomparison of the both approaches on two datasets Torontoand Kootstra The performance difference is quite significantand it shows that an adaptive representation is much betterBased on these qualitative and quantitative results we canconclude that an adaptive image presentation is more viableand accurate for CSD based saliency computation

A model proposed in [17] gives an idea of nonlinearintegration of all features by covariance matrices and thesupported implementation uses color gradient and spatialinformation Our second contribution is the modification ofthat model and a proposal of using only color with spatial

information in nonlinearly integrated manner using samecovariance matrices For our proposed implementation(seeFigure 4) we modified the features used in [17] and comparethe saliencymap with Erdemrsquos [17] model in Table 6 using thesAUC score Two databases Toronto [16] and Kootstra [34]are used for simulations and the results indicate that by usingonly color with spatial information we can get better sAUCscore than integrating all features using covariance matricesIn Table 6 the difference in sAUC score is quite visible onboth datasets One possible reason of this improvement maybe that the correlation among different features like color andorientation is different and thus using a covariance basedrepresentation does not capture the underlying informationstructure in an efficient way as compared to when only colorinformation is used

One possible argument against our usage of only colorinformation however can be that without any gradient ororientation information a saliency model will fail to detectmany salient regions This argument can also supported bynature since neurons tuned to orientation in an image areknown to contribute to saliency computation [53] In ourmodel this issue is addressed by the sparse representationwhere the adaptive basis same as basis shown in Figure 8(a)is Gabor like filters with edge like structure and thus thesebases efficiently capture orientation information from theimage which complements our color information in nonlin-ear representation

Finally the sAUC score of dual representation Table 3shows that we achieve better eye fixation prediction thantreating both representations separately as shown in Tables5 and 6We believe that such improvement is due to the com-plementary behavior of both techniques since a combinedapproach better represents image contents with high fidelityand thus that in turn improves saliency detection Lastly for

The Scientific World Journal 15

illustration and visual comparison we present some saliencymaps produced by our algorithm along with other models inFigure 9

6 Conclusion

This paper shows that dual feature representationmanages torobustly capture image information which can be used in acenter surround operation to compute saliencyWe show thata CSD on adaptive sparse basis gives better results than a fixsparse basis representation In nonlinear representation weshow that nonlinearly integrated color channels with spatialinformation better capture underlying data structure andthus a CSD on such representation gives good results Finallywe consider both representations as complementary and thusa fused saliencymap not only give good results on human eyefixations but also detect salient objects with high accuracy

In future wewill incorporate some top-downmechanismto better imitative human saliency computation capabilitiesbased on learning and experience Another possible exten-sion of the existing work is to test dynamic scenes video byincorporating additional motion information in the currentscheme

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the national Natural Sci-ence Foundation of China (61175096) and Specialized Fundfor Joint Building Program of Beijing Municipal EducationCommission

References

[1] C Siagian and L Itti ldquoBiologically inspired mobile robot visionlocalizationrdquo IEEE Transactions on Robotics vol 25 no 4 pp861ndash873 2009

[2] J Li M D Levine X An X Xu and H He ldquoVisual saliencybased on scale-space analysis in the frequency domainrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol35 no 4 pp 996ndash1010 2012

[3] Y Su Q Zhao L Zhao and D Gu ldquoAbrupt motion trackingusing a visual saliency embedded particle filterrdquo Pattern Recog-nition vol 47 no 5 pp 1826ndash1834 2014

[4] C Guo Q Ma and L Zhang ldquoSpatio-temporal saliency detec-tion using phase spectrum of quaternion fourier transformrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo08) pp 1ndash8 June 2008

[5] X Hou and L Zhang ldquoThumbnail generation based on globalsaliencyrdquo in Advances in Cognitive NeurodynamicsmdashICCN2007 pp 999ndash1003 Springer Amsterdam The Netherlands2007

[6] A Borji and L Itti ldquoState-of-the-art in visual attention mod-elingrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 35 no 1 pp 185ndash207 2013

[7] L Itti C Koch and E Niebur ldquoAmodel of saliency-based visualattention for rapid scene analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 20 no 11 pp 1254ndash12591998

[8] AM Treisman ldquoFeature integration theoryrdquoCognitive Psychol-ogy vol 12 no 1 pp 97ndash136 1980

[9] J K Tsotsos S M Culhane W Y Kei Wai Y Lai N Davisand F Nuflo ldquoModeling visual attention via selective tuningrdquoArtificial Intelligence vol 78 no 1-2 pp 507ndash545 1995

[10] R Rae Gestikbasierte Mensch-Maschine-Kommunikation aufder grundlage visueller aufmerksamkeit und adaptivitat [PhDthesis] Universitat Bielefeld 2000

[11] D Parkhurst K Law and E Niebur ldquoModeling the role ofsalience in the allocation of overt visual attentionrdquo VisionResearch vol 42 no 1 pp 107ndash123 2002

[12] J Li Y Tian T Huang and W Gao ldquoProbabilistic multi-tasklearning for visual saliency estimation in videordquo InternationalJournal of Computer Vision vol 90 no 2 pp 150ndash165 2010

[13] MCerf J HarelW Einhauser andCKoch ldquoPredicting humangaze using low-level saliency combined with face detectionrdquo inAdvances in Neural Information Processing Systems vol 20 pp241ndash248 2007

[14] A Torralba ldquoModeling global scene factors in attentionrdquoJournal of the Optical Society of America A Optics and ImageScience and Vision vol 20 no 7 pp 1407ndash1418 2003

[15] A Oliva and A Torralba ldquoModeling the shape of the scenea holistic representation of the spatial enveloperdquo InternationalJournal of Computer Vision vol 42 no 3 pp 145ndash175 2001

[16] N D B Bruce and J K Tsotsos ldquoSaliency attention and visualsearch an information theoretic approachrdquo Journal of Visionvol 9 no 3 article 5 2009

[17] E Erdem and A Erdem ldquoVisual saliency estimation by nonlin-early integrating features using region covariancesrdquo Journal ofVision vol 13 no 4 article 11 2013

[18] J H van Hateren ldquoReal and optimal neural images in earlyvisionrdquo Nature vol 360 no 6399 pp 68ndash70 1992

[19] D J Field ldquoRelations between the statistics of natural imagesand the response properties of cortical cellsrdquo Journal of theOptical Society of America A Optics and Image Science vol 4no 12 pp 2379ndash2394 1987

[20] B A Olshausen and D J Field ldquoNatural image statistics andefficient codingrdquo Network Computation in Neural Systems vol7 no 2 pp 333ndash339 1996

[21] M Kwon G Legge F Fang A Cheong and S He ldquoIdentifyingthe mechanism of adaptation to prolonged contrast reductionrdquoJournal of Vision vol 9 no 8 p 976 2009

[22] X Sun H Yao and R Ji ldquoVisual attention modeling based onshort-term environmental adaptionrdquo Journal of Visual Commu-nication and Image Representation vol 24 no 2 pp 171ndash1802013

[23] A Garcia-Diaz X R Fdez-Vidal X M Pardo and R DosilldquoSaliency from hierarchical adaptation through decorrelationand variance normalizationrdquo Image and Vision Computing vol30 no 1 pp 51ndash64 2012

[24] C E Shannon ldquoAmathematical theory of communicationrdquo BellSystem Technical Journal vol 27 no 3 pp 379ndash423 1948

[25] Y Karklin and M S Lewicki ldquoEmergence of complex cellproperties by learning to generalize in natural scenesrdquo Naturevol 457 no 7225 pp 83ndash86 2009

[26] H J Seo and P Milanfar ldquoStatic and space-time visual saliencydetection by self-resemblancerdquo Journal of Vision vol 9 no 12pp 1ndash27 2009

16 The Scientific World Journal

[27] D J Jobson Z-U Rahman and G AWoodell ldquoProperties andperformance of a centersurround retinexrdquo IEEE Transactionson Image Processing vol 6 no 3 pp 451ndash462 1997

[28] A Guo D Zhao S Liu X Fan and W Gao ldquoVisual attentionbased image quality assessmentrdquo in Proceedings of the 18thIEEE International Conference on Image Processing (ICIP rsquo11) pp3297ndash3300 September 2011

[29] A Borji and L Itti ldquoExploiting local and global patch raritiesfor saliency detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo12) pp 478ndash485 2012

[30] L Duan C Wu J Miao L Qing and Y Fu ldquoVisual saliencydetection by spatially weighted dissimilarityrdquo in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recogni-tion (CVPR rsquo11) pp 473ndash480 June 2011

[31] JMWolfe ldquoGuided search 20 a revisedmodel of visual searchrdquoPsychonomic Bulletin amp Review vol 1 no 2 pp 202ndash238 1994

[32] O Le Meur P Le Callet D Barba and DThoreau ldquoA coherentcomputational approach to model bottom-up visual attentionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 28 no 5 pp 802ndash817 2006

[33] A Torralba A Oliva M S Castelhano and J M HendersonldquoContextual guidance of eye movements and attention in real-world scenes the role of global features in object searchrdquoPsychological Review vol 113 no 4 pp 766ndash786 2006

[34] G Kootstra N Bergstrom and D Kragic ldquoUsing symmetry toselect fixation points for segmentationrdquo in Proceedings of the20th International Conference on Pattern Recognition (ICPR rsquo10)pp 3894ndash3897 August 2010

[35] C Li J Xue N Zheng X Lan and Z Tian ldquoSpatio-temporalsaliency perception via hypercomplex frequency spectral con-trastrdquo Sensors vol 13 no 3 pp 3409ndash3431 2013

[36] L Itti and P Baldi ldquoBayesian surprise attracts human attentionrdquoVision Research vol 49 no 10 pp 1295ndash1306 2009

[37] J Harel C Koch and P Perona ldquoGraph-based visual saliencyrdquoin Advances in Neural Information Processing Systems vol 19pp 545ndash552 2006

[38] X Hou and L Zhang ldquoSaliency detection a spectral residualapproachrdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo07)pp 1ndash8 June 2007

[39] B Schauerte and R Stiefelhagen ldquoQuaternion-based spectralsaliency detection for eye fixation predictionrdquo in ComputerVisionmdashECCV 2012 A Fitzgibbon S Lazebnik P Perona YSato and C Schmid Eds Lecture Notes in Computer Sciencepp 116ndash129 2012

[40] W Kienzle F Wichmann B Scholkopf and M Franz ANonparametric Approach to Bottom-Up Visual Saliency 2007

[41] J Tilke K Ehinger F Durand and A Torralba ldquoLearningto predict where humans lookrdquo in Proceedings of the 12thInternational Conference on Computer Vision (ICCV rsquo09) pp2106ndash2113 October 2009

[42] W Wang Y Wang Q Huang and W Gao ldquoMeasuring visualsaliency by site entropy raterdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 2368ndash2375 June 2010

[43] O Tuzel F Porikli and P Meer ldquoRegion covariance afast descriptor for detection and classificationrdquo in ComputerVisionmdashECCV 2006 vol 3952 of Lecture Notes in ComputerScience pp 589ndash600 2006

[44] httpssitesgooglecomsitesparsenonlinearsaliencymodelhomedownloads

[45] A Hyvarinen and E Oja ldquoA fast fixed-point algorithm forindependent component analysisrdquo Neural Computation vol 9no 7 pp 1483ndash1492 1997

[46] X Hou and L Zhang ldquoDynamic visual attention searching forcoding length incrementsrdquo in Advances in Neural InformationProcessing Systems vol 21 pp 681ndash688 2008

[47] L Zhang M H Tong T K Marks H Shan and G WCottrell ldquoSUN a Bayesian framework for saliency using naturalstatisticsrdquo Journal of Vision vol 8 no 7 article 32 pp 1ndash202008

[48] J Zhang and S Sclaroff ldquoSaliency detection a boolean mapapproachrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV rsquo13) 2013

[49] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[50] E Rahtu J Kannala M Salo and J Heikkila ldquoSegmentingsalient objects from images and videosrdquo in Computer VisionmdashECCV 2010 vol 6315 of Lecture Notes in Computer Science pp366ndash379 2010

[51] M-M Cheng G-X Zhang N J Mitra X Huang and S-M Hu ldquoGlobal contrast based salient region detectionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo11) pp 409ndash416 June 2011

[52] T Veit J-P Tarel P Nicolle and P Charbonnier ldquoEvaluationof road marking feature extractionrdquo in Proceedings of the11th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo08) pp 174ndash181 December 2008

[53] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model of simple receptive fields in catstriate cortexrdquo Journal of Neurophysiology vol 58 no 6 pp1233ndash1258 1987

Page 13: Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

The Scientific World Journal 13

(a)

(b) (c)

(d) (e)

Figure 8 The fixed learned dictionary and the information lost by using such dictionary (a) Fixed learned dictionary ((b)-(c)) The inputimages ((d)-(e)) The information lost by using dictionary given in (a)

on both eye fixation prediction and salient object detectionSince a CSD operation depends on the difference between acenter portion and its surroundings a better image contentsrepresentation makes a more accurate and precise centersurround operation

We used an adaptive sparse representation to boost theperformance of the CSD operation In order to show theeffectiveness of the proposed approach we present bothquantitative and qualitative results For qualitative compar-ison we use a fix dictionary [29] (Figure 8(a)) learnt from

an ensemble of natural images We show that some of theinformation is lost if we use a fixed dictionary to represent animage and usually the lost information belongs to the salientregion in an image The difference between an input imageand the image reconstructed by a fixed dictionary is givenin Figure 8 The red cylinder Figure 8(b) and red box withtext Figure 8(c) is visible in the image plots Figures 8(d)and 8(e) These two objects are the salient features in bothimages and their appearance in the residual image shows thatcurrent representation which uses a fixed dictionary loses

14 The Scientific World Journal

Input Gnd truth Proposed AIM GBVS Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 9 Comparison of our method with other state-of-the-art saliency models The first column is the input image with second columnbeing ground truth followed by our proposed saliency model Results from all the other models are plotted in the same row

Table 5 sAUC score comparison using fixed [29] and adaptivesparse representation (proposed) for CSD saliency

DatasetBorji local [29] Fixeddictionary sparserepresentation

Proposed Adaptivedictionary sparserepresentation

Toronto 0670 07110Kootstra 0572 06061

Table 6 sAUC comparison of E Erdem and A Erdem [17] and theproposed color based nonlinear integration

Dataset E Erdem andA Erdem [17]

Proposed Nonlinearrepresentation

Bruce and Tsotsos [16] 07023 07084Kootstra et al [34] 0603 06152

some information that belong to salient portion of the inputimage

For quantitative comparison we employ sAUC to dothe comparison between saliency maps based on a fixeddictionary and the adaptive basis dictionary Table 5 gives acomparison of the both approaches on two datasets Torontoand Kootstra The performance difference is quite significantand it shows that an adaptive representation is much betterBased on these qualitative and quantitative results we canconclude that an adaptive image presentation is more viableand accurate for CSD based saliency computation

A model proposed in [17] gives an idea of nonlinearintegration of all features by covariance matrices and thesupported implementation uses color gradient and spatialinformation Our second contribution is the modification ofthat model and a proposal of using only color with spatial

information in nonlinearly integrated manner using samecovariance matrices For our proposed implementation(seeFigure 4) we modified the features used in [17] and comparethe saliencymap with Erdemrsquos [17] model in Table 6 using thesAUC score Two databases Toronto [16] and Kootstra [34]are used for simulations and the results indicate that by usingonly color with spatial information we can get better sAUCscore than integrating all features using covariance matricesIn Table 6 the difference in sAUC score is quite visible onboth datasets One possible reason of this improvement maybe that the correlation among different features like color andorientation is different and thus using a covariance basedrepresentation does not capture the underlying informationstructure in an efficient way as compared to when only colorinformation is used

One possible argument against our usage of only colorinformation however can be that without any gradient ororientation information a saliency model will fail to detectmany salient regions This argument can also supported bynature since neurons tuned to orientation in an image areknown to contribute to saliency computation [53] In ourmodel this issue is addressed by the sparse representationwhere the adaptive basis same as basis shown in Figure 8(a)is Gabor like filters with edge like structure and thus thesebases efficiently capture orientation information from theimage which complements our color information in nonlin-ear representation

Finally the sAUC score of dual representation Table 3shows that we achieve better eye fixation prediction thantreating both representations separately as shown in Tables5 and 6We believe that such improvement is due to the com-plementary behavior of both techniques since a combinedapproach better represents image contents with high fidelityand thus that in turn improves saliency detection Lastly for

The Scientific World Journal 15

illustration and visual comparison we present some saliencymaps produced by our algorithm along with other models inFigure 9

6 Conclusion

This paper shows that dual feature representationmanages torobustly capture image information which can be used in acenter surround operation to compute saliencyWe show thata CSD on adaptive sparse basis gives better results than a fixsparse basis representation In nonlinear representation weshow that nonlinearly integrated color channels with spatialinformation better capture underlying data structure andthus a CSD on such representation gives good results Finallywe consider both representations as complementary and thusa fused saliencymap not only give good results on human eyefixations but also detect salient objects with high accuracy

In future wewill incorporate some top-downmechanismto better imitative human saliency computation capabilitiesbased on learning and experience Another possible exten-sion of the existing work is to test dynamic scenes video byincorporating additional motion information in the currentscheme

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the national Natural Sci-ence Foundation of China (61175096) and Specialized Fundfor Joint Building Program of Beijing Municipal EducationCommission

References

[1] C Siagian and L Itti ldquoBiologically inspired mobile robot visionlocalizationrdquo IEEE Transactions on Robotics vol 25 no 4 pp861ndash873 2009

[2] J Li M D Levine X An X Xu and H He ldquoVisual saliencybased on scale-space analysis in the frequency domainrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol35 no 4 pp 996ndash1010 2012

[3] Y Su Q Zhao L Zhao and D Gu ldquoAbrupt motion trackingusing a visual saliency embedded particle filterrdquo Pattern Recog-nition vol 47 no 5 pp 1826ndash1834 2014

[4] C Guo Q Ma and L Zhang ldquoSpatio-temporal saliency detec-tion using phase spectrum of quaternion fourier transformrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo08) pp 1ndash8 June 2008

[5] X Hou and L Zhang ldquoThumbnail generation based on globalsaliencyrdquo in Advances in Cognitive NeurodynamicsmdashICCN2007 pp 999ndash1003 Springer Amsterdam The Netherlands2007

[6] A Borji and L Itti ldquoState-of-the-art in visual attention mod-elingrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 35 no 1 pp 185ndash207 2013

[7] L Itti C Koch and E Niebur ldquoAmodel of saliency-based visualattention for rapid scene analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 20 no 11 pp 1254ndash12591998

[8] AM Treisman ldquoFeature integration theoryrdquoCognitive Psychol-ogy vol 12 no 1 pp 97ndash136 1980

[9] J K Tsotsos S M Culhane W Y Kei Wai Y Lai N Davisand F Nuflo ldquoModeling visual attention via selective tuningrdquoArtificial Intelligence vol 78 no 1-2 pp 507ndash545 1995

[10] R Rae Gestikbasierte Mensch-Maschine-Kommunikation aufder grundlage visueller aufmerksamkeit und adaptivitat [PhDthesis] Universitat Bielefeld 2000

[11] D Parkhurst K Law and E Niebur ldquoModeling the role ofsalience in the allocation of overt visual attentionrdquo VisionResearch vol 42 no 1 pp 107ndash123 2002

[12] J Li Y Tian T Huang and W Gao ldquoProbabilistic multi-tasklearning for visual saliency estimation in videordquo InternationalJournal of Computer Vision vol 90 no 2 pp 150ndash165 2010

[13] MCerf J HarelW Einhauser andCKoch ldquoPredicting humangaze using low-level saliency combined with face detectionrdquo inAdvances in Neural Information Processing Systems vol 20 pp241ndash248 2007

[14] A Torralba ldquoModeling global scene factors in attentionrdquoJournal of the Optical Society of America A Optics and ImageScience and Vision vol 20 no 7 pp 1407ndash1418 2003

[15] A Oliva and A Torralba ldquoModeling the shape of the scenea holistic representation of the spatial enveloperdquo InternationalJournal of Computer Vision vol 42 no 3 pp 145ndash175 2001

[16] N D B Bruce and J K Tsotsos ldquoSaliency attention and visualsearch an information theoretic approachrdquo Journal of Visionvol 9 no 3 article 5 2009

[17] E Erdem and A Erdem ldquoVisual saliency estimation by nonlin-early integrating features using region covariancesrdquo Journal ofVision vol 13 no 4 article 11 2013

[18] J H van Hateren ldquoReal and optimal neural images in earlyvisionrdquo Nature vol 360 no 6399 pp 68ndash70 1992

[19] D J Field ldquoRelations between the statistics of natural imagesand the response properties of cortical cellsrdquo Journal of theOptical Society of America A Optics and Image Science vol 4no 12 pp 2379ndash2394 1987

[20] B A Olshausen and D J Field ldquoNatural image statistics andefficient codingrdquo Network Computation in Neural Systems vol7 no 2 pp 333ndash339 1996

[21] M Kwon G Legge F Fang A Cheong and S He ldquoIdentifyingthe mechanism of adaptation to prolonged contrast reductionrdquoJournal of Vision vol 9 no 8 p 976 2009

[22] X Sun H Yao and R Ji ldquoVisual attention modeling based onshort-term environmental adaptionrdquo Journal of Visual Commu-nication and Image Representation vol 24 no 2 pp 171ndash1802013

[23] A Garcia-Diaz X R Fdez-Vidal X M Pardo and R DosilldquoSaliency from hierarchical adaptation through decorrelationand variance normalizationrdquo Image and Vision Computing vol30 no 1 pp 51ndash64 2012

[24] C E Shannon ldquoAmathematical theory of communicationrdquo BellSystem Technical Journal vol 27 no 3 pp 379ndash423 1948

[25] Y Karklin and M S Lewicki ldquoEmergence of complex cellproperties by learning to generalize in natural scenesrdquo Naturevol 457 no 7225 pp 83ndash86 2009

[26] H J Seo and P Milanfar ldquoStatic and space-time visual saliencydetection by self-resemblancerdquo Journal of Vision vol 9 no 12pp 1ndash27 2009

16 The Scientific World Journal

[27] D J Jobson Z-U Rahman and G AWoodell ldquoProperties andperformance of a centersurround retinexrdquo IEEE Transactionson Image Processing vol 6 no 3 pp 451ndash462 1997

[28] A Guo D Zhao S Liu X Fan and W Gao ldquoVisual attentionbased image quality assessmentrdquo in Proceedings of the 18thIEEE International Conference on Image Processing (ICIP rsquo11) pp3297ndash3300 September 2011

[29] A Borji and L Itti ldquoExploiting local and global patch raritiesfor saliency detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo12) pp 478ndash485 2012

[30] L Duan C Wu J Miao L Qing and Y Fu ldquoVisual saliencydetection by spatially weighted dissimilarityrdquo in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recogni-tion (CVPR rsquo11) pp 473ndash480 June 2011

[31] JMWolfe ldquoGuided search 20 a revisedmodel of visual searchrdquoPsychonomic Bulletin amp Review vol 1 no 2 pp 202ndash238 1994

[32] O Le Meur P Le Callet D Barba and DThoreau ldquoA coherentcomputational approach to model bottom-up visual attentionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 28 no 5 pp 802ndash817 2006

[33] A Torralba A Oliva M S Castelhano and J M HendersonldquoContextual guidance of eye movements and attention in real-world scenes the role of global features in object searchrdquoPsychological Review vol 113 no 4 pp 766ndash786 2006

[34] G Kootstra N Bergstrom and D Kragic ldquoUsing symmetry toselect fixation points for segmentationrdquo in Proceedings of the20th International Conference on Pattern Recognition (ICPR rsquo10)pp 3894ndash3897 August 2010

[35] C Li J Xue N Zheng X Lan and Z Tian ldquoSpatio-temporalsaliency perception via hypercomplex frequency spectral con-trastrdquo Sensors vol 13 no 3 pp 3409ndash3431 2013

[36] L Itti and P Baldi ldquoBayesian surprise attracts human attentionrdquoVision Research vol 49 no 10 pp 1295ndash1306 2009

[37] J Harel C Koch and P Perona ldquoGraph-based visual saliencyrdquoin Advances in Neural Information Processing Systems vol 19pp 545ndash552 2006

[38] X Hou and L Zhang ldquoSaliency detection a spectral residualapproachrdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo07)pp 1ndash8 June 2007

[39] B Schauerte and R Stiefelhagen ldquoQuaternion-based spectralsaliency detection for eye fixation predictionrdquo in ComputerVisionmdashECCV 2012 A Fitzgibbon S Lazebnik P Perona YSato and C Schmid Eds Lecture Notes in Computer Sciencepp 116ndash129 2012

[40] W Kienzle F Wichmann B Scholkopf and M Franz ANonparametric Approach to Bottom-Up Visual Saliency 2007

[41] J Tilke K Ehinger F Durand and A Torralba ldquoLearningto predict where humans lookrdquo in Proceedings of the 12thInternational Conference on Computer Vision (ICCV rsquo09) pp2106ndash2113 October 2009

[42] W Wang Y Wang Q Huang and W Gao ldquoMeasuring visualsaliency by site entropy raterdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 2368ndash2375 June 2010

[43] O Tuzel F Porikli and P Meer ldquoRegion covariance afast descriptor for detection and classificationrdquo in ComputerVisionmdashECCV 2006 vol 3952 of Lecture Notes in ComputerScience pp 589ndash600 2006

[44] httpssitesgooglecomsitesparsenonlinearsaliencymodelhomedownloads

[45] A Hyvarinen and E Oja ldquoA fast fixed-point algorithm forindependent component analysisrdquo Neural Computation vol 9no 7 pp 1483ndash1492 1997

[46] X Hou and L Zhang ldquoDynamic visual attention searching forcoding length incrementsrdquo in Advances in Neural InformationProcessing Systems vol 21 pp 681ndash688 2008

[47] L Zhang M H Tong T K Marks H Shan and G WCottrell ldquoSUN a Bayesian framework for saliency using naturalstatisticsrdquo Journal of Vision vol 8 no 7 article 32 pp 1ndash202008

[48] J Zhang and S Sclaroff ldquoSaliency detection a boolean mapapproachrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV rsquo13) 2013

[49] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[50] E Rahtu J Kannala M Salo and J Heikkila ldquoSegmentingsalient objects from images and videosrdquo in Computer VisionmdashECCV 2010 vol 6315 of Lecture Notes in Computer Science pp366ndash379 2010

[51] M-M Cheng G-X Zhang N J Mitra X Huang and S-M Hu ldquoGlobal contrast based salient region detectionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo11) pp 409ndash416 June 2011

[52] T Veit J-P Tarel P Nicolle and P Charbonnier ldquoEvaluationof road marking feature extractionrdquo in Proceedings of the11th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo08) pp 174ndash181 December 2008

[53] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model of simple receptive fields in catstriate cortexrdquo Journal of Neurophysiology vol 58 no 6 pp1233ndash1258 1987

Page 14: Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

14 The Scientific World Journal

Input Gnd truth Proposed AIM GBVS Itti MPQFT SDSR SUN HFT LG ERDEM MQDCT AWS

Figure 9 Comparison of our method with other state-of-the-art saliency models The first column is the input image with second columnbeing ground truth followed by our proposed saliency model Results from all the other models are plotted in the same row

Table 5 sAUC score comparison using fixed [29] and adaptivesparse representation (proposed) for CSD saliency

DatasetBorji local [29] Fixeddictionary sparserepresentation

Proposed Adaptivedictionary sparserepresentation

Toronto 0670 07110Kootstra 0572 06061

Table 6 sAUC comparison of E Erdem and A Erdem [17] and theproposed color based nonlinear integration

Dataset E Erdem andA Erdem [17]

Proposed Nonlinearrepresentation

Bruce and Tsotsos [16] 07023 07084Kootstra et al [34] 0603 06152

some information that belong to salient portion of the inputimage

For quantitative comparison we employ sAUC to dothe comparison between saliency maps based on a fixeddictionary and the adaptive basis dictionary Table 5 gives acomparison of the both approaches on two datasets Torontoand Kootstra The performance difference is quite significantand it shows that an adaptive representation is much betterBased on these qualitative and quantitative results we canconclude that an adaptive image presentation is more viableand accurate for CSD based saliency computation

A model proposed in [17] gives an idea of nonlinearintegration of all features by covariance matrices and thesupported implementation uses color gradient and spatialinformation Our second contribution is the modification ofthat model and a proposal of using only color with spatial

information in nonlinearly integrated manner using samecovariance matrices For our proposed implementation(seeFigure 4) we modified the features used in [17] and comparethe saliencymap with Erdemrsquos [17] model in Table 6 using thesAUC score Two databases Toronto [16] and Kootstra [34]are used for simulations and the results indicate that by usingonly color with spatial information we can get better sAUCscore than integrating all features using covariance matricesIn Table 6 the difference in sAUC score is quite visible onboth datasets One possible reason of this improvement maybe that the correlation among different features like color andorientation is different and thus using a covariance basedrepresentation does not capture the underlying informationstructure in an efficient way as compared to when only colorinformation is used

One possible argument against our usage of only colorinformation however can be that without any gradient ororientation information a saliency model will fail to detectmany salient regions This argument can also supported bynature since neurons tuned to orientation in an image areknown to contribute to saliency computation [53] In ourmodel this issue is addressed by the sparse representationwhere the adaptive basis same as basis shown in Figure 8(a)is Gabor like filters with edge like structure and thus thesebases efficiently capture orientation information from theimage which complements our color information in nonlin-ear representation

Finally the sAUC score of dual representation Table 3shows that we achieve better eye fixation prediction thantreating both representations separately as shown in Tables5 and 6We believe that such improvement is due to the com-plementary behavior of both techniques since a combinedapproach better represents image contents with high fidelityand thus that in turn improves saliency detection Lastly for

The Scientific World Journal 15

illustration and visual comparison we present some saliencymaps produced by our algorithm along with other models inFigure 9

6 Conclusion

This paper shows that dual feature representationmanages torobustly capture image information which can be used in acenter surround operation to compute saliencyWe show thata CSD on adaptive sparse basis gives better results than a fixsparse basis representation In nonlinear representation weshow that nonlinearly integrated color channels with spatialinformation better capture underlying data structure andthus a CSD on such representation gives good results Finallywe consider both representations as complementary and thusa fused saliencymap not only give good results on human eyefixations but also detect salient objects with high accuracy

In future wewill incorporate some top-downmechanismto better imitative human saliency computation capabilitiesbased on learning and experience Another possible exten-sion of the existing work is to test dynamic scenes video byincorporating additional motion information in the currentscheme

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the national Natural Sci-ence Foundation of China (61175096) and Specialized Fundfor Joint Building Program of Beijing Municipal EducationCommission

References

[1] C Siagian and L Itti ldquoBiologically inspired mobile robot visionlocalizationrdquo IEEE Transactions on Robotics vol 25 no 4 pp861ndash873 2009

[2] J Li M D Levine X An X Xu and H He ldquoVisual saliencybased on scale-space analysis in the frequency domainrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol35 no 4 pp 996ndash1010 2012

[3] Y Su Q Zhao L Zhao and D Gu ldquoAbrupt motion trackingusing a visual saliency embedded particle filterrdquo Pattern Recog-nition vol 47 no 5 pp 1826ndash1834 2014

[4] C Guo Q Ma and L Zhang ldquoSpatio-temporal saliency detec-tion using phase spectrum of quaternion fourier transformrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo08) pp 1ndash8 June 2008

[5] X Hou and L Zhang ldquoThumbnail generation based on globalsaliencyrdquo in Advances in Cognitive NeurodynamicsmdashICCN2007 pp 999ndash1003 Springer Amsterdam The Netherlands2007

[6] A Borji and L Itti ldquoState-of-the-art in visual attention mod-elingrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 35 no 1 pp 185ndash207 2013

[7] L Itti C Koch and E Niebur ldquoAmodel of saliency-based visualattention for rapid scene analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 20 no 11 pp 1254ndash12591998

[8] AM Treisman ldquoFeature integration theoryrdquoCognitive Psychol-ogy vol 12 no 1 pp 97ndash136 1980

[9] J K Tsotsos S M Culhane W Y Kei Wai Y Lai N Davisand F Nuflo ldquoModeling visual attention via selective tuningrdquoArtificial Intelligence vol 78 no 1-2 pp 507ndash545 1995

[10] R Rae Gestikbasierte Mensch-Maschine-Kommunikation aufder grundlage visueller aufmerksamkeit und adaptivitat [PhDthesis] Universitat Bielefeld 2000

[11] D Parkhurst K Law and E Niebur ldquoModeling the role ofsalience in the allocation of overt visual attentionrdquo VisionResearch vol 42 no 1 pp 107ndash123 2002

[12] J Li Y Tian T Huang and W Gao ldquoProbabilistic multi-tasklearning for visual saliency estimation in videordquo InternationalJournal of Computer Vision vol 90 no 2 pp 150ndash165 2010

[13] MCerf J HarelW Einhauser andCKoch ldquoPredicting humangaze using low-level saliency combined with face detectionrdquo inAdvances in Neural Information Processing Systems vol 20 pp241ndash248 2007

[14] A Torralba ldquoModeling global scene factors in attentionrdquoJournal of the Optical Society of America A Optics and ImageScience and Vision vol 20 no 7 pp 1407ndash1418 2003

[15] A Oliva and A Torralba ldquoModeling the shape of the scenea holistic representation of the spatial enveloperdquo InternationalJournal of Computer Vision vol 42 no 3 pp 145ndash175 2001

[16] N D B Bruce and J K Tsotsos ldquoSaliency attention and visualsearch an information theoretic approachrdquo Journal of Visionvol 9 no 3 article 5 2009

[17] E Erdem and A Erdem ldquoVisual saliency estimation by nonlin-early integrating features using region covariancesrdquo Journal ofVision vol 13 no 4 article 11 2013

[18] J H van Hateren ldquoReal and optimal neural images in earlyvisionrdquo Nature vol 360 no 6399 pp 68ndash70 1992

[19] D J Field ldquoRelations between the statistics of natural imagesand the response properties of cortical cellsrdquo Journal of theOptical Society of America A Optics and Image Science vol 4no 12 pp 2379ndash2394 1987

[20] B A Olshausen and D J Field ldquoNatural image statistics andefficient codingrdquo Network Computation in Neural Systems vol7 no 2 pp 333ndash339 1996

[21] M Kwon G Legge F Fang A Cheong and S He ldquoIdentifyingthe mechanism of adaptation to prolonged contrast reductionrdquoJournal of Vision vol 9 no 8 p 976 2009

[22] X Sun H Yao and R Ji ldquoVisual attention modeling based onshort-term environmental adaptionrdquo Journal of Visual Commu-nication and Image Representation vol 24 no 2 pp 171ndash1802013

[23] A Garcia-Diaz X R Fdez-Vidal X M Pardo and R DosilldquoSaliency from hierarchical adaptation through decorrelationand variance normalizationrdquo Image and Vision Computing vol30 no 1 pp 51ndash64 2012

[24] C E Shannon ldquoAmathematical theory of communicationrdquo BellSystem Technical Journal vol 27 no 3 pp 379ndash423 1948

[25] Y Karklin and M S Lewicki ldquoEmergence of complex cellproperties by learning to generalize in natural scenesrdquo Naturevol 457 no 7225 pp 83ndash86 2009

[26] H J Seo and P Milanfar ldquoStatic and space-time visual saliencydetection by self-resemblancerdquo Journal of Vision vol 9 no 12pp 1ndash27 2009

16 The Scientific World Journal

[27] D J Jobson Z-U Rahman and G AWoodell ldquoProperties andperformance of a centersurround retinexrdquo IEEE Transactionson Image Processing vol 6 no 3 pp 451ndash462 1997

[28] A Guo D Zhao S Liu X Fan and W Gao ldquoVisual attentionbased image quality assessmentrdquo in Proceedings of the 18thIEEE International Conference on Image Processing (ICIP rsquo11) pp3297ndash3300 September 2011

[29] A Borji and L Itti ldquoExploiting local and global patch raritiesfor saliency detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo12) pp 478ndash485 2012

[30] L Duan C Wu J Miao L Qing and Y Fu ldquoVisual saliencydetection by spatially weighted dissimilarityrdquo in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recogni-tion (CVPR rsquo11) pp 473ndash480 June 2011

[31] JMWolfe ldquoGuided search 20 a revisedmodel of visual searchrdquoPsychonomic Bulletin amp Review vol 1 no 2 pp 202ndash238 1994

[32] O Le Meur P Le Callet D Barba and DThoreau ldquoA coherentcomputational approach to model bottom-up visual attentionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 28 no 5 pp 802ndash817 2006

[33] A Torralba A Oliva M S Castelhano and J M HendersonldquoContextual guidance of eye movements and attention in real-world scenes the role of global features in object searchrdquoPsychological Review vol 113 no 4 pp 766ndash786 2006

[34] G Kootstra N Bergstrom and D Kragic ldquoUsing symmetry toselect fixation points for segmentationrdquo in Proceedings of the20th International Conference on Pattern Recognition (ICPR rsquo10)pp 3894ndash3897 August 2010

[35] C Li J Xue N Zheng X Lan and Z Tian ldquoSpatio-temporalsaliency perception via hypercomplex frequency spectral con-trastrdquo Sensors vol 13 no 3 pp 3409ndash3431 2013

[36] L Itti and P Baldi ldquoBayesian surprise attracts human attentionrdquoVision Research vol 49 no 10 pp 1295ndash1306 2009

[37] J Harel C Koch and P Perona ldquoGraph-based visual saliencyrdquoin Advances in Neural Information Processing Systems vol 19pp 545ndash552 2006

[38] X Hou and L Zhang ldquoSaliency detection a spectral residualapproachrdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo07)pp 1ndash8 June 2007

[39] B Schauerte and R Stiefelhagen ldquoQuaternion-based spectralsaliency detection for eye fixation predictionrdquo in ComputerVisionmdashECCV 2012 A Fitzgibbon S Lazebnik P Perona YSato and C Schmid Eds Lecture Notes in Computer Sciencepp 116ndash129 2012

[40] W Kienzle F Wichmann B Scholkopf and M Franz ANonparametric Approach to Bottom-Up Visual Saliency 2007

[41] J Tilke K Ehinger F Durand and A Torralba ldquoLearningto predict where humans lookrdquo in Proceedings of the 12thInternational Conference on Computer Vision (ICCV rsquo09) pp2106ndash2113 October 2009

[42] W Wang Y Wang Q Huang and W Gao ldquoMeasuring visualsaliency by site entropy raterdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 2368ndash2375 June 2010

[43] O Tuzel F Porikli and P Meer ldquoRegion covariance afast descriptor for detection and classificationrdquo in ComputerVisionmdashECCV 2006 vol 3952 of Lecture Notes in ComputerScience pp 589ndash600 2006

[44] httpssitesgooglecomsitesparsenonlinearsaliencymodelhomedownloads

[45] A Hyvarinen and E Oja ldquoA fast fixed-point algorithm forindependent component analysisrdquo Neural Computation vol 9no 7 pp 1483ndash1492 1997

[46] X Hou and L Zhang ldquoDynamic visual attention searching forcoding length incrementsrdquo in Advances in Neural InformationProcessing Systems vol 21 pp 681ndash688 2008

[47] L Zhang M H Tong T K Marks H Shan and G WCottrell ldquoSUN a Bayesian framework for saliency using naturalstatisticsrdquo Journal of Vision vol 8 no 7 article 32 pp 1ndash202008

[48] J Zhang and S Sclaroff ldquoSaliency detection a boolean mapapproachrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV rsquo13) 2013

[49] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[50] E Rahtu J Kannala M Salo and J Heikkila ldquoSegmentingsalient objects from images and videosrdquo in Computer VisionmdashECCV 2010 vol 6315 of Lecture Notes in Computer Science pp366ndash379 2010

[51] M-M Cheng G-X Zhang N J Mitra X Huang and S-M Hu ldquoGlobal contrast based salient region detectionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo11) pp 409ndash416 June 2011

[52] T Veit J-P Tarel P Nicolle and P Charbonnier ldquoEvaluationof road marking feature extractionrdquo in Proceedings of the11th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo08) pp 174ndash181 December 2008

[53] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model of simple receptive fields in catstriate cortexrdquo Journal of Neurophysiology vol 58 no 6 pp1233ndash1258 1987

Page 15: Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

The Scientific World Journal 15

illustration and visual comparison we present some saliencymaps produced by our algorithm along with other models inFigure 9

6 Conclusion

This paper shows that dual feature representationmanages torobustly capture image information which can be used in acenter surround operation to compute saliencyWe show thata CSD on adaptive sparse basis gives better results than a fixsparse basis representation In nonlinear representation weshow that nonlinearly integrated color channels with spatialinformation better capture underlying data structure andthus a CSD on such representation gives good results Finallywe consider both representations as complementary and thusa fused saliencymap not only give good results on human eyefixations but also detect salient objects with high accuracy

In future wewill incorporate some top-downmechanismto better imitative human saliency computation capabilitiesbased on learning and experience Another possible exten-sion of the existing work is to test dynamic scenes video byincorporating additional motion information in the currentscheme

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the national Natural Sci-ence Foundation of China (61175096) and Specialized Fundfor Joint Building Program of Beijing Municipal EducationCommission

References

[1] C Siagian and L Itti ldquoBiologically inspired mobile robot visionlocalizationrdquo IEEE Transactions on Robotics vol 25 no 4 pp861ndash873 2009

[2] J Li M D Levine X An X Xu and H He ldquoVisual saliencybased on scale-space analysis in the frequency domainrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol35 no 4 pp 996ndash1010 2012

[3] Y Su Q Zhao L Zhao and D Gu ldquoAbrupt motion trackingusing a visual saliency embedded particle filterrdquo Pattern Recog-nition vol 47 no 5 pp 1826ndash1834 2014

[4] C Guo Q Ma and L Zhang ldquoSpatio-temporal saliency detec-tion using phase spectrum of quaternion fourier transformrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo08) pp 1ndash8 June 2008

[5] X Hou and L Zhang ldquoThumbnail generation based on globalsaliencyrdquo in Advances in Cognitive NeurodynamicsmdashICCN2007 pp 999ndash1003 Springer Amsterdam The Netherlands2007

[6] A Borji and L Itti ldquoState-of-the-art in visual attention mod-elingrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 35 no 1 pp 185ndash207 2013

[7] L Itti C Koch and E Niebur ldquoAmodel of saliency-based visualattention for rapid scene analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 20 no 11 pp 1254ndash12591998

[8] AM Treisman ldquoFeature integration theoryrdquoCognitive Psychol-ogy vol 12 no 1 pp 97ndash136 1980

[9] J K Tsotsos S M Culhane W Y Kei Wai Y Lai N Davisand F Nuflo ldquoModeling visual attention via selective tuningrdquoArtificial Intelligence vol 78 no 1-2 pp 507ndash545 1995

[10] R Rae Gestikbasierte Mensch-Maschine-Kommunikation aufder grundlage visueller aufmerksamkeit und adaptivitat [PhDthesis] Universitat Bielefeld 2000

[11] D Parkhurst K Law and E Niebur ldquoModeling the role ofsalience in the allocation of overt visual attentionrdquo VisionResearch vol 42 no 1 pp 107ndash123 2002

[12] J Li Y Tian T Huang and W Gao ldquoProbabilistic multi-tasklearning for visual saliency estimation in videordquo InternationalJournal of Computer Vision vol 90 no 2 pp 150ndash165 2010

[13] MCerf J HarelW Einhauser andCKoch ldquoPredicting humangaze using low-level saliency combined with face detectionrdquo inAdvances in Neural Information Processing Systems vol 20 pp241ndash248 2007

[14] A Torralba ldquoModeling global scene factors in attentionrdquoJournal of the Optical Society of America A Optics and ImageScience and Vision vol 20 no 7 pp 1407ndash1418 2003

[15] A Oliva and A Torralba ldquoModeling the shape of the scenea holistic representation of the spatial enveloperdquo InternationalJournal of Computer Vision vol 42 no 3 pp 145ndash175 2001

[16] N D B Bruce and J K Tsotsos ldquoSaliency attention and visualsearch an information theoretic approachrdquo Journal of Visionvol 9 no 3 article 5 2009

[17] E Erdem and A Erdem ldquoVisual saliency estimation by nonlin-early integrating features using region covariancesrdquo Journal ofVision vol 13 no 4 article 11 2013

[18] J H van Hateren ldquoReal and optimal neural images in earlyvisionrdquo Nature vol 360 no 6399 pp 68ndash70 1992

[19] D J Field ldquoRelations between the statistics of natural imagesand the response properties of cortical cellsrdquo Journal of theOptical Society of America A Optics and Image Science vol 4no 12 pp 2379ndash2394 1987

[20] B A Olshausen and D J Field ldquoNatural image statistics andefficient codingrdquo Network Computation in Neural Systems vol7 no 2 pp 333ndash339 1996

[21] M Kwon G Legge F Fang A Cheong and S He ldquoIdentifyingthe mechanism of adaptation to prolonged contrast reductionrdquoJournal of Vision vol 9 no 8 p 976 2009

[22] X Sun H Yao and R Ji ldquoVisual attention modeling based onshort-term environmental adaptionrdquo Journal of Visual Commu-nication and Image Representation vol 24 no 2 pp 171ndash1802013

[23] A Garcia-Diaz X R Fdez-Vidal X M Pardo and R DosilldquoSaliency from hierarchical adaptation through decorrelationand variance normalizationrdquo Image and Vision Computing vol30 no 1 pp 51ndash64 2012

[24] C E Shannon ldquoAmathematical theory of communicationrdquo BellSystem Technical Journal vol 27 no 3 pp 379ndash423 1948

[25] Y Karklin and M S Lewicki ldquoEmergence of complex cellproperties by learning to generalize in natural scenesrdquo Naturevol 457 no 7225 pp 83ndash86 2009

[26] H J Seo and P Milanfar ldquoStatic and space-time visual saliencydetection by self-resemblancerdquo Journal of Vision vol 9 no 12pp 1ndash27 2009

16 The Scientific World Journal

[27] D J Jobson Z-U Rahman and G AWoodell ldquoProperties andperformance of a centersurround retinexrdquo IEEE Transactionson Image Processing vol 6 no 3 pp 451ndash462 1997

[28] A Guo D Zhao S Liu X Fan and W Gao ldquoVisual attentionbased image quality assessmentrdquo in Proceedings of the 18thIEEE International Conference on Image Processing (ICIP rsquo11) pp3297ndash3300 September 2011

[29] A Borji and L Itti ldquoExploiting local and global patch raritiesfor saliency detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo12) pp 478ndash485 2012

[30] L Duan C Wu J Miao L Qing and Y Fu ldquoVisual saliencydetection by spatially weighted dissimilarityrdquo in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recogni-tion (CVPR rsquo11) pp 473ndash480 June 2011

[31] JMWolfe ldquoGuided search 20 a revisedmodel of visual searchrdquoPsychonomic Bulletin amp Review vol 1 no 2 pp 202ndash238 1994

[32] O Le Meur P Le Callet D Barba and DThoreau ldquoA coherentcomputational approach to model bottom-up visual attentionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 28 no 5 pp 802ndash817 2006

[33] A Torralba A Oliva M S Castelhano and J M HendersonldquoContextual guidance of eye movements and attention in real-world scenes the role of global features in object searchrdquoPsychological Review vol 113 no 4 pp 766ndash786 2006

[34] G Kootstra N Bergstrom and D Kragic ldquoUsing symmetry toselect fixation points for segmentationrdquo in Proceedings of the20th International Conference on Pattern Recognition (ICPR rsquo10)pp 3894ndash3897 August 2010

[35] C Li J Xue N Zheng X Lan and Z Tian ldquoSpatio-temporalsaliency perception via hypercomplex frequency spectral con-trastrdquo Sensors vol 13 no 3 pp 3409ndash3431 2013

[36] L Itti and P Baldi ldquoBayesian surprise attracts human attentionrdquoVision Research vol 49 no 10 pp 1295ndash1306 2009

[37] J Harel C Koch and P Perona ldquoGraph-based visual saliencyrdquoin Advances in Neural Information Processing Systems vol 19pp 545ndash552 2006

[38] X Hou and L Zhang ldquoSaliency detection a spectral residualapproachrdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo07)pp 1ndash8 June 2007

[39] B Schauerte and R Stiefelhagen ldquoQuaternion-based spectralsaliency detection for eye fixation predictionrdquo in ComputerVisionmdashECCV 2012 A Fitzgibbon S Lazebnik P Perona YSato and C Schmid Eds Lecture Notes in Computer Sciencepp 116ndash129 2012

[40] W Kienzle F Wichmann B Scholkopf and M Franz ANonparametric Approach to Bottom-Up Visual Saliency 2007

[41] J Tilke K Ehinger F Durand and A Torralba ldquoLearningto predict where humans lookrdquo in Proceedings of the 12thInternational Conference on Computer Vision (ICCV rsquo09) pp2106ndash2113 October 2009

[42] W Wang Y Wang Q Huang and W Gao ldquoMeasuring visualsaliency by site entropy raterdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 2368ndash2375 June 2010

[43] O Tuzel F Porikli and P Meer ldquoRegion covariance afast descriptor for detection and classificationrdquo in ComputerVisionmdashECCV 2006 vol 3952 of Lecture Notes in ComputerScience pp 589ndash600 2006

[44] httpssitesgooglecomsitesparsenonlinearsaliencymodelhomedownloads

[45] A Hyvarinen and E Oja ldquoA fast fixed-point algorithm forindependent component analysisrdquo Neural Computation vol 9no 7 pp 1483ndash1492 1997

[46] X Hou and L Zhang ldquoDynamic visual attention searching forcoding length incrementsrdquo in Advances in Neural InformationProcessing Systems vol 21 pp 681ndash688 2008

[47] L Zhang M H Tong T K Marks H Shan and G WCottrell ldquoSUN a Bayesian framework for saliency using naturalstatisticsrdquo Journal of Vision vol 8 no 7 article 32 pp 1ndash202008

[48] J Zhang and S Sclaroff ldquoSaliency detection a boolean mapapproachrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV rsquo13) 2013

[49] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[50] E Rahtu J Kannala M Salo and J Heikkila ldquoSegmentingsalient objects from images and videosrdquo in Computer VisionmdashECCV 2010 vol 6315 of Lecture Notes in Computer Science pp366ndash379 2010

[51] M-M Cheng G-X Zhang N J Mitra X Huang and S-M Hu ldquoGlobal contrast based salient region detectionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo11) pp 409ndash416 June 2011

[52] T Veit J-P Tarel P Nicolle and P Charbonnier ldquoEvaluationof road marking feature extractionrdquo in Proceedings of the11th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo08) pp 174ndash181 December 2008

[53] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model of simple receptive fields in catstriate cortexrdquo Journal of Neurophysiology vol 58 no 6 pp1233ndash1258 1987

Page 16: Saliency Detection Using Sparse and Nonlinear …...Saliency Detection Using Sparse and Nonlinear Feature Representation ShahzadAnwar,1 QingjieZhao,1 MuhammadFarhanManzoor,2 andSaqibIshaqKhan3

16 The Scientific World Journal

[27] D J Jobson Z-U Rahman and G AWoodell ldquoProperties andperformance of a centersurround retinexrdquo IEEE Transactionson Image Processing vol 6 no 3 pp 451ndash462 1997

[28] A Guo D Zhao S Liu X Fan and W Gao ldquoVisual attentionbased image quality assessmentrdquo in Proceedings of the 18thIEEE International Conference on Image Processing (ICIP rsquo11) pp3297ndash3300 September 2011

[29] A Borji and L Itti ldquoExploiting local and global patch raritiesfor saliency detectionrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo12) pp 478ndash485 2012

[30] L Duan C Wu J Miao L Qing and Y Fu ldquoVisual saliencydetection by spatially weighted dissimilarityrdquo in Proceedings ofthe IEEE Conference on Computer Vision and Pattern Recogni-tion (CVPR rsquo11) pp 473ndash480 June 2011

[31] JMWolfe ldquoGuided search 20 a revisedmodel of visual searchrdquoPsychonomic Bulletin amp Review vol 1 no 2 pp 202ndash238 1994

[32] O Le Meur P Le Callet D Barba and DThoreau ldquoA coherentcomputational approach to model bottom-up visual attentionrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 28 no 5 pp 802ndash817 2006

[33] A Torralba A Oliva M S Castelhano and J M HendersonldquoContextual guidance of eye movements and attention in real-world scenes the role of global features in object searchrdquoPsychological Review vol 113 no 4 pp 766ndash786 2006

[34] G Kootstra N Bergstrom and D Kragic ldquoUsing symmetry toselect fixation points for segmentationrdquo in Proceedings of the20th International Conference on Pattern Recognition (ICPR rsquo10)pp 3894ndash3897 August 2010

[35] C Li J Xue N Zheng X Lan and Z Tian ldquoSpatio-temporalsaliency perception via hypercomplex frequency spectral con-trastrdquo Sensors vol 13 no 3 pp 3409ndash3431 2013

[36] L Itti and P Baldi ldquoBayesian surprise attracts human attentionrdquoVision Research vol 49 no 10 pp 1295ndash1306 2009

[37] J Harel C Koch and P Perona ldquoGraph-based visual saliencyrdquoin Advances in Neural Information Processing Systems vol 19pp 545ndash552 2006

[38] X Hou and L Zhang ldquoSaliency detection a spectral residualapproachrdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo07)pp 1ndash8 June 2007

[39] B Schauerte and R Stiefelhagen ldquoQuaternion-based spectralsaliency detection for eye fixation predictionrdquo in ComputerVisionmdashECCV 2012 A Fitzgibbon S Lazebnik P Perona YSato and C Schmid Eds Lecture Notes in Computer Sciencepp 116ndash129 2012

[40] W Kienzle F Wichmann B Scholkopf and M Franz ANonparametric Approach to Bottom-Up Visual Saliency 2007

[41] J Tilke K Ehinger F Durand and A Torralba ldquoLearningto predict where humans lookrdquo in Proceedings of the 12thInternational Conference on Computer Vision (ICCV rsquo09) pp2106ndash2113 October 2009

[42] W Wang Y Wang Q Huang and W Gao ldquoMeasuring visualsaliency by site entropy raterdquo in Proceedings of the IEEEComputer Society Conference on Computer Vision and PatternRecognition (CVPR rsquo10) pp 2368ndash2375 June 2010

[43] O Tuzel F Porikli and P Meer ldquoRegion covariance afast descriptor for detection and classificationrdquo in ComputerVisionmdashECCV 2006 vol 3952 of Lecture Notes in ComputerScience pp 589ndash600 2006

[44] httpssitesgooglecomsitesparsenonlinearsaliencymodelhomedownloads

[45] A Hyvarinen and E Oja ldquoA fast fixed-point algorithm forindependent component analysisrdquo Neural Computation vol 9no 7 pp 1483ndash1492 1997

[46] X Hou and L Zhang ldquoDynamic visual attention searching forcoding length incrementsrdquo in Advances in Neural InformationProcessing Systems vol 21 pp 681ndash688 2008

[47] L Zhang M H Tong T K Marks H Shan and G WCottrell ldquoSUN a Bayesian framework for saliency using naturalstatisticsrdquo Journal of Vision vol 8 no 7 article 32 pp 1ndash202008

[48] J Zhang and S Sclaroff ldquoSaliency detection a boolean mapapproachrdquo in Proceedings of the IEEE International Conferenceon Computer Vision (ICCV rsquo13) 2013

[49] X Hou J Harel and C Koch ldquoImage signature highlightingsparse salient regionsrdquo IEEE Transactions on Pattern Analysisand Machine Intelligence vol 34 no 1 pp 194ndash201 2012

[50] E Rahtu J Kannala M Salo and J Heikkila ldquoSegmentingsalient objects from images and videosrdquo in Computer VisionmdashECCV 2010 vol 6315 of Lecture Notes in Computer Science pp366ndash379 2010

[51] M-M Cheng G-X Zhang N J Mitra X Huang and S-M Hu ldquoGlobal contrast based salient region detectionrdquo inProceedings of the IEEE Conference on Computer Vision andPattern Recognition (CVPR rsquo11) pp 409ndash416 June 2011

[52] T Veit J-P Tarel P Nicolle and P Charbonnier ldquoEvaluationof road marking feature extractionrdquo in Proceedings of the11th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo08) pp 174ndash181 December 2008

[53] J P Jones and L A Palmer ldquoAn evaluation of the two-dimensional Gabor filter model of simple receptive fields in catstriate cortexrdquo Journal of Neurophysiology vol 58 no 6 pp1233ndash1258 1987