Sparse feature selection based on L2,1/2-matrix norm for web image annotation


Caijuan Shi a,b,c,*, Qiuqi Ruan a,c, Song Guo a,c, Yi Tian a,c

a Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China
b College of Information Engineering, Hebei United University, Tangshan 063009, China
c Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing 100044, China
* Corresponding author. E-mail addresses: shicaijuan2011@gmail.com (C. Shi), qqruan@center.njtu.edu.cn (Q. Ruan), 07112071@bjtu.edu.cn (S. Guo), 11112061@bjtu.edu.cn (Y. Tian).

Article info

Article history: Received 27 October 2013; Received in revised form 19 March 2014; Accepted 15 September 2014. Communicated by Jinhui Tang.

Keywords: Sparse feature selection; l2,1/2-matrix norm; Web image annotation; Shared subspace learning

Abstract

Web image annotation based on sparse feature selection has received an increasing amount of interest in recent years. However, existing sparse feature selection methods become less effective and efficient when confronted with large-scale web image data. This raises an urgent need to develop good sparse feature selection methods to improve web image annotation performance. In this paper we propose a novel sparse feature selection framework for web image annotation, namely Sparse Feature Selection based on L2,1/2-matrix norm (SFSL). SFSL can select more sparse and more discriminative features by exploiting the l2,1/2-matrix norm with shared subspace learning, and thereby improve web image annotation performance. We propose an efficient iterative algorithm to optimize the objective function. Extensive experiments are performed on two web image datasets. The experimental results validate that our method outperforms state-of-the-art algorithms and is well suited to large-scale web image annotation.

© 2014 Elsevier B.V. All rights reserved.

    1. Introduction

In recent years, many photo-sharing websites, such as Flickr and Picasa, have grown continuously, and the number of web images has shown explosive growth. Web image annotation has become a critical research issue for image search and indexing [1–3]. As an important technique for image annotation, feature selection plays an important role in improving annotation performance. Confronted with the large number of web images, classical feature selection algorithms such as Fisher Score [4] and ReliefF [5] become less effective and efficient because they evaluate the importance of each feature individually. Due to its efficiency and effectiveness, sparse feature selection has received an increasing amount of interest in recent years for web image annotation [6–9].

During the last decade, several endeavors have been made towards this research topic. The most well-known sparse model is the l1-norm (lasso) [10], which has been applied to select sparse features for image classification [11] and multi-cluster data [12]. Sparse feature selection based on the l1-norm can select sparse features with computational convenience and efficiency, but the selected features are sometimes not sufficiently sparse, resulting in higher computational cost. Many works [13–16] have extended the l1-norm to the lp-norm. Guo et al. [17] have used lp-norm regularization for robust face recognition. In [18,19], Xu et al. have proposed l1/2-norm regularization and have pointed out that l1/2-norm regularization has the best performance among all lp-norm regularizations with p in (0, 1). However, all the above methods select features one by one and neglect the useful information of the correlation between different features, leading to a decline in annotation performance. In [20], Nie et al. have introduced joint l2,1-norm minimization on both the loss function and the regularization for feature selection, which can realize sparse feature selection across all data points. In [8,21], Ma et al. have applied the l2,1-norm to their sparse feature selection models for image annotation. In [22,23], Li et al. have applied the l2,1-norm to unsupervised feature selection using nonnegative spectral analysis. Because the lp-norm (0 < p < 1) has better sparsity than the l1-norm, Wang et al. [24] have recently proposed extending the l2,1-norm to the l2,p-matrix norm (0 < p ≤ 1) so as to select joint, more sparse features; at the same time, the l2,p-matrix norm has better robustness than the l2,1-norm. In this paper we propose a sparse feature selection method based upon the l2,1/2-matrix norm to obtain more sparse, robust features for web image annotation.
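For reference, the l2,p-matrix norm of a matrix $G \in \mathbb{R}^{d \times c}$ with rows $g^i$ is standardly defined (in the notation of [20,24]) as

$$\|G\|_{2,p} = \Big( \sum_{i=1}^{d} \big\| g^{i} \big\|_{2}^{p} \Big)^{1/p}, \qquad 0 < p \le 1,$$

so that $p = 1$ recovers the l2,1-norm, while smaller $p$ drives more whole rows of $G$ to zero, i.e., enforces stronger joint sparsity across features.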

Usually, each web image is associated with several semantic concepts, so web image annotation is actually a multi-label classification problem. This intrinsic characteristic of web images makes web image annotation more complicated.


Ando et al. have assumed that there is a shared subspace between labels [25], and the shared subspace can utilize the label correlation to improve the performance of multi-label classification. In [22], Li et al. have exploited the latent structure shared by different features to predict the cluster indicators. The authors of [28–30] have also introduced other related works on shared subspace learning. In this paper, we apply shared subspace learning to sparse feature selection to exploit the relational information of features and enhance web image annotation performance.
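To make the idea concrete, a common formulation in the spirit of Ando et al. [25] (a sketch of the general scheme, not this paper's exact objective) writes each label's predictor as a label-specific part plus a projection onto a subspace shared by all labels:

$$f_l(x) = w_l^{T} x + v_l^{T} \Theta x, \qquad \Theta \Theta^{T} = I,$$

where $\Theta \in \mathbb{R}^{r \times d}$ ($r \ll d$) is the shared subspace learned jointly over all $c$ labels, and $w_l, v_l$ are label-specific weights; correlated labels help each other through the common $\Theta$.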

In this paper, we propose a new sparse feature selection framework for web image annotation and name it Sparse Feature Selection based on L2,1/2-matrix norm (SFSL). SFSL can select more sparse and discriminative features by exploiting the l2,1/2-matrix norm model with shared subspace learning. We have tested the performance of our algorithm on two real-world web image datasets, NUS-WIDE [26] and MSRA-MM 2.0 [27]. The experimental results demonstrate that our algorithm SFSL outperforms other existing sparse feature selection algorithms for web image annotation.

    The main contributions of this paper are summarized as follows:

• Sparse Feature Selection based on L2,1/2-matrix norm (SFSL) is proposed, which can select more sparse and discriminative features with good robustness, based upon the l2,1/2-matrix norm with shared subspace learning;

• We devise a novel effective algorithm for optimizing the objective function of SFSL and prove the convergence of the algorithm;

• We conduct several experiments on two web image datasets, and the results demonstrate the effectiveness and efficiency of our method.

This paper is organized as follows. We briefly introduce related work on sparse feature selection models and shared subspace learning in Section 2. We then describe the proposed SFSL algorithm and its optimization, followed by the convergence analysis, in Section 3. We conduct extensive experiments to evaluate the effectiveness and efficiency of our method for web image annotation in Section 4, followed by the conclusion in Section 5.

    2. Related work

In this section, we discuss several sparse feature selection models, especially the l2,p-matrix norm model. Besides, we also briefly review shared subspace learning.

    2.1. Sparse feature selection model

Compared with traditional feature selection algorithms, sparse feature selection models can select the most discriminative features and simultaneously reduce the computational cost, based on different sparse models. Here we introduce some sparse feature selection models, based on the l1-norm, l1/2-norm, l2,1-norm and l2,p-matrix norm respectively.

Denote $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$ as the feature matrix of the training images, where $x_i \in \mathbb{R}^{d}$ $(1 \le i \le n)$ is the $i$-th datum and $n$ is the number of training images. $Y = [y_1, y_2, \ldots, y_n]^{T} \in \{0,1\}^{n \times c}$ is the label matrix of the training images, where $c$ is the number of classes and $y_i \in \mathbb{R}^{c}$ $(1 \le i \le n)$ is the $i$-th label vector. Let $G \in \mathbb{R}^{d \times c}$ be the projection matrix. Here we apply a supervised learning algorithm to the sparse feature selection model to learn the projection matrix $G$. A generally principled framework for obtaining $G$ is to minimize the following regularized error:

$$\min_{G} \; \mathrm{loss}(X^{T}G, \, Y) + \lambda R(G) \qquad (1)$$

where $\mathrm{loss}(\cdot)$ is the loss function and $R(G)$ is the regularization term, with $\lambda$ as its regularization parameter.
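To make the framework concrete, here is a small sketch (our own illustration with hypothetical names, not the paper's code; shapes follow the definitions above) that evaluates the regularized error of Eq. (1) with a least-squares loss and a pluggable regularizer R(G):

```python
import numpy as np

def l2p_norm_p(G, p):
    """||G||_{2,p}^p: row-wise l2 norms raised to the power p, summed.

    Rows of G correspond to features, so driving whole rows to zero
    performs joint feature selection across all classes.
    """
    row_norms = np.linalg.norm(G, axis=1)
    return np.sum(row_norms ** p)

def objective(G, X, Y, lam, reg):
    """Regularized error of Eq. (1) with a least-squares loss.

    X: d x n feature matrix, Y: n x c label matrix, G: d x c projection,
    reg: any regularizer R(G), e.g. lambda G: l2p_norm_p(G, 0.5).
    """
    residual = X.T @ G - Y
    return np.linalg.norm(residual, "fro") ** 2 + lam * reg(G)
```

For instance, `reg=lambda G: l2p_norm_p(G, 0.5)` gives the row-sparse l2,1/2-style penalty that SFSL builds on, while `reg=lambda G: np.abs(G).sum()` gives the l1 model of Eq. (2) below.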

(1) l1-norm model

Though the l0-norm can theoretically obtain the sparsest solution, it has been proven to be an NP-hard problem. In practice, the l1-norm is usually used to reformulate sparse feature selection as a convex problem. We use the traditional least-squares regression as the loss function and the l1-norm as the regularization to solve the optimization problem in (1), and then the projection matrix $G \in \mathbb{R}^{d \times c}$ can be obtained as follows:

$$\min_{G} \; \|X^{T}G - Y\|_{2}^{2} + \lambda \|G\|_{1} \qquad (2)$$
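The paper does not prescribe a solver for (2); as an illustration only, a minimal proximal-gradient (ISTA) sketch for this l1-regularized problem follows, where soft_threshold is the standard proximal operator of the l1 penalty and all function names are ours:

```python
import numpy as np

def soft_threshold(A, t):
    """Entrywise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def l1_feature_selection(X, Y, lam, n_iter=500):
    """Minimize ||X^T G - Y||_2^2 + lam * ||G||_1 by ISTA.

    X: d x n feature matrix, Y: n x c label matrix; returns G (d x c).
    """
    d, c = X.shape[0], Y.shape[1]
    G = np.zeros((d, c))
    # Gradient of the smooth part is 2 * X (X^T G - Y); its Lipschitz
    # constant is 2 * sigma_max(X)^2, giving a safe fixed step size.
    step = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = 2 * X @ (X.T @ G - Y)
        G = soft_threshold(G - step * grad, step * lam)
    return G
```

Rows of the returned G with large l2 norm indicate the selected features; thresholding them yields the sparse feature subset.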

(2) l1/2-norm model

Many works have extended the l1-norm regularization to the lp-norm (0 < p < 1) regularization, because the solution of the lp (0 < p < 1) regularization is more sparse than that of the l1 regularization. Xu et al. [18,19] have proposed the l1/2-norm regularization is m
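The passage is truncated above; for orientation, by analogy with problem (2) the l1/2-regularized model would take the following form (our reconstruction, not verbatim from the paper):

$$\min_{G} \; \|X^{T}G - Y\|_{2}^{2} + \lambda \|G\|_{1/2}^{1/2}, \qquad \|G\|_{1/2}^{1/2} = \sum_{i,j} |G_{ij}|^{1/2}.$$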