
Sparse feature selection based on L2,1/2-matrix norm for web image annotation

Caijuan Shi a,b,c,*, Qiuqi Ruan a,c, Song Guo a,c, Yi Tian a,c

a Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China
b College of Information Engineering, Hebei United University, Tangshan 063009, China
c Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing 100044, China

Article info

Article history:
Received 27 October 2013
Received in revised form 19 March 2014
Accepted 15 September 2014
Communicated by Jinhui Tang

Keywords:
Sparse feature selection
l2,1/2-matrix norm
Web image annotation
Shared subspace learning

Abstract

Web image annotation based on sparse feature selection has received an increasing amount of interest in recent years. However, existing sparse feature selection methods become less effective and efficient as the number of web images grows. This raises an urgent need to develop good sparse feature selection methods to improve web image annotation performance. In this paper we propose a novel sparse feature selection framework for web image annotation, namely Sparse Feature Selection based on L2,1/2-matrix norm (SFSL). SFSL can select sparser and more discriminative features by exploiting the l2,1/2-matrix norm with shared subspace learning, and thereby improve web image annotation performance. We propose an efficient iterative algorithm to optimize the objective function. Extensive experiments are performed on two web image datasets. The experimental results validate that our method outperforms the state-of-the-art algorithms and is suitable for large-scale web image annotation.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

In recent years, photo-sharing websites such as Flickr and Picasa have grown continuously, and the number of web images has grown explosively. Web image annotation has become a critical research issue for image search and indexing [1–3]. As an important technique for image annotation, feature selection plays a key role in improving annotation performance. Confronted with the large number of web images, classical feature selection algorithms such as Fisher Score [4] and ReliefF [5] become less effective and efficient because they evaluate the importance of each feature individually. Owing to its efficiency and effectiveness, sparse feature selection has received an increasing amount of interest in recent years for web image annotation [6–9].

During the last decade, several endeavors have been made towards this research topic. The most well-known sparse model is the l1-norm (lasso) [10], which has been applied to select sparse features for image classification [11] and multi-cluster data [12]. Sparse feature selection based on the l1-norm is computationally convenient and efficient, but the selected features are sometimes not sufficiently sparse, which results in higher computational cost. Many works [13–16] have extended the l1-norm to the lp-norm. Guo et al. [17] have used lp-norm regularization for robust face recognition. In [18,19], Xu et al. have proposed l1/2-norm regularization and have pointed out that it has the best performance among all lp-norm regularizations with p in (0, 1). However, all of the above methods select features one by one and neglect the useful information in the correlation between different features, which degrades annotation performance. In [20], Nie et al. have introduced a joint l2,1-norm minimization on both the loss function and the regularization for feature selection, which can realize sparse feature selection across all data points. In [8,21], Ma et al. have applied the l2,1-norm to their sparse feature selection model for image annotation. In [22,23], Li et al. have applied the l2,1-norm to their unsupervised feature selection using nonnegative spectral analysis. Because the lp-norm (0 < p < 1) has better sparsity than the l1-norm, Wang et al. [24] have recently proposed extending the l2,1-norm to the l2,p-matrix norm (0 < p ≤ 1) so as to select joint, sparser features; at the same time, the l2,p-matrix norm is more robust than the l2,1-norm. In this paper we propose a sparse feature selection method based upon the l2,1/2-matrix norm to obtain sparser, more robust features for web image annotation.

Usually, each web image is associated with several semantic concepts, so web image annotation is actually a multi-label classification problem. This intrinsic characteristic of web images makes web image annotation more complicated.

Contents lists available at ScienceDirect

journal homepage: www.elsevier.com/locate/neucom

Neurocomputing

http://dx.doi.org/10.1016/j.neucom.2014.09.023
0925-2312/© 2014 Elsevier B.V. All rights reserved.

* Corresponding author.
E-mail addresses: shicaijuan2011@gmail.com (C. Shi), qqruan@center.njtu.edu.cn (Q. Ruan), 07112071@bjtu.edu.cn (S. Guo), 11112061@bjtu.edu.cn (Y. Tian).

Please cite this article as: C. Shi, et al., Sparse feature selection based on L2,1/2-matrix norm for web image annotation, Neurocomputing (2014), http://dx.doi.org/10.1016/j.neucom.2014.09.023


Ando et al. have assumed that there is a shared subspace between labels [25] and that this shared subspace can exploit label correlation to improve the performance of multi-label classification. In [22], Li et al. have exploited the latent structure shared by different features to predict the cluster indicators. Refs. [28–30] present further related work on shared subspace learning. In this paper, we apply shared subspace learning to sparse feature selection to exploit the relational information among features and thereby enhance web image annotation performance.

In this paper, we propose a new sparse feature selection framework for web image annotation and name it Sparse Feature Selection based on L2,1/2-matrix norm (SFSL). SFSL can select sparser and more discriminative features by exploiting the l2,1/2-matrix norm model with shared subspace learning. We have tested the performance of our algorithm on two real-world web image datasets, NUS-WIDE [26] and MSRA MM 2.0 [27]. The experimental results demonstrate that SFSL outperforms other existing sparse feature selection algorithms for web image annotation.

The main contributions of this paper are summarized as follows:

Sparse Feature Selection based on L2,1/2-matrix norm (SFSL) is proposed, which can select sparser and more discriminative features with good robustness, based upon the l2,1/2-matrix norm with shared subspace learning;

We devise a novel effective algorithm for optimizing the objective function of SFSL and prove the convergence of the algorithm;

We conduct several experiments on two web image datasets, and the results demonstrate the effectiveness and efficiency of our method.

This paper is organized as follows. We briefly introduce related work on sparse feature selection models and shared subspace learning in Section 2. We then describe the proposed SFSL algorithm and its optimization, followed by the convergence analysis, in Section 3. We conduct extensive experiments to evaluate the effectiveness and efficiency of our method for web image annotation in Section 4, followed by the conclusion in Section 5.

2. Related work

In this section, we discuss several sparse feature selection models, especially the l2,p-matrix norm model. We also briefly review shared subspace learning.

2.1. Sparse feature selection model

Compared with traditional feature selection algorithms, sparse feature selection models can select the most discriminative features and simultaneously reduce the computational cost based on different sparse models. Here we introduce sparse feature selection models based on the l1-norm, l1/2-norm, l2,1-norm and l2,p-matrix norm, respectively.

Denote $X = [x_1, x_2, \ldots, x_n]^T$ as the feature matrix of training images, where $x_i \in \mathbb{R}^d$ $(1 \le i \le n)$ is the $i$th datum and $n$ is the number of training images. $Y = [y_1, y_2, \ldots, y_n]^T \in \{0,1\}^{n \times c}$ is the label matrix of training images, where $c$ is the number of classes and $y_i \in \mathbb{R}^c$ $(1 \le i \le n)$ is the $i$th label vector. Let $G \in \mathbb{R}^{d \times c}$ be the projection matrix. Here we apply a supervised learning algorithm to the sparse feature selection model to learn the projection matrix $G$. A general principled framework to obtain $G$ is to minimize the following regularized error

$$\min_{G} \; \mathrm{loss}(G^T X, Y) + \lambda R(G) \tag{1}$$

where $\mathrm{loss}(\cdot)$ is the loss function and $R(G)$ is the regularization with $\lambda$ as its regularization parameter.

2.1.1. l1-norm model

Though the l0-norm theoretically can obtain the sparsest solution, it has been proven to be an NP-hard selection problem. In practice, the l1-norm is usually used to reformulate sparse feature selection as a convex problem. Using the traditional least squares regression as the loss function and the l1-norm as the regularization in (1), the projection matrix $G \in \mathbb{R}^{d \times c}$ can be obtained as follows

$$\min_{G} \; \|X^T G - Y\|_2^2 + \lambda \|G\|_1 \tag{2}$$
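As a rough illustration of model (2), the sketch below minimizes an l1-regularized least-squares objective with the iterative shrinkage-thresholding algorithm (ISTA). ISTA is a standard proximal-gradient method used here only for brevity; it is not an algorithm taken from this paper. The sketch assumes that X is stored as a d × n array (one column per image) so that X^T G is n × c, that the squared norm of the residual is the sum of its squared entries, and that `lam` stands for the regularization parameter. All function names and the random data are hypothetical.

```python
import numpy as np

def soft_threshold(Z, t):
    """Entrywise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def lasso_objective(X, Y, G, lam):
    """Squared residual plus lam times the entrywise l1-norm of G."""
    residual = X.T @ G - Y
    return float(np.sum(residual ** 2) + lam * np.sum(np.abs(G)))

def ista_lasso(X, Y, lam, n_iter=200):
    """Proximal-gradient (ISTA) iterations for the l1-regularized model."""
    d, c = X.shape[0], Y.shape[1]
    G = np.zeros((d, c))
    # Lipschitz constant of the gradient of the squared-residual term.
    lipschitz = 2.0 * np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = 2.0 * X @ (X.T @ G - Y)
        G = soft_threshold(G - grad / lipschitz, lam / lipschitz)
    return G

# Assumed toy data: d = 20 features, n = 50 images, c = 3 labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))
Y = (rng.random((50, 3)) > 0.5).astype(float)

G = ista_lasso(X, Y, lam=5.0)
print("objective at G = 0:", lasso_objective(X, Y, np.zeros((20, 3)), 5.0))
print("objective after ISTA:", lasso_objective(X, Y, G, 5.0))
print("exactly-zero entries in G:", int(np.sum(G == 0)))
```

Note that the l1 penalty zeroes individual entries of G rather than whole rows, so a feature can be dropped for one label and kept for another.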

2.1.2. l1/2-norm model

Many works have extended the l1-norm regularization to the lp-norm (0 < p < 1) regularization because the solution of the lp (0 < p < 1) regularization is sparser than that of the l1 regularization. Xu et al. [18,19] have pointed out that, for p in (0, 1), the l1/2-norm regularization is the sparsest with the best performance. The sparse feature selection model with the l1/2-norm regularization can then be written as follows

$$\min_{G} \; \|X^T G - Y\|_2^2 + \lambda \|G\|_{1/2} \tag{3}$$

2.1.3. l2,1-norm model

In [20], Nie et al. have introduced a joint l2,1-norm minimization on both the loss function and the regularization for feature selection, which can realize sparse feature selection across all data points. Moreover, the l2,1-norm minimization on both the loss function and the regularization can overcome the sensitivity of the squared-norm residual to outliers. The optimization problem with the l2,1-norm is

$$\min_{G} \; \|X^T G - Y\|_{2,1} + \lambda \|G\|_{2,1} \tag{4}$$
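To make the l2,1 model concrete, the sketch below evaluates the l2,1-norm as the sum of the Euclidean norms of the rows of a matrix and evaluates an objective of the form (4). It assumes X is stored as a d × n array so that X^T G is n × c, and `lam` stands for the regularization parameter. Ranking features by the row norms of G is a common practice for l2,1-based selection and is shown as an assumption, not as this paper's procedure. The function names and the small matrix are hypothetical.

```python
import numpy as np

def l21_norm(M):
    """Sum of the Euclidean norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

def l21_objective(X, Y, G, lam):
    """l2,1 loss on the residual plus lam times the l2,1 penalty on G."""
    return l21_norm(X.T @ G - Y) + lam * l21_norm(G)

def rank_features(G):
    """Feature indices ordered by decreasing row norm of G."""
    return np.argsort(-np.linalg.norm(G, axis=1))

G = np.array([[3.0, 4.0],   # row norm 5
              [0.0, 0.0],   # row norm 0: this feature is unused by every label
              [0.0, 1.0]])  # row norm 1
print(l21_norm(G))          # 6.0
print(rank_features(G))     # [0 2 1]
```

Because the penalty acts on whole rows, a row driven to zero removes that feature for all labels at once, which is the joint selection effect the l2,1 model is used for.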

2.1.4. l2,p-norm model

Because the l2,1-norm is constructed on the convex l1-norm framework, its sparsity is limited. Considering the sparsity of the lp-norm, Wang et al. [24] have extended the l2,1-norm to the l2,p-matrix norm model for more effective sparse feature selection with good robustness.

The l2,p-matrix norm is defined as

$$\|G\|_{2,p} = \Big( \sum_{i=1}^{d} \|g^i\|_2^p \Big)^{1/p}, \quad p \in (0, 1] \tag{5}$$

where $g^i$ denotes the $i$th row of $G$.

The sparse feature selection model based on the l2,p-matrix norm can be written as

$$\min_{G} \; \|X^T G - Y\|_{2,p}^p + \lambda \|G\|_{2,p}^p \tag{6}$$

When p = 1, the l2,p-matrix norm reduces to the l2,1-norm. When p belongs to (0, 1), the l2,p-matrix norm becomes a sparser model than the l2,1-norm because the lp-norm can find sparser solutions than the l1-norm [24].

For any p in (0, 1), the noise magnitude of a distant outlier in (6) is no more than that in (4). Thus model (6) is expected to be more robust than model (4) [24].

When p = 1, the l2,p-matrix norm, i.e. the l2,1-norm, is convex. But when p belongs to (0, 1), because the lp-norm is neither convex nor Lipschitz continuous, the l2,p-matrix pseudo norm is neither convex nor Lipschitz continuous. Wang et al. have presented a unified algorithm to solve problems involving the l2,p-matrix norm for all p in (0, 1).
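The sparsity and robustness remarks about the l2,p-matrix norm can be checked numerically on small examples. The sketch below computes the l2,p-matrix (pseudo) norm from the row norms of a matrix, together with its p-th power as used in the penalty. The function names and the toy matrices are hypothetical and chosen only for illustration.

```python
import numpy as np

def l2p_norm(M, p):
    """l2,p-matrix (pseudo) norm: (sum_i ||m^i||_2^p)^(1/p), for 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    row_norms = np.linalg.norm(M, axis=1)
    return float(np.sum(row_norms ** p) ** (1.0 / p))

def l2p_penalty(M, p):
    """The p-th power of the l2,p norm: the sum of row norms raised to p."""
    return float(np.sum(np.linalg.norm(M, axis=1) ** p))

# Two matrices with the same l2,1-norm but different row sparsity.
G_sparse = np.array([[2.0, 0.0], [0.0, 0.0]])   # one active row
G_dense = np.array([[1.0, 0.0], [0.0, 1.0]])    # two active rows
for p in (1.0, 0.5):
    print(p, l2p_penalty(G_sparse, p), l2p_penalty(G_dense, p))

# A distant outlier row (norm 100) contributes 100 at p = 1 but only 10 at p = 0.5.
outlier = np.array([[60.0, 80.0]])
print(l2p_penalty(outlier, 1.0), l2p_penalty(outlier, 0.5))
```

At p = 1 the two matrices are penalized equally, whereas at p = 0.5 the row-sparse matrix receives the smaller penalty, which is one way to see why p < 1 favors sparser solutions.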

2.2. Shared subspace learning

In multi-label learning problems, Ando et al. have assumed that there is a shared subspace between labels [25]. Similarly, in feature selection problems, there is a shared subspace between features. The representation in the original feature space and the representation in the shared subspace are used simultaneously to predict the concepts of an image.

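Given a feature vector $x \in \mathbb{R}^d$ and a prediction function $y$, shared subspace learning can be defined as follows:

$$y = v^T x + l^T U^T x \tag{7}$$

where $v \in \mathbb{R}^d$ and $l \in \mathbb{R}^r$ are the weight vectors and $U \in \mathbb{R}^{d \times r}$ is the shared subspace for features.

Denote $V = [v_1, v_2, \ldots, v_c] \in \mathbb{R}^{d \times c}$, $L = [l_1, l_2, \ldots, l_c] \in \mathbb{R}^{r \times c}$ and $U \in \mathbb{R}^{d \times r}$, where $r$ is the dimension of the shared subspace. By defining $G = V + UL$, where $G \in \mathbb{R}^{d \times c}$, the principled framework of supervised learning with shared subspace learning can be written as

$$\min_{V, L, U} \; \mathrm{loss}(G^T X, Y) + \lambda R(V, L) \quad \text{s.t.} \quad U^T U = I \tag{8}$$

Note that the imposed constraint $U^T U = I$ in (8) makes the problem tractable.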
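The decomposition G = V + UL can be verified on random data. The sketch below builds a U with orthonormal columns from a reduced QR factorization (one convenient way to satisfy the constraint, assumed here for illustration), forms G, and checks that predicting through G matches the two-part prediction in (7) applied to each of the c labels. The dimensions and random values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, c = 6, 2, 3          # feature dim, shared-subspace dim, number of classes

# Orthonormal basis U (d x r) from a reduced QR factorization, so U^T U = I.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
V = rng.standard_normal((d, c))   # weights on the original features
L = rng.standard_normal((r, c))   # weights on the shared-subspace features
G = V + U @ L                     # combined projection matrix

x = rng.standard_normal(d)
two_part = V.T @ x + L.T @ (U.T @ x)   # per-class prediction as in (7)
combined = G.T @ x                      # same prediction through G

print(np.allclose(U.T @ U, np.eye(r)))  # True
print(np.allclose(two_part, combined))  # True
```

Here `U.T @ x` is the r-dimensional representation of x in the shared subspace, so each label combines the original features with this shared low-dimensional representation.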

In this paper, we propose a novel sparse feature selection framework which can select the most discriminative features with good robustness, based on the l2,1/2-matrix norm model with shared subspace learning, to boost web image annotation performance.

3. Sparse feature selection based on L2,1/2-matrix norm (SFSL)

In this section, we propose a novel sparse feature selection framework for web image annotation, namely Sparse Feature Selection based on L2,1/2-matrix norm (SFSL). We first introduce the SFSL formulation and then a novel effective algorithm for optimizing the objective function. Finally, we present the convergence analysis of the SFSL algorithm.

3.1. SFSL formulation

In [18,19], Xu et al. have pointed out that lp-norm (0 < p < 1) regularization has better sparsity than l1-norm regularization, and that l1/2-norm regularization in particular has the best performance among all p in (0, 1). The l2,1-norm has been widely used in sparse feature selection because it can realize sparse feature selection across all data points, but it is constructed on the convex l1-norm framework, whose sparsity is worse than that of the lp-norm (0 < p < 1). In [24], Wang et al. have extended the l2,1-norm to a mixed l2,p (0 < p ≤ 1) matrix norm for better sparsity and robustness. Typical values of p in (0, 1) are tested in l2,p-matrix norm based objective functions, and their experimental results show that p = 0.5 clearly outperforms p = 1. Taking into account the better sparsity and robustness of the l2,1/2-matrix norm, we apply it to our sparse feature selection framework SFSL in this paper.

Setting p = 1/2 in (5), the l2,1/2-matrix norm is defined as

$$\|G\|_{2,1/2} = \Big( \sum_{i=1}^{d} \|g^i\|_2^{1/2} \Big)^{2} \tag{9}$$
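As a small worked instance of (9), consider a hypothetical 3 × 2 matrix G, chosen only for illustration, whose rows have Euclidean norms 5, 0 and 1. The computation below also shows the square root of the norm, which is the quantity that enters a p = 1/2 penalty, and the l2,1-norm of the same matrix for comparison.

```latex
\[
G = \begin{pmatrix} 3 & 4 \\ 0 & 0 \\ 0 & 1 \end{pmatrix},
\qquad \|g^1\|_2 = 5, \quad \|g^2\|_2 = 0, \quad \|g^3\|_2 = 1
\]
\[
\|G\|_{2,1/2} = \left( \sqrt{5} + \sqrt{0} + \sqrt{1} \right)^{2} = 6 + 2\sqrt{5} \approx 10.47,
\qquad
\|G\|_{2,1/2}^{1/2} = \sqrt{5} + 1 \approx 3.24,
\qquad
\|G\|_{2,1} = 5 + 0 + 1 = 6
\]
```

The all-zero second row contributes nothing to any of these quantities, while the square root compresses the contribution of the large first row far more than that of the small third row.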

Then the sparse feature selection model based on the l2,1/2-matrix norm can be written as

$$\min_{G} \; \|X^T G - Y\|_{2,1/2}^{1/2} + \lambda \|G\|_{2,1/2}^{1/2} \tag{10}$$

Note that the l2,1/2-matrix norm is neither convex nor Lipschitz continuous.

Moreover, in order to exploit the relational information between different features, we introduce shared subspace learning into our sparse feature selection framework SFSL.

The basic idea of our method SFSL is to combine the l2,1/2-matrix norm model with shared subspace learning to realize sparse feature selection. By integrating the sparse feature selection model ba...