



Robust Image Classification via Low-Rank Double Dictionary Learning

Yi Rong1,2, Shengwu Xiong1(&), and Yongsheng Gao2

1 School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China

r.yi@griffith.edu.au, [email protected]
2 School of Engineering, Griffith University, Brisbane, Australia

yongsheng.gao@griffith.edu.au

Abstract. In recent years, dictionary learning has been widely used in various image classification applications. However, how to construct an effective dictionary for the robust image classification task, in which both the training and the testing image samples are corrupted, is still an open problem. To address this, we propose a novel low-rank double dictionary learning (LRD2L) method. Unlike traditional dictionary learning methods, LRD2L simultaneously learns three components from training data: (1) a low-rank class-specific sub-dictionary for each class to capture the most discriminative features owned by that class, (2) a low-rank class-shared dictionary which models the common patterns shared by different classes, and (3) a sparse error container to fit the noises in the data. As a result, the class-specific information, the class-shared information and the noises contained in the data are separated from each other. Therefore, the dictionaries learned by LRD2L are noiseless, and the class-specific sub-dictionary of each class can be more discriminative. Also, since the common features across different classes, which are essential to the reconstruction of image samples, are preserved in the class-shared dictionary, LRD2L has a powerful reconstructive capability for newly coming testing samples. Experimental results on three publicly available datasets reveal the effectiveness and the superiority of our approach compared to state-of-the-art dictionary learning methods.

Keywords: Low-rank dictionary learning · Class-specific dictionary · Class-shared dictionary · Robust image classification

1 Introduction

Image classification has been an active topic in the areas of machine learning and pattern recognition, due to its wide use in real-world applications such as computational forensics, face recognition and medical diagnosis [1–3]. However, images in real-world applications are usually corrupted. Image corruptions, such as illumination and disguise variations in face images, or pose variations and sparse pixel noises in object images, make the task of image classification more challenging.

© Springer International Publishing AG 2017. L. Amsaleg et al. (Eds.): MMM 2017, Part I, LNCS 10132, pp. 316–328, 2017. DOI: 10.1007/978-3-319-51811-4_26

In recent years, sparse representation based methods have led to state-of-the-art results in image analysis. The core idea of sparse representation is to encode the test image as a linear combination of a few atoms chosen from a given dictionary [4, 5]. The dictionary plays an important role in the sparse coding process, and how to construct an effective dictionary is a key issue for sparse representation based methods. Wright et al. [6] proposed the sparse representation based classifier (SRC), which is based on the data self-expressive property and seeks the sparse representation of the test image in terms of all the training samples. SRC has achieved promising results in many applications. However, directly using the original training samples as the dictionary may not fully exploit the discriminative information hidden in the training samples, and the classification performance of SRC degrades significantly when sufficient training data are not available.

Different from taking the training set as the dictionary, many recently proposed approaches learn dictionaries from the training samples. The K-SVD method [7] learns an over-complete dictionary from training samples by solving the sparse coding problem and updating dictionary atoms iteratively. Based on K-SVD, Zhang and Li [8] proposed a discriminative K-SVD (DK-SVD) method for face recognition by integrating a classification error term into the objective function. Jiang et al. [9] proposed a label consistent K-SVD (LC-KSVD) algorithm that introduces a binary code matrix to force samples from the same class to have similar representations, which makes the dictionary more discriminative. Ramirez et al. [10] proposed to learn a class-specific sub-dictionary for each class and introduced a structured incoherence regularization term to make the sub-dictionaries independent. By imposing the Fisher criterion on the sparse coding coefficients, Yang et al. [11] proposed a Fisher discrimination dictionary learning (FDDL) method that makes the coding coefficients have small within-class scatter and large between-class scatter. Kong and Wang [12] proposed to learn a particular dictionary (called the particularity) for each class, and a common pattern pool (called the commonality) shared by all classes. In this way, the information shared by different sub-dictionaries is separated out, so the particularity can be more compact and discriminative. Recently, Gu et al. [13] proposed a projective dictionary pair learning (DPL) method which simultaneously learns an analysis dictionary and a synthesis dictionary for pattern classification. The methods above work well when the image data are noiseless, and some can even handle corruptions in the testing samples. Unfortunately, they do not generalize well when the training and testing samples are both badly corrupted: if the training data are grossly corrupted, the corruptions will be introduced into the learned dictionary, resulting in degraded classification performance.

Recently, much attention has been drawn to low-rank matrix recovery theory, which has exhibited excellent performance in handling large image corruptions. By imposing a low-rank regularization on each sub-dictionary, Ma et al. [14] proposed to learn a discriminative low-rank dictionary for sparse representation (DLRD_SR) for robust image classification. Li et al. [15] proposed a discriminative dictionary learning with low-rank regularization (D2L2R2) method that applies the Fisher discriminant function to the sparse coding coefficients, making the representation coefficients more discriminative. Through rank minimization, the sparse noises are separated from the training samples, and the dictionary atoms are updated to reconstruct the recovered samples, so that the dictionary atoms can be more noiseless. However, due to the low-rank regularization on each sub-dictionary, certain common information across different classes is removed. Thus, the dictionaries learned by these methods have a weak representative power for the testing samples.

In this paper, we focus on the robust image classification problem, in which both training and testing samples are corrupted. This problem is challenging because, due to image corruptions, the image samples often have large intra-class variations and small inter-class differences. Inspired by [12], we observe that an image in the robust image classification task often contains three types of information (see examples from a face recognition task in Fig. 1): (1) Class-specific information: the most discriminative features owned by one class. (2) Class-shared information: the common patterns shared by different classes. This information is essential to the representation of images but does not contribute to the discriminability between classes. For example, in face recognition, face images from different classes often share the same illumination, expression and disguise variations. Such shared information is not helpful for distinguishing classes, but without it, images with similar variations cannot be well represented. (3) Sparse noises, such as pixel corruptions, which are useless for image classification and often enlarge the intra-class differences. If we can separate the class-specific information from the others and construct a dictionary to capture it, this dictionary will be more noiseless, and the sparse representation over such a dictionary will be more discriminative.

Based on the above observations, we propose a novel dictionary learning method, called low-rank double dictionary learning (LRD2L), to separate these three types of information from each other. More specifically, given a labeled training sample set, we propose to learn a low-rank class-specific sub-dictionary for each class, which captures the most discriminative features of the corresponding class. Simultaneously, a low-rank class-shared dictionary is constructed across all classes to represent the common patterns shared by different classes. With the help of the class-shared dictionary, the proposed method has a powerful reconstructive capability for new testing samples. A sparse error term is also introduced to approximate the sparse noises contained in the samples. Since the noises are separated from the samples through the error term, the learned dictionaries can be more noiseless. The main contributions of this paper are summarized as follows:

Fig. 1. Taking the face recognition task as an example: (a) original gray scale images; (b) class-specific information; (c) class-shared information; (d) sparse noises. Panels (b), (c) and (d) are the class-specific information, the class-shared information and the sparse noises in the original images (a), respectively.



1. Based on low-rank matrix recovery theory, we propose a novel low-rank double dictionary learning (LRD2L) method to address the robust image classification problem, in which both the training and the testing data contain corruptions. By separating the sparse noises from the training samples through the error term, the learned class-specific sub-dictionaries and class-shared dictionary can be noiseless. Therefore, the proposed method is robust to extreme noises and gross corruptions.

2. Different from existing low-rank dictionary learning methods, LRD2L has a strong representative capability for newly coming testing samples. This is because the common patterns in the training data are preserved in the class-shared dictionary, so testing samples with similar common patterns can be reasonably represented by the combination of the class-specific and class-shared dictionaries.

3. An alternating optimization algorithm is designed to effectively solve the optimization problem of the proposed method. Experimental evaluations on three publicly available datasets demonstrate the effectiveness and robustness of LRD2L.

The remainder of this paper is organized as follows. In Sect. 2, we introduce the novel low-rank double dictionary learning (LRD2L) model in detail. Section 3 presents the alternating algorithm for solving the corresponding optimization problem. Experimental results on three public databases under different experimental settings are reported in Sect. 4 to demonstrate the effectiveness of our method. Finally, the paper concludes in Sect. 5.

2 The Proposed LRD2L Method

Given a training sample set $Y = [Y_1, Y_2, \ldots, Y_C] \in \mathbb{R}^{d \times N}$, which consists of $N$ training samples from $C$ different classes, $Y_i \in \mathbb{R}^{d \times N_i}$ denotes the sub-matrix consisting of all the training samples from the $i$-th class, $d$ denotes the dimension of the samples, and $N_i$ is the number of training samples from the $i$-th class, satisfying $N = \sum_{i=1}^{C} N_i$.

The goal of LRD2L is to simultaneously construct a class-specific dictionary $A = [A_1, A_2, \ldots, A_C] \in \mathbb{R}^{d \times m_A}$ and a class-shared dictionary $B \in \mathbb{R}^{d \times m_B}$, to represent the class-specific and the class-shared information of the data, respectively. $m_A$ and $m_B$ denote the dictionary sizes of $A$ and $B$, and $A_i$ is the sub-dictionary associated with the $i$-th class. Therefore, we can represent the training samples from each class as follows:

$$Y_i = A X_i + B Z_i + E_i, \quad i = 1, 2, \ldots, C, \qquad (1)$$

where $X_i \in \mathbb{R}^{m_A \times N_i}$ and $Z_i \in \mathbb{R}^{m_B \times N_i}$ are the representation coefficient matrices of $Y_i$ over dictionaries $A$ and $B$, respectively, and $E_i \in \mathbb{R}^{d \times N_i}$ is an error term approximating the sparse noises in $Y_i$; the sparse error term of the whole training set is denoted by $E = [E_1, E_2, \ldots, E_C] \in \mathbb{R}^{d \times N}$. In order to design each term in (1) appropriately and separate the three types of information in the training samples correctly, we analyze the properties of $A$, $B$ and $E$ as follows.

– If the image data are clean (i.e., containing only the class-specific information of their class), the images belonging to the same class tend to be drawn from the same subspace, while the images from different classes are drawn from different subspaces. Thus, image samples of the same class are usually highly correlated, whereas image samples from different classes are independent. Therefore, the class-specific sub-dictionary $A_i$, which captures the class-specific information in the training samples from the $i$-th class ($i = 1, 2, \ldots, C$), is expected to be low-rank.

– The class-shared information represents the common patterns shared by all classes. Since the common features of different classes often have coherences or even share the same atoms, the class-shared components of different classes are highly linearly correlated. Therefore, the class-shared dictionary $B$, which captures the class-shared information of all training samples, should reasonably be a low-rank matrix.

– Because the sparse noises often contaminate a relatively small portion of the whole image and are statistically uncorrelated between different images [6], the error term of each class $E_i$ should be a sparse matrix.
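The three-part decomposition in Eq. (1) can be sanity-checked dimensionally. A minimal NumPy sketch (all sizes are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

d, C = 64, 3            # hypothetical sample dimension and number of classes
N_i = 10                # samples per class (assumed equal here for simplicity)
m_A, m_B = C * 5, 8     # sizes of class-specific dictionary A and shared dictionary B

A = rng.standard_normal((d, m_A))   # concatenation [A_1, ..., A_C]
B = rng.standard_normal((d, m_B))   # class-shared dictionary
X_i = rng.standard_normal((m_A, N_i))  # coefficients of class i over A
Z_i = rng.standard_normal((m_B, N_i))  # coefficients of class i over B
E_i = np.zeros((d, N_i))               # sparse error term (empty in this sketch)

# Eq. (1): each class sub-matrix splits into specific + shared + noise parts
Y_i = A @ X_i + B @ Z_i + E_i
print(Y_i.shape)  # (64, 10)
```

Each of the three terms contributes a $d \times N_i$ matrix, so the sum reconstructs the class sub-matrix $Y_i$.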

Based on the above analyses and sparse representation theory, we propose the following model for our low-rank double dictionary learning:

$$\min_{A_i, X_i, B, Z_i, E_i} \|A_i\|_* + \|B\|_* + \alpha \left( \|X_i\|_1 + \|Z_i\|_1 \right) + \beta \|E_i\|_1 \quad \text{s.t.} \quad Y_i = A X_i + B Z_i + E_i, \ i = 1, 2, \ldots, C, \qquad (2)$$

where $\alpha$ and $\beta$ are positive-valued parameters that balance the sparsity of the coding coefficient matrices (i.e., $X_i$ and $Z_i$) and the weight of the error term, respectively. $\|\cdot\|_*$ is the nuclear norm of a matrix, i.e., the sum of its singular values; $\|A_i\|_*$ and $\|B\|_*$ enforce each class-specific sub-dictionary $A_i$ and the class-shared dictionary $B$ to be low-rank. $\|\cdot\|_1$ is the $\ell_1$-norm of a matrix, and the $\ell_1$ regularization $\|E_i\|_1$ promotes the sparseness of the error term $E_i$.
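The two regularizers in Eq. (2) are easy to compute directly. A small NumPy sketch (our own illustration, not the authors' code) of both norms on a rank-1 example:

```python
import numpy as np

def nuclear_norm(M):
    # ||M||_*: sum of singular values; small when M is (near) low-rank
    return np.linalg.svd(M, compute_uv=False).sum()

def l1_norm(M):
    # ||M||_1: entrywise sum of absolute values; small when M is sparse
    return np.abs(M).sum()

M = np.outer(np.arange(1.0, 4.0), np.ones(3))  # rank-1 3x3 matrix
print(l1_norm(M))       # 18.0  -> (1+2+3) repeated over 3 columns
print(nuclear_norm(M))  # equals sqrt(42): the single nonzero singular value
```

Penalizing the nuclear norm drives singular values toward zero (low rank), while penalizing the $\ell_1$-norm drives entries toward zero (sparsity); this is why they are attached to the dictionaries and the error term, respectively.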

To further enhance the discrimination of the learned dictionaries, firstly, since $A_i$ is the sub-dictionary associated with the $i$-th class, it is supposed to represent well the class-specific component of the $i$-th class. Therefore, by rewriting the coefficient matrix $X_i$ as $X_i = [X_{i1}; X_{i2}; \ldots; X_{iC}]$, where $X_{ij}$ denotes the coefficients of $Y_i$ corresponding to $A_j$, we can impose the constraint $Y_i = A_i X_{ii} + B Z_i + E_i$. Secondly, the class-specific components of different classes should be incoherent; therefore, for $Y_j$, the coefficients $X_{ji}$ ($i \neq j$) are expected to be zero matrices. To this end, an incoherence term

$$R(A_i) = \sum_{j=1, j \neq i}^{C} \|A_i X_{ji}\|_F^2, \quad i = 1, 2, \ldots, C,$$

is introduced into the objective function of the proposed model. Minimizing this term makes the correlation between $Y_j$ and $A_i$ ($i \neq j$) as small as possible. Considering both factors, the objective function of Eq. (2) is further improved for LRD2L as follows:

$$\min_{A_i, X_i, B, Z_i, E_i} \|A_i\|_* + \|B\|_* + \alpha \left( \|X_i\|_1 + \|Z_i\|_1 \right) + \beta \|E_i\|_1 + \lambda R(A_i) \quad \text{s.t.} \quad Y_i = A X_i + B Z_i + E_i, \ Y_i = A_i X_{ii} + B Z_i + E_i, \qquad (3)$$

where $\lambda > 0$ is a parameter that controls the contribution of the incoherence term. Equation (3) is the overall objective function of our LRD2L model, and we present the optimization algorithm for solving this problem in Sect. 3.
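The incoherence term $R(A_i)$ penalizes $A_i$ for reconstructing samples of other classes. A direct NumPy transcription (dimensions and variable names are our own, for illustration):

```python
import numpy as np

def incoherence(A_i, X_ji_list, i):
    """R(A_i) = sum over j != i of ||A_i @ X_ji||_F^2.

    X_ji_list[j] holds X_ji, the coefficients of class j's samples Y_j
    over the sub-dictionary A_i (shapes here are illustrative)."""
    return sum(np.linalg.norm(A_i @ X_ji, 'fro') ** 2
               for j, X_ji in enumerate(X_ji_list) if j != i)

rng = np.random.default_rng(0)
A_i = rng.standard_normal((16, 5))                   # hypothetical sub-dictionary
X = [rng.standard_normal((5, 4)) for _ in range(3)]  # one coefficient block per class
print(incoherence(A_i, X, i=0) > 0.0)  # True; zero only if all X_ji (j != i) vanish
```

Minimizing this quantity during the $A_i$-update pushes the cross-class coefficients $X_{ji}$ toward zero, which is exactly the incoherence behavior the model asks for.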



3 Optimization Procedure

In this section, we propose an effective algorithm for the optimization problem in (3), dividing it into three sub-problems that are solved iteratively:

1. Updating the coding coefficients $X_i$ and $Z_i$ class by class, with the dictionaries $A$, $B$ and the other coefficients $X_j$ and $Z_j$ ($j \neq i$) fixed.

2. Updating the class-specific sub-dictionary $A_i$ class by class, with the other variables fixed. The corresponding coefficients $X_{ii}$ are also updated to satisfy the constraint $Y_i = A_i X_{ii} + B Z_i + E_i$.

3. Updating the class-shared dictionary $B$ and the corresponding coefficients $Z$ iteratively to satisfy the constraint $Y_i = A X_i + B Z_i + E_i$.

Similar to [14], the error term $E_i$ is updated in each sub-problem. Because the weight of the error $E_i$ should be adjusted differently for the constraints $Y_i = A_i X_{ii} + B Z_i + E_i$ and $Y_i = A X_i + B Z_i + E_i$, the parameter $\beta$ in the second sub-problem is set differently from that in the other two sub-problems.

3.1 Updating Coding Coefficients $X_i$ and $Z_i$

Suppose that the dictionaries $A$ and $B$ are given; the coding coefficients $X_i$ and $Z_i$ are then updated class by class. When calculating $X_i$ and $Z_i$, all the other coefficients $X_j$ and $Z_j$ ($j \neq i$) are fixed. Thus, ignoring the terms unrelated to $X_i$ and $Z_i$, problem (3) reduces to a sparse coding problem, which can be formulated as follows:

$$\min_{X_i, Z_i, E_i} \alpha \left( \|X_i\|_1 + \|Z_i\|_1 \right) + \beta_1 \|E_i\|_1 \quad \text{s.t.} \quad Y_i = A X_i + B Z_i + E_i. \qquad (4)$$

Note that $\|[X_i; Z_i]\|_1 = \|X_i\|_1 + \|Z_i\|_1$ and the constraint can be rewritten as $Y_i = [A, B][X_i; Z_i] + E_i$. Therefore, by defining $P_i = [X_i; Z_i]$ and $D = [A, B]$, and introducing an auxiliary variable $H$, problem (4) is converted to the equivalent problem:

$$\min_{P_i, E_i, H} \alpha \|H\|_1 + \beta_1 \|E_i\|_1 \quad \text{s.t.} \quad Y_i = D P_i + E_i, \ P_i = H. \qquad (5)$$

The above problem can be solved efficiently by the Augmented Lagrange Multipliers (ALM) method [16], which minimizes the corresponding augmented Lagrange function of problem (5):

$$\min_{P_i, E_i, H} \alpha \|H\|_1 + \beta_1 \|E_i\|_1 + \langle T_1, Y_i - D P_i - E_i \rangle + \langle T_2, P_i - H \rangle + \frac{\mu}{2} \left( \|Y_i - D P_i - E_i\|_F^2 + \|P_i - H\|_F^2 \right), \qquad (6)$$

where $T_1$ and $T_2$ are Lagrange multipliers and $\mu > 0$ is a positive penalty parameter. $\langle A, B \rangle = \mathrm{Tr}(A^T B)$ is the sum of the diagonal elements of the matrix $A^T B$, and $A^T$ is the transpose of the matrix $A$.
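A minimal ALM/ADMM-style iteration for problem (5) might look as follows. This is a sketch under our own choice of update order and penalty schedule (parameter values are assumptions), not the authors' released implementation:

```python
import numpy as np

def soft_threshold(M, tau):
    # proximal operator of tau * ||.||_1
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def solve_coefficients(Y, D, alpha, beta1, n_iter=200, mu=1e-1, rho=1.1, mu_max=1e6):
    """ALM sketch for Eq. (5): min alpha*||H||_1 + beta1*||E||_1
       s.t. Y = D P + E, P = H."""
    d, n = Y.shape
    m = D.shape[1]
    P = np.zeros((m, n)); H = np.zeros((m, n)); E = np.zeros((d, n))
    T1 = np.zeros((d, n)); T2 = np.zeros((m, n))
    I = np.eye(m)
    for _ in range(n_iter):
        # P-step: least squares from the two quadratic terms in Eq. (6)
        rhs = D.T @ (Y - E + T1 / mu) + H - T2 / mu
        P = np.linalg.solve(D.T @ D + I, rhs)
        # H-step: l1 proximal step on P + T2/mu
        H = soft_threshold(P + T2 / mu, alpha / mu)
        # E-step: l1 proximal step on the reconstruction residual
        E = soft_threshold(Y - D @ P + T1 / mu, beta1 / mu)
        # multiplier and penalty updates
        T1 += mu * (Y - D @ P - E)
        T2 += mu * (P - H)
        mu = min(rho * mu, mu_max)
    return H, E  # H plays the role of [X_i; Z_i]

rng = np.random.default_rng(1)
D = rng.standard_normal((20, 10)); D /= np.linalg.norm(D, axis=0)
Y = D @ soft_threshold(rng.standard_normal((10, 5)), 1.0)  # sparse ground truth codes
H, E = solve_coefficients(Y, D, alpha=0.1, beta1=0.1)
print(np.linalg.norm(Y - D @ H - E) < 1e-2)  # True: constraint nearly satisfied
```

The recovered $H$ would then be split row-wise into $X_i$ (first $m_A$ rows) and $Z_i$ (remaining $m_B$ rows).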



3.2 Updating Class-Specific Sub-dictionary $A_i$

With the learned coefficients $X_i$, the sub-dictionary $A_i$ is updated class by class, and the corresponding coefficients $X_{ii}$ are also updated to satisfy the constraint $Y_i = A_i X_{ii} + B Z_i + E_i$. The second sub-problem is then converted to the following problem:

$$\min_{A_i, X_{ii}, E_i} \|A_i\|_* + \alpha \|X_{ii}\|_1 + \beta_2 \|E_i\|_1 + \lambda R(A_i) \quad \text{s.t.} \quad Y_i = A_i X_{ii} + B Z_i + E_i. \qquad (7)$$

For mathematical brevity, we define $Y_A = Y_i - B Z_i$. Two auxiliary variables $J$ and $S$ are introduced, and problem (7) becomes the equivalent optimization problem:

$$\min_{A_i, J, X_{ii}, S, E_i} \|J\|_* + \alpha \|S\|_1 + \beta_2 \|E_i\|_1 + \lambda R(A_i) \quad \text{s.t.} \quad Y_A = A_i X_{ii} + E_i, \ A_i = J, \ X_{ii} = S. \qquad (8)$$

Problem (8) can be solved via the following augmented Lagrange multiplier problem through the ALM method:

$$\min_{A_i, J, X_{ii}, S, E_i} \|J\|_* + \alpha \|S\|_1 + \beta_2 \|E_i\|_1 + \lambda R(A_i) + \langle T_1, Y_A - A_i X_{ii} - E_i \rangle + \langle T_2, A_i - J \rangle + \langle T_3, X_{ii} - S \rangle + \frac{\mu}{2} \left( \|Y_A - A_i X_{ii} - E_i\|_F^2 + \|A_i - J\|_F^2 + \|X_{ii} - S\|_F^2 \right), \qquad (9)$$

where $T_1$, $T_2$ and $T_3$ are Lagrange multipliers and $\mu > 0$ is a positive penalty parameter. As in existing dictionary learning methods [11, 12, 14], we also require each column of the learned dictionary to have unit length, so a normalization operation is added after updating $J$ and $A_i$.
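Within this sub-problem, the $J$-update is a singular value thresholding (SVT) step, i.e. the proximal operator of the nuclear norm, followed by the column normalization mentioned above. A hedged sketch (sizes and the added perturbation are illustrative):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: argmin_J tau*||J||_* + 0.5*||J - M||_F^2."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def normalize_columns(D, eps=1e-12):
    """Rescale each dictionary atom to unit l2 length."""
    norms = np.linalg.norm(D, axis=0)
    return D / np.maximum(norms, eps)

# SVT shrinks small singular values to zero, lowering the rank of the update
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 4)) @ rng.standard_normal((4, 6))  # rank <= 4
tau = np.linalg.svd(M, compute_uv=False)[1]  # threshold at the 2nd singular value
J = svt(M, tau)                              # keeps only the leading direction
print(np.linalg.matrix_rank(J))  # 1

A_i = normalize_columns(J + 1e-3 * rng.standard_normal(J.shape))
print(np.allclose(np.linalg.norm(A_i, axis=0), 1.0))  # True
```

This pairing (SVT for the nuclear-norm term, then unit-norm columns) mirrors how low-rank dictionary updates are typically realized inside ALM loops.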

3.3 Updating Class-Shared Dictionary B

When all the class-specific sub-dictionaries have been updated, we learn the class-shared dictionary $B$ with all the other variables fixed. Hence, the objective function in (3) is reduced to

$$\min_{B, Z_i, E_i} \|B\|_* + \alpha \|Z_i\|_1 + \beta_1 \|E_i\|_1 \quad \text{s.t.} \quad Y_i = A X_i + B Z_i + E_i. \qquad (10)$$

Different from the class-specific sub-dictionary $A_i$, which is associated with only one class, the class-shared dictionary $B$ is related to all the classes. Therefore, when updating $B$, we need to consider the relationships corresponding to all classes together. By summing the objective function in Eq. (10) over all classes, optimization problem (10) can be converted to the following optimization problem:

$$\min_{B, [Z_1, \ldots, Z_C], [E_1, \ldots, E_C]} \|B\|_* + \alpha \|[Z_1, Z_2, \ldots, Z_C]\|_1 + \beta_1 \|[E_1, E_2, \ldots, E_C]\|_1 \quad \text{s.t.} \quad [Y_1, Y_2, \ldots, Y_C] = A [X_1, X_2, \ldots, X_C] + B [Z_1, Z_2, \ldots, Z_C] + [E_1, E_2, \ldots, E_C]. \qquad (11)$$

Since $X = [X_1, X_2, \ldots, X_C]$, $Z = [Z_1, Z_2, \ldots, Z_C]$ and $E = [E_1, E_2, \ldots, E_C]$, Eq. (11) can be reformulated as

$$\min_{B, Z, E} \|B\|_* + \alpha \|Z\|_1 + \beta_1 \|E\|_1 \quad \text{s.t.} \quad Y = A X + B Z + E. \qquad (12)$$

We can observe that the objective function of problem (12) has the same form as that of problem (7), except for the incoherence term $R(A_i)$ in Eq. (7). Therefore, by setting the parameter $\lambda$ to zero, problem (12) can be solved using the same optimization strategy as problem (7), through the ALM method.

So far, the algorithms for the three sub-problems have been presented. The optimization procedures of (4), (7) and (12) need to iterate a few times to obtain the solution of LRD2L. The complete algorithm of LRD2L is summarized in Algorithm 1.

4 Experimental Results

In this section, we evaluate the effectiveness and the robustness of the proposed LRD2L method on three publicly available datasets under several experimental settings. The performance of our approach is compared with six state-of-the-art methods: the sparse representation classifier (SRC) [6], Fisher discrimination dictionary learning (FDDL) [11], label consistent K-SVD [9] version 1 (LC-KSVD1) and version 2 (LC-KSVD2), DL-COPAR [12] and discriminative low-rank dictionary for sparse representation (DLRD_SR) [14]. As in [17, 18], the image samples used in this paper are normalized to unit length (i.e., the $\ell_2$-norm of each image vector equals one). There are four parameters $\alpha$, $\lambda$, $\beta_1$ and $\beta_2$ to be tuned in LRD2L; however, we find that changes in $\alpha$ and $\lambda$ do not affect the classification results very much. Therefore, we set $\alpha = 0.1$, $\lambda = 1$ for all the experiments in this paper. $\beta_1$, $\beta_2$ and the parameters of the competing algorithms are tuned manually for each experimental setting to obtain the best classification performance.

4.1 Face Recognition with Pixel Corruptions

The Extended YaleB dataset [19] consists of 2414 near-frontal face images of 38 individuals, with each individual having around 59–64 images. The images are captured under various laboratory-controlled illumination conditions. For each subject, we randomly select 30 face images to compose the training set, and the remaining images are used as testing samples. All the images are manually cropped and normalized to the size of 32 × 32 pixels.

To evaluate the robustness of different methods to image corruption, a certain percentage of pixels (from 0% to 40%) randomly selected from each training and testing image are replaced with noise uniformly distributed over $[0, V_{max}]$, where $V_{max}$ is the maximal pixel value of the image. For our approach, the parameters are set as $\beta_1 = 0.08$, $\beta_2 = 0.055$. The numbers of atoms of each class-specific sub-dictionary and of the class-shared dictionary are set to 15 and 200, respectively. For fair comparison, the dictionary sizes of the competing methods are set to 760 (20 atoms per class), and we also set the size of the common feature dictionary in DL-COPAR to 20. SRC uses all the training samples as its dictionary. All experiments are repeated 5 times and the average classification accuracies are reported in Table 1.
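The pixel-corruption protocol described above can be reproduced as follows; this is a sketch of how we read the setup (the function name and RNG choices are ours):

```python
import numpy as np

def corrupt(image, percent, rng):
    """Replace `percent`% of randomly chosen pixels with noise uniformly
    distributed over [0, Vmax], where Vmax is the maximal pixel value
    of the image (as described in Sect. 4.1)."""
    img = image.astype(float).copy()
    n = img.size
    k = int(round(n * percent / 100.0))          # number of pixels to corrupt
    idx = rng.choice(n, size=k, replace=False)   # distinct pixel positions
    vmax = img.max()
    img.flat[idx] = rng.uniform(0.0, vmax, size=k)
    return img

rng = np.random.default_rng(0)
face = rng.uniform(0.0, 255.0, size=(32, 32))  # stand-in for a cropped 32x32 face
noisy = corrupt(face, percent=20, rng=rng)
print(int((noisy != face).sum()) <= 205)  # True: 20% of 1024 pixels ~= 205 replacements
```

The same routine would be applied independently to every training and testing image at each corruption level.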

Table 1. Classification results (%) of different methods on the Extended YaleB dataset under different percentages of pixel noise.

Methods        | 0%     | 10%    | 20%    | 30%    | 40%
---------------|--------|--------|--------|--------|-------
SRC            | 97.80% | 80.46% | 67.19% | 50.94% | 36.19%
FDDL           | 97.93% | 76.22% | 61.93% | 44.51% | 27.94%
LC-KSVD1       | 96.22% | 83.64% | 70.86% | 54.87% | 36.34%
LC-KSVD2       | 96.83% | 83.67% | 70.75% | 55.65% | 36.86%
DL-COPAR       | 97.25% | 85.71% | 73.63% | 57.61% | 40.74%
DLRD_SR        | 98.12% | 92.07% | 85.71% | 74.41% | 54.16%
Proposed LRD2L | 99.43% | 98.50% | 96.88% | 86.68% | 71.65%

From Table 1, it can be seen that LRD2L consistently obtains the highest classification accuracies under all pixel corruption levels. As the percentage of pixel corruption increases, the performance of the competing methods degrades remarkably, whereas the classification accuracy of LRD2L decreases much more slowly. For example, when the percentage of pixel corruption increases from 0% to 20%, the noise has little influence on the performance of LRD2L (decreasing from 99.43% to 96.88%). However, the classification accuracies of the other methods suffer a dramatic decrease; in the worst case, the performance of FDDL drops from 97.93% to 61.93%. This demonstrates that capturing the low-rankness property of the dictionaries and introducing an error term to model the noises in the images make our approach robust to pixel corruptions.

4.2 Face Recognition with Occlusion

The AR dataset [20] contains more than 4000 frontal-view face images of 126 subjects. For each subject, there are 26 images captured in two different sessions with a two-week interval, under different illumination conditions, expression changes and particular disguises. To evaluate the effectiveness of different methods in dealing with occlusions in both training and testing samples, following the experimental settings in [21, 22], we conduct experiments under three different scenarios:

– Sunglasses: For each subject, all seven undisguised images and one randomly selected image with sunglasses from session 1 are used to construct the training set. The testing set consists of the seven undisguised images from session 2 and all remaining images with sunglasses. Thus, there are 8 training samples and 12 testing samples per individual. The sunglasses cover about 20% of the face image.

– Scarf: Replace the images with sunglasses in the above scenario by images with a scarf. The scarf covers about 40% of the face image.

– Mixed: For each subject, all seven undisguised images, one randomly selected image with sunglasses and one randomly selected image with a scarf from session 1 are chosen as training samples. The remaining images compose the testing set. Thus, there are 9 training samples and 17 testing samples per subject.

In all the above experiments, the parameters of our approach are fixed as $\beta_1 = 0.2$, $\beta_2 = 0.01$. The sizes of each class-specific sub-dictionary and of the class-shared dictionary are set to 5 and 200, respectively. All the experiments are repeated 5 times. Table 2 reports the mean classification results of the different approaches under the three scenarios.

Table 2. Classification results (%) of different methods on the AR dataset under the three scenarios.

Methods        | Sunglasses | Scarf  | Mixed
---------------|-----------|--------|-------
SRC            | 89.83%    | 89.17% | 88.59%
FDDL           | 90.08%    | 89.58% | 89.47%
LC-KSVD1       | 87.75%    | 86.53% | 86.24%
LC-KSVD2       | 89.36%    | 88.81% | 87.76%
DL-COPAR       | 90.08%    | 89.17% | 88.24%
DLRD_SR        | 94.08%    | 92.58% | 91.53%
Proposed LRD2L | 95.28%    | 94.23% | 93.59%



Again, our approach obtains the best classification results. More specifically, LRD2L outperforms DLRD_SR by 1.20% in the sunglasses scenario, 1.65% in the scarf scenario and 2.06% in the mixed scenario. Compared with the other competing methods, LRD2L achieves improvements of at least 5.20%, 4.65% and 4.12% in the three scenarios, respectively. This suggests that the low-rank regularizations on the dictionaries enhance the robustness of LRD2L to disguise occlusions and that, compared to DLRD_SR, learning the class-shared dictionary makes our approach more discriminative.

4.3 Object Recognition with Pose Variants

The COIL-20 object dataset [23] consists of 1440 grayscale images of 20 objects with various poses. Each object has 72 images, captured from equally spaced views, i.e., one image is taken every 5 degrees. In this experiment, as in [24, 25], 10 images per class are randomly selected as training samples, while the remaining images compose the testing set. All the images are manually cropped and resized to 32 × 32 pixels. For our approach, the parameters are set as $\beta_1 = 0.07$, $\beta_2 = 0.04$. The sizes of each class-specific sub-dictionary and of the class-shared dictionary are set to 6 and 80, respectively. For the competing methods, the size of each sub-dictionary is set to 10. For DL-COPAR, the common pattern pool size is set to 10. The experiment is repeated 5 times for each method, and the average classification accuracies of the different approaches are reported in Table 3.

As shown in Table 3, on the COIL-20 object dataset, DLRD_SR does not work well and only achieves an 89.63% classification rate, which is lower than DL-COPAR and LC-KSVD. This may be because, without sufficient training samples, the pose variations in the data make the low-rankness property of each class less obvious. Moreover, due to the low-rank regularization on each sub-dictionary, a portion of the class-shared information is removed along with the sparse noises. By learning the class-shared dictionary to capture the pose variations, LRD2L achieves the highest classification accuracy of 92.05%, outperforming the competing methods by margins of over 1.18%.

Table 3. Classification results of different methods on COIL-20 dataset.

Methods     Classification accuracy   Methods          Classification accuracy
SRC         87.50%                    DL-COPAR         90.42%
FDDL        87.98%                    DLRD_SR          89.63%
LC-KSVD1    90.68%                    Proposed LRD2L   92.05%
LC-KSVD2    90.87%

326 Y. Rong et al.


5 Conclusion

In this paper, we propose a novel dictionary learning model, namely low-rank double dictionary learning (LRD2L), to learn a robust and discriminative dictionary from corrupted data for the robust image classification task. Unlike traditional dictionary learning methods, besides learning a low-rank class-specific sub-dictionary for each class, our method also learns a low-rank class-shared dictionary to represent the common features shared by all classes, and a sparse error term to approximate the sparse noises in the data. By separating the class-specific information from the other information, each class-specific sub-dictionary captures only the most discriminative features of its class, which makes the proposed LRD2L model more robust and discriminative. By capturing the common features, which are essential to the representation of image samples, through the class-shared dictionary, the reconstructive capability of LRD2L is enhanced. Experimental results on three public datasets demonstrate the effectiveness and superiority of LRD2L compared to the state-of-the-art methods.

References

1. Yang, M., Van Gool, L., Kong, H.: Sparse variation dictionary learning for face recognition with a single training sample per person. In: ICCV (2013)

2. Yang, M., Zhang, L., Yang, J., Zhang, D.: Metaface learning for sparse representation based face recognition. In: ICIP (2010)

3. Li, S., Yin, H., Fang, L.: Group-sparse representation with dictionary learning for medical image denoising and fusion. IEEE Trans. Biomed. Eng. 59(12), 3450–3459 (2012)

4. Rubinstein, R., Bruckstein, A.M., Elad, M.: Dictionaries for sparse representation modeling. Proc. IEEE 98(6), 1045–1057 (2010)

5. Wright, J., Mairal, J., Sapiro, G., Huang, T.S., Yan, S.: Sparse representation for computer vision and pattern recognition. Proc. IEEE 98(6), 1031–1044 (2010)

6. Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S.: Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009)

7. Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 54(11), 4311–4322 (2006)

8. Zhang, Q., Li, B.: Discriminative K-SVD for dictionary learning in face recognition. In: CVPR (2010)

9. Jiang, Z., Lin, Z., Davis, L.S.: Label consistent K-SVD: learning a discriminative dictionary for recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(11), 2651–2664 (2013)

10. Ramirez, I., Sprechmann, P., Sapiro, G.: Classification and clustering via dictionary learning with structured incoherence and shared features. In: CVPR (2010)

11. Yang, M., Zhang, D., Feng, X.: Fisher discrimination dictionary learning for sparse representation. In: ICCV (2011)

12. Kong, S., Wang, D.: A dictionary learning approach for classification: separating the particularity and the commonality. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7572, pp. 186–199. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33718-5_14

Robust Image Classification via LRD2L 327


13. Gu, S., Zhang, L., Zuo, W., Feng, X.: Projective dictionary pair learning for pattern classification. In: NIPS (2014)

14. Ma, L., Wang, C., Xiao, B., Zhou, W.: Sparse representation for face recognition based on discriminative low-rank dictionary learning. In: CVPR (2012)

15. Li, L., Li, S., Fu, Y.: Learning low-rank and discriminative dictionary for image classification. Image Vis. Comput. 32(10), 814–823 (2014)

16. Lin, Z., Liu, R., Su, Z.: Linearized alternating direction method with adaptive penalty for low-rank representation. In: NIPS (2011)

17. Zhuang, L., Gao, S., Tang, J., Wang, J.: Constructing a nonnegative low-rank and sparse graph with data-adaptive features. IEEE Trans. Image Process. 24(11), 3717–3728 (2015)

18. Li, S., Fu, Y.: Low-rank coding with b-matching constraint for semi-supervised classification. In: IJCAI (2013)

19. Georghiades, A.S., Belhumeur, P.N., Kriegman, D.J.: From few to many: illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 643–660 (2001)

20. Martinez, A.M.: The AR face database. CVC Technical report (1998)

21. Chen, C.F., Wei, C.P., Wang, Y.C.F.: Low-rank matrix recovery with structural incoherence for robust face recognition. In: CVPR (2012)

22. Zhang, Y., Jiang, Z., Davis, L.S., Park, C.: Learning structured low-rank representations for image classification. In: CVPR (2013)

23. Nene, S.A., Nayar, S.K., Murase, H.: Columbia Object Image Library (COIL-20). Technical report No. CUCS-006-96 (1996)

24. Wang, S., Fu, Y.: Locality-constrained discriminative learning and coding. In: CVPR Workshops (2015)

25. Li, S., Fu, Y.: Learning robust and discriminative subspace with low-rank constraints. IEEE Trans. Neural Netw. Learn. Syst. 27, 2160–2173 (2015)
