
Research Article
Adaptive Deep Supervised Autoencoder Based Image Reconstruction for Face Recognition

Rongbing Huang (1,2), Chang Liu (1), Guoqi Li (3), and Jiliu Zhou (2)

1 Key Laboratory of Pattern Recognition and Intelligent Information Processing, Institutions of Higher Education of Sichuan Province, Chengdu University, Chengdu, Sichuan 610106, China
2 School of Computer and Software, Sichuan University, Chengdu, Sichuan 610065, China
3 School of Reliability and System Engineering, Beihang University, Beijing 100191, China

Correspondence should be addressed to Rongbing Huang; huangrb2006@126.com

Received 3 June 2016; Revised 30 July 2016; Accepted 28 September 2016

Academic Editor: Simone Bianco

Copyright © 2016 Rongbing Huang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Based on a special type of denoising autoencoder (DAE) and image reconstruction, we present a novel supervised deep learning framework for face recognition (FR). Unlike existing deep autoencoders, which are unsupervised face recognition methods, the proposed method takes the class label information of the training samples into account in the deep learning procedure and can automatically discover the underlying nonlinear manifold structures. Specifically, we define an Adaptive Deep Supervised Network Template (ADSNT) with the supervised autoencoder, which is trained to extract characteristic features from corrupted/clean facial images and reconstruct the corresponding similar facial images. The reconstruction is realized by a so-called "bottleneck" neural network that learns to map face images into a low-dimensional vector and reconstruct the respective corresponding face images from the mapping vectors. Having trained the ADSNT, a new face image can then be recognized by comparing its reconstruction image with the individual gallery images, respectively. Extensive experiments on three databases, including AR, PubFig, and Extended Yale B, demonstrate that the proposed method can significantly improve the accuracy of face recognition under large illumination changes, pose changes, and partial occlusion.

1. Introduction

Over the last couple of decades, face recognition has gained a great deal of attention in the academic and industrial communities on account of its challenging nature and its widespread applications. The study of face recognition has great theoretical value: it involves image processing, artificial intelligence, machine learning, computer vision, and so on, and it is also highly related to other biometrics such as fingerprints, speech recognition, and iris scans. As a classic problem in the field of pattern recognition, face recognition mainly covers two issues, feature extraction and classifier design. Currently, most existing works focus on these two aspects to promote the performance of face recognition systems.

In most real-world applications, face recognition is actually a multiclass classification problem, and many classification methods have been proposed by researchers. Among them,

the nearest neighbor classifier (NNC) and its variants, like nearest subspace [1], are the most popular methods in pattern classification [2]. In [3], the problem of face recognition was transformed into a binary classification problem by constructing intra- and interfacial image spaces. The intraspace stands for the differences within images of the same person, and the interspace denotes the differences between different people. Then, many binary classifiers, such as Support Vector Machine (SVM) [4], Bayesian, and Adaboost [5], can be used.

Besides classifier design, the other important issue is feature representation. In the real world, face images are usually influenced by variations such as illumination, posture, occlusion, and expression. Additionally, the differences among images of the same person can be even larger than those between different people. Therefore, it is crucial to obtain efficient and discriminant features that make the intraspace compact and expand the margin among different people.



Figure 1: Network architectures. (a) DAE; (b) SDAE (encoder/decoder layer sizes 1024–2000–1256–250–1256–2000–1024).

Until now, various feature extraction methods have been explored, including classical subspace-based dimension reduction approaches like principal component analysis (PCA), Fisher linear discriminant analysis (FLDA), independent component analysis (ICA), and so on [6]. In addition, there are local appearance feature extraction methods, like the Gabor wavelet transform, local binary patterns (LBP), and their variants [7], which are stable to local facial variations such as expressions, occlusions, and poses. Currently, deep learning, including deep neural networks, has shown great success in image representation [8, 9]; the basic idea is to train a nonlinear feature extractor in each layer [10, 11]. After greedy layer-wise training of a deep network architecture, the output of the network is used as the image feature for the subsequent classification task. Among deep network architectures, a representative building block is the denoising autoencoder (DAE) [12], which learns features that are robust to noise by a nonlinear deterministic mapping. Image features derived from DAE have demonstrated good performance in many applications, such as object detection and digit recognition. Inspired by the great success of DAE based deep network architectures, a supervised autoencoder (SAE) [9] was also proposed as a building block; it treats facial images with variations like illumination, expression, and pose as images corrupted by noise. A face image without the variation can be recovered through an SAE; meanwhile, robust features for image representation are also extracted.

Motivated by the great success of DAE- and SAE-based deep learning and by the challenges of face recognition in complex environments, in this article we present a novel deep learning method based on SAE for face recognition. Unlike the existing deep stacked autoencoder (AE), which is an unsupervised feature learning approach, our proposed method takes full advantage of the class label information of the training samples in the deep learning procedure and tries to discover the underlying nonlinear manifold structures in the data.

The rest of this paper is organized as follows. In Section 2, we give a brief review of DAE and the state-of-the-art face recognition methods based on deep learning. In Section 3, we focus on the proposed face recognition approach. The experimental results conducted on three public databases are given in Section 4. Finally, we draw a conclusion in Section 5.

2. Related Work

In this section, we briefly review work related to DAE and deep learning based face recognition systems.

2.1. Work Related to DAE. DAE is a one-layer neural network, which is a recent variant of the conventional autoencoder (AE). It learns to recover the clean input data sample from its corrupted version. The architecture of DAE is illustrated in Figure 1(a). Let there be a total of $k$ training samples, and let $x$ denote the original input data. In DAE, the input data $x$ is first contaminated with some predefined noise, such as Gaussian white noise or Poisson noise, to obtain a corrupted version $\tilde{x}$, which is input into an encoder $h = f(\tilde{x}) = u_f(W\tilde{x} + b_f)$. Then the output of the encoder $h$ is used as the input of a decoder $\hat{x} = g(h) = u_g(W'h + b_g)$. Here $u_f$ and $u_g$ are predefined activation functions, such as the sigmoid function, hyperbolic tangent function, or rectifier function [13], of the encoder and decoder, respectively. $W \in R^{d_h \times d_x}$ and $W' \in R^{d_x \times d_h}$ are the network parameters, denoting the weights of the encoder and decoder, respectively; $b_f \in R^{d_h}$ and $b_g \in R^{d_x}$ are the bias terms; and $d_x$ and $d_h$ denote the dimensionality of the original data and the number of hidden neurons, respectively. On the basis of the above definitions, a DAE learns by solving a regularized optimization problem as follows:

$$\min_{W,\,W',\,b_f,\,b_g}\; \sum_{i=1}^{k} \left\| x_i - \hat{x}_i \right\|_2^2 + \frac{\lambda}{2}\left( \|W\|_F^2 + \|W'\|_F^2 \right). \tag{1}$$


Figure 2: Flowchart of the proposed ADSNT image reconstruction for face recognition (preprocessing such as histogram equalization, ADSNT training on the training set, and nearest-neighbor matching of the reconstructed test image against the gallery images).

Here $\|\cdot\|_2^2$ is the reconstruction error, $\|\cdot\|_F$ denotes the Frobenius norm, and λ is a parameter that balances the reconstruction loss and the weight penalty terms. By reconstructing the clean input data from a corrupted version of it, a DAE can explore more robust features than a conventional AE, which simply learns the identity mapping.
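As an illustration only (the sigmoid activation, the Gaussian corruption, and all variable names below are assumptions rather than details taken from the paper), a minimal NumPy sketch of the one-layer DAE objective in (1) might look as follows:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dae_loss(X, W, W_prime, b_f, b_g, lam=0.1, noise_std=0.1, rng=None):
    """One-layer denoising autoencoder objective, cf. Eq. (1) (sketch).

    X: (k, d_x) matrix of clean training samples.
    W: (d_h, d_x) encoder weights; W_prime: (d_x, d_h) decoder weights.
    """
    rng = np.random.default_rng() if rng is None else rng
    X_tilde = X + noise_std * rng.standard_normal(X.shape)         # corrupted input x~
    H = sigmoid(X_tilde @ W.T + b_f)                               # encoder: h = u_f(W x~ + b_f)
    X_hat = sigmoid(H @ W_prime.T + b_g)                           # decoder: x^ = u_g(W' h + b_g)
    recon = np.sum((X - X_hat) ** 2)                               # reconstruction error
    penalty = 0.5 * lam * (np.sum(W ** 2) + np.sum(W_prime ** 2))  # Frobenius weight decay
    return recon + penalty
```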

To further promote the learning of meaningful features, sparsity constraints [14] are imposed on the hidden neurons when the number of hidden neurons is large. The constraint is defined in terms of the Kullback-Leibler (KL) divergence as

$$\sum_{j=1}^{m} \mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right) = \sum_{j=1}^{m} \left[ \rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j} \right], \tag{2}$$

where $m$ is the number of neurons in one hidden layer, $\hat{\rho}_j$ is the average activation of hidden unit $j$ (over the whole training set), and $\rho$ is a sparsity parameter (typically a small value).
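A small helper that evaluates the sparsity penalty of (2) on a matrix of hidden activations could, for instance, be sketched as (variable names are assumptions):

```python
import numpy as np

def kl_sparsity_penalty(H, rho=0.05, eps=1e-8):
    """KL-divergence sparsity term of Eq. (2) (sketch).

    H: (num_samples, m) matrix of hidden activations in [0, 1].
    rho: target sparsity level.
    """
    rho_hat = np.clip(H.mean(axis=0), eps, 1.0 - eps)   # average activation per hidden unit
    kl = rho * np.log(rho / rho_hat) \
         + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat))
    return kl.sum()
```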

After $f$ and $g$ have been learned, the output $h$ of the encoder is fed into the next layer. By training such DAEs layerwise, stacked denoising autoencoders (SDAE) are built, as sketched below; the structure is illustrated in Figure 1(b).
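The greedy layer-wise stacking can be sketched as follows; `train_one_dae` stands for any single-layer DAE trainer (for example, one minimizing (1) with the sparsity term (2)) and is a hypothetical helper, not a routine from the paper:

```python
def train_stack(X, layer_sizes, train_one_dae):
    """Greedy layer-wise training of an SDAE (sketch).

    X: training data; layer_sizes: hidden sizes, e.g. [2000, 1256, 250];
    train_one_dae(data, n_hidden) -> (encode_fn, params) is an assumed helper.
    """
    encoders, data = [], X
    for n_hidden in layer_sizes:
        encode, params = train_one_dae(data, n_hidden)  # train one DAE on current features
        encoders.append((encode, params))
        data = encode(data)                             # its hidden output feeds the next layer
    return encoders
```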

In real-world applications like face recognition, faces are usually influenced by all kinds of variations, such as expression, illumination, pose, and occlusion. To overcome the effect of these variations, Gao et al. [9] proposed a supervised autoencoder based on the principle of DAE. They treated the training sample (gallery image) of each person, with frontal pose, uniform illumination, neutral expression, and no occlusion, as clean data and the test faces (probe images) accompanied by variations (expression, illumination, occlusion, etc.) as corrupted data. A mapping capturing the discriminant structure of the facial images from different people is learned while remaining robust to the variations in these faces. Then a robust feature is extracted for image representation, and the performance of face recognition is greatly enhanced.

2.2. Deep Learning Based Face Recognition Systems. In early face recognition, there were various face representation methods, including hand-crafted or "shallow" learning approaches [6, 7]. In recent years, with the development of big data and computer hardware, feature learning based on deep structures has been greatly successful in the image representation field [8, 12, 15, 16]. By means of deep structure learning, the representation ability of models is greatly enhanced, and complicated (nonlinear) information can be learned effectively from the original data. In [16], a deep Fisher network was designed by stacking Fisher vector layers, which greatly outperformed the conventional Fisher vector representation. Chen et al. [17] proposed marginalized SDAE to learn an optimal closed-form solution, which reduced the computational complexity and improved the scalability to high-dimensional descriptive features. Taigman et al. [18] presented a face verification system based on Convolutional Neural Networks (CNNs), which also obtained high verification accuracy on the LFW dataset. Zhu et al. [19] designed a network structure composed of a facial identity-preserving layer and an image reconstruction layer, which can reduce intravariance and preserve discriminant information. In [20], Hayat et al. proposed a deep learning framework based on AE with application to image set classification and face recognition, which obtained the best performance compared with existing state-of-the-art methods. Gao et al. [9] further proposed an SAE which can be used to build a deep architecture and can extract facial features that are robust to variations. Sun et al. [21] learned multiple convolutional networks (ConvNets) by predicting 10,000 subjects, which generalized well to the face verification task. Furthermore, they improved the ConvNets by incorporating identification and verification tasks and enhanced the recognition performance [22]. Cai et al. [23] stacked several sparse independent subspace analyses (sISA) to construct a deep network structure to learn identity representations.

3. Proposed Method

This section presents our proposed approach, whose block diagram is illustrated in Figure 2. Firstly, inspired by stacked DAE and SAE [9], we define the Adaptive Deep Supervised Network Template (ADSNT), which can learn an underlying nonlinear manifold structure from the facial images. The basic architecture of ADSNT is illustrated in Figure 3(c), and the corresponding details are depicted in Section 3.1.


Figure 3: Architecture of SSAE and ADSNT. (a) Supervised autoencoder (SAE), comprising a clean/"corrupted" datum, one hidden layer, and one reconstruction layer using the "corrupted" datum; (b) stacked supervised autoencoder (SSAE); (c) architecture of the Adaptive Deep Supervised Network Template (ADSNT), with encoder (EC) and decoder (DC) operating on a corrupted face and a clean face.

To make the deep network perform well, similar to [20], we need to give it initialization weights. Then the preinitialized ADSNT is trained to reconstruct the invariant faces, which are insensitive to illumination, pose, and occlusion. Finally, having trained the ADSNT, we use the nearest neighbor classifier to recognize a new face image by comparing its reconstruction image with the individual gallery images, respectively.

3.1. Adaptive Deep Supervised Network Template (ADSNT). As presented in Figure 3(c), our ADSNT is a deep supervised autoencoder (DSAE) that consists of two parts, an encoder (EC) and a decoder (DC). Each of them has three hidden layers, and they share the third layer, that is, the central hidden layer. The features learned from the hidden layer and the reconstructed clean face are obtained by using the "corrupted" data to train the SSAE. In the process of pretraining, we learn a stack of SAEs, each having only one hidden layer of feature detectors. Then the learned activation features of one SAE are used as "data" for training the next SAE in the stack. Such training is repeated until we get the desired number of layers. Although we use the basic SAE structure shown in Figure 3(a) [9] to construct the stacked supervised autoencoder (SSAE), Gao et al.'s stacked supervised autoencoder only used two hidden layers and

one reconstruction layer. In this paper, we use three hidden layers to compose the encoder and the decoder, respectively, whose structures are shown in Figures 3(b) and 3(c). The encoder part seeks a compact, low-dimensional, meaningful representation of the clean/"corrupted" data. Following the work in [20], the encoder can be formulated as a combination of several layers connected by a nonlinear activation function $u_f(\cdot)$. We can use a sigmoid function or a rectified linear unit as the nonlinear activation to map the clean/"corrupted" data $x$/$\tilde{x}$ to a representation $h$/$\tilde{h}$ as follows:

$$\begin{aligned}
h_1 &= f(x) = u_f\left(W_e^{(1)} x + b_e^{(1)}\right), &\quad \tilde{h}_1 &= f(\tilde{x}) = u_f\left(W_e^{(1)} \tilde{x} + b_e^{(1)}\right),\\
h_2 &= f(h_1) = u_f\left(W_e^{(2)} h_1 + b_e^{(2)}\right), &\quad \tilde{h}_2 &= f(\tilde{h}_1) = u_f\left(W_e^{(2)} \tilde{h}_1 + b_e^{(2)}\right),\\
h &= f(h_2) = u_f\left(W_e^{(3)} h_2 + b_e^{(3)}\right), &\quad \tilde{h} &= f(\tilde{h}_2) = u_f\left(W_e^{(3)} \tilde{h}_2 + b_e^{(3)}\right),
\end{aligned} \tag{3}$$


where $W_e^{(i)} \in R^{d_{i-1} \times d_i}$ is the weight matrix of the encoder for the $i$th layer with $d_i$ neurons and $b_e^{(i)} \in R^{d_i}$ is the bias vector. The encoder parameters are learned by jointly training the encoder-decoder structure to reconstruct the "corrupted" data by minimizing a cost function (see Section 3.2). The decoder can therefore be defined as a combination of several layers integrating a nonlinear activation function $u_g(\cdot)$, which reconstructs the "corrupted" data from the encoder output $\tilde{h}$. The reconstructed output of the decoder is given by

$$\begin{aligned}
\hat{x}_1 &= g(\tilde{h}) = u_g\left(W_d^{(1)} \tilde{h} + b_d^{(1)}\right),\\
\hat{x}_2 &= g(\hat{x}_1) = u_g\left(W_d^{(2)} \hat{x}_1 + b_d^{(2)}\right),\\
\hat{x} &= g(\hat{x}_2) = u_g\left(W_d^{(3)} \hat{x}_2 + b_d^{(3)}\right).
\end{aligned} \tag{4}$$

Thus we can describe the complete ADSNT by its parameters $\theta_{\mathrm{ADSNT}} = \{\theta_W, \theta_b\}$, where $\theta_W = \{W_e^{(i)}, W_d^{(i)}\}$ and $\theta_b = \{b_e^{(i)}, b_d^{(i)}\}$, $i = 1, 2, 3$.
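For illustration, a minimal sketch of the three-layer encoder/decoder pass of (3) and (4) could be written as below; the hyperbolic tangent activation and the list-based parameter containers are assumptions made for this sketch:

```python
import numpy as np

def adsnt_forward(x, We, be, Wd, bd, act=np.tanh):
    """Encoder/decoder pass of ADSNT, cf. Eqs. (3) and (4) (sketch).

    We, be: lists with the three encoder weight matrices / bias vectors.
    Wd, bd: lists with the three decoder weight matrices / bias vectors.
    Returns the central hidden code h and the reconstruction x_hat.
    """
    h = x
    for W, b in zip(We, be):          # h1 -> h2 -> h, Eq. (3)
        h = act(W @ h + b)
    x_hat = h
    for W, b in zip(Wd, bd):          # reconstruct from the central code, Eq. (4)
        x_hat = act(W @ x_hat + b)
    return h, x_hat
```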

3.2. Formulation of Image Reconstruction Based on ADSNT. Now we are ready to describe the image reconstruction based on ADSNT. The details are presented as follows.

Given a training set of $k$ classes that includes gallery images (called clean data) and probe images (called "corrupted" data) together with their class labels $y_c = [1, 2, \ldots, k]$, the dataset is used to train the ADSNT for feature learning. Let $\tilde{x}_i$ denote a probe image and $x_i$ ($i = 1, 2, \ldots, M$) the gallery images corresponding to $\tilde{x}_i$. It is desirable that $\hat{x}_i$ and $x_i$ be similar. Therefore, following the work in [9, 22], we obtain the following formulation:

$$\underset{\theta_{\mathrm{ADSNT}}}{\arg\min}\; J = \frac{1}{M}\sum_{i} \left\| x_i - \hat{x}_i \right\|^2 + \frac{\lambda}{M}\sum_{i} \left\| f(x_i) - f(\tilde{x}_i) \right\|^2 + \frac{\varphi}{2}\left( \sum_{i=1}^{3} \left\| W_e^{(i)} \right\|_F^2 + \sum_{i=1}^{3} \left\| W_d^{(i)} \right\|_F^2 \right), \tag{5}$$

where $\theta_{\mathrm{ADSNT}} = \{\theta_W, \theta_b\}$ (see Section 3.1) are the parameters of ADSNT, which are fine-tuned by learning. In this paper, we only explore tied weights, that is, $W_d^{(3)} = W_e^{(1)T}$, $W_d^{(2)} = W_e^{(2)T}$, and $W_d^{(1)} = W_e^{(3)T}$ (see Figure 3(c)). $\hat{x}_i$ is the reconstruction image of the corrupted image $\tilde{x}_i$. The regularization parameter λ balances the similarity of the same person, preserving $f(x_i)$ and $f(\tilde{x}_i)$ as similar as possible; $f(\cdot)$ is a nonlinear activation function; φ is a parameter that balances the weight penalty terms and the reconstruction loss; $\|\cdot\|_F$ denotes the Frobenius norm; and $\sum_{i=1}^{3} \|W_e^{(i)}\|_F^2 + \sum_{i=1}^{3} \|W_d^{(i)}\|_F^2$ ensures small weight values for all the hidden neurons. Furthermore, following the work in [9, 14], we impose a sparsity constraint on the hidden layer to enhance the learning of meaningful features.

Then we can further modify the cost function and obtain the following objective formulation:

$$\underset{\theta_{\mathrm{ADSNT}}}{\arg\min}\; J_{\mathrm{reg}} = J + \gamma\left( \sum_{i=1}^{3} \mathrm{KL}\left(\hat{\rho}_x \,\|\, \rho_0\right) + \sum_{i=1}^{5} \mathrm{KL}\left(\hat{\rho}_{\tilde{x}} \,\|\, \rho_0\right) \right), \tag{6}$$

where

$$\begin{aligned}
\hat{\rho}_x &= \frac{1}{M}\sum_{i} \frac{1}{2}\left( f(x_i) + 1 \right), \qquad
\hat{\rho}_{\tilde{x}} = \frac{1}{M}\sum_{i} \frac{1}{2}\left( f(\tilde{x}_i) + 1 \right),\\
\mathrm{KL}\left(\hat{\rho} \,\|\, \rho_0\right) &= \sum_{j}\left( \rho_0 \log\frac{\rho_0}{\hat{\rho}_j} + (1-\rho_0)\log\frac{1-\rho_0}{1-\hat{\rho}_j} \right).
\end{aligned} \tag{7}$$

Here the KL divergence between two distributions, namely, $\rho_0$ and $\hat{\rho}_j$ (which stands for $\hat{\rho}_x$ or $\hat{\rho}_{\tilde{x}}$), is calculated. The sparsity target $\rho_0$ is usually a constant taking a small value (following [9, 24], it is set to 0.05 in our experiments), whereas $\hat{\rho}_x$ and $\hat{\rho}_{\tilde{x}}$ are the mean activation values of the mappings of the clean data and the corrupted data, respectively.
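Assuming the hyperbolic tangent activation (so that activations are mapped from [-1, 1] to [0, 1] as in (7)) and reusing the `adsnt_forward` sketch above, the regularized cost (5)–(7) could be sketched as follows; the parameter defaults are only illustrative:

```python
import numpy as np

def adsnt_cost(X_clean, X_corrupt, params, lam=3.0, phi=0.6, gamma=0.08, rho0=0.05):
    """Regularized ADSNT objective J_reg, cf. Eqs. (5)-(7) (sketch).

    X_clean, X_corrupt: (M, d) gallery images and corresponding probe images.
    params: dict with encoder/decoder weights and biases 'We', 'be', 'Wd', 'bd'.
    """
    M = X_clean.shape[0]
    outs_c = [adsnt_forward(x, params['We'], params['be'],
                            params['Wd'], params['bd']) for x in X_clean]
    outs_n = [adsnt_forward(x, params['We'], params['be'],
                            params['Wd'], params['bd']) for x in X_corrupt]
    codes_c = np.array([c for c, _ in outs_c])
    codes_n = np.array([c for c, _ in outs_n])
    recons = np.array([r for _, r in outs_n])            # reconstructions of corrupted faces

    recon_term = np.sum((X_clean - recons) ** 2) / M                 # reconstruction loss
    simil_term = lam * np.sum((codes_c - codes_n) ** 2) / M          # similarity preservation
    decay_term = 0.5 * phi * sum(np.sum(W ** 2)
                                 for W in params['We'] + params['Wd'])
    # Mean activations mapped from [-1, 1] to [0, 1] as in Eq. (7)
    rho_c = np.clip(0.5 * (codes_c.mean(axis=0) + 1.0), 1e-8, 1 - 1e-8)
    rho_n = np.clip(0.5 * (codes_n.mean(axis=0) + 1.0), 1e-8, 1 - 1e-8)
    kl = lambda r: np.sum(rho0 * np.log(rho0 / r)
                          + (1 - rho0) * np.log((1 - rho0) / (1 - r)))
    return recon_term + simil_term + decay_term + gamma * (kl(rho_c) + kl(rho_n))
```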

3.3. Optimization of ADSNT. To obtain the optimal parameters $\theta_{\mathrm{ADSNT}} = \{\theta_W, \theta_b\}$, it is important to initialize the weights properly and to select an optimization algorithm. Training will fail if the initialization weights are inappropriate: if the network is given too large initialization weights, the ADSNT will be trapped in a local minimum, and if the initialized weights are too small, the ADSNT will encounter the vanishing gradient problem during backpropagation. Therefore, following the work in [20, 24], Gaussian Restricted Boltzmann Machines (GRBMs), which have already been widely applied, are adopted to initialize the weight parameters by pretraining. For more details, we refer the reader to the original paper [24]. After obtaining the initialized weights, the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization algorithm is utilized to learn the parameters, as it has better performance and faster convergence than stochastic gradient descent (SGD) and conjugate gradient descent (CGD) [25]. Algorithm 1 depicts the optimization procedure of ADSNT.

Algorithm 1 (learning the adaptive deep supervised network template).

Input: Training images Ω of k classes, where each class is composed of a face with neutral expression, frontal pose, and normal illumination (clean data) and a random number of variant faces (corrupted data); number of network layers L; iteration number R; balancing parameters λ, φ, and γ; convergence error ε.
Output: Weight parameters θ_ADSNT = {θ_W, θ_b}.


(1) Preprocess all images, namely, perform histogram equalization.
(2) X: Randomly select a small subset for each individual from Ω.
(3) Initialize: Train GRBMs using X to initialize θ_ADSNT = {θ_W, θ_b}.
(4) Optimization by L-BFGS:
For r = 1, 2, ..., R do
    Calculate J_reg using (6).
    If r > 1 and |J_r − J_{r−1}| < ε, go to Return.
Return θ_W and θ_b.
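As a rough illustration of Algorithm 1, the fine-tuning stage could be driven by SciPy's L-BFGS implementation as sketched below; the flatten/unflatten helpers and the use of finite-difference gradients are simplifications assumed here (the GRBM pretraining that produces `init_params` is not shown, and the original work uses analytic gradients):

```python
import numpy as np
from scipy.optimize import minimize

def train_adsnt(X_clean, X_corrupt, init_params, max_iter=150, tol=1e-3):
    """Fine-tuning stage of Algorithm 1 with L-BFGS (sketch).

    init_params: parameter dict produced by GRBM pretraining (assumed given),
    with keys 'We', 'be', 'Wd', 'bd' holding lists of arrays.
    """
    shapes = [(key, p.shape) for key, plist in init_params.items() for p in plist]

    def flatten(params):
        return np.concatenate([p.ravel() for plist in params.values() for p in plist])

    def unflatten(theta):
        params, offset = {'We': [], 'be': [], 'Wd': [], 'bd': []}, 0
        for key, shape in shapes:
            size = int(np.prod(shape))
            params[key].append(theta[offset:offset + size].reshape(shape))
            offset += size
        return params

    def objective(theta):                       # J_reg of Eq. (6)
        return adsnt_cost(X_clean, X_corrupt, unflatten(theta))

    result = minimize(objective, flatten(init_params), method='L-BFGS-B',
                      options={'maxiter': max_iter, 'ftol': tol})
    return unflatten(result.x)
```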

Since training the ADSNT model aims to reconstruct clean data, namely, gallery images, from corrupted data, it can learn an underlying structure from the corrupted data and produce very useful representations. Furthermore, we can learn an overcomplete sparse representation from the corrupted data by mapping them into a high-dimensional feature space, since the first hidden layer has a number of neurons larger than the dimensionality of the original data. The high-dimensional representation is then followed by a so-called "bottleneck"; that is, the data is further mapped to an abstract, compact, low-dimensional representation in the subsequent layers of the encoder. Through such a mapping, redundant information such as illumination, pose, and partial occlusion in the corrupted faces is removed, and only the information content useful to us is kept. In addition, we know that if we use an AE with only one hidden layer and a linear activation function, the learned weights are analogous to a PCA subspace [20]. However, AE is an unsupervised algorithm. In our work, we make use of the class label information to train the SAE, so if we also use only one hidden layer with a linear activation function, the weights learned by the SAE can be thought of as being similar to an "LDA" subspace. In our structure, however, we apply nonlinear activation functions and stack several hidden layers together, so the ADSNT can adapt to very complicated nonlinear manifold structures. Some reconstructed images based on ADSNT from the AR database are shown in Figure 4(b). One can see that ADSNT can remove the illumination. For those face images with partial occlusion, ADSNT can also imitate the clean faces. These results are not surprising, because human beings have the capability of inferring unknown faces from known face images via experience (for a deep network structure, the experience learned derives from a generic set) [9].

3.4. Face Classification Based on ADSNT Image Reconstruction. To better train the ADSNT, all images need to be preprocessed. This is a very important step for object recognition, including face recognition. Common approaches include histogram equalization, geometry normalization, and image smoothing. In this paper, for the sake of simplicity, we only perform histogram equalization on all the facial images to minimize illumination variations. That is, we utilize histogram equalization to normalize the histograms of the facial images and make them more compact.

For details about histogram equalization, one can refer to [26].

After the ADSNT is trained completely with a certain number of individuals, we can apply it to unseen face images to recognize them.

Given a test facial image $x^{(t)}$, which is also preprocessed with histogram equalization in the same way as the training images and presented to the ADSNT network, we reconstruct (using (3) and (4)) an image $\hat{x}^{(t)}$ from the ADSNT, which is similar to a clean face. For the sake of simplicity, nearest neighbor classification based on the Euclidean distance between the reconstruction and all the gallery images identifies the class. The classification formula is defined as

$$I_k\left(x^{(t)}\right) = \underset{g}{\arg\min} \left\| \hat{x}^{(t)} - x_g \right\|, \quad \forall g \in \{1, 2, \ldots, c\}, \tag{8}$$

where $I_k(x^{(t)})$ is the resulting identity and $x_g$ is the clean facial image in the gallery of individual $g$.
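Reusing the `adsnt_forward` sketch above, the classification rule (8) can be illustrated as:

```python
import numpy as np

def classify(x_test, gallery, params):
    """Nearest-neighbor rule of Eq. (8) (sketch).

    x_test: preprocessed probe image (1-D vector).
    gallery: dict mapping class label g -> clean gallery image x_g.
    """
    _, x_rec = adsnt_forward(x_test, params['We'], params['be'],
                             params['Wd'], params['bd'])   # reconstruction of the probe
    dists = {g: np.linalg.norm(x_rec - x_g) for g, x_g in gallery.items()}
    return min(dists, key=dists.get)                       # identity with smallest distance
```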

4. Experimental Results and Discussion

In this section, extensive experiments are conducted to present and compare the performance of different methods with the proposed approach. The experiments are implemented on three widely used face databases, namely, AR [27], Extended Yale B [28], and PubFig [29]. The details of these three databases and the performance evaluation of the different approaches are presented as follows.

4.1. Dataset Description. The AR database contains over 4000 color face images of 126 people (56 women and 70 men). The images were taken in two sessions (separated by two weeks), and each session contained 13 pictures of each person. These images contain frontal-view faces with different facial expressions, illuminations, and occlusions (sunglasses and scarf). Some sample face images from AR are illustrated in Figure 5(a). In our experiments, for each person we choose the facial images with neutral expression, frontal pose, and normal illumination as gallery images and randomly select half of the remaining images of each person as probe images. The remaining images compose the testing set.

The Extended Yale B database consists of 16,128 images of 38 people under 64 illumination conditions and 9 poses. Some sample face images from Extended Yale B are illustrated in Figure 5(b). For each person, we select the faces with normal lighting condition and frontal pose as gallery images and randomly choose 6 poses and 16 illumination face images to compose the probe images. The remaining images compose the testing set.

The PubFig database is composed of 58,797 images of 200 subjects taken from the internet. The images of the database were taken in completely uncontrolled conditions with noncooperative people. These images have a very large degree of variability in face expression, pose, illumination, and so forth. Some sample images from PubFig are illustrated in Figure 5(c). In our experiments, for each individual, we select the faces with neutral expression, frontal or near-frontal pose, and normal illumination as galleries and randomly choose half of the remaining images of each person as probes. The remaining images compose the testing set.

Figure 4: Some original images from the AR database and the reconstructed ones. (a) Original face images (corrupted faces); (b) reconstructed faces.

4.2. Experimental Settings. In all the experiments, the facial images from the AR, PubFig, and Extended Yale B databases are automatically detected using the OpenCV face detector [30]. After that, we normalize the detected facial images (in orientation and scale) such that the two eyes are aligned at the same location. Then the face areas are cropped and converted to 256 gray-level images. The size of each cropped image is 26 × 30 pixels; thus the dimensionality of the input vector is 780. Figure 6 presents an example from the AR database and the corresponding cropped image. Each cropped facial image is further preprocessed with histogram equalization to minimize illumination variations. We train our ADSNT model with 3 hidden layers, where the number of hidden

nodes for these layers is empirically set as [1024 → 500 → 120], because our experiments show that three hidden layers achieve sufficiently good performance (see Section 4.3.3).
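For illustration, the preprocessing pipeline described above (OpenCV face detection, cropping, resizing to 26 × 30 pixels, and histogram equalization) might be sketched as follows; the Haar cascade file and the detection parameters are common defaults assumed here, not settings reported by the authors:

```python
import cv2
import numpy as np

_face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def preprocess(image_bgr, size=(26, 30)):
    """Detect, crop, resize, and histogram-equalize a face image (sketch).

    Returns a flattened 780-dimensional vector scaled to [0, 1],
    or None if no face is detected.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = _face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]                        # take the first detected face
    crop = cv2.resize(gray[y:y + h, x:x + w], size)
    crop = cv2.equalizeHist(crop)                # minimize illumination variations
    return crop.astype(np.float32).ravel() / 255.0
```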

To show the whole experimental process of parameter setting, we initially use the hyperbolic tangent function as the nonlinear activation function and implement ADSNT on AR. We also choose the face images with neutral expression, frontal pose, and normal illumination as galleries and randomly select half of the remaining images of each person as probe images. The remaining images compose the testing set. The mean identification rates are recorded.

Firstly, we empirically set the parameter ε = 0.001 and the sparsity target ρ_0 = 0.05 and fix the parameters λ = 0.5 and φ = 0.1 in ADSNT to check the effect of γ on the identification rate. As illustrated in Figure 6(a), the ADSNT recognition method achieves the best performance when γ = 0.08.


Figure 5: A fraction of samples from the AR, PubFig, and Extended Yale B face databases. (a) AR; (b) Extended Yale B; (c) PubFig.

Then, according to Figure 6(a), we fix the parameters γ = 0.08 and λ = 0.5 in ADSNT to check the influence of φ. As shown in Figure 6(b), when φ = 0.6 our method achieves the best recognition rate. At last, we fix γ = 0.08 and φ = 0.6, and the recognition rates with different values of λ are illustrated in Figure 6(c). When λ = 3, the recognition rate is the highest. From the plots in Figure 6, one can observe that the parameters λ, φ, and γ cannot be too large or too small. If λ is too large, the ADSNT becomes less discriminative between different subjects, because it enforces too strong a similarity preservation term; but if λ is too small, the significance of the similarity preservation term is lost and the recognition performance degrades. Similarly, γ cannot be too large, or the hidden neurons will not be activated for a given input and a low recognition rate will be achieved; if γ is too small, we also get poor performance. For the weight decay φ, if it is too small, the values of the weights for all hidden units change only very slightly; on the contrary, if it is too large, the weight values change greatly.

Through these experiments, we obtain the optimal parameter values used in ADSNT as λ = 3, φ = 0.6, and γ = 0.08 on the AR database. Similar experiments have also been performed on the Extended Yale B and PubFig databases, giving the parameter settings λ = 2.6, φ = 0.5, and γ = 0.06 on the Extended Yale B database and λ = 2.8, φ = 0.52, and γ = 0.09 on the PubFig database.

In the experiments, we use two measures, namely, the mean identification accuracy μ with standard deviation ν (μ ± ν) and the receiver operating characteristic (ROC) curves, to validate the effectiveness of our method as well as the other methods.


Figure 6: Parameter settings on AR. (a) Identification rate (%) versus γ with λ = 0.5 and φ = 0.1; (b) identification rate versus φ with λ = 0.5 and γ = 0.08; (c) identification rate versus λ with φ = 0.6 and γ = 0.08.

4.3. Experimental Results and Analysis

4.3.1. Comparison with Different Methods. In the following experiments on the three databases, we compare the proposed approach with several recently proposed methods. These compared methods include DAE with 10% random mask noise [12], marginalized DAE (MDAE) [17], Contractive Autoencoders (CAE) [15], Deep Lambertian Networks (DLN) [31], stacked supervised autoencoder (SSAE) [9], ICA with Reconstruction cost (RICA) [32], and the Template Deep Reconstruction Model (TDRM) [20]. We use the implementations of these algorithms provided by the respective authors. For all the compared approaches, we use the default parameters recommended in the corresponding papers.

The mean identification accuracy with standard deviations of the different approaches on the three databases is shown in Table 1, and the ROC curves of the different approaches are illustrated in Figure 7. The results imply that our approach significantly outperforms the other methods and achieves the best mean recognition rates for the same setting of training and testing sets. Compared to the unsupervised deep learning methods such as DAE, MDAE, CAE, DLN, and TDRM, the improvement of our method is over 3.0% on the Extended Yale B and AR databases, where there is little pose variance. On the PubFig database, our approach can also achieve a mean identification rate of 91.26 ± 1.6% and outperforms all the compared methods.

Table 1: Comparisons of the average identification accuracy and standard deviation (%) of different approaches on different databases.

Method      | AR          | Extended Yale B | PubFig
DAE [12]    | 57.56 ± 0.2 | 63.45 ± 1.3     | 61.33 ± 1.5
MDAE [17]   | 67.80 ± 1.3 | 71.56 ± 1.6     | 70.55 ± 2.5
CAE [15]    | 49.50 ± 2.1 | 55.72 ± 0.8     | 68.56 ± 1.6
DLN [31]    | NA          | 81.50 ± 1.4     | 77.60 ± 1.4
SSAE [9]    | 85.21 ± 0.7 | 82.22 ± 0.3     | 84.04 ± 1.2
RICA [32]   | 76.33 ± 1.7 | 70.44 ± 1.3     | 72.35 ± 1.5
TDRM [20]   | 87.70 ± 0.6 | 86.42 ± 1.2     | 89.90 ± 0.9
Our method  | 92.32 ± 0.7 | 93.66 ± 0.4     | 91.26 ± 1.6


Figure 7: Comparisons of ROC curves (true positive rate versus false positive rate) between our method and the other methods on different databases. (a) AR; (b) Extended Yale B; (c) PubFig.

The reason is that our method can extract discriminative information robust to variations (expression, illumination, pose, etc.) in the learned deep networks. Compared with a supervised method like RICA, the proposed method improves by over 16%, 19%, and 23% on the AR, PubFig, and Extended Yale B databases, respectively. Our method is a deep learning method which focuses on the nonlinear classification problem by learning a nonlinear mapping, so that more nonlinear discriminant information can be explored to enhance the identification performance. Compared with the SSAE method, which is designed for removing variations such as illumination, pose, and partial occlusion, our method is still better by over 6%, because of the use of the weight penalty terms, the GRBM weight initialization, and the three layers' similarity preservation term.

4.3.2. Convergence Analysis. In this subsection, we evaluate the convergence of our ADSNT versus the number of iterations. Figure 8 illustrates the value of the objective function of ADSNT versus the number of iterations on the AR, PubFig, and Extended Yale B databases. From Figure 8(a), one can observe that ADSNT converges in about 55, 28, and 70 iterations on the three databases, respectively.

We also evaluate the identification accuracy of ADSNT versus the number of iterations on the AR, PubFig, and Extended Yale B databases. Figure 8(b) plots the mean identification rate of ADSNT.


Figure 8: Convergence analysis. (a) Convergence curves (objective function value versus iteration number) of ADSNT on AR, PubFig, and Extended Yale B; (b) mean identification rate (%) versus iterations of ADSNT on AR, PubFig, and Extended Yale B.

Figure 9: The results (identification accuracy, %) of ADSNT with different network depths (1 to 4 hidden layers) on the different datasets.

From Figure 8(b), one can also observe that ADSNT achieves stable performance after about 55, 70, and 28 iterations on the AR, PubFig, and Extended Yale B databases, respectively.

4.3.3. The Effect of Network Depth. In this subsection, we conduct experiments on the three face datasets with different numbers of hidden layers in our proposed ADSNT network. The proposed method achieves identification rates of 92.3 ± 0.6%, 93.3 ± 1.2%, and 91.22 ± 0.8% with the three-hidden-layer ADSNT network, that is, 1024 → 500 → 120, on the AR, Extended Yale B, and PubFig datasets, respectively.

Figure 9 illustrates the performance of ADSNT with different numbers of layers. One can observe that the three-hidden-layer network outperforms the 2-layer network and that the result of the 3-layer ADSNT network is very nearly equal to that of the 4-layer network on the AR and Extended Yale B databases. We also observe that the performance of the 4-layer network is a bit lower than that of the 3-layer network on the PubFig database. In addition, the deeper the ADSNT network is, the higher its computational complexity becomes. Therefore, a 3-layer network depth is a good trade-off between performance and computational complexity.

4.3.4. Activation Function. Following the work in [9], we also evaluate the performance of ADSNT with different activation functions, such as the sigmoid, the hyperbolic tangent, and the rectified linear unit (ReLU) [33], which is defined as $f(x) = \max(0, x)$. When the sigmoid $f(x) = 1/(1 + e^{-x})$ is used as the activation function, the objective function (see (6)) is rewritten as follows:

$$\underset{\theta_{\mathrm{ADSNT}}}{\arg\min}\; J = \frac{1}{M}\sum_{i} \left\| x_i - \hat{x}_i \right\|^2 + \frac{\lambda}{M}\sum_{i} \left\| f(x_i) - f(\tilde{x}_i) \right\|^2 + \frac{\varphi}{2}\left( \sum_{i=1}^{3} \left\| W_e^{(i)} \right\|_F^2 + \sum_{i=1}^{3} \left\| W_d^{(i)} \right\|_F^2 \right) + \gamma\left( \sum_{i=1}^{3} \mathrm{KL}\left(\hat{\rho}_x \,\|\, \rho_0\right) + \sum_{i=1}^{5} \mathrm{KL}\left(\hat{\rho}_{\tilde{x}} \,\|\, \rho_0\right) \right), \tag{9}$$

where $\hat{\rho}_x = (1/M)\sum_i f(x_i)$ and $\hat{\rho}_{\tilde{x}} = (1/M)\sum_i f(\tilde{x}_i)$.


Table 2: Comparisons of the ADSNT algorithm with different activation functions on the AR, PubFig, and Extended Yale B databases (%).

Dataset          | Sigmoid     | Tanh        | ReLU
AR               | 88.66 ± 1.4 | 92.32 ± 0.7 | 93.22 ± 1.5
Extended Yale B  | 90.55 ± 0.6 | 93.66 ± 0.4 | 94.54 ± 0.3
PubFig           | 87.40 ± 1.2 | 91.26 ± 1.6 | 92.44 ± 1.1

If ReLU is adopted as the activation function, (6) is formulated as

$$\underset{\theta_{\mathrm{ADSNT}}}{\arg\min}\; J = \frac{1}{M}\sum_{i} \left\| x_i - \hat{x}_i \right\|^2 + \frac{\lambda}{M}\sum_{i} \left\| f(x_i) - f(\tilde{x}_i) \right\|^2 + \frac{\varphi}{2}\left( \sum_{i=1}^{3} \left\| W_e^{(i)} \right\|_F^2 + \sum_{i=1}^{3} \left\| W_d^{(i)} \right\|_F^2 \right) + \gamma\left( \sum_{i=1}^{3} \left\| f(x_i) \right\|_1 + \sum_{i=1}^{5} \left\| f(\tilde{x}_i) \right\|_1 \right). \tag{10}$$

Table 2 shows the performance of the proposed ADSNT with the different activation functions on the three databases. From Table 2, one can see that ReLU achieves the best performance. A key reason is that we use the weight decay term φ to optimize the objective function.
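For reference, the three activation choices and the corresponding sparsity statistics used in (7), (9), and (10) can be summarized in a short sketch (illustrative only; function names are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def mean_activation(codes, activation='tanh'):
    """Sparsity statistic fed to the penalty term for each activation choice (sketch)."""
    if activation == 'tanh':        # Eq. (7): map [-1, 1] activations into [0, 1]
        return 0.5 * (codes.mean(axis=0) + 1.0)
    if activation == 'sigmoid':     # Eq. (9): activations already lie in [0, 1]
        return codes.mean(axis=0)
    return np.abs(codes).mean(axis=0)   # ReLU, Eq. (10): an L1 penalty is used instead
```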

4.3.5. Time Consumption Analysis. In this subsection, we use an HP Z620 workstation with an Intel Xeon E5-2609 2.4 GHz CPU and 8 GB RAM and conduct a series of experiments on the AR database to compare the time consumption of the different methods, which is tabulated in Table 3. The training time (seconds) is shown in Table 3(a), while the time (seconds) needed to recognize a face from the testing set is shown in Table 3(b). From Table 3, one can see that the proposed method requires comparatively more time for training because of the initialization of ADSNT and the image reconstruction. However, the training procedure is offline. When we identify an image from the testing set, our method requires less time than the other methods.

5. Conclusions

In this article, we presented an adaptive deep supervised autoencoder based image reconstruction method for face recognition. Unlike conventional deep autoencoder based face recognition methods, our method considers the class label information of the training samples in the deep learning procedure and can automatically discover the underlying nonlinear manifold structures. Specifically, a multilayer supervised adaptive network structure is presented, which is trained to extract characteristic features from corrupted/clean facial images and reconstruct the corresponding similar facial images. The reconstruction is realized by a so-called "bottleneck" neural network that learns to map face images into a low-dimensional vector and to reconstruct the respective corresponding face images from the mapping vectors.

Table 3: (a) Training time (seconds) for different methods. (b) Testing time (seconds) for different methods. The proposed method costs the least amount of testing time compared with the other methods.

(a)

Methods     | Time
DAE [12]    | 861
MDAE [17]   | 1054
CAE [15]    | 786
DLN [31]    | 2343
SSAE [9]    | 5351
RICA [32]   | 1344
TDRM [20]   | 1102
Our method  | 12232

(b)

Methods     | Time
DAE [12]    | 0.27
MDAE [17]   | 0.3
CAE [15]    | 0.26
DLN [31]    | 0.35
SSAE [9]    | 0.22
RICA [32]   | 0.19
TDRM [20]   | 0.18
Our method  | 0.13

Having trained the ADSNT, a new face image can then be recognized by comparing its reconstruction image with the individual gallery images during testing. The proposed method has been evaluated on the widely used AR, PubFig, and Extended Yale B databases, and the experimental results have shown its effectiveness. For future work, we will focus on applying the proposed method to other application fields, such as pattern classification based on image sets and action recognition based on video, to further demonstrate its validity.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is partially supported by the research grant of the Natural Science Foundation from the Sichuan Provincial Department of Education (Grant no. 13ZB0336) and the National Natural Science Foundation of China (Grant no. 61502059).

References

[1] J.-T. Chien and C.-C. Wu, "Discriminant waveletfaces and nearest feature classifiers for face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1644–1649, 2002.

[2] Z. Lei, M. Pietikainen, and S. Z. Li, "Learning discriminant face descriptor," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 2, pp. 289–302, 2014.

[3] B. Moghaddam, T. Jebara, and A. Pentland, "Bayesian face recognition," Pattern Recognition, vol. 33, no. 11, pp. 1771–1782, 2000.

[4] N. Cristianini and J. S. Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, New York, NY, USA, 2004.

[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, New York, NY, USA, 2nd edition, 2001.

[6] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face recognition: a literature survey," ACM Computing Surveys, vol. 35, no. 4, pp. 399–458, 2003.

[7] B. Zhang, S. Shan, X. Chen, and W. Gao, "Histogram of Gabor phase patterns (HGPP): a novel object representation approach for face recognition," IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 57–68, 2007.

[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Neural Information Processing Systems, pp. 1527–1554, 2012.

[9] S. Gao, Y. Zhang, K. Jia, J. Lu, and Y. Zhang, "Single sample face recognition via learning deep supervised autoencoders," IEEE Transactions on Information Forensics and Security, vol. 10, no. 10, pp. 2108–2118, 2015.

[10] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.

[11] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, pp. 437–478, Springer, Berlin, Germany, 2012.

[12] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research (JMLR), vol. 11, no. 5, pp. 3371–3408, 2010.

[13] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 807–814, Haifa, Israel, June 2010.

[14] A. Coates, H. Lee, and A. Y. Ng, "An analysis of single-layer networks in unsupervised feature learning," in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS '11), pp. 215–223, Sardinia, Italy, 2010.

[15] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, "Contractive auto-encoders: explicit invariance during feature extraction," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 833–840, Bellevue, Wash, USA, July 2011.

[16] K. Simonyan, A. Vedaldi, and A. Zisserman, "Deep Fisher networks for large-scale image classification," in Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS '13), pp. 163–171, Lake Tahoe, Nev, USA, December 2013.

[17] M. Chen, Z. Xu, K. Q. Weinberger, and F. Sha, "Marginalized denoising autoencoders for domain adaptation," in Proceedings of the 29th International Conference on Machine Learning (ICML '12), pp. 767–774, Edinburgh, UK, July 2012.

[18] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "DeepFace: closing the gap to human-level performance in face verification," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1701–1708, June 2014.

[19] Z. Zhu, P. Luo, X. Wang, and X. Tang, "Deep learning identity-preserving face space," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 113–120, Sydney, Australia, December 2013.

[20] M. Hayat, M. Bennamoun, and S. An, "Deep reconstruction models for image set classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 4, pp. 713–727, 2015.

[21] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation from predicting 10,000 classes," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1891–1898, Columbus, Ohio, USA, June 2014.

[22] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation by joint identification-verification," Tech. Rep., https://arxiv.org/abs/1406.4773.

[23] X. Cai, C. Wang, B. Xiao, X. Chen, and J. Zhou, "Deep nonlinear metric learning with independent subspace analysis for face verification," in Proceedings of the 20th ACM International Conference on Multimedia (MM '12), pp. 749–752, November 2012.

[24] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[25] Q. V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, and A. Y. Ng, "On optimization methods for deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 265–272, Bellevue, Wash, USA, July 2011.

[26] C. Zhou, X. Wei, Q. Zhang, and X. Fang, "Fisher's linear discriminant (FLD) and support vector machine (SVM) in non-negative matrix factorization (NMF) residual space for face recognition," Optica Applicata, vol. 40, no. 3, pp. 693–704, 2010.

[27] A. Martinez and R. Benavente, "The AR face database," CVC Tech. Rep. 24, 1998.

[28] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From few to many: illumination cone models for face recognition under variable lighting and pose," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643–660, 2001.

[29] N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, "Attribute and simile classifiers for face verification," in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 365–372, Kyoto, Japan, October 2009.

[30] M. Rezaei and R. Klette, "Novel adaptive eye detection and tracking for challenging lighting conditions," in Computer Vision—ACCV 2012 Workshops, J.-I. Park and J. Kim, Eds., vol. 7729 of Lecture Notes in Computer Science, pp. 427–440, Springer, Berlin, Germany, 2013.

[31] Y. Tang, R. Salakhutdinov, and G. H. Hinton, "Deep Lambertian networks," in Proceedings of the 29th International Conference on Machine Learning (ICML '12), pp. 1623–1630, Edinburgh, UK, July 2012.

[32] Q. V. Le, A. Karpenko, J. Ngiam, and A. Y. Ng, "ICA with reconstruction cost for efficient overcomplete feature learning," in Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS '11), pp. 1017–1025, Granada, Spain, December 2011.

[33] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, pp. 315–323, 2011.




Page 3: Research Article Adaptive Deep Supervised Autoencoder ...Adaptive Deep Supervised Network Template (ADSNT). the deep network perform well, similar to [], we need to giveitinitializationweights.en,thepreinitializedADSNT

Mathematical Problems in Engineering 3

Facedataset

Test image

Preprocessfor examplehistogram

equalization

Preprocessfor examplehistogram

equalization

Train

Map

ADSNT

Galleryimages

Probeimages

Trainingset

120579ADSNT

Output label I of the test imageI = argmin forallg isin 1 2 c

g

100381710038171003817100381710038171003817x(t) minus xg

100381710038171003817100381710038171003817

Figure 2 Flowchart of the proposed ADSNT image reconstruction for face recognition

Here sdot 22 is the reconstruction error and sdot 119865 denotes theFrobenius norm and 120582 is a parameter that balances the recon-struction loss and weight penalty terms With reconstructingthe clean input data from a corrupted version of it a DAEcan exploremore robust features than a conventional AE onlysimply learning the identity mapping

To further promote learningmeaningful features sparsityconstraints [14] are utilized to impose on the hidden neuronswhen the number of hidden neurons is large which is definedin the light of the Kullback-Leibler (KL) divergence as119898sum119895

KL (120588 || 120588119895) = 119898sum119895=1

120588 log 120588120588119895 + (1 minus 120588) log( 1 minus 1205881 minus 120588119895) (2)

where 119898 is the number of neurons in one hidden layer 120588119895is determined by taking the average activation of a hiddenunit 119895 (over all the training set) and 120588 is a sparsity parameter(typically a small value)

After finishing119891 and 119892 learning the output from encoderℎ is input to the next layer Through training such DAElayerwise stacked denoising autoencoders (SDAE) are thenbuilt Its structure is illustrated in Figure 1(b)

In the real-word application like face recognition thefaces are usually influenced by all kinds of variances such asexpression illumination pose and occlusion To overcomethe effect of variances Gao et al [9] proposed supervisedautoencoder based on the principle of DAE They treatedthe training sample (gallery image) from each person withfrontaluniform illumination neural expression and withoutocclusion as clean data and test faces (probe images) accom-panied by variances (expression illumination occlusion etc)as corrupted data A mapping capturing the discriminantstructure of the facial images from different people is learnedwhile keeping robust to the variances in these faces Thenrobust feature is extracted for image presentation and theperformance of face recognition is greatly enhanced

22 Deep Learning Based Face Recognition System In theearly face recognition there have been various face represen-tation methods including hand-crafted or ldquoshallowrdquo learningways [6 7] In recent years with the development of bigdata and computer hardware feature learning based on deep

structure has been greatly successful in image representationfield [8 12 15 16] By means of deep structure learning theability of model representation gets great enhancement andwe can learn complicated (nonlinear) information from orig-inal data effectively In [16] deep Fisher networkwas designedthrough stacking all the Fisher vectors which greatly per-formed over conventional Fisher vector representation Chenet al [17] proposed marginalized SDAE to learn the opti-mal closed-form solution which reduced the computationalcomplexity and improved the scalability of high-dimensionaldescriptive features Taigman et al [18] presented a faceverification system based on Convolutional Neural Networks(CNNs) which also obtained high accuracy of verification onthe LFW dataset Zhu et al [19] designed a network structurethat is composed of facial identity-preserving layer andimage reconstruction layer which can reduce intravarianceand achieve discriminant information preservation In [20]Hayat et al proposed a deep learning framework basedon AE with application to image set classification and facerecognition which obtained the best performance comparingwith existing state-of-the-art methods Gao et al [9] furtherproposed an SAE which can be used to build the deeparchitecture and can extract the facial features that are robustto variants Sun et al [21] learned multiple convolutionalnetworks (ConvNets) from predicting 10000 subjects whichgeneralized well to face verification issue Furthermore theyimproved the ConvNets by incorporating identification andverificationmissions and enhanced recognition performance[22] Cai et al [23] stacked several sparse independentsubspace analyses (sISA) to construct deep network structureto learn identity representation

3 Proposed Method

This section presents our proposed approach whose blockdiagram is illustrated in Figure 2 Firstly inspired by stackedDAE and SAE [9] we define Adaptive Deep SupervisedNetwork Template (ADSNT) that can learn an underlyingnonlinear manifold structure from the facial images Thebasic architecture of ADSNT is illustrated in Figure 3(c) andthe corresponding details are depicted in Section 31 Tomake


Figure 3: Architecture of SSAE and ADSNT. (a) Supervised autoencoder (SAE), which is comprised of a clean/"corrupted" datum, one hidden layer, and one reconstruction layer using the "corrupted" datum; (b) stacked supervised autoencoder (SSAE); (c) architecture of the Adaptive Deep Supervised Network Template (ADSNT), comprising an encoder (EC) and a decoder (DC).

To make the deep network perform well, similarly to [20], we need to give it initialization weights. Then, the preinitialized ADSNT is trained to reconstruct invariant faces, which are insensitive to illumination, pose, and occlusion. Finally, having trained the ADSNT, we use the nearest neighbor classifier to recognize a new face image by comparing its reconstruction image with the individual gallery images, respectively.

3.1. Adaptive Deep Supervised Network Template (ADSNT). As presented in Figure 3(c), our ADSNT is a deep supervised autoencoder (DSAE) that consists of two parts, an encoder (EC) and a decoder (DC). Each of them has three hidden layers, and they share the third layer, that is, the central hidden layer. The features learned from the hidden layer and the reconstructed clean face are obtained by using the "corrupted" data to train the SSAE. In the process of pretraining, we learn a stack of SAEs, each having only one hidden layer of feature detectors. Then, the learned activation features of one SAE are used as "data" for training the next SAE in the stack. Such training is repeated a number of times until we get the desired number of layers. Although we use the basic SAE structure, which is shown in Figure 3(a) [9], to construct the stacked supervised autoencoder (SSAE), Gao et al.'s stacked supervised autoencoder only used two hidden layers and

one reconstruction layer. In this paper, we use three hidden layers to compose the encoder and the decoder, respectively, whose structures are shown in Figures 3(b) and 3(c). The encoder part tries its best to seek a compact, low-dimensional, meaningful representation of the clean/"corrupted" data. Following the work [20], the encoder can be formulated as a combination of several layers which are connected with a nonlinear activation function $u_f(\cdot)$. We can use a sigmoid function or a rectified linear unit as the nonlinear activation to map the clean/"corrupted" data $x$/$\tilde{x}$ to a representation $h$/$\tilde{h}$ as follows:

$$
\begin{aligned}
h_1 &= f(x) = u_f\bigl(W_e^{(1)}x + b_e^{(1)}\bigr), &\quad \tilde{h}_1 &= f(\tilde{x}) = u_f\bigl(W_e^{(1)}\tilde{x} + b_e^{(1)}\bigr),\\
h_2 &= f(h_1) = u_f\bigl(W_e^{(2)}h_1 + b_e^{(2)}\bigr), &\quad \tilde{h}_2 &= f(\tilde{h}_1) = u_f\bigl(W_e^{(2)}\tilde{h}_1 + b_e^{(2)}\bigr),\\
h &= f(h_2) = u_f\bigl(W_e^{(3)}h_2 + b_e^{(3)}\bigr), &\quad \tilde{h} &= f(\tilde{h}_2) = u_f\bigl(W_e^{(3)}\tilde{h}_2 + b_e^{(3)}\bigr)
\end{aligned}
\tag{3}
$$


where $W_e^{(i)} \in \mathbb{R}^{d_{i-1}\times d_i}$ is the weight matrix of the encoder for the $i$th layer with $d_i$ neurons and $b_e^{(i)} \in \mathbb{R}^{d_i}$ is the bias vector. The encoder parameters are learned by jointly training the encoder-decoder structure to reconstruct the "corrupted" data by minimizing a cost function (see Section 3.2). Accordingly, the decoder can be defined as a combination of several layers integrating a nonlinear activation function $u_g(\cdot)$, which reconstructs the "corrupted" data from the encoder output $h$. The reconstructed output of the decoder is given by

$$
\begin{aligned}
\hat{x}_1 &= g(h) = u_g\bigl(W_d^{(1)}h + b_d^{(1)}\bigr),\\
\hat{x}_2 &= g(\hat{x}_1) = u_g\bigl(W_d^{(2)}\hat{x}_1 + b_d^{(2)}\bigr),\\
\hat{x} &= g(\hat{x}_2) = u_g\bigl(W_d^{(3)}\hat{x}_2 + b_d^{(3)}\bigr)
\end{aligned}
\tag{4}
$$

So we can describe the complete ADSNT by its parameters $\theta_{\mathrm{ADSNT}} = \{\theta_W, \theta_b\}$, where $\theta_W = \{W_e^{(i)}, W_d^{(i)}\}$ and $\theta_b = \{b_e^{(i)}, b_d^{(i)}\}$, $i = 1, 2, 3$.
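To make the mappings in (3) and (4) concrete, the following minimal NumPy sketch runs a forward pass through a three-layer encoder and a tied-weight decoder. The tanh activation, the random initialization, and the helper names (encode, decode) are illustrative assumptions rather than the authors' released code; the layer sizes follow the 780 → 1024 → 500 → 120 configuration reported in Section 4.2.

```python
import numpy as np

def encode(x, We, be, act=np.tanh):
    """Encoder of (3): x -> h1 -> h2 -> h (central hidden layer)."""
    h = x
    for W, b in zip(We, be):
        h = act(W @ h + b)            # h_i = u_f(W_e^(i) h_{i-1} + b_e^(i))
    return h

def decode(h, We, bd, act=np.tanh):
    """Decoder of (4) with tied weights: the i-th decoder layer reuses the transposed
    encoder weight of the mirrored layer (W_d^(1) = W_e^(3)T, ..., W_d^(3) = W_e^(1)T)."""
    x_hat = h
    for W, b in zip(reversed(We), bd):
        x_hat = act(W.T @ x_hat + b)
    return x_hat

# Toy dimensions: 780-d input, hidden sizes 1024 -> 500 -> 120 (Section 4.2).
rng = np.random.default_rng(0)
dims = [780, 1024, 500, 120]
We = [rng.normal(0.0, 0.01, (dims[i + 1], dims[i])) for i in range(3)]
be = [np.zeros(dims[i + 1]) for i in range(3)]
bd = [np.zeros(d) for d in (500, 1024, 780)]

x = rng.random(780)                   # a preprocessed face vector
h = encode(x, We, be)                 # low-dimensional representation
x_hat = decode(h, We, bd)             # reconstructed face
print(h.shape, x_hat.shape)           # (120,) (780,)
```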

3.2. Formulation of Image Reconstruction Based on ADSNT. Now we are ready to depict the image reconstruction based on ADSNT. The details are presented as follows.

Given a set of training images from $k$ classes that includes gallery images (called clean data) and probe images (called "corrupted" data), together with their corresponding class labels $y_c = [1, 2, \ldots, k]$, the dataset will be used to train ADSNT for feature learning. Let $\tilde{x}_i$ denote a probe image and $x_i$ $(i = 1, 2, \ldots, M)$ denote the gallery images corresponding to $\tilde{x}_i$. It is desirable that $x_i$ and $\tilde{x}_i$ should be similar. Therefore, following the work [9, 22], we obtain the following formulation:

$$
\arg\min_{\theta_{\mathrm{ADSNT}}} J = \frac{1}{M}\sum_{i}\bigl\|x_i - \hat{x}_i\bigr\|^2 + \frac{\lambda_{\theta_W}}{M}\sum_{i}\bigl\|f(x_i) - f(\tilde{x}_i)\bigr\|^2 + \frac{\varphi}{2}\Biggl(\sum_{j=1}^{3}\bigl\|W_e^{(j)}\bigr\|_F^2 + \sum_{j=1}^{3}\bigl\|W_d^{(j)}\bigr\|_F^2\Biggr)
\tag{5}
$$

where $\theta_{\mathrm{ADSNT}} = \{\theta_W, \theta_b\}$ (see Section 3.1) are the parameters of ADSNT, which are fine-tuned by learning. In this paper, we only explore tied weights, that is, $W_d^{(3)} = W_e^{(1)T}$, $W_d^{(2)} = W_e^{(2)T}$, and $W_d^{(1)} = W_e^{(3)T}$ (see Figure 3(c)). $\hat{x}_i$ is the reconstruction image of the corrupted image $\tilde{x}_i$. Like a regularization parameter, $\lambda_{\theta_W}$ balances the similarity term for the same person, which preserves $f(x_i)$ and $f(\tilde{x}_i)$ as similar as possible; $f(\cdot)$ is a nonlinear activation function; $\varphi$ is a parameter that balances the weight penalty terms and the reconstruction loss; $\|\cdot\|_F$ denotes the Frobenius norm; and $\sum_{j=1}^{3}\|W_e^{(j)}\|_F^2 + \sum_{j=1}^{3}\|W_d^{(j)}\|_F^2$ ensures small weight values for all the hidden neurons. Furthermore, following the work [9, 14], we impose a sparsity constraint on the hidden layer to enhance learning meaningful features.

Then we can further modify the cost function and obtain the following objective formulation:

$$
\arg\min_{\theta_{\mathrm{ADSNT}}} J_{\mathrm{reg}} = J + \gamma\Biggl(\sum_{i=1}^{3}\mathrm{KL}\bigl(\rho_x \,\|\, \rho_0\bigr) + \sum_{i=1}^{5}\mathrm{KL}\bigl(\rho_{\tilde{x}} \,\|\, \rho_0\bigr)\Biggr)
\tag{6}
$$

where

$$
\begin{aligned}
\rho_x &= \frac{1}{M}\sum_{i}\frac{1}{2}\bigl(f(x_i) + 1\bigr), \qquad \rho_{\tilde{x}} = \frac{1}{M}\sum_{i}\frac{1}{2}\bigl(f(\tilde{x}_i) + 1\bigr),\\
\mathrm{KL}\bigl(\rho \,\|\, \rho_0\bigr) &= \sum_{j}\Biggl(\rho_0 \log\frac{\rho_0}{\rho_j} + (1 - \rho_0)\log\frac{1 - \rho_0}{1 - \rho_j}\Biggr)
\end{aligned}
\tag{7}
$$

Here, the KL divergence between two distributions, that is, $\rho_0$ and $\rho_j$ (the $j$th element of $\rho_x$ or $\rho_{\tilde{x}}$), is calculated. The sparsity target $\rho_0$ is usually a constant taking a small value; according to the work [9, 24], it is set to 0.05 in our experiments. Meanwhile, $\rho_x$ and $\rho_{\tilde{x}}$ are the mean activation values mapped from the clean data and the corrupted data, respectively.
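As a concrete reading of (5)–(7), the sketch below computes the regularized cost $J_{\mathrm{reg}}$ for a batch of clean/corrupted pairs. The helper name j_reg and the default hyperparameter values (taken from the initial AR setting in Section 4.2) are assumptions for illustration, and a tanh activation is assumed so that the mean activations are rescaled to [0, 1] as in (7).

```python
import numpy as np

def kl(rho, rho0=0.05, eps=1e-8):
    """KL term of (7), summed over hidden units."""
    rho = np.clip(rho, eps, 1 - eps)
    return np.sum(rho0 * np.log(rho0 / rho) + (1 - rho0) * np.log((1 - rho0) / (1 - rho)))

def j_reg(X, X_hat, F_clean, F_corrupt, We, Wd,
          lam=0.5, phi=0.1, gamma=0.08, rho0=0.05):
    """Regularized cost of (5)-(7).
    X, X_hat           : (M, d) clean faces and reconstructions of the corrupted faces.
    F_clean, F_corrupt : (M, k) hidden activations f(x_i), f(x~_i), assumed in [-1, 1] (tanh).
    We, Wd             : lists of encoder / decoder weight matrices."""
    M = X.shape[0]
    rec = np.sum((X - X_hat) ** 2) / M                          # reconstruction term
    sim = lam * np.sum((F_clean - F_corrupt) ** 2) / M          # similarity-preservation term
    dec = 0.5 * phi * (sum(np.sum(W ** 2) for W in We)
                       + sum(np.sum(W ** 2) for W in Wd))       # Frobenius weight decay
    rho_x = 0.5 * (F_clean.mean(axis=0) + 1.0)                  # map tanh outputs to [0, 1]
    rho_xt = 0.5 * (F_corrupt.mean(axis=0) + 1.0)
    sparse = gamma * (kl(rho_x, rho0) + kl(rho_xt, rho0))       # sparsity term of (6)
    return rec + sim + dec + sparse
```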

3.3. Optimization of ADSNT. For obtaining the optimal parameters $\theta_{\mathrm{ADSNT}} = \{\theta_W, \theta_b\}$, it is important to initialize the weights and to select an optimization training algorithm. The training will fail if the initialization weights are inappropriate. That is to say, if we give the network too large initialization weights, the ADSNT will be trapped in a local minimum; if the initialized weights are too small, the ADSNT will encounter the vanishing gradient problem during backpropagation. Therefore, following the work [20, 24], Gaussian Restricted Boltzmann Machines (GRBMs) are adopted to initialize the weight parameters by performing pretraining, which has already been applied widely. For more details, we refer the reader to the original paper [24]. After obtaining the initialized weights, the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization algorithm is utilized to learn the parameters, as it has better performance and faster convergence than stochastic gradient descent (SGD) and conjugate gradient (CG) [25]. Algorithm 1 depicts the optimization procedure of ADSNT; a code sketch of this procedure is given after the algorithm.

Algorithm 1 (learning the adaptive deep supervised network template).

Input: Training images Ω from k classes, where each class is composed of the face with neutral expression, frontal pose, and normal illumination condition (clean data) and a random number of variant faces (corrupted data); number of network layers L; iteration number I; balancing parameters λ, φ, and γ; and convergence error ε.
Output: Weight parameters $\theta_{\mathrm{ADSNT}} = \{\theta_W, \theta_b\}$.


(1) Preprocess all images, namely, perform histogram equalization.
(2) X ← randomly select a small subset for each individual from Ω.
(3) Initialize: train GRBMs by using X to initialize $\theta_{\mathrm{ADSNT}} = \{\theta_W, \theta_b\}$.
(4) Optimization by L-BFGS:
    For r = 1, 2, ..., R do
        calculate $J_{\mathrm{reg}}$ using (6);
        if r > 1 and $|J_r - J_{r-1}| < \varepsilon$, go to Return.
Return $\theta_W$ and $\theta_b$.
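A minimal sketch of steps (3)–(4) of Algorithm 1 is given below, using SciPy's L-BFGS implementation. The GRBM pretraining is abstracted away as a set of already-initialized weight arrays, the cost_fn callable stands for $J_{\mathrm{reg}}$ of (6), and in practice an analytic gradient would be supplied instead of SciPy's numerical approximation; all helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def pack(params):
    """Flatten a list of weight/bias arrays into one parameter vector for L-BFGS."""
    return np.concatenate([p.ravel() for p in params])

def unpack(theta, shapes):
    out, i = [], 0
    for s in shapes:
        n = int(np.prod(s))
        out.append(theta[i:i + n].reshape(s))
        i += n
    return out

def train_adsnt(X, X_tilde, init_params, cost_fn, max_iter=200, tol=1e-3):
    """Refine (GRBM-)initialized weights with L-BFGS until |J_r - J_{r-1}| < tol."""
    shapes = [p.shape for p in init_params]

    def objective(theta):
        # cost_fn evaluates J_reg of (6) for the current parameters;
        # for speed, a jac= callable with analytic gradients should be added.
        return cost_fn(X, X_tilde, unpack(theta, shapes))

    res = minimize(objective, pack(init_params), method="L-BFGS-B",
                   options={"maxiter": max_iter, "ftol": tol})
    return unpack(res.x, shapes)
```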

Since training the ADSNT model aims to reconstruct clean data, namely, gallery images, from corrupt data, it might learn an underlying structure from the corrupt data and produce a very useful representation. Furthermore, we can learn an overcomplete sparse representation from the corrupt data by mapping it into a high-dimensional feature space, since the first hidden layer has a number of neurons larger than the dimensionality of the original data. The high-dimensional model representation is then followed by a so-called "bottleneck"; that is, the data is further mapped to an abstract, compact, and low-dimensional model representation in the subsequent layers of the encoder. Through such a mapping, the redundant information such as illumination, poses, and partial occlusion in the corrupted faces is removed, and only the information content useful for us is kept. In addition, we know that if we use an AE with only one hidden layer and a linear activation function, the learned weights would be analogous to a PCA subspace [20]. However, the AE is an unsupervised algorithm. In our work, we make use of the class label information to train the SAE, so if we also use only one hidden layer with a linear activation function, the weights learned by the SAE can be thought of as similar to an "LDA" subspace. However, in our structure, we apply nonlinear activation functions and stack several hidden layers together, and then the ADSNT can adapt to very complicated nonlinear manifold structures. Some of the reconstructed images based on ADSNT from the AR database are shown in Figure 4(b). One can see that ADSNT can remove the illumination. For those face images with partial occlusion, ADSNT can also imitate the clean faces. These results are not surprising, because human beings have the capability of inferring unknown faces from known face images via experience (for a deep network structure, the learned experience derives from the generic set) [9].

3.4. Face Classification Based on ADSNT Image Reconstruction. To better train ADSNT, all images need to be preprocessed. This is a very important step for object recognition, including face recognition. The common ways include histogram equalization, geometry normalization, and image smoothing. In this paper, for the sake of simplicity, we only perform histogram equalization on all the facial images to minimize illumination variations. That is, we utilize histogram equalization to normalize the histogram of the facial

images and make them more compact. For details about histogram equalization, the reader is referred to [26].
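For reference, this single preprocessing step can be reproduced with OpenCV in two lines; the file name below is a placeholder and the image is assumed to be an 8-bit grayscale face crop.

```python
import cv2

# Histogram equalization as the only preprocessing step (Section 3.4).
img = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)   # path is a placeholder
img_eq = cv2.equalizeHist(img)                        # flattens the intensity histogram
```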

After the ADSNT is trained completely with a certain number of individuals, we can apply it to unseen face images in order to recognize them.

Given a test facial image $x^{(t)}$, which is also preprocessed with histogram equalization in the same way as the training images and presented to the ADSNT network, we reconstruct (using (3) and (4)) the image $\hat{x}^{(t)}$ from ADSNT, which is similar to a clean face. For the sake of simplicity, the nearest neighbor classification based on the Euclidean distance between the reconstruction and all the gallery images identifies the class. The classification formula is defined as

$$
I_k\bigl(x^{(t)}\bigr) = \arg\min_{g}\bigl\|\hat{x}^{(t)} - x_g\bigr\|, \quad \forall g \in \{1, 2, \ldots, c\}
\tag{8}
$$

where $I_k(x^{(t)})$ is the resulting identity and $x_g$ is the clean facial image in the gallery images of individual $g$.
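A minimal sketch of this nearest neighbor rule is shown below. The reconstruct callable stands for the trained ADSNT encoder-decoder pass (for example, the encode/decode helpers sketched in Section 3.1), and the gallery is assumed to hold one clean face vector per identity; both names are illustrative assumptions.

```python
import numpy as np

def classify(x_test, gallery, reconstruct):
    """Nearest-neighbor rule of (8): compare the ADSNT reconstruction of the test
    face with every clean gallery face and return the closest identity.
    gallery     : dict {identity: clean gallery face vector x_g}
    reconstruct : callable mapping a histogram-equalized test face to x_hat"""
    x_hat = reconstruct(x_test)
    dists = {g: np.linalg.norm(x_hat - x_g) for g, x_g in gallery.items()}
    return min(dists, key=dists.get)
```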

4. Experimental Results and Discussion

In this section, extensive experiments are conducted to present and compare the performance of different methods with the proposed approach. The experiments are implemented on three widely used face databases, that is, AR [27], Extended Yale B [28], and PubFig [29]. The details of these three databases and the performance evaluation of the different approaches are presented as follows.

4.1. Dataset Description. The AR database contains over 4000 color face images from 126 people (56 women and 70 men). The images were taken in two sessions (separated by two weeks), and each session contained 13 pictures of one person. These images contain frontal view faces with different facial expressions, illuminations, and occlusions (sunglasses and scarf). Some sample face images from AR are illustrated in Figure 5(a). In our experiments, for each person, we choose the facial images with neutral expression, frontal pose, and normal illumination condition as gallery images and randomly select half the number of images from the rest of the images of each person as probe images. The remaining images compose the testing set.

The Extended Yale B database consists of 16,128 images of 38 people under 64 illumination conditions and 9 poses. Some sample face images from Extended Yale B are illustrated in Figure 5(b). For each person, we select the faces that have normal light condition and frontal pose as gallery images and randomly choose 6 poses and 16 illumination face images to compose the probe images. The remaining images compose the testing set.

The PubFig database is composed of 58,797 images of 200 subjects taken from the internet. The images of the database were taken in completely uncontrolled conditions with noncooperative people. These images have a very large degree of variability in face expression, pose, illumination, and so forth. Some sample images from PubFig are illustrated in Figure 5(c). In our experiments, for each individual, we select the faces with neutral expression, the frontal or near frontal pose, and normal illumination as galleries and randomly choose half the number of images from the rest of the images of each person as probes. The remaining images compose the testing set.

Figure 4: Some original images from the AR database and the reconstructed ones. (a) Original face images (corrupted faces); (b) reconstructed faces.

4.2. Experimental Settings. In all the experiments, the facial images from the AR, PubFig, and Extended Yale B databases are automatically detected using the OpenCV face detector [30]. After that, we normalize the detected facial images (in orientation and scale) such that the two eyes are aligned at the same location. Then, the face areas are cropped and converted to 256-gray-level images. The size of each cropped image is 26 × 30 pixels; thus, the dimensionality of the input vector is 780. Figure 6 presents an example from the AR database and the corresponding cropped image. Each cropped facial image is further preprocessed with histogram equalization to minimize illumination variations. We train our ADSNT model with 3 hidden layers, where the number of hidden nodes for these layers is empirically set as [1024 → 500 → 120], because our experiments show that three hidden layers can get a sufficiently good performance (see Section 4.3.3).
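The following OpenCV sketch reproduces this preprocessing chain under simplifying assumptions: the standard Haar cascade shipped with OpenCV is used as the face detector, the eye-based alignment step is omitted, and exactly one face is assumed to be detected per image; the helper name is hypothetical.

```python
import cv2
import numpy as np

# Standard Haar cascade bundled with opencv-python; eye alignment is omitted here.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_to_vector(path):
    """Detect, crop, equalize, and flatten a face image to the 780-d input vector."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    x, y, w, h = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)[0]
    face = cv2.resize(gray[y:y + h, x:x + w], (26, 30))   # width x height = 26 x 30 pixels
    face = cv2.equalizeHist(face)                          # Section 3.4 preprocessing
    return face.astype(np.float32).ravel() / 255.0         # 780-dimensional vector
```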

In order to show the whole experimental process of parameter setting, we initially use the hyperbolic tangent function as the nonlinear activation function and implement ADSNT on AR. We also choose the face images with neutral expression, frontal pose, and normal illumination as galleries and randomly select half the number of images from the rest of the images of each person as probe images. The remaining images compose the testing set. The mean identification rates are recorded.

Firstly, we empirically set the parameter ε = 0.001 and the sparsity target $\rho_0 = 0.05$ and fix the parameters λ = 0.5 and φ = 0.1 in ADSNT to check the effect of γ on the identification rate. As illustrated in Figure 6(a), the ADSNT recognition method gets the best performance when γ = 0.08.


Figure 5: A fraction of samples from the AR, PubFig, and Extended Yale B face databases. (a) AR; (b) Extended Yale B; (c) PubFig.

Then, according to Figure 6(a), we fix the parameters γ = 0.08 and λ = 0.5 in ADSNT to check the influence of φ. As shown in Figure 6(b), when φ = 0.6, our method achieves the best recognition rate. At last, we fix γ = 0.08 and φ = 0.6, and the recognition rates with different values of λ are illustrated in Figure 6(c). When λ = 3, the recognition rate is the highest. From the plots in Figure 6, one can observe that the parameters λ, φ, and γ cannot be too large or too small. If λ is too large, the ADSNT becomes less discriminative of different subjects because it imposes too strong a similarity preservation term; if λ is too small, it degrades the recognition performance and the significance of the similarity preservation term. Similarly, γ cannot be too large, or the hidden neurons will not be activated for a given input

and a low recognition rate will be achieved. If γ is too small, we also get poor performance. For the weight decay φ, if it is too small, the values of the weights for all hidden units will change very slightly; on the contrary, if it is too large, the values of the weights will change greatly.

Using the above experiments, we obtain the optimal parameter values used in ADSNT as λ = 3, φ = 0.6, and γ = 0.08 on the AR database. Similar experiments have also been performed on the Extended Yale B and PubFig databases. We get the parameter settings λ = 2.6, φ = 0.5, and γ = 0.06 on the Extended Yale B database and λ = 2.8, φ = 0.52, and γ = 0.09 on the PubFig database.
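This tuning is a coordinate-wise sweep: one parameter is varied while the other two stay fixed, in the order γ, then φ, then λ. A hedged sketch is given below; evaluate_accuracy is a hypothetical helper that trains ADSNT with the given hyperparameters and returns the mean identification rate on the probe set, and the candidate ranges only roughly mirror the axes of Figure 6.

```python
import numpy as np

def tune_one(param_name, candidates, fixed, evaluate_accuracy):
    """Sweep one hyperparameter while the others stay fixed, as in Figure 6."""
    scores = {}
    for v in candidates:
        params = dict(fixed, **{param_name: v})
        scores[v] = evaluate_accuracy(**params)   # mean identification rate on probes
    return max(scores, key=scores.get)

# Coordinate-wise tuning order used on AR: gamma, then phi, then lambda, e.g.:
# gamma = tune_one("gamma", np.arange(0.005, 0.15, 0.01), {"lam": 0.5, "phi": 0.1}, evaluate_accuracy)
# phi   = tune_one("phi",   np.arange(0.1, 2.3, 0.1),     {"lam": 0.5, "gamma": gamma}, evaluate_accuracy)
# lam   = tune_one("lam",   np.arange(0.5, 6.5, 0.5),     {"phi": phi, "gamma": gamma}, evaluate_accuracy)
```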

Figure 6: Parameter setting on the AR database. (a) Identification rate (%) versus γ (with λ = 0.5 and φ = 0.1); (b) identification rate (%) versus φ (with λ = 0.5 and γ = 0.08); (c) identification rate (%) versus λ (with φ = 0.6 and γ = 0.08).

In the experiments, we use two measures, namely, the mean identification accuracy μ with standard deviation ν (reported as μ ± ν) and the receiver operating characteristic (ROC) curve, to validate the effectiveness of our method as well as the other methods.
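These two measures can be computed as in the short sketch below, assuming that per-trial identification accuracies and, for the ROC curve, per-pair match scores with same/different-person labels are available; scikit-learn is used here purely for illustration.

```python
import numpy as np
from sklearn.metrics import roc_curve

def mean_std_accuracy(per_trial_accuracies):
    """Mean identification accuracy with standard deviation (reported as mu +/- nu)."""
    acc = np.asarray(per_trial_accuracies, dtype=float)
    return acc.mean(), acc.std()

def roc_points(match_scores, labels):
    """ROC curve from similarity scores (higher score = more likely the same person)."""
    fpr, tpr, _ = roc_curve(labels, match_scores)
    return fpr, tpr
```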

4.3. Experimental Results and Analysis

4.3.1. Comparison with Different Methods. In the following experiments on the three databases, we compare the proposed approach with several recently proposed methods. These compared methods include DAE with 10% random mask noise [12], marginalized DAE (MDAE) [17], Contractive Autoencoders (CAE) [15], Deep Lambertian Networks (DLN) [31], the stacked supervised autoencoder (SSAE) [9], ICA-Reconstruction (RICA) [32], and the Template Deep Reconstruction Model (TDRM) [20]. We use the implementations of these algorithms provided by the respective authors. For all the compared approaches, we use the default parameters recommended in the corresponding papers.

The mean identification accuracy with standard deviations of the different approaches on the three databases is shown in Table 1, and the ROC curves of the different approaches are illustrated in Figure 7. The results imply that our approach significantly outperforms the other methods and gets the best mean recognition rates for the same setting of training and testing sets. Compared to the unsupervised deep learning methods such as DAE, MDAE, CAE, DLN, and TDRM, the improvement of our method is over 3% on the Extended Yale B and AR databases, where there is little pose variance. On the PubFig database, our approach can also achieve a mean identification rate of 91.26 ± 1.6% and outperforms all the compared methods.

Table 1: Comparisons of the average identification accuracy and standard deviation (%) of different approaches on the different databases.

Method          AR              Extended Yale B   PubFig
DAE [12]        57.56 ± 0.2     63.45 ± 1.3       61.33 ± 1.5
MDAE [17]       67.80 ± 1.3     71.56 ± 1.6       70.55 ± 2.5
CAE [15]        49.50 ± 2.1     55.72 ± 0.8       68.56 ± 1.6
DLN [31]        N/A             81.50 ± 1.4       77.60 ± 1.4
SSAE [9]        85.21 ± 0.7     82.22 ± 0.3       84.04 ± 1.2
RICA [32]       76.33 ± 1.7     70.44 ± 1.3       72.35 ± 1.5
TDRM [20]       87.70 ± 0.6     86.42 ± 1.2       89.90 ± 0.9
Our method      92.32 ± 0.7     93.66 ± 0.4       91.26 ± 1.6

Figure 7: Comparisons of ROC curves (true positive rate versus false positive rate) between our method and the other methods (DAE, MDAE, CAE, DLN, SSAE, RICA, and TDRM) on different databases. (a) AR; (b) Extended Yale B; (c) PubFig.

The reason is that our method can extract discriminative information that is robust to the variances (expression, illumination, pose, etc.) in the learned deep networks. Compared with a supervised method like RICA, the proposed method improves by over 16%, 19%, and 23% on the AR, PubFig, and Extended Yale B databases, respectively. Our method is a deep learning method which focuses on the nonlinear classification problem by learning a nonlinear mapping, such that more nonlinear discriminant information may be explored to enhance the identification performance. Compared with the SSAE method, which is designed for removing variances such as illumination, pose, and partial occlusion, our method is still better by over 6%, because of the use of the weight penalty terms, the GRBM weight initialization, and the three layers' similarity preservation term.

4.3.2. Convergence Analysis. In this subsection, we evaluate the convergence of our ADSNT versus the number of iterations. Figure 8(a) illustrates the value of the objective function of ADSNT versus the number of iterations on the AR, PubFig, and Extended Yale B databases. From Figure 8(a), one can observe that ADSNT converges in about 55, 28, and 70 iterations on the three databases, respectively.

We also evaluate the identification accuracy of ADSNT versus the number of iterations on the AR, PubFig, and Extended Yale B databases. Figure 8(b) plots the mean identification rate of ADSNT.

Figure 8: Convergence analysis. (a) Convergence curves (objective function value versus iteration number) of ADSNT on AR, PubFig, and Extended Yale B; (b) mean identification rate (%) versus iteration number of ADSNT on AR, PubFig, and Extended Yale B.

Figure 9: The identification accuracy (%) of ADSNT with different network depths (Layer 1 to Layer 4) on the different datasets (AR, Extended Yale B, and PubFig).

From Figure 8(b), one can also observe that ADSNT achieves stable performance after about 55, 70, and 28 iterations on the AR, PubFig, and Extended Yale B databases, respectively.

4.3.3. The Effect of Network Depth. In this subsection, we conduct experiments on the three face datasets with different numbers of hidden layers in our proposed ADSNT network. The proposed method achieves identification rates of 92.3 ± 0.6%, 93.3 ± 1.2%, and 91.22 ± 0.8% with the three-hidden-layer ADSNT network, that is, 1024 → 500 → 120, on the AR, Extended Yale B, and PubFig datasets, respectively. Figure 9 illustrates the

performance of ADSNT with different numbers of layers. One can observe that the three-hidden-layer network outperforms the 2-layer network, and the result of the 3-layer ADSNT network is very nearly equal to that of the 4-layer network on the AR and Extended Yale B databases. We also observe that the performance of the 4-layer network is a bit lower than that of the 3-layer network on the PubFig database. In addition, the deeper the ADSNT network is, the more complex its computational cost becomes. Therefore, the 3-layer network depth is a good trade-off between performance and computational complexity.

4.3.4. Activation Function. Following the work in [9], we also estimate the performance of ADSNT with different activation functions, such as the sigmoid, the hyperbolic tangent, and the rectified linear unit (ReLU) [33], which is defined as $f(x) = \max(0, x)$. When the sigmoid $f(x) = 1/(1 + e^{-x})$ is used as the activation function, the objective function (see (6)) is rewritten as follows:

$$
\begin{aligned}
\arg\min_{\theta_{\mathrm{ADSNT}}} J ={}& \frac{1}{M}\sum_{i}\bigl\|x_i - \hat{x}_i\bigr\|^2 + \frac{\lambda_{\theta_W}}{M}\sum_{i}\bigl\|f(x_i) - f(\tilde{x}_i)\bigr\|^2\\
&+ \frac{\varphi}{2}\Biggl(\sum_{j=1}^{3}\bigl\|W_e^{(j)}\bigr\|_F^2 + \sum_{j=1}^{3}\bigl\|W_d^{(j)}\bigr\|_F^2\Biggr)\\
&+ \gamma\Biggl(\sum_{i=1}^{3}\mathrm{KL}\bigl(\rho_x \,\|\, \rho_0\bigr) + \sum_{i=1}^{5}\mathrm{KL}\bigl(\rho_{\tilde{x}} \,\|\, \rho_0\bigr)\Biggr)
\end{aligned}
\tag{9}
$$

where $\rho_x = (1/M)\sum_{i} f(x_i)$ and $\rho_{\tilde{x}} = (1/M)\sum_{i} f(\tilde{x}_i)$.


Table 2: Comparisons of the ADSNT algorithm with different activation functions on the AR, PubFig, and Extended Yale B databases (%).

Dataset            Sigmoid        Tanh           ReLU
AR                 88.66 ± 1.4    92.32 ± 0.7    93.22 ± 1.5
Extended Yale B    90.55 ± 0.6    93.66 ± 0.4    94.54 ± 0.3
PubFig             87.40 ± 1.2    91.26 ± 1.6    92.44 ± 1.1

If ReLU is adopted as the activation function, (6) is formulated as

$$
\begin{aligned}
\arg\min_{\theta_{\mathrm{ADSNT}}} J ={}& \frac{1}{M}\sum_{i}\bigl\|x_i - \hat{x}_i\bigr\|^2 + \frac{\lambda_{\theta_W}}{M}\sum_{i}\bigl\|f(x_i) - f(\tilde{x}_i)\bigr\|^2\\
&+ \frac{\varphi}{2}\Biggl(\sum_{j=1}^{3}\bigl\|W_e^{(j)}\bigr\|_F^2 + \sum_{j=1}^{3}\bigl\|W_d^{(j)}\bigr\|_F^2\Biggr)\\
&+ \gamma\Biggl(\sum_{i=1}^{3}\bigl\|f(x_i)\bigr\|_1 + \sum_{i=1}^{5}\bigl\|f(\tilde{x}_i)\bigr\|_1\Biggr)
\end{aligned}
\tag{10}
$$

Table 2 shows the performance of the proposed ADSNT with the different activation functions on the three databases. From Table 2, one can see that ReLU achieves the best performance. The key reason is that we use the weight decay term φ to optimize the objective function.
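The practical difference between (6), (9), and (10) lies only in how the sparsity term is computed from the hidden activations. The sketch below illustrates this switch; the rescaling of tanh activations to [0, 1] follows (7), while the function name and everything else are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def sparsity_penalty(F, activation, rho0=0.05, eps=1e-8):
    """Sparsity term for one hidden layer of activations F (M samples x k units)."""
    if activation == "relu":                      # (10): plain L1 penalty on activations
        return np.sum(np.abs(F))
    if activation == "sigmoid":                   # (9): activations already lie in [0, 1]
        rho = F.mean(axis=0)
    else:                                         # "tanh", (6)-(7): rescale [-1, 1] to [0, 1]
        rho = 0.5 * (F.mean(axis=0) + 1.0)
    rho = np.clip(rho, eps, 1 - eps)
    return np.sum(rho0 * np.log(rho0 / rho) + (1 - rho0) * np.log((1 - rho0) / (1 - rho)))
```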

4.3.5. Timing Consumption Analysis. In this subsection, we use an HP Z620 workstation with an Intel Xeon E5-2609 2.4 GHz CPU and 8 GB RAM and conduct a series of experiments on the AR database to compare the time consumption of the different methods, which is tabulated in Table 3. The training time (seconds) is shown in Table 3(a), while the time (seconds) needed to recognize a face from the testing set is shown in Table 3(b). From Table 3, one can see that the proposed method requires comparatively more time for training because of the initialization of ADSNT and the image reconstruction. However, the training procedure is offline. When we identify an image from the testing set, our method requires less time than the other methods.

Table 3: (a) Training time (seconds) for different methods; (b) testing time (seconds) for different methods. The proposed method costs the least amount of testing time compared with the other methods.

(a)

Method          Time
DAE [12]        861
MDAE [17]       1054
CAE [15]        786
DLN [31]        2343
SSAE [9]        5351
RICA [32]       1344
TDRM [20]       1102
Our method      12232

(b)

Method          Time
DAE [12]        0.27
MDAE [17]       0.3
CAE [15]        0.26
DLN [31]        0.35
SSAE [9]        0.22
RICA [32]       0.19
TDRM [20]       0.18
Our method      0.13

5. Conclusions

In this article, we present an adaptive deep supervised autoencoder based image reconstruction method for face recognition. Unlike conventional deep autoencoder based face recognition methods, our method considers the class label information from the training samples in the deep learning procedure and can automatically discover the underlying nonlinear manifold structures. Specifically, a multilayer supervised adaptive network structure is presented, which is trained to extract characteristic features from corrupted/clean facial images and to reconstruct the corresponding similar facial images. The reconstruction is realized by a so-called "bottleneck" neural network that learns to map face images into a low-dimensional vector and to reconstruct the respective corresponding face images from the mapping vectors. Having trained the ADSNT, a new face image can then be recognized by comparing its reconstruction image with the individual gallery images during testing. The proposed method has been evaluated on the widely used AR, PubFig, and Extended Yale B databases, and the experimental results have shown its effectiveness. For future work, we are focusing on applying our proposed method to other application fields, such as pattern classification based on image sets and action recognition based on video, to further demonstrate its validity.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is partially supported by the research grant for the Natural Science Foundation from Sichuan Provincial Department of Education (Grant no. 13ZB0336) and the National Natural Science Foundation of China (Grant no. 61502059).

References

[1] J.-T. Chien and C.-C. Wu, "Discriminant waveletfaces and nearest feature classifiers for face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1644–1649, 2002.
[2] Z. Lei, M. Pietikainen, and S. Z. Li, "Learning discriminant face descriptor," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 2, pp. 289–302, 2014.
[3] B. Moghaddam, T. Jebara, and A. Pentland, "Bayesian face recognition," Pattern Recognition, vol. 33, no. 11, pp. 1771–1782, 2000.
[4] N. Cristianini and J. S. Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, New York, NY, USA, 2004.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, New York, NY, USA, 2nd edition, 2001.
[6] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face recognition: a literature survey," ACM Computing Surveys, vol. 35, no. 4, pp. 399–458, 2003.
[7] B. Zhang, S. Shan, X. Chen, and W. Gao, "Histogram of Gabor phase patterns (HGPP): a novel object representation approach for face recognition," IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 57–68, 2007.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Neural Information Processing Systems, pp. 1527–1554, 2012.
[9] S. Gao, Y. Zhang, K. Jia, J. Lu, and Y. Zhang, "Single sample face recognition via learning deep supervised autoencoders," IEEE Transactions on Information Forensics and Security, vol. 10, no. 10, pp. 2108–2118, 2015.
[10] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[11] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, pp. 437–478, Springer, Berlin, Germany, 2012.
[12] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research (JMLR), vol. 11, no. 5, pp. 3371–3408, 2010.
[13] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 807–814, Haifa, Israel, June 2010.
[14] A. Coates, H. Lee, and A. Y. Ng, "An analysis of single-layer networks in unsupervised feature learning," in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS '11), pp. 215–223, Sardinia, Italy, 2010.
[15] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, "Contractive auto-encoders: explicit invariance during feature extraction," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 833–840, Bellevue, Wash, USA, July 2011.
[16] K. Simonyan, A. Vedaldi, and A. Zisserman, "Deep Fisher networks for large-scale image classification," in Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS '13), pp. 163–171, Lake Tahoe, Nev, USA, December 2013.
[17] M. Chen, Z. Xu, K. Q. Weinberger, and F. Sha, "Marginalized denoising autoencoders for domain adaptation," in Proceedings of the 29th International Conference on Machine Learning (ICML '12), pp. 767–774, Edinburgh, UK, July 2012.
[18] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "DeepFace: closing the gap to human-level performance in face verification," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1701–1708, June 2014.
[19] Z. Zhu, P. Luo, X. Wang, and X. Tang, "Deep learning identity-preserving face space," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 113–120, Sydney, Australia, December 2013.
[20] M. Hayat, M. Bennamoun, and S. An, "Deep reconstruction models for image set classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 4, pp. 713–727, 2015.
[21] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation from predicting 10,000 classes," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1891–1898, Columbus, Ohio, USA, June 2014.
[22] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation by joint identification-verification," Tech. Rep., https://arxiv.org/abs/1406.4773.
[23] X. Cai, C. Wang, B. Xiao, X. Chen, and J. Zhou, "Deep nonlinear metric learning with independent subspace analysis for face verification," in Proceedings of the 20th ACM International Conference on Multimedia (MM '12), pp. 749–752, November 2012.
[24] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[25] Q. V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, and A. Y. Ng, "On optimization methods for deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 265–272, Bellevue, Wash, USA, July 2011.
[26] C. Zhou, X. Wei, Q. Zhang, and X. Fang, "Fisher's linear discriminant (FLD) and support vector machine (SVM) in non-negative matrix factorization (NMF) residual space for face recognition," Optica Applicata, vol. 40, no. 3, pp. 693–704, 2010.
[27] A. Martinez and R. Benavente, "The AR face database," CVC Tech. Rep. 24, 1998.
[28] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From few to many: illumination cone models for face recognition under variable lighting and pose," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643–660, 2001.
[29] N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, "Attribute and simile classifiers for face verification," in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 365–372, Kyoto, Japan, October 2009.
[30] M. Rezaei and R. Klette, "Novel adaptive eye detection and tracking for challenging lighting conditions," in Computer Vision—ACCV 2012 Workshops, J.-I. Park and J. Kim, Eds., vol. 7729 of Lecture Notes in Computer Science, pp. 427–440, Springer, Berlin, Germany, 2013.
[31] Y. Tang, R. Salakhutdinov, and G. H. Hinton, "Deep Lambertian networks," in Proceedings of the 29th International Conference on Machine Learning (ICML '12), pp. 1623–1630, Edinburgh, UK, July 2012.
[32] Q. V. Le, A. Karpenko, J. Ngiam, and A. Y. Ng, "ICA with reconstruction cost for efficient overcomplete feature learning," in Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS '11), pp. 1017–1025, Granada, Spain, December 2011.
[33] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, pp. 315–323, 2011.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 4: Research Article Adaptive Deep Supervised Autoencoder ...Adaptive Deep Supervised Network Template (ADSNT). the deep network perform well, similar to [], we need to giveitinitializationweights.en,thepreinitializedADSNT

4 Mathematical Problems in Engineering

Similar

f1205791f1205791

g1205799984001

x x x

Similar

middot middot middot

middot middot middot middot middot middot middot middot middot

middot middot middot

(a)

Similar

f3f3

f2f2

f1f1g1

g2

g3

middot middot middot

middot middot middot middot middot middot middot middot middot

middot middot middotmiddot middot middot

middot middot middot middot middot middot middot middot middotx x x

x

Similar

(b)

middot middot middot middot middot middot

middot middot middot

middot middot middot

middot middot middot

middot middot middot

middot middot middot

middot middot middot

middot middot middot

middot middot middot

middot middot middot

middot middot middot

middot middot middot

middot middot middot

JH(x x)

x x

g1205799984001

g1205799984002

g1205799984003

f1205793

f1205792

f1205791

DC

EC

f1205793

f1205792

f1205791

x

Corrupted facex

Clean face

(c)

Figure 3 Architecture of SSAE and ADSNT (a) Supervised autoencoder (SAE) which is comprised of cleanldquocorruptedrdquo datum one hiddenlayer and one reconstruction layer by using the ldquocorrupted datumrdquo (b) stacked supervised autoencoder (SSAE) (c) architecture of theAdaptive Deep Supervised Network Template (ADSNT)

the deep network perform well similar to [20] we need togive it initialization weights Then the preinitialized ADSNTis trained to reconstruct the invariant faces which are insen-sitive to illumination pose and occlusion Finally havingtrained the ADSNT we use the nearest neighbor classifier torecognize a new face image by comparing its reconstructionimage with individual gallery images respectively

31 Adaptive Deep Supervised Network Template (ADSNT)As presented in Figure 3(c) our ADSNT is a deep supervisedautoencoder (DSAE) that consists of two parts an encoder(EC) and a decoder (DC) Each of them has three hiddenlayers and they share the third layer that is the centralhidden layer The features learned from the hidden layer andthe reconstructed clean face are obtained by using the ldquocor-ruptedrdquo data to train the SSAE In the process of pretrainingwe learn a stack of SAE each having only one hidden layerof feature detectors Then the learned activation features ofone SAE are used as ldquodatardquo for training the next SAE in thestack Such training is repeated a number of times until we getthe desired number of layers Although we use the basic SAEstructure which is shown in Figure 3(a) [9] to construct thestacked supervised autoencoder (SSAE) Gao et alrsquos stackedsupervised autoencoder only used two hidden layers and

one reconstruction layer In this paper we use three hiddenlayers to compose the encoder and decoder respectivelywhose structures are shown in Figures 3(b) and 3(c) Theencoder part tries best to seek a compact low-dimensionalmeaningful representation of the cleanldquocorruptedrdquo dataFollowing the work [20] the encoder can be formulated asa combination of several layers which are connected with anonlinear activation function 119906119891(sdot) We can use a sigmoidfunction or a rectified linear unit as nonlinear activation tomap the cleanldquocorruptedrdquo data 119909 to a representation ℎ asfollows

ℎ = 119891 (ℎ2) = 119906119891 (119882(3)119890 ℎ2 + 119887(3)119890 ) ℎ2 = 119891 (ℎ1) = 119906119891 (119882(2)119890 ℎ1 + 119887(2)119890 ) ℎ1 = 119891 (119909) = 119906119891 (119882(1)119890 119909 + 119887(1)119890 ) ℎ = 119891 (ℎ2) = 119906119891 (119882(3)119890 ℎ2 + 119887(3)119890 ) ℎ2 = 119891 (ℎ1) = 119906119891 (119882(2)119890 ℎ1 + 119887(2)119890 ) ℎ1 = 119891 () = 119906119891 (119882(1)119890 + 119887(1)119890 )

(3)

Mathematical Problems in Engineering 5

where119882(119894)119890 isin 119877119889119894minus1times119889119894 is a weight matrix of the encoder for the119894th layer with 119889119894 neurons and 119887(119894)119890 isin 119877119889119894 is the bias vector Theencoder parameters learning are achieved by jointly trainingthe encoder-decoder structure to reconstruct the ldquocorruptrdquodata by minimizing a cost function (see Section 32) There-fore the decoder can be defined as a combination of severallayers integrating a nonlinear activation function 119906119892(sdot)whichreconstructs the ldquocorruptrdquo data from the encoder output ℎThe reconstructed output of the decoder is given by

= 119892 (119909) = 119906119892 (119882(3)119889 119909 + 119887(3)119889 ) 119909 = 119892 (119909) = 119906119892 (119882(2)119889 119909 + 119887(2)119889 ) 119909 = 119892 (ℎ) = 119906119892 (119882(1)119889 ℎ + 119887(1)119889 )

(4)

So we can describe the complete ADSNT by its parameter120579ADSNT = 120579119882 120579119887 where 120579119882 = 119882(119894)119890 119882(119894)119889 and 120579119887 =119887(119894)119890 119887(119894)119889 119894 = 1 2 332 Formulation of Image Reconstruction Based on ADSNTNow we are ready to depict the reconstruction image basedon ADSNT The details are presented as follows

Given a set of 119896 classes training images that includegallery images (called clean data) and probe images (calledldquocorruptedrdquo data) and their corresponding class labels 119910119888 =[1 2 119896] the dataset will be used to train ADSNT forfeature learning Let 119909119894 denote a probe image and 119909119894 (119894 =1 2 119872) present gallery images corresponding to 119909119894 It isdesirable that119909119894 and119909119894 should be similarTherefore followingthe work [9 22] we obtain the following formulation

argmin120579ADSNT

119869 = 1119872 sum119894

1003817100381710038171003817119909119894 minus 11990911989410038171003817100381710038172

+ 120582120579119882119872 sum119894

1003817100381710038171003817119891 (119909119894) minus 119891 (119909119894)10038171003817100381710038172

+ 1205932 ( 3sum119895

10038171003817100381710038171003817119882(119894)119890 100381710038171003817100381710038172119865 +3sum119895

10038171003817100381710038171003817119882(119894)119889 100381710038171003817100381710038172119865) (5)

where 120579ADSNT = 120579119882 120579119887 (see Section 31) are the parametersof ADSNT which is fine-tuned by learning In this paper weonly explore the tied weights that is 119882(3)

119889= 119882(1)119879119890 119882(2)

119889=119882(2)119879119890 and 119882(1)

119889= 119882(3)119879119890 (see Figure 3(c)) 119909119894 is the recon-

struction image of the corrupted image119909119894 Like regularizationparameter 120582120579119882 balances the similarity of the same personto preserve 119891(119909119894) and 119891(119909119894) as similarly as possible 119891(sdot) is anonlinear activation function 120593 is a parameter that balancesweight penalty terms and reconstruction loss sdot 119865 presentsthe Frobenius norm and sum3119895 119882(119894)119890 2119865 + sum3119895 119882(119894)119889 2119865 ensuressmall weight values for all the hidden neurons Furthermorefollowing the work [9 14] we impose a sparsity constrainton the hidden layer to enhance learning meaningful features

Then we can further modify cost function and obtain thefollowing objection formulation

argmin120579ADSNT

119869reg= 119869 + 120574( 3sum

119894

KL (120588119909 || 1205880) + 5sum119894

KL (120588 || 1205880)) (6)

where

120588119909 = 1119872 sum119894

(12119891 (119909119894) + 1) 120588 = 1119872 sum

119894

12 (119891 (119909119894) + 1) KL (120588 || 1205880)

= sum119895

(1205880 log(1205880120588119895) + (1 minus 1205880) log(1 minus 12058801 minus 120588119895))

(7)

Here the KL divergence between two distributions that is 1205880and 120588119895 that present 120588119909 or 120588 is calculated The sparsity 1205880 isusually a constant (taking a small value according to thework[9 24] it is set to 005 in our experiments) whereas 120588119909 and 120588are the mapping mean activation values from clean data andcorrupted data respectively

33 Optimization of ADSNT For obtaining the optimizationparameter 120579ADSNT = 120579119882 120579119887 it is important to initializeweights and select an optimization training algorithm Thetrainingwill fail if the initializationweights are inappropriateThis is to say if we give network too large initializationweights the ADSNTwill be trapped in local minimum If theinitialized weights are too small the ADSNT will encounterthe vanishing gradient problem during backpropagationTherefore following the work [20 24] Gaussian RestrictedBoltzmann Machines (GRBMs) are adopted to initializeweight parameters by performing pretraining which hasbeen already applied widely For more details we referthe reader to the original paper [24] After obtaining theinitialized weights the limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization algorithm is uti-lized to learn the parameters as it has better performance andfaster convergence than stochastic gradient descent (SGD)and conjugated gradient (CGD) [25] Algorithm 1 depicts theoptimization procedure of ADSNT

Algorithm 1 (learning adaptive deep supervised networktemplate)

Input Training images Ω 119896 classes and each class is com-posed of the face with neutral expression frontal poseand normal illumination condition (clean data) and randomnumber of variant faces (corrupted data) Number of networklayers 119871 Iterative number 119868 balancing parameters 120582 120593 and 120574and convergence error 120576Output Weight parameters 120579ADSNT = 120579119882 120579119887

6 Mathematical Problems in Engineering

(1) Preprocess all images namely perform histogramequalization

(2) 119883 Randomly select a small subset for each individualfromΩ

(3) Initialize Train GRBMs by using 119883 to initialize the120579ADSNT = 120579119882 120579119887(4) (Optimization by L-BFGS)

For 119903 = 1 2 119877 doCalculate 119869reg using (6)

If 119903 gt 1 and |119869119903 minus119869119903minus1| lt 120576 go to ReturnReturn 120579119882 and 120579119887

Since training the ADSNT model aims to reconstructclean data namely gallery images from corrupt data it mightlearn an underlying structure from the corrupt data andproduce very useful representation Furthermore we canlearn an overcomplete sparse representation from corruptdata through mapping them into a high-dimensional featurespace since the first hidden layer has the number of neuronslarger than the dimensionality of original data The high-dimensional model representation is then followed by a so-called ldquobottleneckrdquo that is the data is further mapped to anabstract compact and low-dimensional model representa-tion in the subsequent layers of the encoder Through sucha mapping the redundant information such as illuminationposes and partial occlusion in the corrupted faces is removedandonly the useful information content for us is kept In addi-tion we know that if we use AE with only one hidden layerand jointly linear activation functions the learned weightswould be analogous to a PCA subspace [20] However AEis an unsupervised algorithm In our work we make use ofthe class label information to train SAE so if we also useonly one hidden layer with a linear activation function thelearnedweights by the SAE are thought to be similar to ldquoLDArdquosubspace However in our structure we apply the nonlinearactivation functions and stack several hidden layers togetherand then theADSNTcan adapt to very complicated nonlinearmanifold structures Some of reconstructed images based onADSNT fromAR database are shown in Figure 4(b) One cansee that ADSNT can remove the illumination For those faceimages with partial occlusion ADSNT can also imitate theclean facesThis results are not surprising because the humanbeing has the capability of inferring the unknown faces fromknown face images via the experience (for deep networkstructure the experience learned derives from generic set)[9]34 Face Classification Based on ADSNT Image Reconstruc-tion To better train ADSNT all images need to be pre-processed It is a very important step for object recogni-tion including face recognition The common ways includehistogram equalization geometry normalization and imagesmoothing In this paper for the sake of simplicity we onlyperform histogram equalization on all the facial images tominimize illumination variations That is we utilize his-togram equalization to normalize the histogram of facial

images and make them more compact For the details abouthistogram equalization one can be referred to see [26]

After the ADSNT is trained completely with a certainnumber of individuals we can use it to performon the unseenface images for recognizing them

Given a test facial image 119909(119905) which is also preprocessedwith histogram equalization in the same way as the trainingimages and presented to the ADSNTnetwork we reconstruct(using (3) and (4)) image 119909(119905) from ADSNT which is similarto clean face For the sake of simplicity the nearest neighborclassification based on the Euclidean distance between thereconstruction and all the gallery images identifies the classThe classification formula is defined as

119868119896 (119909(119905)) = argmin119892

1003817100381710038171003817100381710038171003817119909(119905) minus 1199091198921003817100381710038171003817100381710038171003817 forall119892 isin 1 2 119888 (8)

where 119868119896(119909(119905)) is the resulting identity and119909119892 is the clean facialimage in the gallery images of individual 1198924 Experimental Results and Discussion

In this section extensive experiments are conducted topresent and compare the performance of different methodswith the proposed approach The experiments are imple-mented on three widely used face databases that is AR[27] Extended Yale B [28] and PubFig [29] The details ofthese three databases and performance evaluation of differentapproaches are presented as follows

41 Dataset Description TheARdatabase contains over 4000color face images from 126 people (56 women and 70 men)The images were taken in two sessions (between two weeks)and each session contained 13 pictures from one personThese images contain frontal view faces with different facialexpression illuminations and occlusions (sun glasses andscarf) Some sample face images from AR are illustratedin Figure 5(a) In our experiments for each person wechoose the facial imageswith neutral expression frontal poseand normal illumination condition as gallery images andrandomly select half the number of images from the rest ofthe images of each person as probe images The remainingimages compose the testing set

The Extended Yale B database consists of 16128 imagesof 38 people under 64 illumination conditions and 9 posesSome sample face images fromExtendedYale B are illustratedin Figure 5(b) For each person we select the faces that havenormal light condition and frontal pose as gallery images andrandomly choose 6 poses and 16 illumination face images tocompose the probe images The remaining images composethe testing set

The PubFig database is composed of 58797 images of200 subjects taken from the internet The images of thedatabase were taken in completely uncontrolled conditionswith noncooperative people These images have a very largedegree of variability in face expression pose illuminationand so forth Some sample images fromPubFig are illustratedin Figure 5(c) In our experiments for each individualwe select the faces with neutral expression the frontal or

Mathematical Problems in Engineering 7

(a)

(b)

Figure 4 Some original images fromARdatabase and the reconstructed ones (a) Original face images (corrupted faces) (b) Reconstructedfaces near frontal pose and normal illumination as galleries andrandomly choose half the number of images from the rest ofthe images of each person as probes The remaining imagescompose the testing set42 Experimental Settings In all the experiments the facialimages from the AR PubFig and Extended Yale B databasesare automatically detected using OpenCV face detector [30]After that we normalize the detected facial images (inorientation and scale) such that two eyes can be alignedat the same location Then the face areas are cropped andconverted to 256 gray levels images The size of each croppedimage is 26 times 30 pixels Thus the dimensionality of the inputvector is 780 Figure 6 presents an example fromAR databaseand the corresponding cropped image Each cropped facialimage is further preprocessed with histogram equalizationto minimize illumination variations We train our ADSNTmodel with 3 hidden layers where the number of hidden

nodes for these layers is empirically set as [1024 rarr 500 rarr120] because our experiments show that three hidden layerscan get a sufficiently good performance (see Section 433)

In order to show the whole experimental process aboutparameters setting we initially use the hyperbolic tangentfunction as the nonlinear activation function and implementADSNT on AR We also choose the face images with neutralexpression frontal pose and normal illumination as galleriesand randomly select half the number of images from the restof the images of each person as probe images The remainingimages compose the testing setThemean identification ratesare recorded

Firstly, we empirically set the parameter ε = 0.001 and the sparsity target ρ₀ = 0.05 and fix the parameters λ = 0.5 and φ = 0.1 in ADSNT to check the effect of γ on the identification rate. As illustrated in Figure 6(a), the ADSNT recognition method achieves the best performance when γ = 0.08.


Figure 5: A fraction of samples from the AR, PubFig, and Extended Yale B face databases: (a) AR, (b) Extended Yale B, and (c) PubFig.

Then, according to Figure 6(a), we fix the parameters γ = 0.08 and λ = 0.5 in ADSNT to check the influence of φ. As shown in Figure 6(b), when φ = 0.6 our method achieves the best recognition rate. At last, we fix γ = 0.08 and φ = 0.6, and the recognition rates for different values of λ are illustrated in Figure 6(c); when λ = 3 the recognition rate is the highest. From the plots in Figure 6, one can observe that the parameters λ, φ, and γ can be neither too large nor too small. If λ is too large, the ADSNT becomes less discriminative between different subjects because the similarity preservation term is enforced too strongly; if λ is too small, the similarity preservation term loses its significance and the recognition performance degrades. Similarly, γ cannot be too large, or the hidden neurons will not be activated for a given input

and a low recognition rate will be achieved; if γ is too small, we also obtain poor performance. For the weight decay φ, if it is too small, the weights of all hidden units change only very slightly; on the contrary, if it is too large, the weights change greatly.

Using the above experiments, we obtain the optimal parameter values used in ADSNT as λ = 3, φ = 0.6, and γ = 0.08 on the AR database. Similar experiments have also been performed on the Extended Yale B and PubFig databases, yielding λ = 2.6, φ = 0.5, and γ = 0.06 on the Extended Yale B database and λ = 2.8, φ = 0.52, and γ = 0.09 on the PubFig database.
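The coordinate-wise tuning described above (sweep γ with λ and φ fixed, then φ, then λ) can be expressed as the short loop below; train_adsnt and mean_identification_rate are hypothetical stand-ins for the actual training and evaluation routines, and the candidate grids in the comment are illustrative only.

```python
def coordinate_sweep(train_adsnt, mean_identification_rate,
                     gamma_grid, phi_grid, lambda_grid,
                     init=dict(lam=0.5, phi=0.1, gamma=0.08)):
    """Tune gamma, then phi, then lambda, keeping the other two fixed,
    mirroring the procedure used on the AR database."""
    best = dict(init)

    def score(params):
        model = train_adsnt(lam=params["lam"], phi=params["phi"], gamma=params["gamma"])
        return mean_identification_rate(model)

    for name, grid in (("gamma", gamma_grid), ("phi", phi_grid), ("lam", lambda_grid)):
        rates = []
        for value in grid:
            trial = dict(best)
            trial[name] = value
            rates.append((score(trial), value))
        best[name] = max(rates)[1]   # keep the value with the highest mean rate
    return best

# Illustrative grids (the paper reports lambda = 3, phi = 0.6, gamma = 0.08 on AR):
# best = coordinate_sweep(train_adsnt, mean_identification_rate,
#                         gamma_grid=[0.001, 0.01, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1],
#                         phi_grid=[0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0],
#                         lambda_grid=[0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
```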

In the experiments, we use two measures, the mean identification accuracy μ with standard deviation ν (reported as μ ± ν) and the receiver operating characteristic (ROC) curve, to validate the effectiveness of our method as well as the other methods.
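These two measures can be computed as in the following sketch, which is a generic illustration (the similarity scores and repeated-split accuracies are placeholders), not the authors' evaluation code.

```python
import numpy as np

def mean_accuracy_with_std(accuracies):
    """Mean identification accuracy and standard deviation over repeated
    random gallery/probe splits, reported as mu +/- nu."""
    acc = np.asarray(accuracies, dtype=float)
    return acc.mean(), acc.std()

def roc_curve(scores, labels, n_thresholds=100):
    """Simple ROC computation: scores are similarity values, labels are 1 for
    genuine (same identity) pairs and 0 for impostor pairs."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.linspace(scores.max(), scores.min(), n_thresholds)
    tpr, fpr = [], []
    for t in thresholds:
        pred = scores >= t
        tpr.append((pred & (labels == 1)).sum() / max((labels == 1).sum(), 1))
        fpr.append((pred & (labels == 0)).sum() / max((labels == 0).sum(), 1))
    return np.array(fpr), np.array(tpr)

# Example with hypothetical per-run accuracies (not the reported results).
mu, nu = mean_accuracy_with_std([92.1, 93.0, 91.8, 92.6, 92.1])
```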

Figure 6: Parameter setting on the AR database. (a) Identification rate (%) versus the parameter γ (with λ = 0.5, φ = 0.1). (b) Identification rate (%) versus the parameter φ (with λ = 0.5, γ = 0.08). (c) Identification rate (%) versus the parameter λ (with φ = 0.6, γ = 0.08).

4.3. Experimental Results and Analysis

4.3.1. Comparison with Different Methods. In the following experiments on the three databases, we compare the proposed approach with several recently proposed methods. The compared methods include the DAE with 10% random mask noise [12], the marginalized DAE (MDAE) [17], Contractive Autoencoders (CAE) [15], Deep Lambertian Networks (DLN) [31], the stacked supervised autoencoder (SSAE) [9], ICA with Reconstruction cost (RICA) [32], and the Template Deep Reconstruction Model (TDRM) [20]. We use the implementations of these algorithms provided by their respective authors, and for all the compared approaches we use the default parameters recommended in the corresponding papers.

The mean identification accuracy with standard deviation of the different approaches on the three databases is shown in Table 1, and the ROC curves of the different approaches are illustrated in Figure 7. The results imply that our approach

Table 1: Comparisons of the average identification accuracy and standard deviation (%) of different approaches on different databases.

Method          AR             Extended Yale B   PubFig
DAE [12]        57.56 ± 0.2    63.45 ± 1.3       61.33 ± 1.5
MDAE [17]       67.80 ± 1.3    71.56 ± 1.6       70.55 ± 2.5
CAE [15]        49.50 ± 2.1    55.72 ± 0.8       68.56 ± 1.6
DLN [31]        NA             81.50 ± 1.4       77.60 ± 1.4
SSAE [9]        85.21 ± 0.7    82.22 ± 0.3       84.04 ± 1.2
RICA [32]       76.33 ± 1.7    70.44 ± 1.3       72.35 ± 1.5
TDRM [20]       87.70 ± 0.6    86.42 ± 1.2       89.90 ± 0.9
Our method      92.32 ± 0.7    93.66 ± 0.4       91.26 ± 1.6

significantly outperforms the other methods and achieves the best mean recognition rates for the same setting of training and testing sets. Compared with the unsupervised deep learning methods such as DAE, MDAE, CAE, DLN, and TDRM, the improvement of our method is over 30% on the Extended Yale B and AR databases, where there is little pose variance. On the PubFig database, our approach can also achieve a mean identification rate of 91.26 ± 1.6%

Figure 7: Comparisons of ROC curves (true positive rate versus false positive rate) between our method and the other methods on different databases: (a) AR, (b) Extended Yale B, and (c) PubFig.

and outperforms all compared methods. The reason is that our method can extract discriminative information that is robust to variances (expression, illumination, pose, etc.) in the learned deep networks. Compared with a supervised method like RICA, the proposed method improves by over 16%, 19%, and 23% on the AR, PubFig, and Extended Yale B databases, respectively. Our method is a deep learning method which addresses the nonlinear classification problem by learning a nonlinear mapping, so that more nonlinear discriminant information can be exploited to enhance the identification performance. Compared with the SSAE method, which is designed to remove variances such as illumination, pose, and partial occlusion, our method is still better by over 6% because of the weight penalty terms, the use of GRBMs to initialize the weights, and the three layers' similarity preservation term.

4.3.2. Convergence Analysis. In this subsection, we evaluate the convergence of our ADSNT with respect to the number of iterations. Figure 8(a) illustrates the value of the objective function of ADSNT versus the number of iterations on the AR, PubFig, and Extended Yale B databases. From Figure 8(a), one can observe that ADSNT converges in about 55, 28, and 70 iterations on the three databases, respectively.
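A convergence check of the kind evaluated here, which stops once the change in the objective falls below the convergence error ε = 0.001 while recording the objective value per iteration, can be sketched as follows; compute_objective and lbfgs_step are hypothetical hooks standing in for the actual L-BFGS update.

```python
def train_until_converged(params, compute_objective, lbfgs_step,
                          eps=1e-3, max_iters=150):
    """Iterate L-BFGS-style updates and record the objective value per
    iteration, stopping when |J_r - J_{r-1}| < eps (cf. the convergence
    curves in Figure 8(a))."""
    history = []
    prev = None
    for r in range(max_iters):
        params = lbfgs_step(params)
        J = compute_objective(params)
        history.append(J)
        if prev is not None and abs(J - prev) < eps:
            break
        prev = J
    return params, history
```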

We also evaluate the identification accuracy of ADSNT versus the number of iterations on the AR, PubFig, and Extended Yale B databases. Figure 8(b) plots the mean

Figure 8: Convergence analysis. (a) Convergence curves of ADSNT on AR, PubFig, and Extended Yale B (objective function value versus iteration number). (b) Mean identification rate (%) versus iteration number of ADSNT on AR, PubFig, and Extended Yale B.

Figure 9: The results of ADSNT with different network depths (1 to 4 hidden layers) on the different datasets, reported as identification accuracy (%) on AR, Extended Yale B, and PubFig.

identification rate of ADSNT. From Figure 8(b), one can also observe that ADSNT achieves stable performance after about 55, 70, and 28 iterations on the AR, PubFig, and Extended Yale B databases, respectively.

4.3.3. The Effect of Network Depth. In this subsection, we conduct experiments on the three face datasets with different numbers of hidden layers in our proposed ADSNT network. The proposed method achieves identification rates of 92.3 ± 0.6%, 93.3 ± 1.2%, and 91.22 ± 0.8% with the three-hidden-layer ADSNT network (1024 → 500 → 120) on the AR, Extended Yale B, and PubFig datasets, respectively. Figure 9 illustrates the

performance of ADSNT with different numbers of layers. One can observe that the three-hidden-layer network outperforms the 2-layer network, and the result of the 3-layer ADSNT network is very nearly equal to that of the 4-layer network on the AR and Extended Yale B databases. We also observe that the performance of the 4-layer network is a bit lower than that of the 3-layer network on the PubFig database. In addition, the deeper the ADSNT network is, the higher its computational complexity becomes. Therefore, a 3-layer network is a good trade-off between performance and computational complexity.

4.3.4. Activation Function. Following the work in [9], we also evaluate the performance of ADSNT with different activation functions, namely, the sigmoid, the hyperbolic tangent, and the rectified linear unit (ReLU) [33], which is defined as $f(x) = \max(0, x)$. When the sigmoid $f(x) = 1/(1 + e^{-x})$ is used as the activation function, the objective function (see (6)) is rewritten as follows:

\[
\operatorname*{arg\,min}_{\theta_{\mathrm{ADSNT}}} J
= \frac{1}{M}\sum_{i}\bigl\|\hat{x}_{i}-x_{i}\bigr\|^{2}
+ \frac{\lambda_{\theta_{W}}}{M}\sum_{i}\bigl\|f(x_{i})-f(\tilde{x}_{i})\bigr\|^{2}
+ \frac{\varphi}{2}\Bigl(\sum_{j=1}^{3}\bigl\|W_{e}^{(j)}\bigr\|_{F}^{2}
+ \sum_{j=1}^{3}\bigl\|W_{d}^{(j)}\bigr\|_{F}^{2}\Bigr)
+ \gamma\Bigl(\sum_{i=1}^{3}\mathrm{KL}\bigl(\rho_{x}\,\|\,\rho_{0}\bigr)
+ \sum_{i=1}^{5}\mathrm{KL}\bigl(\rho_{\tilde{x}}\,\|\,\rho_{0}\bigr)\Bigr),
\tag{9}
\]

where $\rho_{x} = (1/M)\sum_{i} f(x_{i})$ and $\rho_{\tilde{x}} = (1/M)\sum_{i} f(\tilde{x}_{i})$, with $x_{i}$ denoting a clean gallery image, $\tilde{x}_{i}$ the corresponding corrupted probe image, and $\hat{x}_{i}$ its reconstruction.
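To make the sparsity term of (9) concrete, the sketch below computes the mean activations ρ_x and ρ_x̃ from sigmoid codes of clean and corrupted images and accumulates the KL penalty against the target ρ₀ = 0.05. It is an illustration only: the random activations stand in for the encoder outputs, and the toy example uses three layers for both sums rather than the 3/5 split written in (9).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kl_sparsity(mean_act, rho0=0.05, eps=1e-8):
    """Sum over hidden units of rho0*log(rho0/rho_j) + (1-rho0)*log((1-rho0)/(1-rho_j)),
    the KL term used in (9)."""
    rho = np.clip(mean_act, eps, 1.0 - eps)
    return np.sum(rho0 * np.log(rho0 / rho) +
                  (1.0 - rho0) * np.log((1.0 - rho0) / (1.0 - rho)))

def sparsity_penalty(clean_codes, corrupted_codes, gamma=0.08, rho0=0.05):
    """gamma * (KL term for clean activations + KL term for corrupted ones).

    clean_codes / corrupted_codes: lists of (M, d_l) activation matrices,
    one per hidden layer, already passed through the sigmoid.
    """
    rho_x = [layer.mean(axis=0) for layer in clean_codes]        # (1/M) sum_i f(x_i)
    rho_xt = [layer.mean(axis=0) for layer in corrupted_codes]   # (1/M) sum_i f(x~_i)
    return gamma * (sum(kl_sparsity(r, rho0) for r in rho_x) +
                    sum(kl_sparsity(r, rho0) for r in rho_xt))

# Toy usage with random activations standing in for the encoder outputs.
M = 8
clean = [sigmoid(np.random.randn(M, d)) for d in (1024, 500, 120)]
corrupted = [sigmoid(np.random.randn(M, d)) for d in (1024, 500, 120)]
penalty = sparsity_penalty(clean, corrupted)
```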


Table 2: Comparisons of the ADSNT algorithm with different activation functions on the AR, PubFig, and Extended Yale B databases (%).

Dataset           Sigmoid        Tanh           ReLU
AR                88.66 ± 1.4    92.32 ± 0.7    93.22 ± 1.5
Extended Yale B   90.55 ± 0.6    93.66 ± 0.4    94.54 ± 0.3
PubFig            87.40 ± 1.2    91.26 ± 1.6    92.44 ± 1.1

If the ReLU is adopted as the activation function, (6) is formulated as

\[
\operatorname*{arg\,min}_{\theta_{\mathrm{ADSNT}}} J
= \frac{1}{M}\sum_{i}\bigl\|\hat{x}_{i}-x_{i}\bigr\|^{2}
+ \frac{\lambda_{\theta_{W}}}{M}\sum_{i}\bigl\|f(x_{i})-f(\tilde{x}_{i})\bigr\|^{2}
+ \frac{\varphi}{2}\Bigl(\sum_{j=1}^{3}\bigl\|W_{e}^{(j)}\bigr\|_{F}^{2}
+ \sum_{j=1}^{3}\bigl\|W_{d}^{(j)}\bigr\|_{F}^{2}\Bigr)
+ \gamma\Bigl(\sum_{i=1}^{3}\bigl\|f(x_{i})\bigr\|_{1}
+ \sum_{i=1}^{5}\bigl\|f(\tilde{x}_{i})\bigr\|_{1}\Bigr).
\tag{10}
\]

Table 2 shows the performance of the proposed ADSNT with different activation functions on the three databases. From Table 2, one can see that the ReLU achieves the best performance. The key reason is that we use the weight decay term weighted by φ when optimizing the objective function.
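For completeness, the activation functions compared in Table 2 and the L1 sparsity term that replaces the KL term when the ReLU is used (the last term of (10)) can be written as below; this is again an illustrative sketch under the same assumptions as the previous one, not the original implementation.

```python
import numpy as np

ACTIVATIONS = {
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "tanh": np.tanh,
    "relu": lambda z: np.maximum(0.0, z),   # f(x) = max(0, x)
}

def l1_sparsity(clean_codes, corrupted_codes, gamma=0.08):
    """gamma * (sum of ||f(x_i)||_1 over clean codes plus the same for
    corrupted codes), i.e. the last term of (10) used with the ReLU."""
    total = sum(np.abs(layer).sum() for layer in clean_codes)
    total += sum(np.abs(layer).sum() for layer in corrupted_codes)
    return gamma * total

# Toy comparison: the penalty for one batch of ReLU codes per hidden layer.
relu = ACTIVATIONS["relu"]
codes = [relu(np.random.randn(8, d)) for d in (1024, 500, 120)]
penalty = l1_sparsity(codes, codes)
```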

4.3.5. Time Consumption Analysis. In this subsection, we use an HP Z620 workstation with an Intel Xeon E5-2609 2.4 GHz CPU and 8 GB RAM and conduct a series of experiments on the AR database to compare the time consumption of the different methods, which is tabulated in Table 3. The training time (seconds) is shown in Table 3(a), while the time (seconds) needed to recognize a face from the testing set is shown in Table 3(b). From Table 3, one can see that the proposed method requires comparatively more time for training because of the initialization of ADSNT and the image reconstruction; however, the training procedure is performed offline. When we identify an image from the testing set, our method requires less time than the other methods.

5. Conclusions

In this article, we have presented an adaptive deep supervised autoencoder based image reconstruction method for face recognition. Unlike conventional deep autoencoder based face recognition methods, our method considers the class label information of the training samples in the deep learning procedure and can automatically discover the underlying nonlinear manifold structures. Specifically, a multilayer supervised adaptive network structure is presented, which is trained to extract characteristic features from corrupted/clean facial images and to reconstruct the corresponding similar facial images. The reconstruction is realized by a so-called "bottleneck" neural network that learns to map face

Table 3: (a) Training time (seconds) for different methods. (b) Testing time (seconds) for different methods. The proposed method costs the least testing time compared with the other methods.

(a)

Methods          Time
DAE [12]         861
MDAE [17]        1054
CAE [15]         786
DLN [31]         2343
SSAE [9]         5351
RICA [32]        1344
TDRM [20]        1102
Our method       12232

(b)

Methods          Time
DAE [12]         0.27
MDAE [17]        0.3
CAE [15]         0.26
DLN [31]         0.35
SSAE [9]         0.22
RICA [32]        0.19
TDRM [20]        0.18
Our method       0.13

images into a low-dimensional vector and to reconstruct the respective corresponding face images from the mapping vectors. Having trained the ADSNT, a new face image can then be recognized by comparing its reconstructed image with the individual gallery images during testing. The proposed method has been evaluated on the widely used AR, PubFig, and Extended Yale B databases, and the experimental results have shown its effectiveness. For future work, we will focus on applying the proposed method to other application fields, such as pattern classification based on image sets and action recognition based on video, to further demonstrate its validity.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is partially supported by the research grant for the Natural Science Foundation from the Sichuan Provincial Department of Education (Grant no. 13ZB0336) and the National Natural Science Foundation of China (Grant no. 61502059).

References

[1] J.-T. Chien and C.-C. Wu, "Discriminant waveletfaces and nearest feature classifiers for face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1644–1649, 2002.

[2] Z. Lei, M. Pietikainen, and S. Z. Li, "Learning discriminant face descriptor," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 2, pp. 289–302, 2014.

[3] B. Moghaddam, T. Jebara, and A. Pentland, "Bayesian face recognition," Pattern Recognition, vol. 33, no. 11, pp. 1771–1782, 2000.

[4] N. Cristianini and J. S. Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, New York, NY, USA, 2004.

[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, New York, NY, USA, 2nd edition, 2001.

[6] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face recognition: a literature survey," ACM Computing Surveys, vol. 35, no. 4, pp. 399–458, 2003.

[7] B. Zhang, S. Shan, X. Chen, and W. Gao, "Histogram of Gabor phase patterns (HGPP): a novel object representation approach for face recognition," IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 57–68, 2007.

[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Neural Information Processing Systems, pp. 1527–1554, 2012.

[9] S. Gao, Y. Zhang, K. Jia, J. Lu, and Y. Zhang, "Single sample face recognition via learning deep supervised autoencoders," IEEE Transactions on Information Forensics and Security, vol. 10, no. 10, pp. 2108–2118, 2015.

[10] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.

[11] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, pp. 437–478, Springer, Berlin, Germany, 2012.

[12] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, no. 5, pp. 3371–3408, 2010.

[13] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 807–814, Haifa, Israel, June 2010.

[14] A. Coates, H. Lee, and A. Y. Ng, "An analysis of single-layer networks in unsupervised feature learning," in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS '11), pp. 215–223, Sardinia, Italy, 2010.

[15] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, "Contractive auto-encoders: explicit invariance during feature extraction," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 833–840, Bellevue, Wash, USA, July 2011.

[16] K. Simonyan, A. Vedaldi, and A. Zisserman, "Deep Fisher networks for large-scale image classification," in Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS '13), pp. 163–171, Lake Tahoe, Nev, USA, December 2013.

[17] M. Chen, Z. Xu, K. Q. Weinberger, and F. Sha, "Marginalized denoising autoencoders for domain adaptation," in Proceedings of the 29th International Conference on Machine Learning (ICML '12), pp. 767–774, Edinburgh, UK, July 2012.

[18] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "DeepFace: closing the gap to human-level performance in face verification," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1701–1708, June 2014.

[19] Z. Zhu, P. Luo, X. Wang, and X. Tang, "Deep learning identity-preserving face space," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 113–120, Sydney, Australia, December 2013.

[20] M. Hayat, M. Bennamoun, and S. An, "Deep reconstruction models for image set classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 4, pp. 713–727, 2015.

[21] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation from predicting 10,000 classes," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1891–1898, Columbus, Ohio, USA, June 2014.

[22] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation by joint identification-verification," Tech. Rep., https://arxiv.org/abs/1406.4773.

[23] X. Cai, C. Wang, B. Xiao, X. Chen, and J. Zhou, "Deep nonlinear metric learning with independent subspace analysis for face verification," in Proceedings of the 20th ACM International Conference on Multimedia (MM '12), pp. 749–752, November 2012.

[24] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[25] Q. V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, and A. Y. Ng, "On optimization methods for deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 265–272, Bellevue, Wash, USA, July 2011.

[26] C. Zhou, X. Wei, Q. Zhang, and X. Fang, "Fisher's linear discriminant (FLD) and support vector machine (SVM) in non-negative matrix factorization (NMF) residual space for face recognition," Optica Applicata, vol. 40, no. 3, pp. 693–704, 2010.

[27] A. Martinez and R. Benavente, "The AR face database," CVC Tech. Rep. 24, 1998.

[28] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From few to many: illumination cone models for face recognition under variable lighting and pose," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643–660, 2001.

[29] N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, "Attribute and simile classifiers for face verification," in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 365–372, Kyoto, Japan, October 2009.

[30] M. Rezaei and R. Klette, "Novel adaptive eye detection and tracking for challenging lighting conditions," in Computer Vision - ACCV 2012 Workshops, J.-I. Park and J. Kim, Eds., vol. 7729 of Lecture Notes in Computer Science, pp. 427–440, Springer, Berlin, Germany, 2013.

[31] Y. Tang, R. Salakhutdinov, and G. E. Hinton, "Deep Lambertian networks," in Proceedings of the 29th International Conference on Machine Learning (ICML '12), pp. 1623–1630, Edinburgh, UK, July 2012.

[32] Q. V. Le, A. Karpenko, J. Ngiam, and A. Y. Ng, "ICA with reconstruction cost for efficient overcomplete feature learning," in Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS '11), pp. 1017–1025, Granada, Spain, December 2011.

[33] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, pp. 315–323, 2011.


Page 5: Research Article Adaptive Deep Supervised Autoencoder ...Adaptive Deep Supervised Network Template (ADSNT). the deep network perform well, similar to [], we need to giveitinitializationweights.en,thepreinitializedADSNT

Mathematical Problems in Engineering 5

where119882(119894)119890 isin 119877119889119894minus1times119889119894 is a weight matrix of the encoder for the119894th layer with 119889119894 neurons and 119887(119894)119890 isin 119877119889119894 is the bias vector Theencoder parameters learning are achieved by jointly trainingthe encoder-decoder structure to reconstruct the ldquocorruptrdquodata by minimizing a cost function (see Section 32) There-fore the decoder can be defined as a combination of severallayers integrating a nonlinear activation function 119906119892(sdot)whichreconstructs the ldquocorruptrdquo data from the encoder output ℎThe reconstructed output of the decoder is given by

= 119892 (119909) = 119906119892 (119882(3)119889 119909 + 119887(3)119889 ) 119909 = 119892 (119909) = 119906119892 (119882(2)119889 119909 + 119887(2)119889 ) 119909 = 119892 (ℎ) = 119906119892 (119882(1)119889 ℎ + 119887(1)119889 )

(4)

So we can describe the complete ADSNT by its parameter120579ADSNT = 120579119882 120579119887 where 120579119882 = 119882(119894)119890 119882(119894)119889 and 120579119887 =119887(119894)119890 119887(119894)119889 119894 = 1 2 332 Formulation of Image Reconstruction Based on ADSNTNow we are ready to depict the reconstruction image basedon ADSNT The details are presented as follows

Given a set of 119896 classes training images that includegallery images (called clean data) and probe images (calledldquocorruptedrdquo data) and their corresponding class labels 119910119888 =[1 2 119896] the dataset will be used to train ADSNT forfeature learning Let 119909119894 denote a probe image and 119909119894 (119894 =1 2 119872) present gallery images corresponding to 119909119894 It isdesirable that119909119894 and119909119894 should be similarTherefore followingthe work [9 22] we obtain the following formulation

argmin120579ADSNT

119869 = 1119872 sum119894

1003817100381710038171003817119909119894 minus 11990911989410038171003817100381710038172

+ 120582120579119882119872 sum119894

1003817100381710038171003817119891 (119909119894) minus 119891 (119909119894)10038171003817100381710038172

+ 1205932 ( 3sum119895

10038171003817100381710038171003817119882(119894)119890 100381710038171003817100381710038172119865 +3sum119895

10038171003817100381710038171003817119882(119894)119889 100381710038171003817100381710038172119865) (5)

where 120579ADSNT = 120579119882 120579119887 (see Section 31) are the parametersof ADSNT which is fine-tuned by learning In this paper weonly explore the tied weights that is 119882(3)

119889= 119882(1)119879119890 119882(2)

119889=119882(2)119879119890 and 119882(1)

119889= 119882(3)119879119890 (see Figure 3(c)) 119909119894 is the recon-

struction image of the corrupted image119909119894 Like regularizationparameter 120582120579119882 balances the similarity of the same personto preserve 119891(119909119894) and 119891(119909119894) as similarly as possible 119891(sdot) is anonlinear activation function 120593 is a parameter that balancesweight penalty terms and reconstruction loss sdot 119865 presentsthe Frobenius norm and sum3119895 119882(119894)119890 2119865 + sum3119895 119882(119894)119889 2119865 ensuressmall weight values for all the hidden neurons Furthermorefollowing the work [9 14] we impose a sparsity constrainton the hidden layer to enhance learning meaningful features

Then we can further modify cost function and obtain thefollowing objection formulation

argmin120579ADSNT

119869reg= 119869 + 120574( 3sum

119894

KL (120588119909 || 1205880) + 5sum119894

KL (120588 || 1205880)) (6)

where

120588119909 = 1119872 sum119894

(12119891 (119909119894) + 1) 120588 = 1119872 sum

119894

12 (119891 (119909119894) + 1) KL (120588 || 1205880)

= sum119895

(1205880 log(1205880120588119895) + (1 minus 1205880) log(1 minus 12058801 minus 120588119895))

(7)

Here the KL divergence between two distributions that is 1205880and 120588119895 that present 120588119909 or 120588 is calculated The sparsity 1205880 isusually a constant (taking a small value according to thework[9 24] it is set to 005 in our experiments) whereas 120588119909 and 120588are the mapping mean activation values from clean data andcorrupted data respectively

33 Optimization of ADSNT For obtaining the optimizationparameter 120579ADSNT = 120579119882 120579119887 it is important to initializeweights and select an optimization training algorithm Thetrainingwill fail if the initializationweights are inappropriateThis is to say if we give network too large initializationweights the ADSNTwill be trapped in local minimum If theinitialized weights are too small the ADSNT will encounterthe vanishing gradient problem during backpropagationTherefore following the work [20 24] Gaussian RestrictedBoltzmann Machines (GRBMs) are adopted to initializeweight parameters by performing pretraining which hasbeen already applied widely For more details we referthe reader to the original paper [24] After obtaining theinitialized weights the limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization algorithm is uti-lized to learn the parameters as it has better performance andfaster convergence than stochastic gradient descent (SGD)and conjugated gradient (CGD) [25] Algorithm 1 depicts theoptimization procedure of ADSNT

Algorithm 1 (learning adaptive deep supervised networktemplate)

Input Training images Ω 119896 classes and each class is com-posed of the face with neutral expression frontal poseand normal illumination condition (clean data) and randomnumber of variant faces (corrupted data) Number of networklayers 119871 Iterative number 119868 balancing parameters 120582 120593 and 120574and convergence error 120576Output Weight parameters 120579ADSNT = 120579119882 120579119887

6 Mathematical Problems in Engineering

(1) Preprocess all images namely perform histogramequalization

(2) 119883 Randomly select a small subset for each individualfromΩ

(3) Initialize Train GRBMs by using 119883 to initialize the120579ADSNT = 120579119882 120579119887(4) (Optimization by L-BFGS)

For 119903 = 1 2 119877 doCalculate 119869reg using (6)

If 119903 gt 1 and |119869119903 minus119869119903minus1| lt 120576 go to ReturnReturn 120579119882 and 120579119887

Since training the ADSNT model aims to reconstructclean data namely gallery images from corrupt data it mightlearn an underlying structure from the corrupt data andproduce very useful representation Furthermore we canlearn an overcomplete sparse representation from corruptdata through mapping them into a high-dimensional featurespace since the first hidden layer has the number of neuronslarger than the dimensionality of original data The high-dimensional model representation is then followed by a so-called ldquobottleneckrdquo that is the data is further mapped to anabstract compact and low-dimensional model representa-tion in the subsequent layers of the encoder Through sucha mapping the redundant information such as illuminationposes and partial occlusion in the corrupted faces is removedandonly the useful information content for us is kept In addi-tion we know that if we use AE with only one hidden layerand jointly linear activation functions the learned weightswould be analogous to a PCA subspace [20] However AEis an unsupervised algorithm In our work we make use ofthe class label information to train SAE so if we also useonly one hidden layer with a linear activation function thelearnedweights by the SAE are thought to be similar to ldquoLDArdquosubspace However in our structure we apply the nonlinearactivation functions and stack several hidden layers togetherand then theADSNTcan adapt to very complicated nonlinearmanifold structures Some of reconstructed images based onADSNT fromAR database are shown in Figure 4(b) One cansee that ADSNT can remove the illumination For those faceimages with partial occlusion ADSNT can also imitate theclean facesThis results are not surprising because the humanbeing has the capability of inferring the unknown faces fromknown face images via the experience (for deep networkstructure the experience learned derives from generic set)[9]34 Face Classification Based on ADSNT Image Reconstruc-tion To better train ADSNT all images need to be pre-processed It is a very important step for object recogni-tion including face recognition The common ways includehistogram equalization geometry normalization and imagesmoothing In this paper for the sake of simplicity we onlyperform histogram equalization on all the facial images tominimize illumination variations That is we utilize his-togram equalization to normalize the histogram of facial

images and make them more compact For the details abouthistogram equalization one can be referred to see [26]

After the ADSNT is trained completely with a certainnumber of individuals we can use it to performon the unseenface images for recognizing them

Given a test facial image 119909(119905) which is also preprocessedwith histogram equalization in the same way as the trainingimages and presented to the ADSNTnetwork we reconstruct(using (3) and (4)) image 119909(119905) from ADSNT which is similarto clean face For the sake of simplicity the nearest neighborclassification based on the Euclidean distance between thereconstruction and all the gallery images identifies the classThe classification formula is defined as

119868119896 (119909(119905)) = argmin119892

1003817100381710038171003817100381710038171003817119909(119905) minus 1199091198921003817100381710038171003817100381710038171003817 forall119892 isin 1 2 119888 (8)

where 119868119896(119909(119905)) is the resulting identity and119909119892 is the clean facialimage in the gallery images of individual 1198924 Experimental Results and Discussion

In this section extensive experiments are conducted topresent and compare the performance of different methodswith the proposed approach The experiments are imple-mented on three widely used face databases that is AR[27] Extended Yale B [28] and PubFig [29] The details ofthese three databases and performance evaluation of differentapproaches are presented as follows

41 Dataset Description TheARdatabase contains over 4000color face images from 126 people (56 women and 70 men)The images were taken in two sessions (between two weeks)and each session contained 13 pictures from one personThese images contain frontal view faces with different facialexpression illuminations and occlusions (sun glasses andscarf) Some sample face images from AR are illustratedin Figure 5(a) In our experiments for each person wechoose the facial imageswith neutral expression frontal poseand normal illumination condition as gallery images andrandomly select half the number of images from the rest ofthe images of each person as probe images The remainingimages compose the testing set

The Extended Yale B database consists of 16128 imagesof 38 people under 64 illumination conditions and 9 posesSome sample face images fromExtendedYale B are illustratedin Figure 5(b) For each person we select the faces that havenormal light condition and frontal pose as gallery images andrandomly choose 6 poses and 16 illumination face images tocompose the probe images The remaining images composethe testing set

The PubFig database is composed of 58797 images of200 subjects taken from the internet The images of thedatabase were taken in completely uncontrolled conditionswith noncooperative people These images have a very largedegree of variability in face expression pose illuminationand so forth Some sample images fromPubFig are illustratedin Figure 5(c) In our experiments for each individualwe select the faces with neutral expression the frontal or

Mathematical Problems in Engineering 7

(a)

(b)

Figure 4 Some original images fromARdatabase and the reconstructed ones (a) Original face images (corrupted faces) (b) Reconstructedfaces near frontal pose and normal illumination as galleries andrandomly choose half the number of images from the rest ofthe images of each person as probes The remaining imagescompose the testing set42 Experimental Settings In all the experiments the facialimages from the AR PubFig and Extended Yale B databasesare automatically detected using OpenCV face detector [30]After that we normalize the detected facial images (inorientation and scale) such that two eyes can be alignedat the same location Then the face areas are cropped andconverted to 256 gray levels images The size of each croppedimage is 26 times 30 pixels Thus the dimensionality of the inputvector is 780 Figure 6 presents an example fromAR databaseand the corresponding cropped image Each cropped facialimage is further preprocessed with histogram equalizationto minimize illumination variations We train our ADSNTmodel with 3 hidden layers where the number of hidden

nodes for these layers is empirically set as [1024 rarr 500 rarr120] because our experiments show that three hidden layerscan get a sufficiently good performance (see Section 433)

In order to show the whole experimental process aboutparameters setting we initially use the hyperbolic tangentfunction as the nonlinear activation function and implementADSNT on AR We also choose the face images with neutralexpression frontal pose and normal illumination as galleriesand randomly select half the number of images from the restof the images of each person as probe images The remainingimages compose the testing setThemean identification ratesare recorded

Firstly we empirically set the parameter 120576 = 0001 andsparsity target 1205880 = 005 and fix the parameters 120582 = 05and 120593 = 01 in ADSNT to check the effect of 120574 on theidentification rate As illustrated in Figure 6(a) where 120574 =008 ADSNT recognition method gets the best performance

8 Mathematical Problems in Engineering

(a)

(b)

(c)

Figure 5 A fraction of samples from AR PubFig and Extended Yale B face databases (a) AR (b) Extended Yale B and (c) PubFig

Then according to Figure 6(a) we fix the parameters 120574 = 008and 120582 = 05 in ADSNT to check the influence of 120593 Asshowed in Figure 6(b) when 120593 = 06 our method achievesthe best recognition rate At last we fix 120574 = 008 and 120593 = 06and the recognition rates are illustrated in Figure 6(c) withdifferent value of 120582 When 120582 = 3 the recognition rate is thehighest From the plot in Figure 6 one can observe that theparameters 120582 120593 and 120574 cannot be too large or too small If120582 is too large the ADSNT would be less discriminative ofdifferent subjects because it implements too strong similaritypreservation entry But if 120582 is too small it will degrade therecognition performance and the significance of similaritypreservation entry Similarly 120574 can also not be too large orthe hidden neurons will not be activated for a given input

and low recognition rate will be achieved If 120574 is too smallwe can get poor performance For the weight decay 120593 if itis too small the values of weights for all hidden units willchange very slightly On the contrary the values of weightswill change greatly

Using above those experiments we gain the optimalparameter values used in ADSNT as 120582 = 3 120593 = 06 and120574 = 008 on AR database The similar experiments also havebeen performed on Extended Yale B and PubFig databasesWe can get the parameters setting as 120582 = 26 120593 = 05 and120574 = 006 on Extended Yale B database and 120582 = 28 120593 = 052and 120574 = 009 on PubFig database

In the experiments we use two measures includingthe mean identification accuracy 120583 with standard deviation

Mathematical Problems in Engineering 9

Iden

tifica

tion

rate

()

120582 = 05120593 = 01

76

80

84

88

92

01

011

005

006

007

008

009

012

013

014

000

5

000

1

000

01

The parameter 120574

(a)

Iden

tifica

tion

rate

()

95

90

85

80

75

120582 = 05

1 2

08

07

010

02

03

04

05

06

09

22

004

002

008

000

9

000

6

The parameter 120593

120574 = 008

(b)

Iden

tifica

tion

rate

()

The parameter 120582

120593 = 06120574 = 008

80

85

90

55

50

45

40

35

3025

20

15

10

05

00

60

(c)

Figure 6 Parameters setting

](120583 plusmn ]) and the receiving operating characteristic (ROC)curves to validate the effectiveness of our method as well asother methods

43 Experimental Results and Analysis

431 Comparison with Different Methods In the followingexperiments on the three databases we compare the pro-posed approach with several recently proposed methodsThese compared methods include DAE with 10 randommask noises [12] marginalized DAE (MDAE) [17] Constrac-tive Autoencoders (CAE) [15] Deep Lambertian Networks(DLN) [31] stacked supervised autoencoder (SSAE) [9] ICA-Reconstruction (RICA) [32] and Template Deep Recon-struction Model (TDRM) [20] We use the implementationof these algorithms that are provided by the respectiveauthors For all the compared approaches we use the defaultparameters that are recommended in the correspondingpapers

The mean identification accuracy with standard devia-tions of different approaches on three databases is shownin Table 1 The ROC curves of different approaches areillustrated in Figure 7 The results imply that our approach

Table 1 Comparisons of the average identification accuracyand standard deviation () of different approaches on differentdatabases

Method AR Extended Yale B PubFigDAE [12] 5756 plusmn 02 6345 plusmn 13 6133 plusmn 15MDAE [17] 6780 plusmn 13 7156 plusmn 16 7055 plusmn 25CAE [15] 4950 plusmn 21 5572 plusmn 08 6856 plusmn 16DLN [31] NA 8150 plusmn 14 7760 plusmn 14SSAE [9] 8521 plusmn 07 8222 plusmn 03 8404 plusmn 12RICA [32] 7633 plusmn 17 7044 plusmn 13 7235 plusmn 15TDRM [20] 8770 plusmn 06 8642 plusmn 12 8990 plusmn 09Our method 9232 plusmn 07 9366 plusmn 04 9126 plusmn 16significantly outperforms other methods and gets the bestmean recognition rates for the same setting of trainingand testing sets Compared to those unsupervised deeplearning methods such as DAE MDAE CAE DLN andTDRM the improvement of our method is over 30 onExtended Yale B and AR databases where there is a littlepose variance On the PubFig database our approach canalso achieve the mean identification rate of 9126 plusmn 16

10 Mathematical Problems in Engineering

True

pos

itive

rate

04

05

06

07

08

09

10

02 03 04 05 06 07 08 09 1001

False positive rate

DAEMDAECAEDLN

SSAERICATDRMOur method

(a)

01 02 03 04 05 06 07 08 09 1000

False positive rate

07

08

09

10

True

pos

itive

rate

DAEMDAECAESSAE

RICATDRMOur method

(b)

050

055

060

065

070

075

080

085

090

095

100

True

pos

itive

rate

02 03 04 05 06 07 08 09 1001

False positive rate

DAEMDAECAEDLN

SSAERICATDRMOur method

(c)

Figure 7 Comparisons of ROC curves between our method and other methods on different databases (a) AR (b) Extended Yale B and (c)PubFig

and outperforms all compared methods The reason is thatourmethod can extract discriminative robust information tovariances (expression illumination pose etc) in the learneddeep networks Compared with a supervised method likeRICA the proposed method can improve over 16 19and 23 on AR PubFig and Extended Yale B databasesrespectively Our method is a deep learning method whichfocuses on the nonlinear classification problem with learninga nonlinear mapping such that more nonlinear discriminantinformation may be explored to enhance the identificationperformance Compared with SSAE method that is designedfor removing the variances such as illumination pose andpartial occlusion our method can still be better over 6

because of using the weight penalty terms GRBM to initializeweights and three layersrsquo similarity preservation term

432 Convergence Analysis In this subsection we evaluatedthe convergence of our ADSNT versus a different numberof iterations Figure 8 illustrates the value of the objectivefunction of ADSNT versus a different number of iterationson the AR PubFig and Extended Yale B databases FromFigure 8(a) one can observe that ADSNT converges in about55 28 and 70 iterations on the three databases respectively

We also implement the identification accuracy of ADSNTversus a different number of iterations on the AR PubFigand Extended Yale B databases Figure 8(b) plots the mean

Mathematical Problems in Engineering 11

Obj

ectiv

e fun

ctio

n va

lue

4

3

2

1

0

Iteration number0 30 60 90 120 150

ARExtended Yale BPubFig

(a)

ARExtended Yale BPubFig

Mea

n id

entifi

catio

n ra

te (

)

100

80

60

40

Iteration number20 40 60 80

(b)

Figure 8 Convergence analysis (a) Convergence curves of ADSNT on AR PubFig and Extended Yale B (b) Mean identification rate ()versus iterations of ADSNT on AR PubFig and Extended Yale B

Iden

tifica

tion

accu

racy

()

AR Extended Yale B PubFig0

10

20

30

40

50

60

70

80

90

100

Layer 1

Layer 2

Layer 3

Layer 4

Figure 9 The results of ADSNT with different network depth onthe different datasets

identification rate of ADSNT From Figure 8(b) one can alsoobserve that ADSNT achieves stable performance after about55 70 and 28 iterations on AR PubFig and Extended Yale Bdatabases respectively

433 The Effect of Network Depth In this subsection weconduct experiments on the three face datasets with differenthidden layer of our proposedADSNTnetworkThe proposedmethod achieves an identification rate of 923 plusmn 06 933 plusmn12 and 9122 plusmn 08 by three-hidden layer ADSNTnetwork that is 1024 rarr 500 rarr 120 respectively on ARExtended Yale B and PubFig datasets Figure 9 illustrates the

performance of different layer ADSNT One can observe thatthree-hidden layer network outperforms 2-layer networkand the result of 3-layer ADSNT network is very nearly equalto those of 4-layer network on AR and Extended Yale Bdatabases We also observe that the performance of 4-layernetwork is a bit lower than that of 3-layer network on thePubFig database In addition the deeper ADSNT networkis the more complex its computational complexity becomesTherefore the 3-layer network depth is a good trade-offbetween performance and computational complexity

434 Activation Function Following the work in [9] we alsoestimate the performance of ADSNTwith different activationfunctions such as sigmoid hyperbolic tangent and rectifiedlinear unit (ReLU) [33] which is defined as 119891(119909) = max(0 119909)When the sigmoid 119891(119909) = 1(1 + 119890minus119909) is used as activationfunction the objective function (see (6)) is rewritten asfollows

argmin120579ADSNT

119869= 1119872 sum

119894

1003817100381710038171003817119909119894 minus 11990911989410038171003817100381710038172 + 120582120579119882119872 sum119894

1003817100381710038171003817119891 (119909119894) minus 119891 (119909119894)10038171003817100381710038172

+ 1205932 ( 3sum119895

10038171003817100381710038171003817119882(119894)119890 100381710038171003817100381710038172119865 +3sum119895

10038171003817100381710038171003817119882(119894)119889 100381710038171003817100381710038172119865)

+ 120574( 3sum119894

KL (120588119909 || 1205880) + 5sum119894

KL (120588 || 1205880))

(9)

where 120588119909 = (1119872)sum119894 119891(119909119894) 120588 = (1119872)sum119894 119891(119909119894)

12 Mathematical Problems in Engineering

Table 2 Comparisons of the ADSNT algorithm with differentactivation functions on the AR PubFig and Extended Yale Bdatabases

Dataset Sigmoid Tanh ReLUAR 8866 plusmn 14 9232 plusmn 07 9322 plusmn 15Extended Yale B 9055 plusmn 06 9366 plusmn 04 9454 plusmn 03PubFig 8740 plusmn 12 9126 plusmn 16 9244 plusmn 11

If ReLU is adopted as activation function (6) is formu-lated as

argmin120579ADSNT

119869= 1119872 sum

119894

1003817100381710038171003817119909119894 minus 11990911989410038171003817100381710038172 + 120582120579119882119872 sum119894

1003817100381710038171003817119891 (119909119894) minus 119891 (119909119894)10038171003817100381710038172

+ 1205932 ( 3sum119895

10038171003817100381710038171003817119882(119894)119890 100381710038171003817100381710038172119865 +3sum119895

10038171003817100381710038171003817119882(119894)119889 100381710038171003817100381710038172119865)

+ 120574( 3sum119894

1003817100381710038171003817119891 (119909119894)10038171003817100381710038171 + 5sum119894

1003817100381710038171003817119891 (119909119894)10038171003817100381710038171)

(10)

Table 2 shows the performance of the proposed ADSNTbased on different activation functions conducted on thethree databases FromTable 2 one can see that ReLU achievesthe best performanceThe key reason is that we use theweightdecay term 120593 to optimize the objective function

435 Timing Consumption Analysis In this subsection weuse a HP Z620 workstation with Intel Xeon E5-2609 24GHzCPU 8G RAM and conduct a series of experiments onAR database to compare the time consumption of differentmethods which are tabulated in Table 3 The training time(seconds) is shown in Table 3(a) while the time (seconds)needed to recognize a face from the testing set is shownin Table 3(b) From Table 3 one can see that the proposedmethod requires comparatively more time for trainingbecause of initialization of ADSNT and performing imagereconstruction However the procedure of training is offlineWhen we identity an image from testing set our methodrequires less time than other methods

5 Conclusions

In this article we present an adaptive deep supervisedautoencoder based image reconstruction method for facerecognition Unlike conventional deep autoencoder basedface recognition method our method considers the classlabel information from training samples in the deep learn-ing procedure and can automatically discover the under-lying nonlinear manifold structures Specifically a multi-layer supervised adaptive network structure is presentedwhich is trained to extract characteristic features from cor-ruptedclean facial images and reconstruct the correspondingsimilar facial images The reconstruction is realized by a so-called ldquobottleneckrdquo neural network that learns to map face

Table 3 (a) Training time (seconds) for different methods (b)Testing time (seconds) for different methodsThe proposed methodcosts the least amount of testing time comparing with othermethods

(a)

Methods TimeDAE [12] 861MDAE [17] 1054CAE [15] 786DLN [31] 2343SSAE [9] 5351RICA [32] 1344TDRM [20] 1102Our method 12232

(b)

Methods TimeDAE [12] 027MDAE [17] 03CAE [15] 026DLN [31] 035SSAE [9] 022RICA [32] 019TDRM [20] 018Our method 013

images into a low-dimensional vector and to reconstructthe respective corresponding face images from the mappingvectors Having trained the ADSNT a new face image canthen be recognized by comparing its reconstruction imagewith individual gallery images during testing The proposedmethod has been evaluated on the widely used AR PubFigand Extended Yale B databases and the experimental resultshave shown its effectiveness For future work we are focusingon applying our proposed method to other application fieldssuch as pattern classification based on image set and actionrecognition based on the video to further demonstrate itsvalidity

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This paper is partially supported by the research grant for theNatural Science Foundation from Sichuan Provincial Depart-ment of Education (Grant no 13ZB0336) and the NationalNatural Science Foundation of China (Grant no 61502059)

References

[1] J-T Chien and C-CWu ldquoDiscriminant waveletfaces and near-est feature classifiers for face recognitionrdquo IEEE Transactions on

Mathematical Problems in Engineering 13

Pattern Analysis and Machine Intelligence vol 24 no 12 pp1644ndash1649 2002

[2] Z Lei M Pietikainen and S Z Li ldquoLearning discriminant facedescriptorrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 36 no 2 pp 289ndash302 2014

[3] B Moghaddam T Jebara and A Pentland ldquoBayesian facerecognitionrdquo Pattern Recognition vol 33 no 11 pp 1771ndash17822000

[4] N Cristianini and J S Taylor An Introduction to SupportVector Machines and Other Kernel-based Learning MethodsCambridge University Press New York NY USA 2004

[5] R O Duda P E Hart and D G Stork Pattern ClassificationJohn Wiley amp Sons New York NY USA 2nd edition 2001

[6] W Zhao R Chellappa P J Phillips and A Rosenfeld ldquoFacerecognition a literature surveyrdquo ACM Computing Surveys vol35 no 4 pp 399ndash458 2003

[7] B Zhang S Shan X Chen and W Gao ldquoHistogram of Gaborphase patterns (HGPP) a novel object representation approachfor face recognitionrdquo IEEE Transactions on Image Processingvol 16 no 1 pp 57ndash68 2007

[8] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inNeural Information Processing Systems pp 1527ndash1554 2012

[9] S Gao Y Zhang K Jia J Lu and Y Zhang ldquoSingle sample facerecognition via learning deep supervised autoencodersrdquo IEEETransactions on Information Forensics and Security vol 10 no10 pp 2108ndash2118 2015

[10] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo American Associationfor the Advancement of Science Science vol 313 no 5786 pp504ndash507 2006

[11] Y Bengio ldquoPractical recommendations for gradient-basedtraining of deep architecturesmrdquo in Neural Networks Tricks ofthe Trade pp 437ndash478 Springer Berlin Germany 2012

[12] P Vincent H Larochelle I Lajoie Y Bengio and P-AManzagol ldquoStacked denoising autoencoders learning usefulrepresentations in a deep network with a local denoisingcriterionrdquo Journal of Machine Learning Research (JMLR) vol 11no 5 pp 3371ndash3408 2010

[13] V Nair and G E Hinton ldquoRectified linear units improveRestricted Boltzmann machinesrdquo in Proceedings of the 27thInternational Conference on Machine Learning (ICML rsquo10) pp807ndash814 Haifa Israel June 2010

[14] A Coates H Lee and A Y Ng ldquoAn analysis of single-layernetworks in unsupervised feature learningrdquo in Proceedings ofthe 14th International Conference on Artificial Intelligence andStatistics (AISTATS rsquo11) pp 215ndash223 Sardinia Italy 2010



Training procedure of the ADSNT:

(1) Preprocess all images, namely, perform histogram equalization.
(2) X <- randomly select a small subset for each individual from Ω.
(3) Initialize: train GRBMs using X to initialize θ_ADSNT = {θ_W, θ_b}.
(4) Optimization by L-BFGS:
    For r = 1, 2, ..., R do:
        Calculate J_reg using (6);
        If r > 1 and |J_r - J_{r-1}| < ε, go to Return.
Return θ_W and θ_b.
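To make step (4) concrete, the following is a minimal Python sketch of the L-BFGS optimization stage. The names `j_reg`, `j_reg_grad`, and the flat parameter vector `theta0` (obtained from GRBM pre-training) are hypothetical stand-ins for the quantities described above, not the authors' code.

```python
import numpy as np
from scipy.optimize import minimize

def train_adsnt(X, theta0, j_reg, j_reg_grad, R=150, eps=1e-3):
    """Optimize the ADSNT parameters with L-BFGS (sketch).

    X          : (M, d) matrix of preprocessed training images
    theta0     : initial flat parameter vector from GRBM pre-training
    j_reg      : callable(theta, X) -> scalar objective, i.e. Eq. (6)
    j_reg_grad : callable(theta, X) -> gradient of Eq. (6)
    """
    history = []

    def fun(theta):
        J = j_reg(theta, X)
        history.append(J)          # record objective evaluations
        return J

    # L-BFGS with an iteration cap R and a tolerance eps on the change of
    # the objective, roughly mirroring the stopping rule |J_r - J_{r-1}| < eps.
    res = minimize(fun, theta0, jac=lambda th: j_reg_grad(th, X),
                   method="L-BFGS-B",
                   options={"maxiter": R, "ftol": eps})
    return res.x, history
```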

Since training the ADSNT model aims to reconstruct clean data, namely, gallery images, from corrupt data, it can learn an underlying structure from the corrupt data and produce a very useful representation. Furthermore, we can learn an overcomplete sparse representation from corrupt data by mapping it into a high-dimensional feature space, since the first hidden layer has more neurons than the dimensionality of the original data. The high-dimensional representation is then followed by a so-called "bottleneck"; that is, the data is further mapped to an abstract, compact, and low-dimensional representation in the subsequent layers of the encoder. Through such a mapping, redundant information in the corrupted faces, such as illumination, pose, and partial occlusion, is removed, and only the useful information content is kept. In addition, we know that if we use an AE with only one hidden layer and linear activation functions, the learned weights would be analogous to a PCA subspace [20]. However, the AE is an unsupervised algorithm. In our work, we make use of the class label information to train the SAE, so if we also use only one hidden layer with a linear activation function, the weights learned by the SAE can be thought of as similar to an "LDA" subspace. However, in our structure, we apply nonlinear activation functions and stack several hidden layers together, so that the ADSNT can adapt to very complicated nonlinear manifold structures. Some reconstructed images produced by the ADSNT on the AR database are shown in Figure 4(b). One can see that the ADSNT can remove the illumination variation. For those face images with partial occlusion, the ADSNT can also imitate the clean faces. These results are not surprising, because human beings have the capability of inferring unknown faces from known face images via experience (for a deep network structure, the experience derives from the generic set) [9].

3.4. Face Classification Based on ADSNT Image Reconstruction. To better train the ADSNT, all images need to be preprocessed. It is a very important step for object recognition, including face recognition. Common approaches include histogram equalization, geometry normalization, and image smoothing. In this paper, for the sake of simplicity, we only perform histogram equalization on all the facial images to minimize illumination variations. That is, we utilize histogram equalization to normalize the histograms of the facial images and make them more compact. For details about histogram equalization, the reader is referred to [26].
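This preprocessing step can be implemented, for example, with OpenCV's histogram equalization; a minimal sketch (the file path is a placeholder):

```python
import cv2

# Load a face image in grayscale (the path is a placeholder).
face = cv2.imread("subject01.png", cv2.IMREAD_GRAYSCALE)

# Histogram equalization spreads the intensity histogram over the full
# 0-255 range, reducing the effect of illumination differences.
face_eq = cv2.equalizeHist(face)
```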

After the ADSNT has been trained on a certain number of individuals, it can be applied to unseen face images in order to recognize them.

Given a test facial image x^(t), which is preprocessed with histogram equalization in the same way as the training images and presented to the ADSNT network, we reconstruct (using (3) and (4)) an image x̃^(t) from the ADSNT, which is similar to a clean face. For the sake of simplicity, nearest neighbor classification based on the Euclidean distance between the reconstruction and all the gallery images identifies the class. The classification rule is defined as

$$I_k\big(x^{(t)}\big) = \arg\min_{g}\,\big\|\tilde{x}^{(t)} - x_g\big\|, \quad \forall g \in \{1, 2, \ldots, c\}, \tag{8}$$

where I_k(x^(t)) is the resulting identity and x_g is the clean facial image in the gallery of individual g.
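A compact NumPy sketch of this rule, assuming `reconstruct` denotes the trained ADSNT's encode-decode pass (a hypothetical helper) and `gallery` holds one clean, preprocessed image vector per individual:

```python
import numpy as np

def identify(x_test, gallery, reconstruct):
    """Nearest-gallery classification of Eq. (8).

    x_test      : (d,) preprocessed probe image vector
    gallery     : (c, d) array, one clean gallery vector per individual
    reconstruct : callable(x) -> (d,) ADSNT reconstruction of x
    Returns the index g of the predicted individual.
    """
    x_rec = reconstruct(x_test)
    # Euclidean distance between the reconstruction and every gallery image.
    dists = np.linalg.norm(gallery - x_rec, axis=1)
    return int(np.argmin(dists))
```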

4. Experimental Results and Discussion

In this section, extensive experiments are conducted to present and compare the performance of different methods with the proposed approach. The experiments are implemented on three widely used face databases, that is, AR [27], Extended Yale B [28], and PubFig [29]. The details of these three databases and the performance evaluation of the different approaches are presented as follows.

4.1. Dataset Description. The AR database contains over 4000 color face images from 126 people (56 women and 70 men). The images were taken in two sessions (separated by two weeks), and each session contained 13 pictures per person. These images contain frontal-view faces with different facial expressions, illuminations, and occlusions (sunglasses and scarf). Some sample face images from AR are illustrated in Figure 5(a). In our experiments, for each person, we choose the facial images with neutral expression, frontal pose, and normal illumination condition as gallery images and randomly select half of the remaining images of each person as probe images. The remaining images compose the testing set.

The Extended Yale B database consists of 16128 images of 38 people under 64 illumination conditions and 9 poses. Some sample face images from Extended Yale B are illustrated in Figure 5(b). For each person, we select the faces with normal lighting condition and frontal pose as gallery images and randomly choose 6 poses and 16 illumination face images to compose the probe images. The remaining images compose the testing set.

The PubFig database is composed of 58797 images of 200 subjects taken from the Internet. The images of the database were taken in completely uncontrolled conditions with noncooperative subjects. These images have a very large degree of variability in facial expression, pose, illumination, and so forth. Some sample images from PubFig are illustrated in Figure 5(c). In our experiments, for each individual, we select the faces with neutral expression, frontal or near-frontal pose, and normal illumination as galleries and randomly choose half of the remaining images of each person as probes. The remaining images compose the testing set.

Figure 4: Some original images from the AR database and the reconstructed ones. (a) Original face images (corrupted faces). (b) Reconstructed faces.

4.2. Experimental Settings. In all the experiments, the facial images from the AR, PubFig, and Extended Yale B databases are automatically detected using the OpenCV face detector [30]. After that, we normalize the detected facial images (in orientation and scale) such that the two eyes are aligned at the same location. Then the face areas are cropped and converted to 256-gray-level images. The size of each cropped image is 26 × 30 pixels; thus the dimensionality of the input vector is 780. Figure 6 presents an example from the AR database and the corresponding cropped image. Each cropped facial image is further preprocessed with histogram equalization to minimize illumination variations. We train our ADSNT model with 3 hidden layers, where the number of hidden nodes for these layers is empirically set as [1024 → 500 → 120], because our experiments show that three hidden layers can achieve sufficiently good performance (see Section 4.3.3).
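A rough sketch of this preprocessing pipeline in Python with OpenCV; the detector choice, its settings, and the normalization are assumptions (the paper only states that the OpenCV face detector is used), and the eye-alignment step is omitted for brevity:

```python
import cv2
import numpy as np

# Haar cascade face detector shipped with OpenCV (an assumed choice).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(gray_image):
    """Detect, crop, equalize, and flatten one grayscale face image.

    Returns a 780-dimensional vector (26 x 30 crop) or None if no face
    is detected.
    """
    faces = cascade.detectMultiScale(gray_image, scaleFactor=1.1,
                                     minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    crop = gray_image[y:y + h, x:x + w]
    # Resize to 26 x 30 pixels (width x height) as in the experiments.
    crop = cv2.resize(crop, (26, 30))
    crop = cv2.equalizeHist(crop)
    return crop.astype(np.float32).flatten() / 255.0
```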

In order to show the whole experimental process of parameter setting, we initially use the hyperbolic tangent function as the nonlinear activation function and implement ADSNT on AR. We also choose the face images with neutral expression, frontal pose, and normal illumination as galleries and randomly select half of the remaining images of each person as probe images. The remaining images compose the testing set. The mean identification rates are recorded.

Firstly, we empirically set the parameter ε = 0.001 and the sparsity target ρ0 = 0.05 and fix the parameters λ = 0.5 and φ = 0.1 in ADSNT to check the effect of γ on the identification rate. As illustrated in Figure 6(a), the ADSNT recognition method achieves the best performance when γ = 0.08.

Figure 5: A fraction of samples from the AR, PubFig, and Extended Yale B face databases. (a) AR. (b) Extended Yale B. (c) PubFig.

Then, according to Figure 6(a), we fix the parameters γ = 0.08 and λ = 0.5 in ADSNT to check the influence of φ. As shown in Figure 6(b), when φ = 0.6 our method achieves the best recognition rate. At last, we fix γ = 0.08 and φ = 0.6, and the recognition rates obtained with different values of λ are illustrated in Figure 6(c). When λ = 3, the recognition rate is the highest. From the plots in Figure 6, one can observe that the parameters λ, φ, and γ can be neither too large nor too small. If λ is too large, the ADSNT becomes less discriminative of different subjects because it imposes too strong a similarity preservation term; but if λ is too small, it degrades the recognition performance and the significance of the similarity preservation term. Similarly, γ cannot be too large, or the hidden neurons will not be activated for a given input and a low recognition rate will result; if γ is too small, we also get poor performance. For the weight decay φ, if it is too small, the values of the weights for all hidden units change very slightly; on the contrary, if it is too large, the values of the weights change greatly.

Using the above experiments, we obtain the optimal parameter values used in ADSNT as λ = 3, φ = 0.6, and γ = 0.08 on the AR database. Similar experiments have also been performed on the Extended Yale B and PubFig databases. We obtain the parameter settings λ = 2.6, φ = 0.5, and γ = 0.06 on the Extended Yale B database and λ = 2.8, φ = 0.52, and γ = 0.09 on the PubFig database.
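The tuning procedure described above is essentially a coordinate-wise sweep over λ, φ, and γ. A minimal sketch follows; the candidate grids are illustrative only, and `evaluate`, which would train ADSNT with the given parameters and return the mean identification rate, is a hypothetical helper:

```python
def tune_parameters(evaluate,
                    gammas=(0.001, 0.005, 0.01, 0.05, 0.06, 0.07, 0.08, 0.1),
                    phis=(0.002, 0.01, 0.1, 0.3, 0.5, 0.6, 0.8, 1.0),
                    lams=(0.05, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0)):
    """Coordinate-wise sweep: fix two parameters, vary the third."""
    lam, phi = 0.5, 0.1                                   # starting values for the sweep
    gamma = max(gammas, key=lambda g: evaluate(lam, phi, g))
    phi = max(phis, key=lambda p: evaluate(lam, p, gamma))
    lam = max(lams, key=lambda l: evaluate(l, phi, gamma))
    return lam, phi, gamma
```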

Figure 6: Parameter setting. (a) Identification rate (%) versus the parameter γ (with λ = 0.5, φ = 0.1). (b) Identification rate (%) versus the parameter φ (with λ = 0.5, γ = 0.08). (c) Identification rate (%) versus the parameter λ (with φ = 0.6, γ = 0.08).

In the experiments, we use two measures, the mean identification accuracy μ with standard deviation ν (reported as μ ± ν) and receiver operating characteristic (ROC) curves, to validate the effectiveness of our method as well as the other methods.
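For reference, a small sketch of how these two measures could be computed, assuming `accuracies` holds per-trial identification rates and `labels`/`scores` hold ground-truth match labels and similarity scores for the ROC; scikit-learn is an assumed dependency, not something the paper specifies:

```python
import numpy as np
from sklearn.metrics import roc_curve

def summarize(accuracies, labels, scores):
    """Mean identification accuracy with standard deviation, plus an ROC curve."""
    mu = float(np.mean(accuracies))          # mean identification accuracy
    nu = float(np.std(accuracies))           # standard deviation
    fpr, tpr, _ = roc_curve(labels, scores)  # false/true positive rates
    return mu, nu, (fpr, tpr)
```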

4.3. Experimental Results and Analysis

4.3.1. Comparison with Different Methods. In the following experiments on the three databases, we compare the proposed approach with several recently proposed methods. These compared methods include DAE with 10% random mask noise [12], marginalized DAE (MDAE) [17], Contractive Autoencoders (CAE) [15], Deep Lambertian Networks (DLN) [31], the stacked supervised autoencoder (SSAE) [9], ICA-Reconstruction (RICA) [32], and the Template Deep Reconstruction Model (TDRM) [20]. We use the implementations of these algorithms provided by the respective authors. For all the compared approaches, we use the default parameters recommended in the corresponding papers.

The mean identification accuracy with standard deviations of the different approaches on the three databases is shown in Table 1. The ROC curves of the different approaches are illustrated in Figure 7.

Table 1: Comparisons of the average identification accuracy and standard deviation (%) of different approaches on different databases.

Method        AR            Extended Yale B   PubFig
DAE [12]      57.56 ± 0.2   63.45 ± 1.3       61.33 ± 1.5
MDAE [17]     67.80 ± 1.3   71.56 ± 1.6       70.55 ± 2.5
CAE [15]      49.50 ± 2.1   55.72 ± 0.8       68.56 ± 1.6
DLN [31]      NA            81.50 ± 1.4       77.60 ± 1.4
SSAE [9]      85.21 ± 0.7   82.22 ± 0.3       84.04 ± 1.2
RICA [32]     76.33 ± 1.7   70.44 ± 1.3       72.35 ± 1.5
TDRM [20]     87.70 ± 0.6   86.42 ± 1.2       89.90 ± 0.9
Our method    92.32 ± 0.7   93.66 ± 0.4       91.26 ± 1.6

Figure 7: Comparisons of ROC curves between our method and the other methods on different databases. (a) AR. (b) Extended Yale B. (c) PubFig.

The results imply that our approach significantly outperforms the other methods and achieves the best mean recognition rates for the same setting of training and testing sets. Compared to the unsupervised deep learning methods such as DAE, MDAE, CAE, DLN, and TDRM, the improvement of our method is over 3.0% on the Extended Yale B and AR databases, where there is little pose variance. On the PubFig database, our approach also achieves a mean identification rate of 91.26 ± 1.6% and outperforms all compared methods. The reason is that our method can extract discriminative information robust to variances (expression, illumination, pose, etc.) in the learned deep networks. Compared with a supervised method like RICA, the proposed method improves by over 16%, 19%, and 23% on the AR, PubFig, and Extended Yale B databases, respectively. Our method is a deep learning method which focuses on the nonlinear classification problem by learning a nonlinear mapping, such that more nonlinear discriminant information may be explored to enhance the identification performance. Compared with the SSAE method, which is designed for removing variances such as illumination, pose, and partial occlusion, our method is still better by over 6%, because of the weight penalty terms, the GRBM initialization of the weights, and the three layers' similarity preservation term.

4.3.2. Convergence Analysis. In this subsection, we evaluate the convergence of our ADSNT versus the number of iterations. Figure 8 illustrates the value of the objective function of ADSNT versus the number of iterations on the AR, PubFig, and Extended Yale B databases. From Figure 8(a), one can observe that ADSNT converges in about 55, 28, and 70 iterations on the three databases, respectively.

We also record the identification accuracy of ADSNT versus the number of iterations on the AR, PubFig, and Extended Yale B databases. Figure 8(b) plots the mean identification rate of ADSNT. From Figure 8(b), one can also observe that ADSNT achieves stable performance after about 55, 70, and 28 iterations on the AR, PubFig, and Extended Yale B databases, respectively.

Figure 8: Convergence analysis. (a) Convergence curves of ADSNT on AR, PubFig, and Extended Yale B. (b) Mean identification rate (%) versus iterations of ADSNT on AR, PubFig, and Extended Yale B.

Figure 9: The results of ADSNT with different network depths on the different datasets.

4.3.3. The Effect of Network Depth. In this subsection, we conduct experiments on the three face datasets with different numbers of hidden layers in our proposed ADSNT network. The proposed method achieves identification rates of 92.3 ± 0.6%, 93.3 ± 1.2%, and 91.22 ± 0.8% with the three-hidden-layer ADSNT network, that is, 1024 → 500 → 120, on the AR, Extended Yale B, and PubFig datasets, respectively. Figure 9 illustrates the performance of ADSNT with different numbers of layers. One can observe that the three-hidden-layer network outperforms the 2-layer network, and the result of the 3-layer ADSNT network is very nearly equal to that of the 4-layer network on the AR and Extended Yale B databases. We also observe that the performance of the 4-layer network is a bit lower than that of the 3-layer network on the PubFig database. In addition, the deeper the ADSNT network is, the higher its computational complexity becomes. Therefore, the 3-layer network depth is a good trade-off between performance and computational complexity.
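To make the 780 → 1024 → 500 → 120 bottleneck concrete, the following is a minimal NumPy sketch of a symmetric encoder-decoder forward pass with tanh activations; weight initialization values and the GRBM pre-training are omitted, so this illustrates only the layer layout, not the authors' implementation:

```python
import numpy as np

LAYER_SIZES = [780, 1024, 500, 120]   # input dimension and the three hidden layers

def init_params(sizes=LAYER_SIZES, seed=0):
    rng = np.random.default_rng(seed)
    enc = [(0.01 * rng.standard_normal((m, n)), np.zeros(n))
           for m, n in zip(sizes[:-1], sizes[1:])]
    dec = [(0.01 * rng.standard_normal((n, m)), np.zeros(m))
           for m, n in zip(sizes[:-1], sizes[1:])][::-1]
    return enc, dec

def forward(x, enc, dec):
    """Encode to the 120-dimensional bottleneck, then decode back to 780-d."""
    h = x
    for W, b in enc:                 # 780 -> 1024 -> 500 -> 120
        h = np.tanh(h @ W + b)
    code = h
    for W, b in dec:                 # 120 -> 500 -> 1024 -> 780
        h = np.tanh(h @ W + b)
    return code, h                   # bottleneck features and reconstruction
```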

4.3.4. Activation Function. Following the work in [9], we also evaluate the performance of ADSNT with different activation functions, namely, the sigmoid, the hyperbolic tangent, and the rectified linear unit (ReLU) [33], which is defined as f(x) = max(0, x). When the sigmoid f(x) = 1/(1 + e^{-x}) is used as the activation function, the objective function (see (6)) is rewritten as follows:

$$\arg\min_{\theta_{\mathrm{ADSNT}}} J = \frac{1}{M}\sum_{i}\left\|x_i - \tilde{x}_i\right\|^2 + \frac{\lambda_{\theta_W}}{M}\sum_{i}\left\|f(x_i) - f(\tilde{x}_i)\right\|^2 + \frac{\varphi}{2}\left(\sum_{j=1}^{3}\left\|W_e^{(j)}\right\|_F^2 + \sum_{j=1}^{3}\left\|W_d^{(j)}\right\|_F^2\right) + \gamma\left(\sum_{i=1}^{3}\mathrm{KL}\left(\rho_x \,\middle\|\, \rho_0\right) + \sum_{i=1}^{5}\mathrm{KL}\left(\tilde{\rho} \,\middle\|\, \rho_0\right)\right), \tag{9}$$

where ρ_x = (1/M) Σ_i f(x_i) and ρ̃ = (1/M) Σ_i f(x̃_i).
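A small NumPy sketch of the KL-divergence sparsity penalty used in (9); `rho0` is the sparsity target (0.05 in the experiments) and `activations` is the matrix of sigmoid hidden activations for one layer. This is an illustrative helper, not the authors' code:

```python
import numpy as np

def kl_sparsity(activations, rho0=0.05, eps=1e-8):
    """KL(rho || rho0) summed over the hidden units of one layer.

    activations : (M, n_hidden) sigmoid outputs for M training samples
    rho0        : target average activation
    """
    rho = activations.mean(axis=0)           # average activation per unit
    rho = np.clip(rho, eps, 1.0 - eps)       # avoid log(0)
    kl = rho * np.log(rho / rho0) + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho0))
    return float(np.sum(kl))
```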

Table 2: Comparisons of the ADSNT algorithm with different activation functions on the AR, PubFig, and Extended Yale B databases.

Dataset            Sigmoid        Tanh           ReLU
AR                 88.66 ± 1.4    92.32 ± 0.7    93.22 ± 1.5
Extended Yale B    90.55 ± 0.6    93.66 ± 0.4    94.54 ± 0.3
PubFig             87.40 ± 1.2    91.26 ± 1.6    92.44 ± 1.1

If ReLU is adopted as the activation function, (6) is formulated as

$$\arg\min_{\theta_{\mathrm{ADSNT}}} J = \frac{1}{M}\sum_{i}\left\|x_i - \tilde{x}_i\right\|^2 + \frac{\lambda_{\theta_W}}{M}\sum_{i}\left\|f(x_i) - f(\tilde{x}_i)\right\|^2 + \frac{\varphi}{2}\left(\sum_{j=1}^{3}\left\|W_e^{(j)}\right\|_F^2 + \sum_{j=1}^{3}\left\|W_d^{(j)}\right\|_F^2\right) + \gamma\left(\sum_{i=1}^{3}\left\|f(x_i)\right\|_1 + \sum_{i=1}^{5}\left\|f(\tilde{x}_i)\right\|_1\right). \tag{10}$$
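Analogously to the KL penalty above, the ReLU variant in (10) replaces the KL terms with an L1 penalty on the hidden activations; a one-line illustrative helper:

```python
import numpy as np

def l1_sparsity(activations):
    """Sum of absolute ReLU activations for one layer, as used in Eq. (10)."""
    return float(np.sum(np.abs(activations)))
```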

Table 2 shows the performance of the proposed ADSNT with different activation functions on the three databases. From Table 2, one can see that ReLU achieves the best performance. The key reason is that we use the weight decay term φ to optimize the objective function.

4.3.5. Timing Consumption Analysis. In this subsection, we use an HP Z620 workstation with an Intel Xeon E5-2609 2.4 GHz CPU and 8 GB RAM and conduct a series of experiments on the AR database to compare the time consumption of the different methods, which is tabulated in Table 3. The training time (seconds) is shown in Table 3(a), while the time (seconds) needed to recognize a face from the testing set is shown in Table 3(b). From Table 3, one can see that the proposed method requires comparatively more time for training because of the initialization of ADSNT and the image reconstruction. However, the training procedure is offline. When we identify an image from the testing set, our method requires less time than the other methods.

Table 3: (a) Training time (seconds) for different methods. (b) Testing time (seconds) for different methods. The proposed method costs the least amount of testing time compared with the other methods.

(a)
Methods        Time
DAE [12]       861
MDAE [17]      1054
CAE [15]       786
DLN [31]       2343
SSAE [9]       5351
RICA [32]      1344
TDRM [20]      1102
Our method     12232

(b)
Methods        Time
DAE [12]       0.27
MDAE [17]      0.3
CAE [15]       0.26
DLN [31]       0.35
SSAE [9]       0.22
RICA [32]      0.19
TDRM [20]      0.18
Our method     0.13

5. Conclusions

In this article, we present an adaptive deep supervised autoencoder based image reconstruction method for face recognition. Unlike conventional deep autoencoder based face recognition methods, our method considers the class label information from training samples in the deep learning procedure and can automatically discover the underlying nonlinear manifold structures. Specifically, a multilayer supervised adaptive network structure is presented, which is trained to extract characteristic features from corrupted/clean facial images and reconstruct the corresponding similar facial images. The reconstruction is realized by a so-called "bottleneck" neural network that learns to map face images into a low-dimensional vector and to reconstruct the respective corresponding face images from the mapping vectors. Having trained the ADSNT, a new face image can then be recognized by comparing its reconstruction image with individual gallery images during testing. The proposed method has been evaluated on the widely used AR, PubFig, and Extended Yale B databases, and the experimental results have shown its effectiveness. For future work, we will focus on applying our proposed method to other application fields, such as pattern classification based on image sets and action recognition based on video, to further demonstrate its validity.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is partially supported by the research grant for the Natural Science Foundation from Sichuan Provincial Department of Education (Grant no. 13ZB0336) and the National Natural Science Foundation of China (Grant no. 61502059).

References

[1] J.-T. Chien and C.-C. Wu, "Discriminant waveletfaces and nearest feature classifiers for face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1644-1649, 2002.
[2] Z. Lei, M. Pietikainen, and S. Z. Li, "Learning discriminant face descriptor," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 2, pp. 289-302, 2014.
[3] B. Moghaddam, T. Jebara, and A. Pentland, "Bayesian face recognition," Pattern Recognition, vol. 33, no. 11, pp. 1771-1782, 2000.
[4] N. Cristianini and J. S. Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, New York, NY, USA, 2004.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, New York, NY, USA, 2nd edition, 2001.
[6] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face recognition: a literature survey," ACM Computing Surveys, vol. 35, no. 4, pp. 399-458, 2003.
[7] B. Zhang, S. Shan, X. Chen, and W. Gao, "Histogram of Gabor phase patterns (HGPP): a novel object representation approach for face recognition," IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 57-68, 2007.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Neural Information Processing Systems, pp. 1527-1554, 2012.
[9] S. Gao, Y. Zhang, K. Jia, J. Lu, and Y. Zhang, "Single sample face recognition via learning deep supervised autoencoders," IEEE Transactions on Information Forensics and Security, vol. 10, no. 10, pp. 2108-2118, 2015.
[10] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
[11] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, pp. 437-478, Springer, Berlin, Germany, 2012.
[12] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, no. 5, pp. 3371-3408, 2010.
[13] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 807-814, Haifa, Israel, June 2010.
[14] A. Coates, H. Lee, and A. Y. Ng, "An analysis of single-layer networks in unsupervised feature learning," in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS '11), pp. 215-223, Sardinia, Italy, 2010.
[15] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, "Contractive auto-encoders: explicit invariance during feature extraction," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 833-840, Bellevue, Wash, USA, July 2011.
[16] K. Simonyan, A. Vedaldi, and A. Zisserman, "Deep Fisher networks for large-scale image classification," in Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS '13), pp. 163-171, Lake Tahoe, Nev, USA, December 2013.
[17] M. Chen, Z. Xu, K. Q. Weinberger, and F. Sha, "Marginalized denoising autoencoders for domain adaptation," in Proceedings of the 29th International Conference on Machine Learning (ICML '12), pp. 767-774, Edinburgh, UK, July 2012.
[18] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "DeepFace: closing the gap to human-level performance in face verification," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1701-1708, June 2014.
[19] Z. Zhu, P. Luo, X. Wang, and X. Tang, "Deep learning identity-preserving face space," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 113-120, Sydney, Australia, December 2013.
[20] M. Hayat, M. Bennamoun, and S. An, "Deep reconstruction models for image set classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 4, pp. 713-727, 2015.
[21] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation from predicting 10,000 classes," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1891-1898, Columbus, Ohio, USA, June 2014.
[22] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation by joint identification-verification," Tech. Rep., https://arxiv.org/abs/1406.4773.
[23] X. Cai, C. Wang, B. Xiao, X. Chen, and J. Zhou, "Deep nonlinear metric learning with independent subspace analysis for face verification," in Proceedings of the 20th ACM International Conference on Multimedia (MM '12), pp. 749-752, November 2012.
[24] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[25] Q. V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, and A. Y. Ng, "On optimization methods for deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 265-272, Bellevue, Wash, USA, July 2011.
[26] C. Zhou, X. Wei, Q. Zhang, and X. Fang, "Fisher's linear discriminant (FLD) and support vector machine (SVM) in non-negative matrix factorization (NMF) residual space for face recognition," Optica Applicata, vol. 40, no. 3, pp. 693-704, 2010.
[27] A. Martinez and R. Benavente, "The AR face database," CVC Tech. Rep. 24, 1998.
[28] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From few to many: illumination cone models for face recognition under variable lighting and pose," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643-660, 2001.
[29] N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, "Attribute and simile classifiers for face verification," in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 365-372, Kyoto, Japan, October 2009.
[30] M. Rezaei and R. Klette, "Novel adaptive eye detection and tracking for challenging lighting conditions," in Computer Vision - ACCV 2012 Workshops, J.-I. Park and J. Kim, Eds., vol. 7729 of Lecture Notes in Computer Science, pp. 427-440, Springer, Berlin, Germany, 2013.
[31] Y. Tang, R. Salakhutdinov, and G. H. Hinton, "Deep Lambertian networks," in Proceedings of the 29th International Conference on Machine Learning (ICML '12), pp. 1623-1630, Edinburgh, UK, July 2012.
[32] Q. V. Le, A. Karpenko, J. Ngiam, and A. Y. Ng, "ICA with reconstruction cost for efficient overcomplete feature learning," in Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS '11), pp. 1017-1025, Granada, Spain, December 2011.
[33] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, pp. 315-323, 2011.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article Adaptive Deep Supervised Autoencoder ...Adaptive Deep Supervised Network Template (ADSNT). the deep network perform well, similar to [], we need to giveitinitializationweights.en,thepreinitializedADSNT

Mathematical Problems in Engineering 7

(a)

(b)

Figure 4 Some original images fromARdatabase and the reconstructed ones (a) Original face images (corrupted faces) (b) Reconstructedfaces near frontal pose and normal illumination as galleries andrandomly choose half the number of images from the rest ofthe images of each person as probes The remaining imagescompose the testing set42 Experimental Settings In all the experiments the facialimages from the AR PubFig and Extended Yale B databasesare automatically detected using OpenCV face detector [30]After that we normalize the detected facial images (inorientation and scale) such that two eyes can be alignedat the same location Then the face areas are cropped andconverted to 256 gray levels images The size of each croppedimage is 26 times 30 pixels Thus the dimensionality of the inputvector is 780 Figure 6 presents an example fromAR databaseand the corresponding cropped image Each cropped facialimage is further preprocessed with histogram equalizationto minimize illumination variations We train our ADSNTmodel with 3 hidden layers where the number of hidden

nodes for these layers is empirically set as [1024 rarr 500 rarr120] because our experiments show that three hidden layerscan get a sufficiently good performance (see Section 433)

In order to show the whole experimental process aboutparameters setting we initially use the hyperbolic tangentfunction as the nonlinear activation function and implementADSNT on AR We also choose the face images with neutralexpression frontal pose and normal illumination as galleriesand randomly select half the number of images from the restof the images of each person as probe images The remainingimages compose the testing setThemean identification ratesare recorded

Firstly we empirically set the parameter 120576 = 0001 andsparsity target 1205880 = 005 and fix the parameters 120582 = 05and 120593 = 01 in ADSNT to check the effect of 120574 on theidentification rate As illustrated in Figure 6(a) where 120574 =008 ADSNT recognition method gets the best performance

8 Mathematical Problems in Engineering

(a)

(b)

(c)

Figure 5 A fraction of samples from AR PubFig and Extended Yale B face databases (a) AR (b) Extended Yale B and (c) PubFig

Then according to Figure 6(a) we fix the parameters 120574 = 008and 120582 = 05 in ADSNT to check the influence of 120593 Asshowed in Figure 6(b) when 120593 = 06 our method achievesthe best recognition rate At last we fix 120574 = 008 and 120593 = 06and the recognition rates are illustrated in Figure 6(c) withdifferent value of 120582 When 120582 = 3 the recognition rate is thehighest From the plot in Figure 6 one can observe that theparameters 120582 120593 and 120574 cannot be too large or too small If120582 is too large the ADSNT would be less discriminative ofdifferent subjects because it implements too strong similaritypreservation entry But if 120582 is too small it will degrade therecognition performance and the significance of similaritypreservation entry Similarly 120574 can also not be too large orthe hidden neurons will not be activated for a given input

and low recognition rate will be achieved If 120574 is too smallwe can get poor performance For the weight decay 120593 if itis too small the values of weights for all hidden units willchange very slightly On the contrary the values of weightswill change greatly

Using above those experiments we gain the optimalparameter values used in ADSNT as 120582 = 3 120593 = 06 and120574 = 008 on AR database The similar experiments also havebeen performed on Extended Yale B and PubFig databasesWe can get the parameters setting as 120582 = 26 120593 = 05 and120574 = 006 on Extended Yale B database and 120582 = 28 120593 = 052and 120574 = 009 on PubFig database

In the experiments we use two measures includingthe mean identification accuracy 120583 with standard deviation

Mathematical Problems in Engineering 9

Iden

tifica

tion

rate

()

120582 = 05120593 = 01

76

80

84

88

92

01

011

005

006

007

008

009

012

013

014

000

5

000

1

000

01

The parameter 120574

(a)

Iden

tifica

tion

rate

()

95

90

85

80

75

120582 = 05

1 2

08

07

010

02

03

04

05

06

09

22

004

002

008

000

9

000

6

The parameter 120593

120574 = 008

(b)

Iden

tifica

tion

rate

()

The parameter 120582

120593 = 06120574 = 008

80

85

90

55

50

45

40

35

3025

20

15

10

05

00

60

(c)

Figure 6 Parameters setting

](120583 plusmn ]) and the receiving operating characteristic (ROC)curves to validate the effectiveness of our method as well asother methods

43 Experimental Results and Analysis

431 Comparison with Different Methods In the followingexperiments on the three databases we compare the pro-posed approach with several recently proposed methodsThese compared methods include DAE with 10 randommask noises [12] marginalized DAE (MDAE) [17] Constrac-tive Autoencoders (CAE) [15] Deep Lambertian Networks(DLN) [31] stacked supervised autoencoder (SSAE) [9] ICA-Reconstruction (RICA) [32] and Template Deep Recon-struction Model (TDRM) [20] We use the implementationof these algorithms that are provided by the respectiveauthors For all the compared approaches we use the defaultparameters that are recommended in the correspondingpapers

The mean identification accuracy with standard devia-tions of different approaches on three databases is shownin Table 1 The ROC curves of different approaches areillustrated in Figure 7 The results imply that our approach

Table 1 Comparisons of the average identification accuracyand standard deviation () of different approaches on differentdatabases

Method AR Extended Yale B PubFigDAE [12] 5756 plusmn 02 6345 plusmn 13 6133 plusmn 15MDAE [17] 6780 plusmn 13 7156 plusmn 16 7055 plusmn 25CAE [15] 4950 plusmn 21 5572 plusmn 08 6856 plusmn 16DLN [31] NA 8150 plusmn 14 7760 plusmn 14SSAE [9] 8521 plusmn 07 8222 plusmn 03 8404 plusmn 12RICA [32] 7633 plusmn 17 7044 plusmn 13 7235 plusmn 15TDRM [20] 8770 plusmn 06 8642 plusmn 12 8990 plusmn 09Our method 9232 plusmn 07 9366 plusmn 04 9126 plusmn 16significantly outperforms other methods and gets the bestmean recognition rates for the same setting of trainingand testing sets Compared to those unsupervised deeplearning methods such as DAE MDAE CAE DLN andTDRM the improvement of our method is over 30 onExtended Yale B and AR databases where there is a littlepose variance On the PubFig database our approach canalso achieve the mean identification rate of 9126 plusmn 16

10 Mathematical Problems in Engineering

True

pos

itive

rate

04

05

06

07

08

09

10

02 03 04 05 06 07 08 09 1001

False positive rate

DAEMDAECAEDLN

SSAERICATDRMOur method

(a)

01 02 03 04 05 06 07 08 09 1000

False positive rate

07

08

09

10

True

pos

itive

rate

DAEMDAECAESSAE

RICATDRMOur method

(b)

050

055

060

065

070

075

080

085

090

095

100

True

pos

itive

rate

02 03 04 05 06 07 08 09 1001

False positive rate

DAEMDAECAEDLN

SSAERICATDRMOur method

(c)

Figure 7 Comparisons of ROC curves between our method and other methods on different databases (a) AR (b) Extended Yale B and (c)PubFig

and outperforms all compared methods The reason is thatourmethod can extract discriminative robust information tovariances (expression illumination pose etc) in the learneddeep networks Compared with a supervised method likeRICA the proposed method can improve over 16 19and 23 on AR PubFig and Extended Yale B databasesrespectively Our method is a deep learning method whichfocuses on the nonlinear classification problem with learninga nonlinear mapping such that more nonlinear discriminantinformation may be explored to enhance the identificationperformance Compared with SSAE method that is designedfor removing the variances such as illumination pose andpartial occlusion our method can still be better over 6

because of using the weight penalty terms GRBM to initializeweights and three layersrsquo similarity preservation term

432 Convergence Analysis In this subsection we evaluatedthe convergence of our ADSNT versus a different numberof iterations Figure 8 illustrates the value of the objectivefunction of ADSNT versus a different number of iterationson the AR PubFig and Extended Yale B databases FromFigure 8(a) one can observe that ADSNT converges in about55 28 and 70 iterations on the three databases respectively

We also implement the identification accuracy of ADSNTversus a different number of iterations on the AR PubFigand Extended Yale B databases Figure 8(b) plots the mean

Mathematical Problems in Engineering 11

Obj

ectiv

e fun

ctio

n va

lue

4

3

2

1

0

Iteration number0 30 60 90 120 150

ARExtended Yale BPubFig

(a)

ARExtended Yale BPubFig

Mea

n id

entifi

catio

n ra

te (

)

100

80

60

40

Iteration number20 40 60 80

(b)

Figure 8 Convergence analysis (a) Convergence curves of ADSNT on AR PubFig and Extended Yale B (b) Mean identification rate ()versus iterations of ADSNT on AR PubFig and Extended Yale B

Iden

tifica

tion

accu

racy

()

AR Extended Yale B PubFig0

10

20

30

40

50

60

70

80

90

100

Layer 1

Layer 2

Layer 3

Layer 4

Figure 9 The results of ADSNT with different network depth onthe different datasets

identification rate of ADSNT From Figure 8(b) one can alsoobserve that ADSNT achieves stable performance after about55 70 and 28 iterations on AR PubFig and Extended Yale Bdatabases respectively

433 The Effect of Network Depth In this subsection weconduct experiments on the three face datasets with differenthidden layer of our proposedADSNTnetworkThe proposedmethod achieves an identification rate of 923 plusmn 06 933 plusmn12 and 9122 plusmn 08 by three-hidden layer ADSNTnetwork that is 1024 rarr 500 rarr 120 respectively on ARExtended Yale B and PubFig datasets Figure 9 illustrates the

performance of different layer ADSNT One can observe thatthree-hidden layer network outperforms 2-layer networkand the result of 3-layer ADSNT network is very nearly equalto those of 4-layer network on AR and Extended Yale Bdatabases We also observe that the performance of 4-layernetwork is a bit lower than that of 3-layer network on thePubFig database In addition the deeper ADSNT networkis the more complex its computational complexity becomesTherefore the 3-layer network depth is a good trade-offbetween performance and computational complexity

434 Activation Function Following the work in [9] we alsoestimate the performance of ADSNTwith different activationfunctions such as sigmoid hyperbolic tangent and rectifiedlinear unit (ReLU) [33] which is defined as 119891(119909) = max(0 119909)When the sigmoid 119891(119909) = 1(1 + 119890minus119909) is used as activationfunction the objective function (see (6)) is rewritten asfollows

argmin120579ADSNT

119869= 1119872 sum

119894

1003817100381710038171003817119909119894 minus 11990911989410038171003817100381710038172 + 120582120579119882119872 sum119894

1003817100381710038171003817119891 (119909119894) minus 119891 (119909119894)10038171003817100381710038172

+ 1205932 ( 3sum119895

10038171003817100381710038171003817119882(119894)119890 100381710038171003817100381710038172119865 +3sum119895

10038171003817100381710038171003817119882(119894)119889 100381710038171003817100381710038172119865)

+ 120574( 3sum119894

KL (120588119909 || 1205880) + 5sum119894

KL (120588 || 1205880))

(9)

where 120588119909 = (1119872)sum119894 119891(119909119894) 120588 = (1119872)sum119894 119891(119909119894)

12 Mathematical Problems in Engineering

Table 2 Comparisons of the ADSNT algorithm with differentactivation functions on the AR PubFig and Extended Yale Bdatabases

Dataset Sigmoid Tanh ReLUAR 8866 plusmn 14 9232 plusmn 07 9322 plusmn 15Extended Yale B 9055 plusmn 06 9366 plusmn 04 9454 plusmn 03PubFig 8740 plusmn 12 9126 plusmn 16 9244 plusmn 11

If ReLU is adopted as activation function (6) is formu-lated as

argmin120579ADSNT

119869= 1119872 sum

119894

1003817100381710038171003817119909119894 minus 11990911989410038171003817100381710038172 + 120582120579119882119872 sum119894

1003817100381710038171003817119891 (119909119894) minus 119891 (119909119894)10038171003817100381710038172

+ 1205932 ( 3sum119895

10038171003817100381710038171003817119882(119894)119890 100381710038171003817100381710038172119865 +3sum119895

10038171003817100381710038171003817119882(119894)119889 100381710038171003817100381710038172119865)

+ 120574( 3sum119894

1003817100381710038171003817119891 (119909119894)10038171003817100381710038171 + 5sum119894

1003817100381710038171003817119891 (119909119894)10038171003817100381710038171)

(10)

Table 2 shows the performance of the proposed ADSNTbased on different activation functions conducted on thethree databases FromTable 2 one can see that ReLU achievesthe best performanceThe key reason is that we use theweightdecay term 120593 to optimize the objective function

435 Timing Consumption Analysis In this subsection weuse a HP Z620 workstation with Intel Xeon E5-2609 24GHzCPU 8G RAM and conduct a series of experiments onAR database to compare the time consumption of differentmethods which are tabulated in Table 3 The training time(seconds) is shown in Table 3(a) while the time (seconds)needed to recognize a face from the testing set is shownin Table 3(b) From Table 3 one can see that the proposedmethod requires comparatively more time for trainingbecause of initialization of ADSNT and performing imagereconstruction However the procedure of training is offlineWhen we identity an image from testing set our methodrequires less time than other methods

5 Conclusions

In this article we present an adaptive deep supervisedautoencoder based image reconstruction method for facerecognition Unlike conventional deep autoencoder basedface recognition method our method considers the classlabel information from training samples in the deep learn-ing procedure and can automatically discover the under-lying nonlinear manifold structures Specifically a multi-layer supervised adaptive network structure is presentedwhich is trained to extract characteristic features from cor-ruptedclean facial images and reconstruct the correspondingsimilar facial images The reconstruction is realized by a so-called ldquobottleneckrdquo neural network that learns to map face

Table 3 (a) Training time (seconds) for different methods (b)Testing time (seconds) for different methodsThe proposed methodcosts the least amount of testing time comparing with othermethods

(a)

Methods TimeDAE [12] 861MDAE [17] 1054CAE [15] 786DLN [31] 2343SSAE [9] 5351RICA [32] 1344TDRM [20] 1102Our method 12232

(b)

Methods TimeDAE [12] 027MDAE [17] 03CAE [15] 026DLN [31] 035SSAE [9] 022RICA [32] 019TDRM [20] 018Our method 013

images into a low-dimensional vector and to reconstructthe respective corresponding face images from the mappingvectors Having trained the ADSNT a new face image canthen be recognized by comparing its reconstruction imagewith individual gallery images during testing The proposedmethod has been evaluated on the widely used AR PubFigand Extended Yale B databases and the experimental resultshave shown its effectiveness For future work we are focusingon applying our proposed method to other application fieldssuch as pattern classification based on image set and actionrecognition based on the video to further demonstrate itsvalidity

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This paper is partially supported by the research grant for theNatural Science Foundation from Sichuan Provincial Depart-ment of Education (Grant no 13ZB0336) and the NationalNatural Science Foundation of China (Grant no 61502059)

References

[1] J-T Chien and C-CWu ldquoDiscriminant waveletfaces and near-est feature classifiers for face recognitionrdquo IEEE Transactions on

Mathematical Problems in Engineering 13

Pattern Analysis and Machine Intelligence vol 24 no 12 pp1644ndash1649 2002

[2] Z Lei M Pietikainen and S Z Li ldquoLearning discriminant facedescriptorrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 36 no 2 pp 289ndash302 2014

[3] B Moghaddam T Jebara and A Pentland ldquoBayesian facerecognitionrdquo Pattern Recognition vol 33 no 11 pp 1771ndash17822000

[4] N Cristianini and J S Taylor An Introduction to SupportVector Machines and Other Kernel-based Learning MethodsCambridge University Press New York NY USA 2004

[5] R O Duda P E Hart and D G Stork Pattern ClassificationJohn Wiley amp Sons New York NY USA 2nd edition 2001

[6] W Zhao R Chellappa P J Phillips and A Rosenfeld ldquoFacerecognition a literature surveyrdquo ACM Computing Surveys vol35 no 4 pp 399ndash458 2003

[7] B Zhang S Shan X Chen and W Gao ldquoHistogram of Gaborphase patterns (HGPP) a novel object representation approachfor face recognitionrdquo IEEE Transactions on Image Processingvol 16 no 1 pp 57ndash68 2007

[8] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inNeural Information Processing Systems pp 1527ndash1554 2012

[9] S Gao Y Zhang K Jia J Lu and Y Zhang ldquoSingle sample facerecognition via learning deep supervised autoencodersrdquo IEEETransactions on Information Forensics and Security vol 10 no10 pp 2108ndash2118 2015

[10] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo American Associationfor the Advancement of Science Science vol 313 no 5786 pp504ndash507 2006

[11] Y Bengio ldquoPractical recommendations for gradient-basedtraining of deep architecturesmrdquo in Neural Networks Tricks ofthe Trade pp 437ndash478 Springer Berlin Germany 2012

[12] P Vincent H Larochelle I Lajoie Y Bengio and P-AManzagol ldquoStacked denoising autoencoders learning usefulrepresentations in a deep network with a local denoisingcriterionrdquo Journal of Machine Learning Research (JMLR) vol 11no 5 pp 3371ndash3408 2010

[13] V Nair and G E Hinton ldquoRectified linear units improveRestricted Boltzmann machinesrdquo in Proceedings of the 27thInternational Conference on Machine Learning (ICML rsquo10) pp807ndash814 Haifa Israel June 2010

[14] A Coates H Lee and A Y Ng ldquoAn analysis of single-layernetworks in unsupervised feature learningrdquo in Proceedings ofthe 14th International Conference on Artificial Intelligence andStatistics (AISTATS rsquo11) pp 215ndash223 Sardinia Italy 2010

[15] S Rifai P Vincent X Muller X Glorot and Y BengioldquoContractive auto-encoders explicit invariance during featureextractionrdquo in Proceedings of the 28th International Conferenceon Machine Learning (ICML rsquo11) pp 833ndash840 Bellevue WashUSA July 2011

[16] K Simonyan A Vedaldi and A Zisserman ldquoDeep fishernetworks for large-scale image classificationrdquo in Proceedings ofthe 27th Annual Conference on Neural Information ProcessingSystems (NIPS rsquo13) pp 163ndash171 Lake Tahoe Nev USA Decem-ber 2013

[17] M Chen Z Xu K Q Weinberger and F Sha ldquoMarginalizeddenoising autoencoders for domain adaptationrdquo in Proceedingsof the 29th International Conference onMachine Learning (ICMLrsquo12) pp 767ndash774 Edinburgh UK July 2012

[18] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquo inProceedings of the 27th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 June 2014

[19] Z Zhu P Luo X Wang and X Tang ldquoDeep learning identity-preserving face spacerdquo in Proceedings of the 14th IEEE Interna-tional Conference on Computer Vision (ICCV rsquo13) pp 113ndash120Sydney Australia December 2013

[20] M Hayat M Bennamoun and S An ldquoDeep reconstructionmodels for image set classificationrdquo IEEE Transactions onPattern Analysis andMachine Intelligence vol 37 no 4 pp 713ndash727 2015

[21] Y Sun X Wang and X Tang ldquoDeep learning face representa-tion from predicting 10000 classesrdquo in Proceedings of the 27thIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo14) pp 1891ndash1898 Columbus Ohio USA June 2014

[22] Y Sun X Wang and X Tang ldquoDeep learning face rep-resentation by joint identification-verificationrdquo Tech Rephttpsarxivorgabs14064773

[23] X Cai CWang B Xiao X Chen and J Zhou ldquoDeep nonlinearmetric learning with independent subspace analysis for faceverificationrdquo in Proceedings of the 20th ACM InternationalConference on Multimedia (MM rsquo12) pp 749ndash752 November2012

[24] G E Hinton S Osindero and Y-W Teh ldquoA fast learningalgorithm for deep belief netsrdquoNeural Computation vol 18 no7 pp 1527ndash1554 2006

[25] Q V Le J Ngiam A Coates A Lahiri B Prochnow andA Y Ng ldquoOn optimization methods for deep learningrdquo inProceedings of the 28th International Conference on MachineLearning (ICML rsquo11) pp 265ndash272 Bellevue Wash USA July2011

[26] C Zhou X Wei Q Zhang and X Fang ldquoFisherrsquos lineardiscriminant (FLD) and support vectormachine (SVM) in non-negative matrix factorization (NMF) residual space for facerecognitionrdquoOptica Applicata vol 40 no 3 pp 693ndash704 2010

[27] A Martinez and R Benavente ldquoThe AR face databaserdquo CVCTech Rep 24 1998

[28] A S Georghiades P N Belhumeur and D J Kriegman ldquoFromfew to many illumination cone models for face recognitionunder variable lighting and poserdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 23 no 6 pp 643ndash6602001

[29] N Kumar A C Berg P N Belhumeur and S K NayarldquoAttribute and simile classifiers for face verificationrdquo in Proceed-ings of the 12th International Conference on Computer Vision(ICCV rsquo09) pp 365ndash372 Kyoto Japan October 2009

[30] M Rezaei and R Klette ldquoNovel adaptive eye detection andtracking for challenging lighting conditionsrdquo in ComputerVisionmdashACCV 2012 Workshops J-I Park and J Kim Edsvol 7729 of Lecture Notes in Computer Science pp 427ndash440Springer Berlin Germany 2013

[31] Y Tang R Salakhutdinov andGHHinton ldquoDeep Lambertiannetworksrdquo inProceedings of the 29th International Conference onMachine Learning (ICML 2012) pp 1623ndash1630 Edinburgh UKJuly 2012

[32] Q V Le A Karpenko J Ngiam and A Y Ng ldquoICA withreconstruction cost for efficient overcomplete feature learningrdquoin Proceedings of the 25th Annual Conference on Neural Infor-mation Processing Systems (NIPS rsquo11) pp 1017ndash1025 GranadaSpain December 2011

14 Mathematical Problems in Engineering

[33] X Glorot A Bordes and Y Bengio ldquoDeep sparse rectifier neu-ral networksrdquo in Proceedings of the 14th International Conferenceon Artificial Intelligence and Statistics pp 315ndash323 2011



Figure 5: Sample images from the AR, PubFig, and Extended Yale B face databases: (a) AR, (b) Extended Yale B, and (c) PubFig.

Then, according to Figure 6(a), we fix the parameters γ = 0.08 and λ = 0.5 in ADSNT to check the influence of φ. As shown in Figure 6(b), when φ = 0.6 our method achieves the best recognition rate. Finally, we fix γ = 0.08 and φ = 0.6, and the recognition rates for different values of λ are illustrated in Figure 6(c); when λ = 3 the recognition rate is the highest. From the plots in Figure 6 one can observe that the parameters λ, φ, and γ should be neither too large nor too small. If λ is too large, the ADSNT becomes less discriminative between different subjects because it enforces the similarity preservation term too strongly; if λ is too small, the similarity preservation term loses its significance and the recognition performance degrades. Similarly, γ cannot be too large, or the hidden neurons will not be activated for a given input and a low recognition rate will result; if γ is too small, we also obtain poor performance. For the weight decay φ, if it is too small the weights of all hidden units change very slightly; on the contrary, if it is too large the weights change greatly.

Using the above experiments, we obtain the optimal parameter values used in ADSNT as λ = 3, φ = 0.6, and γ = 0.08 on the AR database. Similar experiments have also been performed on the Extended Yale B and PubFig databases, giving λ = 2.6, φ = 0.5, and γ = 0.06 on Extended Yale B, and λ = 2.8, φ = 0.52, and γ = 0.09 on PubFig.
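A parameter sweep of this kind is straightforward to script. The sketch below varies one hyperparameter at a time while the others stay fixed, mirroring the procedure behind Figure 6; the `train_fn` and `eval_fn` callables are assumed stand-ins for ADSNT training and validation-set evaluation, not the authors' code.

```python
def sweep_parameters(grids, fixed, train_fn, eval_fn):
    """Vary one hyperparameter at a time, as in the Figure 6 experiments.

    grids:    dict such as {"lam": [...], "phi": [...], "gamma": [...]}
    fixed:    baseline values for the parameters that are not being varied
    train_fn: callable(**params) -> model       (assumed ADSNT trainer)
    eval_fn:  callable(model) -> accuracy       (assumed identification-rate evaluator)
    """
    best = {}
    for name, candidates in grids.items():
        scores = []
        for value in candidates:
            params = dict(fixed, **{name: value})
            scores.append((eval_fn(train_fn(**params)), value))
        # Keep the (best accuracy, best value) pair for this parameter.
        best[name] = max(scores)
    return best

# Example (values illustrative): sweep lambda with phi and gamma held at the AR settings.
# sweep_parameters({"lam": [0.5, 1, 2, 3, 4]}, {"phi": 0.6, "gamma": 0.08}, train_fn, eval_fn)
```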

Figure 6: Parameter settings. (a) Identification rate (%) versus γ with λ = 0.5 and φ = 0.1; (b) identification rate (%) versus φ with λ = 0.5 and γ = 0.08; (c) identification rate (%) versus λ with φ = 0.6 and γ = 0.08.

In the experiments we use two measures, the mean identification accuracy μ with standard deviation ν (reported as μ ± ν) and the receiver operating characteristic (ROC) curve, to validate the effectiveness of our method as well as the compared methods.
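As a minimal sketch (not the evaluation code used in the paper), the two measures can be computed from the accuracies of repeated train/test splits and from genuine/impostor match scores; numpy is assumed.

```python
import numpy as np

def mean_accuracy(accuracies):
    """Mean identification accuracy and standard deviation over repeated splits."""
    a = np.asarray(accuracies, dtype=float)
    return a.mean(), a.std()

def roc_points(genuine_scores, impostor_scores, thresholds):
    """True/false positive rates at each threshold (higher score means same identity)."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    tpr = np.array([(genuine >= t).mean() for t in thresholds])
    fpr = np.array([(impostor >= t).mean() for t in thresholds])
    return fpr, tpr
```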

4.3. Experimental Results and Analysis

4.3.1. Comparison with Different Methods. In the following experiments on the three databases, we compare the proposed approach with several recently proposed methods: DAE with 10% random mask noise [12], marginalized DAE (MDAE) [17], Contractive Autoencoders (CAE) [15], Deep Lambertian Networks (DLN) [31], the stacked supervised autoencoder (SSAE) [9], ICA with Reconstruction cost (RICA) [32], and the Template Deep Reconstruction Model (TDRM) [20]. We use the implementations of these algorithms provided by the respective authors, and for all compared approaches we use the default parameters recommended in the corresponding papers.
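For reference, masking corruption of the kind used by the DAE baseline can be generated by zeroing a random fraction of the pixels; the snippet below is an illustrative sketch under that assumption, not code from any of the compared implementations.

```python
import numpy as np

def mask_noise(images, fraction=0.10, seed=0):
    """Zero out a random `fraction` of the pixels of each flattened image.

    fraction=0.10 mirrors the 10% random mask noise assumed for the DAE baseline."""
    rng = np.random.default_rng(seed)
    corrupted = images.copy()
    mask = rng.random(images.shape) < fraction
    corrupted[mask] = 0.0
    return corrupted
```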

The mean identification accuracies with standard deviations of the different approaches on the three databases are shown in Table 1, and the corresponding ROC curves are illustrated in Figure 7.

Table 1: Comparison of the average identification accuracy and standard deviation (%) of different approaches on the three databases.

Method        AR            Extended Yale B   PubFig
DAE [12]      57.56 ± 0.2   63.45 ± 1.3       61.33 ± 1.5
MDAE [17]     67.80 ± 1.3   71.56 ± 1.6       70.55 ± 2.5
CAE [15]      49.50 ± 2.1   55.72 ± 0.8       68.56 ± 1.6
DLN [31]      N/A           81.50 ± 1.4       77.60 ± 1.4
SSAE [9]      85.21 ± 0.7   82.22 ± 0.3       84.04 ± 1.2
RICA [32]     76.33 ± 1.7   70.44 ± 1.3       72.35 ± 1.5
TDRM [20]     87.70 ± 0.6   86.42 ± 1.2       89.90 ± 0.9
Our method    92.32 ± 0.7   93.66 ± 0.4       91.26 ± 1.6

The results imply that our approach significantly outperforms the other methods and achieves the best mean recognition rates for the same setting of training and testing sets. Compared with the unsupervised deep learning methods such as DAE, MDAE, CAE, DLN, and TDRM, the improvement of our method is over 3.0% on the Extended Yale B and AR databases, where there is little pose variation. On the PubFig database, our approach also achieves a mean identification rate of 91.26 ± 1.6% and outperforms all compared methods. The reason is that our method extracts information that is discriminative and robust to variations (expression, illumination, pose, etc.) in the learned deep networks. Compared with a supervised method such as RICA, the proposed method improves by over 16%, 19%, and 23% on the AR, PubFig, and Extended Yale B databases, respectively. Our method is a deep learning method that addresses the nonlinear classification problem by learning a nonlinear mapping, so more nonlinear discriminant information can be explored to enhance the identification performance. Compared with the SSAE method, which is designed to remove variations such as illumination, pose, and partial occlusion, our method is still better by over 6% because of the weight penalty terms, the GRBM used to initialize the weights, and the three layers' similarity preservation term.

Figure 7: Comparison of ROC curves (true positive rate versus false positive rate) between our method and the other methods on different databases: (a) AR, (b) Extended Yale B, and (c) PubFig.

4.3.2. Convergence Analysis. In this subsection we evaluate the convergence of our ADSNT with respect to the number of iterations. Figure 8 illustrates the value of the objective function of ADSNT versus the number of iterations on the AR, PubFig, and Extended Yale B databases. From Figure 8(a), one can observe that ADSNT converges in about 55, 28, and 70 iterations on the three databases, respectively.
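A convergence check of this kind can be implemented by monitoring the relative change of the objective value between iterations; the sketch below is generic, and the `step` callable standing in for one ADSNT optimization update is an assumption, not the paper's optimizer.

```python
def train_until_converged(step, max_iters=150, tol=1e-4):
    """Run step() (one optimization iteration returning the objective value)
    until the relative improvement falls below tol or max_iters is reached."""
    history = [step()]
    for _ in range(1, max_iters):
        value = step()
        history.append(value)
        if abs(history[-2] - value) <= tol * max(abs(history[-2]), 1e-12):
            break
    return history
```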

We also evaluate the identification accuracy of ADSNT versus the number of iterations on the AR, PubFig, and Extended Yale B databases. Figure 8(b) plots the mean identification rate of ADSNT; one can observe that ADSNT achieves stable performance after about 55, 70, and 28 iterations on the AR, PubFig, and Extended Yale B databases, respectively.

Figure 8: Convergence analysis. (a) Convergence curves of the ADSNT objective function on AR, PubFig, and Extended Yale B. (b) Mean identification rate (%) versus iterations of ADSNT on AR, PubFig, and Extended Yale B.

Figure 9: Results of ADSNT with different network depths on the different datasets.

4.3.3. The Effect of Network Depth. In this subsection we conduct experiments on the three face datasets with different numbers of hidden layers in the proposed ADSNT network. The proposed method achieves identification rates of 92.3 ± 0.6%, 93.3 ± 1.2%, and 91.22 ± 0.8% with a three-hidden-layer ADSNT network (1024 → 500 → 120) on the AR, Extended Yale B, and PubFig datasets, respectively. Figure 9 illustrates the performance of ADSNT with different numbers of layers. One can observe that the three-hidden-layer network outperforms the 2-layer network, and the result of the 3-layer ADSNT network is very nearly equal to that of the 4-layer network on the AR and Extended Yale B databases. We also observe that the performance of the 4-layer network is slightly lower than that of the 3-layer network on the PubFig database. In addition, the deeper the ADSNT network is, the higher its computational complexity becomes. Therefore, a 3-layer network depth is a good trade-off between performance and computational complexity.
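For concreteness, a three-hidden-layer bottleneck of the kind described (hidden sizes 1024 → 500 → 120, mirrored by a decoder back to the input dimension) could be written as below. This is a plain-numpy sketch of the forward pass under assumed fully connected layers and sigmoid activations; it is not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layer(n_in, n_out, rng):
    """Small random weights plus zero biases for one fully connected layer."""
    return rng.normal(scale=0.01, size=(n_in, n_out)), np.zeros(n_out)

def build_bottleneck(sizes=(1024, 500, 120), d_input=1024, seed=0):
    """Encoder layers following the assumed 1024 -> 500 -> 120 topology,
    mirrored by a decoder that maps the code back to the input dimension."""
    rng = np.random.default_rng(seed)
    dims = [d_input, *sizes]
    encoder = [init_layer(a, b, rng) for a, b in zip(dims[:-1], dims[1:])]
    decoder = [init_layer(b, a, rng) for a, b in zip(dims[:-1], dims[1:])][::-1]
    return encoder, decoder

def forward(x, encoder, decoder):
    """Map an image vector to the low-dimensional code and back to a reconstruction."""
    h = x
    for W, b in encoder:
        h = sigmoid(h @ W + b)
    code = h
    for W, b in decoder:
        h = sigmoid(h @ W + b)
    return code, h
```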

4.3.4. Activation Function. Following the work in [9], we also evaluate the performance of ADSNT with different activation functions, namely the sigmoid, the hyperbolic tangent, and the rectified linear unit (ReLU) [33], which is defined as $f(x) = \max(0, x)$. When the sigmoid $f(x) = 1/(1 + e^{-x})$ is used as the activation function, the objective function (see (6)) is rewritten as follows:

$$
\mathop{\arg\min}_{\theta_{\mathrm{ADSNT}}} J
= \frac{1}{M}\sum_{i}\left\|x_{i}-\hat{x}_{i}\right\|^{2}
+ \frac{\lambda\,\theta_{W}}{M}\sum_{i}\left\|f(x_{i})-f(\tilde{x}_{i})\right\|^{2}
+ \frac{\varphi}{2}\left(\sum_{j=1}^{3}\left\|W_{e}^{(j)}\right\|_{F}^{2}
+ \sum_{j=1}^{3}\left\|W_{d}^{(j)}\right\|_{F}^{2}\right)
+ \gamma\left(\sum_{i=1}^{3}\mathrm{KL}\left(\rho_{x}\,\middle\|\,\rho_{0}\right)
+ \sum_{i=1}^{5}\mathrm{KL}\left(\tilde{\rho}\,\middle\|\,\rho_{0}\right)\right),
\tag{9}
$$

where $\rho_{x}=(1/M)\sum_{i}f(x_{i})$ and $\tilde{\rho}=(1/M)\sum_{i}f(\tilde{x}_{i})$.

Table 2: Comparison of the ADSNT algorithm with different activation functions on the AR, PubFig, and Extended Yale B databases (identification accuracy, %).

Dataset           Sigmoid        Tanh           ReLU
AR                88.66 ± 1.4    92.32 ± 0.7    93.22 ± 1.5
Extended Yale B   90.55 ± 0.6    93.66 ± 0.4    94.54 ± 0.3
PubFig            87.40 ± 1.2    91.26 ± 1.6    92.44 ± 1.1

If the ReLU is adopted as the activation function, (6) is formulated as

$$
\mathop{\arg\min}_{\theta_{\mathrm{ADSNT}}} J
= \frac{1}{M}\sum_{i}\left\|x_{i}-\hat{x}_{i}\right\|^{2}
+ \frac{\lambda\,\theta_{W}}{M}\sum_{i}\left\|f(x_{i})-f(\tilde{x}_{i})\right\|^{2}
+ \frac{\varphi}{2}\left(\sum_{j=1}^{3}\left\|W_{e}^{(j)}\right\|_{F}^{2}
+ \sum_{j=1}^{3}\left\|W_{d}^{(j)}\right\|_{F}^{2}\right)
+ \gamma\left(\sum_{i=1}^{3}\left\|f(x_{i})\right\|_{1}
+ \sum_{i=1}^{5}\left\|f(\tilde{x}_{i})\right\|_{1}\right).
\tag{10}
$$

Table 2 shows the performance of the proposed ADSNT with different activation functions on the three databases. From Table 2, one can see that ReLU achieves the best performance. The key reason is that we use the weight decay term φ to optimize the objective function.
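To make the difference between (9) and (10) concrete, the two sparsity penalties can be sketched as follows (numpy assumed; ρ0 is the target activation level and the helpers are illustrative, not the paper's code): the sigmoid variant penalizes the KL divergence between the mean hidden activation and ρ0, while the ReLU variant penalizes the L1 norm of the activations.

```python
import numpy as np

def kl_sparsity(mean_activation, rho0=0.05, eps=1e-8):
    """KL(rho || rho0) for Bernoulli activations, summed over hidden units, cf. (9)."""
    rho = np.clip(mean_activation, eps, 1.0 - eps)
    return np.sum(rho * np.log(rho / rho0) + (1 - rho) * np.log((1 - rho) / (1 - rho0)))

def l1_sparsity(activations):
    """L1 penalty on hidden activations, cf. the ReLU objective (10)."""
    return np.sum(np.abs(activations))

# Example: activations of one hidden layer for a mini-batch of M inputs (shape M x units).
H = np.random.default_rng(0).random((64, 120))
penalty_sigmoid = kl_sparsity(H.mean(axis=0))   # uses the mean activation per unit
penalty_relu = l1_sparsity(H)
```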

4.3.5. Timing Consumption Analysis. In this subsection we use an HP Z620 workstation with an Intel Xeon E5-2609 2.4 GHz CPU and 8 GB of RAM and conduct a series of experiments on the AR database to compare the time consumption of the different methods, which is tabulated in Table 3. The training time (seconds) is shown in Table 3(a), while the time (seconds) needed to recognize a face from the testing set is shown in Table 3(b). From Table 3, one can see that the proposed method requires comparatively more time for training because of the initialization of ADSNT and the image reconstruction. However, training is performed offline; when identifying an image from the testing set, our method requires less time than the other methods.

Table 3: (a) Training time (seconds) for different methods. (b) Testing time (seconds) for different methods. The proposed method costs the least testing time compared with the other methods.

(a)
Method        Training time (s)
DAE [12]      861
MDAE [17]     1054
CAE [15]      786
DLN [31]      2343
SSAE [9]      5351
RICA [32]     1344
TDRM [20]     1102
Our method    12232

(b)
Method        Testing time (s)
DAE [12]      0.27
MDAE [17]     0.3
CAE [15]      0.26
DLN [31]      0.35
SSAE [9]      0.22
RICA [32]     0.19
TDRM [20]     0.18
Our method    0.13
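As a usage note (illustrative only), per-probe testing time of the kind reported in Table 3(b) can be measured by timing the recognition call over the probe set; the `recognize` callable below is a stand-in for the trained pipeline.

```python
import time

def average_probe_time(recognize, probes):
    """Mean wall-clock seconds needed to recognize one probe image."""
    start = time.perf_counter()
    for probe in probes:
        recognize(probe)
    return (time.perf_counter() - start) / len(probes)
```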

5. Conclusions

In this article we present an adaptive deep supervised autoencoder based image reconstruction method for face recognition. Unlike conventional deep autoencoder based face recognition methods, our method considers the class label information from the training samples in the deep learning procedure and can automatically discover the underlying nonlinear manifold structures. Specifically, a multilayer supervised adaptive network structure is presented, which is trained to extract characteristic features from corrupted/clean facial images and to reconstruct the corresponding similar facial images. The reconstruction is realized by a so-called "bottleneck" neural network that learns to map face images into a low-dimensional vector and to reconstruct the respective corresponding face images from the mapping vectors. Having trained the ADSNT, a new face image can then be recognized by comparing its reconstruction image with the individual gallery images during testing. The proposed method has been evaluated on the widely used AR, PubFig, and Extended Yale B databases, and the experimental results have shown its effectiveness. For future work, we will focus on applying the proposed method to other application fields, such as pattern classification based on image sets and action recognition based on video, to further demonstrate its validity.
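A minimal sketch of this recognition-by-reconstruction step is given below, assuming images are flattened numpy vectors and `reconstruct` stands in for the trained ADSNT mapping; it is illustrative rather than the authors' code.

```python
import numpy as np

def recognize(probe, gallery, reconstruct):
    """Return the label of the gallery image closest to the probe's reconstruction.

    gallery: iterable of (label, image_vector); reconstruct: assumed ADSNT mapping."""
    rec = reconstruct(probe)
    distances = [(label, np.linalg.norm(rec - img)) for label, img in gallery]
    return min(distances, key=lambda t: t[1])[0]
```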

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This paper is partially supported by the research grant for the Natural Science Foundation from Sichuan Provincial Department of Education (Grant no. 13ZB0336) and the National Natural Science Foundation of China (Grant no. 61502059).

References

[1] J.-T. Chien and C.-C. Wu, "Discriminant waveletfaces and nearest feature classifiers for face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1644–1649, 2002.
[2] Z. Lei, M. Pietikainen, and S. Z. Li, "Learning discriminant face descriptor," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 2, pp. 289–302, 2014.
[3] B. Moghaddam, T. Jebara, and A. Pentland, "Bayesian face recognition," Pattern Recognition, vol. 33, no. 11, pp. 1771–1782, 2000.
[4] N. Cristianini and J. S. Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, New York, NY, USA, 2004.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, New York, NY, USA, 2nd edition, 2001.
[6] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face recognition: a literature survey," ACM Computing Surveys, vol. 35, no. 4, pp. 399–458, 2003.
[7] B. Zhang, S. Shan, X. Chen, and W. Gao, "Histogram of Gabor phase patterns (HGPP): a novel object representation approach for face recognition," IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 57–68, 2007.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Neural Information Processing Systems, pp. 1527–1554, 2012.
[9] S. Gao, Y. Zhang, K. Jia, J. Lu, and Y. Zhang, "Single sample face recognition via learning deep supervised autoencoders," IEEE Transactions on Information Forensics and Security, vol. 10, no. 10, pp. 2108–2118, 2015.
[10] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[11] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, pp. 437–478, Springer, Berlin, Germany, 2012.
[12] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, no. 5, pp. 3371–3408, 2010.
[13] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 807–814, Haifa, Israel, June 2010.
[14] A. Coates, H. Lee, and A. Y. Ng, "An analysis of single-layer networks in unsupervised feature learning," in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS '11), pp. 215–223, Sardinia, Italy, 2010.
[15] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, "Contractive auto-encoders: explicit invariance during feature extraction," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 833–840, Bellevue, Wash, USA, July 2011.
[16] K. Simonyan, A. Vedaldi, and A. Zisserman, "Deep Fisher networks for large-scale image classification," in Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS '13), pp. 163–171, Lake Tahoe, Nev, USA, December 2013.
[17] M. Chen, Z. Xu, K. Q. Weinberger, and F. Sha, "Marginalized denoising autoencoders for domain adaptation," in Proceedings of the 29th International Conference on Machine Learning (ICML '12), pp. 767–774, Edinburgh, UK, July 2012.
[18] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "DeepFace: closing the gap to human-level performance in face verification," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1701–1708, June 2014.
[19] Z. Zhu, P. Luo, X. Wang, and X. Tang, "Deep learning identity-preserving face space," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 113–120, Sydney, Australia, December 2013.
[20] M. Hayat, M. Bennamoun, and S. An, "Deep reconstruction models for image set classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 4, pp. 713–727, 2015.
[21] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation from predicting 10,000 classes," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1891–1898, Columbus, Ohio, USA, June 2014.
[22] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation by joint identification-verification," Tech. Rep., https://arxiv.org/abs/1406.4773.
[23] X. Cai, C. Wang, B. Xiao, X. Chen, and J. Zhou, "Deep nonlinear metric learning with independent subspace analysis for face verification," in Proceedings of the 20th ACM International Conference on Multimedia (MM '12), pp. 749–752, November 2012.
[24] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[25] Q. V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, and A. Y. Ng, "On optimization methods for deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 265–272, Bellevue, Wash, USA, July 2011.
[26] C. Zhou, X. Wei, Q. Zhang, and X. Fang, "Fisher's linear discriminant (FLD) and support vector machine (SVM) in non-negative matrix factorization (NMF) residual space for face recognition," Optica Applicata, vol. 40, no. 3, pp. 693–704, 2010.
[27] A. Martinez and R. Benavente, "The AR face database," CVC Tech. Rep. 24, 1998.
[28] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From few to many: illumination cone models for face recognition under variable lighting and pose," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643–660, 2001.
[29] N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, "Attribute and simile classifiers for face verification," in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 365–372, Kyoto, Japan, October 2009.
[30] M. Rezaei and R. Klette, "Novel adaptive eye detection and tracking for challenging lighting conditions," in Computer Vision - ACCV 2012 Workshops, J.-I. Park and J. Kim, Eds., vol. 7729 of Lecture Notes in Computer Science, pp. 427–440, Springer, Berlin, Germany, 2013.
[31] Y. Tang, R. Salakhutdinov, and G. E. Hinton, "Deep Lambertian networks," in Proceedings of the 29th International Conference on Machine Learning (ICML '12), pp. 1623–1630, Edinburgh, UK, July 2012.
[32] Q. V. Le, A. Karpenko, J. Ngiam, and A. Y. Ng, "ICA with reconstruction cost for efficient overcomplete feature learning," in Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS '11), pp. 1017–1025, Granada, Spain, December 2011.
[33] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, pp. 315–323, 2011.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: Research Article Adaptive Deep Supervised Autoencoder ...Adaptive Deep Supervised Network Template (ADSNT). the deep network perform well, similar to [], we need to giveitinitializationweights.en,thepreinitializedADSNT

Mathematical Problems in Engineering 9

Iden

tifica

tion

rate

()

120582 = 05120593 = 01

76

80

84

88

92

01

011

005

006

007

008

009

012

013

014

000

5

000

1

000

01

The parameter 120574

(a)

Iden

tifica

tion

rate

()

95

90

85

80

75

120582 = 05

1 2

08

07

010

02

03

04

05

06

09

22

004

002

008

000

9

000

6

The parameter 120593

120574 = 008

(b)

Iden

tifica

tion

rate

()

The parameter 120582

120593 = 06120574 = 008

80

85

90

55

50

45

40

35

3025

20

15

10

05

00

60

(c)

Figure 6 Parameters setting

](120583 plusmn ]) and the receiving operating characteristic (ROC)curves to validate the effectiveness of our method as well asother methods

43 Experimental Results and Analysis

431 Comparison with Different Methods In the followingexperiments on the three databases we compare the pro-posed approach with several recently proposed methodsThese compared methods include DAE with 10 randommask noises [12] marginalized DAE (MDAE) [17] Constrac-tive Autoencoders (CAE) [15] Deep Lambertian Networks(DLN) [31] stacked supervised autoencoder (SSAE) [9] ICA-Reconstruction (RICA) [32] and Template Deep Recon-struction Model (TDRM) [20] We use the implementationof these algorithms that are provided by the respectiveauthors For all the compared approaches we use the defaultparameters that are recommended in the correspondingpapers

The mean identification accuracy with standard devia-tions of different approaches on three databases is shownin Table 1 The ROC curves of different approaches areillustrated in Figure 7 The results imply that our approach

Table 1 Comparisons of the average identification accuracyand standard deviation () of different approaches on differentdatabases

Method AR Extended Yale B PubFigDAE [12] 5756 plusmn 02 6345 plusmn 13 6133 plusmn 15MDAE [17] 6780 plusmn 13 7156 plusmn 16 7055 plusmn 25CAE [15] 4950 plusmn 21 5572 plusmn 08 6856 plusmn 16DLN [31] NA 8150 plusmn 14 7760 plusmn 14SSAE [9] 8521 plusmn 07 8222 plusmn 03 8404 plusmn 12RICA [32] 7633 plusmn 17 7044 plusmn 13 7235 plusmn 15TDRM [20] 8770 plusmn 06 8642 plusmn 12 8990 plusmn 09Our method 9232 plusmn 07 9366 plusmn 04 9126 plusmn 16significantly outperforms other methods and gets the bestmean recognition rates for the same setting of trainingand testing sets Compared to those unsupervised deeplearning methods such as DAE MDAE CAE DLN andTDRM the improvement of our method is over 30 onExtended Yale B and AR databases where there is a littlepose variance On the PubFig database our approach canalso achieve the mean identification rate of 9126 plusmn 16

10 Mathematical Problems in Engineering

True

pos

itive

rate

04

05

06

07

08

09

10

02 03 04 05 06 07 08 09 1001

False positive rate

DAEMDAECAEDLN

SSAERICATDRMOur method

(a)

01 02 03 04 05 06 07 08 09 1000

False positive rate

07

08

09

10

True

pos

itive

rate

DAEMDAECAESSAE

RICATDRMOur method

(b)

050

055

060

065

070

075

080

085

090

095

100

True

pos

itive

rate

02 03 04 05 06 07 08 09 1001

False positive rate

DAEMDAECAEDLN

SSAERICATDRMOur method

(c)

Figure 7 Comparisons of ROC curves between our method and other methods on different databases (a) AR (b) Extended Yale B and (c)PubFig

and outperforms all compared methods The reason is thatourmethod can extract discriminative robust information tovariances (expression illumination pose etc) in the learneddeep networks Compared with a supervised method likeRICA the proposed method can improve over 16 19and 23 on AR PubFig and Extended Yale B databasesrespectively Our method is a deep learning method whichfocuses on the nonlinear classification problem with learninga nonlinear mapping such that more nonlinear discriminantinformation may be explored to enhance the identificationperformance Compared with SSAE method that is designedfor removing the variances such as illumination pose andpartial occlusion our method can still be better over 6

because of using the weight penalty terms GRBM to initializeweights and three layersrsquo similarity preservation term

432 Convergence Analysis In this subsection we evaluatedthe convergence of our ADSNT versus a different numberof iterations Figure 8 illustrates the value of the objectivefunction of ADSNT versus a different number of iterationson the AR PubFig and Extended Yale B databases FromFigure 8(a) one can observe that ADSNT converges in about55 28 and 70 iterations on the three databases respectively

We also implement the identification accuracy of ADSNTversus a different number of iterations on the AR PubFigand Extended Yale B databases Figure 8(b) plots the mean

Mathematical Problems in Engineering 11

Obj

ectiv

e fun

ctio

n va

lue

4

3

2

1

0

Iteration number0 30 60 90 120 150

ARExtended Yale BPubFig

(a)

ARExtended Yale BPubFig

Mea

n id

entifi

catio

n ra

te (

)

100

80

60

40

Iteration number20 40 60 80

(b)

Figure 8 Convergence analysis (a) Convergence curves of ADSNT on AR PubFig and Extended Yale B (b) Mean identification rate ()versus iterations of ADSNT on AR PubFig and Extended Yale B

Iden

tifica

tion

accu

racy

()

AR Extended Yale B PubFig0

10

20

30

40

50

60

70

80

90

100

Layer 1

Layer 2

Layer 3

Layer 4

Figure 9 The results of ADSNT with different network depth onthe different datasets

identification rate of ADSNT From Figure 8(b) one can alsoobserve that ADSNT achieves stable performance after about55 70 and 28 iterations on AR PubFig and Extended Yale Bdatabases respectively

433 The Effect of Network Depth In this subsection weconduct experiments on the three face datasets with differenthidden layer of our proposedADSNTnetworkThe proposedmethod achieves an identification rate of 923 plusmn 06 933 plusmn12 and 9122 plusmn 08 by three-hidden layer ADSNTnetwork that is 1024 rarr 500 rarr 120 respectively on ARExtended Yale B and PubFig datasets Figure 9 illustrates the

performance of different layer ADSNT One can observe thatthree-hidden layer network outperforms 2-layer networkand the result of 3-layer ADSNT network is very nearly equalto those of 4-layer network on AR and Extended Yale Bdatabases We also observe that the performance of 4-layernetwork is a bit lower than that of 3-layer network on thePubFig database In addition the deeper ADSNT networkis the more complex its computational complexity becomesTherefore the 3-layer network depth is a good trade-offbetween performance and computational complexity

434 Activation Function Following the work in [9] we alsoestimate the performance of ADSNTwith different activationfunctions such as sigmoid hyperbolic tangent and rectifiedlinear unit (ReLU) [33] which is defined as 119891(119909) = max(0 119909)When the sigmoid 119891(119909) = 1(1 + 119890minus119909) is used as activationfunction the objective function (see (6)) is rewritten asfollows

argmin120579ADSNT

119869= 1119872 sum

119894

1003817100381710038171003817119909119894 minus 11990911989410038171003817100381710038172 + 120582120579119882119872 sum119894

1003817100381710038171003817119891 (119909119894) minus 119891 (119909119894)10038171003817100381710038172

+ 1205932 ( 3sum119895

10038171003817100381710038171003817119882(119894)119890 100381710038171003817100381710038172119865 +3sum119895

10038171003817100381710038171003817119882(119894)119889 100381710038171003817100381710038172119865)

+ 120574( 3sum119894

KL (120588119909 || 1205880) + 5sum119894

KL (120588 || 1205880))

(9)

where 120588119909 = (1119872)sum119894 119891(119909119894) 120588 = (1119872)sum119894 119891(119909119894)

12 Mathematical Problems in Engineering

Table 2 Comparisons of the ADSNT algorithm with differentactivation functions on the AR PubFig and Extended Yale Bdatabases

Dataset Sigmoid Tanh ReLUAR 8866 plusmn 14 9232 plusmn 07 9322 plusmn 15Extended Yale B 9055 plusmn 06 9366 plusmn 04 9454 plusmn 03PubFig 8740 plusmn 12 9126 plusmn 16 9244 plusmn 11

If ReLU is adopted as activation function (6) is formu-lated as

argmin120579ADSNT

119869= 1119872 sum

119894

1003817100381710038171003817119909119894 minus 11990911989410038171003817100381710038172 + 120582120579119882119872 sum119894

1003817100381710038171003817119891 (119909119894) minus 119891 (119909119894)10038171003817100381710038172

+ 1205932 ( 3sum119895

10038171003817100381710038171003817119882(119894)119890 100381710038171003817100381710038172119865 +3sum119895

10038171003817100381710038171003817119882(119894)119889 100381710038171003817100381710038172119865)

+ 120574( 3sum119894

1003817100381710038171003817119891 (119909119894)10038171003817100381710038171 + 5sum119894

1003817100381710038171003817119891 (119909119894)10038171003817100381710038171)

(10)

Table 2 shows the performance of the proposed ADSNTbased on different activation functions conducted on thethree databases FromTable 2 one can see that ReLU achievesthe best performanceThe key reason is that we use theweightdecay term 120593 to optimize the objective function

435 Timing Consumption Analysis In this subsection weuse a HP Z620 workstation with Intel Xeon E5-2609 24GHzCPU 8G RAM and conduct a series of experiments onAR database to compare the time consumption of differentmethods which are tabulated in Table 3 The training time(seconds) is shown in Table 3(a) while the time (seconds)needed to recognize a face from the testing set is shownin Table 3(b) From Table 3 one can see that the proposedmethod requires comparatively more time for trainingbecause of initialization of ADSNT and performing imagereconstruction However the procedure of training is offlineWhen we identity an image from testing set our methodrequires less time than other methods

5 Conclusions

In this article we present an adaptive deep supervisedautoencoder based image reconstruction method for facerecognition Unlike conventional deep autoencoder basedface recognition method our method considers the classlabel information from training samples in the deep learn-ing procedure and can automatically discover the under-lying nonlinear manifold structures Specifically a multi-layer supervised adaptive network structure is presentedwhich is trained to extract characteristic features from cor-ruptedclean facial images and reconstruct the correspondingsimilar facial images The reconstruction is realized by a so-called ldquobottleneckrdquo neural network that learns to map face

Table 3 (a) Training time (seconds) for different methods (b)Testing time (seconds) for different methodsThe proposed methodcosts the least amount of testing time comparing with othermethods

(a)

Methods TimeDAE [12] 861MDAE [17] 1054CAE [15] 786DLN [31] 2343SSAE [9] 5351RICA [32] 1344TDRM [20] 1102Our method 12232

(b)

Methods TimeDAE [12] 027MDAE [17] 03CAE [15] 026DLN [31] 035SSAE [9] 022RICA [32] 019TDRM [20] 018Our method 013

images into a low-dimensional vector and to reconstructthe respective corresponding face images from the mappingvectors Having trained the ADSNT a new face image canthen be recognized by comparing its reconstruction imagewith individual gallery images during testing The proposedmethod has been evaluated on the widely used AR PubFigand Extended Yale B databases and the experimental resultshave shown its effectiveness For future work we are focusingon applying our proposed method to other application fieldssuch as pattern classification based on image set and actionrecognition based on the video to further demonstrate itsvalidity

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This paper is partially supported by the research grant for theNatural Science Foundation from Sichuan Provincial Depart-ment of Education (Grant no 13ZB0336) and the NationalNatural Science Foundation of China (Grant no 61502059)

References

[1] J-T Chien and C-CWu ldquoDiscriminant waveletfaces and near-est feature classifiers for face recognitionrdquo IEEE Transactions on

Mathematical Problems in Engineering 13

Pattern Analysis and Machine Intelligence vol 24 no 12 pp1644ndash1649 2002

[2] Z Lei M Pietikainen and S Z Li ldquoLearning discriminant facedescriptorrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence vol 36 no 2 pp 289ndash302 2014

[3] B Moghaddam T Jebara and A Pentland ldquoBayesian facerecognitionrdquo Pattern Recognition vol 33 no 11 pp 1771ndash17822000

[4] N Cristianini and J S Taylor An Introduction to SupportVector Machines and Other Kernel-based Learning MethodsCambridge University Press New York NY USA 2004

[5] R O Duda P E Hart and D G Stork Pattern ClassificationJohn Wiley amp Sons New York NY USA 2nd edition 2001

[6] W Zhao R Chellappa P J Phillips and A Rosenfeld ldquoFacerecognition a literature surveyrdquo ACM Computing Surveys vol35 no 4 pp 399ndash458 2003

[7] B Zhang S Shan X Chen and W Gao ldquoHistogram of Gaborphase patterns (HGPP) a novel object representation approachfor face recognitionrdquo IEEE Transactions on Image Processingvol 16 no 1 pp 57ndash68 2007

[8] A Krizhevsky I Sutskever and G E Hinton ldquoImageNetclassification with deep convolutional neural networksrdquo inNeural Information Processing Systems pp 1527ndash1554 2012

[9] S Gao Y Zhang K Jia J Lu and Y Zhang ldquoSingle sample facerecognition via learning deep supervised autoencodersrdquo IEEETransactions on Information Forensics and Security vol 10 no10 pp 2108ndash2118 2015

[10] G E Hinton and R R Salakhutdinov ldquoReducing the dimen-sionality of data with neural networksrdquo American Associationfor the Advancement of Science Science vol 313 no 5786 pp504ndash507 2006

[11] Y Bengio ldquoPractical recommendations for gradient-basedtraining of deep architecturesmrdquo in Neural Networks Tricks ofthe Trade pp 437ndash478 Springer Berlin Germany 2012

[12] P Vincent H Larochelle I Lajoie Y Bengio and P-AManzagol ldquoStacked denoising autoencoders learning usefulrepresentations in a deep network with a local denoisingcriterionrdquo Journal of Machine Learning Research (JMLR) vol 11no 5 pp 3371ndash3408 2010

[13] V Nair and G E Hinton ldquoRectified linear units improveRestricted Boltzmann machinesrdquo in Proceedings of the 27thInternational Conference on Machine Learning (ICML rsquo10) pp807ndash814 Haifa Israel June 2010

[14] A Coates H Lee and A Y Ng ldquoAn analysis of single-layernetworks in unsupervised feature learningrdquo in Proceedings ofthe 14th International Conference on Artificial Intelligence andStatistics (AISTATS rsquo11) pp 215ndash223 Sardinia Italy 2010

[15] S Rifai P Vincent X Muller X Glorot and Y BengioldquoContractive auto-encoders explicit invariance during featureextractionrdquo in Proceedings of the 28th International Conferenceon Machine Learning (ICML rsquo11) pp 833ndash840 Bellevue WashUSA July 2011

[16] K Simonyan A Vedaldi and A Zisserman ldquoDeep fishernetworks for large-scale image classificationrdquo in Proceedings ofthe 27th Annual Conference on Neural Information ProcessingSystems (NIPS rsquo13) pp 163ndash171 Lake Tahoe Nev USA Decem-ber 2013

[17] M Chen Z Xu K Q Weinberger and F Sha ldquoMarginalizeddenoising autoencoders for domain adaptationrdquo in Proceedingsof the 29th International Conference onMachine Learning (ICMLrsquo12) pp 767ndash774 Edinburgh UK July 2012

[18] Y TaigmanM YangM Ranzato and LWolf ldquoDeepFace clos-ing the gap to human-level performance in face verificationrdquo inProceedings of the 27th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo14) pp 1701ndash1708 June 2014

[19] Z Zhu P Luo X Wang and X Tang ldquoDeep learning identity-preserving face spacerdquo in Proceedings of the 14th IEEE Interna-tional Conference on Computer Vision (ICCV rsquo13) pp 113ndash120Sydney Australia December 2013

[20] M Hayat M Bennamoun and S An ldquoDeep reconstructionmodels for image set classificationrdquo IEEE Transactions onPattern Analysis andMachine Intelligence vol 37 no 4 pp 713ndash727 2015

[21] Y Sun X Wang and X Tang ldquoDeep learning face representa-tion from predicting 10000 classesrdquo in Proceedings of the 27thIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo14) pp 1891ndash1898 Columbus Ohio USA June 2014

[22] Y Sun X Wang and X Tang ldquoDeep learning face rep-resentation by joint identification-verificationrdquo Tech Rephttpsarxivorgabs14064773

[23] X Cai CWang B Xiao X Chen and J Zhou ldquoDeep nonlinearmetric learning with independent subspace analysis for faceverificationrdquo in Proceedings of the 20th ACM InternationalConference on Multimedia (MM rsquo12) pp 749ndash752 November2012

[24] G E Hinton S Osindero and Y-W Teh ldquoA fast learningalgorithm for deep belief netsrdquoNeural Computation vol 18 no7 pp 1527ndash1554 2006

[25] Q V Le J Ngiam A Coates A Lahiri B Prochnow andA Y Ng ldquoOn optimization methods for deep learningrdquo inProceedings of the 28th International Conference on MachineLearning (ICML rsquo11) pp 265ndash272 Bellevue Wash USA July2011

[26] C Zhou X Wei Q Zhang and X Fang ldquoFisherrsquos lineardiscriminant (FLD) and support vectormachine (SVM) in non-negative matrix factorization (NMF) residual space for facerecognitionrdquoOptica Applicata vol 40 no 3 pp 693ndash704 2010

[27] A Martinez and R Benavente ldquoThe AR face databaserdquo CVCTech Rep 24 1998

[28] A S Georghiades P N Belhumeur and D J Kriegman ldquoFromfew to many illumination cone models for face recognitionunder variable lighting and poserdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 23 no 6 pp 643ndash6602001

[29] N Kumar A C Berg P N Belhumeur and S K NayarldquoAttribute and simile classifiers for face verificationrdquo in Proceed-ings of the 12th International Conference on Computer Vision(ICCV rsquo09) pp 365ndash372 Kyoto Japan October 2009

[30] M Rezaei and R Klette ldquoNovel adaptive eye detection andtracking for challenging lighting conditionsrdquo in ComputerVisionmdashACCV 2012 Workshops J-I Park and J Kim Edsvol 7729 of Lecture Notes in Computer Science pp 427ndash440Springer Berlin Germany 2013

[31] Y Tang R Salakhutdinov andGHHinton ldquoDeep Lambertiannetworksrdquo inProceedings of the 29th International Conference onMachine Learning (ICML 2012) pp 1623ndash1630 Edinburgh UKJuly 2012

[32] Q V Le A Karpenko J Ngiam and A Y Ng ldquoICA withreconstruction cost for efficient overcomplete feature learningrdquoin Proceedings of the 25th Annual Conference on Neural Infor-mation Processing Systems (NIPS rsquo11) pp 1017ndash1025 GranadaSpain December 2011

14 Mathematical Problems in Engineering

[33] X Glorot A Bordes and Y Bengio ldquoDeep sparse rectifier neu-ral networksrdquo in Proceedings of the 14th International Conferenceon Artificial Intelligence and Statistics pp 315ndash323 2011

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 10: Research Article Adaptive Deep Supervised Autoencoder ...Adaptive Deep Supervised Network Template (ADSNT). the deep network perform well, similar to [], we need to giveitinitializationweights.en,thepreinitializedADSNT

10 Mathematical Problems in Engineering

True

pos

itive

rate

04

05

06

07

08

09

10

02 03 04 05 06 07 08 09 1001

False positive rate

DAEMDAECAEDLN

SSAERICATDRMOur method

(a)

01 02 03 04 05 06 07 08 09 1000

False positive rate

07

08

09

10

True

pos

itive

rate

DAEMDAECAESSAE

RICATDRMOur method

(b)

050

055

060

065

070

075

080

085

090

095

100

True

pos

itive

rate

02 03 04 05 06 07 08 09 1001

False positive rate

DAEMDAECAEDLN

SSAERICATDRMOur method

(c)

Figure 7 Comparisons of ROC curves between our method and other methods on different databases (a) AR (b) Extended Yale B and (c)PubFig

and outperforms all compared methods The reason is thatourmethod can extract discriminative robust information tovariances (expression illumination pose etc) in the learneddeep networks Compared with a supervised method likeRICA the proposed method can improve over 16 19and 23 on AR PubFig and Extended Yale B databasesrespectively Our method is a deep learning method whichfocuses on the nonlinear classification problem with learninga nonlinear mapping such that more nonlinear discriminantinformation may be explored to enhance the identificationperformance Compared with SSAE method that is designedfor removing the variances such as illumination pose andpartial occlusion our method can still be better over 6

because of using the weight penalty terms GRBM to initializeweights and three layersrsquo similarity preservation term

432 Convergence Analysis In this subsection we evaluatedthe convergence of our ADSNT versus a different numberof iterations Figure 8 illustrates the value of the objectivefunction of ADSNT versus a different number of iterationson the AR PubFig and Extended Yale B databases FromFigure 8(a) one can observe that ADSNT converges in about55 28 and 70 iterations on the three databases respectively

We also implement the identification accuracy of ADSNTversus a different number of iterations on the AR PubFigand Extended Yale B databases Figure 8(b) plots the mean

Mathematical Problems in Engineering 11

Obj

ectiv

e fun

ctio

n va

lue

4

3

2

1

0

Iteration number0 30 60 90 120 150

ARExtended Yale BPubFig

(a)

ARExtended Yale BPubFig

Mea

n id

entifi

catio

n ra

te (

)

100

80

60

40

Iteration number20 40 60 80

(b)

Figure 8 Convergence analysis (a) Convergence curves of ADSNT on AR PubFig and Extended Yale B (b) Mean identification rate ()versus iterations of ADSNT on AR PubFig and Extended Yale B

Iden

tifica

tion

accu

racy

()

AR Extended Yale B PubFig0

10

20

30

40

50

60

70

80

90

100

Layer 1

Layer 2

Layer 3

Layer 4

Figure 9 The results of ADSNT with different network depth onthe different datasets

identification rate of ADSNT From Figure 8(b) one can alsoobserve that ADSNT achieves stable performance after about55 70 and 28 iterations on AR PubFig and Extended Yale Bdatabases respectively

433 The Effect of Network Depth In this subsection weconduct experiments on the three face datasets with differenthidden layer of our proposedADSNTnetworkThe proposedmethod achieves an identification rate of 923 plusmn 06 933 plusmn12 and 9122 plusmn 08 by three-hidden layer ADSNTnetwork that is 1024 rarr 500 rarr 120 respectively on ARExtended Yale B and PubFig datasets Figure 9 illustrates the

performance of different layer ADSNT One can observe thatthree-hidden layer network outperforms 2-layer networkand the result of 3-layer ADSNT network is very nearly equalto those of 4-layer network on AR and Extended Yale Bdatabases We also observe that the performance of 4-layernetwork is a bit lower than that of 3-layer network on thePubFig database In addition the deeper ADSNT networkis the more complex its computational complexity becomesTherefore the 3-layer network depth is a good trade-offbetween performance and computational complexity

434 Activation Function Following the work in [9] we alsoestimate the performance of ADSNTwith different activationfunctions such as sigmoid hyperbolic tangent and rectifiedlinear unit (ReLU) [33] which is defined as 119891(119909) = max(0 119909)When the sigmoid 119891(119909) = 1(1 + 119890minus119909) is used as activationfunction the objective function (see (6)) is rewritten asfollows

argmin120579ADSNT

119869= 1119872 sum

119894

1003817100381710038171003817119909119894 minus 11990911989410038171003817100381710038172 + 120582120579119882119872 sum119894

1003817100381710038171003817119891 (119909119894) minus 119891 (119909119894)10038171003817100381710038172

+ 1205932 ( 3sum119895

10038171003817100381710038171003817119882(119894)119890 100381710038171003817100381710038172119865 +3sum119895

10038171003817100381710038171003817119882(119894)119889 100381710038171003817100381710038172119865)

+ 120574( 3sum119894

KL (120588119909 || 1205880) + 5sum119894

KL (120588 || 1205880))

(9)

where 120588119909 = (1119872)sum119894 119891(119909119894) 120588 = (1119872)sum119894 119891(119909119894)

12 Mathematical Problems in Engineering

Table 2 Comparisons of the ADSNT algorithm with differentactivation functions on the AR PubFig and Extended Yale Bdatabases

Dataset Sigmoid Tanh ReLUAR 8866 plusmn 14 9232 plusmn 07 9322 plusmn 15Extended Yale B 9055 plusmn 06 9366 plusmn 04 9454 plusmn 03PubFig 8740 plusmn 12 9126 plusmn 16 9244 plusmn 11

If ReLU is adopted as activation function (6) is formu-lated as

argmin120579ADSNT

119869= 1119872 sum

119894

1003817100381710038171003817119909119894 minus 11990911989410038171003817100381710038172 + 120582120579119882119872 sum119894

1003817100381710038171003817119891 (119909119894) minus 119891 (119909119894)10038171003817100381710038172

+ 1205932 ( 3sum119895

10038171003817100381710038171003817119882(119894)119890 100381710038171003817100381710038172119865 +3sum119895

10038171003817100381710038171003817119882(119894)119889 100381710038171003817100381710038172119865)

+ 120574( 3sum119894

1003817100381710038171003817119891 (119909119894)10038171003817100381710038171 + 5sum119894

1003817100381710038171003817119891 (119909119894)10038171003817100381710038171)

(10)

Table 2 shows the performance of the proposed ADSNTbased on different activation functions conducted on thethree databases FromTable 2 one can see that ReLU achievesthe best performanceThe key reason is that we use theweightdecay term 120593 to optimize the objective function

435 Timing Consumption Analysis In this subsection weuse a HP Z620 workstation with Intel Xeon E5-2609 24GHzCPU 8G RAM and conduct a series of experiments onAR database to compare the time consumption of differentmethods which are tabulated in Table 3 The training time(seconds) is shown in Table 3(a) while the time (seconds)needed to recognize a face from the testing set is shownin Table 3(b) From Table 3 one can see that the proposedmethod requires comparatively more time for trainingbecause of initialization of ADSNT and performing imagereconstruction However the procedure of training is offlineWhen we identity an image from testing set our methodrequires less time than other methods

5 Conclusions

In this article we present an adaptive deep supervisedautoencoder based image reconstruction method for facerecognition Unlike conventional deep autoencoder basedface recognition method our method considers the classlabel information from training samples in the deep learn-ing procedure and can automatically discover the under-lying nonlinear manifold structures Specifically a multi-layer supervised adaptive network structure is presentedwhich is trained to extract characteristic features from cor-ruptedclean facial images and reconstruct the correspondingsimilar facial images The reconstruction is realized by a so-called ldquobottleneckrdquo neural network that learns to map face

Table 3 (a) Training time (seconds) for different methods (b)Testing time (seconds) for different methodsThe proposed methodcosts the least amount of testing time comparing with othermethods

(a)

Methods TimeDAE [12] 861MDAE [17] 1054CAE [15] 786DLN [31] 2343SSAE [9] 5351RICA [32] 1344TDRM [20] 1102Our method 12232

(b)

Methods TimeDAE [12] 027MDAE [17] 03CAE [15] 026DLN [31] 035SSAE [9] 022RICA [32] 019TDRM [20] 018Our method 013

images into a low-dimensional vector and to reconstructthe respective corresponding face images from the mappingvectors Having trained the ADSNT a new face image canthen be recognized by comparing its reconstruction imagewith individual gallery images during testing The proposedmethod has been evaluated on the widely used AR PubFigand Extended Yale B databases and the experimental resultshave shown its effectiveness For future work we are focusingon applying our proposed method to other application fieldssuch as pattern classification based on image set and actionrecognition based on the video to further demonstrate itsvalidity

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This paper is partially supported by the research grant for theNatural Science Foundation from Sichuan Provincial Depart-ment of Education (Grant no 13ZB0336) and the NationalNatural Science Foundation of China (Grant no 61502059)

References

[1] J.-T. Chien and C.-C. Wu, "Discriminant waveletfaces and nearest feature classifiers for face recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1644–1649, 2002.
[2] Z. Lei, M. Pietikainen, and S. Z. Li, "Learning discriminant face descriptor," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 2, pp. 289–302, 2014.
[3] B. Moghaddam, T. Jebara, and A. Pentland, "Bayesian face recognition," Pattern Recognition, vol. 33, no. 11, pp. 1771–1782, 2000.
[4] N. Cristianini and J. S. Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, New York, NY, USA, 2004.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, John Wiley & Sons, New York, NY, USA, 2nd edition, 2001.
[6] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face recognition: a literature survey," ACM Computing Surveys, vol. 35, no. 4, pp. 399–458, 2003.
[7] B. Zhang, S. Shan, X. Chen, and W. Gao, "Histogram of Gabor phase patterns (HGPP): a novel object representation approach for face recognition," IEEE Transactions on Image Processing, vol. 16, no. 1, pp. 57–68, 2007.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Neural Information Processing Systems, pp. 1527–1554, 2012.
[9] S. Gao, Y. Zhang, K. Jia, J. Lu, and Y. Zhang, "Single sample face recognition via learning deep supervised autoencoders," IEEE Transactions on Information Forensics and Security, vol. 10, no. 10, pp. 2108–2118, 2015.
[10] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[11] Y. Bengio, "Practical recommendations for gradient-based training of deep architectures," in Neural Networks: Tricks of the Trade, pp. 437–478, Springer, Berlin, Germany, 2012.
[12] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, no. 5, pp. 3371–3408, 2010.
[13] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 807–814, Haifa, Israel, June 2010.
[14] A. Coates, H. Lee, and A. Y. Ng, "An analysis of single-layer networks in unsupervised feature learning," in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS '11), pp. 215–223, Sardinia, Italy, 2010.
[15] S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio, "Contractive auto-encoders: explicit invariance during feature extraction," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 833–840, Bellevue, Wash, USA, July 2011.
[16] K. Simonyan, A. Vedaldi, and A. Zisserman, "Deep Fisher networks for large-scale image classification," in Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS '13), pp. 163–171, Lake Tahoe, Nev, USA, December 2013.
[17] M. Chen, Z. Xu, K. Q. Weinberger, and F. Sha, "Marginalized denoising autoencoders for domain adaptation," in Proceedings of the 29th International Conference on Machine Learning (ICML '12), pp. 767–774, Edinburgh, UK, July 2012.
[18] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "DeepFace: closing the gap to human-level performance in face verification," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1701–1708, June 2014.
[19] Z. Zhu, P. Luo, X. Wang, and X. Tang, "Deep learning identity-preserving face space," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV '13), pp. 113–120, Sydney, Australia, December 2013.
[20] M. Hayat, M. Bennamoun, and S. An, "Deep reconstruction models for image set classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 4, pp. 713–727, 2015.
[21] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation from predicting 10,000 classes," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 1891–1898, Columbus, Ohio, USA, June 2014.
[22] Y. Sun, X. Wang, and X. Tang, "Deep learning face representation by joint identification-verification," Tech. Rep., https://arxiv.org/abs/1406.4773.
[23] X. Cai, C. Wang, B. Xiao, X. Chen, and J. Zhou, "Deep nonlinear metric learning with independent subspace analysis for face verification," in Proceedings of the 20th ACM International Conference on Multimedia (MM '12), pp. 749–752, November 2012.
[24] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[25] Q. V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, and A. Y. Ng, "On optimization methods for deep learning," in Proceedings of the 28th International Conference on Machine Learning (ICML '11), pp. 265–272, Bellevue, Wash, USA, July 2011.
[26] C. Zhou, X. Wei, Q. Zhang, and X. Fang, "Fisher's linear discriminant (FLD) and support vector machine (SVM) in nonnegative matrix factorization (NMF) residual space for face recognition," Optica Applicata, vol. 40, no. 3, pp. 693–704, 2010.
[27] A. Martinez and R. Benavente, "The AR face database," CVC Tech. Rep. 24, 1998.
[28] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From few to many: illumination cone models for face recognition under variable lighting and pose," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643–660, 2001.
[29] N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, "Attribute and simile classifiers for face verification," in Proceedings of the 12th International Conference on Computer Vision (ICCV '09), pp. 365–372, Kyoto, Japan, October 2009.
[30] M. Rezaei and R. Klette, "Novel adaptive eye detection and tracking for challenging lighting conditions," in Computer Vision – ACCV 2012 Workshops, J.-I. Park and J. Kim, Eds., vol. 7729 of Lecture Notes in Computer Science, pp. 427–440, Springer, Berlin, Germany, 2013.
[31] Y. Tang, R. Salakhutdinov, and G. E. Hinton, "Deep Lambertian networks," in Proceedings of the 29th International Conference on Machine Learning (ICML '12), pp. 1623–1630, Edinburgh, UK, July 2012.
[32] Q. V. Le, A. Karpenko, J. Ngiam, and A. Y. Ng, "ICA with reconstruction cost for efficient overcomplete feature learning," in Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS '11), pp. 1017–1025, Granada, Spain, December 2011.
[33] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, pp. 315–323, 2011.



