
Image super-resolution by dictionary concatenation and sparse representation with approximate L0 norm minimization




Computers and Electrical Engineering 38 (2012) 1336–1345





Jinzheng Lu a,b,c,*, Qiheng Zhang a, Zhiyong Xu a, Zhenming Peng b

a Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610209, China
b School of Optoelectronic Information, University of Electronic Science and Technology of China, Chengdu 610054, China
c School of Information and Engineering, Southwest University of Science and Technology, Mianyang 621010, China


Article history: Available online 24 December 2011

doi:10.1016/j.compeleceng.2011.11.026



Abstract: This paper proposes a new image super-resolution (SR) reconstruction scheme, building on recent advances in sparse representation and on recently presented SR methods based on this model. First, we learn a subsidiary dictionary online, using an estimate of the degradation of the given low-resolution image, and concatenate it with a main dictionary learned offline from many high-quality natural images. This strategy strengthens the expressive ability of the dictionary atoms. Second, conventional matching pursuit algorithms commonly use a fixed sparsity threshold for the sparse decomposition of all image patches, which is not optimal and can even introduce annoying artifacts. Instead, we employ approximate L0 norm minimization to decompose each patch accurately over its dictionary, so that representation coefficients with a varying number of nonzero entries can weight atoms exactly for complicated local image structures. Experimental results show that the proposed method produces high-resolution images that are competitive with or superior in quality to results generated by similar techniques.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Super-resolution (SR) image reconstruction [1] refers to signal processing technology that produces a high-resolution (HR) image from one or more given low-resolution (LR) images of the same scene. Over the last several years, suppressing image degradation (blur, noise, etc.) has been an active research topic, and it remains one of the most challenging tasks. This work focuses on improving the resolution of a single-frame image. Many SR methods have been proposed, and they can be roughly classified into three types: interpolation-based approaches [2,3], reconstruction-based approaches [1], and learning-based approaches [4-6]. Image SR has been theoretically and experimentally proven to improve image quality effectively, and the technique has already been widely applied in several fields, for example remote sensing, video surveillance, and medical imaging [7].

In this work, our research mainly builds on the interpolation and learning approaches, because the reconstruction-based approach usually requires accurate registration between LR images [8] and a sufficient number of LR frames [9]. In particular, the learning-based approach is promising because it can realize image SR directly, whereas the conventional reconstruction-based approach commonly needs an uncertain number of loop iterations, which limits its practical application. Mathematically, the SR task is an inverse problem whose feasible solution relies on prior information about the data and on the image observation model [7]. Using the learning-based approach, the ill-posed SR problem can be split into two procedures: reconstruction of the low-frequency (LF) part of the SR image and enhancement of its high-frequency (HF) information. The LF part is commonly obtained with bicubic interpolation, while the HF part can be recovered from a set of patches or primitive structures constructed from many other high-quality images. More concretely, high- and low-resolution patch databases (dictionaries) are first constructed from a mass of HR images and their pre-degraded LR counterparts. Then we seek the best projection of the given LR image onto the LR dictionary, which is called image analysis or image decomposition. Last, the atoms of the HR dictionary are weighted in a linear combination using the mapping relation, which is called image synthesis or image composition.

As a pioneering work on learning-based image SR, [4] analyzed the limits of traditional SR methods and demonstrated a particular 'face hallucination' reconstruction. Freeman et al. [5] proposed an example-based learning technique that handles universal images, in which the projection from LR onto HR is learned with an MRF model. However, these methods typically demand a huge number of example patches, so enormous memory and a large computational load are unavoidable. Chang et al. [6] applied locally linear embedding (LLE) from manifold learning to the SR task. They used a fixed number K of HR patches based on the LLE local geometry learned from the LR patches, producing favorable SR effects; nevertheless, the algorithm generates some artifacts along the edges of the SR image. To avoid the over- or under-fitting of [6], Yang et al. [10,11] proposed an approach based on sparse coding of image patches under an over-complete dictionary learned in a coupled pattern, and generated superior SR results, but their sparse representation based on the L1 norm is very time-consuming. After their work, Roman et al. [12] made remarkable improvements based on a new dictionary learning algorithm and a sparse representation approach. However, using a universal dictionary, this approach may still produce artifacts for certain SR images. Another issue is the low guarantee and the fixed sparsity threshold of the sparse representation algorithm used for patch decomposition under the dictionary: a fixed sparsity level usually does not match the complicated structures of specific images.

Thus, we probably cannot obtain a satisfactory SR reconstruction if we use either a universal dictionary for a specific LR image or a fixed sparsity level of sparse representation for different patches. The structure information of the given LR image should be used to enrich the dictionary. In addition, to reduce the deviation of the sparse representation over the dictionary for various patches, the sparsity level should change in accordance with the structural properties of each patch. Alternatively, the sparsity measure of the representation coefficients can be constrained by the descending rate of a continuous decreasing function.

To overcome the disadvantages of the aforementioned sparse representation-based SR processes, in this work we develop a novel single-image SR reconstruction framework based on a dictionary concatenation model and a sparse representation algorithm via approximate L0 norm minimization. The contributions of this framework are summarized as follows. First, to enhance the generality of the over-complete dictionary pair, only downsampling degradation is considered during the offline learning procedure. Meanwhile, to adapt to the particular structure and the specific degradation process of the given LR image, an online learning pattern is used to obtain a subsidiary dictionary pair based on the degradation estimation of the given LR image; we then concatenate the universal dictionary with the specific one, forming a more effective dictionary for the current image. In the image reconstruction stage, inspired by the advance of a sparse decomposition approach [13], we use a continuous exponential function to approximate the sparsity defined by the L0 norm, which avoids the sensitivity to the number of nonzero entries in representing an image patch. Moreover, using common optimization algorithms, the approximate L0 norm criterion can be realized within a few iterations. Finally, the sparse representation results of the LR image patches under their own dictionary linearly weight the HR dictionary atoms, forming the HF information of the SR image. For compatibility with their neighborhoods, the HF and LF parts are fused with several pixels of overlap.

This study is organized as follows. Section 2 describes the principle of sparse representation, the image observation model, and the super-resolution process based on sparse representation. In Section 3, we detail the realization steps of the proposed framework. Various experimental results in Section 4 show the efficacy of the proposed system. Finally, conclusions are drawn in Section 5.

2. Problem formulation

Sparse representation modeling has been successfully applied to various inverse problems of image reconstruction. Research on harmonic analysis suggests that an over-complete dictionary can adaptively and sparsely represent an image. We first describe the principle of dictionary-based sparse representation. Then the image observation model, in terms of the imaging mechanism, is presented. Last, the sparse representation based SR problem is analyzed with these two models.

2.1. Dictionary-based sparse representation

Consider a signal vector x ∈ R^n. Its representation model in signal harmonic analysis can be defined as [14]:

x = Dα,  (1)

where D ∈ R^{n×m} is a basis matrix and α is a representation vector that contains only a few nonzero entries. Research on image statistics suggests that such a matrix, called an over-complete dictionary (n ≪ m) [15], has various advantages in decomposing signals. With a given dictionary, the process of choosing specific atoms from the dictionary columns is called sparse representation. Moreover, if the dictionary is learned from natural signal examples, it represents signals adaptively, which gives a more powerful ability to capture the internal singular structures of a signal and yields higher sparsity of the representation coefficients.
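As a minimal illustration of Eq. (1), the following Python sketch (our addition, not part of the original paper) builds a random over-complete dictionary and a signal that is exactly 2-sparse over it; the dictionary size and coefficient values are arbitrary.

    import numpy as np

    # Toy illustration of x = D*alpha with an over-complete dictionary (n << m).
    rng = np.random.default_rng(0)
    n, m = 8, 32                               # signal length n, number of atoms m
    D = rng.standard_normal((n, m))
    D /= np.linalg.norm(D, axis=0)             # unit-norm atoms (a common convention)
    alpha = np.zeros(m)
    alpha[[3, 17]] = [1.5, -0.7]               # only two nonzero coefficients
    x = D @ alpha                              # the signal admits a 2-sparse representation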

2.2. Observation model

Without loss of generality, it is assumed that the single-frame imaging procedure can be modeled as y = SBx, in which x and y are the original HR image and the observed LR image, respectively, and S and B are the downsampling and blurring matrices. For simplicity, the imaging model is rewritten as

y = Hx,  (2)

where H = SB stands for the transfer function of the system. The SR problem is to reconstruct x from y; this ill-posed process may not have a unique solution because H is under-determined. Therefore, prior knowledge about the target HR image or the imaging process is crucial to guarantee the stability and uniqueness of the SR result.
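The observation model can be simulated directly. The sketch below is an illustrative assumption on our part, using a Gaussian blur for B and plain decimation for S (the experiments in Section 4 use similar degradations); it generates an LR image y from an HR image x.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def degrade(x, sigma=1.0, factor=2):
        """Simulate y = S B x: Gaussian blurring (B) followed by downsampling (S)."""
        blurred = gaussian_filter(x.astype(float), sigma=sigma)   # B x
        return blurred[::factor, ::factor]                        # S (B x)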

2.3. Super-resolution process

For an HR image x and an LR image y, according to Eq. (1) we have x = D_h α_h and y = D_l α_l, in which D_h and D_l are the corresponding over-complete dictionaries. Combining this with Eq. (2), we obtain D_l α_l = H D_h α_h. Although the captured image always contains some degradation, its low-dimensional projection onto a representative basis set is almost equal to that of the original image onto an appropriate dictionary. Consequently, the relationship can be written as α_l ≈ α_h. Accordingly, α_l = α_h = α holds if the relation between D_h and D_l is given by

D_l = H D_h.  (3)

Eq. (3) shows that the LR dictionary can be constructed from the HR one by introducing the degradation. Thus, we first construct the dictionary pair D_h and D_l with an estimate of the degradation. Then, the representation α of the LR image y is analytically obtained from y = D_l α. Last, the HR result x is synthesized as x = D_h α. Therefore, the critical issues of the SR process based on over-complete sparse representation are the construction of the dictionary and the solving of the sparse decomposition.
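This analysis-synthesis relation can be condensed into a few lines. In the sketch below, sparse_code stands for any sparse decomposition routine (for instance the AL0 solver described later in Section 3.3) and is only a placeholder.

    import numpy as np

    def sr_patch(y_patch, D_l, D_h, sparse_code):
        """Recover an HR patch from an LR patch via Eqs. (1)-(3)."""
        alpha = sparse_code(y_patch, D_l)   # analysis: y = D_l * alpha
        return D_h @ alpha                  # synthesis: x = D_h * alpha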

3. Sparse representation based super-resolution (SR)

The over-complete dictionary usually contains abundant elementary image structures (atoms) learned from examples. However, a universal dictionary may not be optimal for a specific image, generating a bias from the true state, so the structure information of the given image should be imported into the dictionary. Moreover, with the established dictionary pair, different local regions of the image should have unequal numbers of nonzero entries in their representations, i.e., a varying sparsity level over the dictionary.

In this section, our super-resolution framework based on dictionary concatenation and sparse representation with the approximate L0 norm is presented. First, the over-complete dictionary learning procedure is introduced. Next, the concatenation of the dictionaries learned in offline and online modes is shown. Then, sparse representation via approximate L0 norm minimization is described. Last, the process of SR reconstruction is set out.

3.1. Over-complete dictionary learning

The procedures for learning the universal and the specific dictionary are similar, with several distinct steps. To reduce the time cost of the training task, we only learn the LR dictionary from the LR image patches; the HR dictionary, however, is computed numerically from the HR image patches using the acquired sparse representation coefficients. The details, sketched in Fig. 1, are described as follows.

(1) Training set preparation: Suppose {X_h^i} is the high-quality image set (for online learning, i = 1). With the estimated degradation of the pre-processed image, LR images {Y_l^i} can be constructed by blurring or downscaling the HR ones. Note that in offline learning only the downsampling factor is considered, for the sake of generality of the dictionary; in the online process, by contrast, the specific degradation contained in the given LR image is used, so that the subsidiary dictionary is particular to the current SR task. Then the LR images {Y_l^i} are upsampled by the desired factor, forming {Z_h^i}, which lack the HF information.

(2) Training data extraction: To enhance the expressiveness of the dictionary atoms, gradients or other features of the examples are used for training. More precisely, for the HR training examples the differences between {X_h^i} and {Z_h^i} are selected, defined as {e_h^i} = {X_h^i} − {Z_h^i}, while for the LR training examples the gradients of {Y_l^i} are used, defined as {f_l^i} = {F_m{Y_l^i}}. F_m is a gradient operator that includes first- and second-order gradients along both rows and columns, so {f_l^i} consists of four same-sized gradient feature images.


Fig. 1. Diagram of over-complete dictionary learning in the offline pattern. The dictionary D_l is learned with the method of [12]; D_h is numerically calculated.


(3) Training patch division: Next, each image is divided into same-sized patches with several pixels of overlap; the vectorized patches are arranged side by side, each patch being an independent column vector. With {p_h^k} = R{e_h^i}, where R is the patch-extraction operator, the HR patches are cascaded. For the LR case, every feature image is clipped to form one patch matrix, and the four matrices are stacked vertically to give the accumulated {p_l^k} = R{f_l^i}.

(4) Constructing the dictionary pair: With the LR examples Y = {p_l^k}, the training task can be written as

[D_l, C] = arg min ‖Y − D_l C‖₂²  s.t.  ∀k, ‖α_k‖₀ ≤ T,  (4)

where C = {α_k} is the sparse representation matrix, D_l is the resulting LR dictionary, and T is the sparsity threshold. The learning problem thus involves two unknowns, and we apply the scheme of [12] to solve it. Following this, from D_h = arg min ‖P − D_h C‖₂², where P = {p_h^k} is the matrix of HR examples, the HR dictionary is obtained in closed form as D_h = P C^T (C C^T)^{−1}, as sketched in the code below.
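Given the coefficient matrix C and the HR example matrix P, the HR dictionary follows in closed form. The sketch below assumes D_l and C have already been obtained with a learner such as [12] (not reproduced here); the small ridge term is our own safeguard against a singular C C^T.

    import numpy as np

    def hr_dictionary(P, C, eps=1e-8):
        """Closed-form HR dictionary D_h = P C^T (C C^T)^{-1}.
        P: HR example patches, one vectorized patch per column.
        C: sparse coefficient matrix obtained while learning D_l."""
        G = C @ C.T + eps * np.eye(C.shape[0])   # regularized Gram matrix of the coefficients
        return P @ C.T @ np.linalg.inv(G)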

3.2. Concatenation of dictionaries

To distinguish it from the online learning mode introduced below, denote by D_l^off and D_h^off the offline learned dictionary pair. Since the offline universal dictionary may produce errors for some patches in the sparse decomposition process, an online specific dictionary learned from the pre-processed LR image should usefully complement it. Analogously to the offline learning procedure, the online learned dictionary pair, obtained with an estimate of the degradation possibly present in the LR image, is denoted D_l^on and D_h^on.

Thus, with weighting parameters, the online specific dictionary pair is concatenated with the offline universal one in the form

D_l = [a·D_l^off  b·D_l^on],  D_h = [a·D_h^off  b·D_h^on],  (5)

where a and b control the tradeoff between the offline and online patterns. In all our settings, both are simply set to 1.
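Eq. (5) amounts to stacking the two dictionaries side by side. A minimal sketch, with the weights a and b defaulting to 1 as in our settings:

    import numpy as np

    def concatenate_dictionaries(D_off, D_on, a=1.0, b=1.0):
        """Concatenate the offline (universal) and online (specific) dictionaries, Eq. (5)."""
        return np.hstack([a * D_off, b * D_on])

    # Applied to both levels of the pair:
    # D_l = concatenate_dictionaries(D_l_off, D_l_on)
    # D_h = concatenate_dictionaries(D_h_off, D_h_on)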

3.3. Sparse representation via approximate L0 norm (AL0)

With a given dictionary D, the sparse representation α of a patch x can be expressed as [15,16]:

α = arg min ‖x − Dα‖₂²  s.t.  ‖α‖₀ ≤ T,  (6)

where ‖·‖₀ denotes the number of nonzero entries, i.e., the L0 norm, and T is the sparsity measure of α. Among the existing solvers, L1 norm based algorithms and greedy matching pursuit approaches each have particular disadvantages in complexity or guarantee, while directly minimizing an (approximate) L0 norm favorably balances these properties. For this reason, we employ a smoothed decreasing function to approximate L0 norm minimization [13], which has produced state-of-the-art results among sparse decomposition methods.

Specifically, the trend of the sorted nonzero entries can be approximated with the function

f_σ(α) = exp(−α²/(2σ²)),  (7)

where σ controls the sharpness of the curvature. Moreover, as σ approaches zero, f_σ(α) behaves like an indicator of zero: it takes the value 1 at α = 0 and 0 elsewhere. Thus, with the definition

F_σ(α) = Σ_{i=1}^{m} f_σ(α_i),  (8)

F_σ(α) counts the (approximately) zero entries of α for small values of σ, so ‖α‖₀ ≈ m − F_σ(α) can serve as the sparsity measure, where m is the length of the vector α. As a result, Eq. (6) can be rewritten as

α = arg min J(α),  J(α) = (m − F_σ(α)) + λ‖x − Dα‖₂²,  (9)

where λ controls the relative contribution of the prior term and the data fidelity term. For a decreasing sequence σ = {σ_t}, a gradient descent algorithm can minimize J(α). Differentiating Eq. (9) with respect to α gives the gradient

∇J(α) = −2λ D^T(x − Dα) + σ^{−2} [α₁ f_σ(α₁), …, α_m f_σ(α_m)]^T.  (10)

Thus, the iterative steepest descent step can be written as α_n ← α_n − μ∇J(α_n), where μ is a constant step size and α_n is the result of the n-th iteration. Finally, a back-projection step further stabilizes the result via α_{n+1} = α_n + D†(x − D α_n), where D† is the pseudo-inverse of D. The parameters involved are discussed in Section 4.
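The whole AL0 solver of Eqs. (7)-(10) fits in a short routine. The sketch below follows the steps described above with the parameter values reported in Section 4; it is an untuned illustration rather than the authors' implementation, and the initialization by the minimum-norm solution is borrowed from the smoothed-L0 method of [13].

    import numpy as np

    def al0_sparse_code(x, D, sigmas=(0.02, 0.01, 0.005, 0.002, 0.001),
                        mu=2.5, lam=20.0, n_iter=3):
        """Approximate-L0 sparse coding of a patch x over dictionary D (Section 3.3)."""
        D_pinv = np.linalg.pinv(D)
        alpha = D_pinv @ x                          # minimum-norm initial solution
        for sigma in sigmas:                        # decreasing sigma tightens the L0 approximation
            for _ in range(n_iter):
                f = np.exp(-alpha**2 / (2 * sigma**2))                 # f_sigma(alpha_i), Eq. (7)
                grad = -2 * lam * (D.T @ (x - D @ alpha)) \
                       + alpha * f / sigma**2                          # gradient of J, Eq. (10)
                alpha = alpha - mu * grad                              # steepest-descent step
                alpha = alpha + D_pinv @ (x - D @ alpha)               # back-projection step
        return alpha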

3.4. Super-resolution image reconstruction

This subsection describes the SR procedure based on the sparse representation model. Using the concatenated over-complete dictionary pair and the given LR image, the sparse decomposition of the LR image patches is first carried out with the approximate L0 (AL0) norm minimization approach. Then the HR dictionary atoms are linearly weighted with these representation coefficients, giving the HF component of the desired SR image. Finally, this HF component and the LF component produced by bicubic interpolation are fused with overlapping pixels.

(1) Given the LR image Y_l, construct feature patches {p_l^k} = R{F_m{Y_l}} analogously to the dictionary learning process above, building Y = {p_l^k}.
(2) Decompose Y over D_l; the sparse representation C = {α_k} is solved with approximate L0 norm minimization, written C = AL0(Y, D_l).
(3) Synthesize the HF component P = {p_h^k} by the linear combination P = D_h C.
(4) Upsample the image Y_l as Z = Q_s(Y_l), the LF component, where Q_s is interpolation by the desired factor s.
(5) Fuse the LF and HF component patches by X = R^T(Z + P) with overlapping pixels, as sketched in the code below.
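A compact sketch of steps (1)-(5) follows. It reuses al0_sparse_code from the previous sketch, approximates bicubic interpolation with a cubic spline zoom, and assumes, as in [12], that the LR gradient features are computed on the upscaled image so that the LR and HR patch grids align; the patch size and overlap handling are illustrative.

    import numpy as np
    from scipy.ndimage import zoom

    def extract_patches(img, size=5, step=1):
        """R operator: vectorized overlapping patches (one per column) and their positions."""
        H, W = img.shape
        cols, coords = [], []
        for i in range(0, H - size + 1, step):
            for j in range(0, W - size + 1, step):
                cols.append(img[i:i + size, j:j + size].reshape(-1))
                coords.append((i, j))
        return np.stack(cols, axis=1), coords

    def gradient_features(img):
        """F_m operator: first- and second-order gradients along rows and columns."""
        g1r = np.gradient(img, axis=0); g1c = np.gradient(img, axis=1)
        return [g1r, g1c, np.gradient(g1r, axis=0), np.gradient(g1c, axis=1)]

    def super_resolve(Y_l, D_l, D_h, s=2, patch=5):
        """Sketch of reconstruction steps (1)-(5) of Section 3.4."""
        Z = zoom(Y_l.astype(float), s, order=3)                        # step (4): LF part, Q_s
        feats = gradient_features(Z)                                   # LR features on the upscaled image
        F = np.vstack([extract_patches(f, patch)[0] for f in feats])   # step (1): Y = {p_l^k}
        _, coords = extract_patches(Z, patch)
        hf_sum = np.zeros_like(Z)
        weight = np.zeros_like(Z)
        for k, (i, j) in enumerate(coords):
            alpha = al0_sparse_code(F[:, k], D_l)                      # step (2): C = AL0(Y, D_l)
            p_h = (D_h @ alpha).reshape(patch, patch)                  # step (3): p_h^k = D_h alpha_k
            hf_sum[i:i + patch, j:j + patch] += p_h                    # step (5): R^T aggregation with
            weight[i:i + patch, j:j + patch] += 1.0                    # averaging over the overlaps
        return Z + hf_sum / np.maximum(weight, 1.0)                    # fuse LF and HF components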

Thus, from the above analysis, a dictionary with abundant primitive structures and a sparse representation process with high guarantee and speed are the crucial aspects of the proposed framework. Next, we show the effectiveness of the approach through various SR reconstruction experiments.

4. Experimental results and discussion

To present the performance of the proposed framework, this section reports two objective experiments on test images under different scale conditions and one subjective experiment on a real image. We compare the following related approaches: bicubic interpolation with the Matlab imresize function, L1 norm based SR (L1SR) [10,11], OMP sparse representation based SR (OMPSR) [12], the first variant of our method using the universal dictionary and approximate L0 norm (UD-AL0SR), and the second variant using the concatenated dictionary (CD-AL0SR). The parameters of the compared approaches are set according to their references.

During offline dictionary learning, we employ the same training images used in [10]; the patch size is set to 5 × 5 with an overlap of four pixels between adjacent patches, i.e., full overlap, for better continuity. A bicubic operator is used to downsample the images. The trained content is similar to [12], namely differences for the HR images and gradients for the LR images. The dictionary size is set to 512 in offline mode and 128 in online mode; in fact, further increasing the dictionary size does not remarkably improve the reconstruction. The settings for online learning are almost the same as for offline learning, except that the possible blurring in the given LR image is also introduced into the dictionary atoms. For the AL0 algorithm, we empirically set σ = [0.02, 0.01, 0.005, 0.002, 0.001], μ = 2.5, λ = 20, and N = 3 iterations; these settings are partly discussed in Section 4.2.

To quantitatively assess the quality of the reconstructed images, besides the classical Peak Signal-to-Noise Ratio (PSNR):

PSNR (dB) = 10 log₁₀ [ 255² / ( (1/(MN)) Σ_i Σ_j (F(i,j) − G(i,j))² ) ],  (11)

we also adopt the structural similarity (SSIM) index [17]:

SSIM = [ (2 μ_F μ_G + c₁)(2 σ_FG + c₂) ] / [ (μ_F² + μ_G² + c₁)(σ_F² + σ_G² + c₂) ],  (12)

where F and G are the original and the resulting image, respectively, μ denotes the mean, σ² the variance, σ_FG the covariance, and c₁ and c₂ are small constants that prevent the denominator from being zero. The larger the SSIM value (up to 1), the better the image quality.
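For reference, the two quality measures can be computed as follows. The PSNR routine implements Eq. (11) directly; the SSIM routine evaluates Eq. (12) with global image statistics and commonly used constants, whereas [17] computes it over local windows and averages, so this is only a simplified sketch.

    import numpy as np

    def psnr(F, G):
        """Eq. (11): PSNR in dB between 8-bit images F (reference) and G (result)."""
        mse = np.mean((F.astype(float) - G.astype(float)) ** 2)
        return 10 * np.log10(255.0 ** 2 / mse)

    def ssim_global(F, G, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
        """Eq. (12) with global statistics: means, variances, and covariance of F and G."""
        F = F.astype(float); G = G.astype(float)
        mu_f, mu_g = F.mean(), G.mean()
        var_f, var_g = F.var(), G.var()
        cov = ((F - mu_f) * (G - mu_g)).mean()
        return ((2 * mu_f * mu_g + c1) * (2 * cov + c2)) / \
               ((mu_f ** 2 + mu_g ** 2 + c1) * (var_f + var_g + c2))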



4.1. Experimental results

In the first experiment, the original HR image is a test pattern used to assess the spatial-frequency response of human vision, with size 150 × 150. The HR image is first convolved with a Gaussian kernel of 9 × 9 window and σ = 1, and the blurred image is then downsampled by a factor of 2 in both row and column. The results of this experiment are presented in Fig. 2, in which a close-up of the image region marked with a rectangle is shown at the corner. The quantitative results of the different approaches are given in Table 1.

The reconstruction results show that the proposed system with the universal dictionary and AL0 sparse representation (UD-AL0SR) produces sharper structures than L1SR and OMPSR. In particular, the proposed dictionary concatenation mode (CD-AL0SR) suppresses blur degradation better. The effectiveness of the proposed framework is also reflected in the objective PSNR and SSIM indexes, which gain 5.5 dB and 0.09, respectively, over bicubic interpolation. Note that OMPSR has been optimized in C through a mex function, while AL0SR has not; the time cost of the proposed method is therefore higher than that of OMPSR.

In the second experiment, the original HR face image is degraded with the same procedure as in the first experiment, except that the downsampling factor is set to 3. The reconstruction results are shown in Fig. 3, in which the marked nose region is placed at the top-left position to make the details easier to compare. Table 2 gives the quantitative results of the methods in terms of PSNR, SSIM, and time.

Clearly, the proposed UD- and CD-AL0SR frameworks perform better at reconstructing local details. Moreover, blur is favorably suppressed while the sharpness of image edges is well enhanced. In particular, for the fleck on the nose, the developed CD-AL0SR reconstructs more information than the other approaches, producing the highest PSNR and SSIM values among the quantitative assessment indexes.

Fig. 2. Reconstruction results of the first experiment with a factor of 2. (a) Bicubic interpolation. (b) L1 based SR result. (c) OMP based SR result. (d) Proposed UD-AL0SR result. (e) Proposed CD-AL0SR result. (f) Original HR image.

Table 1. The quantitative results of the approaches in the first experiment.

            Bicubic    L1SR      OMPSR     UD-AL0SR   CD-AL0SR
PSNR (dB)   23.2623    25.1305   27.9747   28.3205    28.7962
SSIM        0.8560     0.9057    0.9400    0.9422     0.9490
Time (s)    0.68       60.21     8.64      10.82      10.01


Fig. 3. Reconstruction results of the second experiment with a factor of 3. (a) Bicubic interpolation. (b) L1 based SR result. (c) OMP based SR result. (d) Proposed UD-AL0SR result. (e) Proposed CD-AL0SR result. (f) Original HR image.

Table 2. The quantitative results of the approaches in the second experiment.

            Bicubic    L1SR      OMPSR     UD-AL0SR   CD-AL0SR
PSNR (dB)   31.6293    32.3514   32.9554   33.0727    33.3101
SSIM        0.7671     0.7928    0.8083    0.8119     0.8229
Time (s)    0.78       95.55     9.82      11.59      11.02


As in the first experiment, the computational cost of AL0SR is slightly higher than that of OMPSR.

The third experiment is applied to a real image cropped from a grayscale airplane image. A Gaussian blur with a 5 × 5 window and σ = 1 is assumed, and the cropped image is upscaled by a factor of 3. Fig. 4 shows the reconstruction results of this experiment: (a) is the upsampled LR image; (b) is the direct bicubic interpolation result; (c) and (d) are the L1SR and OMPSR results; (e) and (f) are the proposed UD- and CD-AL0SR results. The results illustrate that the suggested methods recover relatively more of the missing high-frequency information. However, the latent noise component is also enhanced as if it were useful information, as can be seen in Fig. 4(f).

4.2. Discussion

To show the effect of the concatenated dictionary on sparse representation based SR reconstruction, Fig. 5 compares the PSNR values obtained with the universal and the concatenated dictionary for various test images at a factor of 2. The PSNRs of the concatenation mode are higher than those of the universal mode for all test images. Therefore, a dictionary containing the particular structures of the pre-processed LR image can remarkably improve the precision of the sparse decomposition of the given image.

For the AL0 sparse representation algorithm, the controlling parameters can in fact be selected according to the statistics of the reconstruction results. Fig. 6 illustrates the change in PSNR versus the initial value of the σ sequence used in the second experiment, in which the elements of the sequence decrease monotonically.


Fig. 4. Reconstruction results of the third experiment with a factor of 3. (a) Upsampled LR image. (b) Bicubic interpolation. (c) L1 based SR result. (d) OMP based SR result. (e) Proposed UD-AL0SR result. (f) Proposed CD-AL0SR result.

Fig. 5. Comparison of PSNRs between universal and concatenated dictionary for various test images under a scale factor of 2.


It can be concluded that the reconstruction is better when the initial value of σ is close to 0.02.

In Fig. 7, the relationship between the PSNR and the step size μ used in the steepest descent optimization is given, with σ set to its optimal value. Clearly, a relatively better result is obtained when μ is set to 2.5.

Fig. 8 shows the histogram of sparsity versus the number of patches in the second experiment. With the given over-complete dictionary, the complicated structures of image patches match varying numbers of dictionary atoms; that is, the sparsity fluctuates, as shown in Fig. 8, mostly ranging between 1 and 9. For OMPSR, however, the sparsity is set to a fixed constant, which is not ideal for the SR problem.


Fig. 6. Change in the PSNR versus the initial value of the σ sequence in the second experiment.

Fig. 7. Change in the PSNR versus the step size μ of the steepest descent optimization in the second experiment.

Fig. 8. Histogram of sparsity versus the number of patches in the second experiment.




For the other parameters of the AL0 algorithm, owing to the strong prior of sparse representation, the experiments show that λ can be set flexibly, and the number of iterations N can be set to 3 or 4, which constrains the computational load of the algorithm.

5. Conclusions

This work proposed an image super-resolution reconstruction method based on dictionary concatenation and sparse representation via the approximate L0 norm. The universal offline dictionary and the specific online one together make up an efficient over-complete dictionary for the given LR image. The approximate L0 norm minimization uses more or fewer atoms than greedy matching pursuit algorithms constrained by a fixed sparsity, and therefore yields a more accurate sparse decomposition. The various reconstruction results showed that, compared with similar approaches, the suggested method can effectively improve the resolution of LR images. Nevertheless, the completeness of the dictionary, characterized by the richness of its primitive structures and the suitability of the degradation model for the LR image, remains an important issue for further improving the results.

Acknowledgments

We appreciate the helpful comments and suggestions from the editors and reviewers. This work was partly supported by the West Light Personnel Training Project Grant of the Chinese Academy of Sciences. We would like to thank the authors who generously share their data and codes.

References

[1] Park SC, Park MK, Kang MG. Super-resolution image reconstruction: a technical overview. IEEE Signal Process Mag 2003;20(3):21–36.
[2] van Ouwerkerk JD. Image super-resolution survey. Image Vis Comput 2006;24:1039–52.
[3] Chen MJ, Huang CH, Lee WL. A fast edge-oriented algorithm for image interpolation. Image Vis Comput 2005;23:791–8.
[4] Baker S, Kanade T. Limits on super-resolution and how to break them. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), vol. 2; 2000. p. 372–9.
[5] Freeman WT, Jones RJ, Pasztor EC. Example-based super-resolution. IEEE Comput Graph Appl 2002;22(2):56–65.
[6] Chang H, Yeung DY, Xiong Y. Super-resolution through neighbor embedding. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), vol. 1; 2004. p. 275–82.
[7] Milanfar P. Super-resolution imaging. Boca Raton, FL, USA: CRC Press; 2010. p. 1–24.
[8] Protter M, Elad M. Super resolution with probabilistic motion estimation. IEEE Trans Image Process 2009;18(8):1899–904.
[9] Elad M, Feuer A. Superresolution restoration of an image sequence: adaptive filtering approach. IEEE Trans Image Process 1999;8(3):387–95.
[10] Yang JC, Wright J, Huang T, Ma Y. Image super-resolution as sparse representation of raw image patches. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), vol. 1; 2008. p. 1–8.
[11] Yang JC, Wright J, Huang T, Ma Y. Image super-resolution via sparse representation. IEEE Trans Image Process 2010;19(11):2861–73.
[12] Roman Z, Elad M, Protter M. On single image scale-up using sparse-representations. In: Proceedings of the 7th international conference on curves and surfaces; 2010.
[13] Mohimani GH, Massoud BZ, Jutten C. A fast approach for overcomplete sparse decomposition based on smoothed L0 norm. IEEE Trans Signal Process 2009;57(1):289–301.
[14] Mallat S. A wavelet tour of signal processing: the sparse way. Burlington: Academic Press; 2009. p. 693–5.
[15] Bruckstein A, Donoho DL, Elad M. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev 2009;51(1):34–81.
[16] Donoho DL. Compressed sensing. IEEE Trans Inf Theory 2006;52(4):1289–306.
[17] Wang Z, Bovik AC, Sheikh HR. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 2004;13(4):600–12.

Jinzheng Lu received a B.Sc. degree in physics from Qufu Normal University in 2000 and an M.Sc. degree in measuring and testing technologies and instruments from Chengdu University of Technology in 2003. Currently he is working towards the Ph.D. degree at the Institute of Optics and Electronics (IOE), Chinese Academy of Sciences. His research interests include sparse representation theory, image super-resolution reconstruction, and image codec processing.

Qiheng Zhang received a B.Sc. degree, majoring in television, from the University of Electronic Science and Technology of China (UESTC) in 1977. Currently he is a fellow of the Institute of Optics and Electronics (IOE), Chinese Academy of Sciences. His research interests include optoelectronic target detection and digital image processing.

Zhiyong Xu received a B.Sc. degree in computer software from Sichuan University in 1990 and an M.Sc. degree in computer software from the University of Electronic Science and Technology of China (UESTC) in 1998. Currently he is a fellow of the Institute of Optics and Electronics (IOE), Chinese Academy of Sciences. His research interests include image processing, target detection, recognition, and tracking.

Zhenming Peng received a B.Sc. degree in physics from Jishou University in 1988, an M.Sc. degree in geophysics from Southwest Petroleum Institute (China) in 1996, and a Ph.D. degree in earth exploration and information technology from Chengdu University of Technology in 2001. From 2001 to 2003 he was a postdoctoral researcher at the Institute of Optics and Electronics (IOE), Chinese Academy of Sciences. Currently he is a professor at the University of Electronic Science and Technology of China (UESTC). His research interests include signal processing, image processing, target recognition, and tracking.